Qwen3.6 35B A3B (Reasoning) by Alibaba: LLM Benchmarks, Rankings & Specs

Qwen3.6-35B-A3B is a sparse Mixture-of-Experts (MoE) large language model developed by Alibaba's Qwen team. It is the first open-weight variant of the Qwen3.6 generation and has 35 billion total parameters, of which about 3 billion are activated per token during inference. The design aims to combine the knowledge capacity of a mid-sized model with the inference cost of a much smaller one, and it targets agentic coding and complex reasoning tasks.

The model has 40 layers arranged as 10 repeating blocks. Each block stacks three Gated DeltaNet layers (a linear attention mechanism designed for computational efficiency) followed by a single Gated Attention layer. The MoE component uses 256 experts and activates 8 routed experts plus 1 shared expert per token. Because only one layer in four uses standard attention, KV-cache pressure is lower than in a fully attention-based model, while sparse expert routing reduces per-token compute and inference latency compared to dense counterparts.
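The layer layout and expert counts described above can be checked with a short sketch. The constant and function names here are illustrative, and the plain top-k selection in `select_experts` is an assumption: the description gives the expert counts but not how router scores are computed or normalized.

```python
from collections import Counter

# Figures taken from the description above; all names are illustrative.
NUM_BLOCKS = 10
BLOCK_PATTERN = ("gated_deltanet", "gated_deltanet", "gated_deltanet", "gated_attention")
NUM_ROUTED_EXPERTS = 256
TOP_K = 8
NUM_SHARED_EXPERTS = 1


def build_layer_layout(num_blocks=NUM_BLOCKS, pattern=BLOCK_PATTERN):
    """Repeat the per-block pattern to get the full layer sequence."""
    return [kind for _ in range(num_blocks) for kind in pattern]


def select_experts(router_scores, top_k=TOP_K):
    """Return the indices of the top_k highest-scoring routed experts.

    Plain top-k selection is an assumption made for illustration only.
    """
    ranked = sorted(range(len(router_scores)), key=router_scores.__getitem__, reverse=True)
    return sorted(ranked[:top_k])


layout = build_layer_layout()
layer_counts = Counter(layout)

# Toy router scores: a fixed permutation of 0..255, so every score is distinct.
scores = [(i * 37) % NUM_ROUTED_EXPERTS for i in range(NUM_ROUTED_EXPERTS)]
chosen = select_experts(scores)
active_experts = len(chosen) + NUM_SHARED_EXPERTS  # shared expert is always on

print(len(layout), dict(layer_counts))
print(chosen, active_experts)
```

The layout comes out to 40 layers, 30 Gated DeltaNet and 10 Gated Attention, and each token touches 9 experts in total (8 routed plus the shared one).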

Capabilities and Performance

Qwen3.6-35B-A3B is natively multimodal: it includes a vision encoder for image and video understanding. In agentic coding it scores 73.4 on SWE-bench Verified and 51.5 on Terminal-Bench 2.0. These benchmarks measure repository-level reasoning, multi-file refactoring, and task execution in a live terminal environment. The model also performs strongly on scientific reasoning, scoring 92.7 on AIME 2026 and 86.0 on GPQA Diamond.

Key Features

Thinking Preservation: A novel feature that allows the model to retain reasoning context (chain-of-thought) across historical conversation turns, enhancing consistency in long-running agent workflows.
Thinking Mode: By default, the model generates reasoning content enclosed within <think> tags before providing a final response. This behavior can be toggled via the enable_thinking parameter in supported APIs.
Extended Context: Supports a native context window of 262,144 tokens, which can be extended up to 1,010,000 tokens using RoPE scaling techniques like YaRN.
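When Thinking Mode is on, client code usually needs to separate the reasoning from the final response. The following is a minimal sketch that assumes the raw output is a single string containing one `<think>...</think>` span. The helper name `split_thinking` is hypothetical. How `enable_thinking` is passed depends on the API in use and is not shown here. As a precaution, the sketch also accepts output that carries only the closing tag.

```python
def split_thinking(text):
    """Split raw model output into (reasoning, answer).

    Assumes at most one reasoning span, terminated by </think>.
    If no closing tag is present, the whole text is treated as the answer.
    """
    head, sep, tail = text.partition("</think>")
    if not sep:
        return "", text.strip()
    # Drop the opening tag if it is present in the output.
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


sample = "<think>2 + 2 is 4.</think>\nThe answer is 4."
reasoning, answer = split_thinking(sample)
print(repr(reasoning))
print(repr(answer))

# Output without any reasoning span passes through unchanged.
plain_reasoning, plain_answer = split_thinking("Hello.")
```

Keeping the reasoning as a separate value makes it easy either to drop it from later turns or to pass it back, depending on whether the workflow relies on Thinking Preservation.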

For agentic tasks, a context length of at least 128K tokens is recommended so that the model's thinking behavior remains stable. The model is released under the Apache 2.0 license, which permits broad commercial and research use.
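The context figures above imply a simple piece of arithmetic when choosing a context length. The sketch below computes the scaling factor needed to stretch the native window to a target length and checks a requested length against the 128K recommendation. It assumes that 128K means 128 × 1024 tokens and that the scaling factor is the ratio of target length to native length. The function names are hypothetical, and how the factor is supplied depends on the serving framework.

```python
NATIVE_CONTEXT = 262_144        # native window, from the description above
EXTENDED_CONTEXT = 1_010_000    # maximum with RoPE scaling such as YaRN
RECOMMENDED_MIN = 128 * 1024    # assumption: "128K" means 131,072 tokens


def scaling_factor(target_tokens, native_tokens=NATIVE_CONTEXT):
    """Ratio by which the native window must be stretched (never below 1)."""
    return max(1.0, target_tokens / native_tokens)


def describe_context(requested_tokens):
    """Classify a requested context length against the documented limits."""
    if requested_tokens > EXTENDED_CONTEXT:
        return "exceeds extended limit"
    if requested_tokens > NATIVE_CONTEXT:
        return "needs RoPE scaling"
    if requested_tokens < RECOMMENDED_MIN:
        return "below recommended minimum for agentic use"
    return "within native window"


max_factor = scaling_factor(EXTENDED_CONTEXT)
print(round(max_factor, 2))
for tokens in (32_768, 200_000, 500_000, 2_000_000):
    print(tokens, describe_context(tokens))
```

Reaching the full 1,010,000 tokens corresponds to a factor of roughly 3.85 over the native window under these assumptions.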
