Alibaba
Open Weights

Qwen3.5 35B A3B (Non-reasoning)

Released Feb 2026

Intelligence: #113
Coding: #189
Context: 262K
Parameters: 35B (3B active)

Qwen3.5 35B A3B is an efficient multimodal language model developed by Alibaba's Qwen team. It uses a hybrid architecture that combines Gated Delta Networks (a form of linear attention) with a sparse Mixture-of-Experts (MoE) framework. This design lets the model retain the knowledge capacity of a 35-billion-parameter model while activating only about 3 billion parameters per token during inference, significantly reducing computational overhead and increasing throughput.
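The sparse-activation idea can be sketched as top-k routing: a small gating network scores every expert, but only the top-scoring few run for each token. A minimal NumPy illustration with toy dimensions (the expert count, hidden size, and `k` below are placeholders, not this model's real configuration):

```python
import numpy as np

def topk_moe_forward(x, experts, gate_w, k=2):
    """Route one token through a sparse Mixture-of-Experts layer.

    x:        (d,) token hidden state
    experts:  list of (d, d) weight matrices, one per expert
    gate_w:   (num_experts, d) router weights
    k:        number of experts activated per token
    """
    logits = gate_w @ x                      # score every expert
    top = np.argsort(logits)[-k:]            # indices of the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                 # softmax over the selected experts
    # Only the k chosen experts do any work; the rest stay idle,
    # so per-token compute scales with k, not with the total expert count.
    return sum(w * (experts[i] @ x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_exp = 16, 8
x = rng.standard_normal(d)
experts = [rng.standard_normal((d, d)) for _ in range(n_exp)]
gate_w = rng.standard_normal((n_exp, d))
y = topk_moe_forward(x, experts, gate_w, k=2)
print(y.shape)  # (16,)
```

With k=2 of 8 experts active, only a quarter of the expert parameters participate per token; the same principle yields the 3B-active / 35B-total split described above.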

As a native vision-language foundation model, it is trained with early fusion on trillions of multimodal tokens. This approach enables unified understanding across text and visual inputs, supporting tasks such as visual reasoning, document parsing, and agentic workflows. It offers a native context window of 262,144 tokens, extensible to over 1 million tokens, making it suitable for processing large codebases and long-form documents.
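To get a feel for what a 262,144-token window holds, a document's token count can be pre-checked with a characters-per-token heuristic (the ~4 chars/token figure below is a common rule of thumb for English text, not a property of this model's tokenizer):

```python
CONTEXT_WINDOW = 262_144     # native window from the model card
CHARS_PER_TOKEN = 4          # rough English-text heuristic (assumption)

def fits_in_context(text: str, reserve_for_output: int = 4_096) -> bool:
    """Cheap pre-check before sending a long document to the model."""
    est_tokens = len(text) // CHARS_PER_TOKEN
    return est_tokens + reserve_for_output <= CONTEXT_WINDOW

# ~1 MiB of text estimates to ~262k tokens; reserving room for the
# model's output pushes it just over the native window.
print(fits_in_context("x" * 1_048_576))  # False
print(fits_in_context("x" * 100_000))    # True
```

By this estimate the native window covers roughly a megabyte of plain text, and the extended 1M-token mode about four times that.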

Architecture and Efficiency

The model's architecture features a high expert count: 256 total experts, with 8 routed experts and 1 shared expert active per token. The integration of Gated Delta Networks keeps memory usage near-constant and throughput flat as context length grows, a significant improvement over traditional pure-attention models. This efficiency allows the model to run effectively on consumer-grade hardware with 24GB of VRAM when using standard quantization methods.
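The near-constant memory claim follows from the linear-attention recurrence: instead of a KV cache that grows with sequence length, the layer carries a fixed-size state matrix that is decayed and updated at each step. A simplified sketch of a gated delta-rule recurrence of the kind used in Gated Delta Networks (the exact gating and parameterization in the production model may differ):

```python
import numpy as np

def gated_delta_scan(q, k, v, alpha, beta):
    """Simplified gated delta-rule recurrence (one head, one sequence).

    q, k, v:     (T, d) query/key/value streams
    alpha, beta: (T,)   decay gate in (0, 1) and write strength per step
    The state S is a fixed (d, d) matrix, so memory stays constant
    no matter how long the sequence T grows.
    """
    T, d = q.shape
    S = np.zeros((d, d))
    out = np.empty((T, d))
    for t in range(T):
        kt, vt = k[t], v[t]
        # Decay the old state, erase the stale association for key kt,
        # then write the new key-value association.
        S = alpha[t] * (S - beta[t] * np.outer(S @ kt, kt)) \
            + beta[t] * np.outer(vt, kt)
        out[t] = S @ q[t]
    return out

rng = np.random.default_rng(1)
T, d = 32, 8
q, k, v = (rng.standard_normal((T, d)) for _ in range(3))
alpha = np.full(T, 0.95)
beta = np.full(T, 0.5)
out = gated_delta_scan(q, k, v, alpha, beta)
print(out.shape)  # (32, 8)
```

Because the state stays `(d, d)` regardless of `T`, doubling the context does not double the memory footprint, which is why throughput stays flat at long context lengths.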

Key Capabilities

  • Multilingual Support: Covers 201 languages and dialects, enabling nuanced cultural understanding and broad global accessibility.
  • Agentic Performance: Optimized for tool use and complex multi-step reasoning, frequently outperforming much larger dense models in coding and autonomous agent benchmarks.
  • Instruction Following: The instruct-tuned (Non-reasoning) version is designed for direct interaction and high-fidelity adherence to user prompts, without the chain-of-thought verbosity of the dedicated "Thinking" variants.
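The agentic workflows these benchmarks exercise reduce to a simple loop: the model either emits a structured tool call, whose result is fed back to it, or a final answer. A generic sketch (the JSON call format and the `add` tool below are illustrative, not this model's actual tool-calling API):

```python
import json

# Hypothetical tool registry; real deployments wire these to actual functions.
TOOLS = {
    "add": lambda args: args["a"] + args["b"],
}

def run_agent_step(model_reply: str):
    """Dispatch one model reply: either a JSON tool call or a final answer."""
    try:
        call = json.loads(model_reply)
    except json.JSONDecodeError:
        return ("final", model_reply)       # plain text -> final answer
    result = TOOLS[call["tool"]](call["arguments"])
    return ("tool_result", result)          # would be appended to the context

print(run_agent_step('{"tool": "add", "arguments": {"a": 2, "b": 3}}'))
# ('tool_result', 5)
print(run_agent_step("The answer is 5."))
# ('final', 'The answer is 5.')
```

Multi-step reasoning then amounts to repeating this dispatch until the model produces a final answer.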

Rankings & Comparison