Alibaba
Open Weights

Qwen3.5 122B A10B (Non-reasoning)

Released Feb 2026

Intelligence: #78
Coding: #79
Context: 262K
Parameters: 122B

Qwen3.5-122B-A10B is a large-scale multimodal Mixture-of-Experts (MoE) language model developed by Alibaba's Qwen team. Released as part of the Qwen3.5 series, it is designed for native multimodal applications, supporting text, image, and video inputs within a single unified architecture. This specific model is categorized as a "non-reasoning" or standard instruct version, optimized for high throughput and direct response generation rather than the extended chain-of-thought (thinking) modes found in specialized reasoning variants.

Architecture and Efficiency

The model utilizes a hybrid architecture that combines Gated Delta Networks (a linear attention mechanism) with a sparse Mixture-of-Experts (MoE) structure. It has 122 billion total parameters, of which only approximately 10 billion are activated per forward pass (the "A10B" suffix). This design lets the model retain the capability of a high-parameter foundation model while operating with the inference cost and latency of a much smaller one. The integration of Gated DeltaNet, which replaces traditional quadratic attention in several layers, helps manage memory overhead at extreme sequence lengths.
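The sparse-activation idea can be illustrated with a toy top-k routing sketch. All sizes below are hypothetical placeholders chosen for readability, not the real Qwen3.5 configuration; the point is only that each token touches a small fraction of the expert parameters.

```python
import math
import random

# Toy sketch of sparse Mixture-of-Experts routing. Sizes are hypothetical,
# not the real Qwen3.5 layer configuration.
random.seed(0)

N_EXPERTS = 64      # experts in the layer (assumed)
TOP_K = 8           # experts activated per token (assumed)
D_MODEL = 16        # hidden size, kept tiny for the sketch

def rand_matrix(rows, cols):
    return [[random.gauss(0, 0.02) for _ in range(cols)] for _ in range(rows)]

router = rand_matrix(D_MODEL, N_EXPERTS)
experts = [rand_matrix(D_MODEL, D_MODEL) for _ in range(N_EXPERTS)]

def matvec(m, v):
    # v has length == rows of m; result ranges over columns of m
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

def moe_forward(x):
    """Route one token vector through its TOP_K highest-scoring experts."""
    logits = matvec(router, x)
    chosen = sorted(range(N_EXPERTS), key=lambda e: logits[e])[-TOP_K:]
    mx = max(logits[e] for e in chosen)
    gates = [math.exp(logits[e] - mx) for e in chosen]
    z = sum(gates)
    gates = [g / z for g in gates]          # softmax over the chosen experts
    out = [0.0] * D_MODEL
    for g, e in zip(gates, chosen):
        h = matvec(experts[e], x)           # expert transform (linear, for brevity)
        out = [o + g * hj for o, hj in zip(out, h)]
    return out, chosen

x = [random.gauss(0, 1) for _ in range(D_MODEL)]
y, used = moe_forward(x)
print(f"experts active: {len(used)}/{N_EXPERTS} "
      f"({len(used) / N_EXPERTS:.1%} of expert parameters used for this token)")
```

Only the routed experts run, so the per-token compute scales with TOP_K rather than N_EXPERTS, which is the mechanism behind a 122B-parameter model with roughly 10B active parameters.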

Capabilities and Context

Qwen3.5-122B-A10B supports a native context window of 262,144 tokens, which can be extended beyond one million tokens via context-scaling techniques. It was trained with an early-fusion multimodal approach on trillions of tokens, achieving high proficiency in visual understanding, long-document analysis, and complex tool calling across 201 supported languages. The model is particularly suited to agentic workflows where low latency, multimodal reasoning, and broad linguistic coverage are essential.
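One common family of context-scaling techniques is RoPE position interpolation; the sketch below uses it as a generic illustration of stretching a 262,144-token native window toward one million tokens. The specific method Qwen3.5 uses is not stated on this page, so treat the numbers and approach as assumptions.

```python
# Hedged sketch: rotary-position interpolation as one generic way to extend
# a 262,144-token native window toward ~1M tokens. Illustrative only; the
# actual scaling technique used by Qwen3.5 is not specified here.
NATIVE_CTX = 262_144
TARGET_CTX = 1_048_576
SCALE = TARGET_CTX / NATIVE_CTX          # 4x interpolation factor

HEAD_DIM = 64                            # assumed per-head dimension
BASE = 10_000.0                          # common RoPE frequency base
inv_freq = [BASE ** (-2 * i / HEAD_DIM) for i in range(HEAD_DIM // 2)]

def rope_angles(pos, interpolate=False):
    # Interpolation compresses positions by SCALE, so positions beyond the
    # native window map back into the range seen during training.
    p = pos / SCALE if interpolate else pos
    return [p * f for f in inv_freq]

# A position near the 1M target, once interpolated, falls inside the
# native 262K range the model was trained on:
far = TARGET_CTX - 1
print(far / SCALE < NATIVE_CTX)
```

The design trade-off is that interpolation densifies positions (four tokens now share the angular range one token had before), which is why extended-context variants typically need some additional fine-tuning or a more refined scaling scheme.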

Rankings & Comparison