Qwen3.5 122B A10B is a multimodal mixture-of-experts (MoE) language model developed by Alibaba's Qwen team. A core member of the Qwen3.5 Medium series, it is designed to deliver frontier-level performance in a more efficient footprint by activating only 10 billion of its 122 billion parameters per token. The model is a native vision-language foundation model that combines advances in multimodal learning, architectural efficiency, and large-scale reinforcement learning (RL) to handle complex, agentic workflows.
The model uses a hybrid architecture that combines Gated Delta Networks (a form of linear attention) with standard sparse Mixture-of-Experts blocks. With 48 layers and 256 total experts (8 routed experts plus 1 shared expert active per token), this design delivers high-throughput, low-latency inference while preserving deep reasoning capacity. It supports a native context window of 262,144 tokens, extensible to 1,010,000 tokens using techniques such as YaRN, enabling the processing of hour-scale videos or massive codebases.
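The figures above can be put in proportion with some back-of-the-envelope arithmetic. The sketch below uses only the numbers quoted in this section; the ratios it prints are illustrative estimates, not official specifications of the model:

```python
# Illustrative arithmetic using the figures quoted in the text.
# These ratios are rough estimates, not published specifications.

TOTAL_PARAMS = 122e9    # total parameter count ("122B")
ACTIVE_PARAMS = 10e9    # parameters activated per token ("A10B")
TOTAL_EXPERTS = 256
ROUTED_ACTIVE = 8       # routed experts selected per token
SHARED_ACTIVE = 1       # always-on shared expert

def activation_ratio(active: float, total: float) -> float:
    """Fraction of the parameter budget used for any single token."""
    return active / total

def expert_sparsity(active_experts: int, total_experts: int) -> float:
    """Fraction of experts that fire per token in each MoE block."""
    return active_experts / total_experts

def yarn_scaling_factor(extended_ctx: int, native_ctx: int) -> float:
    """Rough context-extension factor needed to reach the longer window."""
    return extended_ctx / native_ctx

print(f"parameter activation ratio: {activation_ratio(ACTIVE_PARAMS, TOTAL_PARAMS):.1%}")
print(f"experts firing per block:   {expert_sparsity(ROUTED_ACTIVE + SHARED_ACTIVE, TOTAL_EXPERTS):.1%}")
print(f"context scaling via YaRN:   {yarn_scaling_factor(1_010_000, 262_144):.2f}x")
```

Roughly 8% of the weights are touched per token, and fewer than 4% of experts fire in any given MoE block, which is where the efficiency gains over a dense 122B model come from.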
One of the defining features of Qwen3.5 122B A10B is its hybrid reasoning capability, which allows it to switch between a standard "non-thinking" mode and a deliberative "thinking" mode. In the latter, the model produces long chains of thought (CoT) to work through intricate mathematical, coding, and logical problems. Beyond reasoning, the model offers native support for 201 languages and dialects, as well as robust function-calling capabilities that allow it to act as an autonomous agent within digital environments.