Qwen3-Next-80B-A3B-Thinking is a large-scale mixture-of-experts (MoE) language model developed by Alibaba. It is the reasoning-specialized variant of the Qwen3-Next series, designed to prioritize deep logic and step-by-step problem solving. The model architecture features 80 billion total parameters but utilizes a high-sparsity design that activates only 3 billion parameters per token during inference, significantly reducing computational overhead while maintaining the capacity of a larger system.
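The sparsity idea can be illustrated with a toy top-k router: of many experts, only a handful receive each token, so only their parameters are exercised. This is a minimal sketch in plain Python, not the model's actual routing code; the function name and the 8-expert setup are illustrative.

```python
import math

def top_k_route(logits, k):
    """Pick the top-k experts by router logit and softmax-normalize
    their gate weights. Only these k experts process the token, which
    is what keeps the active parameter count small."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exps = {i: math.exp(logits[i]) for i in top}
    z = sum(exps.values())
    return {i: exps[i] / z for i in top}

# 8 toy experts, routing to 2 (the real model routes each token
# to 10 of 512 experts, a similar ratio of active to total).
weights = top_k_route([0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3], k=2)
print(sorted(weights))                   # indices of the chosen experts
print(round(sum(weights.values()), 6))   # gate weights sum to 1
```

Scaling this ratio up is what yields 3B active parameters out of 80B total: capacity grows with the expert count while per-token compute tracks only the activated subset.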
The model employs a hybrid attention mechanism that combines Gated DeltaNet and Gated Attention. This architecture allows for efficient ultra-long context modeling, with native support for context windows up to 262,144 tokens and extensibility beyond one million tokens. It incorporates 512 total experts (with 10 routed experts and 1 shared expert activated per token) and was trained on 15 trillion tokens, focusing on improving both training stability and inference throughput.
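The shared-plus-routed expert pattern described above can be sketched as follows. Each MoE layer's output combines one always-on shared expert with the router-weighted outputs of the selected experts; the scalar "experts" here are stand-ins for the real feed-forward networks, and all names are illustrative.

```python
# Toy scalar "experts": each is a simple function of the input,
# standing in for a full feed-forward expert network.
def make_expert(scale):
    return lambda x: scale * x

shared = make_expert(1.0)                                 # always-active shared expert
experts = [make_expert(s) for s in (0.5, 2.0, 3.0, 4.0)]  # 4 toy routed experts

def moe_forward(x, router_weights):
    """Combine the shared expert with the router-selected experts.
    router_weights maps expert index -> gate weight; only the listed
    experts run, mirroring the 1-shared + k-routed activation pattern."""
    y = shared(x)
    for i, w in router_weights.items():
        y += w * experts[i](x)
    return y

# Route to toy experts 1 and 3 with gate weights 0.25 / 0.75.
print(moe_forward(2.0, {1: 0.25, 3: 0.75}))
```

The design choice here is that the shared expert captures common knowledge every token needs, while the routed experts specialize, so the router only has to allocate the sparse budget among specialists.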
As a specialized reasoning model, it is post-trained with reinforcement learning using Group Sequence Policy Optimization (GSPO) to generate visible chain-of-thought traces. This "thinking mode" is optimized for complex mathematical proofs, intricate coding tasks, and multi-step logical deductions. The model is designed to automatically produce internal reasoning steps enclosed in <think> tags, providing transparency into its logical process before it delivers a final response.
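When consuming the model's output, applications typically separate the reasoning trace from the final answer. A minimal sketch, assuming the Qwen3-style </think> delimiter convention; note that the opening <think> tag may be absent from the raw output when the chat template injects it, so everything before </think> is treated as reasoning either way.

```python
def split_reasoning(text):
    """Separate the chain-of-thought trace from the final answer.

    Assumes a Qwen3-style </think> delimiter. The opening <think> tag
    may be missing when the chat template supplies it, so any text
    before </think> counts as the reasoning trace."""
    if "</think>" not in text:
        return "", text.strip()  # no trace emitted; whole text is the answer
    reasoning, _, answer = text.partition("</think>")
    return reasoning.replace("<think>", "").strip(), answer.strip()

out = "<think>2+2 is 4 by basic arithmetic.</think>The answer is 4."
trace, answer = split_reasoning(out)
print(answer)
```

This keeps the visible trace available for inspection or logging while only the post-</think> text is surfaced to the end user.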