Alibaba
Open Weights

Qwen3 Coder 480B A35B Instruct

Released Jul 2025

Intelligence
#160
Coding
#127
Math
#159
Context: 262K
Parameters: 480B

Qwen3-Coder-480B-A35B-Instruct is a large-scale Mixture-of-Experts (MoE) language model developed by Alibaba's Qwen team, engineered for autonomous programming and agentic workflows. It contains 480 billion total parameters, of which 35 billion are active per token: the router selects 8 of 160 specialized experts for each forward pass. This sparse activation strategy lets the model achieve high performance at the computational footprint of a much smaller dense model.
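The sparse routing described above can be sketched as top-k expert selection: score all 160 experts per token, keep the 8 highest, and renormalize their gate weights. The shapes and router weights below are illustrative, not Qwen3-Coder's actual implementation.

```python
import numpy as np

def topk_router(hidden, w_router, top_k=8):
    """Pick the top-k experts per token and normalize their gate weights.

    Minimal MoE-routing sketch: shapes and weights are toy values,
    not the model's real router.
    """
    logits = hidden @ w_router                            # (tokens, num_experts)
    top_idx = np.argsort(logits, axis=-1)[:, -top_k:]     # top-k expert indices
    top_logits = np.take_along_axis(logits, top_idx, axis=-1)
    # softmax over only the selected experts
    gates = np.exp(top_logits - top_logits.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)
    return top_idx, gates

rng = np.random.default_rng(0)
hidden = rng.standard_normal((4, 64))       # 4 tokens, toy hidden size
w_router = rng.standard_normal((64, 160))   # one routing logit per expert
idx, gates = topk_router(hidden, w_router)
print(idx.shape, gates.sum(axis=-1))        # 8 experts per token, gates sum to 1
```

Only the 8 selected experts run for a given token, which is why the per-token cost tracks the 35B active parameters rather than the full 480B.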

The model is designed for complex, repository-scale tasks, featuring a native context window of 262,144 tokens (256K) that can be extended up to 1 million tokens via YaRN extrapolation. It was trained on 7.5 trillion tokens, of which roughly 70% was code spanning over 300 programming languages. This extensive training enables the model to handle multi-step problem-solving, such as cross-file reasoning, iterative debugging, and autonomous documentation browsing.
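The YaRN extension is typically enabled through a `rope_scaling` entry in the model's `config.json`; the field names and values below follow the pattern Qwen publishes in its model cards, but they are assumptions to verify against the current card rather than a definitive recipe.

```python
# Hypothetical sketch of a YaRN rope_scaling override extending the
# native 262,144-token window toward ~1M tokens. Field names follow
# Qwen's published config.json pattern; verify against the model card.
rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,                               # scaling multiplier
    "original_max_position_embeddings": 262_144, # native context length
}

extended = rope_scaling["factor"] * rope_scaling["original_max_position_embeddings"]
print(f"extended context: {extended:,.0f} tokens")  # 4 x 256K ≈ 1M
```

Because YaRN rescales rotary frequencies statically, such a config is usually only worth applying when prompts actually exceed the native window.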

Architecturally, the model consists of 62 transformer layers using Grouped Query Attention (GQA), with 96 query heads sharing 8 key-value heads. Unlike Qwen3's reasoning-oriented variants, it supports only a "non-thinking" mode, focusing on direct execution and structured function calling for integration with agentic platforms and command-line interfaces. Post-training involved long-horizon reinforcement learning (Agent RL) to improve success rates in real-world software engineering scenarios.
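The main payoff of GQA at this scale is KV-cache size: with 8 key-value heads serving 96 query heads, the cache is one twelfth of what full multi-head attention would need. A back-of-the-envelope sketch, using the layer and head counts above but assuming a head dimension of 128 and fp16 storage (both assumptions, not stated on this page):

```python
def kv_cache_bytes(num_layers=62, num_kv_heads=8, head_dim=128,
                   dtype_bytes=2, tokens=262_144):
    """Per-sequence KV-cache size: one K and one V vector per layer,
    per KV head, per token. head_dim=128 and fp16 are assumptions."""
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes * tokens

gqa = kv_cache_bytes(num_kv_heads=8)    # GQA: 8 shared KV heads
mha = kv_cache_bytes(num_kv_heads=96)   # if every query head had its own K/V
print(f"GQA: {gqa / 2**30:.1f} GiB  MHA: {mha / 2**30:.1f} GiB  "
      f"saving: {mha / gqa:.0f}x")
```

Under these assumptions a full 256K-token sequence still needs tens of GiB of cache, which is why the 12x GQA reduction matters for serving long contexts at all.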
