Qwen3 4B (Reasoning) is a dense transformer-based large language model developed by Alibaba Cloud as part of the Qwen3 series. Designed to deliver strong reasoning within a small parameter budget, it is notable for its dual-mode architecture, which lets users switch between a thinking mode that emits explicit chain-of-thought intermediate steps and a non-thinking mode for faster, direct output. This flexibility is intended to suit the model to both complex logical deduction and rapid conversational response.
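In thinking mode, the model's chat template wraps the chain-of-thought in `<think>...</think>` tags before the final answer, so client code typically strips the reasoning block out. The parser below is an illustrative sketch of that post-processing step, assuming the documented tag format; the function name is hypothetical.

```python
import re

def split_thinking_output(text: str) -> tuple[str, str]:
    """Separate chain-of-thought from the final answer.

    Assumes reasoning is wrapped in <think>...</think>, the tag
    convention used by Qwen3's chat template; this parser itself
    is an illustrative sketch, not an official utility.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        # Non-thinking mode: no reasoning block, the whole text is the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

# Example output shaped like thinking-mode generation:
raw = "<think>2 + 2 = 4, so the sum is 4.</think>The answer is 4."
reasoning, answer = split_thinking_output(raw)
print(answer)  # The answer is 4.
```

The same function handles non-thinking output gracefully, returning an empty reasoning string, which keeps downstream code identical across both modes.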
Architecturally, the model comprises 4 billion parameters across 36 layers and utilizes Grouped Query Attention (GQA) to enhance inference throughput. It natively supports a context window of 32,768 tokens, which can be extended to 131,072 tokens via YaRN scaling techniques. The model was pretrained on a corpus of 36 trillion tokens, a significant increase over previous generations, incorporating a high density of mathematics, programming, and multilingual data across more than 100 languages.
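The throughput benefit of GQA comes from sharing key/value heads among groups of query heads, shrinking the KV cache that must be kept resident during generation. The sketch below quantifies this at the native 32,768-token context; the head counts (32 query heads, 8 KV heads, head dimension 128) are assumptions for illustration, not confirmed specifications.

```python
def kv_cache_bytes(seq_len: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Per-sequence KV-cache size: keys + values, every layer, fp16."""
    return 2 * seq_len * n_layers * n_kv_heads * head_dim * bytes_per_elem

# Assumed illustrative dimensions (only n_layers=36 is stated in the text):
n_layers, n_q_heads, n_kv_heads, head_dim = 36, 32, 8, 128
seq_len = 32_768  # native context window

mha = kv_cache_bytes(seq_len, n_layers, n_q_heads, head_dim)   # if K/V mirrored Q
gqa = kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim)  # shared KV heads
print(f"MHA-equivalent: {mha / 2**30:.1f} GiB")  # 18.0 GiB
print(f"GQA:            {gqa / 2**30:.1f} GiB")  # 4.5 GiB
print(f"Reduction:      {n_q_heads // n_kv_heads}x")  # 4x
```

Under these assumptions, GQA cuts the per-sequence cache fourfold, which is what makes long contexts (and the 4x YaRN extension to 131,072 tokens) tractable on commodity accelerators.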
The model demonstrates competitive performance in mathematical reasoning, code generation, and academic benchmarks, often rivaling the capabilities of much larger dense models. It is optimized for agentic workflows, featuring native support for the Model Context Protocol (MCP) and robust function-calling. The model weights are released under the Apache 2.0 license, facilitating open-source research and commercial integration.
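In an agentic workflow, the model emits a structured tool call that the host application executes before returning the result to the model. The dispatcher below is a minimal sketch assuming the common JSON `{"name": ..., "arguments": {...}}` call format; the tool, its name, and the stubbed return value are all hypothetical.

```python
import json

def get_weather(city: str) -> str:
    """Hypothetical tool: a stub standing in for a real weather API call."""
    return f"Sunny in {city}"

# Registry mapping tool names (as declared to the model) to callables.
TOOLS = {"get_weather": get_weather}

def dispatch(tool_call_json: str) -> str:
    """Execute one model-emitted tool call of the form
    {"name": ..., "arguments": {...}} and return its result."""
    call = json.loads(tool_call_json)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

result = dispatch('{"name": "get_weather", "arguments": {"city": "Hangzhou"}}')
print(result)  # Sunny in Hangzhou
```

A real agent loop would feed `result` back to the model as a tool-role message; an MCP client plays the same dispatching role, but over a standardized protocol rather than an in-process registry.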