Alibaba
Open Weights

Qwen3 14B (Non-reasoning)

Released Apr 2025

Intelligence
#335
Coding
#256
Math
#120
Context
33K
Parameters
14.8B

Qwen3 14B (Non-reasoning) refers to the general-purpose operational mode of Qwen3-14B, a dense large language model developed by Alibaba Cloud and released on April 29, 2025. Part of the third-generation Qwen family, the model has 14.8 billion total parameters (13.2 billion non-embedding) and is designed to balance high-level intelligence with computational efficiency. It was pre-trained on a corpus of 36 trillion tokens spanning 119 languages and dialects.

## Architecture and Capabilities
The model uses a dense, causal transformer architecture with 40 layers and Grouped Query Attention (GQA), pairing 40 query heads with 8 key/value heads. This configuration reduces memory usage and improves inference speed, particularly during long-context processing. It supports a native context window of 32,768 tokens, which can be extended to 131,072 tokens using YaRN (Yet another RoPE extensioN) scaling. Other architectural refinements include SwiGLU activation, RMSNorm with pre-normalization, and QK-LayerNorm for improved training stability.

## Hybrid Reasoning and Performance
A defining feature of the Qwen3 series is its hybrid reasoning design, which allows a single model to switch between a "thinking" mode for complex logical tasks and a "non-thinking" (non-reasoning) mode for standard dialogue. In the non-reasoning configuration, the model is optimized for lower latency and higher throughput, making it well suited to creative writing, multi-turn chat, and multilingual instruction following. This mode prioritizes immediate response generation by bypassing the explicit step-by-step chain-of-thought used for advanced mathematics and coding. Benchmarks indicate that in this mode the 14B model performs strongly on general question answering (MMLU-Pro) and agentic tool-use tasks relative to earlier generations and similarly sized open-weights models.
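The memory benefit of the GQA layout described under Architecture can be estimated with a back-of-envelope KV-cache calculation. The head dimension below is an assumption (128, typical for models of this size, and not stated on this page); the head counts, layer count, and context length come from the text.

```python
# KV-cache size comparison for the attention layout described above:
# 40 layers, 40 query heads, 8 key/value heads, 32,768-token window.
N_LAYERS = 40
N_Q_HEADS = 40
N_KV_HEADS = 8
HEAD_DIM = 128        # assumed head dimension, not stated on this page
BYTES = 2             # fp16
CTX = 32_768          # native context window

def kv_cache_bytes(n_kv_heads: int) -> int:
    # 2 tensors (K and V) per layer, one slot per token in the window
    return 2 * N_LAYERS * n_kv_heads * HEAD_DIM * BYTES * CTX

mha = kv_cache_bytes(N_Q_HEADS)   # hypothetical: one K/V pair per query head
gqa = kv_cache_bytes(N_KV_HEADS)  # grouped-query layout actually used

print(f"MHA cache: {mha / 2**30:.1f} GiB")
print(f"GQA cache: {gqa / 2**30:.1f} GiB ({mha // gqa}x smaller)")
```

With 8 KV heads shared across 40 query heads, each cached K/V pair serves 5 query heads, so the cache shrinks fivefold versus a plain multi-head layout.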
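The YaRN extension from the 32,768-token native window to 131,072 tokens is typically expressed as a `rope_scaling` configuration entry. The field names below follow the Hugging Face transformers convention and should be treated as an assumption rather than something confirmed by this page.

```python
# Sketch of a YaRN rope_scaling config entry extending the context
# window by a factor of 4 (131,072 / 32,768). Key names assume the
# Hugging Face transformers convention.
NATIVE_CTX = 32_768
EXTENDED_CTX = 131_072

rope_scaling = {
    "rope_type": "yarn",
    "factor": EXTENDED_CTX / NATIVE_CTX,               # 4.0
    "original_max_position_embeddings": NATIVE_CTX,
}

print(rope_scaling)
```

The scaling factor is simply the ratio of target to native window, which is why the two context figures on this page imply a 4x YaRN factor.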
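The mode switching described above can be requested per turn. A minimal sketch, assuming Qwen3's documented soft-switch tokens (`/think` and `/no_think`) appended to a user message; this only builds the prompt text and does not run the model.

```python
# Minimal sketch of per-turn mode control via soft-switch suffixes,
# assuming Qwen3's "/no_think" tag selects the non-reasoning mode.
def as_non_thinking(user_msg: str) -> str:
    # "/no_think" asks the model to skip the explicit chain-of-thought
    # block and answer directly (lower latency, higher throughput).
    return f"{user_msg} /no_think"

messages = [
    {"role": "user", "content": as_non_thinking("Summarize this article.")},
]
print(messages[0]["content"])
```

In deployments using the Hugging Face chat template, the same effect is commonly achieved with an `enable_thinking=False` argument to `apply_chat_template`; either way, the model emits an answer without a reasoning trace.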

Rankings & Comparison