Alibaba
Open Weights

Qwen3 8B (Non-reasoning)

Released Apr 2025

Intelligence rank: #373
Coding rank: #319
Math rank: #199
Context: 33K
Parameters: 8.2B

Qwen3 8B is a dense large language model developed by Alibaba's Qwen team, featuring 8.2 billion parameters. It is part of the Qwen3 family, which introduces a hybrid reasoning architecture that allows the model to operate in two distinct modes: a high-efficiency non-thinking mode for general-purpose dialogue and a reasoning-heavy thinking mode for complex logical tasks. This specific configuration focuses on the model's capabilities in its non-reasoning mode, where it is optimized for low-latency instruction following and conversational tasks.

The model is built on a decoder-only Transformer architecture with 36 layers and uses Grouped Query Attention (GQA) to improve inference efficiency. It was pre-trained on a corpus of 36 trillion tokens spanning 119 languages, yielding significant improvements in multilingual understanding and cross-cultural knowledge over the Qwen2.5 series. Training follows a three-stage process that progresses through general knowledge acquisition, reasoning skills, and long-context comprehension.
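The efficiency gain from GQA comes from caching far fewer key/value tensors during generation. A back-of-the-envelope sketch: the 36-layer figure comes from the text above, while the head counts (32 query heads sharing 8 key/value heads) and the head dimension (128) are illustrative assumptions, not confirmed specs.

```python
# Rough KV-cache comparison: full multi-head attention vs. GQA.
# LAYERS is from the model description; the head counts and head_dim
# below are assumptions for illustration only.

LAYERS = 36
Q_HEADS = 32   # assumed number of query heads
KV_HEADS = 8   # assumed number of shared key/value heads under GQA
HEAD_DIM = 128 # assumed per-head dimension
BYTES = 2      # fp16/bf16 storage

def kv_cache_bytes_per_token(kv_heads: int) -> int:
    # Both a key and a value tensor are cached for every layer.
    return 2 * kv_heads * HEAD_DIM * LAYERS * BYTES

mha = kv_cache_bytes_per_token(Q_HEADS)   # MHA: one K/V pair per query head
gqa = kv_cache_bytes_per_token(KV_HEADS)  # GQA: each K/V pair serves 4 query heads
print(f"MHA: {mha/1024:.0f} KiB/token, GQA: {gqa/1024:.0f} KiB/token "
      f"({mha // gqa}x smaller cache)")
```

Under these assumed dimensions, GQA shrinks the per-token KV cache by the ratio of query heads to key/value heads, 4x here, which directly lowers memory pressure at long context lengths.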

In its non-thinking configuration, Qwen3 8B excels at human preference alignment, creative writing, and multi-turn interactions. It supports a native context window of 32,768 tokens, which can be extended to 131,072 tokens via YaRN, facilitating the processing of long documents. The model is designed to be highly versatile, supporting advanced agentic capabilities and tool integration across a wide range of natural and programming languages.
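The context extension described above implies a fixed YaRN scaling factor, which is just the ratio of the extended window to the native one. The sketch below computes it and shows a `rope_scaling`-style configuration dict in the spirit of Hugging Face model configs; the exact field names may differ per release, so treat the dict as an assumption.

```python
# YaRN scaling factor implied by the text: 32,768 native tokens
# extended to 131,072 tokens.

NATIVE_CTX = 32_768
EXTENDED_CTX = 131_072

factor = EXTENDED_CTX / NATIVE_CTX
print(f"YaRN scaling factor: {factor}")  # 4.0

# Illustrative config fragment (field names assumed, not confirmed):
rope_scaling = {
    "rope_type": "yarn",
    "factor": factor,
    "original_max_position_embeddings": NATIVE_CTX,
}
print(rope_scaling)
```

Because YaRN trades some short-context fidelity for long-context reach, deployments that never exceed the native 32K window would typically leave such scaling disabled.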

Rankings & Comparison