Alibaba
Open Weights

Qwen3 1.7B (Reasoning)

Released Apr 2025

Intelligence rank: #434
Coding rank: #360
Math rank: #162
Context: 32K tokens
Parameters: 1.7B

Qwen3 1.7B is a dense transformer-based language model developed by Alibaba Cloud's Qwen Team. Released as part of the Qwen3 series, this model is designed to provide high-performance reasoning in a compact form factor. It is trained on approximately 36 trillion tokens of data and supports 119 languages, making it suitable for a wide range of multilingual and cross-domain applications.

Dual-Mode Reasoning

A defining feature of the Qwen3 series is its native support for "dual-mode reasoning." The model can operate in a Thinking Mode, where it generates step-by-step intermediate computations wrapped in <think> tags before providing a final answer. This mode is specifically optimized for complex tasks such as mathematics, logical deduction, and programming. Alternatively, its Non-Thinking Mode provides direct responses for general conversational tasks and simple information retrieval, allowing users to trade off reasoning depth against latency and compute cost.
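Since Thinking Mode interleaves the reasoning trace and the final answer in one completion, downstream code typically needs to separate the two. A minimal sketch, assuming only the `<think>...</think>` convention described above (the sample completion is hypothetical):

```python
import re

def split_reasoning(output: str) -> tuple[str, str]:
    """Split a completion into (reasoning trace, final answer).

    Thinking Mode wraps intermediate computation in <think>...</think>;
    everything after the closing tag is the user-facing answer.
    """
    match = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if match is None:
        # Non-Thinking Mode: no tags, the whole output is the answer.
        return "", output.strip()
    reasoning = match.group(1).strip()
    answer = output[match.end():].strip()
    return reasoning, answer

# Hypothetical Thinking Mode completion:
raw = "<think>2 + 2 = 4, so the answer is 4.</think>The answer is 4."
reasoning, answer = split_reasoning(raw)
```

Returning an empty reasoning string for untagged output lets the same parser handle both modes without callers branching on which mode was requested.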

Architecture

The model utilizes a dense causal transformer architecture with 28 layers and a 32,768-token context window. Technical specifications include the use of Grouped-Query Attention (GQA) with 16 query heads and 8 key-value heads to optimize memory usage and inference speed. It incorporates modern architectural components such as SwiGLU activation, Rotary Positional Embeddings (RoPE), and RMSNorm with pre-normalization to maintain stability across its training stages.
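The memory benefit of GQA can be made concrete with back-of-the-envelope arithmetic on the specs above. The layer count, head counts, and context length come from the text; the head dimension (128) and fp16 cache precision are assumptions for illustration:

```python
# Per-sequence KV-cache footprint implied by the stated architecture.
NUM_LAYERS = 28
NUM_QUERY_HEADS = 16
NUM_KV_HEADS = 8        # GQA: each KV head serves 2 query heads
HEAD_DIM = 128          # assumed, not stated in the card
CONTEXT = 32_768        # 32,768-token context window
BYTES_PER_VALUE = 2     # assumed fp16/bf16 cache

def kv_cache_bytes(num_kv_heads: int) -> int:
    # Two tensors (K and V) per layer, one HEAD_DIM vector per head per token.
    return 2 * NUM_LAYERS * num_kv_heads * HEAD_DIM * CONTEXT * BYTES_PER_VALUE

gqa = kv_cache_bytes(NUM_KV_HEADS)
mha = kv_cache_bytes(NUM_QUERY_HEADS)  # hypothetical full multi-head baseline
print(f"GQA cache at full context: {gqa / 2**30:.2f} GiB "
      f"({mha // gqa}x smaller than an MHA baseline)")
```

Under these assumptions, halving the KV heads halves the cache at any context length, which is the main reason GQA helps long-context inference on memory-constrained hardware.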

Agentic Capabilities

Qwen3 1.7B includes native support for agentic workflows through the Model Context Protocol (MCP) and improved function-calling abilities. It is engineered to be efficient for deployment on edge devices and resource-constrained environments while maintaining competitive performance on benchmarks for logic, STEM, and code generation relative to its parameter scale.
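In a function-calling workflow, the model emits a structured request naming a tool and its arguments, and the host application executes it and returns the result. A minimal dispatch sketch, where the tool name, schema shape, and the simulated model output are all hypothetical (this follows generic JSON function-calling conventions, not a specific Qwen or MCP API):

```python
import json

# Hypothetical tool registry; real deployments would advertise these schemas
# to the model in the prompt or via an MCP server.
TOOLS = {
    "get_weather": {
        "description": "Look up current weather for a city.",
        "parameters": {"city": {"type": "string"}},
        "fn": lambda city: f"Sunny in {city}",
    },
}

def dispatch(tool_call_json: str) -> str:
    """Execute a tool call the model emitted as a JSON object."""
    call = json.loads(tool_call_json)
    tool = TOOLS[call["name"]]
    return tool["fn"](**call["arguments"])

# Simulated model output requesting a tool invocation:
result = dispatch('{"name": "get_weather", "arguments": {"city": "Hangzhou"}}')
```

The result string would then be fed back to the model as a tool message so it can compose its final answer.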
