DeepSeek
Open Weights

DeepSeek R1 Distill Qwen 1.5B

Released Jan 2025

Intelligence rank: #406
Math rank: #204
Context: 128K
Parameters: 1.5B

DeepSeek-R1-Distill-Qwen-1.5B is a lightweight reasoning model developed by DeepSeek. It is part of a series of distilled models designed to bring the complex reasoning and chain-of-thought capabilities of the full-scale DeepSeek-R1 to smaller, more efficient architectures. This specific variant is based on the Qwen2.5-1.5B foundation and was fine-tuned using approximately 800,000 reasoning samples curated from the larger R1 model.
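The distillation described above is not logit matching: the student is trained with ordinary next-token cross-entropy on reasoning traces generated by the teacher. A minimal sketch of that objective, with a toy tabular "student" standing in for the real network (the tokens and probability table are purely illustrative):

```python
import math

def sft_loss(student_probs, teacher_trace):
    """Mean next-token cross-entropy of the student on one teacher trace.

    student_probs: dict mapping a context tuple -> {next_token: probability}
    teacher_trace: list of tokens produced by the teacher (R1) model
    """
    nll = 0.0
    for i in range(1, len(teacher_trace)):
        context = tuple(teacher_trace[:i])
        target = teacher_trace[i]
        # Probability the student assigns to the teacher's next token;
        # a small floor avoids log(0) for unseen tokens.
        p = student_probs.get(context, {}).get(target, 1e-9)
        nll += -math.log(p)
    return nll / (len(teacher_trace) - 1)

# A student that reproduces the trace exactly has (near-)zero loss.
trace = ["Q:", "2+2", "=", "4"]
perfect_student = {
    ("Q:",): {"2+2": 1.0},
    ("Q:", "2+2"): {"=": 1.0},
    ("Q:", "2+2", "="): {"4": 1.0},
}
print(sft_loss(perfect_student, trace))
```

Averaging this loss over the ~800K curated traces is, in sketch form, what the SFT stage minimizes; the real pipeline of course operates on tokenized text and a transformer, not a lookup table.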

Capabilities and Performance

Despite its small parameter count, the model is optimized for logic-heavy tasks such as mathematics, scientific reasoning, and programming. It leverages Supervised Fine-Tuning (SFT) to emulate the reasoning patterns discovered by DeepSeek-R1 through large-scale reinforcement learning. Benchmarks indicate that it achieves competitive results on reasoning-specific datasets like AIME and MATH-500, outperforming some much larger general-purpose models in these specialized domains.
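In practice, R1-series distilled models emit the learned chain of thought before the final answer, commonly delimited with `<think>...</think>` tags. The exact delimiter is an assumption here (check the model's chat template before relying on it), but a consumer of the output would separate the two parts roughly like this:

```python
import re

def split_reasoning(text):
    """Split a model response into (reasoning trace, final answer).

    Assumes the reasoning is wrapped in <think>...</think>; if no such
    block is found, the whole response is treated as the answer.
    """
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not m:
        return "", text.strip()
    reasoning = m.group(1).strip()
    answer = text[m.end():].strip()
    return reasoning, answer

response = "<think>15 * 7 = 105</think>The answer is 105."
reasoning, answer = split_reasoning(response)
print(answer)  # prints "The answer is 105."
```

Keeping the trace separate from the answer is what makes the chain-of-thought style usable in applications: the trace can be logged or discarded while only the answer is shown.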

Architecture

The model utilizes a dense transformer architecture inherited from the Qwen2.5 series. It incorporates features such as Grouped-Query Attention (GQA) for efficient inference and Rotary Position Embedding (RoPE) for sequence handling. Its compact size makes it suitable for environments with constrained computational resources where logical inference and step-by-step problem solving are required.
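The two attention features named above can be sketched in a few lines. The head counts below (12 query heads sharing 2 key/value heads) follow the published Qwen2.5-1.5B configuration but should be treated as illustrative assumptions, and the RoPE variant shown rotates interleaved dimension pairs, whereas real implementations often use a half-split layout:

```python
import math

N_Q_HEADS = 12   # query heads (assumed, per Qwen2.5-1.5B config)
N_KV_HEADS = 2   # key/value heads shared across query-head groups

def kv_head_for(query_head):
    """GQA: map a query head to the KV head its group shares."""
    group_size = N_Q_HEADS // N_KV_HEADS  # 6 query heads per KV head
    return query_head // group_size

def rope(vec, position, base=10000.0):
    """RoPE sketch: rotate consecutive dimension pairs of a query/key
    vector by an angle that grows with the token position."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out

# Query heads 0-5 read KV head 0, heads 6-11 read KV head 1, so the
# KV cache is 6x smaller than with one KV head per query head.
print([kv_head_for(h) for h in range(N_Q_HEADS)])

# At position 0 every rotation angle is zero, so RoPE is the identity.
print(rope([1.0, 0.0, 0.5, 0.5], 0))
```

The KV-cache reduction from GQA is what makes long (128K) contexts tractable on the constrained hardware this model targets.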
