DeepSeek
Open Weights

DeepSeek R1 Distill Qwen 32B

Released Jan 2025

Intelligence
#248
Math
#111
Context
128K
Parameters
32B

DeepSeek-R1-Distill-Qwen-32B is a reasoning model developed by distilling knowledge from the larger DeepSeek-R1 into the Qwen2.5-32B base architecture. It is designed to provide high-level logical reasoning, mathematical problem-solving, and coding capabilities in a more computationally efficient format than the original Mixture-of-Experts (MoE) R1 model.

The model was trained using a supervised fine-tuning (SFT) pipeline on approximately 800,000 high-quality reasoning samples generated by DeepSeek-R1. This distillation process allows the student model to inherit the chain-of-thought (CoT) and self-reflection behaviors of its teacher while maintaining a dense transformer architecture for simpler deployment and faster inference.
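The SFT objective used in this kind of distillation is ordinary next-token cross-entropy over the teacher-generated reasoning traces, with the loss typically masked so only the completion (chain-of-thought plus answer) tokens contribute, not the prompt. A minimal NumPy sketch of that masked loss, with toy dimensions and a hypothetical `prompt_mask`, not DeepSeek's actual training code:

```python
import numpy as np

def sft_loss(logits, targets, completion_mask):
    """Masked next-token cross-entropy.

    logits: (seq_len, vocab) raw scores from the student model
    targets: (seq_len,) teacher-generated token ids
    completion_mask: (seq_len,) 1.0 for CoT/answer tokens, 0.0 for prompt tokens
    """
    # Numerically stable log-softmax over the vocabulary axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    # Negative log-likelihood of each target token.
    nll = -log_probs[np.arange(len(targets)), targets]
    # Average only over the completion tokens.
    return (nll * completion_mask).sum() / completion_mask.sum()

# Toy example: vocab of 5, sequence of 4 tokens, first 2 are the prompt.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 5))
targets = np.array([1, 3, 0, 2])
prompt_mask = np.array([0.0, 0.0, 1.0, 1.0])  # hypothetical mask layout
loss = sft_loss(logits, targets, prompt_mask)
```

Masking the prompt keeps the student from wasting capacity on reproducing the input and focuses the gradient on imitating the teacher's reasoning tokens.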

Technical Capabilities

In performance evaluations, DeepSeek-R1-Distill-Qwen-32B achieves results comparable to specialized reasoning models on complex benchmarks. It demonstrates strong proficiency in competitive programming and advanced mathematics, posting strong scores on the AIME 2024 and MATH-500 benchmarks. The model also features a context window of up to 128,000 tokens, supporting long-sequence analytical tasks.
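A 128K-token window has practical serving costs, chiefly the key/value cache. The back-of-the-envelope sketch below estimates KV-cache size per token and at full context, assuming Qwen2.5-32B-style dimensions (64 layers, grouped-query attention with 8 KV heads, head dimension 128) and fp16 caches; these dimensions are assumptions for illustration, not figures stated above.

```python
def kv_cache_bytes(tokens, layers=64, kv_heads=8, head_dim=128, dtype_bytes=2):
    """Bytes of KV cache: 2 tensors (K and V) per layer, one
    (kv_heads * head_dim) vector each, per cached token."""
    return 2 * layers * kv_heads * head_dim * dtype_bytes * tokens

per_token = kv_cache_bytes(1)            # 262,144 bytes = 256 KiB/token
full_window = kv_cache_bytes(128_000)    # ~31 GiB at the full 128K window
print(per_token, full_window / 2**30)
```

Under these assumptions, grouped-query attention (8 KV heads instead of 40 query heads) is what keeps the full-window cache in the tens of gigabytes rather than the hundreds.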

Unlike the base R1 model which relies primarily on large-scale reinforcement learning, this distilled version focuses on knowledge transfer to a smaller dense architecture. It utilizes a standard decoder-only transformer structure with Rotary Position Embeddings (RoPE) and is optimized for memory efficiency and throughput compared to its larger counterparts.
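RoPE, mentioned above, encodes position by rotating each pair of query/key dimensions by an angle proportional to the token's position, so that attention dot products depend only on the *relative* offset between tokens. A minimal NumPy sketch of the standard formulation (base 10000, interleaved pairs treated as complex numbers); head dimension and inputs here are toy values:

```python
import numpy as np

def apply_rope(x, pos, base=10000.0):
    """Rotate consecutive dimension pairs of x by pos * theta_i,
    where theta_i = base^(-2i/d), the standard RoPE frequencies."""
    d = x.shape[-1]
    theta = base ** (-np.arange(0, d, 2) / d)      # (d/2,) frequencies
    # View each (even, odd) dimension pair as one complex number.
    xc = x[..., 0::2] + 1j * x[..., 1::2]
    rotated = xc * np.exp(1j * pos * theta)        # rotation = position encoding
    out = np.empty_like(x)
    out[..., 0::2] = rotated.real
    out[..., 1::2] = rotated.imag
    return out

rng = np.random.default_rng(1)
q, k = rng.normal(size=8), rng.normal(size=8)
# Scores depend only on the offset: positions (5, 2) give the
# same dot product as (105, 102).
s1 = apply_rope(q, 5) @ apply_rope(k, 2)
s2 = apply_rope(q, 105) @ apply_rope(k, 102)
```

Because the rotation is a pure phase, it preserves vector norms and composes additively across positions, which is also what makes RoPE amenable to context-extension tricks.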

Rankings & Comparison