DeepSeek
Open Weights

DeepSeek R1 Distill Qwen 32B

Released Jan 2025

Intelligence
#248
Math
#111
Context
128K
Parameters
32B

DeepSeek-R1-Distill-Qwen-32B is a reasoning model developed by distilling knowledge from the larger DeepSeek-R1 into the Qwen2.5-32B base architecture. It is designed to provide high-level logical reasoning, mathematical problem-solving, and coding capabilities in a more computationally efficient format than the original Mixture-of-Experts (MoE) R1 model.

The model was trained using a supervised fine-tuning (SFT) pipeline on approximately 800,000 high-quality reasoning samples generated by DeepSeek-R1. This distillation process allows the student model to inherit the chain-of-thought (CoT) and self-reflection behaviors of its teacher while maintaining a dense transformer architecture for simpler deployment and faster inference.
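The SFT objective used in this kind of distillation is ordinary next-token cross-entropy over the teacher-generated reasoning traces, with the loss typically masked so only the completion (chain-of-thought plus answer) tokens contribute, not the prompt. A minimal NumPy sketch of that masked loss, with toy dimensions and a hypothetical `prompt_mask`, not DeepSeek's actual training code:

```python
import numpy as np

def sft_loss(logits, targets, completion_mask):
    """Masked next-token cross-entropy.

    logits: (seq_len, vocab) raw scores from the student model
    targets: (seq_len,) teacher-generated token ids
    completion_mask: (seq_len,) 1.0 for CoT/answer tokens, 0.0 for prompt tokens
    """
    # Numerically stable log-softmax over the vocabulary axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    # Negative log-likelihood of each target token.
    nll = -log_probs[np.arange(len(targets)), targets]
    # Average only over the completion tokens.
    return (nll * completion_mask).sum() / completion_mask.sum()

# Toy example: vocab of 5, sequence of 4 tokens, first 2 are the prompt.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 5))
targets = np.array([1, 3, 0, 2])
prompt_mask = np.array([0.0, 0.0, 1.0, 1.0])  # hypothetical mask layout
loss = sft_loss(logits, targets, prompt_mask)
```

Masking the prompt keeps the student from wasting capacity on reproducing the input and focuses the gradient on imitating the teacher's reasoning tokens.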

Technical Capabilities

In performance evaluations, DeepSeek-R1-Distill-Qwen-32B achieves results comparable to specialized reasoning models on complex benchmarks. It demonstrates strong proficiency in competitive programming and advanced mathematics, posting strong scores on the AIME 2024 and MATH-500 benchmarks. The model also features a context window of up to 128,000 tokens, supporting long-sequence analytical tasks.
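A 128K-token window has practical serving costs, chiefly the key/value cache. The back-of-the-envelope sketch below estimates KV-cache size per token and at full context, assuming Qwen2.5-32B-style dimensions (64 layers, grouped-query attention with 8 KV heads, head dimension 128) and fp16 caches; these dimensions are assumptions for illustration, not figures stated above.

```python
def kv_cache_bytes(tokens, layers=64, kv_heads=8, head_dim=128, dtype_bytes=2):
    """Bytes of KV cache: 2 tensors (K and V) per layer, one
    (kv_heads * head_dim) vector each, per cached token."""
    return 2 * layers * kv_heads * head_dim * dtype_bytes * tokens

per_token = kv_cache_bytes(1)            # 262,144 bytes = 256 KiB/token
full_window = kv_cache_bytes(128_000)    # ~31 GiB at the full 128K window
print(per_token, full_window / 2**30)
```

Under these assumptions, grouped-query attention (8 KV heads instead of 40 query heads) is what keeps the full-window cache in the tens of gigabytes rather than the hundreds.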

Unlike the base R1 model which relies primarily on large-scale reinforcement learning, this distilled version focuses on knowledge transfer to a smaller dense architecture. It utilizes a standard decoder-only transformer structure with Rotary Position Embeddings (RoPE) and is optimized for memory efficiency and throughput compared to its larger counterparts.
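RoPE, mentioned above, encodes position by rotating each pair of query/key dimensions by an angle proportional to the token's position, so that attention dot products depend only on the *relative* offset between tokens. A minimal NumPy sketch of the standard formulation (base 10000, interleaved pairs treated as complex numbers); head dimension and inputs here are toy values:

```python
import numpy as np

def apply_rope(x, pos, base=10000.0):
    """Rotate consecutive dimension pairs of x by pos * theta_i,
    where theta_i = base^(-2i/d), the standard RoPE frequencies."""
    d = x.shape[-1]
    theta = base ** (-np.arange(0, d, 2) / d)      # (d/2,) frequencies
    # View each (even, odd) dimension pair as one complex number.
    xc = x[..., 0::2] + 1j * x[..., 1::2]
    rotated = xc * np.exp(1j * pos * theta)        # rotation = position encoding
    out = np.empty_like(x)
    out[..., 0::2] = rotated.real
    out[..., 1::2] = rotated.imag
    return out

rng = np.random.default_rng(1)
q, k = rng.normal(size=8), rng.normal(size=8)
# Scores depend only on the offset: positions (5, 2) give the
# same dot product as (105, 102).
s1 = apply_rope(q, 5) @ apply_rope(k, 2)
s2 = apply_rope(q, 105) @ apply_rope(k, 102)
```

Because the rotation is a pure phase, it preserves vector norms and composes additively across positions, which is also what makes RoPE amenable to context-extension tricks.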

Rankings & Comparison