DeepSeek
Open Weights

DeepSeek R1 Distill Llama 70B

Released Jan 2025

Intelligence: #265
Coding: #264
Math: #133
Context: 128K
Parameters: 70B

DeepSeek-R1-Distill-Llama-70B is a reasoning-focused language model created by DeepSeek by distilling reasoning patterns from the flagship DeepSeek-R1 model into Meta's Llama-3.3-70B-Instruct. The result combines the dense transformer structure of the Llama series with the multi-step problem-solving behavior of the DeepSeek-R1 pipeline.

The original DeepSeek-R1 teacher was trained with a large-scale reinforcement learning (RL) framework; the distilled model itself is produced by supervised fine-tuning on a dataset of roughly 800,000 reasoning samples generated by DeepSeek-R1. This distillation allows the smaller 70-billion-parameter model to perform complex reasoning tasks, such as mathematical proof generation and logical deduction, more effectively than standard instruction-tuned models of similar scale.
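In practice, this distillation step amounts to ordinary supervised fine-tuning of the student on teacher-generated reasoning traces. The sketch below illustrates the idea with Hugging Face transformers; the dataset file, sequence length, and all hyperparameters are illustrative assumptions rather than DeepSeek's published recipe, and a 70B model would in reality require multi-GPU or parameter-efficient training.

```python
# Sketch of distillation-as-SFT: fine-tune a Llama student on reasoning
# traces generated by a DeepSeek-R1 teacher. Dataset path and all
# hyperparameters are illustrative assumptions, not DeepSeek's recipe.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

student_id = "meta-llama/Llama-3.3-70B-Instruct"  # student base model
tokenizer = AutoTokenizer.from_pretrained(student_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship no pad token
model = AutoModelForCausalLM.from_pretrained(student_id, torch_dtype="auto")

# Hypothetical JSONL file: one teacher-generated reasoning trace per line
# under a "text" key (prompt + chain of thought + final answer).
traces = load_dataset("json", data_files="r1_reasoning_traces.jsonl")["train"]

def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=4096)

tokenized = traces.map(tokenize, remove_columns=traces.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="distill-llama-70b", num_train_epochs=2,
                           per_device_train_batch_size=1, learning_rate=1e-5,
                           bf16=True),
    train_dataset=tokenized,
    # mlm=False gives standard next-token (causal LM) labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```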

Performance and Benchmarks

DeepSeek-R1-Distill-Llama-70B demonstrates high proficiency on technical benchmarks, achieving 70.0% Pass@1 on AIME 2024 and 94.5% Pass@1 on MATH-500. These results indicate a level of mathematical reasoning that rivals much larger models. It also performs strongly on coding tasks, reaching 57.5% Pass@1 on LiveCodeBench.
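For reference, Pass@1 measures the chance that a single sampled answer is correct. The snippet below implements the standard unbiased pass@k estimator from Chen et al. (2021); with k=1 it reduces to the fraction of correct samples, which is how such scores are typically averaged over many samples per problem.

```python
# Unbiased pass@k estimator (Chen et al., 2021): given n sampled answers
# per problem of which c are correct, estimate P(at least one of k correct).
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:          # every size-k draw must contain a correct sample
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With k=1 this reduces to c/n, e.g. 16 samples of which 11 are correct:
print(pass_at_k(n=16, c=11, k=1))  # 0.6875
```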

The model supports a context length of 128,000 tokens and is released under the Llama 3.3 Community License. It is intended for use cases requiring deep analytical thinking, specialized technical assistance, and structured data generation where a full-scale Mixture-of-Experts (MoE) model might be too computationally intensive.
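The weights are published on Hugging Face as deepseek-ai/DeepSeek-R1-Distill-Llama-70B, so the model can be run with standard transformers tooling. A minimal inference sketch follows; the repo id is the published one, but the generation settings (sampling at temperature 0.6 rather than greedy decoding, as DeepSeek suggests for the distilled models) and the hardware setup should be treated as assumptions.

```python
# Minimal inference sketch with Hugging Face transformers. The repo id is
# the published one; generation settings are assumptions (DeepSeek suggests
# sampling rather than greedy decoding for the distilled models).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Llama-70B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"  # multi-GPU at 70B
)

messages = [{"role": "user",
             "content": "Prove that the sum of two odd integers is even."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=1024,
                        do_sample=True, temperature=0.6)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```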
