DeepSeek
Open Weights

DeepSeek R1 (Jan '25)

Released Jan 2025

Intelligence: #218
Coding: #202
Math: #101
Context: 128K
Parameters: 671B

DeepSeek-R1 is a large-scale reasoning model developed by DeepSeek that uses reinforcement learning (RL) to reach performance comparable to leading proprietary reasoning models. It is designed to excel at complex logic, mathematics, and programming tasks by generating an internal Chain-of-Thought (CoT) during inference, which allows the model to self-correct, reflect, and refine its reasoning steps before committing to a final answer.

## Architecture and Parameters

The model is built on a Mixture-of-Experts (MoE) architecture with 671 billion total parameters, of which approximately 37 billion are active per token. This structure, inherited from the DeepSeek-V3 base, delivers high performance while keeping per-token compute low. The model supports a context window of 128,000 tokens and uses Multi-Head Latent Attention (MLA) to reduce the memory overhead of the key-value cache during inference.

## Training and Distillation

DeepSeek-R1 was developed through a multi-stage pipeline that emphasizes reinforcement learning over traditional supervised fine-tuning. The base model was first fine-tuned on a small "cold-start" dataset of curated long Chain-of-Thought examples, then optimized with Group Relative Policy Optimization (GRPO), with later stages improving readability and alignment with human preferences. Alongside the primary 671B model, DeepSeek released several distilled versions ranging from 1.5B to 70B parameters. These smaller models are based on the Llama and Qwen architectures and were fine-tuned on reasoning traces generated by the full DeepSeek-R1 model, bringing advanced reasoning capabilities to more accessible hardware.
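The reason only ~37B of the 671B parameters are active per token is top-k expert routing: each token is sent to a handful of experts chosen by a gating function, and the rest of the experts stay idle. A minimal pure-Python sketch of such a router (the expert count and `top_k` here are hypothetical, not DeepSeek's actual configuration):

```python
import math
import random

def route_token(logits, top_k=8):
    """Top-k MoE routing: keep only the k experts with the highest
    gate logits, then softmax over those logits to get mixture
    weights. All non-selected experts contribute zero compute."""
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    chosen = ranked[:top_k]
    exps = [math.exp(logits[i]) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]

# Illustrative usage: 64 hypothetical experts, one token's gate logits.
random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(64)]
routes = route_token(logits, top_k=8)
print(routes)  # 8 (expert_index, weight) pairs; weights sum to 1
```

Because only `top_k` experts run per token, the active parameter count scales with `top_k / num_experts` of the MoE layers' total, which is how a 671B model can run with roughly 37B parameters active.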
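GRPO's core idea is to score each sampled answer relative to the other answers in its own group, normalizing rewards by the group mean and standard deviation instead of training a separate value model. A minimal illustrative sketch of that normalization step (not DeepSeek's implementation):

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages: for a group of rewards sampled
    from the same prompt, subtract the group mean and divide by
    the group standard deviation. Answers better than the group
    average get positive advantage, worse ones negative."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid division by zero
    return [(r - mean) / std for r in rewards]

# Illustrative usage: four sampled answers to one prompt.
print(grpo_advantages([1.0, 0.0, 0.5, 0.5]))
```

These advantages then weight the policy-gradient update for each sampled completion, so the model is pushed toward answers that outscore their own group.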

Rankings & Comparison