DeepSeek-R1-Distill-Qwen-14B is a large language model created by fine-tuning the Qwen2.5-14B base model with distillation techniques. Developed by DeepSeek, it is part of a series of models designed to transfer the advanced reasoning capabilities of the 671B-parameter DeepSeek-R1 model into smaller, more efficient dense architectures. This distillation process lets the 14B model deliver high-level reasoning performance without the massive computational requirements of its teacher.
The model was trained on roughly 800,000 high-quality samples generated with the original DeepSeek-R1, the bulk of them reasoning traces. This specialized training teaches the model to produce explicit Chain-of-Thought (CoT) reasoning, significantly enhancing its ability to handle complex, multi-step tasks such as mathematical problem-solving, logic puzzles, and competitive programming.
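To make this distillation step concrete, the sketch below fine-tunes a student checkpoint on teacher-generated CoT completions using a standard causal-language-modeling objective. It is a minimal illustration, not DeepSeek's published training code: the dataset contents, field names, and hyperparameters are all assumptions.

```python
# Minimal sketch of distillation-style supervised fine-tuning (SFT):
# the student model is trained on completions generated by a stronger
# teacher. Dataset contents and hyperparameters are assumptions.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

base = "Qwen/Qwen2.5-14B"  # the student's starting checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto")

# Each record pairs a prompt with a teacher-generated reasoning trace.
samples = [{
    "prompt": "Solve for x: 2x + 3 = 11.",
    "completion": "<think>Subtract 3 from both sides to get 2x = 8, "
                  "then divide by 2.</think>\nx = 4",
}]

def tokenize(example):
    text = example["prompt"] + "\n" + example["completion"] + tokenizer.eos_token
    enc = tokenizer(text, truncation=True, max_length=4096)
    enc["labels"] = enc["input_ids"].copy()  # plain next-token prediction
    return enc

dataset = Dataset.from_list(samples).map(
    tokenize, remove_columns=["prompt", "completion"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="r1-distill-sft",
                           per_device_train_batch_size=1,
                           num_train_epochs=1, bf16=True),
    train_dataset=dataset,
)
trainer.train()
```

Notably, DeepSeek reports applying only supervised fine-tuning to the distilled models, without the additional reinforcement-learning stage used to train DeepSeek-R1 itself.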
In terms of performance, the 14B distilled variant strikes a balance between inference speed and accuracy. It retains core features of the Qwen2.5 series, including a 128K-token context window, while showing marked improvements over its base model on reasoning benchmarks such as AIME 2024 and MATH-500. It is primarily suited to tasks requiring deep logical inference, data analysis, and technical synthesis.
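For local experimentation, the model is available on Hugging Face as deepseek-ai/DeepSeek-R1-Distill-Qwen-14B. The sketch below shows one way to query it with the transformers library; the sampling settings follow the model card's recommended range (temperature around 0.6), but the prompt and generation parameters are illustrative, a starting point rather than a definitive recipe.

```python
# Inference sketch for the distilled 14B checkpoint. The prompt and
# sampling values are illustrative; temperature ~0.6 follows the model
# card's recommendation to avoid repetition and incoherence.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user",
             "content": "If 3 machines make 3 widgets in 3 minutes, "
                        "how long do 100 machines take to make 100 widgets?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

# The model thinks out loud inside <think>...</think> before answering.
output = model.generate(input_ids, max_new_tokens=2048,
                        do_sample=True, temperature=0.6, top_p=0.95)
print(tokenizer.decode(output[0][input_ids.shape[-1]:],
                       skip_special_tokens=True))
```

Per the model card, instructions should be placed in the user prompt rather than a system prompt, and the model emits its reasoning inside a `<think>...</think>` block before giving the final answer.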