DeepSeek
Open Weights

DeepSeek-Coder-V2

Released Jun 2024

Intelligence rank: #373
Arena AI rank: #188
Context: 128K
Parameters: 236B

DeepSeek-Coder-V2 is an open-weight Mixture-of-Experts (MoE) language model optimized for code generation and mathematical reasoning. Released by DeepSeek, it is the successor to the DeepSeek-Coder series and is built on the DeepSeek-V2 architecture. It is available in a large-scale configuration with 236 billion total parameters, of which only 21 billion are active per token, as well as a more resource-efficient "Lite" version with 16 billion total parameters (2.4 billion active).
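The "236B total, 21B active" distinction comes from sparse expert routing: a router sends each token to only a few experts, so most parameters sit idle on any given forward pass. The sketch below illustrates top-k routing in miniature; the dimensions, expert count, and softmax router here are illustrative assumptions, not DeepSeek's actual DeepSeekMoE configuration.

```python
import numpy as np

def moe_forward(x, experts, gate_w, top_k=2):
    """Route one token through only its top-k experts (sparse MoE).

    x: (d,) token hidden state; experts: list of (d, d) weight
    matrices; gate_w: (d, n_experts) router weights.
    """
    logits = x @ gate_w                # router score for each expert
    top = np.argsort(logits)[-top_k:]  # indices of the top-k experts
    weights = np.exp(logits[top])
    weights /= weights.sum()           # softmax over the selected experts only
    # Only top_k of the n_experts matrices do any work for this token;
    # the rest contribute nothing, which is why "active" parameters
    # are far fewer than total parameters.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 16
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
gate_w = rng.standard_normal((d, n_experts))
x = rng.standard_normal(d)
y = moe_forward(x, experts, gate_w, top_k=2)  # 2 of 16 experts active
```

At scale, the same principle means a 236B-parameter model pays roughly the compute cost of a 21B dense model per token.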

The model was further pre-trained from an intermediate DeepSeek-V2 checkpoint with an additional 6 trillion tokens of code-and-mathematics-heavy data, for a total corpus of 10.2 trillion tokens. This training extends support to 338 programming languages, up from the 86 supported by its predecessor. It also features an expanded context window of 128K tokens, enabling the processing of large codebases and long technical documents.

Architecturally, DeepSeek-Coder-V2 employs Multi-head Latent Attention (MLA) and the DeepSeekMoE framework. MLA reduces the Key-Value (KV) cache footprint, improving inference efficiency, while the MoE structure lets the model reach high-tier performance without the full computational cost of a dense 236B-parameter model.
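MLA's cache saving comes from storing one low-rank latent vector per token instead of full per-head keys and values. The back-of-the-envelope comparison below shows why that matters at a 128K context; the layer count, head counts, and latent dimension are illustrative assumptions, not DeepSeek-Coder-V2's published configuration.

```python
def kv_cache_bytes(n_layers, n_tokens, per_token_dim, bytes_per_elem=2):
    """Bytes needed to cache per-token state across all layers (fp16)."""
    return n_layers * n_tokens * per_token_dim * bytes_per_elem

# Illustrative dimensions (assumed, not DeepSeek's real config).
n_layers, n_heads, head_dim, latent_dim = 60, 128, 128, 512
n_tokens = 128 * 1024  # 128K-token context

# Standard multi-head attention caches full keys AND values per head.
mha = kv_cache_bytes(n_layers, n_tokens, 2 * n_heads * head_dim)
# MLA caches a single shared low-rank latent per token instead.
mla = kv_cache_bytes(n_layers, n_tokens, latent_dim)

print(f"MHA cache: {mha / 2**30:.1f} GiB")
print(f"MLA cache: {mla / 2**30:.1f} GiB")
print(f"Reduction: {mha / mla:.0f}x")
```

Under these toy numbers the latent cache is 64x smaller, which is the kind of reduction that makes long-context serving practical.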

In standardized benchmarks such as HumanEval and MBPP, DeepSeek-Coder-V2 has shown performance comparable to major closed-source models. It is distributed under a permissive license, making the model weights available for both research and commercial applications.
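HumanEval and MBPP scores are conventionally reported as pass@k: the probability that at least one of k sampled completions passes the problem's unit tests. A common way to compute it is the unbiased estimator from the Codex paper, sketched below; the sample counts are made-up illustrative inputs, not DeepSeek's reported results.

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimator.

    n: generations sampled per problem; c: how many passed the tests;
    k: samples the user is imagined to draw.
    pass@k = 1 - C(n-c, k) / C(n, k), the chance that a random k-subset
    of the n generations contains at least one passing solution.
    """
    if n - c < k:
        return 1.0  # too few failures to fill a k-subset: always a hit
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical tally: 200 samples on one problem, 130 of them correct.
p1 = pass_at_k(200, 130, 1)    # reduces to c/n = 0.65 for k=1
p10 = pass_at_k(200, 130, 10)  # much higher: 10 tries at the problem
```

For k=1 the estimator collapses to the plain pass rate c/n, which is why pass@1 is the headline number on most leaderboards.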

Rankings & Comparison