DeepSeek-V2-Chat is a large-scale Mixture-of-Experts (MoE) language model optimized for conversational AI. It is the chat-tuned version of DeepSeek-V2, designed to provide high-quality responses while maintaining significant computational efficiency during both training and inference. The model supports a context window of up to 128K tokens.
The model's architecture consists of 236 billion total parameters, with only 21 billion parameters activated per token. It introduces two primary technical innovations: Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA uses low-rank compression for the Key-Value (KV) cache to reduce memory bottlenecks, while DeepSeekMoE employs a sparse computation strategy to optimize the feed-forward networks (FFNs).
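The two mechanisms above can be sketched numerically. The snippet below is an illustrative toy, not the model's actual implementation: all dimensions (`d_model`, `d_latent`, the number of experts) are assumed for demonstration. It shows how caching a single low-rank latent per token (as in MLA) shrinks the KV cache relative to storing full per-head keys and values, and how a top-k gate (as in a sparse MoE FFN) activates only a fraction of experts per token.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not DeepSeek-V2's real sizes).
d_model = 512      # hidden size
n_heads = 8        # attention heads
d_head = 64        # per-head dimension
d_latent = 128     # compressed KV latent dim (much smaller than n_heads * d_head)
seq_len = 1024

# MLA-style low-rank KV compression: instead of caching full keys and
# values for every head, cache one small latent vector per token and
# re-project it into K and V when attention is computed.
W_down = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)
W_up_k = rng.standard_normal((d_latent, n_heads * d_head)) / np.sqrt(d_latent)
W_up_v = rng.standard_normal((d_latent, n_heads * d_head)) / np.sqrt(d_latent)

h = rng.standard_normal((seq_len, d_model))  # token hidden states
latent_cache = h @ W_down                    # (seq_len, d_latent) -- all we store

# K and V are reconstructed from the latent cache at read time.
K = latent_cache @ W_up_k                    # (seq_len, n_heads * d_head)
V = latent_cache @ W_up_v

full_cache_floats = 2 * seq_len * n_heads * d_head   # standard K + V cache
mla_cache_floats = seq_len * d_latent                # latent cache only
print(f"standard KV cache: {full_cache_floats} floats")
print(f"MLA latent cache:  {mla_cache_floats} floats "
      f"({mla_cache_floats / full_cache_floats:.1%} of standard)")

# MoE-style sparse routing: a gate scores all experts, but each token
# only runs its top-k experts, so most FFN parameters stay inactive.
def top_k_experts(gate_logits, k=2):
    idx = np.argsort(gate_logits)[-k:]            # indices of k best experts
    w = np.exp(gate_logits[idx] - gate_logits[idx].max())
    return idx, w / w.sum()                       # normalized routing weights
```

With these toy numbers the latent cache is 12.5% the size of a standard KV cache; the real model applies the same low-rank idea with its own projection shapes, and its MoE layers activate 21B of 236B parameters per token.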
DeepSeek-V2-Chat builds on a base model pretrained on a multi-source corpus of 8.1 trillion tokens, further refined through Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL). This alignment pipeline targets proficiency in general conversation, complex reasoning, and coding, while the architecture significantly reduces deployment costs and boosts generation throughput compared to dense models of similar scale.