DeepSeek V4 Flash (Non-reasoning) by DeepSeek: LLM Benchmarks, Rankings & Specs

DeepSeek V4 Flash is a high-efficiency Mixture-of-Experts (MoE) language model optimized for low-latency, high-throughput tasks. Released on April 24, 2026, it is the economical counterpart to the larger V4-Pro model in the DeepSeek-V4 preview series. The model has 284 billion total parameters, of which only 13 billion (about 4.6%) are active per token during inference. This sparse activation keeps per-token compute roughly in line with the active parameter count rather than the total, which is how the model reduces computational cost relative to dense models of similar capability.
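The active-versus-total parameter figures above can be turned into a quick back-of-the-envelope estimate. The sketch below is illustrative only: the two parameter counts come from the description, while the "about 2 FLOPs per active parameter per token" rule is a common approximation assumed here, not a figure published for this model. It ignores attention and routing overhead, and the function names are hypothetical.

```python
# Figures taken from the model description above.
TOTAL_PARAMS = 284_000_000_000   # total parameters
ACTIVE_PARAMS = 13_000_000_000   # parameters active per token


def active_fraction(total_params: int, active_params: int) -> float:
    """Share of the model's parameters used for a single token."""
    return active_params / total_params


def forward_flops_per_token(active_params: int) -> int:
    """Rough forward-pass cost per token.

    ASSUMPTION: ~2 FLOPs (one multiply, one add) per active parameter.
    This is a generic rule of thumb, not a number from DeepSeek.
    """
    return 2 * active_params


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    moe_flops = forward_flops_per_token(ACTIVE_PARAMS)
    # Hypothetical dense model that uses all 284B parameters on every token.
    dense_flops = forward_flops_per_token(TOTAL_PARAMS)

    print(f"Active fraction: {frac:.1%}")
    print(f"Approx. FLOPs/token (MoE, 13B active): {moe_flops:.2e}")
    print(f"Approx. FLOPs/token (dense, 284B):     {dense_flops:.2e}")
    print(f"Dense / MoE compute ratio: {dense_flops / moe_flops:.1f}x")
```

Under these assumptions, the comparison is against a dense model of the same total size, which is a simpler baseline than the "similar capability" comparison made in the description.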

A core technical innovation of the V4 series is the Hybrid Attention Architecture, which combines Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA). This mechanism, along with Manifold-Constrained Hyper-Connections (mHC), enables a native 1-million-token context window with drastically reduced KV cache memory requirements. These optimizations allow the model to process extensive documentation and complex codebases with approximately 27% of the inference FLOPs required by previous generations.
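To see why KV cache size becomes the limiting factor at a 1-million-token context, it can help to estimate the cache for a conventional, uncompressed attention layout. The sketch below uses the standard formula for a plain key/value cache; it does not model CSA, HCA, or mHC. The layer count, KV head count, and head dimension are hypothetical placeholders, not published specifications of DeepSeek V4 Flash.

```python
def kv_cache_bytes(
    seq_len: int,
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    bytes_per_value: int = 2,  # 2 bytes per value, e.g. FP16/BF16
) -> int:
    """Size of an uncompressed KV cache for one sequence.

    The leading factor of 2 accounts for storing one key vector and one
    value vector per token, per layer, per KV head.
    """
    return 2 * seq_len * n_layers * n_kv_heads * head_dim * bytes_per_value


if __name__ == "__main__":
    # HYPOTHETICAL configuration, chosen only for illustration.
    config = dict(n_layers=60, n_kv_heads=8, head_dim=128)

    for context in (128_000, 1_000_000):
        size_gb = kv_cache_bytes(context, **config) / 1e9
        print(f"{context:>9,} tokens -> {size_gb:,.1f} GB of KV cache")
```

Because the uncompressed cache grows linearly with context length, any scheme that stores fewer or smaller entries per token reduces memory by the same proportion at every sequence length. That is the general motivation for compressed attention designs such as the ones named above.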

While the model supports an optional reasoning mode for complex problem-solving, the non-reasoning (or "Instant") configuration is designed for standard conversational tasks, summarization, and real-time agentic workflows. DeepSeek V4 Flash was trained on over 32 trillion tokens using the Muon optimizer and was developed using a specialized training stack on Huawei Ascend hardware. The model weights are released under the permissive MIT License, which allows both commercial and research use.

DeepSeek V4 Flash (Non-reasoning)

Rankings & Comparison
