DeepSeek V3.1 Terminus is a large-scale hybrid language model that unifies general-purpose conversational capabilities and advanced reasoning within a single architecture. Released in September 2025 as an iterative update to the DeepSeek-V3 family, the Terminus version specifically addresses user-reported issues with language consistency (reducing Chinese-English code-mixing) and substantially improves the performance of autonomous agents. It uses a Mixture-of-Experts (MoE) architecture with 671 billion total parameters, of which 37 billion are activated per token.
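The gap between total and activated parameters comes from sparse expert routing: for each token, a router selects only a few experts out of many. The sketch below illustrates this with top-k routing; the expert count, top-k value, and dimensions are toy assumptions, not DeepSeek's actual configuration.

```python
import numpy as np

# Illustrative top-k Mixture-of-Experts routing (toy sizes, not
# DeepSeek's real configuration). The point: each token activates
# only K of E experts, so active parameters << total parameters.

rng = np.random.default_rng(0)

E, K, D = 8, 2, 16                       # 8 experts, top-2 routing, hidden size 16
W_gate = rng.normal(size=(D, E))         # router weights
experts = rng.normal(size=(E, D, D))     # one weight matrix per expert

def moe_forward(x):
    """Route a single token vector x through its top-K experts."""
    logits = x @ W_gate                  # (E,) router scores
    topk = np.argsort(logits)[-K:]       # indices of the K best-scoring experts
    weights = np.exp(logits[topk])
    weights /= weights.sum()             # softmax over the selected experts only
    # Only K of the E expert matrices are ever touched -> sparse activation.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, topk))

x = rng.normal(size=D)
y = moe_forward(x)
print(y.shape)  # (16,)
```

With these toy numbers, a forward pass multiplies against 2 of the 8 expert matrices, i.e. a quarter of the expert parameters, mirroring (at small scale) how 37B of 671B parameters are active per token.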
The model features a dual-mode system that allows users to toggle between a thinking (reasoning) mode and a non-thinking mode via the chat template. The reasoning configuration employs chain-of-thought processing to handle complex logical, mathematical, and programming tasks, while the non-thinking mode provides direct, lower-latency responses for general queries. This hybrid approach enables the model to achieve performance comparable to specialized reasoning models like DeepSeek-R1 while maintaining higher response speeds.
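A template-level toggle of this kind can be sketched as follows; the role tags and the `<think>` marker here are illustrative assumptions, not DeepSeek's actual chat template, but they show how a single flag can switch the model between emitting a chain of thought and answering directly.

```python
# Hedged sketch of a dual-mode chat template. The "<|role|>" tags and
# the "<think>" marker are illustrative assumptions, not the model's
# real template format.

def build_prompt(messages, thinking=False):
    """Render a chat into a single prompt string.

    With thinking=True, the rendered prompt ends with an open thinking
    tag, so the model first generates reasoning tokens before its final
    answer; with thinking=False, it answers directly.
    """
    parts = [f"<|{m['role']}|>{m['content']}" for m in messages]
    parts.append("<|assistant|>")
    if thinking:
        parts.append("<think>")  # model continues with chain-of-thought tokens
    return "".join(parts)

msgs = [{"role": "user", "content": "What is 17 * 24?"}]
print(build_prompt(msgs, thinking=False))
print(build_prompt(msgs, thinking=True))
```

Because the switch lives entirely in the prompt rendering, the same weights serve both modes; only the template (and therefore the decoding behavior) changes between a reasoning call and a low-latency call.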
Key technical highlights include support for a 128K context window and the use of FP8 microscaling for efficient training and inference. DeepSeek V3.1 Terminus demonstrates substantial improvements in agentic benchmarks, particularly in tool-usage and multi-step reasoning scenarios such as SWE-bench and Terminal-bench. The model weights are released under the MIT License, supporting both research and commercial applications.
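The idea behind microscaled low-precision formats is to store values in a narrow format (such as FP8) while sharing one scale factor per small block, so each block uses its full dynamic range. The sketch below is a crude stand-in: the block size and E4M3 maximum are real FP8 constants, but the rounding step only approximates FP8's limited mantissa, and DeepSeek's actual scheme differs in detail.

```python
# Hedged sketch of block-wise ("microscaling") FP8-style quantization.
# FP8_MAX is the largest finite value of the common E4M3 format; the
# rounding below is a crude uniform stand-in for FP8's coarse mantissa,
# not a bit-accurate FP8 implementation.

FP8_MAX = 448.0   # max finite magnitude representable in FP8 E4M3
BLOCK = 32        # number of elements sharing one scale factor

def quantize_block(xs):
    """Scale a block so its max magnitude maps to FP8_MAX, then round."""
    amax = max(abs(x) for x in xs) or 1.0   # avoid divide-by-zero on all-zero blocks
    scale = amax / FP8_MAX
    q = [round(x / scale, 1) for x in xs]   # coarse rounding in the scaled domain
    return q, scale

def dequantize_block(q, scale):
    """Recover approximate original values from quantized block + scale."""
    return [v * scale for v in q]

data = [0.001 * i for i in range(BLOCK)]
q, s = quantize_block(data)
restored = dequantize_block(q, s)
err = max(abs(a - b) for a, b in zip(data, restored))
print(f"max reconstruction error: {err:.6f}")
```

The key property the sketch preserves: because the scale adapts per block rather than per tensor, small-magnitude blocks are not crushed to zero by one large outlier elsewhere, which is what makes aggressive 8-bit training and inference viable.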