Xiaomi

MiMo-V2.5-Pro (Non-reasoning)

Released Apr 2026

Intelligence: #112
Coding: #67
Context: 1M
Parameters: 1.02T

Xiaomi's MiMo-V2.5-Pro is a large-scale Mixture-of-Experts (MoE) language model optimized for complex agentic workflows, software engineering, and long-horizon tasks. Released in April 2026 under the MIT license, it features a total of 1.02 trillion parameters, with approximately 42 billion active parameters per token. The model is designed to handle thousands of tool calls autonomously, demonstrating high reliability in executing multi-hour technical workflows such as full-stack application development and compiler construction.
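To put the parameter counts above in perspective, the short sketch below works out what share of the weights is active for a single token. The two figures come from the description; the function and constant names are illustrative only and are not part of any Xiaomi tooling.

```python
# Figures quoted in the model description.
TOTAL_PARAMS = 1.02e12   # total parameters (1.02 trillion)
ACTIVE_PARAMS = 42e9     # approximate active parameters per token


def active_fraction(active: float, total: float) -> float:
    """Return the share of the model's weights used for one token."""
    if total <= 0:
        raise ValueError("total must be positive")
    return active / total


frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
print(f"Active share per token: {frac:.1%}")
```

With these numbers, roughly 4% of the parameters take part in each forward pass, which is the usual motivation for an MoE design: per-token compute scales with the active count rather than the total.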

The model uses a hybrid attention architecture that interleaves local sliding-window attention (SWA) layers with global attention layers at a 6:1 ratio. Because only the global layers must cache keys and values for the entire sequence, this reduces KV-cache overhead by nearly 7x compared to using global attention in every layer, which makes a 1-million-token context window practical. To increase inference speed, MiMo-V2.5-Pro integrates three native Multi-Token Prediction (MTP) modules, which roughly triple output throughput and also accelerate rollouts during reinforcement learning (RL).
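The "nearly 7x" figure can be checked with back-of-the-envelope arithmetic. The sketch below counts cached token positions per layer and compares an all-global stack with a 6:1 SWA-to-global stack. The layer count (70) and window size (4,096) are assumptions chosen for illustration; the source does not state either value, and the real model may differ.

```python
def cached_positions(num_layers: int, context_len: int,
                     window: int, swa_per_global: int = 6) -> int:
    """Total cached token positions across layers for a hybrid stack.

    Assumes one global layer per (swa_per_global + 1) layers. A global layer
    caches the whole context; an SWA layer caches at most `window` positions.
    """
    global_layers = num_layers // (swa_per_global + 1)
    swa_layers = num_layers - global_layers
    return (global_layers * context_len
            + swa_layers * min(window, context_len))


# Hypothetical configuration (not from the source).
NUM_LAYERS = 70
WINDOW = 4096
CONTEXT = 1_000_000  # the 1M-token context quoted in the description

all_global = NUM_LAYERS * CONTEXT
hybrid = cached_positions(NUM_LAYERS, CONTEXT, WINDOW)
reduction = all_global / hybrid
print(f"KV-cache reduction: {reduction:.2f}x")
```

Under these assumptions the reduction comes out just under 7x. The global layers dominate the cache at long context, so the result approaches 7x as the window becomes small relative to the sequence length.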

The "Non-reasoning" designation, often used in benchmarks, identifies this variant as a standard completion and chat model that provides direct, high-speed responses, in contrast to "reasoning" models that spend extra tokens on an internal thinking or chain-of-thought phase. Even without that phase, the model scores highly on agentic benchmarks such as ClawEval and SWE-bench Pro, where it is noted for its token efficiency: it reportedly needs 40% to 60% fewer tokens than competing frontier models to solve identical tasks.

Post-training for MiMo-V2.5-Pro involves a three-stage pipeline beginning with supervised fine-tuning (SFT) for instruction adherence, followed by domain-specialized RL for math and coding. The final stage uses Multi-Teacher On-Policy Distillation (MOPD) to refine its agentic behavior and multimodal perception.

Rankings & Comparison