MiniMax-M1 is a large-scale, open-weight reasoning model designed for long-context understanding and complex logical tasks. It uses a hybrid architecture built on a Mixture-of-Experts (MoE) framework with 456 billion total parameters, of which 45.9 billion are activated per token. The model is specifically optimized for "thinking", i.e. test-time compute scaling, and supports reasoning budgets of up to 80,000 tokens for solving intricate problems through internalized reasoning chains.
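The gap between total and activated parameters comes from expert routing: each token is sent to only a few experts, so most of the model's weights sit idle on any given forward pass. The toy sketch below illustrates the idea with hypothetical sizes (`d_model`, `n_experts`, `top_k` are illustrative, not MiniMax-M1's real configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Mixture-of-Experts FFN layer; all sizes are illustrative assumptions.
d_model, d_ff = 8, 16
n_experts, top_k = 4, 1   # only top_k experts run per token

experts = [(rng.standard_normal((d_model, d_ff)) * 0.02,
            rng.standard_normal((d_ff, d_model)) * 0.02)
           for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_forward(x):
    """Route each token to its top_k experts; only their weights are used."""
    logits = x @ router                            # (tokens, n_experts)
    chosen = np.argsort(-logits, axis=-1)[:, :top_k]
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for e in chosen[t]:
            w1, w2 = experts[e]
            out[t] += np.maximum(x[t] @ w1, 0.0) @ w2  # ReLU FFN expert
    return out, chosen

x = rng.standard_normal((3, d_model))
y, chosen = moe_forward(x)

# Total capacity vs. parameters actually touched per token.
total_params = n_experts * (d_model * d_ff * 2)
active_params = top_k * (d_model * d_ff * 2)
```

With these toy numbers, only a quarter of the expert parameters are active per token; MiniMax-M1's 45.9B-of-456B split reflects the same principle at scale.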
A key technical innovation of MiniMax-M1 is its hybrid-attention mechanism, which interleaves standard softmax attention with "lightning attention" (linear attention) blocks. This design enables the model to achieve a native context window of 1 million tokens while maintaining high computational efficiency: because linear attention scales linearly rather than quadratically in sequence length, the technical report indicates the model consumes significantly fewer FLOPs than comparable softmax-attention Transformers during long-sequence generation, making it suitable for processing entire codebases or extensive technical documentation.
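The contrast between the two attention flavors can be shown in a minimal, non-causal, single-head sketch. This is an assumed simplification: real lightning attention uses a tiled causal recurrence, and the feature map `phi` below is a placeholder, not MiniMax's actual kernel. The key point is the cost structure: softmax attention materializes an n-by-n score matrix, while linear attention compresses keys and values into a d-by-d summary.

```python
import numpy as np

def softmax_attention(q, k, v):
    # Standard attention: builds an (n, n) score matrix -> O(n^2) in length.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def linear_attention(q, k, v):
    # Kernelized attention: (d, d) key-value summary -> O(n) in length.
    phi = lambda x: np.maximum(x, 0.0) + 1e-6   # placeholder positive feature map
    kv = phi(k).T @ v                           # (d, d) summary, independent of n
    z = phi(k).sum(axis=0)                      # normalizer
    return (phi(q) @ kv) / (phi(q) @ z)[:, None]

n, d = 6, 4
rng = np.random.default_rng(1)
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out_soft = softmax_attention(q, k, v)
out_lin = linear_attention(q, k, v)
```

A hybrid stack in the spirit of MiniMax-M1 would apply several linear-attention blocks for every softmax-attention block, keeping the quadratic cost confined to a small fraction of layers.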
The model was trained using CISPO (Clipped Importance Sampling Policy Optimization), a reinforcement learning algorithm developed by MiniMax that clips importance-sampling weights rather than token updates to improve training stability and convergence speed. MiniMax-M1 demonstrates strong capabilities in domains such as mathematical reasoning, software engineering, and multi-step tool use, frequently ranking alongside or above established open-weight and proprietary models on long-context benchmarks.
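A minimal sketch of a CISPO-style objective follows, under stated assumptions: the clipping thresholds `eps_low`/`eps_high` are hypothetical values, and this plain-NumPy version cannot express the stop-gradient that a real autograd implementation would place on the clipped weight. The contrast with PPO is that the importance-sampling ratio itself is clipped and held constant, so every token still contributes a log-prob-times-advantage gradient.

```python
import numpy as np

def cispo_loss(logp_new, logp_old, advantages, eps_low=0.5, eps_high=5.0):
    """Sketch of a CISPO-style loss (assumed form, not MiniMax's exact code).

    The importance-sampling weight is clipped and treated as a constant
    coefficient (stop-gradient in a real framework); the gradient flows
    only through logp_new, weighted by the advantage.
    """
    ratio = np.exp(logp_new - logp_old)             # importance-sampling weight
    clipped = np.clip(ratio, 1.0 - eps_low, 1.0 + eps_high)
    # In an autograd framework, `clipped` would be wrapped in stop_gradient.
    return -(clipped * advantages * logp_new).mean()

rng = np.random.default_rng(2)
logp_old = rng.standard_normal(8) - 2.0
logp_new = logp_old + 0.1 * rng.standard_normal(8)
advantages = rng.standard_normal(8)
loss = cispo_loss(logp_new, logp_old, advantages)
```

Because no token's update is zeroed out by clipping, low-probability but important reasoning tokens keep receiving gradient signal, which is the stability property the algorithm targets.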