Ling-mini-2.0 is an open-source Mixture-of-Experts (MoE) large language model developed by inclusionAI, an artificial intelligence research initiative backed by Ant Group. The model is designed for high inference efficiency: of its roughly 16 billion total parameters, only about 1.4 billion are activated per token, reflecting a 1/32 expert activation ratio. This architecture allows the model to reach performance comparable to that of 7–8B dense models while keeping computational requirements significantly lower.
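The relationship between the 1/32 expert activation ratio and the ~1.4B activated parameters can be made concrete with back-of-the-envelope arithmetic. The split between expert and always-active parameters below is an illustrative assumption, not an official specification:

```python
# Back-of-the-envelope MoE activation arithmetic for a Ling-mini-2.0-style
# model. The breakdown is illustrative; consult the model card for exact figures.
total_params_b = 16.0            # total parameters, in billions (from the text)
expert_activation_ratio = 1 / 32

# If the 1/32 ratio applied to every parameter, we would expect:
naive_active_b = total_params_b * expert_activation_ratio  # 0.5B

# In practice, attention layers, embeddings, and any shared experts are
# always active, so the reported ~1.4B activated parameters exceeds this
# naive estimate; the gap is the always-on (non-expert) share.
always_on_b = 1.4 - naive_active_b

print(f"naive expert-only estimate:        {naive_active_b:.2f}B")
print(f"implied always-active parameters:  {always_on_b:.2f}B")
```

This is why a 1/32 ratio on a 16B model yields 1.4B activated parameters rather than 0.5B: the ratio governs only the routed expert layers.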
Architecture and Training
The model was pre-trained on a corpus of over 20 trillion tokens and refined through multi-stage supervised fine-tuning and reinforcement learning. It incorporates several architectural optimizations, including Multi-Token Prediction (MTP) layers, QK-Norm, and FP8 precision for both training and inference. These features contribute to high-speed text generation, with reported throughput exceeding 300 tokens per second on certain hardware configurations.
Performance and Capabilities
Ling-mini-2.0 supports an extended context window of 128,000 tokens via YaRN extrapolation. It exhibits strong reasoning capabilities in specialized domains, particularly mathematics and computer programming. Benchmarks such as AIME 2025 and LiveCodeBench indicate that the model performs competitively against both larger MoE models and dense models in the sub-10B parameter range. Its design is intended to serve as a high-efficiency baseline for MoE research and real-time application deployment.
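YaRN extends a model's context window by rescaling its rotary position embeddings, and inference stacks typically expose this through a rope-scaling configuration. The fragment below is a hypothetical transformers-style sketch; the base window and scaling factor are assumptions chosen so their product reaches the reported 128K window, and the official model config should be consulted for the real values:

```python
# Hypothetical rope-scaling config fragment for YaRN-based context extension.
# Both numeric values below are assumptions for illustration only.
rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,                              # assumed extension factor
    "original_max_position_embeddings": 32768,  # assumed pre-training window
}

extended_window = int(
    rope_scaling["factor"] * rope_scaling["original_max_position_embeddings"]
)
print(extended_window)  # 131072, i.e. the 128K-token window
```

The key point is that YaRN trades a modest change to the rotary frequency schedule for a multiplicative increase in usable context, without retraining from scratch.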