Ling-flash-2.0 is a large-scale language model developed by InclusionAI, an artificial intelligence initiative originating from Ant Group. It is built on the Ling 2.0 architecture, which uses a highly sparse Mixture-of-Experts (MoE) design to balance strong reasoning performance with inference efficiency. The model has 100B total parameters, of which only 6.1B are activated per token during inference, allowing it to achieve generation speeds of over 200 tokens per second on specialized hardware.
Architecture and Efficiency
The model employs a 1/32 expert activation ratio and was trained on over 20 trillion tokens of high-quality data. Technical refinements serve complementary goals: Multi-Token Prediction (MTP) layers accelerate decoding, while YaRN extrapolation extends the supported context window to 128K tokens. These optimizations allow Ling-flash-2.0 to match the performance of dense models in the 40B-parameter range while significantly reducing computational overhead and latency.
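The sparse activation described above can be illustrated with a minimal sketch of top-k expert routing. This is not Ling-flash-2.0's actual implementation (which is far larger and uses learned, load-balanced routers); the function names, dimensions, and single-matrix "experts" here are hypothetical, chosen only to show why a 1/32 activation ratio means most expert parameters sit idle on any given token.

```python
import numpy as np

def moe_layer(x, gate_w, experts, k=1):
    """Illustrative sparse MoE forward pass for one token.

    x:       (d,) token hidden state
    gate_w:  (n_experts, d) router weights
    experts: list of n_experts weight matrices, each (d, d)

    Only the k experts selected by the router are evaluated, so
    per-token compute scales with k/n_experts rather than with
    the full expert count.
    """
    logits = gate_w @ x                       # one router score per expert
    top = np.argsort(logits)[-k:]             # indices of the top-k experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                  # softmax over selected experts only
    return sum(w * (experts[i] @ x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 16, 32                         # k=1 of 32 mirrors the 1/32 ratio
x = rng.standard_normal(d)
gate_w = rng.standard_normal((n_experts, d))
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
y = moe_layer(x, gate_w, experts, k=1)        # 31 of 32 experts never run
```

With k=1 of 32 experts active, only about 3% of expert weights participate in each forward pass; in the real model, shared (non-expert) layers raise the activated total to the reported 6.1B of 100B parameters.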
Capabilities
Ling-flash-2.0 is optimized for diverse tasks including complex logical reasoning, advanced mathematical problem solving, and code generation. It demonstrates proficiency in specialized areas such as frontend development and creative writing. In benchmarking, the model has been evaluated against both open-weight and proprietary models, showing competitive performance in knowledge-intensive domains like finance and healthcare.