Ring-flash-2.0 is a high-performance reasoning language model developed by InclusionAI, built on the Ling-flash-2.0-base architecture. It employs a highly sparse Mixture-of-Experts (MoE) design with 100 billion total parameters, of which 6.1 billion are activated per token (4.8 billion non-embedding). The architecture combines a 1/32 expert activation ratio with Multi-Token Prediction (MTP) layers, enabling performance comparable to 40B-level dense models while sustaining inference speeds of over 200 tokens per second.
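The sparsity arithmetic above can be illustrated with a toy top-k router: only the selected experts run for each token, so the active parameter count is a small fraction of the total. This is a minimal sketch of generic MoE gating, not Ring-flash-2.0's actual routing code; the dimensions and expert count are illustrative.

```python
import numpy as np

def moe_forward(x, expert_weights, router_weights, k=2):
    """Route a token to its top-k experts and combine their outputs.

    x: (d,) token hidden state
    expert_weights: list of (d, d) matrices, one per expert
    router_weights: (n_experts, d) router projection
    """
    logits = router_weights @ x                      # router score per expert
    top_k = np.argsort(logits)[-k:]                  # indices of the k best experts
    gates = np.exp(logits[top_k])
    gates /= gates.sum()                             # softmax over selected experts only
    # Only the chosen experts execute: e.g. k=2 of 64 experts means roughly
    # 1/32 of expert parameters are active per token, mirroring the sparsity
    # ratio described above.
    return sum(g * (expert_weights[i] @ x) for g, i in zip(gates, top_k))

rng = np.random.default_rng(0)
d, n_experts = 16, 64
x = rng.normal(size=d)
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
router = rng.normal(size=(n_experts, d))
y = moe_forward(x, experts, router, k=2)
print(y.shape)  # (16,)
```

Because the unselected experts never execute, compute per token scales with the activated parameters (6.1B) rather than the total (100B), which is what makes the high decoding throughput possible.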
The model is trained with a multi-stage pipeline that combines Supervised Fine-Tuning (SFT) and Reinforcement Learning from Verifiable Rewards (RLVR). InclusionAI uses its icepop algorithm to mitigate the training instability that reinforcement learning typically exhibits on MoE models, allowing reasoning capability to keep improving over long RL training runs.
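The "verifiable" part of RLVR means rewards come from programmatic checks rather than a learned reward model. The exact reward functions used for Ring-flash-2.0 are not described here, so the following is only a toy sketch of the general pattern: extract a final answer from the model's output and compare it against a known reference. The `####` answer delimiter is an assumption borrowed from common math-dataset conventions.

```python
def verifiable_reward(model_answer: str, reference: str) -> float:
    """Binary reward: 1.0 if the extracted final answer matches the reference.

    A toy stand-in for an RLVR-style checker (not Ring-flash-2.0's actual
    reward code): the text after a '####' marker is taken as the final answer.
    """
    final = model_answer.split("####")[-1].strip()
    return 1.0 if final == reference.strip() else 0.0

r_good = verifiable_reward("step 1... step 2... #### 42", "42")  # 1.0
r_bad = verifiable_reward("step 1... step 2... #### 41", "42")   # 0.0
print(r_good, r_bad)
```

Because such checks are deterministic, the RL signal cannot be gamed the way a learned reward model can, which is why RLVR is favored for math and code domains where answers are machine-checkable.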
Ring-flash-2.0 is specifically tuned for complex reasoning tasks, achieving strong results on benchmarks spanning mathematics (AIME), code generation (LiveCodeBench), and logical reasoning (ARC-Prize). It supports a 128K context window via YaRN extrapolation. While its primary focus is deep reasoning and technical tasks, the model also remains competitive in creative writing.
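YaRN extends a model's context beyond its training length by rescaling RoPE frequencies per dimension: high-frequency dimensions are left alone while low-frequency ones are interpolated. Below is a simplified sketch of that per-dimension blend using the YaRN paper's default ramp constants; Ring-flash-2.0's exact scaling settings are not stated here, the original context length is an assumed placeholder, and YaRN's attention-temperature adjustment is omitted.

```python
import numpy as np

def yarn_scaled_freqs(dim, base=10000.0, scale=4.0, orig_ctx=32768,
                      beta_fast=32, beta_slow=1):
    """Sketch of YaRN-style per-dimension RoPE frequency rescaling.

    Dimensions whose wavelength is short relative to the training context
    keep their original frequency; dimensions with very long wavelengths are
    interpolated (divided by `scale`, e.g. 4x to stretch an assumed 32K
    training context toward 128K); dimensions in between are blended linearly.
    """
    freqs = base ** (-np.arange(0, dim, 2) / dim)   # standard RoPE frequencies
    wavelengths = 2 * np.pi / freqs
    low = orig_ctx / beta_fast                       # below this: keep as-is
    high = orig_ctx / beta_slow                      # above this: fully interpolate
    ramp = np.clip((wavelengths - low) / (high - low), 0.0, 1.0)
    return freqs * (1 - ramp) + (freqs / scale) * ramp

f = yarn_scaled_freqs(128)
print(f.shape)  # one frequency per rotary dimension pair
```

The key design point is that short-wavelength dimensions, which encode fine-grained local positions, are untouched, so short-context behavior is preserved while long-range positions become representable.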