Ling 2.6 Flash is a high-efficiency language model developed by InclusionAI, an initiative of Ant Group. Officially released on April 22, 2026, the model is built on a sparse Mixture-of-Experts (MoE) architecture with 104 billion total parameters, only 7.4 billion of which are active for any given token. This design is intended to deliver reasoning quality comparable to that of much larger dense models while retaining the speed and cost-effectiveness typical of much smaller networks.
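The sparsity implied by these figures can be worked out directly. The following is a minimal sketch that uses only the two parameter counts quoted above; the function name and constants are illustrative and are not part of any published tooling for the model.

```python
# Parameter counts as quoted in the text above.
TOTAL_PARAMS = 104e9   # total parameters across all experts
ACTIVE_PARAMS = 7.4e9  # parameters active for any given token


def active_fraction(total: float, active: float) -> float:
    """Return the share of parameters that participate in one forward step."""
    if total <= 0:
        raise ValueError("total must be positive")
    return active / total


fraction = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
sparsity_ratio = TOTAL_PARAMS / ACTIVE_PARAMS

print(f"Active share of parameters: {fraction:.1%}")
print(f"Total-to-active ratio: {sparsity_ratio:.1f}x")
```

In other words, roughly 7% of the weights are exercised per token, which is why per-token compute is closer to that of a small dense model even though the full parameter set must still be held in memory.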
A key focus of Ling 2.6 Flash is "token efficiency," a design philosophy that aims to minimize the number of tokens required to complete complex tasks. According to benchmark analysis, the model generates significantly fewer tokens than its peers to reach similar intelligence scores, which reduces latency and inference cost. It scored 26 on the Artificial Analysis Intelligence Index and ranked at the top of its size class for speed, with a stable output rate of approximately 215 tokens per second and peak speeds of up to 340 tokens per second under optimized conditions.
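The link between token count and latency is simple arithmetic: at a fixed output rate, generation time scales linearly with the number of tokens produced. The sketch below uses the two throughput figures quoted above; the response lengths are hypothetical values chosen only for illustration, and the estimate ignores prompt processing and time to first token.

```python
# Output rates as quoted in the text above (tokens per second).
STABLE_TPS = 215.0
PEAK_TPS = 340.0


def decode_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Estimate generation time, ignoring prefill and time to first token."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


# Hypothetical response lengths, not measured values: a concise answer
# versus one three times longer that reaches the same result.
concise_tokens = 2_000
verbose_tokens = 6_000

for label, tokens in (("concise", concise_tokens), ("verbose", verbose_tokens)):
    stable = decode_seconds(tokens, STABLE_TPS)
    peak = decode_seconds(tokens, PEAK_TPS)
    print(f"{label}: {stable:.1f} s at stable rate, {peak:.1f} s at peak rate")
```

Because most providers bill per output token, the same linear relationship applies to cost, so a model that needs a third of the tokens for a task finishes in about a third of the time and at about a third of the output cost.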
Before its official release under the Ling name, the model was tested by the community under the stealth codename Elephant Alpha. It has been specifically enhanced for agentic workflows, demonstrating state-of-the-art performance on benchmarks such as BFCL-V4 for function calling and SWE-bench Verified for software engineering tasks. The model supports a context window of 262,144 tokens (256K), making it suitable for long-document analysis and complex multi-step agent interactions.