Meituan
Open Weights

longcat-flash-chat

Released Sep 2025

Arena AI
#71
Parameters: 560B

LongCat-Flash-Chat is a large-scale language model developed by the Chinese technology company Meituan. It uses a Mixture-of-Experts (MoE) architecture with 560 billion total parameters. During inference, the model dynamically activates between 18.6 billion and 31.3 billion parameters per token (roughly 27 billion on average), spending compute where it is most useful rather than uniformly across all tokens.

The model uses a Shortcut-connected MoE (ScMoE) design, engineered to reduce communication overhead and improve throughput during both training and inference. It also incorporates "zero-computation experts" that cost no FLOPs when selected, letting the router allocate a variable compute budget per token according to its contextual significance. LongCat-Flash-Chat supports a context window of up to 128,000 tokens, enabling it to process lengthy documents and complex multi-turn dialogues.
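The zero-computation-expert idea can be illustrated with a minimal toy sketch. This is an assumption-laden simplification, not Meituan's implementation: the pool mixes a few small "real" experts with identity experts that do no work, so the FLOPs a token consumes depend on which experts the router picks. All sizes, names, and the random gate below are illustrative.

```python
# Toy sketch of routing with "zero-computation experts" (illustrative only;
# not LongCat-Flash's actual architecture or code).
import numpy as np

rng = np.random.default_rng(0)

N_REAL, N_ZERO, TOP_K, DIM = 4, 2, 2, 8  # made-up toy dimensions

# Real experts are small linear layers; zero-computation experts have no weights.
real_weights = [rng.standard_normal((DIM, DIM)) / np.sqrt(DIM) for _ in range(N_REAL)]

def expert(idx, x):
    """Apply expert `idx`: real experts do a matmul, zero experts pass through."""
    if idx < N_REAL:
        return x @ real_weights[idx]
    return x  # zero-computation expert: identity, zero FLOPs

def route(x):
    """Top-k routing over real + zero experts; returns output and FLOPs spent."""
    logits = rng.standard_normal(N_REAL + N_ZERO)  # stand-in for a learned gate
    top = np.argsort(logits)[-TOP_K:]
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()
    out = sum(g * expert(i, x) for g, i in zip(gates, top))
    # Only real experts contribute FLOPs (~2*DIM*DIM per matmul).
    flops = sum(2 * DIM * DIM for i in top if i < N_REAL)
    return out, flops

x = rng.standard_normal(DIM)
out, flops = route(x)
```

Because `flops` varies with the router's choice, the average activated compute per token sits somewhere between zero and the full top-k cost, which is the same mechanism behind the model's variable 18.6B to 31.3B activated-parameter range.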

Developed primarily for conversational and agent-based tasks, the model demonstrates capabilities in multi-step reasoning, tool use, and coding. It was released under the MIT license, making its weights openly available for commercial and research use. Meituan optimized the model for high-speed inference, reportedly achieving speeds exceeding 100 tokens per second on compatible hardware platforms.

Rankings & Comparison