Qwen2.5 Turbo is a large language model from Alibaba's Qwen team, optimized for extremely long contexts. Released in November 2024, its primary advancement is support for a 1 million token context window, roughly eight times the 128K-token limit of the standard Qwen2.5 models. This capacity corresponds to approximately 1 million English words or 1.5 million Chinese characters, enough to analyze extensive codebases, collections of scientific papers, or full-length novels in a single pass.
To manage the computational demands of such a large context, Qwen2.5 Turbo employs sparse attention mechanisms. This architectural optimization reportedly reduces the time to first token on million-token inputs by a factor of more than four relative to dense attention. In performance evaluations, the model achieved 100% accuracy on a 1M-token passkey retrieval task and scored highly on the RULER long-context benchmark, surpassing several contemporary proprietary models.
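The saving from sparse attention comes from restricting which key positions each query attends to. A minimal sliding-window sketch in NumPy illustrates the idea; the window size and pattern here are purely illustrative, as the description above does not detail Qwen2.5 Turbo's actual sparsification scheme:

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Causal attention mask where each token attends only to itself and
    the previous `window - 1` tokens (a generic sparse pattern, not
    Qwen's specific design)."""
    i = np.arange(seq_len)[:, None]  # query positions
    j = np.arange(seq_len)[None, :]  # key positions
    return (j <= i) & (j > i - window)

def attended_pairs(mask: np.ndarray) -> int:
    """Count query-key pairs the mask allows, a proxy for attention cost."""
    return int(mask.sum())

seq_len, window = 4096, 256
sparse = attended_pairs(sliding_window_mask(seq_len, window))
dense = attended_pairs(np.tril(np.ones((seq_len, seq_len), dtype=bool)))
print(f"dense pairs:  {dense}")
print(f"sparse pairs: {sparse}")
print(f"reduction:    {dense / sparse:.1f}x")
```

With a 4,096-token sequence and a 256-token window, the sparse mask covers roughly an eighth of the query-key pairs of a dense causal mask, and the gap widens as the sequence grows, which is the kind of saving that shortens time to first token on very long inputs.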
While specialized for long-context tasks, the model remains competitive on short-text benchmarks, with capabilities comparable to GPT-4o-mini. It supports more than 29 languages and features enhanced instruction following and structured output generation, particularly for JSON. Qwen2.5 Turbo is offered as an API-based service through Alibaba Cloud's model platforms.
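The structured-output capability is typically exercised through a chat-style request that asks the model for strict JSON. The sketch below shows the general shape of such a request and the local validation a caller would perform; the model identifier `qwen-turbo` and the OpenAI-style `response_format` field are assumptions about the API surface, and the reply string is fabricated purely to illustrate the parsing step:

```python
import json

# Hypothetical request payload in an OpenAI-compatible chat format.
# The model name and response_format field are assumptions, not
# confirmed details of Alibaba's API.
request = {
    "model": "qwen-turbo",  # assumed identifier for Qwen2.5 Turbo
    "messages": [
        {"role": "system", "content": "Reply only with a JSON object."},
        {"role": "user",
         "content": "Extract the title and year from: 'Dune (2021)'."},
    ],
    "response_format": {"type": "json_object"},  # request strict JSON
}

# A fabricated model reply, used only to demonstrate that structured
# output can be parsed and validated with the standard json module.
reply = '{"title": "Dune", "year": 2021}'
parsed = json.loads(reply)
assert isinstance(parsed, dict) and "title" in parsed
print(parsed["title"], parsed["year"])
```

Because the model is steered toward emitting valid JSON, the caller can feed the response directly to `json.loads` and fail fast on malformed output instead of scraping values out of free text.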