Qwen2.5-72B-Instruct is a large-scale language model developed by Alibaba's Qwen team. As the flagship instruction-tuned variant in the Qwen2.5 series, it has 72.7 billion parameters and is built on a dense decoder-only Transformer architecture. The architecture uses Rotary Positional Embedding (RoPE) for position encoding, the SwiGLU activation in its feed-forward layers, and Grouped Query Attention (GQA), which shrinks the key-value cache to improve inference efficiency.
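These architectural details can be confirmed without downloading the 72B weights by inspecting the published model configuration through the Hugging Face transformers library. A minimal sketch, assuming the field names of the Qwen2 configuration class that the Qwen2.5 checkpoints use:

```python
from transformers import AutoConfig

# Fetches only the small config.json, not the model weights.
config = AutoConfig.from_pretrained("Qwen/Qwen2.5-72B-Instruct")

print(config.num_hidden_layers)    # Transformer depth
print(config.hidden_act)           # "silu" -- the gate activation inside SwiGLU
print(config.rope_theta)           # base frequency for Rotary Positional Embedding
print(config.num_attention_heads)  # number of query heads
print(config.num_key_value_heads)  # fewer KV heads than query heads => GQA
```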
The model was pre-trained on a massive dataset of 18 trillion tokens, a significant leap in general knowledge and specialized capabilities over its predecessor, Qwen2, with marked improvements in mathematics, programming, and logical reasoning. Instruction tuning further optimizes the model for natural, human-like interaction, adherence to complex system prompts, and multi-step instruction following.
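In practice, system prompts and multi-turn instructions are passed through the tokenizer's chat template. The following is a minimal usage sketch via transformers; the messages and sampling settings are illustrative, and running the 72B model this way assumes enough GPU memory for device_map="auto" to shard it:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-72B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"  # shards across available GPUs
)

# A system prompt plus a multi-step user instruction.
messages = [
    {"role": "system", "content": "You are a concise technical assistant."},
    {"role": "user", "content": "Explain RoPE in two sentences, then list one limitation."},
]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(
    output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True
)
print(response)
```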
Equipped with a context window of up to 131,072 tokens (128K), Qwen2.5-72B-Instruct can process extensive documents and generate responses of up to 8,192 tokens. It offers robust multilingual support spanning more than 29 languages and excels at understanding structured data, such as tables, and at generating structured outputs like JSON. The model is released under the Qwen License (the Qwen Research License applies to the smaller 3B variant, not the 72B model).
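One common way to exercise the structured-output capability is simply to request JSON in the system prompt and validate the reply. A hedged sketch building on the same loading recipe as above; the schema and prompts are hypothetical illustrations, not an official API:

```python
import json

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-72B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

# Ask for machine-readable output; the schema here is a made-up example.
messages = [
    {"role": "system", "content": "Respond with a single JSON object and nothing else."},
    {"role": "user", "content": 'Name three supported languages as {"languages": [...]}.'},
]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
raw = tokenizer.decode(
    output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True
)

data = json.loads(raw)  # raises ValueError if the reply is not pure JSON
print(data["languages"])
```

Validating with json.loads keeps the failure mode explicit: if the model drifts into prose, the parse error surfaces immediately rather than propagating malformed data downstream.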