Qwen-72B-Chat is a large-scale language model developed by Alibaba Cloud, optimized for dialogue and instruction-following tasks. It is the chat-aligned variant of the 72-billion-parameter Qwen base model, part of the first generation of the Qwen series. The model was trained on over 3 trillion tokens of diverse data, including web text, books, code, and multilingual content.
Architecture and Capabilities
Qwen-72B-Chat is built on a decoder-only Transformer architecture and uses a vocabulary of over 150,000 tokens. This large vocabulary improves its coverage of multiple languages, with particular emphasis on Chinese and English. The model supports a context window of 32,768 tokens, allowing it to process and maintain coherence over relatively long conversations and documents.
Qwen-72B-Chat is designed for a variety of complex tasks, including common-sense reasoning, mathematical problem-solving, and code generation. It also supports system prompts, which can be used to set up role-playing, language style transfer, or specific tasks and behaviors that persist across a conversation.
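As an illustration of how system prompts reach the model, the sketch below assembles a conversation in the ChatML turn format used by Qwen chat models (`<|im_start|>role ... <|im_end|>` markers). This is a simplified sketch: the function name and structure are illustrative, and in practice the model's tokenizer or chat API would handle this formatting.

```python
def build_chatml_prompt(system, history, query):
    """Assemble a ChatML-style prompt string.

    `system` sets the persistent behavior (e.g. a role-play persona),
    `history` is a list of (user, assistant) turn pairs, and `query`
    is the new user message; the prompt ends with an open assistant
    turn for the model to complete.
    """
    parts = [f"<|im_start|>system\n{system}<|im_end|>"]
    for user_turn, assistant_turn in history:
        parts.append(f"<|im_start|>user\n{user_turn}<|im_end|>")
        parts.append(f"<|im_start|>assistant\n{assistant_turn}<|im_end|>")
    parts.append(f"<|im_start|>user\n{query}<|im_end|>")
    parts.append("<|im_start|>assistant\n")  # left open for the model's reply
    return "\n".join(parts)

# Example: a role-playing persona set via the system prompt.
prompt = build_chatml_prompt(
    system="You are a Tang-dynasty poet; answer in classical verse.",
    history=[],
    query="Describe the autumn moon.",
)
print(prompt)
```

Because the system turn is emitted once at the start of the prompt, the specified style or task constrains every subsequent assistant turn without needing to be repeated by the user.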