QwQ-32B-Preview is an experimental large language model developed by the Qwen team at Alibaba Cloud to advance reasoning capabilities. Built on the Qwen2.5-32B architecture, it uses large-scale reinforcement learning (RL) to improve performance in complex domains such as mathematics and programming. The model is a 32-billion-parameter causal language model trained to introspect and self-correct during inference.
The model demonstrates a "slow thinking" approach, characterized by a curiosity-driven and reflective analysis of problems. This methodology allows it to achieve high scores on specialized benchmarks, including MATH-500, AIME, and GPQA. It features a context window of 32,768 tokens, making it suitable for processing technical documents and multi-step logical tasks.
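The 32,768-token window mentioned above covers both the prompt and the model's generated reasoning, which can be lengthy. A minimal sketch of a pre-flight length check, assuming a whitespace-based `count_tokens` as a hypothetical stand-in for the real tokenizer (actual code would load `Qwen/QwQ-32B-Preview` via `AutoTokenizer.from_pretrained` and use `len(tokenizer.encode(text))`):

```python
CONTEXT_WINDOW = 32_768  # tokens supported by QwQ-32B-Preview


def count_tokens(text: str) -> int:
    # Hypothetical stand-in: approximates token count by whitespace
    # words so the sketch runs without downloading the real tokenizer.
    return len(text.split())


def fits_in_context(prompt: str, reserved_for_output: int = 4_096) -> bool:
    # Reserve headroom for the model's (often verbose) reasoning output.
    return count_tokens(prompt) + reserved_for_output <= CONTEXT_WINDOW


print(fits_in_context("Prove that the sum of two even numbers is even."))
```

The `reserved_for_output` margin is an assumed value; in practice it should reflect the `max_new_tokens` you intend to request at generation time.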
As a preview release, the model exhibits certain experimental behaviors. It may occasionally switch languages unexpectedly or fall into recursive reasoning loops, producing lengthy responses that do not always reach a definitive conclusion. It is released under the Apache 2.0 license, permitting both research and commercial use.