Qwen3.5-397B-A17B is a large-scale multimodal language model released by Alibaba's Qwen team in February 2026. Built on a hybrid Mixture-of-Experts (MoE) architecture, it features 397 billion total parameters with only 17 billion active per token, enabling a balance between high-capacity knowledge and inference efficiency. The model is a native vision-language foundation, trained through early fusion to process text, images, and video within a single unified pipeline.
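The total-versus-active parameter split above comes from sparse Mixture-of-Experts routing: a router picks a few experts per token, so only a small fraction of the weights fire on any forward pass. The sketch below illustrates the routing idea only; the expert count (64) and top-k value (4) are illustrative assumptions, not the model's real configuration.

```python
import numpy as np

def topk_route(logits: np.ndarray, k: int):
    """Return the indices and softmax weights of the top-k experts.

    logits : (num_experts,) router scores for one token
    k      : number of experts activated per token
    """
    idx = np.argsort(logits)[-k:]                  # pick the k highest-scoring experts
    w = np.exp(logits[idx] - logits[idx].max())    # numerically stable softmax
    return idx, w / w.sum()                        # weights over the active experts only

# One token routed among 64 hypothetical experts, 4 active at a time.
rng = np.random.default_rng(0)
idx, w = topk_route(rng.normal(size=64), k=4)
```

Because only `k` experts run per token, compute scales with the active parameter count (here 17B) rather than the total (397B), which is what makes the high total-to-active ratio affordable at inference time.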
The model's architecture introduces Gated Delta Networks combined with linear attention mechanisms, a combination that significantly reduces computational overhead compared to traditional quadratic attention. This design enables decoding throughput reported to be up to 19 times faster than previous flagship models such as Qwen3-Max. By using a high ratio of total to active parameters, the model maintains deep specialized knowledge across 201 languages and dialects while remaining tractable for local deployment on high-end consumer hardware with advanced quantization.
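The efficiency claim rests on the recurrent form of linear attention: instead of attending over the whole history, the layer maintains a fixed-size state that is decayed by a gate and updated with a rank-1 "delta" correction each step. The following is a minimal sketch of one such gated delta update, assuming a per-step scalar decay `alpha` and write strength `beta`; it is an illustration of the general technique, not the model's exact recurrence.

```python
import numpy as np

def gated_delta_step(S, k, v, q, alpha, beta):
    """One recurrent step of a gated delta rule (illustrative sketch).

    S       : (d, d) running state matrix mapping keys to values
    k, v, q : (d,) key, value, and query vectors (k assumed L2-normalized)
    alpha   : scalar decay gate in (0, 1]
    beta    : scalar write-strength gate in (0, 1]
    """
    # Decay the old state and subtract the stale association stored at key k
    # (the "delta" correction), then write the new key-value pair.
    S = alpha * (S - beta * np.outer(S @ k, k)) + beta * np.outer(v, k)
    # Read out with the query: cost per step is O(d^2), independent of
    # sequence length, unlike quadratic attention.
    o = S @ q
    return S, o
```

Because the state has fixed size, decoding cost per token is constant in sequence length, which is the source of the large throughput gains over quadratic attention at long contexts.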
Optimized for complex problem-solving, the model includes a native Thinking Mode enabled by large-scale reinforcement learning (RL). When active, the model generates internal chain-of-thought reasoning within specific tags before providing a final response. This reasoning process excels in STEM fields, agentic workflows, and multi-step coding tasks, where it demonstrates improved performance in verifying its own logic and following intricate instructions.
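Since the chain-of-thought is emitted inside reasoning tags ahead of the final answer, downstream code typically strips it before showing output to users. Below is a small hypothetical helper assuming the `<think>...</think>` tag convention used by earlier Qwen thinking models; the tag name is an assumption about this release.

```python
import re

# Assumed convention: reasoning appears inside <think>...</think>,
# followed by the final response.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(completion: str):
    """Split a raw completion into (reasoning, final_answer)."""
    m = THINK_RE.search(completion)
    if m is None:
        # Non-thinking output: no reasoning block to separate.
        return "", completion.strip()
    reasoning = m.group(1).strip()
    answer = completion[m.end():].strip()
    return reasoning, answer
```

A pattern like this lets an application log or score the reasoning trace separately while returning only the final answer to the user.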
It natively supports a context window of 262,144 tokens, which can be extended to over 1,000,000 tokens for long-context applications such as document analysis or processing up to two hours of video content. The model is released under an Apache 2.0 license, supporting both research and commercial applications.
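Context extension beyond the native window is commonly configured through RoPE scaling. The fragment below sketches a Hugging Face-style `rope_scaling` entry using YaRN, which past Qwen releases have documented for long-context use; whether this model uses the same mechanism and factor is an assumption here, not a confirmed detail.

```python
# Native window from the release notes; the YaRN factor of 4.0 is an
# illustrative assumption that would yield 1,048,576 positions.
NATIVE_CONTEXT = 262_144

rope_scaling = {
    "rope_type": "yarn",                              # YaRN-style extension (assumed)
    "factor": 4.0,                                    # hypothetical scaling factor
    "original_max_position_embeddings": NATIVE_CONTEXT,
}

extended_context = int(rope_scaling["factor"] * NATIVE_CONTEXT)
```

With a factor of 4.0, the extended window of 1,048,576 tokens clears the "over 1,000,000 tokens" figure cited above.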