Qwen3 VL 235B A22B (Reasoning) by Alibaba: LLM Benchmarks, Rankings & Specs

Qwen3 VL 235B A22B (Reasoning) is a flagship multimodal vision-language model developed by Alibaba’s Qwen team. Part of the third-generation Qwen series, it is specifically engineered to bridge the gap between visual perception and complex cognitive reasoning. The model supports interleaved text, image, and video inputs, enabling it to handle diverse tasks from high-resolution document parsing to long-horizon video understanding.

The model utilizes a Mixture-of-Experts (MoE) architecture with 235 billion total parameters and approximately 22 billion active parameters per token. This design allows for high-capacity performance in specialized reasoning domains while maintaining inference efficiency. It features a native context window of 256K tokens, expandable to 1 million, which facilitates the analysis of entire books or hours-long videos with precise temporal grounding.

A defining characteristic of the Reasoning (also referred to as "Thinking") variant is its integration of specialized training for multi-step logical deduction. Through a dedicated thinking mode, the model performs internal chain-of-thought processing to solve complex STEM problems, verify mathematical proofs, and execute visual coding tasks, such as generating web code from hand-drawn sketches. Unlike standard vision-language models, it is optimized to verify its own logic and explore multiple deductive paths.

In addition to reasoning, the model functions as a Visual Agent capable of interacting with computer and mobile graphical user interfaces (GUIs). It can recognize UI elements, understand functional buttons, and invoke external tools to automate workflows. Its spatial perception includes both 2D and 3D grounding, supporting advanced applications in embodied AI and robotic control environments.

Qwen3 VL 235B A22B (Reasoning)

Explore AI Studio

Rankings & Comparison

Qwen3 VL 235B A22B (Reasoning)

Explore AI Studio

Rankings & Comparison