Z AI
Open Weights

GLM-4.5V (Non-reasoning)

Released Aug 2025

GLM-4.5V is a large-scale multimodal vision-language model developed by Zhipu AI (Z.ai). It is built on the GLM-4.5-Air architecture and uses a Mixture-of-Experts (MoE) design with 106 billion total parameters, of which roughly 12 billion are active per token. The model natively processes text, images, and video, targeting high-performance visual reasoning and document understanding.
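The gap between total and active parameters comes from MoE routing: each token is sent to only a few experts, so only their weights participate in the forward pass. A minimal sketch of top-k softmax routing (expert counts and dimensions here are illustrative, not GLM-4.5V's actual configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_forward(x, expert_weights, router_weights, k=2):
    """Route one token vector x to its top-k experts and mix their outputs."""
    logits = x @ router_weights                 # router scores, shape (num_experts,)
    topk = np.argsort(logits)[-k:]              # indices of the k best experts
    gates = np.exp(logits[topk])
    gates /= gates.sum()                        # softmax over the selected experts only
    # Only the selected experts run, which is why active parameters
    # per token are far fewer than total parameters.
    return sum(g * (x @ expert_weights[e]) for g, e in zip(gates, topk))

d, num_experts = 8, 4                           # toy sizes for illustration
experts = rng.standard_normal((num_experts, d, d))
router = rng.standard_normal((d, num_experts))
y = moe_forward(rng.standard_normal(d), experts, router)
print(y.shape)  # (8,)
```

With k=2 of 4 experts selected, only half the expert weights are touched per token; at GLM-4.5V's scale the same mechanism reduces ~106B total parameters to ~12B active.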

A defining feature of the GLM-4.5 series is its dual execution capability: a Thinking Mode for deep, step-by-step chain-of-thought (CoT) reasoning, and a Non-thinking Mode (the non-reasoning configuration described here) for rapid, direct responses. The non-reasoning configuration is optimized for speed and token efficiency, making it suitable for standard VLM tasks such as image captioning, basic OCR, and immediate visual question answering, where extensive deliberation is not required.
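Mode selection is typically expressed as a request parameter. The payload below is a hedged sketch only: the `thinking` field shape and the image-message format follow common OpenAI-compatible conventions and are assumptions, not confirmed GLM-4.5V API documentation.

```python
import json

# Assumed request body for a non-thinking (direct-answer) multimodal query.
# Field names are illustrative; consult the provider's API reference.
payload = {
    "model": "glm-4.5v",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/chart.png"}},
                {"type": "text",
                 "text": "What is the highest bar in this chart?"},
            ],
        },
    ],
    # Assumed switch: disable chain-of-thought for fast, token-efficient replies.
    "thinking": {"type": "disabled"},
}
body = json.dumps(payload)
```

The same request with the switch enabled would trade latency and output tokens for the step-by-step reasoning trace of Thinking Mode.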

Technically, GLM-4.5V incorporates innovations including 3D Rotary Positional Encoding (3D-RoPE) to enhance spatial awareness and a 3D convolutional vision encoder for efficient video analysis. It supports a multimodal context window of up to 64,000 tokens, enabling the interpretation of long videos and complex, multi-page documents. The model has demonstrated competitive performance across benchmarks for GUI automation, chart parsing, and precise visual grounding.
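The idea behind a 3D rotary encoding can be sketched by splitting each query/key vector into three channel groups and rotating each group by one coordinate (time, height, width). This is a minimal illustration of the general technique, not GLM-4.5V's exact implementation:

```python
import numpy as np

def rope_1d(x, pos, base=10000.0):
    """Standard 1-D rotary embedding on an even-dimensional vector."""
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)   # per-channel rotation frequencies
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., :half], x[..., half:]
    # Rotate each (x1, x2) channel pair by its angle.
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

def rope_3d(x, t, h, w):
    """Split channels into three groups; rotate each by one coordinate,
    giving the attention mechanism separate time/height/width position signals."""
    d = x.shape[-1] // 3
    coords = (t, h, w)
    parts = [rope_1d(x[i * d:(i + 1) * d], coords[i]) for i in range(3)]
    return np.concatenate(parts)

q = np.ones(12)                   # toy 12-dim head: 4 channels per axis
out = rope_3d(q, t=3, h=1, w=2)
print(out.shape)  # (12,)
```

At position (0, 0, 0) all rotation angles are zero, so the encoding leaves the vector unchanged; relative position enters attention through the angle differences between query and key rotations.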
