GLM-4.5-Air is an efficient large-scale language model developed by Zhipu AI (also known as Z.ai), released in July 2025. Positioned as a lightweight yet high-performance variant of the GLM-4.5 flagship series, it is specifically designed to handle agentic tasks, complex reasoning, and coding workflows. The model is open-sourced under the MIT license, facilitating broad commercial and research adoption.
The model utilizes a Mixture-of-Experts (MoE) architecture, featuring 106 billion total parameters with 12 billion active parameters per forward pass. This design allows the model to maintain high reasoning quality while significantly reducing computational overhead during inference. A signature feature is its dual-mode capability, offering a "Thinking Mode" for deep logical analysis and multi-step tool use, and a "Non-Thinking Mode" for fast, low-latency responses.
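The sparsity described above comes from top-k expert routing: each token is sent to only a few experts, so most parameters stay idle on any given forward pass. The following is a minimal sketch of that routing step in pure Python; the expert count, k value, and gating details here are illustrative assumptions, not GLM-4.5-Air's actual configuration.

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax over a list of gate logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_tokens(token_logits, k=2):
    """For each token, pick the top-k experts by gate probability and
    renormalize their weights so the selected gates sum to 1."""
    routes = []
    for logits in token_logits:
        probs = softmax(logits)
        topk = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
        mass = sum(probs[i] for i in topk)
        routes.append([(i, probs[i] / mass) for i in topk])
    return routes

# Example: 3 tokens routed over 8 hypothetical experts, top-2 per token.
random.seed(0)
logits = [[random.gauss(0, 1) for _ in range(8)] for _ in range(3)]
for route in route_tokens(logits):
    print(route)  # [(expert_index, gate_weight), (expert_index, gate_weight)]
```

Because only the selected experts' weights participate in each token's computation, the compute cost tracks the active-parameter count (12B) rather than the total (106B).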
GLM-4.5-Air supports a context window of 128,000 tokens, making it capable of processing long documents and complex multi-turn dialogues. It was trained on a massive corpus of approximately 22 trillion tokens, with specific optimization for mathematical logic, software development, and autonomous agent behavior.
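In practice, a 128,000-token context window means callers must budget input length against it before sending a request. The sketch below shows one such pre-flight check, using a rough characters-per-token heuristic as a stand-in; a real deployment would count tokens with the model's own tokenizer, and the reserved-output figure is an arbitrary assumption.

```python
CONTEXT_WINDOW = 128_000  # tokens, per the model's published spec

def fits_in_context(prompt: str, reserved_for_output: int = 4_096) -> bool:
    """Rough budget check: estimate prompt tokens at ~4 characters per
    token (a common English-text heuristic, not the model's tokenizer)
    and leave headroom for the generated response."""
    estimated_tokens = len(prompt) // 4
    return estimated_tokens + reserved_for_output <= CONTEXT_WINDOW

# A short prompt fits easily; a ~520k-character document does not.
print(fits_in_context("Summarize the attached contract."))
print(fits_in_context("a" * 520_000))
```

Prompts that fail the check would need chunking or summarization before submission.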