GLM-4.6 is an open-weight large language model developed by Zhipu AI (branded as Z.ai). It is built on a Mixture-of-Experts (MoE) architecture with 355 billion total parameters, of which approximately 32 billion are active per forward pass. The model is specifically optimized for complex reasoning, software engineering, and autonomous agent tasks.
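The MoE idea — a router selects a small subset of experts per token, so only a fraction of the total parameters are exercised on each forward pass — can be sketched in miniature. The dimensions, expert count, and top-k value below are hypothetical toy values for illustration, not GLM-4.6's actual configuration:

```python
import math
import random

random.seed(0)

# Toy dimensions for illustration only; GLM-4.6's real expert count,
# hidden size, and routing scheme are not reproduced here.
D, N_EXPERTS, TOP_K = 8, 16, 2

def rand_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

def matvec(x, w):
    """Multiply row vector x (len rows) by matrix w (rows x cols)."""
    return [sum(x[i] * w[i][j] for i in range(len(x)))
            for j in range(len(w[0]))]

experts = [rand_matrix(D, D) for _ in range(N_EXPERTS)]  # expert weights
router = rand_matrix(D, N_EXPERTS)                       # routing weights

def moe_forward(x):
    """Route a token vector to its top-k experts and mix their outputs."""
    logits = matvec(x, router)
    # Pick the TOP_K experts with the highest router scores.
    top = sorted(range(N_EXPERTS), key=lambda i: logits[i])[-TOP_K:]
    exps = [math.exp(logits[i]) for i in top]
    weights = [e / sum(exps) for e in exps]  # softmax over selected experts
    out = [0.0] * D
    for w, i in zip(weights, top):
        y = matvec(x, experts[i])
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, top

token = [random.gauss(0, 1) for _ in range(D)]
out, used = moe_forward(token)
# Only TOP_K of the N_EXPERTS weight matrices were touched for this token,
# mirroring how ~32B of 355B parameters are active per forward pass.
```

The ratio of active to total parameters (roughly 32B/355B, or about 9%) is what lets an MoE model carry very large capacity while keeping per-token compute closer to that of a much smaller dense model.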
A key feature of the model is its Thinking Mode, which enables multi-step reasoning by generating an internal chain of thought before producing the final response. Separately, the architecture includes a multi-token prediction (MTP) layer, which supports speculative decoding to speed up inference. Thinking Mode yields improved performance on hard reasoning tasks such as the AIME 2025 mathematics competition and difficult logic puzzles.
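Against an OpenAI-compatible endpoint, Thinking Mode is typically toggled with a request-level flag. The sketch below only assembles the request payload rather than sending it; the `thinking` field name follows Z.ai's published API, but the exact schema should be verified against the current documentation before use:

```python
import json

def build_chat_request(prompt, enable_thinking=True):
    """Assemble a chat-completion payload for GLM-4.6.

    The `thinking` field follows Z.ai's OpenAI-compatible API; confirm the
    exact schema against current docs, as it may change between releases.
    """
    payload = {
        "model": "glm-4.6",
        "messages": [{"role": "user", "content": prompt}],
        # Thinking Mode: the model emits an internal chain of thought before
        # the final answer; disable it for simple, latency-sensitive calls.
        "thinking": {"type": "enabled" if enable_thinking else "disabled"},
    }
    return json.dumps(payload)

request_body = build_chat_request("Prove that sqrt(2) is irrational.")
```

In practice this string would be POSTed to the provider's chat-completions endpoint with an API key; the reasoning tokens are returned separately from the final answer so clients can display or discard them.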
GLM-4.6 supports a 200,000-token context window, allowing it to analyze entire codebases or long-form documentation in a single pass. In practical coding evaluations, the model has demonstrated near-parity with frontier proprietary systems on benchmarks like LiveCodeBench and SWE-bench Verified. It also natively supports structured function calling and tool invocation, facilitating its use in search-based agents and multi-turn developer workflows.
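The function-calling flow works by passing the model a JSON schema of available tools and executing whatever calls it returns. The sketch below uses a hypothetical `get_weather` tool with a locally stubbed result; the schema shape follows the widely used OpenAI-style `tools` format, which GLM-4.6's interface is compatible with, but field names should be checked against the provider's docs:

```python
import json

# Hypothetical tool definition in the OpenAI-style schema: the model sees
# this description and decides when to emit a matching tool call.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}]

def dispatch_tool_call(call):
    """Execute a tool call returned by the model (stubbed locally here)."""
    args = json.loads(call["arguments"])
    if call["name"] == "get_weather":
        return {"city": args["city"], "temp_c": 21}  # stubbed result
    raise ValueError(f"unknown tool: {call['name']}")

# Simulated model output: in a real multi-turn run, this arrives in the
# assistant message's tool-call field, and the result is sent back to the
# model in a follow-up message so it can compose its final answer.
result = dispatch_tool_call({"name": "get_weather",
                             "arguments": '{"city": "Beijing"}'})
```

Multi-turn agent loops repeat this cycle — model proposes a call, the client executes it and returns the result — until the model responds with a plain answer instead of another tool call.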