GLM-4.7 (Non-reasoning) is a flagship large language model developed by Z AI, released on December 22, 2025. It uses a Mixture-of-Experts (MoE) architecture with 358 billion total parameters, of which approximately 32 billion are active during inference. This variant is optimized for direct responses, offering the same knowledge base and core capabilities as its reasoning-enabled counterpart while reducing latency for general tasks and standard coding outputs.
The model supports a 200,000-token context window, enabling the analysis of large codebases and complex technical documentation. It is specifically engineered for agentic workflows, providing stable performance in terminal-based automation and multi-language software engineering. Performance highlights include a 73.8% score on SWE-bench Verified and high accuracy on mathematical benchmarks like AIME 2025.
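A practical consequence of the 200,000-token window is that callers should budget input size before submitting a large codebase. The sketch below is a back-of-the-envelope check, assuming a rough four-characters-per-token ratio for English text and code; this heuristic is an assumption for illustration, not the model's actual tokenizer.

```python
# Rough pre-flight check of whether a set of source files fits the
# model's 200,000-token context window. The ~4-characters-per-token
# ratio is a crude heuristic, not the model's real tokenizer.

CONTEXT_WINDOW = 200_000

def estimate_tokens(text: str) -> int:
    """Crude token estimate: roughly one token per four characters."""
    return len(text) // 4

def fits_in_context(files: dict[str, str], reserve_for_output: int = 8_000) -> bool:
    """Return True if all files, plus a reserve for the model's reply,
    fit inside the context window."""
    total = sum(estimate_tokens(contents) for contents in files.values())
    return total <= CONTEXT_WINDOW - reserve_for_output

files = {"main.py": "print('hello')\n" * 1_000}
print(fits_in_context(files))  # a 15,000-character file fits easily
```

In a real pipeline, an exact tokenizer for the model would replace the heuristic, but the budgeting pattern (input estimate plus an output reserve) stays the same.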
Additional features of the GLM-4.7 series include Preserved Thinking and Turn-level Thinking, which allow the model to maintain state across multi-turn interactions. This non-reasoning version is intended for applications where speed and direct execution are prioritized, such as real-time technical support, content generation, and structured data extraction.
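For a use case like structured data extraction, the non-reasoning variant would typically be driven through a standard chat-completions-style request. The sketch below only builds such a request payload; the model identifier, message schema, and parameter names are assumptions for illustration and should be checked against the provider's actual API documentation.

```python
import json

# Build a hypothetical chat-completions request for structured data
# extraction. The "glm-4.7" identifier and the field layout below are
# illustrative assumptions, not confirmed API details.

def build_extraction_request(document: str) -> dict:
    """Construct a direct-response request asking the model to pull
    structured fields out of free text and reply with JSON only."""
    return {
        "model": "glm-4.7",  # assumed model identifier
        "messages": [
            {
                "role": "system",
                "content": (
                    "Extract the product name, unit price, and quantity "
                    "from the user's text. Respond with a single JSON "
                    "object and nothing else."
                ),
            },
            {"role": "user", "content": document},
        ],
        # A low temperature favors deterministic, directly parseable output,
        # which suits the speed-and-direct-execution use cases named above.
        "temperature": 0.1,
    }

request = build_extraction_request("Order: 3 units of WidgetPro at $19.99 each.")
print(json.dumps(request, indent=2))
```

The payload would then be sent to the provider's endpoint with an HTTP client; only the local construction step is shown here, since endpoint URLs and authentication are deployment-specific.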