GLM-4.7-Flash (Reasoning) by Z AI: LLM Benchmarks, Rankings & Specs

GLM-4.7-Flash (Reasoning) is a lightweight, high-speed language model developed by Zhipu AI (Z.ai) as a smaller member of its flagship GLM-4.7 series. It uses a Mixture-of-Experts (MoE) Transformer architecture with 30 billion total parameters, of which approximately 3 billion are active per token. The model is optimized for reasoning, coding, and tool calling, balancing capability against the low latency needed for real-time applications and local deployment.
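The gap between total and active parameters comes from MoE routing: a router scores every expert for each token, and only the top-scoring few are executed. The sketch below shows one common variant (top-k selection followed by a softmax over the chosen experts). It is a hypothetical toy illustration, not GLM-4.7-Flash's actual router; the number of experts, the value of `k`, and the gating function used by the real model are assumptions not stated on this page.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their gate weights.

    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))

def moe_forward(x, experts, router_logits, k):
    """Mix the outputs of only the selected experts; the others are never called."""
    out = [0.0] * len(x)
    for idx, w in top_k_route(router_logits, k):
        y = experts[idx](x)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out

# Hypothetical toy setup: 4 experts, each of which just scales its input.
experts = [lambda v, s=s: [s * t for t in v] for s in (1.0, 2.0, 3.0, 4.0)]
# In a real model these logits would come from a learned layer over the token's hidden state.
router_logits = [0.1, 2.0, -1.0, 1.5]
print(top_k_route(router_logits, k=2))  # experts 1 and 3 are selected
print(moe_forward([1.0, 1.0], experts, router_logits, k=2))

# Ratio implied by the published sizes: ~3B active out of 30B total.
print(f"active fraction: {3e9 / 30e9:.0%}")  # -> active fraction: 10%
```

Because unselected experts are skipped entirely, per-token compute scales with the active parameters (roughly a tenth of the total here), which is what makes the model fast relative to its overall size.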

The model reasons through a native "thinking" mode. It can think before each response or tool invocation (interleaved thinking) and retain that reasoning across turns of a multi-turn dialogue (preserved thinking). Together, these features help the model keep its context and logic consistent in complex, multi-step agentic workflows. It supports a context window of up to 200,000 tokens, enough to analyze large codebases and research datasets.

GLM-4.7-Flash performs strongly on technical benchmarks, particularly in coding and mathematics. It has posted high scores on evaluations such as SWE-bench Verified, AIME 2025, and GPQA, with results competitive against larger models. The model is released with open weights and supports local inference on consumer-grade hardware.
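Whether local inference fits on a given machine depends mostly on weight memory. A rough estimate is parameters × bits per parameter ÷ 8. Note that an MoE model must keep all 30B parameters resident, not just the ~3B active ones, unless the runtime offloads experts. The sketch below covers weights only and ignores the KV cache (which grows with context length and can be large near the 200,000-token limit), activations, and the small overhead that quantization formats add for scales; the precision levels listed are illustrative assumptions, not a statement of which formats are officially distributed.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for the model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

# All experts must be loaded, so use the total parameter count, not the active count.
TOTAL_PARAMS = 30e9

for label, bits in [("BF16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>5}: ~{weight_memory_gb(TOTAL_PARAMS, bits):.0f} GB")
```

This gives roughly 60 GB at BF16, 30 GB at 8-bit, and 15 GB at 4-bit, which is why aggressive quantization is usually what brings a 30B-parameter model within reach of a single consumer GPU or a high-memory laptop.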
