Z AI
Open Weights

GLM-4.7-Flash (Non-reasoning)

Released Jan 2026

Intelligence: #191
Coding: #270
Context: 200K
Parameters: 31B

GLM-4.7-Flash is a lightweight, high-speed large language model developed by Zhipu AI (Z.ai). It is designed as a cost-efficient variant of the GLM-4.7 series, optimized for low-latency inference while maintaining strong performance in bilingual (Chinese and English) communication and complex task execution.

The model utilizes a Mixture-of-Experts (MoE) architecture, featuring approximately 31 billion total parameters with roughly 3 billion active parameters per token. This architectural choice allows it to achieve reasoning and coding capabilities comparable to larger dense models while remaining practical for local deployment and high-throughput API applications.
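The cost advantage described above can be made concrete with a rough back-of-the-envelope calculation: per-token inference compute scales with the *active* parameters, not the total. A minimal sketch, using the 31B/3B figures from the text and the common rule of thumb of ~2 FLOPs per parameter per token (the dense comparison is an assumption for illustration):

```python
# Why a 31B-total / 3B-active MoE is cheap to run per token.
# Forward-pass FLOPs scale with ACTIVE parameters (~2 FLOPs per
# parameter per token), not with the total expert count.

TOTAL_PARAMS = 31e9   # all experts combined (from the model card)
ACTIVE_PARAMS = 3e9   # parameters routed to per token (from the model card)

def forward_flops_per_token(active_params: float) -> float:
    """Rule-of-thumb estimate: ~2 FLOPs per active parameter per token."""
    return 2 * active_params

moe_flops = forward_flops_per_token(ACTIVE_PARAMS)
dense_flops = forward_flops_per_token(TOTAL_PARAMS)

print(f"Active fraction: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")
print(f"Per-token compute vs. a dense 31B model: {moe_flops / dense_flops:.1%}")
```

On these assumptions, each token activates under 10% of the network, which is why the model can approach larger dense models in capability while remaining practical for local deployment.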

Technical enhancements in the GLM-4.7 series include Interleaved Thinking, which lets the model work through intermediate reasoning steps before producing a final response or calling tools, and Preserved Thinking, which carries the model's internal reasoning chain across multi-turn interactions to improve consistency and instruction following. GLM-4.7-Flash performs strongly on industry benchmarks, particularly software engineering (SWE-bench Verified) and mathematical problem solving (AIME 2025).
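Preserved Thinking implies that the client keeps the model's reasoning text in the conversation history so later turns can see it. A minimal client-side sketch of that bookkeeping; the field names (`"reasoning"`, `"content"`) are illustrative assumptions, not Z.ai's actual API schema:

```python
# Client-side history that retains the assistant's reasoning across turns.
# Field names here are hypothetical placeholders, not Z.ai's API schema.
from typing import Optional

def append_assistant_turn(history: list, content: str,
                          reasoning: Optional[str] = None) -> None:
    """Store the visible answer and, when present, the reasoning chain."""
    turn = {"role": "assistant", "content": content}
    if reasoning is not None:
        turn["reasoning"] = reasoning  # preserved for future turns
    history.append(turn)

history = [{"role": "user", "content": "Plan a 3-step refactor."}]
append_assistant_turn(history, "Step 1: extract the parser.",
                      reasoning="The parser is the riskiest module, so start there.")
history.append({"role": "user", "content": "Apply step 1."})

# The full history, reasoning included, is sent with the next request,
# so the model's earlier chain of thought stays in context.
print(sum(1 for turn in history if "reasoning" in turn))
```

The design point is simply that the reasoning chain lives in the request payload rather than being discarded after each turn, which is what lets multi-turn consistency improve.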

Rankings & Comparison