Google
Open Weights

Gemma 4 31B (Non-reasoning)

Released Apr 2026

Intelligence: #100
Coding: #66
Context: 256K
Parameters: 31B

Gemma 4 31B is a dense multimodal open-weight model developed by Google DeepMind as the flagship variant of the Gemma 4 family. Designed to deliver frontier-level intelligence on consumer-grade hardware and workstations, the 31B model accepts text and image inputs and generates text outputs. It achieves strong results on benchmarks for coding, mathematical reasoning, and multilingual tasks spanning more than 140 languages, leveraging the same research and technology behind the Gemini 3 model series.

Technically, the model features a 31-billion-parameter dense architecture—distinguishing it from the Mixture-of-Experts (MoE) 26B variant—and utilizes a hybrid attention mechanism that interleaves local sliding-window attention with full global attention. It incorporates Proportional RoPE (p-RoPE) to maintain stability across its 256,000-token context window. This architecture is optimized for long-context retrieval and complex document understanding, including native support for variable aspect ratio images and video processed as frame sequences.
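The interleaving of local sliding-window and full global attention described above can be sketched as a per-layer mask function. This is a minimal illustration only: the interleave ratio (5 local layers per global layer) and the window size used here are assumptions, not values published for Gemma 4.

```python
import numpy as np

def attention_mask(seq_len: int, layer_idx: int,
                   window: int = 1024, local_per_global: int = 5) -> np.ndarray:
    """Causal attention mask for one layer of a hypothetical hybrid stack.

    Every (local_per_global + 1)-th layer uses full global causal attention;
    the remaining layers restrict each query to a recent sliding window.
    The ratio and window size are illustrative assumptions.
    """
    q = np.arange(seq_len)[:, None]   # query positions (rows)
    k = np.arange(seq_len)[None, :]   # key positions (columns)
    causal = k <= q                   # tokens may only attend to the past
    if layer_idx % (local_per_global + 1) == local_per_global:
        return causal                 # global layer: full causal attention
    return causal & (q - k < window)  # local layer: recent window only

# In a local layer, a query far into the sequence cannot reach position 0,
# while a global layer retains full causal reach.
local_mask = attention_mask(4096, layer_idx=0, window=1024)
assert not local_mask[3000, 0] and local_mask[3000, 2500]
```

Global layers preserve long-range retrieval across the full context, while local layers keep the quadratic attention cost bounded by the window size.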

Reasoning and Thinking Modes

A key feature of the Gemma 4 family is the configurable "Thinking" mode, which allows the model to perform internal step-by-step reasoning before providing a final response. The Non-reasoning designation refers to the model's operation when this feature is disabled or bypassed. In this mode, the model functions as a traditional instruction-tuned assistant, providing direct answers without the increased latency or token overhead associated with chain-of-thought processing. Users can control this behavior via the system prompt; including the <|think|> token enables reasoning, while its absence causes the model to generate a standard response.
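The toggle described above can be illustrated with a small prompt-assembly helper. The `<start_of_turn>`/`<end_of_turn>` delimiters follow the chat-template convention of earlier Gemma releases; their exact form in Gemma 4, and the placement of the `<|think|>` control token in the system turn, are assumptions for the sake of the sketch.

```python
def build_prompt(system: str, user: str, thinking: bool) -> str:
    """Assemble a chat prompt for a hypothetical Gemma 4 template.

    Including the <|think|> token in the system turn enables the model's
    internal reasoning; omitting it yields a direct (non-reasoning) response.
    Template token names are illustrative assumptions.
    """
    control = "<|think|>" if thinking else ""  # presence enables reasoning
    return (
        f"<start_of_turn>system\n{control}{system}<end_of_turn>\n"
        f"<start_of_turn>user\n{user}<end_of_turn>\n"
        f"<start_of_turn>model\n"
    )

# Non-reasoning mode: the control token appears nowhere in the prompt.
prompt = build_prompt("You are a helpful assistant.", "Summarize this doc.",
                      thinking=False)
assert "<|think|>" not in prompt
```

The same helper with `thinking=True` produces an otherwise identical prompt with the control token prepended to the system instructions, which is what trades latency and token overhead for chain-of-thought processing.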

For optimal performance, it is recommended to place multimodal content, such as images or audio-derived tokens, before the text in a prompt. The model also supports configurable visual token budgets (ranging from 70 to 1120 tokens), allowing users to balance visual detail against inference speed for tasks like OCR or high-level image classification.
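Both recommendations, image-before-text ordering and a clamped visual token budget, can be captured in a small pre-processing helper. The message shape (a list of `(kind, payload)` tuples) and the helper itself are illustrative, not an official API; only the 70-1120 budget range comes from the card.

```python
def prepare_content(parts, visual_budget: int = 280):
    """Reorder mixed content so images precede text, and clamp the visual
    token budget to the stated 70-1120 range.

    `parts` is a list of ("image" | "text", payload) tuples; this shape and
    the default budget are assumptions made for illustration.
    """
    budget = max(70, min(1120, visual_budget))   # clamp to supported range
    images = [p for p in parts if p[0] == "image"]
    texts = [p for p in parts if p[0] == "text"]
    return {"content": images + texts, "visual_token_budget": budget}

# Text placed first by the caller is moved after the image, and an
# out-of-range budget is clamped to the maximum.
msg = prepare_content([("text", "Describe this."), ("image", "photo.png")],
                      visual_budget=2000)
assert msg["content"][0][0] == "image"
assert msg["visual_token_budget"] == 1120
```

A low budget (near 70 tokens) suits coarse tasks such as image classification, while the upper end of the range preserves the fine detail needed for OCR.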

Rankings & Comparison