Gemma 4 31B is a dense, open-weight multimodal language model developed by Google DeepMind, released in April 2026. Built on the architectural research of the Gemini 3 series, it is designed to provide frontier-level reasoning, coding, and agentic capabilities on workstation-class hardware. It is distributed under the permissive Apache 2.0 license, marking a significant shift toward open-source accessibility for Google’s high-capability models.
The model natively supports text, image, and video inputs, processing video as a sequence of frames at up to one frame per second. It employs a hybrid attention mechanism that interleaves local sliding-window attention with global attention layers, utilizing Proportional RoPE (p-RoPE) to maintain performance across its 256,000-token context window. This architecture enables the model to handle complex multimodal tasks such as document parsing, GUI detection, and long-form video analysis.
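The interleaving of local sliding-window and global attention layers described above can be sketched as a mask-construction routine. This is a minimal illustration, not Gemma 4's actual implementation: the window size, the interleave ratio, and the layer indexing below are assumed values chosen for readability.

```python
import numpy as np

def attention_mask(seq_len: int, layer_idx: int, window: int = 4, global_every: int = 3):
    """Build a boolean attention mask for one layer.

    Every `global_every`-th layer attends globally (full causal attention);
    the remaining layers restrict each query to the last `window` positions.
    All hyperparameters here are illustrative assumptions.
    """
    causal = np.tril(np.ones((seq_len, seq_len), dtype=bool))
    if layer_idx % global_every == global_every - 1:
        return causal  # global attention layer: full causal mask
    idx = np.arange(seq_len)
    # Sliding window: query i may attend key j only if i - j < window.
    local = (idx[:, None] - idx[None, :]) < window
    return causal & local

local_mask = attention_mask(8, layer_idx=0)   # windowed layer
global_mask = attention_mask(8, layer_idx=2)  # global layer
```

Because most layers only look a few thousand tokens back, their key/value state stays small; the sparse global layers are what carry information across the full 256,000-token window.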
A key feature of the 31B variant is its configurable Thinking Mode, which enables internal step-by-step reasoning. When the <|think|> token is included in the system prompt, the model generates an internal chain-of-thought before producing its final response. This reasoning is encapsulated within dedicated tags (<|channel>thought), so developers can choose to expose or hide the model's intermediate logic during interaction; engaging the mode improves performance on complex logical and mathematical problems.
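In practice, a client consuming raw model output needs to separate the reasoning span from the visible answer. The sketch below does this with a regular expression; the exact tag grammar is not specified in this description, so the "<|channel>thought" opener and the "<|end|>" closer used here are assumptions for illustration only.

```python
import re

# Hypothetical delimiters: "<|channel>thought" is taken from the description
# above, and "<|end|>" is an assumed closing marker.
THOUGHT_RE = re.compile(r"<\|channel>thought(.*?)<\|end\|>", re.DOTALL)

def split_thoughts(raw: str):
    """Return (visible_answer, list_of_thought_spans) from raw model output."""
    thoughts = THOUGHT_RE.findall(raw)
    visible = THOUGHT_RE.sub("", raw).strip()
    return visible, thoughts

raw = "<|channel>thought Check: 12 * 4 = 48. <|end|>The answer is 48."
visible, thoughts = split_thoughts(raw)
```

A chat frontend would typically render `visible` to the user and keep `thoughts` behind a collapsible disclosure, or drop them entirely.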
Architecturally, Gemma 4 31B incorporates Per-Layer Embeddings (PLE) and a Shared KV Cache to optimize memory efficiency and inference speed. These optimizations allow the model to achieve competitive results on benchmarks like MMLU Pro and AIME 2026, often rivaling the performance of much larger models while remaining deployable on a single high-end consumer GPU.
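The value of a Shared KV Cache is easiest to see with back-of-the-envelope arithmetic: at a 256,000-token context, per-layer key/value state dominates memory. The sketch below estimates cache size under the standard formula 2 (K and V) x layers x context x KV heads x head dim x bytes per element; every dimension used is an illustrative assumption, not a published Gemma 4 specification.

```python
def kv_cache_bytes(context: int, layers: int, kv_heads: int, head_dim: int,
                   dtype_bytes: int = 2, share_groups: int = 1) -> int:
    """Rough KV-cache footprint in bytes.

    `share_groups` models groups of layers sharing one cache (an assumed
    simplification of the Shared KV Cache idea); `dtype_bytes=2` assumes
    16-bit storage. All model dimensions passed in are hypothetical.
    """
    effective_layers = layers // share_groups
    return 2 * effective_layers * context * kv_heads * head_dim * dtype_bytes

# Hypothetical dimensions for a ~31B dense model:
full = kv_cache_bytes(context=256_000, layers=48, kv_heads=8, head_dim=128)
shared = kv_cache_bytes(context=256_000, layers=48, kv_heads=8, head_dim=128,
                        share_groups=2)
```

Under these assumed dimensions the unshared cache is roughly 50 GB at full context, and pairwise sharing halves it, which is the kind of saving that makes single-GPU deployment of long contexts plausible.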