Gemini 3.5 Flash (high) is the high-reasoning configuration of Google's Gemini 3.5 Flash model, released on May 19, 2026. As the first model in the Gemini 3.5 series, it is designed to combine frontier-level intelligence with the high speed and low latency characteristic of the Flash tier. The "(high)" designation refers to the model running at its highest thinking level, a setting that lets the system devote additional internal reasoning to complex queries.
The model is optimized for agentic execution and long-horizon tasks, reflecting a shift in Google's model development toward autonomous workflows. According to Google's internal benchmarks, Gemini 3.5 Flash (high) outperforms the previous flagship, Gemini 3.1 Pro, on specialized coding and agentic benchmarks, including Terminal-Bench 2.1 and MCP Atlas. The model also introduces a "thought preservation" capability that retains intermediate reasoning context across the turns of a conversation, improving consistency in complex, iterative tasks such as software debugging or multi-step research.
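The following Python sketch illustrates the general idea of thought preservation from a client's point of view. It is not Google's implementation or API: the `Turn` and `Conversation` classes, the `reasoning` field, and the `preserve_thoughts` flag are hypothetical, and the sketch assumes that intermediate reasoning can be represented as text and resent with the conversation history.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Turn:
    role: str                        # "user" or "model"
    text: str                        # visible message content
    reasoning: Optional[str] = None  # intermediate reasoning (model turns only)


@dataclass
class Conversation:
    # Hypothetical flag: when False, earlier reasoning is dropped and only
    # the visible text of each turn is resent.
    preserve_thoughts: bool = True
    turns: List[Turn] = field(default_factory=list)

    def add_user(self, text: str) -> None:
        self.turns.append(Turn("user", text))

    def add_model(self, text: str, reasoning: Optional[str] = None) -> None:
        self.turns.append(Turn("model", text, reasoning))

    def build_context(self) -> List[Dict[str, str]]:
        """Assemble the history to send with the next request."""
        context = []
        for turn in self.turns:
            entry = {"role": turn.role, "text": turn.text}
            # Carry the earlier reasoning forward instead of discarding it.
            if self.preserve_thoughts and turn.reasoning is not None:
                entry["reasoning"] = turn.reasoning
            context.append(entry)
        return context


# Example: a debugging session in which the follow-up request can build on
# the model's earlier diagnosis rather than re-deriving it.
session = Conversation()
session.add_user("Why does test_parse fail?")
session.add_model(
    "The tokenizer strips trailing whitespace.",
    reasoning="Traced test_parse to tokenize(); strip() removes the final space.",
)
session.add_user("Fix it without changing the public API.")
next_request = session.build_context()
```

With `preserve_thoughts=False`, the same history is resent without the `reasoning` entries, which corresponds to the stateless behavior that thought preservation is described as improving on.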
Technical Capabilities
Gemini 3.5 Flash is natively multimodal and can process text, images, audio, and video together. It has a 1-million-token context window and a 64,000-token output limit. The model was developed alongside Antigravity, Google's agent-first development stack, and is tuned for high-throughput tool use and the coordination of subagents. Despite its greater reasoning depth, it is reported to generate output approximately four times faster, measured in tokens per second, than comparable frontier models.
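The stated limits can be turned into simple request-planning arithmetic. The Python sketch below is illustrative only: the function names are hypothetical, reading "1 million" as exactly 1,000,000 tokens is an assumption, and the two limits are treated independently because the article does not say whether output tokens also count against the context window.

```python
# Limits as stated for Gemini 3.5 Flash. Reading "1 million" as exactly
# 1_000_000 tokens is an assumption.
CONTEXT_WINDOW_TOKENS = 1_000_000
MAX_OUTPUT_TOKENS = 64_000


def output_budget(prompt_tokens: int, requested_output: int) -> int:
    """Check the prompt against the context window and cap the output length.

    The two limits are checked independently (an assumption); the returned
    value is the number of output tokens that can actually be requested.
    """
    if prompt_tokens < 0 or requested_output <= 0:
        raise ValueError("prompt_tokens must be >= 0 and requested_output > 0")
    if prompt_tokens > CONTEXT_WINDOW_TOKENS:
        raise ValueError(
            f"prompt of {prompt_tokens:,} tokens exceeds the "
            f"{CONTEXT_WINDOW_TOKENS:,}-token context window"
        )
    return min(requested_output, MAX_OUTPUT_TOKENS)


def chunks_needed(total_tokens: int, overhead_tokens: int = 0) -> int:
    """Number of requests needed to feed `total_tokens` of input when every
    request also carries `overhead_tokens` of fixed instructions."""
    per_chunk = CONTEXT_WINDOW_TOKENS - overhead_tokens
    if per_chunk <= 0:
        raise ValueError("overhead leaves no room for input")
    return -(-total_tokens // per_chunk)  # ceiling division
```

For example, a 2.5-million-token corpus needs three requests under these assumptions, and a request for 100,000 output tokens is capped at 64,000.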
Prompting and Thinking Levels
A central feature of the 3.5 family is the thinking level parameter (minimal, low, medium, or high), which lets users trade reasoning quality against latency. The "high" setting is recommended for demanding tasks such as difficult mathematics, advanced coding pipelines, and complex logic. Google recommends that developers keep the default temperature and sampling parameters, because the model's internal reasoning is calibrated to these baseline settings for optimal stability.
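The Python sketch below shows one way an application might choose a thinking level per task while leaving sampling parameters at their defaults. It is a hypothetical illustration, not Google's client library: the task categories, the mapping from task to level, and the `thinking_config` / `thinking_level` field names are all assumptions, apart from the four level names and the guidance that "high" suits hard mathematics, coding, and logic.

```python
from enum import Enum


class ThinkingLevel(str, Enum):
    MINIMAL = "minimal"
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Hypothetical task-to-level mapping. Only the "high" entries follow the
# article's guidance; the rest are illustrative choices.
_LEVEL_BY_TASK = {
    "chat": ThinkingLevel.MINIMAL,
    "summarize": ThinkingLevel.LOW,
    "analysis": ThinkingLevel.MEDIUM,
    "math": ThinkingLevel.HIGH,
    "coding": ThinkingLevel.HIGH,
    "logic": ThinkingLevel.HIGH,
}


def build_generation_config(task: str) -> dict:
    """Build a request config for `task` (field names are assumed)."""
    # Unrecognized tasks fall back to a middle setting.
    level = _LEVEL_BY_TASK.get(task, ThinkingLevel.MEDIUM)
    # temperature, top_p and top_k are deliberately omitted so that the
    # service applies its defaults, in line with Google's recommendation.
    return {"thinking_config": {"thinking_level": level.value}}
```

Falling back to `medium` for unrecognized tasks is an arbitrary middle ground; a real application would tune the mapping against its own latency and quality measurements.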