Google

Gemini 2.5 Flash Preview (Non-reasoning)

Released Apr 2025

Gemini 2.5 Flash Preview is an efficient multimodal model developed by Google, designed to balance speed, cost, and intelligence. As part of the Gemini 2.5 series, it features a hybrid reasoning architecture that lets developers toggle "thinking" on or off. The non-reasoning configuration (often referred to as "thinking off") focuses on low-latency responses at reduced cost, keeping the speed of earlier Flash-tier models while using the upgraded 2.5-series architecture.
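As a rough illustration of the toggle, the sketch below builds a request body with thinking switched on or off. It only constructs JSON and sends nothing. The model identifier, the `build_request` helper, and the default budget of 1024 are assumptions made for this example, and the `thinkingConfig` / `thinkingBudget` field names reflect one reading of the public Gemini API, so they should be checked against the current documentation before use.

```python
import json

# Assumed model identifier, for illustration only; verify against the current model list.
MODEL_ID = "gemini-2.5-flash-preview-04-17"


def build_request(prompt: str, thinking: bool, thinking_budget: int = 1024) -> dict:
    """Build a generateContent-style JSON body (hypothetical helper).

    A budget of 0 is assumed to correspond to the non-reasoning
    ("thinking off") configuration; any positive budget enables thinking.
    """
    budget = thinking_budget if thinking else 0
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {"thinkingConfig": {"thinkingBudget": budget}},
    }


# Non-reasoning request: the thinking budget is set to zero.
body = build_request("Summarize this changelog in three bullets.", thinking=False)
print(MODEL_ID)
print(json.dumps(body, indent=2))
```

Keeping the toggle in a single function makes it easy to route simple requests to the non-reasoning configuration and reserve a thinking budget for harder ones.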

The model is optimized for high-throughput applications and is particularly effective for agentic tool use, long-context summarization, and following complex instructions. It supports a context window of 1 million tokens, enabling it to process massive datasets across various modalities including text, code, images, audio, and video. By bypassing the extended internal reasoning process used in its "thinking" mode, the non-reasoning version provides a more direct and cost-efficient output for standard tasks.

In benchmarks, the non-reasoning variant is noted for its high output speed, often exceeding 250 tokens per second. While it shares the same underlying foundation as the reasoning-enabled version, this configuration is intended for workflows where immediate execution and token economy are prioritized over the enhanced logical deduction provided by dedicated reasoning compute.
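To put the throughput figure in perspective, the following back-of-the-envelope sketch converts an output length into generation time. The 250 tokens per second default comes from the figure quoted above. The `generation_seconds` helper is hypothetical, and the estimate ignores time to first token, network latency, and rate limits.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float = 250.0) -> float:
    """Estimate streaming time for a response of a given length.

    Covers only the token generation phase; time to first token and
    network overhead are deliberately left out.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


# A 1,000-token answer at 250 tokens per second takes about 4 seconds to stream.
print(generation_seconds(1000))  # 4.0
```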

Rankings & Comparison