
Gemini 3.1 Flash-Lite Preview

Released Mar 2026

Gemini 3.1 Flash-Lite Preview is a high-efficiency multimodal large language model developed by Google, released on March 3, 2026. Designed as the most cost-efficient and fastest entry in the Gemini 3 series, it is optimized for high-volume, latency-sensitive workloads. The model is intended for developers and enterprises managing large-scale applications such as real-time translation, content moderation, and lightweight agentic tasks, where operational speed and cost per token are critical constraints.

Built on the Gemini 3 Pro architecture, the model retains the 1-million-token context window and natively supports text, image, audio, and video inputs. A significant feature of this version is the introduction of adjustable "Thinking Levels," which give developers granular control over the model's reasoning intensity. This allows a flexible trade-off: higher-quality reasoning for complex logic, or faster, lower-cost responses for straightforward data extraction and classification tasks.
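The trade-off above can be sketched as a small request builder that selects a thinking level per task type. This is a minimal illustration, not the official SDK: the `thinking_level` key, the level names, and the task labels are all assumptions for demonstration purposes.

```python
# Hypothetical helper: pick a reasoning intensity ("thinking level")
# per task type, trading answer quality against latency and cost.
# The config keys and level names below are illustrative, not an
# official API surface.

TASK_LEVELS = {
    "extraction": "low",        # simple, deterministic work: minimize latency
    "classification": "low",
    "translation": "low",
    "agentic": "high",          # multi-step logic: spend more reasoning
    "complex_logic": "high",
}

def build_request(model: str, prompt: str, task: str) -> dict:
    """Assemble a request dict with a task-appropriate thinking level."""
    level = TASK_LEVELS.get(task, "low")  # default to the cheap path
    return {
        "model": model,
        "contents": prompt,
        "config": {"thinking_level": level},
    }

req = build_request(
    "gemini-3.1-flash-lite-preview",
    "Extract all dates from the following text: ...",
    "extraction",
)
```

A real integration would pass such a configuration through the provider's client library; the point here is only that the reasoning budget becomes a per-request knob rather than a fixed model property.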

In terms of performance, Gemini 3.1 Flash-Lite achieves notable results on reasoning benchmarks, including 86.9% on GPQA Diamond and 76.8% on MMMU Pro, surpassing several larger models from previous generations. It is engineered for rapid inference, delivering 2.5x faster Time to First Token (TTFT) and a 45% increase in output throughput compared to Gemini 2.5 Flash. The model is commonly used for building responsive real-time experiences, generating user interfaces, and serving as an efficient router for multi-model pipelines.
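The router role mentioned above can be sketched as follows: a fast, cheap model first labels each incoming request, and the label decides which downstream model handles it. The `classify` function is stubbed here with a word-count heuristic; in practice it would be a single low-latency call to the lightweight model. All model names and route labels are illustrative.

```python
# Illustrative multi-model routing pipeline. A lightweight model acts
# as the classifier; its label selects the downstream model. The
# route table and the classify() stub are assumptions for this sketch.

ROUTES = {
    "simple_qa": "gemini-3.1-flash-lite-preview",  # cheap model answers directly
    "deep_reasoning": "larger-reasoning-model",    # escalate (name illustrative)
}

def classify(query: str) -> str:
    """Stub: a real router would ask the lite model for this label."""
    return "deep_reasoning" if len(query.split()) > 20 else "simple_qa"

def route(query: str) -> str:
    """Return the model that should handle this query."""
    return ROUTES[classify(query)]
```

The economics follow directly from the pricing gap: every request pays one cheap classification call, and only the fraction escalated to the larger model incurs its higher per-token cost.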

Rankings & Comparison