Gemini 2.0 Flash-Lite is a multimodal language model developed by Google, optimized for cost efficiency and low latency. Positioned as the lightweight variant within the Gemini 2.0 family, it is designed for high-volume, real-time applications where speed and operational cost are the primary constraints. The model was introduced to offer a performance profile that exceeds that of previous-generation Flash models while maintaining a similar resource footprint.
The model supports a 1 million token context window and features native multimodal capabilities, allowing it to process and reason across text, images, audio, and video inputs. It is specifically optimized for large-scale text generation tasks and provides a significantly faster time-to-first-token (TTFT) compared to its predecessors.
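These capabilities are exposed through the Gemini API. The sketch below shows a minimal text-generation call using the `google-genai` Python SDK; the helper function, the prompt, and the use of a `GEMINI_API_KEY` environment variable are illustrative assumptions, not an official example.

```python
# Minimal sketch of calling Gemini 2.0 Flash-Lite via the google-genai SDK.
# Assumes: `pip install google-genai` and a GEMINI_API_KEY environment variable.
import os

MODEL_ID = "gemini-2.0-flash-lite"  # model identifier used in the Gemini API

def build_request(prompt: str) -> dict:
    """Assemble keyword arguments for a generate_content call (hypothetical helper)."""
    return {"model": MODEL_ID, "contents": prompt}

if __name__ == "__main__":
    from google import genai  # imported lazily so the helper is testable offline
    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    response = client.models.generate_content(**build_request("Summarize this article in one sentence."))
    print(response.text)
```

Because the model accepts text, image, audio, and video inputs, the `contents` argument can also carry mixed media parts rather than a plain string.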
Benchmark data from Google indicates that Gemini 2.0 Flash-Lite outperforms Gemini 1.5 Flash on various metrics, including MMLU-Pro and Bird-SQL (natural-language-to-SQL generation). Despite these performance gains, the model is designed to operate at the same price point as earlier Flash variants, making it suited to developers balancing output quality against cost and throughput.