Gemini 2.5 Flash Lite TTS is a specialized text-to-speech model developed by Google as part of the Gemini 2.5 model family. It is engineered for efficient, low-latency audio generation and is the most cost-effective option within Google's generative speech lineup. The model is designed for high-volume tasks such as real-time conversational agents, customer service automation, and large-scale content narration.
Unlike conventional concatenative and earlier neural TTS systems, Gemini 2.5 Flash Lite TTS can be steered at a fine-grained level through natural language prompts. Users can specify attributes such as tone, pace, accent, and delivery style or emotion (e.g., whispering or excitement) directly in the input. This makes the synthesized speech more expressive and contextually appropriate, and lets the same model be adapted dynamically to different use cases.
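As a minimal sketch of prompt-based steering, the hypothetical helper below places a plain-English style instruction in front of the text to be spoken. The function name, its parameters, and the exact instruction wording are assumptions made for illustration; the description above states only that attributes such as tone, pace, and accent can be expressed in the input. The resulting string would be sent as the text input to the model.

```python
def build_styled_prompt(text, tone=None, pace=None, accent=None):
    """Prefix `text` with a natural-language style instruction.

    Hypothetical helper: the instruction format is an assumption,
    not a documented requirement of the model.
    """
    # Collect only the attributes the caller actually specified.
    parts = []
    if tone:
        parts.append(f"in a {tone} tone")
    if pace:
        parts.append(f"at a {pace} pace")
    if accent:
        parts.append(f"with a {accent} accent")

    if not parts:
        return text  # no style requested: pass the text through unchanged

    return f"Say the following {', '.join(parts)}: {text}"


prompt = build_styled_prompt("The doors are closing.", tone="calm", pace="slow")
print(prompt)
```

Keeping the style instruction separate from the spoken text in code makes it straightforward to vary delivery per request without editing the script itself.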
The model supports both single-speaker and multi-speaker synthesis, enabling complex dialogues and multi-character narratives to be generated from a single text prompt. It is part of the broader Gemini-TTS ecosystem, which leverages Google's multimodal architecture to achieve more human-like rhythm and phrasing than previous generations of speech models.
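One way to prepare a multi-speaker request is to flatten a list of dialogue turns into a single labeled transcript. The sketch below assumes a "Name: line" transcript layout and an optional leading direction; both the layout and the helper names are illustrative assumptions rather than a documented input format.

```python
def build_dialogue_prompt(turns, direction=None):
    """Join (speaker, line) pairs into one transcript string.

    Hypothetical helper: the "Name: line" layout is an assumed convention.
    """
    transcript = "\n".join(f"{speaker}: {line}" for speaker, line in turns)
    # An optional direction (e.g. overall mood) goes on its own first line.
    return f"{direction}\n{transcript}" if direction else transcript


def speakers_in(turns):
    """Return the distinct speaker names in order of first appearance."""
    seen = []
    for speaker, _ in turns:
        if speaker not in seen:
            seen.append(speaker)
    return seen


turns = [
    ("Ana", "Did you hear that?"),
    ("Ben", "It was just the wind."),
    ("Ana", "I hope so."),
]
print(build_dialogue_prompt(turns, direction="Read this as a tense, quiet exchange."))
print(speakers_in(turns))
```

The list of distinct speakers is useful when each named speaker must be mapped to a voice before the request is sent.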