TTS-1 is a text-to-speech model developed by OpenAI and optimized for real-time applications in which low latency is the primary requirement. Introduced at OpenAI's inaugural DevDay in November 2023, it converts text into natural-sounding spoken audio. It is the standard version of OpenAI's speech synthesis technology, in contrast to TTS-1-HD, which delivers higher audio quality at the cost of higher latency.
The model supports a variety of preset voices, including Alloy, Echo, Fable, Onyx, Nova, and Shimmer, each with distinct tonal characteristics. It is also multilingual and can generate audio in numerous languages from text input. TTS-1 supports multiple output formats, such as MP3, Opus, AAC, and FLAC, which makes it suitable for streaming and for diverse content delivery platforms.
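As an illustration of how the voice and output-format options fit together, the following minimal sketch builds (but does not send) an HTTP request for speech synthesis. The voice and format names come from the description above. The endpoint URL and the JSON field names (`model`, `input`, `voice`, `response_format`) are assumptions based on OpenAI's public HTTP API and should be checked against the current API reference. The helper `build_speech_request` and the placeholder key are hypothetical names introduced only for this example.

```python
import json
import urllib.request

# Voice and format names are taken from the article text. The endpoint URL
# and JSON field names are assumptions about OpenAI's public HTTP API.
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
FORMATS = {"mp3", "opus", "aac", "flac"}
SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(text, voice="alloy", audio_format="mp3",
                         api_key="YOUR_API_KEY"):
    """Return an unsent urllib Request describing a TTS-1 synthesis call."""
    # Reject options that the article does not list.
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    if audio_format not in FORMATS:
        raise ValueError(f"unsupported format: {audio_format!r}")

    payload = {
        "model": "tts-1",
        "input": text,
        "voice": voice,
        "response_format": audio_format,
    }
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Build a request and inspect it; nothing is sent over the network.
    req = build_speech_request("Hello from TTS-1.", voice="nova",
                               audio_format="opus")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` with a valid key would, under the same assumptions, return the audio bytes in the requested format. Separating request construction from sending keeps the option validation testable without network access.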