Amazon

Polly Generative

Released May 2024

AA Arena
#33
Parameters: 1B

Amazon Polly Generative is a text-to-speech (TTS) engine launched by Amazon Web Services (AWS) in May 2024. It is the most advanced voice-synthesis tier in the Polly service, joining the standard, neural, and long-form engines. The engine is designed to produce expressive, emotionally engaged, human-like speech that follows natural conversational patterns.

The architecture builds on the BASE TTS research (Big Adaptive Streamable TTS with Emergent abilities). A 1-billion-parameter autoregressive Transformer converts raw text into discrete "speechcodes," and a convolution-based decoder then turns those codes into audio waveforms incrementally, so that output can be streamed. At this scale, the model exhibits emergent abilities similar to those of large language models, such as handling context-dependent prosody and pausing and pronouncing complex or foreign terms.
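
The two-stage flow described above (autoregressive code generation, then a streamable decoder) can be illustrated with a toy sketch. Nothing here reflects the real model: the function names, the codebook size of 8, and the arithmetic standing in for the Transformer and the decoder are all hypothetical. The sketch only shows the shape of the pipeline, in which each code depends on the previous one and audio chunks are emitted before the full sequence has been generated.

```python
from typing import Iterator, List

CODEBOOK_SIZE = 8  # hypothetical; the real speechcode vocabulary size is not stated here


def toy_autoregressive_codes(text: str) -> Iterator[int]:
    """Yield one discrete code per character, each depending on the previous code.

    Stand-in for the autoregressive Transformer: plain arithmetic, not a learned model.
    """
    prev = 0
    for ch in text:
        prev = (prev + ord(ch)) % CODEBOOK_SIZE
        yield prev


def _codes_to_samples(codes: List[int]) -> List[float]:
    """Map each code linearly to an audio-like sample in [-1.0, 1.0]."""
    return [2.0 * c / (CODEBOOK_SIZE - 1) - 1.0 for c in codes]


def toy_streaming_decoder(codes: Iterator[int], chunk_size: int = 4) -> Iterator[List[float]]:
    """Emit a chunk of samples as soon as chunk_size codes have arrived.

    Stand-in for the convolution-based decoder; it never waits for the full sequence.
    """
    buffer: List[int] = []
    for code in codes:
        buffer.append(code)
        if len(buffer) == chunk_size:
            yield _codes_to_samples(buffer)
            buffer = []
    if buffer:  # flush the final partial chunk
        yield _codes_to_samples(buffer)


def synthesize_stream(text: str, chunk_size: int = 4) -> Iterator[List[float]]:
    """Chain the two stages: text -> codes -> streamed sample chunks."""
    return toy_streaming_decoder(toy_autoregressive_codes(text), chunk_size)


if __name__ == "__main__":
    for chunk in synthesize_stream("hello world"):
        print(len(chunk), "samples ready")
```

Because both stages are generators, a consumer can start playing the first chunk while later codes are still being produced, which is the property the word "streamable" refers to.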

Key capabilities of the Generative engine include rendering speech in assertive and colloquial tones, which suits interactive voice assistants and customer service applications. The engine supports a growing library of voices, including polyglot voices that keep a consistent vocal identity while speaking multiple languages natively. It is optimized for low-latency synthesis to support real-time conversational AI workflows.
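
As a rough illustration of how an application might select this engine, the sketch below builds a request for Polly's SynthesizeSpeech operation through the AWS SDK for Python (boto3). It assumes that boto3 is installed, that AWS credentials are configured, and that the voice "Ruth" is offered for the generative engine in the Region you use. The helper names `build_generative_request` and `synthesize_to_file` are hypothetical, chosen only for this example.

```python
from typing import Any, Dict


def build_generative_request(text: str, voice_id: str = "Ruth") -> Dict[str, Any]:
    """Build keyword arguments for Polly's SynthesizeSpeech operation."""
    if not text:
        raise ValueError("text must be non-empty")
    return {
        "Engine": "generative",  # selects the generative engine
        "VoiceId": voice_id,     # assumed to be a generative-capable voice
        "OutputFormat": "mp3",
        "Text": text,
    }


def synthesize_to_file(text: str, path: str, voice_id: str = "Ruth") -> None:
    """Call Polly and write the returned audio to disk.

    Requires boto3 and valid AWS credentials; not exercised without them.
    """
    import boto3  # imported lazily so the request builder works without boto3

    client = boto3.client("polly")
    response = client.synthesize_speech(**build_generative_request(text, voice_id))
    with open(path, "wb") as f:
        f.write(response["AudioStream"].read())
```

A call such as `synthesize_to_file("Thanks for calling. How can I help?", "greeting.mp3")` would then produce an MP3 file, provided the assumptions above hold for your account and Region.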

Rankings & Comparison