Multilingual v2 is a generative text-to-speech model developed by ElevenLabs, designed to produce lifelike audio across dozens of languages. Launched as the platform transitioned out of its beta phase, the model serves as a foundational tool for high-fidelity speech synthesis, prioritizing emotional range and contextual awareness over generation speed.
The model supports 29 languages, including English, Chinese, Spanish, Hindi, Portuguese, and Japanese. A core capability of Multilingual v2 is its ability to maintain a speaker's unique voice characteristics and accent across different languages. This facilitates cross-lingual voice cloning and content localization, allowing a single voice profile to be used for global audiences while preserving the original persona's identity.
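As an illustration of reusing one voice profile across languages, the sketch below builds request payloads for the ElevenLabs text-to-speech REST endpoint (`POST /v1/text-to-speech/{voice_id}`) with the `eleven_multilingual_v2` model identifier. The voice ID and sample texts are placeholders, and no network call is made; consult the official API reference for authentication headers and optional voice settings.

```python
import json

API_BASE = "https://api.elevenlabs.io/v1/text-to-speech"
VOICE_ID = "YOUR_VOICE_ID"  # placeholder; a real cloned-voice ID goes here

def build_request(text: str) -> tuple[str, str]:
    """Return the endpoint URL and JSON body for one synthesis request."""
    body = json.dumps({
        "text": text,
        "model_id": "eleven_multilingual_v2",  # selects Multilingual v2
    })
    return f"{API_BASE}/{VOICE_ID}", body

# The same voice profile serves both English and Spanish input text,
# so the synthesized speaker identity carries across languages.
url_en, body_en = build_request("Welcome to the show.")
url_es, body_es = build_request("Bienvenidos al programa.")
```

Because the language is inferred from the input text rather than set per request, localization reduces to swapping the `text` field while the URL, and therefore the voice, stays constant.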
Designed for professional-grade applications, the model is frequently used for audiobook narration, video game character dialogue, and film dubbing. It utilizes a proprietary architecture that interprets intent and tone within the input text to produce natural prosody. While ElevenLabs has since introduced faster low-latency models, Multilingual v2 remains optimized for scenarios where emotional nuance and stable voice quality, rather than generation speed, are the primary requirements.