OpenVoice v2 by OpenVoice: Benchmarks, Rankings & Model Details

OpenVoice v2 is an instant voice cloning model developed by MyShell in collaboration with researchers from MIT and Tsinghua University. As an evolution of the original OpenVoice framework, v2 improves audio quality and adds native support for six languages: English, Spanish, French, Chinese, Japanese, and Korean. The model is designed to replicate a speaker's voice from only a short reference audio clip, without fine-tuning on that speaker.

The system uses a decoupled architecture that separates the components of speech. It consists of a base speaker text-to-speech (TTS) model, which manages language, rhythm, and style, and a separate tone color converter, which extracts the identity of the reference voice and applies it to the base model's output. This modular design enables zero-shot cross-lingual voice cloning: a voice cloned from a clip in one language can speak another, even when no recordings of that speaker in the target language were available.
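
The two-stage data flow described above can be illustrated with a toy sketch. This is not the OpenVoice API: every name here (Speech, base_speaker_tts, extract_tone_color, convert_tone_color, clone) is hypothetical, and the "tone color" is a trivial three-number summary standing in for a learned speaker embedding. It only shows how language and style can be fixed in one stage while speaker identity is swapped in another.

```python
from dataclasses import dataclass, replace
from typing import Sequence, Tuple

# Hypothetical stand-ins for illustration only; not the OpenVoice API.


@dataclass(frozen=True)
class Speech:
    text: str
    language: str
    style: str                      # e.g. an emotion or accent label
    tone_color: Tuple[float, ...]   # toy speaker-identity vector


def base_speaker_tts(text: str, language: str, style: str = "default") -> Speech:
    """Stage 1: 'synthesize' with a fixed base speaker.

    Language and style are decided here, independently of the target voice.
    """
    base_tone = (0.0, 0.0, 0.0)  # placeholder identity of the base speaker
    return Speech(text, language, style, base_tone)


def extract_tone_color(reference_samples: Sequence[float]) -> Tuple[float, ...]:
    """Toy 'embedding': mean, min, and max of the reference samples."""
    if not reference_samples:
        raise ValueError("reference audio must not be empty")
    mean = sum(reference_samples) / len(reference_samples)
    return (mean, min(reference_samples), max(reference_samples))


def convert_tone_color(speech: Speech, target: Tuple[float, ...]) -> Speech:
    """Stage 2: swap only the speaker identity; text, language, style stay put."""
    return replace(speech, tone_color=target)


def clone(text: str, language: str, reference_samples: Sequence[float],
          style: str = "default") -> Speech:
    """Run both stages: base TTS first, then tone color conversion."""
    spoken = base_speaker_tts(text, language, style)
    return convert_tone_color(spoken, extract_tone_color(reference_samples))


if __name__ == "__main__":
    reference = [0.5, -0.5, 1.0, 0.0]  # pretend samples from a short clip
    print(clone("Hola", "es", reference, style="cheerful"))
```

Because the reference clip only ever reaches extract_tone_color, nothing in the sketch ties the output language to the language of the reference, which is the property the decoupled design relies on for cross-lingual cloning.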

Beyond basic replication, OpenVoice v2 provides granular control over various speech parameters. Users can manipulate attributes such as emotion, accent, intonation, and pauses independently of the cloned voice identity. Released under the MIT License, the model weights and source code are available for both research and commercial applications.
