Neuphonic
Open Weights

Neuphonic TTS

Released Oct 2024

AA Arena: #64
Parameters: 748M

Neuphonic TTS is a speech generation system developed by the London-based startup Neuphonic and designed for ultra-low-latency applications. The company reports a response time of under 25 milliseconds, which is fast enough for real-time conversational interaction. The system is delivered primarily as an API for enterprise use, and the company also provides specialized versions for on-device deployment at the edge.

The architecture combines a language model with a neural audio codec. It pairs a lightweight transformer-based backbone, such as Qwen2, with a proprietary 24 kHz neural audio codec known as NeuCodec. This design lets the system generate speech incrementally, word by word, so the first audio is available before the rest of the utterance has been generated. That reduces time-to-first-audio compared with traditional speech models that process a whole utterance as one batch.
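The effect of incremental generation on time-to-first-audio can be illustrated with a toy timing model. This is only a sketch and does not call any Neuphonic or NeuTTS API: the function names and the fixed per-word cost `ms_per_word` are hypothetical, chosen purely for illustration. It assumes every word takes the same time to synthesize and compares when the first audio becomes available in a streaming setup versus a batch setup.

```python
from typing import Iterator, List, Tuple


def stream_synthesis(words: List[str], ms_per_word: float) -> Iterator[Tuple[float, str]]:
    """Yield (elapsed_ms, word) as soon as each word's audio would be ready.

    Hypothetical timing model: each word costs a fixed ms_per_word to generate.
    """
    elapsed = 0.0
    for word in words:
        elapsed += ms_per_word  # simulated cost of generating one word of audio
        yield elapsed, word


def time_to_first_audio_streaming(words: List[str], ms_per_word: float) -> float:
    """Streaming: playback can start once the first word's audio has been yielded."""
    first_elapsed, _ = next(stream_synthesis(words, ms_per_word), (0.0, ""))
    return first_elapsed


def time_to_first_audio_batch(words: List[str], ms_per_word: float) -> float:
    """Batch: nothing is playable until the last word has been generated."""
    return max((elapsed for elapsed, _ in stream_synthesis(words, ms_per_word)), default=0.0)


if __name__ == "__main__":
    sentence = "hello there how are you".split()
    cost = 20.0  # assumed per-word cost in milliseconds, not a measured figure
    print("streaming:", time_to_first_audio_streaming(sentence, cost), "ms")
    print("batch:    ", time_to_first_audio_batch(sentence, cost), "ms")
```

Under this simplified model the streaming delay stays constant as the sentence grows, while the batch delay scales with the number of words. Real systems have additional fixed costs, such as network round trips and codec decoding, that the sketch ignores.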

Neuphonic's open-weight offerings, branded as NeuTTS, include the NeuTTS Air and NeuTTS Nano models. Both support zero-shot voice cloning, replicating a speaker's voice from approximately three seconds of reference audio. The models are optimized for CPU-based inference and include built-in watermarking to help identify synthetic content.

Rankings & Comparison