Maya Research
Open Weights

Maya1

Released Nov 2025

AA Arena: #34
Parameters: 3B

Maya1 is a 3-billion-parameter text-to-speech (TTS) model developed by Maya Research and designed for expressive voice generation. Built on a Llama-style transformer architecture, the model uses SNAC (Multi-Scale Neural Audio Codec) to predict hierarchical neural codec tokens rather than raw audio waveforms. This design gives a compact audio representation and supports real-time streaming with sub-100 ms latency on consumer-grade hardware.
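To make "hierarchical codec tokens" more concrete, the following is a minimal sketch of how a three-level code hierarchy can be interleaved into one flat token stream and recovered again. The 1:2:4 ratio between levels, the seven-token frame, and the ordering inside a frame are assumptions chosen for illustration and are not confirmed by this page; `flatten_frames` and `unflatten_frames` are hypothetical helper names, not part of any Maya1 or SNAC API.

```python
from typing import List, Sequence, Tuple

# Assumed ratios, for illustration only: each coarse code is paired
# with 2 mid-level codes and 4 fine codes, giving 7 tokens per frame.
MID_PER_COARSE = 2
FINE_PER_COARSE = 4
CODES_PER_FRAME = 1 + MID_PER_COARSE + FINE_PER_COARSE


def flatten_frames(
    coarse: Sequence[int], mid: Sequence[int], fine: Sequence[int]
) -> List[int]:
    """Interleave three code levels into one flat sequence of frames."""
    n = len(coarse)
    if len(mid) != MID_PER_COARSE * n or len(fine) != FINE_PER_COARSE * n:
        raise ValueError("level lengths must be in a 1:2:4 ratio")
    flat: List[int] = []
    for i in range(n):
        flat.append(coarse[i])
        flat.extend(mid[MID_PER_COARSE * i : MID_PER_COARSE * (i + 1)])
        flat.extend(fine[FINE_PER_COARSE * i : FINE_PER_COARSE * (i + 1)])
    return flat


def unflatten_frames(flat: Sequence[int]) -> Tuple[List[int], List[int], List[int]]:
    """Split a flat frame sequence back into (coarse, mid, fine) levels."""
    if len(flat) % CODES_PER_FRAME != 0:
        raise ValueError("token count must be a multiple of the frame size")
    coarse: List[int] = []
    mid: List[int] = []
    fine: List[int] = []
    for start in range(0, len(flat), CODES_PER_FRAME):
        frame = flat[start : start + CODES_PER_FRAME]
        coarse.append(frame[0])
        mid.extend(frame[1 : 1 + MID_PER_COARSE])
        fine.extend(frame[1 + MID_PER_COARSE :])
    return coarse, mid, fine
```

In a layout like this, a language model only has to emit a single stream of discrete tokens, and a separate codec decoder turns the recovered levels back into a waveform.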

The model's distinguishing feature is natural-language "voice design": users describe voice characteristics such as age, accent, and tone in a free-text prompt. Emotional expression is controlled with more than 20 inline emotion tags, including laughter, whispers, sighs, and cries. Maya1 was trained on an internet-scale English speech corpus and then fine-tuned on a proprietary dataset of high-quality studio recordings with human-verified descriptions and multi-accent coverage.
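As a rough illustration of how a voice description and inline emotion tags might be combined into a single prompt, here is a small sketch. The `<description="...">` wrapper, the angle-bracket tag syntax, and the tag spellings in `ALLOWED_TAGS` are assumptions made for this example; the page only names laughter, whispers, sighs, and cries as tag categories, so check the model's own documentation for the exact format. `build_prompt` is a hypothetical helper.

```python
import re

# Assumed tag spellings for the four emotions named above; the real
# model exposes more than 20 tags and may spell them differently.
ALLOWED_TAGS = {"laugh", "whisper", "sigh", "cry"}


def build_prompt(description: str, text: str) -> str:
    """Combine a voice description and tagged text into one prompt string."""
    if '"' in description:
        raise ValueError("description must not contain double quotes")
    # Collect every <tag> used in the text and reject unknown ones early.
    used = set(re.findall(r"<(\w+)>", text))
    unknown = used - ALLOWED_TAGS
    if unknown:
        raise ValueError(f"unsupported emotion tags: {sorted(unknown)}")
    return f'<description="{description}"> {text}'


prompt = build_prompt(
    "Female voice in her 30s, British accent, warm tone",
    "I did not expect that <laugh> but here we are.",
)
```

Validating tags before generation is a design choice of this sketch: a misspelled tag would otherwise likely be read aloud as literal text.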

Released under the Apache 2.0 license, Maya1 is an open-weights model that generates 24 kHz mono audio. It is compatible with inference frameworks such as vLLM and is optimized for deployment on a single GPU with 16 GB or more of VRAM. Although the model is primarily focused on English, it supports a range of regional accents and includes role variations for generating diverse characters.
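The 16 GB figure can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes weights stored at 2 bytes per parameter (for example bfloat16) and 16-bit PCM output; both are assumptions, since the page does not state the weight precision or output sample format. The estimate covers weights only and ignores the KV cache, activations, and the codec decoder, all of which need additional headroom.

```python
def weight_memory_gib(n_params: int, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights alone, in GiB."""
    return n_params * bytes_per_param / 2**30


def audio_seconds(n_samples: int, sample_rate: int = 24_000) -> float:
    """Duration of a mono clip with the given number of samples."""
    return n_samples / sample_rate


def pcm16_bytes(seconds: float, sample_rate: int = 24_000) -> int:
    """Size of mono 16-bit PCM audio of the given duration, in bytes."""
    return int(seconds * sample_rate) * 2


# 3B parameters at 2 bytes each: roughly 5.6 GiB of weights.
weights_gib = weight_memory_gib(3_000_000_000)
```

Under these assumptions, the weights take well under half of a 16 GB card, which is consistent with the stated single-GPU target, and one second of 24 kHz mono output is 48,000 bytes of 16-bit PCM.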

Rankings & Comparison