Stability AI

Stable Audio 2.0

Released Apr 2024

Stable Audio 2.0 is a music generation model developed by Stability AI for creating full-length audio tracks. Built on a Diffusion Transformer (DiT) architecture, it generates up to three minutes of stereo audio at 44.1 kHz from text prompts. It is optimized to produce music with a coherent structure, including an intro, development, and outro.

The model introduces audio-to-audio generation, which lets users upload existing audio samples and transform them with natural language prompts. Other key capabilities include style transfer, sound effect generation, and the creation of cinematic soundscapes. Stable Audio 2.0 was trained exclusively on a licensed dataset from AudioSparx comprising over 800,000 audio files; creator opt-out requests were honored to support copyright compliance.

Rankings & Comparison