Google

Standard

Released Mar 2018

Standard is the baseline tier of speech synthesis and recognition models offered through Google Cloud's audio services. In text-to-speech (TTS), Standard voices use traditional parametric synthesis to generate audio. They are positioned as a lower-cost alternative to Google's neural voices, such as WaveNet and Neural2, and offer broad language support and low latency for basic synthesis tasks.
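As a rough illustration of how a Standard voice might be selected, the sketch below builds the JSON body for a Cloud Text-to-Speech text:synthesize REST request. The helper name build_synthesis_request and the voice name en-US-Standard-A are assumptions chosen for illustration. No network call is made and authentication is omitted.

```python
import json

# Public REST endpoint for Cloud Text-to-Speech v1.
# Shown for context only; this sketch never sends a request.
TTS_ENDPOINT = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_synthesis_request(text, language_code="en-US", voice_name="en-US-Standard-A"):
    """Return a request body that asks for a Standard-tier voice.

    Hypothetical helper: the default voice name is an illustrative
    assumption, so check the published voice list before relying on it.
    """
    # Standard voices carry the tier in their name, which lets us
    # guard against accidentally requesting a neural voice.
    if "Standard" not in voice_name:
        raise ValueError("expected a Standard-tier voice name")
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code, "name": voice_name},
        "audioConfig": {"audioEncoding": "MP3"},
    }


body = build_synthesis_request("Hello, world.")
print(json.dumps(body, indent=2))
```

Swapping the voice name for a WaveNet or Neural2 voice would be the only change needed to move to a neural tier, which is why the guard on the name is included here.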

In speech-to-text (STT), Standard models are a suite of recognition engines that includes the default, command_and_search, and video models. The default model covers general transcription, while the others target specific use cases such as voice-activated commands and long-form video subtitling. Although they do not incorporate the sequence-to-sequence neural architectures found in newer models like Chirp or the Universal Speech Model (USM), they serve as the foundational technology for high-volume, general-purpose transcription.
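The model choice described above maps to a single field in a recognition request. The following minimal sketch builds the JSON body for a Cloud Speech-to-Text speech:recognize REST call and selects one of the models named in this section. The helper name build_recognition_request, the bucket URI, and the encoding and sample-rate values are assumptions for illustration. Nothing is sent over the network.

```python
import json

# Model identifiers named in this section.
STANDARD_STT_MODELS = ("default", "command_and_search", "video")


def build_recognition_request(gcs_uri, model="default", language_code="en-US"):
    """Return a request body that selects one of the listed models.

    Hypothetical helper: the encoding and sample rate below are assumed
    values for 16 kHz linear PCM audio and must match the real file.
    """
    if model not in STANDARD_STT_MODELS:
        raise ValueError(f"unsupported model for this sketch: {model!r}")
    return {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": 16000,
            "languageCode": language_code,
            "model": model,
        },
        "audio": {"uri": gcs_uri},
    }


# Example: a short voice command stored in a hypothetical bucket.
request = build_recognition_request(
    "gs://example-bucket/command.wav", model="command_and_search"
)
print(json.dumps(request, indent=2))
```

Restricting the model to a fixed tuple keeps the sketch aligned with the models listed here; a real integration would likely accept any identifier the service supports.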

These models are integrated across the Google ecosystem, where they power legacy voice interactions in services such as Google Translate and the basic tiers of Google Assistant. They are regarded as reliable and are updated frequently to improve phonetic accuracy across more than 100 supported languages and variants.

Rankings & Comparison