Alibaba
Open Weights

Qwen3.5 Omni Flash

Released Mar 2026

Intelligence: #150
Coding: #226
Context: 256K
Parameters: 35B

Qwen3.5 Omni Flash is a natively omnimodal language model developed by Alibaba, released on March 30, 2026. Designed for low-latency, high-throughput applications, the Flash variant is the mid-tier model in the Qwen3.5 Omni series, balancing reasoning performance with inference speed. It processes text, image, audio, and video inputs in a single forward pass and generates both text and real-time streaming speech as output.

The model utilizes a Thinker-Talker architecture built on a Hybrid-Attention Mixture-of-Experts (MoE) design. The "Thinker" component manages multimodal input reasoning and understanding, while the "Talker" component handles the synthesis of contextual speech tokens. This architecture allows the model to maintain high-quality single-modal performance while enabling emergent capabilities such as Audio-Visual Vibe Coding, where the model generates code based on spoken instructions paired with visual references like screen recordings.

Key features of Qwen3.5 Omni Flash include a 262,144-token context window, which supports the analysis of up to 10 hours of continuous audio or approximately 400 seconds of 720p video. It incorporates Adaptive Rate Interleave Alignment (ARIA) technology to improve the naturalness and stability of speech synthesis, reducing common issues like misread digits or skipped words. The model also supports semantic interruption, allowing it to distinguish between user interjections and background noise during real-time voice conversations.
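As a rough illustration of what these limits imply, the short Python sketch below divides the stated context window by the stated media durations. It assumes the entire window is filled by a single media stream, with no tokens reserved for the prompt or the response, so the results are upper bounds derived from the figures above rather than published tokenization rates. The function name `tokens_per_second` is made up for this example.

```python
# Figures taken from the description above.
CONTEXT_TOKENS = 262_144        # stated context window
AUDIO_SECONDS = 10 * 60 * 60    # "up to 10 hours of continuous audio"
VIDEO_SECONDS = 400             # "approximately 400 seconds of 720p video"


def tokens_per_second(context_tokens: int, media_seconds: int) -> float:
    """Upper bound on tokens per second of media.

    Assumption: the whole context window holds one media stream and
    nothing else (no prompt text, no generated output).
    """
    return context_tokens / media_seconds


audio_rate = tokens_per_second(CONTEXT_TOKENS, AUDIO_SECONDS)
video_rate = tokens_per_second(CONTEXT_TOKENS, VIDEO_SECONDS)

print(f"256K = {256 * 1024} tokens")                   # 256K = 262144 tokens
print(f"audio: at most {audio_rate:.2f} tokens/s")     # audio: at most 7.28 tokens/s
print(f"video: at most {video_rate:.2f} tokens/s")     # video: at most 655.36 tokens/s
```

On these assumptions, a second of 720p video costs roughly 90 times as many tokens as a second of audio, which is why the video limit is measured in seconds and the audio limit in hours.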

The model supports speech recognition across 113 languages and dialects and speech generation in 36 languages. It also includes native voice cloning via API, which lets users replicate a specific voice identity from a short audio sample. The model was trained on a dataset that includes more than 100 million hours of native audio-visual data, which is intended to make its perception robust across diverse environmental conditions.

Rankings & Comparison