Wan 2.6 is a multimodal video generation model developed by Alibaba’s Tongyi Lab for cinematic-quality content production. As the latest release in the Wan-Video series, it shifts the series' focus from short-clip generation toward narrative storytelling. The model supports text-to-video (T2V), image-to-video (I2V), and reference-to-video (R2V) workflows and produces high-definition output at resolutions up to 1080p.
A central feature of Wan 2.6 is its multi-shot narrative system, which lets the model plan and execute complex sequences within a single generation, with coherent camera transitions and continuity between scenes. Generated clips can run up to 15 seconds, longer than in earlier releases, which leaves more room for story arcs and character actions.
The model introduces native audio-visual synchronization: in a single pass it generates dialogue with accurate lip-sync, along with sound effects and ambient background audio. To keep characters consistent, Wan 2.6 uses an Identity Lock mechanism that lets creators supply subject references, so that a character's appearance and voice remain stable across shots and camera angles. The model is available in multiple sizes, including a flagship 14B-parameter version aimed at professional-quality output and a 5B variant optimized for faster processing.