Wan 2.2 5B is an open-source video generation model developed by Alibaba's Tongyi Wanxiang Lab. The Wan 2.2 series is a major upgrade that introduces a Mixture-of-Experts (MoE) architecture to video diffusion models, a first for the category, though the MoE design appears in the series' larger A14B variants. The 5-billion-parameter model described here is a dense, hybrid Text-Image-to-Video (TI2V) variant optimized for high-definition output and computational efficiency.
The model generates 720p video at 24 frames per second. Compared to Wan 2.1, it offers significantly improved motion consistency and semantic understanding, having been trained on a dataset with over 65% more images and 83% more videos. These enhancements allow the model to handle complex physical simulations and nuanced body movements with greater realism.
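Part of what makes 720p/24fps output feasible at 5B parameters is that the diffusion backbone operates on a heavily compressed latent grid rather than raw pixels. The sketch below is back-of-envelope arithmetic only, assuming a video VAE with 4x temporal and 16x spatial downsampling (figures reported for the Wan 2.2 series) and the common convention of keeping the first frame uncompressed; the exact scheme may differ.

```python
def latent_shape(num_frames: int, height: int, width: int,
                 t_down: int = 4, s_down: int = 16) -> tuple[int, int, int]:
    """Estimate the latent grid a video VAE produces for a pixel-space clip.

    Assumes causal temporal compression: the first frame is kept, and each
    subsequent group of `t_down` frames maps to one latent frame, giving
    (F - 1) // t_down + 1 latent frames. Spatial dims shrink by `s_down`.
    """
    t = (num_frames - 1) // t_down + 1
    return (t, height // s_down, width // s_down)

# A 5-second clip at 24 fps (121 frames) at a 704x1280 "720p-class" resolution
# compresses to a far smaller grid for the diffusion model to denoise:
print(latent_shape(121, 704, 1280))  # -> (31, 44, 80)
```

Under these assumptions, the model denoises a 31x44x80 latent volume instead of 121 frames of 704x1280 pixels, roughly a 1000x reduction in spatial-temporal elements, which is what keeps high-definition generation tractable.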
A primary focus of the Wan 2.2 series is cinematic aesthetics. The model integrates professional cinematography principles, including multi-dimensional control over lighting, color grading, and composition. Through detailed prompt instructions, creators can generate stylistically diverse content with precise aesthetic control.
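In practice, "multi-dimensional control" means spelling out each aesthetic axis explicitly in the prompt. The helper below is purely illustrative, not an official prompt schema; the field names and phrasing are assumptions, but they show the kind of structured, cinematography-aware prompt the model is described as responding to.

```python
def cinematic_prompt(subject: str, lighting: str,
                     color_grade: str, composition: str) -> str:
    """Assemble a prompt that names each aesthetic dimension separately.

    Hypothetical convention: subject first, then one clause per
    cinematography axis (lighting, color grading, composition).
    """
    return (f"{subject}. {lighting} lighting, "
            f"{color_grade} color grading, {composition} composition.")

prompt = cinematic_prompt(
    subject="A lone sailboat crossing a calm sea at dusk",
    lighting="low-key golden-hour",
    color_grade="teal-and-orange",
    composition="rule-of-thirds wide-angle",
)
print(prompt)
```

Keeping each axis as a separate, named clause makes it easy to vary one dimension (say, swapping the color grade) while holding the others fixed across a batch of generations.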
Wan 2.2 5B is designed for accessibility, capable of running on consumer-grade hardware such as a single NVIDIA RTX 4090. It natively supports text-to-video, image-to-video, and hybrid multi-modal inputs within a single unified framework, facilitating high-speed generation for both researchers and creative professionals.
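A minimal end-to-end sketch of running the model on a single GPU, assuming the Hugging Face Diffusers integration (`WanPipeline`) and the `Wan-AI/Wan2.2-TI2V-5B-Diffusers` checkpoint id; the exact repository name, default resolution, and guidance value are assumptions, so check the model card before use.

```python
def frame_count(seconds: int, fps: int = 24) -> int:
    # Wan-style pipelines expect a frame count of the form 4k + 1;
    # fps * seconds + 1 satisfies this whenever fps * seconds % 4 == 0.
    return seconds * fps + 1

def generate(prompt: str, seconds: int = 5, out_path: str = "output.mp4"):
    # Heavy imports are kept local so the sketch can be read (and the
    # helper above tested) without a GPU environment installed.
    import torch
    from diffusers import WanPipeline
    from diffusers.utils import export_to_video

    pipe = WanPipeline.from_pretrained(
        "Wan-AI/Wan2.2-TI2V-5B-Diffusers",  # assumed checkpoint id
        torch_dtype=torch.bfloat16,
    )
    pipe.to("cuda")  # fits on a single consumer GPU such as an RTX 4090

    result = pipe(
        prompt=prompt,
        height=704, width=1280,           # assumed 720p-class native resolution
        num_frames=frame_count(seconds),  # 121 frames for a 5 s clip at 24 fps
        guidance_scale=5.0,
    )
    export_to_video(result.frames[0], out_path, fps=24)

if __name__ == "__main__":
    generate("A golden retriever running through a sunlit meadow, "
             "soft backlighting, warm color grading")
```

The same pipeline object also accepts an input image for image-to-video use via the corresponding image-to-video pipeline class, which is what makes the hybrid TI2V framing practical: one checkpoint, both modalities.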