Vidu Q2 Pro is a high-fidelity video generation model developed by Shengshu Technology in collaboration with Tsinghua University. As the flagship, cinematic-tier version of the Vidu Q2 family, it targets professional production workflows that prioritize visual detail and temporal stability. The model is built on U-ViT, a transformer-based diffusion architecture, and supports several generation modes: text-to-video, image-to-video, and multi-reference generation.
The model generates video at resolutions up to 1080p, with durations from 2 to 8 seconds. Its "Pro" mode improves motion coherence and physical plausibility, reducing the "rubbery" or distorted movement common in AI-generated video. It also gives creators fine-grained controls, such as a movement amplitude setting and the option to fix specific start and end frames so that transitions land where intended.
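The controls described above can be pictured as a single request object that is checked before submission. The following is a minimal sketch, not Vidu's actual API: the class `GenerationRequest`, its field names, the amplitude values `auto`/`small`/`medium`/`large`, the rule that an end frame needs a start frame, and the mode-selection order are all hypothetical. Only the 2 to 8 second duration range and the 1080p ceiling come from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Limits stated in the text: 2-8 s duration, output up to 1080p.
MIN_DURATION_S, MAX_DURATION_S = 2, 8
MAX_HEIGHT_PX = 1080
# Hypothetical amplitude levels; the real service may use different values.
ALLOWED_AMPLITUDES = {"auto", "small", "medium", "large"}


@dataclass
class GenerationRequest:
    """Hypothetical request shape for one generation job."""
    prompt: str
    duration_s: int = 4
    height_px: int = 1080
    movement_amplitude: str = "auto"
    start_frame: Optional[str] = None  # path/URL of a fixed first frame (assumed field)
    end_frame: Optional[str] = None    # path/URL of a fixed last frame (assumed field)
    references: List[str] = field(default_factory=list)  # reference images/videos

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the request looks valid."""
        errors = []
        if not MIN_DURATION_S <= self.duration_s <= MAX_DURATION_S:
            errors.append(f"duration_s must be {MIN_DURATION_S}-{MAX_DURATION_S}")
        if self.height_px > MAX_HEIGHT_PX:
            errors.append(f"height_px must be <= {MAX_HEIGHT_PX}")
        if self.movement_amplitude not in ALLOWED_AMPLITUDES:
            errors.append("unknown movement_amplitude")
        # Assumed rule: an end frame only makes sense alongside a start frame.
        if self.end_frame is not None and self.start_frame is None:
            errors.append("end_frame requires start_frame")
        return errors

    def mode(self) -> str:
        """Pick a generation mode from the inputs (assumed precedence order)."""
        if self.references:
            return "reference-to-video"
        if self.start_frame and self.end_frame:
            return "start-end-to-video"
        if self.start_frame:
            return "image-to-video"
        return "text-to-video"
```

For example, `GenerationRequest(prompt="a slow dolly-in", start_frame="a.png", end_frame="b.png").mode()` selects the start/end-frame mode. Collecting all validation errors at once, rather than raising on the first one, lets a client report every bad setting in a single pass.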
Notable capabilities of Vidu Q2 Pro include "micro-acting," which realistically renders subtle human expressions such as natural eye shifts and blinks. The model also handles complex cinematic camera movements, including pans, zooms, and tracking shots, without warping the scene layout. Its universal reference system accepts multiple reference images and videos and uses them to keep characters, styles, and textures consistent across shots.