HunyuanVideo-1.5 is a lightweight video generation foundation model developed by Tencent's Hunyuan team. It is designed to deliver high-quality video synthesis at lower computational cost: with 8.3 billion parameters, it can run on consumer-grade GPUs with as little as 14GB of VRAM. Relative to its predecessors, it is optimized to balance visual fidelity against inference efficiency.
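To put the parameter count and the VRAM figure side by side, here is a rough back-of-the-envelope sketch of the memory needed for the weights alone at common numeric precisions. The helper name `weight_memory_gib` and the list of precisions are illustrative assumptions, not part of any official tooling, and the estimate ignores activations, the VAE, and the text encoder.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed to hold the weights alone, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


PARAMS = 8.3e9  # parameter count stated in the description above

# Assumed precisions for illustration; the source does not say which one is used.
for name, nbytes in [("fp32", 4), ("bf16/fp16", 2), ("int8", 1)]:
    print(f"{name:>9}: {weight_memory_gib(PARAMS, nbytes):.1f} GiB")
```

At 16-bit precision the weights alone come to roughly 15.5 GiB, which is above 14GB. This suggests the 14GB figure relies on something like CPU offloading or reduced precision, although the description does not say which.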

The model's architecture is built on a Diffusion Transformer (DiT) integrated with a 3D Causal VAE. A defining feature is the Selective and Sliding Tile Attention (SSTA) mechanism, which reduces computational overhead by pruning redundant spatiotemporal blocks so that attention is computed only over the blocks that are kept. With this mechanism the model generates 5–10 second clips, and it is described as achieving better motion coherence and prompt adherence than earlier sparse attention implementations.
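The general idea of pruning attention blocks can be sketched with a toy block mask. The following is a minimal illustration, not the SSTA implementation: it assumes tiles laid out along a single axis (real video tiles are spatiotemporal), a hypothetical selection rule that keeps a sliding window of neighbouring tiles plus the `top_k` highest-scoring distant tiles, and made-up pooled features. The names `block_mask` and `kept_fraction` are invented for this example.

```python
from typing import List


def block_mask(tile_q: List[List[float]], tile_k: List[List[float]],
               window: int, top_k: int) -> List[List[bool]]:
    """Return mask[i][j] = True if query tile i attends to key tile j.

    Hypothetical rule: always keep tiles within `window` of i (sliding part),
    then add the `top_k` highest-scoring remaining tiles (selective part).
    """
    n = len(tile_q)
    mask = [[abs(i - j) <= window for j in range(n)] for i in range(n)]
    for i in range(n):
        # Score each non-local key tile by the dot product of pooled features.
        scores = [
            (sum(a * b for a, b in zip(tile_q[i], tile_k[j])), j)
            for j in range(n) if not mask[i][j]
        ]
        for _, j in sorted(scores, reverse=True)[:top_k]:
            mask[i][j] = True
    return mask


def kept_fraction(mask: List[List[bool]]) -> float:
    """Fraction of tile pairs that still need attention computed."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total


# Toy pooled features: the first four tiles look alike, as do the last four.
feats = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4
mask = block_mask(feats, feats, window=1, top_k=1)
print(f"tile pairs kept: {kept_fraction(mask):.2%}")
```

In this toy setup fewer than half of the tile pairs are kept, so a block-sparse kernel would skip the rest. How SSTA actually scores and selects blocks is not specified in the description above.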

HunyuanVideo-1.5 supports both text-to-video and image-to-video workflows. Base generation runs at 480p or 720p, and an integrated video super-resolution (VSR) network upscales the output to 1080p. The VSR module is specifically trained to stabilize motion and refine cinematic textures, such as film grain and lighting, across styles including photorealism, anime, and 3D animation.
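As a quick piece of arithmetic on the resolutions mentioned above, the sketch below computes how much the super-resolution stage has to enlarge each base resolution. It assumes that "480p", "720p", and "1080p" refer to frame height and that the aspect ratio is unchanged; the function name `upscale_factors` is illustrative only.

```python
TARGET_HEIGHT = 1080  # output height after super-resolution


def upscale_factors(base_height: int, target_height: int = TARGET_HEIGHT):
    """Return (per-side scale, pixel-count scale) for a fixed aspect ratio."""
    linear = target_height / base_height
    return linear, linear ** 2


for h in (480, 720):
    linear, pixels = upscale_factors(h)
    print(f"{h}p -> 1080p: {linear:.2f}x per side, {pixels:.2f}x pixels")
```

Under these assumptions, going from 480p to 1080p means synthesizing about five times as many pixels per frame, compared with 2.25 times from 720p, so the base resolution determines how much detail the upscaler must supply.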

Rankings & Comparison