Kandinsky
Open Weights

kandinsky-5.0-t2v-lite

Released Sep 2025

Kandinsky 5.0 T2V Lite is a lightweight open-source text-to-video model with 2 billion parameters. Part of the Kandinsky 5.0 family, it is optimized for efficiency so that video generation can run on consumer-grade hardware. The model accepts prompts in both English and Russian and is noted for strong semantic alignment on complex prompts.

The model is a latent diffusion pipeline trained with Flow Matching and built on a Diffusion Transformer (DiT) backbone. It uses NABLA (Neighborhood Adaptive Block-Level Attention) to keep the attention cost manageable when generating videos up to 10 seconds long. Text is encoded with a combination of Qwen2.5-VL and CLIP to improve adherence to the prompt.
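To make the Flow Matching idea concrete, here is a minimal sketch in Python with NumPy. It is not the Kandinsky implementation: the straight-line path with noise at t=0 and data at t=1, the helper names, and the toy 4x8 "latents" are all assumptions chosen for illustration. It shows the regression target a velocity network would be trained on and how a sampler integrates that velocity from noise to a latent.

```python
import numpy as np


def interpolate(x0, x1, t):
    """Point on the straight-line path from noise x0 (t=0) to data x1 (t=1)."""
    return (1.0 - t) * x0 + t * x1


def target_velocity(x0, x1):
    """Velocity of the straight-line path; this is the regression target."""
    return x1 - x0


def flow_matching_loss(predict_velocity, x0, x1, t):
    """Mean squared error between predicted and target velocity at time t."""
    x_t = interpolate(x0, x1, t)
    error = predict_velocity(x_t, t) - target_velocity(x0, x1)
    return float(np.mean(error ** 2))


def euler_sample(predict_velocity, x0, num_steps):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-size Euler steps."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        x = x + dt * predict_velocity(x, t)
    return x


# Toy stand-ins (assumptions): random arrays in place of real video latents.
rng = np.random.default_rng(0)
noise = rng.standard_normal((4, 8))
latent = rng.standard_normal((4, 8))


def oracle(x_t, t):
    """Hypothetical perfect velocity model for this single noise/latent pair."""
    return latent - noise


loss = flow_matching_loss(oracle, noise, latent, t=0.3)
sample = euler_sample(oracle, noise, num_steps=10)
```

In a real model the `oracle` function would be replaced by the DiT, conditioned on the text embeddings, and the number of Euler steps would trade quality against inference time.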

Kandinsky 5.0 T2V Lite is released in several versions, including supervised fine-tuned (SFT) checkpoints aimed at aesthetic quality and distilled checkpoints that significantly reduce inference time. The model supports multiple resolutions and can also be adapted for image-to-video tasks while maintaining temporal consistency across generated frames.

Rankings & Comparison