LTX Video v0.9.7 13B is a latent video generation model developed by Lightricks and built on a Diffusion Transformer (DiT) architecture. This version is a significant scale-up within the LTX model family: it uses 13 billion parameters to improve visual fidelity, motion realism, and adherence to complex prompts. The design goal is professional-quality video synthesis at generation speeds fast enough for interactive creative work.
Technical Architecture
The model uses a specialized Video-VAE with a high pixel-to-latent compression ratio (1:192), so the transformer processes high-resolution video as a much smaller latent representation. The v0.9.7 release is notable for its distilled training approach, which lets the model generate high-definition video (up to 1216×704 resolution) in approximately 10 seconds on professional hardware. It also supports FP8 quantization, which lets the 13B model run on consumer-grade GPUs with 24GB of VRAM.
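The compression ratio and the FP8 memory claim can both be checked with simple arithmetic. The sketch below is illustrative only and does not use any model code. It assumes details that this description does not state: that each latent token covers a 32×32 pixel patch over 8 frames, that the latent has 128 channels, and that the first frame is encoded on its own. The function names are hypothetical.

```python
# Back-of-envelope arithmetic only; no model code is involved.
# Assumed (not stated in the text above): each latent token covers
# 32 x 32 pixels over 8 frames, with 128 latent channels and RGB input.
SPATIAL_STRIDE = 32
TEMPORAL_STRIDE = 8
LATENT_CHANNELS = 128
RGB_CHANNELS = 3


def compression_ratio() -> float:
    """Number of input values represented by one latent value."""
    values_in = SPATIAL_STRIDE * SPATIAL_STRIDE * TEMPORAL_STRIDE * RGB_CHANNELS
    return values_in / LATENT_CHANNELS


def latent_grid(width: int, height: int, frames: int) -> tuple:
    """Latent (frames, height, width) for a clip.

    Assumes the first frame maps to its own latent frame and every
    following group of 8 frames maps to one more.
    """
    if width % SPATIAL_STRIDE or height % SPATIAL_STRIDE:
        raise ValueError("width and height must be multiples of 32")
    if (frames - 1) % TEMPORAL_STRIDE:
        raise ValueError("frame count must have the form 8*k + 1")
    return (
        (frames - 1) // TEMPORAL_STRIDE + 1,
        height // SPATIAL_STRIDE,
        width // SPATIAL_STRIDE,
    )


def weight_gib(params: float, bytes_per_param: int) -> float:
    """Memory for the weights alone, in GiB (activations not included)."""
    return params * bytes_per_param / 2**30


if __name__ == "__main__":
    print(compression_ratio())             # nominal pixel-to-latent ratio
    print(latent_grid(1216, 704, 121))     # latent grid at the top resolution
    print(round(weight_gib(13e9, 2), 1))   # 16-bit weights
    print(round(weight_gib(13e9, 1), 1))   # FP8 weights
```

Under these assumptions, 16-bit weights for 13 billion parameters already exceed a 24GB card before any activations are counted, while FP8 weights take roughly half that. This is consistent with the FP8 claim above, though actual memory use also depends on the text encoder, the VAE, and the runtime.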
Key Capabilities
LTX Video supports text-to-video, image-to-video, and multi-keyframe conditioning, which gives precise control over scene transitions. It produces 30 FPS video sequences with strong temporal consistency and realistic physical interactions. As an open-weight model, it is compatible with community tools such as ComfyUI and Diffusers, and it can be fine-tuned with LoRA and IC-LoRA for localized motion and style control.
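When planning a clip at 30 FPS, it helps to convert a target duration into a frame count the model can accept. The sketch below assumes that valid frame counts have the form 8k + 1, which is not stated in this description. The helper names are hypothetical and are not part of ComfyUI or Diffusers.

```python
FPS = 30             # frame rate quoted in the text above
TEMPORAL_STRIDE = 8  # assumption: valid frame counts have the form 8*k + 1


def snap_frame_count(seconds: float, fps: int = FPS) -> int:
    """Nearest frame count of the form 8*k + 1 (k >= 1) for a duration."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    target = seconds * fps
    k = max(1, round((target - 1) / TEMPORAL_STRIDE))
    return TEMPORAL_STRIDE * k + 1


def clip_seconds(num_frames: int, fps: int = FPS) -> float:
    """Playback duration of a clip with the given number of frames."""
    return num_frames / fps


if __name__ == "__main__":
    frames = snap_frame_count(4.0)
    print(frames, clip_seconds(frames))
```

The snapped count can differ from the requested duration by a few frames, so a caller that needs an exact length would trim the output after generation.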