Lightricks
Open Weights

LTX-2.3 Fast

Released Mar 2026

LTX-2.3 Fast is a speed-optimized variant of LTX-2.3, the multimodal video generation engine developed by Lightricks. It is a 22B-parameter Diffusion Transformer (DiT) that generates synchronized high-resolution video and audio within a single model. The "Fast" version uses a distilled architecture to reduce inference latency and compute cost, making it suitable for rapid prototyping and real-time creative workflows.

Building on its predecessors, LTX-2.3 introduces a redesigned Variational Autoencoder (VAE) that improves visual fidelity, particularly in textures, fine details, and edge clarity. A key feature of this release is native support for portrait (9:16) aspect ratios alongside standard landscape formats, so vertical content can be generated directly rather than cropped from landscape output in post-processing. The model also adds a gated attention text connector, which improves adherence to complex prompts about motion, timing, and character expression.

Technical Capabilities

LTX-2.3 Fast supports text-to-video, image-to-video, and audio-to-video generation. It can generate clips up to 20 seconds long at either 24 or 48 FPS. Its native audio generation produces synchronized ambient sound and dialogue, with cleaner output and less noise than earlier versions. The model is released under the Apache 2.0 license and supports local execution and LoRA fine-tuning for custom characters or styles.

Prompting and Implementation

For best results, follow two sizing constraints: the video width and height should each be divisible by 32, and the frame count should typically take the form 8n + 1 for an integer n (e.g., 65 = 8·8 + 1 or 121 = 8·15 + 1). While the model excels at cinematic motion and high-detail textures, prompt adherence depends heavily on how the prompt is written; descriptive, motion-focused prompts generally yield the most temporally coherent results.
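The sizing constraints above can be checked before a generation request is submitted. The following is a minimal sketch in Python, not part of any Lightricks API: the function names (`snap_dimension`, `snap_frames`, `frames_for_duration`) are hypothetical, and rounding to the nearest valid value is an assumption, since the guidance only states which values are valid, not how to reach them.

```python
def snap_dimension(value: int, multiple: int = 32) -> int:
    """Round a width or height to the nearest positive multiple of `multiple`.

    Assumption: nearest-value rounding (ties round up); the model card only
    says dimensions should be divisible by 32.
    """
    if value <= 0:
        raise ValueError("dimension must be positive")
    snapped = (value + multiple // 2) // multiple * multiple
    return max(multiple, snapped)


def snap_frames(frames: int) -> int:
    """Round a frame count to the nearest value of the form 8n + 1, n >= 1."""
    if frames <= 0:
        raise ValueError("frame count must be positive")
    n = max(1, (frames - 1 + 4) // 8)  # nearest n, ties round up
    return 8 * n + 1


def frames_for_duration(seconds: float, fps: int) -> int:
    """Convert a clip duration to a frame count of the form 8n + 1."""
    return snap_frames(round(seconds * fps))


if __name__ == "__main__":
    # A 1920x1080 request: the width is already valid, the height is not.
    print(snap_dimension(1920), snap_dimension(1080))
    # A 5-second clip at 24 FPS is 120 raw frames, which is not 8n + 1.
    print(frames_for_duration(5, 24))
```

Snapping to the nearest value changes the aspect ratio or duration slightly; a caller that must never exceed a requested size could round down instead.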

Rankings & Comparison