LTX-2.3 Pro is a high-fidelity video generation model developed by Lightricks, serving as the production-quality variant within the LTX-2.3 multimodal ecosystem. Built on a Diffusion Transformer (DiT) architecture with 22 billion parameters, it is designed to generate synchronized audio and video in a single forward pass. The model is capable of producing cinematic-grade content at resolutions up to 4K at 50 frames per second (FPS) with single-clip durations of up to 20 seconds.
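To put the headline figures in perspective, the following is a rough back-of-the-envelope sketch, not an official sizing guide. It assumes that "4K" means UHD 3840x2160 and that weights are stored at 2 bytes per parameter (for example bf16); neither assumption is stated in the description above, and the function names are illustrative only.

```python
# Back-of-the-envelope figures derived from the stated specifications.
# Assumptions (not from the source): "4K" means UHD 3840x2160, and weights
# are stored at 2 bytes per parameter (e.g. bf16 or fp16).

def frame_count(duration_s: float, fps: int) -> int:
    """Number of frames in a clip of the given duration."""
    return round(duration_s * fps)


def weight_memory_gb(num_params: int, bytes_per_param: int = 2) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


def raw_rgb_bytes(width: int, height: int, frames: int) -> int:
    """Size of uncompressed 8-bit RGB video (3 bytes per pixel)."""
    return width * height * 3 * frames


frames = frame_count(20, 50)                      # longest clip at the top frame rate
print(frames)                                     # 1000
print(weight_memory_gb(22_000_000_000))           # 44.0
print(raw_rgb_bytes(3840, 2160, frames))          # 24883200000 bytes, about 24.9 GB
```

Under these assumptions, a maximum-length clip is 1000 frames, and the weights alone occupy about 44 GB before activations or any quantization are considered.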
Technical Specifications
This version introduces a rebuilt Variational Autoencoder (VAE) and a redesigned latent space that better preserve fine textures such as skin pores, fabric weaves, and environmental reflections. A gated attention text connector improves prompt adherence and compositional accuracy over previous versions. For audio output, the model integrates an upgraded vocoder that reduces artifacts and improves the clarity of synchronized dialogue and ambient sound.
Key Capabilities
LTX-2.3 Pro natively supports portrait (9:16) and landscape (16:9) aspect ratios, so content for mobile platforms needs no post-generation cropping. Beyond text-to-video and image-to-video, it offers an audio-to-video mode in which the input audio drives the motion and pacing of the generated visuals. The Retake function regenerates specific segments of a video, and the Extend function lengthens a clip while maintaining visual and stylistic consistency.
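The aspect ratios and segment-based editing described above come down to simple arithmetic on pixels and frames. The sketch below is illustrative only: the function names, the alignment of dimensions to a multiple of 8, and the half-open frame range are assumptions made for this example, not part of any documented LTX-2.3 Pro interface.

```python
# Hypothetical helpers; names and the multiple-of-8 alignment are assumptions.

def dimensions(aspect_w: int, aspect_h: int, long_edge: int, multiple: int = 8):
    """Return (width, height) for an aspect ratio, snapped down to `multiple`."""
    if aspect_w >= aspect_h:
        # Landscape or square: the long edge is the width.
        width = long_edge
        height = long_edge * aspect_h // aspect_w
    else:
        # Portrait: the long edge is the height.
        height = long_edge
        width = long_edge * aspect_w // aspect_h
    return (width // multiple * multiple, height // multiple * multiple)


def segment_frames(start_s: float, end_s: float, fps: int):
    """Map a time window in seconds to a half-open frame range [first, last)."""
    if not 0 <= start_s < end_s:
        raise ValueError("expected 0 <= start_s < end_s")
    return (round(start_s * fps), round(end_s * fps))


print(dimensions(16, 9, 1920))       # (1920, 1080)  landscape
print(dimensions(9, 16, 1920))       # (1080, 1920)  portrait
print(segment_frames(4.0, 6.5, 50))  # (200, 325)    frames a retake would cover
```

A Retake request for seconds 4.0 to 6.5 of a 50 FPS clip would, under these assumptions, touch 125 frames, while the portrait and landscape cases differ only in which edge receives the long dimension.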
Control and Fine-tuning
The model provides granular creative control through camera motion parameters, including dolly, jib, and focus shifts, as well as last-frame interpolation for creating smooth transitions between two reference images. As part of Lightricks' commitment to open science, the model is released under an Apache 2.0 license, which permits community-driven LoRA fine-tuning for specific artistic styles, characters, or motion patterns.
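For readers unfamiliar with LoRA, the general technique freezes a weight matrix W and learns a low-rank update, so the adapted weight is W + (alpha / r) * B * A, where B has shape (d_out, r) and A has shape (r, d_in). The pure-Python sketch below illustrates that generic formulation only; it is not Lightricks' training code, and the 4096-wide layer and rank 16 used in the example are hypothetical values chosen for illustration.

```python
# Generic LoRA illustration in pure Python (no framework required).
# Layer sizes and rank below are hypothetical, not taken from LTX-2.3 Pro.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def apply_lora(weight, lora_b, lora_a, alpha):
    """Return weight + (alpha / rank) * (lora_b @ lora_a)."""
    rank = len(lora_a)               # lora_a has shape (rank, d_in)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)   # shape (d_out, d_in), same as weight
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters in one LoRA adapter (A and B together)."""
    return rank * (d_in + d_out)


# Tiny worked example: a 2x2 frozen weight with a rank-1 update.
adapted = apply_lora([[0, 0], [0, 0]], [[1], [2]], [[3, 4]], alpha=2)
print(adapted)                               # [[6.0, 8.0], [12.0, 16.0]]

# Parameter savings for a hypothetical 4096x4096 layer at rank 16.
print(4096 * 4096)                           # 16777216 in the full matrix
print(lora_param_count(4096, 4096, 16))      # 131072 in the adapter
```

In this hypothetical case the adapter trains under 1% of the parameters of the full matrix, which is why LoRA is a practical route for community fine-tuning of a large model.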