Krea Realtime (specifically the 14B video model) is an open-source video generation model designed for high-speed, interactive creative workflows. Developed by Krea, it utilizes a 14-billion parameter autoregressive architecture that enables users to generate and manipulate video content in near real-time. The model supports various modalities, including text-to-video, image-to-video, and live video-to-video transformations for stylizing webcam or screen feeds.
The model is distilled from the Wan 2.1 14B base model using a technique called Self-Forcing, which converts standard bidirectional diffusion models into autoregressive ones, allowing video frames to be streamed as they are generated. This architecture sharply reduces latency, achieving a time to first frame of roughly one second and sustaining inference speeds of up to 11 frames per second on high-end hardware.
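The streaming behavior described above can be sketched as a simple generation loop: each frame is denoised in a few steps while conditioning on a KV cache of previously generated frames, and is yielded to the client as soon as it is ready. This is an illustrative sketch only; the names `KVCache`, `denoise_frame`, and `stream_video` are hypothetical stand-ins, not Krea's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class KVCache:
    # Placeholder for cached keys/values from previously generated frames.
    entries: list = field(default_factory=list)

def denoise_frame(prompt, cache, step_count=4):
    # Stand-in for a few-step distilled denoiser; a real model would run
    # the transformer for `step_count` diffusion steps, attending over the cache.
    frame = f"frame_{len(cache.entries)}[{prompt}]"
    cache.entries.append(frame)  # extend the cache causally with the new frame
    return frame

def stream_video(prompt, num_frames):
    """Yield frames one at a time, autoregressively conditioning on the cache."""
    cache = KVCache()
    for _ in range(num_frames):
        yield denoise_frame(prompt, cache)

# Frames become available incrementally, which is what keeps time to
# first frame low compared with generating the whole clip bidirectionally.
frames = list(stream_video("a sunset over the ocean", 3))
```

Because the loop yields each frame immediately, a prompt change between iterations takes effect on the very next frame, which is what makes real-time steering possible.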
To ensure stability and temporal consistency, Krea Realtime implements specialized memory and sampling optimizations. Key technical features include KV Cache Recomputation and KV Cache Attention Bias, which mitigate the exposure bias and error accumulation common in autoregressive video generation. These innovations allow the model to maintain visual coherence over long-form generations while remaining responsive to real-time prompt adjustments and style changes.
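One plausible form of such a cache-side mitigation can be sketched as follows: an age-dependent bias is subtracted from the attention logits so that older (and potentially error-contaminated) cache entries contribute less, and the cache is periodically rebuilt by re-encoding clean decoded frames. The linear age penalty, the `bias_per_step` parameter, and the `encode` callback are all hypothetical illustrations, not the model's actual formulation.

```python
import numpy as np

def biased_attention(q, k, v, bias_per_step=0.5):
    # q: (d,) query; k, v: (t, d) cached keys/values, oldest first.
    # Older cache entries receive a larger negative bias, down-weighting
    # their influence so accumulated generation errors fade out over time.
    t, d = k.shape
    logits = k @ q / np.sqrt(d)
    age = np.arange(t - 1, -1, -1)         # oldest entry has the largest age
    logits = logits - bias_per_step * age  # hypothetical linear age penalty
    w = np.exp(logits - logits.max())      # numerically stable softmax
    w /= w.sum()
    return w @ v

def recompute_cache(encode, frames):
    # Sketch of cache recomputation: periodically re-encode clean decoded
    # frames so stale cache entries do not compound errors (`encode` is a
    # hypothetical frame-to-features callback).
    return np.stack([encode(f) for f in frames])

out = biased_attention(
    np.ones(4),
    np.ones((3, 4)),
    np.arange(12, dtype=float).reshape(3, 4),
)
```

In this toy example the three cached values are equally similar to the query, so without the bias they would be weighted uniformly; the age penalty shifts the weighting toward the most recent entry, which is the intended stabilizing effect.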