Sora is a text-to-video generative AI model developed by OpenAI, designed to simulate the physical world in motion. It can generate videos up to one minute long while maintaining visual quality and adherence to the user's prompt. The model can create complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background, which OpenAI presents as evidence that it has learned how objects exist and interact in the physical world.

Architecture and Training

Sora uses a diffusion transformer (DiT) architecture, a departure from the U-Net backbones used in most earlier diffusion models. It represents video and image data as collections of smaller units called "patches," which are analogous to tokens in large language models. Because any video can be cut into spacetime patches, the model can be trained on visual data of varying durations, resolutions, and aspect ratios without first cropping or resizing everything to a single format. Training also relies on the recaptioning technique originally developed for DALL·E 3, in which a captioning model writes detailed descriptions for the training videos; this helps the model follow complex textual instructions more faithfully.
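The patch representation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not OpenAI's implementation: the function name patchify, the patch sizes (2 frames by 16 by 16 pixels), and the toy video shapes are all assumptions chosen for the example. OpenAI's technical report also describes compressing video into a lower-dimensional latent space before patches are extracted; the sketch skips that step and cuts raw pixel arrays directly.

```python
import numpy as np


def patchify(video, pt=2, ph=16, pw=16):
    """Split a video of shape (T, H, W, C) into flattened spacetime patches.

    pt, ph, pw are hypothetical patch sizes along time, height, and width.
    Returns an array of shape (num_patches, pt * ph * pw * C).
    """
    T, H, W, C = video.shape
    if T % pt or H % ph or W % pw:
        raise ValueError("video dimensions must be divisible by the patch size")
    nt, nh, nw = T // pt, H // ph, W // pw
    # Split each axis into (number of patches, size within a patch).
    x = video.reshape(nt, pt, nh, ph, nw, pw, C)
    # Bring the three patch-index axes to the front, within-patch axes after.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    # One row per patch; each row is the flattened contents of that patch.
    return x.reshape(nt * nh * nw, pt * ph * pw * C)


# Two toy clips with different durations and aspect ratios (illustrative sizes).
landscape = np.zeros((8, 64, 128, 3), dtype=np.float32)
portrait = np.zeros((4, 128, 64, 3), dtype=np.float32)

tokens_landscape = patchify(landscape)
tokens_portrait = patchify(portrait)
print(tokens_landscape.shape, tokens_portrait.shape)
```

The two clips produce sequences of different lengths but with the same per-patch dimension, which is what lets a single transformer consume videos of mixed duration, resolution, and aspect ratio.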

Capabilities and Limitations

The model can generate video at resolutions up to 1080p in multiple formats, including widescreen and vertical aspect ratios. Beyond text-to-video, Sora can animate static images, extend existing videos in time, and fill in missing frames. While it demonstrates emergent abilities such as object permanence and consistent world simulation, it still struggles with complex physical interactions and cause and effect: a person may bite a cookie, for example, and the cookie may afterward show no bite mark. It can also confuse spatial details of a prompt, such as left and right, and has difficulty following a precisely specified camera trajectory.
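The three conditional tasks named above (animating an image, extending a video, filling in missing frames) differ mainly in which frames are supplied as input and which must be generated. The sketch below expresses that difference as a boolean mask over frame indices. It is a generic illustration only: OpenAI has not published how Sora implements these modes, and the function name known_frame_mask and the 8-frame clip length are assumptions made for the example.

```python
import numpy as np


def known_frame_mask(num_frames, known):
    """Return a boolean array that is True where a frame is given as input.

    Frames marked False are the ones a generative model would have to produce.
    (Hypothetical helper for illustration; not part of any Sora API.)
    """
    mask = np.zeros(num_frames, dtype=bool)
    mask[list(known)] = True
    return mask


# Animate a still image: only the first frame is known.
animate_image = known_frame_mask(8, [0])
# Extend a video forward in time: the first half is known.
extend_forward = known_frame_mask(8, range(4))
# Fill in missing frames: the start and end are known, the middle is not.
fill_gap = known_frame_mask(8, [0, 1, 6, 7])

for name, mask in [("animate", animate_image),
                   ("extend", extend_forward),
                   ("fill", fill_gap)]:
    print(name, mask.astype(int))
```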

Safety and Deployment

To address potential risks, OpenAI adopted safety measures including adversarial testing by red teamers and the development of tools to detect generated content. Generated videos carry C2PA metadata and watermarking to indicate that they were produced by AI. Sora was initially released as a research preview before moving into broader availability through the dedicated sora.com platform and mobile applications.
