Open Source
Open Weights

Pyramid Flow

Released Oct 2024

AA Text→Video: #80
Parameters: 2B

Pyramid Flow is an open-source video generation model built on a pyramidal flow matching architecture. Developed by researchers from Peking University, the Beijing University of Posts and Telecommunications, and Kuaishou Technology, it generates video from text or image prompts, with an emphasis on temporal consistency and visual detail. It can produce clips of up to 10 seconds at 768p resolution and 24 frames per second.

Architecture and Capabilities

The model's core innovation is its hierarchical generation process, which progressively refines video latents from low to high resolution. Because only the final stage runs at full resolution, the earlier stages can establish global motion at a fraction of the cost, which makes the approach more efficient than standard diffusion models that operate at full resolution throughout. The initial release used a 2B-parameter Diffusion Transformer (DiT) backbone based on Stable Diffusion 3; the project later introduced "miniFLUX" variants to improve human anatomy and motion stability.
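To get a rough feel for why coarse-to-fine generation saves compute, the sketch below lists per-stage frame sizes and the average per-step workload relative to full resolution. It is a back-of-the-envelope illustration, not the project's code: the function names, the 1280x768 frame size, the three stages, the halving between stages, and the equal split of steps across stages are all assumptions made for the example.

```python
def pyramid_resolutions(height, width, num_stages):
    """Return (height, width) for each stage, coarsest first.

    Assumes each pyramid level halves both spatial dimensions
    (an illustrative choice, not taken from the source).
    """
    sizes = []
    for level in range(num_stages - 1, -1, -1):
        scale = 2 ** level
        sizes.append((height // scale, width // scale))
    return sizes


def relative_cost(height, width, num_stages):
    """Average pixel count per step across stages, relative to full resolution.

    Assumes the same number of steps is spent in every stage and that
    cost grows linearly with pixel count (attention cost grows faster,
    so this understates the saving).
    """
    full = height * width
    sizes = pyramid_resolutions(height, width, num_stages)
    return sum(h * w for h, w in sizes) / (num_stages * full)


# Hypothetical 768p frame of 1280x768 with three pyramid stages.
print(pyramid_resolutions(768, 1280, 3))
print(relative_cost(768, 1280, 3))
```

Under these assumptions, the two coarser stages handle 1/16 and 1/4 of the full-resolution pixels, so the average step touches well under half the data that a full-resolution-only process would.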

Pyramid Flow supports both text-to-video and image-to-video generation. It was trained exclusively on open-source datasets containing approximately 10 million video clips, making it a fully open-source alternative to proprietary video generators. The model is released under the MIT License, which permits broad community use and modification.

Rankings & Comparison