Seedance 1.0 Mini is an inference-efficient video generation model developed by ByteDance Seed. As a lightweight variant in the Seedance 1.0 family, it is optimized for high-speed content creation while retaining strong semantic understanding and visual quality. The model supports both text-to-video (T2V) and image-to-video (I2V) generation and can produce 1080p video with cinematic aesthetics and stable motion.
The model is built on a Diffusion Transformer (DiT) architecture featuring decoupled spatial and temporal layers and a time-causal VAE decoder. This design enables native multi-shot storytelling: the model can generate narrative videos with cohesive shots and consistent subject representation across transitions. To reach this performance, the model uses multi-stage distillation and system-level optimizations that significantly reduce inference latency compared to traditional video diffusion frameworks.
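The idea behind decoupled spatial and temporal layers can be illustrated with a toy sketch: spatial attention mixes tokens within each frame, then temporal attention mixes tokens across frames at each spatial location. This is a minimal pure-Python illustration of the general pattern only; all function names are hypothetical, and the actual layer layout of Seedance 1.0 Mini is not described in this document.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(tokens):
    # Toy single-head self-attention where queries = keys = values = tokens.
    # tokens: list of feature vectors (lists of floats), all the same length.
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens))
                    for i in range(d)])
    return out

def decoupled_spatiotemporal_block(video):
    # video: nested list [T][S][C] -- T frames, S spatial tokens, C channels.
    # Spatial layer: attend within each frame (tokens mix over S only).
    spatial = [attention(frame) for frame in video]
    # Temporal layer: attend across frames at each spatial location
    # (tokens mix over T only), applied to the spatial output.
    T, S = len(spatial), len(spatial[0])
    out = [[None] * S for _ in range(T)]
    for s in range(S):
        column = [spatial[t][s] for t in range(T)]
        mixed = attention(column)
        for t in range(T):
            out[t][s] = mixed[t]
    return out

video = [[[1.0, 0.0], [0.0, 1.0]],   # frame 0: 2 spatial tokens, 2 channels
         [[0.5, 0.5], [1.0, 1.0]]]   # frame 1
out = decoupled_spatiotemporal_block(video)
```

Factoring full spatiotemporal attention into these two cheaper passes is what makes the design attractive for a latency-focused model: each pass attends over far fewer tokens than joint attention over the whole video would.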
Training for Seedance 1.0 Mini involved a multi-source data curation process augmented with precision video captioning. It was further refined using a video-tailored RLHF (Reinforcement Learning from Human Feedback) algorithm with multi-dimensional reward mechanisms. These optimizations improve motion naturalness, prompt adherence, and spatiotemporal fluidity in complex multi-subject contexts.
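A common way to turn several reward dimensions into a single RLHF training signal is weighted scalarization: each dimension is scored separately, then combined into one scalar used to rank or weight candidate generations. The sketch below illustrates that general pattern only; the dimension names and weights are placeholders, not the model's actual reward design.

```python
def aggregate_rewards(scores, weights):
    # scores: per-dimension reward scores in [0, 1], one per reward model.
    # weights: relative importance of each dimension (same keys as scores).
    assert set(scores) == set(weights), "score/weight dimensions must match"
    total_w = sum(weights.values())
    # Weighted average collapses the reward vector to one scalar.
    return sum(scores[k] * weights[k] for k in scores) / total_w

# Hypothetical scores for one candidate video, one per reward dimension.
sample = {"motion_naturalness": 0.8,
          "prompt_adherence": 0.9,
          "spatiotemporal_fluidity": 0.7}
# Hypothetical weights emphasizing prompt adherence.
w = {"motion_naturalness": 1.0,
     "prompt_adherence": 2.0,
     "spatiotemporal_fluidity": 1.0}
reward = aggregate_rewards(sample, w)  # scalar signal for the RLHF update
```

Keeping the dimensions separate until this final step lets the reward weighting be retuned (e.g., to penalize unnatural motion more heavily) without retraining the individual reward models.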