TeleVideo 2.0 is a multimodal video generation model developed by TeleAI, the Artificial Intelligence Research Institute of China Telecom. Part of the Xingchen (Starry Sky) large model ecosystem, it is designed for high-resolution video synthesis, with support for resolutions up to 2K and video durations of up to a minute or more. The model supports creative workflows including text-to-video, image-to-video, and character-driven generation.
TeleVideo 2.0 is built on a "global planning + local refinement" framework combined with a next-frame prediction paradigm. Conventional frame-by-frame generation tends to suffer from error accumulation and visual artifacts, because each new frame inherits any mistakes in the frames before it. Pairing next-frame prediction with global planning of the whole sequence is intended to produce smoother motion trajectories and keep character expressions and background details consistent throughout.
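The error-accumulation argument can be made concrete with a toy 1-D simulation. This is not TeleVideo 2.0's architecture: each "frame" is a single number, the "global plan" is a hypothetical straight-line trajectory fixed in advance, and "refinement" is a simple blend toward that plan controlled by an illustrative `plan_weight` parameter. It only shows why a purely frame-by-frame rollout drifts, while anchoring each step to a global plan keeps the deviation bounded.

```python
import random


def rollout(num_frames: int, noise: float, plan_weight: float, seed: int = 0) -> list[float]:
    """Generate a toy 1-D 'video' one frame at a time (illustrative only)."""
    rng = random.Random(seed)
    # Hypothetical global plan: the intended state of every frame, fixed up front.
    plan = [0.1 * t for t in range(num_frames)]
    frames = [plan[0]]
    for t in range(1, num_frames):
        # Local next-frame prediction: step forward from the previous *generated*
        # frame, adding a small random error that later frames will inherit.
        local = frames[-1] + 0.1 + rng.gauss(0.0, noise)
        # Hypothetical refinement: blend the local prediction toward the plan for frame t.
        frames.append((1.0 - plan_weight) * local + plan_weight * plan[t])
    return frames


def max_drift(frames: list[float]) -> float:
    """Largest absolute deviation of the generated frames from the global plan."""
    return max(abs(f - 0.1 * t) for t, f in enumerate(frames))


if __name__ == "__main__":
    free = rollout(200, noise=0.05, plan_weight=0.0)      # pure frame-by-frame
    anchored = rollout(200, noise=0.05, plan_weight=0.5)  # anchored to the plan
    print(f"max drift, no plan:   {max_drift(free):.3f}")
    print(f"max drift, with plan: {max_drift(anchored):.3f}")
```

With `plan_weight=0`, each frame's error is carried forward unchanged, so the deviation behaves like a random walk and grows with the number of frames; any positive weight shrinks the inherited error by a factor of `1 - plan_weight` at every step.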
TeleVideo 2.0 integrates a unified video post-training framework that leverages reinforcement learning to align model outputs with human visual preferences. This system focuses on feedback modeling and training scheduling to enhance content consistency and temporal stability. In international evaluations, such as the Artificial Analysis leaderboard, the model has been recognized for its performance in image-to-video consistency and overall visual quality.
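The source does not describe how TeleVideo 2.0 models human feedback. As an illustration of what "feedback modeling" commonly means in preference-based reinforcement learning, the sketch below implements the Bradley-Terry pairwise loss, a widely used way to train a reward model from human comparisons. This is a swapped-in, generic technique, not TeleAI's method; the scores are hypothetical reward-model outputs for a video raters preferred and one they rejected.

```python
import math


def _softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def preference_probability(score_preferred: float, score_rejected: float) -> float:
    """Bradley-Terry probability that raters prefer the first video (stable sigmoid)."""
    d = score_preferred - score_rejected
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def pairwise_loss(pairs: list[tuple[float, float]]) -> float:
    """Mean negative log-likelihood of observed human preferences.

    Each pair is (hypothetical reward of the preferred video,
                  hypothetical reward of the rejected video).
    Minimizing this loss pushes a reward model to score preferred videos higher.
    """
    # -log(sigmoid(d)) == softplus(-d), computed without overflow for large |d|.
    return sum(_softplus(-(p - r)) for p, r in pairs) / len(pairs)
```

A reward model trained this way can then supply the reward signal for a reinforcement-learning stage; the loss is smallest when preferred videos consistently receive higher scores than rejected ones.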