Vivago 2.0 is an all-in-one AI creative platform developed by HiDream that supports an end-to-end creative workflow built on its own generative models. A major update to the original Vivago suite, version 2.0 combines several core capabilities: text-to-image generation, image-to-video conversion, and an AI podcast generator that produces lip-synced videos from a portrait image and a voice recording.
The platform's visual generation is built on the HiDream-I1 series, a set of efficient generative foundation models. The primary model is a sparse Diffusion Transformer (DiT) with 17 billion parameters that uses Llama 3.1 8B as one of its text encoders. Its Feed-Forward Network (FFN) layers use a Mixture-of-Experts (MoE) design, in which a router sends each token to a small subset of expert networks. This design is intended to improve prompt adherence and fine-detail rendering while preserving the global consistency of the generated output.
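To illustrate what a Mixture-of-Experts FFN does, the sketch below implements a generic top-k gated MoE layer for a single token vector in plain Python. It is not HiDream-I1's implementation: the class names (`Expert`, `MoEFeedForward`), the dimensions, the number of experts, `top_k=2`, the GELU activation, and the softmax router with renormalized top-k weights are all illustrative assumptions. Details of the real model, such as any shared experts or load-balancing losses, are not shown.

```python
import math
import random


def gelu(x):
    # Tanh approximation of the GELU activation (assumed choice for this sketch).
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def matvec(w, x):
    # Multiply a matrix stored as a list of rows (out_dim x in_dim) by a vector.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(z):
    # Numerically stable softmax over a list of logits.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class Expert:
    """One hypothetical expert: a standard two-layer FFN (d_model -> d_hidden -> d_model)."""

    def __init__(self, d_model, d_hidden, rng):
        self.w_in = [[rng.gauss(0, 0.02) for _ in range(d_model)] for _ in range(d_hidden)]
        self.w_out = [[rng.gauss(0, 0.02) for _ in range(d_hidden)] for _ in range(d_model)]

    def __call__(self, x):
        hidden = [gelu(v) for v in matvec(self.w_in, x)]
        return matvec(self.w_out, hidden)


class MoEFeedForward:
    """Hypothetical top-k gated MoE FFN: only top_k experts run for each token."""

    def __init__(self, d_model, d_hidden, n_experts, top_k, seed=0):
        assert 1 <= top_k <= n_experts, "top_k must be between 1 and n_experts"
        rng = random.Random(seed)
        self.experts = [Expert(d_model, d_hidden, rng) for _ in range(n_experts)]
        # Router: one logit per expert, computed as a linear projection of the token.
        self.router = [[rng.gauss(0, 0.02) for _ in range(d_model)] for _ in range(n_experts)]
        self.top_k = top_k

    def route(self, x):
        # Score every expert, keep the top_k, and renormalize their weights to sum to 1.
        probs = softmax(matvec(self.router, x))
        top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[: self.top_k]
        norm = sum(probs[i] for i in top)
        return [(i, probs[i] / norm) for i in top]

    def __call__(self, x):
        # Output is the gate-weighted sum of the selected experts' outputs.
        out = [0.0] * len(x)
        for i, weight in self.route(x):
            y = self.experts[i](x)
            out = [o + weight * yi for o, yi in zip(out, y)]
        return out


if __name__ == "__main__":
    layer = MoEFeedForward(d_model=8, d_hidden=16, n_experts=4, top_k=2)
    token = [0.1 * i for i in range(8)]
    print("selected experts and weights:", layer.route(token))
    print("output size:", len(layer(token)))  # prints "output size: 8"
```

The design choice this illustrates is sparsity: the layer holds the parameters of every expert, but each token only pays the compute cost of `top_k` of them, which is how MoE models grow total parameter count without a proportional increase in per-token computation.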
Vivago 2.0 supports a range of artistic styles, from photorealistic and 3D-rendered to illustrated. Its video tools animate static images with motion effects and background music, and its image editing tools support dynamic resolutions and instruction-based edits. The HiDream-I1 models have posted competitive results on benchmarks such as GenEval and DPG-Bench, where their prompt-accuracy scores rival or exceed those of many other open-source diffusion models.