HiDream

HiDream-O1-Image-Dev-2604

Released May 2026

AA Text→Image rank: #10
Parameters: 8B

HiDream-O1-Image-Dev-2604 is a distilled development variant in the HiDream-O1 image generation series, a line of natively unified foundation models developed by HiDream-ai. Released in May 2026, this version is optimized for efficient text-to-image synthesis and serves as a developer-focused checkpoint within the "O1" ecosystem, which emphasizes reasoning-integrated visual generation.

Architecture and Design

The model is built on a Pixel-level Unified Transformer (UiT) architecture. Unlike standard latent diffusion models, this framework requires neither an external Variational Autoencoder (VAE) nor separate text encoders. Instead, HiDream-O1-Image-Dev-2604 natively encodes raw pixels, text, and task-specific conditions into a single shared token space. By operating directly in raw pixel space, the model aims to minimize the artifacts typically introduced by latent-space compression and to improve the fidelity of fine-grained visual details.
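To make the idea of a shared token space concrete, the following is a minimal sketch of how raw pixels and text tokens could be placed in one sequence. It is not HiDream's implementation: the patch size, the function names `patchify` and `build_sequence`, and the modality tags are all hypothetical, and a real model would also project both token types to a common embedding width.

```python
# Illustrative sketch only: patch size, names, and tags are assumptions,
# not details of the HiDream-O1 architecture.

def patchify(image, patch):
    """Split an H x W image (rows of (r, g, b) tuples) into flat patch vectors.

    Patches are emitted in row-major order; each vector holds
    patch * patch * 3 raw channel values, with no VAE or latent encoding.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image dimensions must be divisible by the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            vector = []
            for y in range(top, top + patch):
                for x in range(left, left + patch):
                    vector.extend(image[y][x])
            tokens.append(vector)
    return tokens


def build_sequence(text_token_ids, image, patch):
    """Concatenate text tokens and pixel-patch tokens into one tagged sequence."""
    sequence = [("text", token_id) for token_id in text_token_ids]
    sequence += [("pixel", vector) for vector in patchify(image, patch)]
    return sequence


# A tiny 4 x 4 RGB image and three hypothetical text token ids.
image = [[(x, y, 0) for x in range(4)] for y in range(4)]
sequence = build_sequence([101, 7, 42], image, patch=2)
print(len(sequence))  # 3 text tokens + 4 pixel tokens
```

The point of the sketch is that a single transformer can attend over text and pixel tokens together once they share one sequence, which is what removes the need for a separate encoder per modality.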

Reasoning-Driven Synthesis

The "O1" designation refers to the integration of a Reasoning-Driven Prompt Agent. This built-in mechanism acts as a "thinking" layer that resolves implicit knowledge, complex spatial layouts, and attribute grounding before generation begins, which allows the model to handle intricate prompts more effectively than standard one-shot generators. Despite its relatively compact 8B-parameter scale, the model debuted within the top 10 on the Artificial Analysis Text-to-Image Arena, outperforming several larger open-source and proprietary models in prompt adherence and text rendering.

Key Capabilities

HiDream-O1-Image-Dev-2604 supports high-resolution image synthesis at up to 2,048 × 2,048 pixels. Its unified architecture allows it to handle diverse tasks beyond text-to-image generation, including instruction-based image editing and multi-reference subject-driven personalization. As a distilled variant, it is specifically tuned for faster inference, typically achieving high-quality results in approximately 28 steps. The model is released under the MIT License, facilitating its use in both research and commercial environments.
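Because the model works in pixel space, the number of image tokens at the maximum resolution depends directly on how pixels are grouped into patches. The arithmetic below is a rough sketch: the patch sizes of 16 and 32 are assumptions chosen for illustration, since the source does not state the patch size the model uses.

```python
# Back-of-the-envelope token counts; the patch sizes are hypothetical.

def pixel_token_count(height, width, patch):
    """Number of non-overlapping patch tokens for an image of the given size."""
    if height % patch or width % patch:
        raise ValueError("image dimensions must be divisible by the patch size")
    return (height // patch) * (width // patch)


for patch in (16, 32):
    tokens = pixel_token_count(2048, 2048, patch)
    values_per_token = patch * patch * 3  # raw RGB values carried by each token
    print(f"patch={patch}: {tokens} tokens, {values_per_token} values each")
# patch=16: 16384 tokens, 768 values each
# patch=32: 4096 tokens, 3072 values each
```

Under these assumptions, doubling the patch edge cuts the sequence length by a factor of four while each token carries four times as many raw values.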

Rankings & Comparison