HiDream
Open Weights

HiDream-E1-Full

Released Apr 2025

AA Editing: #53
Parameters: 17B

HiDream-E1-Full is an instruction-based image editing model developed by HiDream.ai. It is designed to perform fine-grained modifications to existing images through natural language commands, such as adding or removing objects, changing styles, and adjusting backgrounds. The model is built as an extension of the HiDream-I1 foundation model, focusing on balancing editing fidelity with the preservation of unmodified image regions.

HiDream-E1-Full uses a 17-billion-parameter sparse Diffusion Transformer (DiT) backbone with Mixture-of-Experts (MoE) components. It operates in a learned latent space and includes a hybrid text encoding module that combines representations from several encoders: Long-Context CLIP (CLIP-L/14 and CLIP-G/14), T5-XXL, and Llama-3.1-8B-Instruct. This multi-encoder setup is intended to help the model interpret complex editing instructions accurately.
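To illustrate what "sparse" means for a Mixture-of-Experts layer, the following is a minimal sketch of top-k expert routing in plain Python. It is not HiDream's implementation: the routing rule (top-k with renormalised softmax weights), the value `k=2`, the toy experts, and the names `route_top_k` and `moe_forward` are all assumptions chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_logits, k=2):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalised so the selected experts sum to 1.
    The top-k rule is an assumption, not a documented detail of the model.
    """
    probs = softmax(gate_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(token, gate_logits, experts, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    out = [0.0] * len(token)
    for idx, weight in route_top_k(gate_logits, k):
        expert_out = experts[idx](token)
        out = [o + weight * v for o, v in zip(out, expert_out)]
    return out


# Four hypothetical toy experts; real experts would be feed-forward networks.
experts = [
    lambda x: [2.0 * v for v in x],
    lambda x: [v + 1.0 for v in x],
    lambda x: [-v for v in x],
    lambda x: [0.0 for _ in x],
]

gate_logits = [0.1, 2.0, -1.0, 1.5]
selected = route_top_k(gate_logits, k=2)
output = moe_forward([1.0, 2.0], gate_logits, experts, k=2)
```

Only two of the four experts run for this token, which is why a sparse model can carry a large total parameter count while using a fraction of it per input.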

Key capabilities include photorealistic style transfer, wardrobe and accessory changes, and complex scene adjustments that maintain compositional and lighting consistency. The model works best at a fixed input resolution of 768x768 pixels. It uses a spatially weighted loss function so that only the areas targeted by the user's instruction are altered, leaving the surrounding image context intact.
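The exact form of the spatially weighted loss is not given here. As a rough sketch of the general idea, the snippet below computes a per-pixel weighted mean squared error in which errors outside the edit region cost more than errors inside it. The mask convention, the weight values, and the function name `spatially_weighted_mse` are hypothetical.

```python
def spatially_weighted_mse(pred, target, edit_mask,
                           edit_weight=1.0, keep_weight=4.0):
    """Weighted mean squared error over a 2D grid of scalar pixel values.

    edit_mask[y][x] is 1 where the instruction applies and 0 elsewhere.
    Pixels outside the edit region get the larger keep_weight, so changing
    them is penalised more heavily (an assumed weighting scheme).
    """
    weighted_error = 0.0
    weight_total = 0.0
    for p_row, t_row, m_row in zip(pred, target, edit_mask):
        for p, t, m in zip(p_row, t_row, m_row):
            w = edit_weight if m else keep_weight
            weighted_error += w * (p - t) ** 2
            weight_total += w
    return weighted_error / weight_total


target = [[0.0, 0.0], [0.0, 0.0]]
edit_mask = [[1, 0], [0, 0]]          # only the top-left pixel is editable

error_inside = [[1.0, 0.0], [0.0, 0.0]]   # deviation inside the edit region
error_outside = [[0.0, 1.0], [0.0, 0.0]]  # same deviation outside it

loss_inside = spatially_weighted_mse(error_inside, target, edit_mask)
loss_outside = spatially_weighted_mse(error_outside, target, edit_mask)
```

With these toy inputs, the same unit error yields a larger loss when it falls outside the masked region, which pushes a model trained this way to leave unrelated areas alone.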

Rankings & Comparison