Qwen Image Edit Plus 2509 is a 20-billion-parameter image editing foundation model developed by Alibaba’s Qwen team and released in September 2025. Built on a Multimodal Diffusion Transformer (MMDiT) architecture, the model is designed for complex, instruction-based image manipulation. Given an existing image and a natural language prompt, it can add objects, replace backgrounds, and restore fine detail while leaving the rest of the image intact.
Core Features and Capabilities
The model’s defining capability is multi-image editing: users can combine and edit elements from up to three input images. Supported combinations include "person + person" for group photo synthesis, "person + product" for promotional assets, and "person + scene" for placing a character in a new environment. A primary focus of the 2509 iteration is identity consistency, meaning that the facial features of individuals and the structural integrity of products are preserved across different poses and contexts. The model also supports in-image text editing, allowing users to modify text content, fonts, and textures while maintaining the original artistic style.
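The three-image limit described above can be sketched as a small request wrapper. This is a minimal illustration, not the model's official interface: `EditRequest`, `to_pipeline_kwargs`, and `run_edit` are hypothetical names, and the `QwenImageEditPlusPipeline` class, the `Qwen/Qwen-Image-Edit-2509` checkpoint id, and the default step count are assumptions about the Hugging Face diffusers integration that the text above does not state.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

MAX_INPUT_IMAGES = 3  # upper bound on input images, as stated in the text


@dataclass
class EditRequest:
    """Hypothetical container for one multi-image edit call."""

    prompt: str
    images: List[Any] = field(default_factory=list)  # e.g. PIL.Image objects

    def __post_init__(self) -> None:
        # Reject empty instructions and unsupported image counts early.
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if not 1 <= len(self.images) <= MAX_INPUT_IMAGES:
            raise ValueError(
                f"expected 1 to {MAX_INPUT_IMAGES} input images, "
                f"got {len(self.images)}"
            )

    def to_pipeline_kwargs(self, steps: int = 40) -> Dict[str, Any]:
        # Keyword names follow the assumed diffusers pipeline call signature.
        return {
            "image": list(self.images),
            "prompt": self.prompt,
            "num_inference_steps": steps,
        }


def run_edit(request: EditRequest) -> Any:
    """Assumed diffusers usage; needs a GPU and the downloaded weights."""
    import torch
    from diffusers import QwenImageEditPlusPipeline  # assumed class name

    pipe = QwenImageEditPlusPipeline.from_pretrained(
        "Qwen/Qwen-Image-Edit-2509",  # assumed checkpoint id
        torch_dtype=torch.bfloat16,
    ).to("cuda")
    return pipe(**request.to_pipeline_kwargs()).images[0]
```

A "person + product" edit would then pass two images and a prompt such as "The woman holds the perfume bottle in a studio setting" to `EditRequest` before calling `run_edit`. Validating the image count up front avoids loading a 20-billion-parameter model only to fail on malformed input.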
Architecture and Control
The model feeds the input image through two pathways: the Qwen2.5-VL vision-language model provides visual semantic control, and a VAE encoder provides visual appearance control. This dual-pathway approach allows the model to understand high-level scene context while preserving pixel-level details and textures. For granular structural control, it includes native ControlNet support, accepting conditioning inputs such as depth maps, edge maps, and keypoint maps to guide character poses and composition. The model is released under the Apache 2.0 license.
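To make the idea of an edge-map conditioning input concrete, the following sketch derives a binary edge map from a grayscale grid using a plain Sobel filter. It is illustrative only: the text does not say how conditioning maps should be produced, `sobel_edge_map` and its `threshold` parameter are hypothetical, and in practice a library routine such as a Canny detector would more likely be used.

```python
import math
from typing import List


def sobel_edge_map(gray: List[List[float]], threshold: float = 1.0) -> List[List[int]]:
    """Return a binary edge map (1 = edge) with the same size as `gray`.

    Border pixels are left at 0 because the 3x3 kernel does not fit there.
    """
    height, width = len(gray), len(gray[0])
    kernel_x = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))   # horizontal gradient
    kernel_y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))   # vertical gradient
    edges = [[0] * width for _ in range(height)]

    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0.0
            for dy in range(3):
                for dx in range(3):
                    pixel = gray[y + dy - 1][x + dx - 1]
                    gx += kernel_x[dy][dx] * pixel
                    gy += kernel_y[dy][dx] * pixel
            # Mark the pixel as an edge when the gradient magnitude is large.
            if math.hypot(gx, gy) >= threshold:
                edges[y][x] = 1
    return edges


# Toy image: dark left half, bright right half, so the edge is vertical.
toy_image = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
toy_edges = sobel_edge_map(toy_image)
```

The resulting map marks only the boundary between the dark and bright regions, which is the kind of structural outline an edge-conditioned edit would be asked to follow while the prompt determines appearance.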