Qwen Image Edit Max 2601 is an image editing foundation model developed by Alibaba's Qwen team. Released as part of the January 2026 model suite, it is the proprietary "Max" tier of the Qwen-Image ecosystem. It is offered through an API and optimized for high-fidelity transformations and complex instruction following. The model is built on a multimodal diffusion transformer (MMDiT) architecture and uses a dual-path input system that processes an input image's visual semantics and its appearance separately, which gives precise control over the edited regions.
The model supports two kinds of editing. Semantic editing, such as rotating a subject or applying a style transfer, can alter pixels throughout the image while keeping the subject itself consistent. Appearance editing, such as adding or removing a specific object, changes only the targeted region and leaves the surrounding pixels unchanged. The model is particularly strong at bilingual text editing: users can modify, add, or delete Chinese and English text in an image while preserving the original font, size, and style.
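A caller may want to verify the "surrounding pixels unchanged" property after an appearance edit. The sketch below is a hypothetical client-side check and not part of any Qwen API. It assumes you already know the bounding box of the region you asked to change, and that you have decoded both images into equal-sized grids of (R, G, B) tuples using an image library of your choice. The function name `outside_region_drift` is illustrative.

```python
# Hypothetical client-side check (not a Qwen API): how much did an edit change
# the pixels OUTSIDE the region the user asked to modify?
from typing import List, Tuple

Pixel = Tuple[int, int, int]          # (R, G, B), each 0..255
Image = List[List[Pixel]]             # rows of pixels, image[y][x]
Box = Tuple[int, int, int, int]       # (left, top, right, bottom); right/bottom exclusive


def outside_region_drift(original: Image, edited: Image, box: Box) -> float:
    """Mean absolute per-channel difference over pixels outside `box`.

    Returns 0.0 when every pixel outside the edit region is identical.
    """
    if len(original) != len(edited) or any(
        len(row_o) != len(row_e) for row_o, row_e in zip(original, edited)
    ):
        raise ValueError("images must have the same dimensions")

    left, top, right, bottom = box
    total, count = 0, 0
    for y, (row_o, row_e) in enumerate(zip(original, edited)):
        for x, (p_o, p_e) in enumerate(zip(row_o, row_e)):
            if left <= x < right and top <= y < bottom:
                continue  # inside the edit region: changes are expected here
            total += sum(abs(c_o - c_e) for c_o, c_e in zip(p_o, p_e))
            count += 3  # three channels per pixel
    return total / count if count else 0.0
```

A result of 0.0 means nothing outside the box changed. Small nonzero values can come from lossy compression or resampling of the returned image, so any pass/fail threshold you choose is a judgment call.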
Technical Features
- Dual-Path Control: Each input image is passed through a vision-language encoder (likely based on the Qwen-VL series) for semantic understanding and through a VAE encoder for low-level visual appearance. The two signals are then fused inside the diffusion transformer.
- Multi-Image Support: The model accepts multiple reference images, enabling tasks such as keeping a character consistent across different scenes or combining elements from separate photos into a single composition.
- Bilingual Proficiency: Optimized for high-fidelity rendering of both English and Chinese text, which makes it well suited to localized advertising and poster design.
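The following toy, pure-Python sketch makes the dual-path and multi-image ideas concrete. It shows one common way an MMDiT-style model can fuse conditioning: all token streams are concatenated into a single sequence, and joint self-attention runs over that sequence, so the latents being edited can attend to the instruction and to every reference image. This illustration rests on several assumptions and is not the model's actual design. The source does not describe the token layout, and the encoders are replaced by hand-written vectors. Attention uses a single head with identity query/key/value projections and no positional encoding. All function names are hypothetical.

```python
# Toy sketch (hypothetical names, simplified math) of MMDiT-style joint attention
# fusing a text instruction, per-image semantic + appearance tokens, and target latents.
import math
from typing import Dict, List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)                        # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def joint_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention over one joint sequence.

    Queries, keys, and values are the tokens themselves (identity projections),
    a simplification of the learned per-stream projections in a real MMDiT block.
    """
    if not tokens:
        return []
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in tokens:
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in tokens])
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out


def build_joint_sequence(
    instruction_tokens: List[Vector],
    reference_images: List[Dict[str, List[Vector]]],
    noisy_latents: List[Vector],
) -> Tuple[List[Vector], List[str]]:
    """Concatenate every conditioning stream with the latents being denoised.

    Each reference image contributes a 'semantic' stream (stand-in for the
    vision-language encoder) and an 'appearance' stream (stand-in for the VAE
    encoder). The returned labels record where each token came from.
    """
    tokens = list(instruction_tokens)
    labels = ["text"] * len(instruction_tokens)
    for idx, image in enumerate(reference_images):
        for stream in ("semantic", "appearance"):
            tokens += image[stream]
            labels += [f"image{idx}:{stream}"] * len(image[stream])
    tokens += noisy_latents
    labels += ["target"] * len(noisy_latents)
    return tokens, labels


def target_token_features(
    instruction_tokens: List[Vector],
    reference_images: List[Dict[str, List[Vector]]],
    noisy_latents: List[Vector],
) -> List[Vector]:
    """Run joint attention and keep only the outputs at the target-latent positions."""
    tokens, labels = build_joint_sequence(instruction_tokens, reference_images, noisy_latents)
    mixed = joint_attention(tokens)
    return [t for t, label in zip(mixed, labels) if label == "target"]
```

Because every token attends to every other, an extra reference image in this sketch only lengthens the sequence and needs no separate fusion module. How the production model actually lays out and weights multiple images is not documented in the source.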
For best results, give short, direct instructions such as "change the car color to blue" or "replace the text with 'Grand Opening' while keeping the font." The model is designed to handle multi-step instructions and can be integrated into larger workflows that use LoRAs for specific camera movements or lighting restoration.
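The prompting advice above can be captured in a small helper. It keeps each step short and direct and joins several steps into one multi-step instruction. This is a hypothetical sketch with two assumptions that are not documented requirements: the numbered-step format, and the option of quoting the original text to indicate which text should be replaced. Sending the resulting string to the model is left to whatever API client you use.

```python
# Hypothetical prompt helpers; the numbered multi-step format is an assumption,
# not a documented requirement of the model or its API.
from typing import List, Optional


def text_replacement_instruction(new_text: str, old_text: Optional[str] = None) -> str:
    """Short, direct text-edit instruction that asks the model to keep the font."""
    # Naming the original text (assumption) can help when an image has several text regions.
    target = f"the text '{old_text}'" if old_text else "the text"
    return f"replace {target} with '{new_text}' while keeping the font"


def compose_instructions(steps: List[str]) -> str:
    """Join short edit steps into one instruction, numbering them if there are several."""
    cleaned = [s.strip().rstrip(".") for s in steps if s.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty step is required")
    if len(cleaned) == 1:
        return cleaned[0] + "."
    return " ".join(f"{i}. {s}." for i, s in enumerate(cleaned, start=1))
```

For example, `compose_instructions(["change the car color to blue", text_replacement_instruction("Grand Opening")])` produces `1. change the car color to blue. 2. replace the text with 'Grand Opening' while keeping the font.` Whether numbering helps the model compared with plain sentences is worth testing on your own inputs.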