FLUX.1 Kontext [pro] is a high-performance multimodal image generation and editing model developed by Black Forest Labs. It introduces an in-context architecture that accepts a text prompt and one or more reference images as simultaneous inputs. The model can extract visual concepts from the reference and modify them according to the prompt, which supports tasks such as preserving a character across multiple scenes and refining an image iteratively, without specialized fine-tuning or complex manual workflows.
The model is built on a 12-billion-parameter rectified flow matching transformer. Rather than learning to reverse a long, curved noising process as in traditional diffusion models, a rectified flow model learns a velocity field along near-straight paths between noise and image, so sampling can reach a result in fewer steps. Inference is reported to be up to 8x faster than previous state-of-the-art editors. The model supports flexible aspect ratios ranging from 1:4 to 4:1 and is optimized for high-quality synthesis at roughly 1-megapixel resolution.
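The straight-path idea can be illustrated with a toy sketch. This is not the model's implementation: it assumes the common convention x_t = (1 - t) * noise + t * data, and it replaces the learned network with a hypothetical oracle velocity function that already knows the target. All function names here are illustrative.

```python
import random


def interpolate(noise, data, t):
    """Point on the straight line from noise (t=0) to data (t=1)."""
    return [(1.0 - t) * n + t * d for n, d in zip(noise, data)]


def target_velocity(noise, data):
    """Regression target in rectified flow: the constant direction data - noise."""
    return [d - n for n, d in zip(noise, data)]


def euler_sample(velocity_fn, noise, num_steps):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-size Euler steps."""
    x = list(noise)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


random.seed(0)
data = [0.25, -1.0, 2.0]  # stand-in for an image, here just three numbers
noise = [random.gauss(0.0, 1.0) for _ in data]


def oracle_velocity(x, t):
    # Assumption: a perfect velocity field. A real model would predict this
    # with a trained network conditioned on the prompt and reference image.
    return target_velocity(noise, data)


one_step = euler_sample(oracle_velocity, noise, num_steps=1)
many_steps = euler_sample(oracle_velocity, noise, num_steps=50)
midpoint = interpolate(noise, data, 0.5)
```

With a perfectly straight path, one Euler step and fifty steps land on the same point. A trained network only approximates this, so practical samplers still use several steps, but far fewer than a strongly curved trajectory would need.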
Core Capabilities
A primary strength of the model is character consistency: it preserves the distinguishing features of a reference subject or object across different environments and poses. It also supports local editing, allowing targeted modifications (such as changing an object's color or swapping clothing) without affecting the surrounding composition. In addition, the model handles typography with high precision, so text within images can be generated and modified accurately.
FLUX.1 Kontext [pro] is designed for professional creative pipelines, supporting an iterative, multi-turn workflow in which users build upon previous edits while maintaining visual coherence. This makes it particularly suitable for sequential storytelling, brand design, and rapid prototyping, where identities and styles must stay consistent across many assets.
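A multi-turn workflow of this kind can be sketched as a small session object that feeds each result back in as the next reference image. This is a minimal illustration, not an official client: EditSession, fake_edit, and the (image, prompt) -> image signature are hypothetical, and fake_edit merely stands in for a real model call.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class EditSession:
    """Chains edits so each turn builds on the previous output."""
    edit_fn: Callable[[bytes, str], bytes]  # hypothetical (image, prompt) -> image
    current: bytes
    history: List[bytes] = field(default_factory=list)

    def apply(self, prompt: str) -> bytes:
        # Save the current image so the turn can be undone, then edit it.
        self.history.append(self.current)
        self.current = self.edit_fn(self.current, prompt)
        return self.current

    def undo(self) -> bytes:
        # Step back one turn if any earlier state exists.
        if self.history:
            self.current = self.history.pop()
        return self.current


def fake_edit(image: bytes, prompt: str) -> bytes:
    # Stand-in for a real model call: records the prompt instead of editing pixels.
    return image + b"|" + prompt.encode("utf-8")


session = EditSession(edit_fn=fake_edit, current=b"portrait")
session.apply("move the subject to a beach at sunset")
session.apply("change the jacket to red")
```

Keeping a history of intermediate images lets a user back out of a turn that drifted from the intended identity or style and retry with a different prompt.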