FLUX.2 [klein] 9B is a high-speed text-to-image and image-editing model developed by Black Forest Labs. As part of the FLUX.2 family, it is designed to balance generation quality against latency, offering a compact alternative to larger models while maintaining competitive visual synthesis capabilities. The model's name, derived from the German word for "small," reflects its focus on efficiency and sub-second inference on modern hardware.

The architecture is based on a rectified flow transformer and incorporates an 8B Qwen3 text embedder. The model has 9 billion parameters and uses a step-distilled pipeline that can generate high-quality images in as few as 4 inference steps. This supports near real-time interaction, making the model suitable for applications that require immediate visual feedback. It unifies text-to-image generation and complex image editing, such as multi-reference composition and style transfer, within a single architecture.

## Performance and Usage
FLUX.2 [klein] 9B excels at prompt adherence, photorealism, and legible text rendering. It handles complex spatial layouts and anatomical details effectively and can produce high-definition outputs at resolutions up to 2048x2048. The model is trained to interpret long, descriptive natural-language prompts, reducing the need for specialized prompt engineering.

For best results, users are encouraged to describe scenes with natural phrasing, specifying elements such as lighting, camera perspective, and subject interactions. The model also accepts multiple reference images to guide style or subject consistency in editing workflows. It is released under the FLUX Non-Commercial License and is intended for research, creative exploration, and local development on hardware with approximately 29 GB of VRAM.
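The few-step generation described above rests on rectified flow sampling, which can be illustrated with a minimal sketch. It assumes the common convention x_t = (1 - t)·x_0 + t·ε, where t = 1 is pure noise and t = 0 is the clean image, and uses a uniform timestep grid (real samplers often use a shifted schedule). The transformer is replaced by a hypothetical toy "oracle" velocity field, `make_oracle_velocity`, that knows the target exactly; `euler_sample` is likewise an illustrative name, not part of any FLUX API. The sketch does not reproduce step distillation itself; it only shows why straight noise-to-data paths let a plain Euler integrator reach the target in very few steps.

```python
import random
from typing import Callable, List

Vector = List[float]


def euler_sample(velocity: Callable[[Vector, float], Vector],
                 x_init: Vector, num_steps: int = 4) -> Vector:
    """Integrate dx/dt = v(x, t) from t=1 (noise) to t=0 (data) with Euler steps."""
    # Uniform grid 1.0 -> 0.0; an assumption, real pipelines may shift it.
    ts = [1.0 - i / num_steps for i in range(num_steps + 1)]
    x = list(x_init)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        v = velocity(x, t_cur)          # one "model evaluation" per step
        dt = t_next - t_cur             # negative: moving from noise toward data
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_oracle_velocity(target: Vector) -> Callable[[Vector, float], Vector]:
    """Hypothetical stand-in for the transformer: exact straight-path velocity.

    With x_t = (1 - t) * target + t * noise, the velocity (noise - target)
    can be rewritten as (x_t - target) / t. Only evaluated at t > 0.
    """
    def velocity(x: Vector, t: float) -> Vector:
        return [(xi - ti) / t for xi, ti in zip(x, target)]
    return velocity


if __name__ == "__main__":
    random.seed(0)
    target = [0.5, -1.0, 2.0]  # hypothetical "clean image" of three pixels
    noise = [random.gauss(0.0, 1.0) for _ in target]
    v = make_oracle_velocity(target)
    for steps in (1, 4, 50):
        out = euler_sample(v, noise, num_steps=steps)
        err = max(abs(o - t) for o, t in zip(out, target))
        print(f"{steps:>2} steps -> max error {err:.2e}")
```

Because the oracle trajectory is exactly straight, 1, 4, and 50 steps all land on the target up to floating-point error. A learned velocity field is only approximately straight, so, roughly speaking, a real model needs a distilled few-step sampler rather than relying on straightness alone.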
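The prompting advice above (natural phrasing that spells out lighting, camera perspective, and subject interactions) can be made concrete with a small template helper. `SceneDescription` and `build_prompt` are hypothetical names for illustration only; the model simply takes a text prompt, so this is one way to avoid forgetting those details, not a required format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SceneDescription:
    """Hypothetical container for the prompt elements suggested in the text."""
    subject: str
    interaction: str = ""
    setting: str = ""
    lighting: str = ""
    camera: str = ""
    extras: List[str] = field(default_factory=list)


def build_prompt(scene: SceneDescription) -> str:
    """Join the non-empty elements into one natural-language prompt."""
    # Subject, action, and setting read as a single descriptive phrase.
    head = " ".join(p for p in (scene.subject, scene.interaction, scene.setting) if p)
    clauses = [head]
    if scene.lighting:
        clauses.append(f"lit by {scene.lighting}")
    if scene.camera:
        clauses.append(f"shot {scene.camera}")
    clauses.extend(scene.extras)  # any additional free-form details
    return ", ".join(clauses) + "."


if __name__ == "__main__":
    scene = SceneDescription(
        subject="An elderly fisherman",
        interaction="mending a net",
        setting="on a wooden pier at dawn",
        lighting="soft golden backlight",
        camera="from a low angle with a 35mm lens",
        extras=["shallow depth of field"],
    )
    print(build_prompt(scene))
```

Usage: the script prints a single sentence combining subject, action, setting, lighting, and camera details, which can then be passed as the prompt string.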

Rankings & Comparison