Google logo
Google

Nano Banana (Gemini 2.5 Flash Image)

Released Aug 2025

Nano Banana is the popular codename for Gemini 2.5 Flash Image, an image generation and editing model developed by Google DeepMind. Formally released in August 2025, the model serves as a high-speed, multimodal successor to earlier vision capabilities in the Gemini family. It is optimized for low-latency, high-volume tasks, providing a balance of photorealistic generation and complex spatial editing through natural language dialogue.

The model utilizes a Multimodal Diffusion Transformer architecture, which allows it to process text and multiple image inputs natively rather than treating generation as a modular post-processing step. It features a scalable parameter count ranging from 450 million to 8 billion, employing a sparse mixture-of-experts (MoE) approach to maintain efficiency. This design enables generation speeds of approximately 1–2 seconds for standard requests, which is significantly faster than traditional standalone diffusion models.

Key Capabilities and Features

A defining characteristic of Nano Banana is its support for character and style consistency. This allows the model to maintain the visual identity of a person, pet, or object across different scenes and prompts without the need for extensive fine-tuning. Additionally, the model supports multi-image fusion, capable of blending up to three input images into a single composition while maintaining coherent lighting, shadows, and depth of field.

Beyond basic text-to-image tasks, the model excels in conversational editing. Users can perform targeted transformations—such as removing objects, altering a subject's pose, or changing backgrounds—using simple natural language instructions. The model also integrates Google's world knowledge to interpret context-heavy prompts, such as understanding hand-drawn diagrams or rendering accurate text within an image. To ensure transparency, all outputs include an invisible digital watermark via SynthID.

Rankings & Comparison