Gemini 1.5 Pro is a multimodal model developed by Google, designed to balance strong performance with computational efficiency. It uses a Mixture-of-Experts (MoE) architecture, in which a gating network routes each input to only a subset of the model's expert subnetworks, so far fewer parameters are active per token than in a traditional dense model. This sparse activation enables the model to match or exceed the capabilities of larger models such as Gemini 1.0 Ultra while requiring less compute.
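The routing idea behind MoE can be sketched in a few lines. The toy model below is purely illustrative and assumes nothing about Gemini's actual architecture (which Google has not published at this level of detail): each "expert" is a trivial linear map, a gating function scores every expert for an input, and only the top-k experts are actually evaluated.

```python
import math
import random

random.seed(0)

# Hypothetical toy MoE: constants and "experts" below are illustrative,
# not Gemini's real configuration. A production MoE layer uses full
# feed-forward blocks as experts and learns the gate weights.
NUM_EXPERTS = 8
TOP_K = 2

expert_weights = [random.uniform(-1, 1) for _ in range(NUM_EXPERTS)]
gate_weights = [random.uniform(-1, 1) for _ in range(NUM_EXPERTS)]

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x: float) -> float:
    # The gate scores every expert, but only the top-k are executed:
    # this sparse activation is what makes MoE cheaper per token than
    # a dense model with the same total parameter count.
    scores = [g * x for g in gate_weights]
    top = sorted(range(NUM_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]
    probs = softmax([scores[i] for i in top])  # renormalize over chosen experts
    return sum(p * (expert_weights[i] * x) for p, i in zip(probs, top))

print(moe_forward(1.5))
```

Here only 2 of the 8 experts run for any given input; the output is their weighted combination. Scaling the expert count grows model capacity while per-input compute stays roughly fixed, which is the trade-off the paragraph above describes.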
The May 2024 update, announced at Google I/O, brought quality improvements across reasoning, creative writing, and coding. This version became the default model for Gemini Advanced and was made globally available to developers. It ships with a 1-million-token context window as standard, allowing it to process and reason across large inputs such as hour-long videos, extensive codebases, or thousand-page documents.
As a natively multimodal model, Gemini 1.5 Pro was trained on text, images, audio, and video together from the outset. This enables cross-modal reasoning, such as extracting specific details from a video file or generating code from visual diagrams. Alongside the May 2024 release, Google also opened a private preview of an expanded 2-million-token context window, the largest available in a consumer-facing AI model at the time.