Gemini 1.0 Ultra is a multimodal large language model developed by Google DeepMind. It is the largest and most capable model in the initial Gemini 1.0 series, which also includes Gemini Pro and Gemini Nano, and is designed for highly complex tasks across a wide range of domains. Unlike multimodal systems that combine separately trained components for each modality, Gemini was trained natively on a dataset spanning text, images, audio, video, and computer code.
The model is noted for its reasoning and problem-solving capabilities. Google reported it as the first model to exceed human-expert performance on MMLU (Massive Multitask Language Understanding), a multiple-choice benchmark that tests knowledge and reasoning across 57 subjects, with a reported score of 90.0% against a human-expert baseline of 89.8%. In the evaluations published with its release, it also achieved state-of-the-art results in several categories, including mathematical reasoning, coding, and complex image interpretation.
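To make the benchmark figure concrete, the following is a minimal sketch of how accuracy on an MMLU-style multiple-choice benchmark can be computed, both overall and per subject. The function name, the record layout, and the sample data are assumptions made for illustration; this is not the evaluation harness used for Gemini, and it leaves out prompting strategy and answer extraction entirely.

```python
from collections import defaultdict


def mmlu_style_accuracy(records):
    """Score multiple-choice predictions.

    records: iterable of (subject, predicted_choice, correct_choice) tuples,
    where choices are letters such as "A".."D" (hypothetical layout).
    Returns (overall_accuracy, per_subject_accuracy).
    """
    per_subject = defaultdict(lambda: [0, 0])  # subject -> [correct, total]
    for subject, predicted, correct in records:
        per_subject[subject][0] += int(predicted == correct)
        per_subject[subject][1] += 1

    subject_acc = {s: c / n for s, (c, n) in per_subject.items()}
    total_correct = sum(c for c, _ in per_subject.values())
    total = sum(n for _, n in per_subject.values())
    # Overall accuracy here is averaged over questions, not over subjects.
    overall = total_correct / total if total else 0.0
    return overall, subject_acc


# Made-up sample data, for illustration only.
sample = [
    ("astronomy", "A", "A"),
    ("astronomy", "B", "C"),
    ("law", "D", "D"),
    ("law", "A", "A"),
]
overall, by_subject = mmlu_style_accuracy(sample)
print(overall, by_subject)
```

Averaging over questions rather than over subjects is one common convention; a per-subject average would weight small subjects more heavily.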
Gemini 1.0 Ultra's architecture is built on Transformer decoders, and the Gemini 1.0 models were trained on Google's proprietary Tensor Processing Units (TPUv4 and TPUv5e). The architecture lets the model process and reason over several modalities within a single input, for example interpreting a video sequence or generating code from a prompt that combines images and text.
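The defining feature of a Transformer decoder is causal self-attention: each position may attend only to itself and to earlier positions. The sketch below illustrates that mechanism for a single attention head in plain Python. It is a generic textbook illustration under simplifying assumptions (no learned projections, no multiple heads, no batching) and does not describe Gemini's actual implementation, whose details are not public; the function name is hypothetical.

```python
import math


def causal_self_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal mask.

    Each argument is a list of float vectors, one per sequence position.
    Returns one output vector per position.
    """
    d = len(queries[0])
    outputs = []
    for i, q in enumerate(queries):
        # Causal mask: position i only sees positions 0..i.
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        # Numerically stable softmax over the visible positions.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        norm = sum(weights)
        weights = [w / norm for w in weights]
        # Weighted sum of the visible value vectors.
        outputs.append([
            sum(w * values[j][k] for j, w in enumerate(weights))
            for k in range(len(values[0]))
        ])
    return outputs


# Toy two-position sequence, for illustration only.
x = [[1.0, 0.0], [0.0, 1.0]]
print(causal_self_attention(x, x, x))
```

Because of the mask, the output at the first position depends only on the first input, regardless of what follows it. In a multimodal decoder the positions would hold embeddings of text tokens, image patches, or audio frames rather than these toy vectors.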