Grok 3 by xAI: LLM Benchmarks, Rankings & Specs

Grok 3 is a large language model developed by xAI and released in February 2025. It was trained on the "Colossus" supercomputer cluster with 100,000 Nvidia H100 GPUs, using approximately ten times the computational resources of its predecessor, Grok-2. The model is designed for high-performance reasoning, mathematical problem-solving, and advanced software engineering tasks.

The model features a context window of 1 million tokens, enabling it to process extensive documents and maintain coherence over long-form conversations. It introduces a specialized reasoning version, Grok 3 (Think), which employs reinforcement learning and test-time computation to perform deep logical analysis, allowing the model to self-correct and explore alternative solution paths before providing an answer.

In technical evaluations, Grok 3 demonstrated high performance across multiple benchmarks, including the 2025 American Invitational Mathematics Examination (AIME) and the GPQA Diamond benchmark for graduate-level reasoning. The model also supports multimodal capabilities, including visual and video understanding, and maintains real-time information access through integration with the X platform.
