Claude 2.0 is a large language model developed by Anthropic, released in July 2023 as an iterative advancement of the Claude model family. It introduced a significant expansion in context capacity, featuring a 100,000-token context window that allows the model to process extensive documents, technical manuals, and long-form literature in a single query. The model was designed to produce more nuanced and coherent long-form text than its predecessors.
Development and Safety
Anthropic trained Claude 2.0 using Constitutional AI, a method in which the model is guided by a set of high-level written principles to critique and revise its own outputs. This approach aims to keep the model helpful, honest, and harmless while reducing reliance on large-scale human feedback labeling. According to internal evaluations released at launch, Claude 2.0 was twice as likely as Claude 1.3 to give harmless responses.
Performance and Capabilities
The model showed marked improvements in technical domains, particularly coding, mathematics, and complex reasoning. In benchmark evaluations reported by Anthropic, Claude 2.0 scored 71.2% on the HumanEval Python coding test and 88.0% on the GSM8k grade-school math dataset. It also performed competitively on professional and academic examinations, scoring in the 90th percentile on the GRE reading and writing tests and achieving 76.5% on the multiple-choice section of the Bar exam.