Meta
Open Weights

Llama 65B

Released Feb 2023

Intelligence: #445
Arena AI: #276
Context: 2K
Parameters: 65B

Llama 65B is the largest variant in the first generation of the Large Language Model Meta AI (LLaMA) suite. Developed by Meta AI as a foundational model and trained on 1.4 trillion tokens, it was part of a research initiative to demonstrate that high-performance large language models can be trained exclusively on publicly available data, without relying on proprietary or restricted datasets.

Llama 65B follows the standard Transformer architecture with several specific modifications: Rotary Positional Embeddings (RoPE) replace absolute positional encodings, RMSNorm is applied as pre-normalization to the input of each sub-layer instead of post-layer LayerNorm, and the feed-forward blocks use the SwiGLU activation function in place of ReLU. These design choices were intended to improve training stability and model performance.
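To make these components concrete, here is a minimal PyTorch sketch of each one in isolation. The dimensions are illustrative rather than Meta's actual configuration, and the RoPE helper uses the half-split channel pairing common in open-source reimplementations (the original interleaves adjacent pairs); treat it as a sketch, not the reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Pre-normalization by root mean square: rescales each vector to
    unit RMS (no mean-centering, unlike LayerNorm), then applies a gain."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        inv_rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return self.weight * (x * inv_rms)

class SwiGLU(nn.Module):
    """Feed-forward block with a SiLU-gated linear unit:
    w2(silu(w1 x) * w3 x) replaces the usual w2(relu(w1 x))."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)  # gate projection
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)  # value projection
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)  # down projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(F.silu(self.w1(x)) * self.w3(x))

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotary positional embedding: rotate channel pairs of each query/key
    vector by an angle proportional to position. x: (batch, seq, head_dim)."""
    _, seq_len, head_dim = x.shape
    half = head_dim // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

# Quick shape check on toy inputs.
x = torch.randn(2, 16, 64)            # (batch, seq_len, head_dim)
print(RMSNorm(64)(x).shape)           # torch.Size([2, 16, 64])
print(SwiGLU(64, 172)(x).shape)       # torch.Size([2, 16, 64])
print(apply_rope(x).shape)            # torch.Size([2, 16, 64])
```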

At release, Llama 65B was noted for performing competitively with far larger proprietary models: the LLaMA paper reports it as competitive with Chinchilla-70B and PaLM-540B on most benchmarks. Meta trained it well past the compute-optimal point, trading extra training compute for a smaller model that is cheaper to run at inference time, and it became the base for numerous open-source fine-tuning projects and for research on model alignment and instruction following.
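As an illustration of how the model is typically run today, the snippet below loads a checkpoint through Hugging Face Transformers. The repository ID huggyllama/llama-65b is a community mirror assumed here for illustration; Meta distributed the original weights directly under a research license. Loading 65B parameters requires substantial GPU memory, and device_map="auto" needs the accelerate package.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "huggyllama/llama-65b"  # assumed community mirror, not an official Meta repo

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native precision
    device_map="auto",    # shard the weights across available GPUs (needs accelerate)
)

# LLaMA 65B is a base model with no instruction tuning, so prompt it
# as a text continuator rather than a chat assistant.
inputs = tokenizer("The LLaMA models were trained on", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```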

Rankings & Comparison