Llama 2 Chat 7B is a generative text model optimized for dialogue and conversational use cases, developed by Meta as part of the Llama 2 family. It is an auto-regressive transformer designed to provide a high-performance open-weights alternative for local execution and research. As the smallest model in the Llama 2 chat-tuned series (alongside the 13B and 70B variants), it trades some capability for computational efficiency, making it practical to run on a single consumer GPU.
Training and Architecture
The model was pretrained on approximately 2 trillion tokens of publicly available online data, roughly 40% more than the first-generation Llama models. It uses a context window of 4,096 tokens, double that of its predecessor, and a standard decoder-only transformer architecture with multi-head attention (MHA); the larger Llama 2 variants additionally use grouped-query attention, but the 7B model does not. To adapt the base model for conversational tasks, Meta applied Supervised Fine-Tuning (SFT) followed by Reinforcement Learning from Human Feedback (RLHF).
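The multi-head attention mentioned above can be illustrated with a minimal NumPy sketch. This is not Meta's implementation; it is a toy causal MHA layer with made-up dimensions, showing how the model dimension is split across heads, each head computes scaled dot-product attention over past positions only, and the heads are concatenated and projected back.

```python
import numpy as np

def multi_head_attention(x, w_q, w_k, w_v, w_o, n_heads):
    """Causal multi-head attention for one sequence (illustrative sketch).

    x: (seq_len, d_model) input activations.
    w_q, w_k, w_v, w_o: (d_model, d_model) projection weights.
    """
    seq_len, d_model = x.shape
    d_head = d_model // n_heads

    # Project to queries/keys/values, then split d_model across heads:
    # (seq_len, d_model) -> (n_heads, seq_len, d_head).
    def project(w):
        return (x @ w).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = project(w_q), project(w_k), project(w_v)

    # Scaled dot-product scores per head, with a causal mask so each
    # position attends only to itself and earlier positions.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(mask, -1e9, scores)

    # Softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)

    # Concatenate heads back to (seq_len, d_model) and apply the
    # output projection.
    out = (weights @ v).transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ w_o

# Toy dimensions for demonstration (the real model uses d_model=4096,
# n_heads=32, and adds rotary position embeddings, which are omitted here).
rng = np.random.default_rng(0)
d_model, n_heads, seq_len = 64, 8, 10
ws = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(4)]
y = multi_head_attention(rng.standard_normal((seq_len, d_model)), *ws, n_heads)
```

The output has the same shape as the input, which is what lets transformer blocks stack: each layer's attention output feeds the next layer directly.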
Fine-Tuning and Safety
The alignment process drew on over 1 million human preference annotations to steer the model toward helpful and safe responses. A key technique introduced during fine-tuning is Ghost Attention (GAtt), which helps the model stay consistent with system instructions across multiple turns of a conversation. The model also underwent extensive red-teaming and safety evaluations to minimize toxic or biased outputs while preserving conversational utility.
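The system instructions that GAtt helps the model honor are supplied through Llama 2's chat prompt template, which wraps the system message in `<<SYS>>` tags inside the first `[INST]` block. As a rough sketch of how a multi-turn prompt is assembled (the helper name and turn representation here are my own; consult Meta's reference code for the authoritative format):

```python
def format_llama2_chat(system, turns):
    """Assemble a Llama 2 chat prompt (illustrative sketch).

    system: optional system instruction string.
    turns: list of (user, assistant) pairs; the assistant entry of the
           final pair is None when we want the model to generate it.
    """
    bos, eos = "<s>", "</s>"
    prompt = ""
    for i, (user, assistant) in enumerate(turns):
        # The system message rides inside the first user turn,
        # wrapped in <<SYS>> markers.
        if i == 0 and system:
            user = f"<<SYS>>\n{system}\n<</SYS>>\n\n{user}"
        prompt += f"{bos}[INST] {user} [/INST]"
        if assistant is not None:
            prompt += f" {assistant} {eos}"
    return prompt

p = format_llama2_chat(
    "You are a concise assistant.",
    [("What is 2+2?", "4."), ("And doubled?", None)],
)
```

Because the system message appears only once, at the start, the model must carry it forward across all later turns; that is precisely the consistency GAtt reinforces during fine-tuning.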