Allen AI
Open Weights

tulu-2-dpo-70b

Released Nov 2023

Arena AI rank: #228
Parameters: 70B

Tulu-2-DPO-70B is an instruction-tuned language model developed by the Allen Institute for AI (AI2). As part of the Tulu 2 suite, it is designed to study and improve the adaptation of pretrained models to human instructions and user preferences. It is built on the Llama 2 70B architecture.

Training and Alignment

The model was developed in a multi-stage process, beginning with supervised fine-tuning on the Tulu V2 mix, a diverse dataset of human and synthetic instructions. It was then further refined with Direct Preference Optimization (DPO) on the UltraFeedback dataset. DPO aligns the model with human preferences directly from preference pairs, without training a separate reward model, making it simpler and more computationally efficient than traditional Reinforcement Learning from Human Feedback (RLHF) and yielding a strong open-source model for high-capacity chat and reasoning tasks.
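The DPO objective used in this stage can be sketched in plain Python. The per-example loss is the negative log-sigmoid of the scaled difference between the policy's log-probability margins on the chosen and rejected responses; the log-probability values and the β of 0.1 below are illustrative, not taken from the Tulu 2 training configuration.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss: -log sigmoid(beta * (chosen_margin - rejected_margin)).

    Each argument is the summed log-probability of a full response under
    either the trained policy or the frozen reference model.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) = log(1 + e^{-x}), written in a numerically stable form
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))

# Illustrative values: the policy prefers the chosen response relative
# to the reference, so the loss sits below log(2) ~ 0.693.
loss = dpo_loss(-10.0, -12.0, -11.0, -11.0)
print(round(loss, 4))
```

Widening the preference margin drives the loss toward zero, which is what pushes the policy to rank chosen responses above rejected ones.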

Research Significance

AI2 released Tulu-2-DPO-70B as a transparent research artifact, providing the full training data, evaluation framework, and model weights. At the time of its release, it represented one of the first successful applications of the DPO algorithm at the 70-billion-parameter scale, aiming to facilitate open research into the best practices of post-pretraining adaptation for large language models.
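Since the weights are openly released, the model can be prompted directly; the Tulu 2 model card describes a simple chat template that marks turns with `<|user|>` and `<|assistant|>` tags. A minimal single-turn formatter, assuming that template:

```python
def format_tulu_prompt(user_message: str) -> str:
    """Build a single-turn prompt in the Tulu chat format.

    Each turn is introduced by a tag on its own line; generation is
    expected to continue after the final <|assistant|> tag.
    """
    return f"<|user|>\n{user_message}\n<|assistant|>\n"

prompt = format_tulu_prompt("What is Direct Preference Optimization?")
print(prompt)
```

The resulting string would be passed to the tokenizer as-is; deviating from the expected template typically degrades the quality of instruction-tuned models.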
