Zephyr 7B Beta is a 7-billion-parameter language model fine-tuned from Mistral 7B v0.1 by the Hugging Face H4 team. It is designed to act as a helpful assistant and is part of a series of models focused on aligning smaller models with human preferences through distillation and alignment techniques rather than extensive human-annotated reinforcement learning.

The model was trained with a pipeline of distilled Supervised Fine-Tuning (dSFT) on the UltraChat dataset, followed by Direct Preference Optimization (DPO) on the UltraFeedback dataset, in which the model's policy is optimized directly against preference pairs ranked by a larger teacher model (GPT-4). This approach allowed Zephyr 7B Beta to achieve conversational performance that challenged much larger proprietary models on benchmarks such as MT-Bench and AlpacaEval at the time of its release.

## Technical Characteristics
Zephyr 7B Beta inherits the architectural features of its base model, including Grouped-Query Attention (GQA) and a context window of 8,192 tokens. It is released under the MIT License, permitting broad use in research and commercial applications. Its development was part of a larger research effort to demonstrate that smaller models can achieve high intent alignment and helpfulness through efficient distillation pipelines.
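To make the DPO step above concrete, here is a minimal sketch of the DPO loss for a single preference pair. The function name and scalar-argument interface are illustrative assumptions (a real training loop would operate on batched token-level log-probabilities from the policy and a frozen dSFT reference model); the formula itself is the standard DPO objective.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair (illustrative sketch).

    Each argument is the summed log-probability of a full response
    under either the trainable policy or the frozen reference
    (dSFT) model. beta controls how far the policy may drift
    from the reference.
    """
    # Implicit reward of each response: log-ratio of policy to reference.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # Margin by which the policy prefers the chosen response.
    logits = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(logits)), written in the numerically stable
    # form log(1 + exp(-logits)).
    return math.log1p(math.exp(-logits))

# When the policy agrees with the reference exactly, the margin is
# zero and the loss is log(2); as the policy learns to prefer the
# chosen response more strongly, the loss decreases toward zero.
neutral = dpo_loss(-10.0, -20.0, -10.0, -20.0)
preferring = dpo_loss(-10.0, -20.0, -12.0, -18.0)
```

Minimizing this loss pushes the policy's implicit reward margin for the GPT-4-preferred response above that of the rejected one, without training a separate reward model.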