NousResearch
Open Weights

nous-hermes-2-mixtral-8x7b-dpo

Released Jan 2024

Arena AI: #236
Context: 33K
Parameters: 46.7B

Nous Hermes 2 Mixtral 8x7B DPO is a large language model developed by NousResearch. It is based on the Mixtral-8x7B-v0.1 Mixture of Experts (MoE) architecture and represents a refinement in the Hermes series through the application of Direct Preference Optimization (DPO).
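A minimal inference sketch using Hugging Face Transformers, assuming the weights are hosted under the repo id "NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO" and that the tokenizer ships a ChatML-style chat template (both assumptions, not details stated above):

```python
# Minimal usage sketch (assumed repo id and chat template).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO"  # assumed Hub repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain what a Mixture of Experts model is."},
]
# Render the chat turns into model-ready input ids via the bundled template.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```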

The model was fine-tuned on the OpenHermes-2.5 dataset, a collection of approximately one million synthetic examples primarily generated by GPT-4. This training was followed by a DPO stage, which aligns the model's outputs with human preferences more effectively than standard supervised fine-tuning alone. The resulting model demonstrates improved performance in reasoning, coding, and creative writing tasks.
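For intuition, here is a compact sketch of the DPO objective as commonly described in the literature: the policy is trained to widen the log-probability margin between a preferred and a dispreferred completion relative to a frozen reference model. The beta value and the toy numbers are illustrative assumptions, not details of this model's training run:

```python
# Illustrative DPO loss over summed completion log-probabilities.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Each argument: summed log-prob of a completion, shape [batch]."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp        # log pi/pi_ref, preferred
    rejected_ratio = policy_rejected_logp - ref_rejected_logp  # log pi/pi_ref, dispreferred
    # Maximize the preference margin, scaled by beta.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy example with made-up log-probabilities.
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.0]))
print(loss.item())
```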

Built on the 8x7B MoE architecture, the model contains roughly 46.7 billion total parameters but activates only a fraction of them (approximately 12.9 billion) for each token during inference. It supports a context window of 32,768 tokens, making it suitable for processing long documents and maintaining complex conversational state.
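A back-of-the-envelope sketch of where those figures come from, using the published Mixtral-8x7B configuration (hidden size 4096, 32 layers, 8 experts with top-2 routing, grouped-query attention); the counts are approximations that ignore small terms such as layer norms and router weights:

```python
# Rough parameter accounting for the Mixtral-8x7B MoE backbone.
hidden, intermediate, layers = 4096, 14336, 32
n_experts, active_experts = 8, 2
vocab, kv_heads, head_dim = 32000, 8, 128

expert_ffn = 3 * hidden * intermediate                              # gate, up, down projections
attention = hidden * hidden * 2 + hidden * kv_heads * head_dim * 2  # q,o + k,v (GQA)
embeddings = 2 * vocab * hidden                                      # input embeddings + LM head

total = layers * (n_experts * expert_ffn + attention) + embeddings
active = layers * (active_experts * expert_ffn + attention) + embeddings

print(f"total  ~{total / 1e9:.1f}B parameters")   # ~46.7B
print(f"active ~{active / 1e9:.1f}B per token")   # ~12.9B
```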

Rankings & Comparison