Mistral
Open Weights

Mixtral 8x7B Instruct

Released Dec 2023

Intelligence: #440
Context: 33K
Parameters: 46.7B

Mixtral 8x7B Instruct is a sparse mixture-of-experts (SMoE) language model developed by Mistral AI. It is an instruction-tuned version of the Mixtral 8x7B base model, optimized for conversational accuracy and task completion through supervised fine-tuning (SFT) and Direct Preference Optimization (DPO). The model is designed to follow complex prompts and perform well across diverse reasoning benchmarks.
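The instruct checkpoints expect Mistral's [INST]-delimited chat format. The strings below are an illustrative sketch of that format (the questions and answer text are invented for the example); in practice the tokenizer's chat template assembles the prompt, including the <s> and </s> special tokens.

```python
# Illustrative sketch of the [INST] chat format used by Mistral instruct models.
# In practice the tokenizer's chat template builds these strings automatically.
single_turn = "<s>[INST] Summarize this clause in two sentences. [/INST]"
multi_turn = (
    "<s>[INST] What is a mixture-of-experts model? [/INST]"
    " A model whose feed-forward layers are split into expert subnetworks.</s>"
    "[INST] How many experts does Mixtral activate per token? [/INST]"
)
```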

Architecture and Efficiency

The architecture contains 46.7 billion parameters in total, but a sparse routing mechanism selects two of the eight expert feed-forward blocks in each layer, so only about 12.9 billion parameters are active for any given token during inference. This gives the model roughly the inference speed and cost of a 13B dense model while retaining quality competitive with much larger dense models. It natively supports a context window of 32,768 tokens, enabling it to handle long-form documents and extended multi-turn dialogues.
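The sketch below illustrates this style of top-2 expert routing in PyTorch. It is a simplified stand-in, not Mixtral's actual implementation: the SparseMoELayer class, its dimensions, and the toy expert networks are invented for illustration; only the routing pattern (each token is processed by two of eight experts, weighted by a softmax over the router logits) reflects the design described above.

```python
# Minimal sketch of top-2 sparse mixture-of-experts routing (illustrative sizes only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    def __init__(self, dim=64, hidden=256, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, num_experts, bias=False)  # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                      # x: (tokens, dim)
        logits = self.gate(x)                  # (tokens, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e          # tokens routed to expert e at slot k
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(10, 64)
print(SparseMoELayer()(tokens).shape)          # torch.Size([10, 64])
```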

Capabilities

Mixtral 8x7B Instruct provides multilingual support for English, French, Italian, German, and Spanish. It demonstrates strong proficiency in code generation and mathematics, frequently matching or exceeding the performance of larger proprietary models in these specialized domains. The model weights are released under the Apache 2.0 license, facilitating broad use in both research and commercial applications.
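A minimal sketch of running the model through the Hugging Face transformers library is shown below. It assumes the published checkpoint id mistralai/Mixtral-8x7B-Instruct-v0.1 and enough GPU memory for the full parameter count at half precision; the prompt and generation settings are illustrative, not recommendations.

```python
# Minimal sketch: chat-style generation with Hugging Face transformers.
# Assumes the public checkpoint "mistralai/Mixtral-8x7B-Instruct-v0.1" and
# sufficient GPU memory (roughly 90 GB at float16) across available devices.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# French prompt asking for Python code, exercising the multilingual and coding abilities.
messages = [{"role": "user", "content": "Écris une fonction Python qui inverse une chaîne de caractères."}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```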

Rankings & Comparison