Mixtral 8x7B Instruct v0.1 is a sparse mixture-of-experts (SMoE) language model developed by Mistral AI. It uses a decoder-only transformer architecture in which each layer's feedforward block chooses from eight distinct groups of parameters (experts). For every token, a routing network selects two experts to compute the output, allowing the model to draw on a total of 46.7 billion parameters while using only approximately 12.9 billion active parameters per token during inference.

This version is fine-tuned for instruction following and dialogue using supervised fine-tuning and Direct Preference Optimization (DPO). It supports a context window of 32,768 tokens and is proficient in multiple languages, including English, French, Italian, German, and Spanish. The model demonstrates strong capabilities in reasoning, mathematics, and code generation, matching or exceeding the performance of significantly larger models in its category.

The model weights are released under the Apache 2.0 license, permitting broad use for both commercial and research purposes. Its design is optimized for high-throughput, cost-efficient inference, effectively operating at the speed and cost of a 12.9B-parameter model while providing the capacity of a much larger network.
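The per-token routing described above can be illustrated with a minimal sketch. This is a toy top-2 mixture-of-experts layer, not Mixtral's actual implementation: the dimensions are tiny, the weights are random, and a ReLU stands in for the model's real activation function.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_ff, n_experts, top_k = 16, 32, 8, 2  # toy sizes, not Mixtral's real dims

# One toy feedforward "expert" per slot: two linear layers with a ReLU between.
experts = [
    (rng.normal(size=(d_model, d_ff)), rng.normal(size=(d_ff, d_model)))
    for _ in range(n_experts)
]
router = rng.normal(size=(d_model, n_experts))  # gating network weights

def moe_layer(x):
    """Route a single token vector x through the top-2 of 8 experts."""
    logits = x @ router                    # one routing score per expert
    top = np.argsort(logits)[-top_k:]      # indices of the 2 best-scoring experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()               # softmax over the selected experts only
    out = np.zeros(d_model)
    for w, i in zip(weights, top):
        w1, w2 = experts[i]
        out += w * (np.maximum(x @ w1, 0.0) @ w2)  # weighted expert output
    return out

token = rng.normal(size=d_model)
y = moe_layer(token)
print(y.shape)  # (16,)
```

Because only 2 of the 8 expert feedforward blocks run for any given token, the compute per token scales with the active experts rather than with the full parameter count, which is the source of the model's inference-cost advantage.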
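The 46.7B-total versus 12.9B-active figures can be sanity-checked with back-of-the-envelope arithmetic from Mixtral's publicly released configuration (hidden size 4096, SwiGLU feedforward size 14336, 32 layers, 8 experts with 2 active, grouped-query attention with 8 KV heads, vocabulary 32000). This is an approximation: layer norms and the small router matrices are omitted.

```python
# Rough parameter count for Mixtral 8x7B from its published config values.
d_model, d_ff = 4096, 14336
n_layers, n_experts, n_active = 32, 8, 2
n_heads, n_kv_heads, vocab = 32, 8, 32000
head_dim = d_model // n_heads  # 128

embed = vocab * d_model * 2                     # input embeddings + LM head
attn = (2 * d_model * d_model                   # Wq and Wo projections
        + 2 * d_model * n_kv_heads * head_dim)  # smaller Wk and Wv under GQA
expert = 3 * d_model * d_ff                     # SwiGLU: gate, up, down matrices

total = embed + n_layers * (attn + n_experts * expert)
active = embed + n_layers * (attn + n_active * expert)
print(f"total:  {total / 1e9:.1f}B")   # total:  46.7B
print(f"active: {active / 1e9:.1f}B")  # active: 12.9B
```

Note that the active count is well above 46.7B / 4: the embeddings, LM head, and attention weights are shared across all tokens, so only the expert feedforward blocks are sparsified.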