Mistral 7B Instruct v0.2 is an instruction-fine-tuned version of the Mistral 7B v0.2 base model, optimized for conversation and instruction following. It supersedes the initial v0.1 instruct release, with improved context handling and architectural refinements aimed at better dialogue performance.
The model is built on a 7-billion-parameter decoder-only transformer architecture. A primary distinction of the v0.2 iteration is the expansion of the context window to 32,768 tokens, quadrupling the 8,192-token limit of its predecessor. This version also removes the sliding window attention mechanism used in v0.1, applying standard full attention across the entire context window to simplify integration and improve modeling of long-range dependencies.
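The difference between the two attention schemes can be illustrated with a minimal NumPy sketch (illustrative only; the sequence length and window size below are toy values, not Mistral's actual configuration). v0.1-style sliding window attention restricts each token to a fixed number of recent positions, while v0.2-style full causal attention lets every token attend to all earlier tokens:

```python
import numpy as np

def causal_mask(n):
    # Full causal attention: token i attends to every token j <= i.
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    return j <= i

def sliding_window_mask(n, window):
    # Sliding window attention (v0.1 style): token i attends only to
    # the most recent `window` tokens, i.e. j in (i - window, i].
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    return (j <= i) & (j > i - window)

n, w = 16, 4  # toy sizes for illustration
full = causal_mask(n)
windowed = sliding_window_mask(n, w)

# The last token sees the whole sequence under full attention,
# but only the window under sliding window attention.
print(full[-1].sum(), windowed[-1].sum())  # 16 4
```

Removing the window means attention cost grows quadratically with sequence length, but every position in the 32,768-token context is directly reachable, which is what improves long-range dependency handling.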
Mistral 7B Instruct v0.2 uses grouped-query attention (GQA) for faster inference and reduced memory overhead during generation. It is released under the Apache 2.0 license, permitting both research and commercial use. The model is designed to maintain high efficiency and strong performance on tasks such as reasoning, mathematics, and code generation.
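In GQA, several query heads share each key/value head, so the KV cache holds far fewer heads than a standard multi-head layout. The following is a minimal NumPy sketch of the idea under assumed toy dimensions (the head counts and sizes are illustrative, not Mistral's real configuration, and causal masking is omitted for brevity):

```python
import numpy as np

def gqa_attention(x, wq, wk, wv, n_q_heads, n_kv_heads):
    # Grouped-query attention: n_q_heads query heads share n_kv_heads
    # key/value heads; each KV head serves a group of query heads.
    seq, d = x.shape
    dh = d // n_q_heads                       # per-head dimension
    q = (x @ wq).reshape(seq, n_q_heads, dh)
    k = (x @ wk).reshape(seq, n_kv_heads, dh)
    v = (x @ wv).reshape(seq, n_kv_heads, dh)
    group = n_q_heads // n_kv_heads
    # Broadcast each KV head across its group of query heads.
    k = np.repeat(k, group, axis=1)
    v = np.repeat(v, group, axis=1)
    scores = np.einsum("qhd,khd->hqk", q, k) / np.sqrt(dh)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    out = np.einsum("hqk,khd->qhd", weights, v)
    return out.reshape(seq, d)

rng = np.random.default_rng(0)
d, n_q, n_kv, seq = 32, 8, 2, 5  # toy sizes, not the model's real ones
x = rng.normal(size=(seq, d))
wq = rng.normal(size=(d, d))
wk = rng.normal(size=(d, (d // n_q) * n_kv))  # KV projections are smaller
wv = rng.normal(size=(d, (d // n_q) * n_kv))
y = gqa_attention(x, wq, wk, wv, n_q, n_kv)
print(y.shape)  # (5, 32)
```

Because only the n_kv key/value heads are cached per token during generation, the KV cache shrinks by a factor of n_q / n_kv (4x in this sketch) relative to full multi-head attention, which is the source of GQA's speed and memory benefits.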