# phi-3-mini-128k-instruct

Microsoft · Open Weights · Released Apr 2024

Context: 128K tokens · Parameters: 3.8B · Arena AI rank: #253

Phi-3-mini-128k-instruct is a lightweight, open-weight language model developed by Microsoft as part of the Phi-3 family. It contains 3.8 billion parameters and is fine-tuned specifically for instruction following. This variant is distinguished by its 128,000-token context window, enabling it to process and reason over significantly longer documents than standard small language models (SLMs).

## Architecture and Training

The model uses a dense decoder-only transformer architecture. It was trained with a data-centric approach focused on high-quality, reasoning-dense information: a dataset of 3.3 trillion tokens comprising heavily filtered web data and synthetic, "textbook-quality" data designed to teach mathematics, coding, and logical reasoning. This methodology allows the model to achieve performance levels often associated with much larger models.

## Capabilities

Phi-3-mini-128k-instruct is optimized for reasoning, mathematics, and code generation. Its expanded context window makes it particularly suitable for long-form document summarization, retrieval-augmented generation (RAG) applications, and maintaining coherence over extended multi-turn conversations. To improve alignment and safety, the model underwent a post-training process involving supervised fine-tuning (SFT) and direct preference optimization (DPO).
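
## Usage

A minimal usage sketch with the Hugging Face transformers library, assuming the checkpoint is published on the Hub as microsoft/Phi-3-mini-128k-instruct (older transformers releases may additionally require trust_remote_code=True when loading):

```python
# Minimal sketch: load the instruct variant and run one chat turn.
# Assumes the Hub checkpoint ID "microsoft/Phi-3-mini-128k-instruct".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-128k-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision keeps the 3.8B model compact in memory
    device_map="auto",
)

# The instruct variant expects a chat-style prompt; the tokenizer's chat
# template applies the model-specific formatting.
messages = [
    {
        "role": "user",
        "content": "Summarize the key idea behind retrieval-augmented "
                   "generation in two sentences.",
    },
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

The same chat-template flow covers the long-context use cases above: retrieved passages or a long document simply go into the user message, subject to the 128K-token window.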
