Phi-3-small-8k-instruct is a 7-billion-parameter language model developed by Microsoft as part of the Phi-3 family of small language models (SLMs). It is designed to deliver strong performance on reasoning-dense tasks such as mathematics, coding, and logical deduction while remaining small enough to run efficiently in compute-constrained environments. This version supports a context window of 8,192 (8K) tokens.
Architecture and Training
The model uses a dense decoder-only Transformer architecture with alternating layers of dense and blocksparse attention. It was trained on 4.8 trillion tokens comprising a mixture of high-quality synthetic datasets and filtered publicly available web content. The instruct version underwent post-training with Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) to improve instruction following and safety alignment.
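The idea behind alternating dense and blocksparse attention can be sketched with attention masks: dense layers let every query attend to all earlier positions, while blocksparse layers restrict attention to a local block, cutting cost on long sequences. The sketch below is a toy illustration only; the block size and the actual sparsity pattern used in Phi-3-small (which is more elaborate than a purely block-local mask) are assumptions for demonstration.

```python
# Toy illustration of alternating dense / block-local sparse causal
# attention masks. This is a conceptual sketch, not Microsoft's actual
# blocksparse kernel: the real pattern and block size differ.

def dense_mask(n):
    """Causal mask: each query attends to itself and all earlier keys."""
    return [[q >= k for k in range(n)] for q in range(n)]

def blocksparse_mask(n, block=4):
    """Causal mask restricted to the query's own local block of keys."""
    return [[q >= k and q // block == k // block for k in range(n)]
            for q in range(n)]

def layer_mask(layer, n, block=4):
    """Alternate the two patterns: even layers dense, odd layers sparse."""
    return dense_mask(n) if layer % 2 == 0 else blocksparse_mask(n, block)

# For n = 8, the dense causal mask allows 36 query-key pairs, while the
# block-local mask (block = 4) allows only 20.
```

Counting the allowed entries shows the saving: the dense causal mask grows quadratically with sequence length, while the block-local mask grows only linearly, which is what makes interleaving the two attractive at long context lengths.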
Capabilities
Despite its relatively small parameter count, the model is designed to compete with larger models on benchmarks measuring common sense, language understanding, and logic. It uses a tiktoken-based tokenizer with a vocabulary of 100,352 tokens. Its primary use cases are scenarios that require low latency and strong reasoning within a limited memory budget.
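As an instruct model, it is typically prompted through a chat template. The sketch below builds a single-turn prompt by hand using the `<|user|>`/`<|assistant|>`/`<|end|>` markers published for the Phi-3 family; in real code, prefer the tokenizer's `apply_chat_template` method so the template always matches the checkpoint.

```python
# Minimal sketch of a Phi-3-style single-turn chat prompt. The special
# markers below follow the chat format published for the Phi-3 family;
# use tokenizer.apply_chat_template in practice rather than hand-built
# strings.

def format_phi3_prompt(user_message):
    """Wrap a user message in Phi-3 chat markers, leaving the prompt
    open at <|assistant|> so the model generates the reply."""
    return f"<|user|>\n{user_message}<|end|>\n<|assistant|>"
```

Ending the prompt at the `<|assistant|>` marker signals that the model should produce the assistant turn; the model emits `<|end|>` when its reply is complete.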