Microsoft Azure
Open Weights

# Phi-4

Released Dec 2024

- Intelligence: #381
- Coding: #266
- Math: #214
- Context: 16K
- Parameters: 14.7B

Phi-4 is a 14.7 billion parameter language model developed by Microsoft, designed to deliver strong reasoning and linguistic performance in a relatively small footprint. It is the fourth generation in the Phi series of small language models (SLMs) and uses a dense decoder-only transformer architecture. The model targets memory- or compute-constrained environments where high logical accuracy is required.

## Training and Data

The training of Phi-4 combined synthetic datasets, filtered public-domain websites, and acquired academic content. This "textbook-quality" data approach is intended to enhance the model's proficiency in complex tasks such as mathematics, coding, and logical reasoning. Training covered 9.8 trillion tokens over 21 days on a cluster of 1,920 H100 GPUs. Post-training techniques, including supervised fine-tuning (SFT) and direct preference optimization (DPO), were applied to improve instruction adherence.

## Capabilities and Benchmarks

Phi-4 is optimized for advanced reasoning and performs competitively on benchmarks such as GPQA, MMLU, and HumanEval, often outperforming much larger models on specialized logic tasks. It supports a 16K-token context window and is designed for general-purpose AI applications that prioritize reasoning and logic over sheer parameter count. Since its initial release, Microsoft has expanded the Phi-4 family with specialized multimodal and miniaturized versions.
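To make the "small footprint" claim concrete, the sketch below estimates the raw weight-storage memory for a 14.7B-parameter model at common numeric precisions. This is a back-of-envelope calculation only: it counts weight bytes and ignores activations, KV cache, and framework overhead, which add to real serving memory.

```python
# Back-of-envelope memory estimate for storing the weights of a
# 14.7B-parameter model (Phi-4's size) at common numeric precisions.
# Ignores activations, KV cache, and runtime overhead.

PARAMS = 14.7e9  # Phi-4 parameter count

BYTES_PER_PARAM = {
    "fp32": 4,        # full precision
    "fp16/bf16": 2,   # half precision, typical for inference
    "int8": 1,        # 8-bit quantization
    "int4": 0.5,      # 4-bit quantization
}

def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9

for precision, nbytes in BYTES_PER_PARAM.items():
    print(f"{precision:>10}: ~{weight_memory_gb(PARAMS, nbytes):.1f} GB")
```

At half precision the weights alone come to roughly 29 GB, which is why quantized variants are commonly used to fit such models on a single consumer GPU.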
