Mistral Small 4 is a 119-billion parameter Mixture-of-Experts (MoE) model released by Mistral AI in March 2026. It represents a significant evolution in the Mistral Small family by unifying the capabilities of three previously distinct model lines: Magistral (reasoning), Pixtral (vision), and Devstral (coding). This consolidation allows a single deployment to handle workflows as diverse as document analysis, agentic software development, and complex mathematical reasoning.
The model's architecture utilizes 128 experts, with 4 active per token, resulting in approximately 6 billion active parameters during inference (8 billion including embedding layers). This sparse design enables the model to deliver the performance of a large-scale system while maintaining the inference costs and latency profiles of much smaller models. Compared to the previous generation, Mistral Small 4 offers a 40% reduction in end-to-end completion time and up to three times higher throughput in optimized serving environments.
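The sparse activation described above can be illustrated with a toy top-k router. This is a minimal sketch of the general MoE routing technique, not Mistral's actual implementation; the gating details (softmax over the top-k logits) are a common convention assumed here for illustration.

```python
# Toy sketch of top-k expert routing in a sparse MoE layer:
# 4 of 128 experts are selected per token, so only those experts'
# feed-forward networks run, keeping active parameters low.
import math
import random

NUM_EXPERTS = 128
TOP_K = 4

def route(logits):
    """Pick the top-k experts for one token and softmax-normalize their gates."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:TOP_K]
    hi = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - hi) for i in top]  # shift for numerical stability
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

random.seed(0)
token_logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
assignment = route(token_logits)
# Only the 4 selected expert FFNs execute for this token;
# their gate weights sum to 1 and weight the experts' outputs.
```

In a real model the router logits come from a learned linear layer over the token's hidden state, and the weighted expert outputs are summed to form the layer's output.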
A defining feature of Mistral Small 4 is its configurable reasoning_effort parameter, which lets users toggle between a fast instruct mode and a deeper reasoning mode. In the standard instruct configuration, the model is optimized for concise, low-latency instruction following; benchmarks indicate that in this mode it produces significantly shorter outputs than comparable models while maintaining high accuracy on coding and general-knowledge tasks.
Released under the Apache 2.0 license, Mistral Small 4 is natively multimodal, supporting the simultaneous processing of text and image inputs within a 256k-token context window. It provides robust support for system prompts, native function calling, and structured JSON output, making it highly suitable for agentic applications and enterprise-grade tool use.
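The tool-use and structured-output support described above is typically exercised through a request like the following sketch. The tool schema follows the widely used OpenAI-style function-calling format; the model identifier and the response_format field are assumptions for illustration rather than documented specifics of Mistral Small 4's API.

```python
# Sketch of a function-calling request with structured JSON output enabled.
# Tool schema follows the common OpenAI-style format; model name and
# response_format field are illustrative assumptions.
import json

get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

request = {
    "model": "mistral-small-4",  # assumed identifier
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "What's the weather in Lyon?"},
    ],
    "tools": [get_weather_tool],
    "response_format": {"type": "json_object"},  # request structured JSON output
}
body = json.dumps(request)
```

An agentic loop would send this body, inspect the response for a tool call, execute get_weather locally, and append the result as a tool message before re-invoking the model.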