Mistral Small 3 (also known as Mistral Small 2501) is a 24-billion-parameter language model developed by Mistral AI, designed for high efficiency and low latency. Released under the Apache 2.0 license, it aims to provide a high-performance open-weight option that can be deployed on consumer-grade hardware, such as a single RTX 4090, while rivaling the capabilities of significantly larger models.
The model utilizes a transformer architecture optimized for speed, featuring a reduced number of layers compared to standard configurations to minimize the time per forward pass. It employs the Tekken tokenizer, which has a vocabulary size of 131,072, allowing for more efficient text compression and improved multilingual processing across dozens of languages including English, French, German, Spanish, Chinese, and Japanese.
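The compression benefit of a large vocabulary can be illustrated with a toy greedy longest-match tokenizer. This is a simplified stand-in for how subword tokenizers like Tekken work, not Tekken's actual algorithm, and the two vocabularies below are invented for demonstration: when the vocabulary contains longer entries, the same text splits into fewer tokens.

```python
def greedy_tokenize(text, vocab):
    """Toy greedy longest-match subword tokenizer.

    At each position, take the longest vocabulary entry that matches;
    fall back to a single character if nothing matches.
    """
    tokens = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab or j == i + 1:
                tokens.append(piece)
                i = j
                break
    return tokens

# Hypothetical vocabularies: the larger one adds longer merged entries.
small_vocab = {"lang", "uage", "mod", "el"}
large_vocab = small_vocab | {"language", "model", "language model"}

text = "language model"
print(greedy_tokenize(text, small_vocab))  # ['lang', 'uage', ' ', 'mod', 'el']
print(greedy_tokenize(text, large_vocab))  # ['language model']
```

Fewer tokens per text means fewer forward passes per generated response, which compounds with the model's reduced layer count to lower end-to-end latency.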
Mistral Small 3 is engineered for agentic workflows, featuring native support for function calling and structured JSON outputs. In benchmark evaluations, the model demonstrated strong reasoning and instruction-following capabilities, achieving over 81% on the MMLU benchmark. It supports a context window of 32,768 tokens, making it suitable for fast-response conversational agents and subject-matter-specific fine-tuning.
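The agentic pattern above can be sketched as a minimal dispatch loop. The tool name, schema, and model output below are hypothetical and the model call is mocked; this is not Mistral's exact wire format, only the general shape of function calling: the model emits a structured JSON tool request instead of free text, and the host application parses it and invokes the matching function.

```python
import json

def get_weather(city: str) -> dict:
    """Stub tool standing in for a real weather API call."""
    return {"city": city, "temp_c": 18}

# Registry mapping tool names the model may request to local functions.
TOOLS = {"get_weather": get_weather}

# Mocked model output: a function-calling model returns the tool name
# and JSON-encoded arguments rather than a natural-language answer.
model_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'

call = json.loads(model_output)
result = TOOLS[call["name"]](**call["arguments"])
print(result)  # {'city': 'Paris', 'temp_c': 18}
```

In a real deployment the tool result would be fed back to the model as a follow-up message so it can compose the final answer; structured JSON output guarantees the arguments parse reliably on every turn.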