Sarvam M (also known as sarvam-m) is a multilingual hybrid-reasoning large language model developed by the Indian AI startup Sarvam AI. Built on the Mistral Small 24B architecture, the model is specialized for the Indian context, supporting 11 major Indic languages, including Hindi, Bengali, Gujarati, Kannada, Malayalam, Marathi, Odia, Punjabi, Tamil, and Telugu. It was designed to address the gap in high-quality reasoning capabilities for Indian languages, particularly in technical domains such as mathematics and programming.
The model features a Hybrid Thinking Mode that lets users toggle between "think" and "non-think" modes. The think mode is tailored to complex logical chains, mathematical problem-solving, and coding tasks, while the non-think mode produces efficient, general-purpose conversational output. This dual-mode behavior is instilled through post-training, combining Supervised Fine-Tuning (SFT) with Reinforcement Learning with Verifiable Rewards (RLVR).
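As an illustration, the sketch below shows how such a mode toggle is typically exposed through a chat template in the Hugging Face transformers library. The repository id sarvamai/sarvam-m and the enable_thinking flag are assumptions based on how comparable hybrid-reasoning models surface this switch; the model card documents the exact interface.

```python
# Minimal sketch of toggling a hybrid thinking mode via a Hugging Face
# chat template. `enable_thinking` is an assumed template kwarg; the
# actual parameter name for Sarvam M may differ.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sarvamai/sarvam-m"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Solve step by step: 48 * 25 = ?"}]

# "think" mode: the template inserts markers that elicit an explicit
# reasoning trace before the final answer; set False for the faster
# non-think conversational mode.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
))
```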
Architecturally, Sarvam M uses a decoder-only Transformer design with Sliding Window Attention (SWA) and Rotary Positional Embeddings (RoPE). It is heavily optimized for Indic scripts: its custom tokenizer significantly reduces token counts for Indian languages compared with tokenizers trained primarily on English and other Western-language text. This efficiency translates into faster inference and lower costs when processing native-script or romanized Indic text.
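The effect of tokenizer efficiency can be checked empirically by counting tokens for the same sentence under two tokenizers. The sketch below compares an Indic-optimized tokenizer against an English-centric byte-level BPE baseline; the sarvamai/sarvam-m repo id is an assumption for illustration, and any two tokenizers can be swapped in.

```python
# Rough comparison of token counts on Devanagari text between an
# Indic-optimized tokenizer and an English-centric baseline (GPT-2).
from transformers import AutoTokenizer

text = "भारत एक विशाल और विविधतापूर्ण देश है।"  # "India is a vast and diverse country."

indic_tok = AutoTokenizer.from_pretrained("sarvamai/sarvam-m")  # assumed repo id
generic_tok = AutoTokenizer.from_pretrained("gpt2")             # English-centric baseline

print("Indic-optimized tokens:", len(indic_tok.encode(text)))
print("English-centric tokens:", len(generic_tok.encode(text)))
# Fewer tokens per sentence means fewer decoder steps per request,
# which is what yields the faster inference and lower cost noted above.
```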
Sarvam M demonstrates significant performance improvements over its base model, with a reported 20% average gain on Indian language benchmarks and over 21% improvement in mathematical reasoning. It is particularly effective at the intersection of multilingualism and logic, showing an 86% performance boost on romanized Indian language versions of the GSM-8K benchmark. The model's alignment also includes specialized "character training" to ensure responses reflect Indian cultural values and to mitigate the political and geographic biases common in global datasets.