The Granite-4.0-H-Small is a hybrid mixture-of-experts (MoE) language model developed by IBM as part of the Granite 4.0 collection. It features a novel architecture that sequentially combines Mamba-2 layers and conventional transformer blocks in a 9:1 ratio. This design is intended to leverage the linear-scaling efficiency of state-space models for global context processing alongside the precise local context parsing of transformers, resulting in significantly reduced memory requirements compared to traditional transformer-only models.
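The interleaving described above can be sketched as a simple layer schedule. This is an illustrative sketch, not IBM's implementation: the `hybrid_layer_pattern` helper and the 40-layer depth are assumptions chosen to show how a 9:1 Mamba-2-to-attention ratio plays out across a stack.

```python
# Illustrative sketch (not IBM's code): build a layer-type schedule that
# interleaves Mamba-2 blocks and transformer (attention) blocks in the
# 9:1 ratio described above. The depth of 40 layers is hypothetical.

def hybrid_layer_pattern(num_layers: int, mamba_per_attention: int = 9) -> list[str]:
    """Return a schedule of layer types: 9 Mamba-2 blocks per attention block."""
    period = mamba_per_attention + 1  # one full 9:1 group
    schedule = []
    for i in range(num_layers):
        # Place an attention block at the end of each group of `period` layers.
        if (i + 1) % period == 0:
            schedule.append("attention")
        else:
            schedule.append("mamba2")
    return schedule

# A hypothetical 40-layer stack yields 36 Mamba-2 blocks and 4 attention blocks.
schedule = hybrid_layer_pattern(40)
print(schedule.count("mamba2"), schedule.count("attention"))  # 36 4
```

Because most layers are Mamba-2, the KV cache only exists for the few attention blocks, which is where the reduced memory footprint comes from.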
With 32 billion total parameters, of which roughly 9 billion are active per token at inference, the model is engineered for high-performance enterprise tasks while maintaining a compact enough footprint for deployment on cost-effective hardware. It is optimized for long-context reasoning, having been trained on data samples up to 512K tokens and validated for performance at a 128K token context window.
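The gap between total and active parameters follows from how MoE routing works: a learned router scores all experts per token but forwards the token through only the top-k, so only those experts' weights participate in the forward pass. The sketch below is a minimal, hedged illustration of top-k routing in pure Python; the expert counts and the `route_top_k` helper are assumptions for demonstration, not Granite's actual router configuration.

```python
import math

def route_top_k(expert_logits: list[float], k: int = 2) -> dict[int, float]:
    """Illustrative MoE router: softmax over expert logits, keep the top-k
    experts, and renormalize their weights so they sum to 1.
    (Hypothetical helper; not Granite's actual routing code.)"""
    # Numerically stable softmax over all expert logits.
    m = max(expert_logits)
    exps = [math.exp(x - m) for x in expert_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Select the k highest-probability experts.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Renormalize so the selected experts' mixing weights sum to 1.
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}

# Only the chosen experts' parameters are touched for this token.
weights = route_top_k([1.0, 3.0, 0.5, 2.0], k=2)
print(weights)  # experts 1 and 3 selected, weights summing to 1
```

Scaling this up, a model can hold many experts' worth of parameters in total while each token pays the compute cost of only a few, which is why the active-parameter count (9B) is far below the total (32B).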
Key capabilities of Granite-4.0-H-Small include advanced tool calling, multi-session agentic workflows, and document summarization. It supports 12 languages, including English, German, Spanish, French, Japanese, and Chinese. The model was trained on a diverse dataset of 15 trillion tokens using a pipeline focused on security, governance, and transparency.
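Tool calling generally works by giving the model a JSON schema for each tool and parsing the structured call it emits. The sketch below is a generic illustration of that flow, assuming the common JSON-Schema tool-definition shape; the `get_weather` tool, the `parse_tool_call` helper, and the simulated reply are hypothetical, and Granite's exact chat template may differ.

```python
import json

# Hypothetical tool definition in the widely used JSON-Schema convention.
weather_tool = {
    "name": "get_weather",
    "description": "Get current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def parse_tool_call(reply: str) -> tuple[str, dict]:
    """Parse a model reply that requests a tool invocation.
    (Illustrative helper, not part of any Granite API.)"""
    call = json.loads(reply)
    return call["name"], call["arguments"]

# A simulated assistant reply requesting a tool invocation.
raw_reply = '{"name": "get_weather", "arguments": {"city": "Zurich"}}'
name, args = parse_tool_call(raw_reply)
print(name, args)  # get_weather {'city': 'Zurich'}
```

In an agentic workflow, the application executes the named tool with the parsed arguments and feeds the result back to the model as the next turn.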
Released under the Apache 2.0 license, the Granite 4.0 family is the first open model collection to receive ISO 42001 certification, an international standard for Artificial Intelligence Management Systems. The model weights are cryptographically signed to ensure provenance and authenticity for enterprise users.