Granite 4.0 H 350M is a lightweight "nano" language model developed by IBM, specifically designed for on-device and edge AI applications. As part of the Granite 4.0 family, it is optimized for resource-constrained environments such as mobile devices, PCs, and enterprise edge infrastructure, prioritizing efficiency and low latency for high-volume tasks.
The model features a hybrid architecture that interleaves Mamba-2 State Space Model (SSM) blocks with traditional Transformer attention blocks, typically in a 9:1 ratio of Mamba to attention layers. This design significantly reduces memory requirements and speeds up inference compared to transformer-only models of similar scale. While larger models in the Granite 4.0 family use Mixture-of-Experts (MoE), the 350M H variant is a dense hybrid model.
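To make the 9:1 interleaving concrete, here is a minimal sketch (not IBM's implementation) of how such a hybrid layer schedule could be laid out, with one attention block per group of ten layers; the function name and layout are illustrative assumptions:

```python
# Illustrative sketch: a hybrid layer schedule with a 9:1
# Mamba-to-attention ratio, as described above. Not IBM's actual code.

def hybrid_schedule(num_layers: int, mamba_per_attention: int = 9) -> list[str]:
    """Return a layer-type list placing one attention block per group of
    (mamba_per_attention + 1) layers, with Mamba-2 blocks everywhere else."""
    schedule = []
    for i in range(num_layers):
        # One attention block at the end of each group; Mamba-2 otherwise.
        if (i + 1) % (mamba_per_attention + 1) == 0:
            schedule.append("attention")
        else:
            schedule.append("mamba2")
    return schedule

# For a hypothetical 20-layer stack: 18 Mamba-2 blocks and 2 attention blocks.
layers = hybrid_schedule(20)
```

Because Mamba-2 blocks maintain a fixed-size recurrent state rather than a growing key-value cache, keeping attention blocks rare is what drives the memory savings at long context lengths.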
Despite its compact size, the model is instruction-tuned for robust performance in tool calling, function execution, and retrieval-augmented generation (RAG). It supports a context window of approximately 32,000 tokens and provides multilingual capabilities across 12 languages, including English, German, French, Spanish, Japanese, and Chinese. It is also capable of Fill-In-the-Middle (FIM) tasks for code-related applications.
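As an illustration of the FIM capability, the sketch below assembles a fill-in-the-middle prompt. The sentinel token names (`<fim_prefix>`, `<fim_suffix>`, `<fim_middle>`) and their ordering are assumptions borrowed from common FIM conventions, not confirmed Granite specifics; the model's tokenizer configuration should be consulted for the exact special tokens it expects:

```python
# Hedged sketch: building a FIM prompt in prefix-suffix-middle order.
# Token names here are assumed conventions, not verified Granite tokens.

def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a FIM prompt; the model generates the missing middle
    span after the <fim_middle> marker."""
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

prompt = build_fim_prompt(
    "def add(a, b):\n    return ",
    "\n\nprint(add(2, 3))",
)
```

The completion produced after `<fim_middle>` would then be spliced back between the prefix and suffix in the editor, which is the typical workflow for code-infilling assistants.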
Released under the Apache 2.0 license, Granite 4.0 H 350M belongs to the first open model family certified under ISO/IEC 42001:2023. This certification reflects IBM's adherence to international standards for responsible AI development, safety, and data governance in enterprise environments.