Jamba 1.6 Large is a hybrid large language model developed by AI21 Labs that combines State Space Model (SSM) and Transformer architectures. Built on the Mamba framework within a Mixture-of-Experts (MoE) structure, it is designed for high efficiency and performance in enterprise-grade applications. The model features a 256K-token context window and delivers inference up to 2.5 times faster than traditional Transformer-only models of similar size.
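The hybrid design can be pictured as a stack in which most layers use a Mamba (SSM) mixer and only a periodic minority use attention, with MoE replacing the dense MLP in alternating layers. The sketch below illustrates such a schedule; the one-attention-layer-per-eight ratio and every-other-layer MoE placement follow the published Jamba architecture, but the exact layout and depth of Jamba 1.6 Large are assumptions here, not a confirmed specification.

```python
# Illustrative sketch of a Jamba-style hybrid layer schedule.
# num_layers, attn_period, attn_offset, and moe_period are hypothetical
# parameters chosen to mirror the published Jamba design, not the
# confirmed configuration of Jamba 1.6 Large.

def jamba_layer_schedule(num_layers: int = 72,
                         attn_period: int = 8,
                         attn_offset: int = 4,
                         moe_period: int = 2) -> list[str]:
    """Return a per-layer tag of the form '<mixer>+<mlp>'."""
    schedule = []
    for i in range(num_layers):
        # One attention layer per block of attn_period layers; the rest are Mamba.
        mixer = "attention" if i % attn_period == attn_offset else "mamba"
        # MoE replaces the dense MLP in every other layer.
        mlp = "moe" if i % moe_period == 1 else "dense"
        schedule.append(f"{mixer}+{mlp}")
    return schedule

sched = jamba_layer_schedule()
print(sched[:8])
```

Counting the tags shows the intended sparsity: with these parameters only 9 of 72 layers use attention, which is what keeps long-context inference cheap relative to a pure Transformer.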
Architecture and Capabilities
The model has a total of 398 billion parameters, of which 94 billion are active during inference. The hybrid architecture pairs the long-context efficiency of SSMs with the high-quality reasoning and attention capabilities of Transformers. Jamba 1.6 Large is optimized for long-form text processing, retrieval-augmented generation (RAG) with citation-grounded answers, and complex document synthesis. It includes native support for structured outputs (JSON mode), function calling, and tool use.
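The gap between total and active parameters comes from MoE gating: for each token, a router selects only a few experts, so most expert weights sit idle on any given forward pass. The pure-Python sketch below shows generic top-k gating; the 16-expert, top-2 configuration mirrors the published Jamba papers and is an assumption for Jamba 1.6 Large, not a confirmed figure.

```python
import math

# Minimal top-k Mixture-of-Experts routing sketch (pure Python).
# 16 experts with top-2 selection follows the published Jamba design;
# the exact expert count for Jamba 1.6 Large is an assumption.

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def route_token(router_logits, top_k=2):
    """Pick the top_k experts for one token and renormalize their gate weights."""
    probs = softmax(router_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]
    total = sum(probs[i] for i in chosen)
    return [(i, probs[i] / total) for i in chosen]  # (expert index, weight)

logits = [0.1, 2.0, -1.0, 0.5] + [0.0] * 12  # one token's router scores over 16 experts
print(route_token(logits))
```

Because only `top_k` of the experts run per token, the compute (and the "active" parameter count) scales with `top_k` rather than with the full expert pool.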
Performance and Multilingual Support
Jamba 1.6 Large is designed for global enterprise use, supporting English, Spanish, French, Portuguese, Italian, Dutch, German, Arabic, and Hebrew. In evaluations it posted strong results on benchmarks such as Arena-Hard, CRAG, and FinanceBench, excelling in particular at maintaining coherence across its 256K-token context. To ease deployment on standard hardware, it uses the ExpertsInt8 quantization technique, which stores the MoE expert weights in 8-bit integers so the model runs more efficiently on multi-GPU setups.
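The idea behind ExpertsInt8 is that expert weights, which dominate the parameter count, are stored as INT8 values with higher-precision scales and dequantized during inference. The sketch below is a generic symmetric per-row INT8 round-trip in pure Python; it illustrates the principle and is not the actual ExpertsInt8 kernel used in serving frameworks such as vLLM.

```python
# Symmetric per-row INT8 quantization round-trip (pure Python).
# ExpertsInt8 applies this kind of scheme to MoE expert weights only,
# keeping scales in higher precision; this is an illustrative sketch,
# not the real implementation.

def quantize_row(row):
    """Map a row of floats to INT8 values plus one float scale."""
    scale = max(abs(x) for x in row) / 127.0 or 1.0  # avoid zero scale
    q = [max(-127, min(127, round(x / scale))) for x in row]
    return q, scale

def dequantize_row(q, scale):
    """Recover approximate floats from INT8 values and the scale."""
    return [v * scale for v in q]

weights = [[0.12, -0.5, 0.33], [1.5, -2.0, 0.01]]  # toy expert weight rows
for row in weights:
    q, s = quantize_row(row)
    deq = dequantize_row(q, s)
    err = max(abs(a - b) for a, b in zip(row, deq))
    assert err <= s / 2 + 1e-9  # round-trip error bounded by half a quantization step
```

Storing each weight in one byte instead of two (or four) roughly halves (or quarters) expert memory, which is what lets a model of this size fit on a multi-GPU node.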