NVIDIA
Open Weights

NVIDIA Nemotron Nano 9B V2 (Reasoning)

Released Aug 2025

Intelligence: #292
Coding: #305
Math: #96
Context: 131K tokens
Parameters: 9B

NVIDIA Nemotron Nano 9B V2 (Reasoning) is a 9-billion parameter large language model designed for efficient inference and complex reasoning. It belongs to a family of models utilizing the Nemotron-H hybridization scheme, which integrates Mamba-2 state-space layers with a limited number of standard Transformer attention layers. This architecture is optimized to provide high throughput and low memory usage, particularly when processing long-context sequences.
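The long-context memory advantage of this hybrid design can be illustrated with back-of-envelope arithmetic: an attention layer's KV cache grows linearly with sequence length, while a Mamba-2 layer carries a fixed-size recurrent state. The dimensions below are hypothetical placeholders, not the model's actual configuration.

```python
# Rough per-layer memory comparison (illustrative sketch; head counts and
# state sizes below are assumed values, not Nemotron Nano 9B V2's real config).

def kv_cache_bytes(seq_len, n_kv_heads=8, head_dim=128, dtype_bytes=2):
    # Attention layer: two cached tensors (K and V), each of shape
    # [seq_len, n_kv_heads, head_dim] -> grows linearly with seq_len.
    return 2 * seq_len * n_kv_heads * head_dim * dtype_bytes

def mamba_state_bytes(d_inner=4096, d_state=128, dtype_bytes=2):
    # Mamba-2 layer: a fixed [d_inner, d_state] recurrent state,
    # independent of how many tokens have been processed.
    return d_inner * d_state * dtype_bytes

for seq_len in (4_096, 131_072):
    print(seq_len, kv_cache_bytes(seq_len), mamba_state_bytes())
# The attention cache scales 32x between these lengths; the SSM state does not.
```

This constant-memory property of the state-space layers is what drives the throughput and memory savings at long context, since only the few attention layers still pay the linear KV-cache cost.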

The model is a unified system capable of both standard chat interactions and deep reasoning. When configured for reasoning tasks, it generates an internal reasoning trace—similar to a chain-of-thought—before providing a final answer. This behavior can be controlled via a system prompt or a unique thinking budget control mechanism, which allows developers to specify the number of tokens the model is allowed to use for its internal reasoning process.

Technically, Nemotron Nano 9B V2 was derived through the pruning and distillation of a larger 12-billion parameter parent model pre-trained on 20 trillion tokens. By employing the Minitron compression strategy, NVIDIA reduced the model's footprint while maintaining high performance on benchmarks such as AIME25 and MATH500. Its hybrid design enables up to 6x higher inference throughput in reasoning settings compared to traditional full-attention architectures of a similar size class.
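The distillation side of that compression can be sketched as a KL-divergence objective: the pruned student is trained to match the teacher's output distribution token by token. The toy logits below are invented for illustration and are not drawn from either model.

```python
import math

# Toy sketch of a distillation loss of the kind used in Minitron-style
# compression: KL(teacher || student) over one token's output distribution.
# Logits here are made up for illustration.

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(teacher_logits, student_logits):
    """KL divergence from the student's distribution to the teacher's."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

matched  = kd_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1])  # ~0: student matches teacher
diverged = kd_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0])  # > 0: student disagrees
```

Minimizing this loss (typically combined with a standard language-modeling loss) lets the smaller pruned model recover most of the parent's behavior.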

With a context window of 131,072 (128K) tokens, the model is well-suited for long-document analysis and retrieval-augmented generation (RAG) tasks. It supports multiple languages, including English, German, Spanish, French, Italian, and Japanese, and is optimized for deployment on NVIDIA GPU-accelerated systems for both cloud and edge applications.
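The retrieval step of such a RAG pipeline can be sketched as a cosine-similarity ranking over embedded chunks; with a 128K-token window, many retrieved chunks can be packed into a single prompt. The embeddings below are fake fixed vectors standing in for whatever embedding model a real system would use.

```python
import math

# Toy RAG retrieval sketch: rank pre-embedded document chunks by cosine
# similarity to a query vector. Embeddings here are hand-written stand-ins.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunks, k=2):
    """chunks: list of (text, embedding); return the k most similar texts."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

docs = [("chunk A", [1.0, 0.0]), ("chunk B", [0.7, 0.7]), ("chunk C", [0.0, 1.0])]
print(top_k([1.0, 0.1], docs))  # the two chunks most aligned with the query
```

The selected chunks would then be concatenated into the prompt ahead of the user's question, which is where the long context window pays off.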
