NVIDIA Nemotron 3 Nano 30B A3B (Reasoning) is a large language model optimized for agentic reasoning, long-context understanding, and high inference efficiency. Released in December 2025, it is part of NVIDIA's Nemotron 3 family of models designed to balance the reasoning depth of large models with the speed of lightweight architectures. The model is specifically fine-tuned for complex logical tasks, mathematical problem-solving, and code generation.
Architecture
The model uses a hybrid Mamba-Transformer Mixture-of-Experts (MoE) architecture. It comprises 31.6 billion total parameters but, through sparse expert routing, activates only about 3.6 billion parameters per token. The design interleaves Mamba-2 state-space layers, which scale linearly with sequence length for efficient long-context processing, with Transformer attention layers that use Grouped-Query Attention (GQA) for precise reasoning.
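To illustrate why only a fraction of the parameters run per token, the sketch below implements generic top-k MoE routing: a router scores all experts, but only the top-k are evaluated and mixed. The expert count, top-k value, and dimensions here are illustrative stand-ins, not Nemotron's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 8, 2  # toy sizes, NOT the model's real config

# Router: one linear layer producing a score per expert.
W_router = rng.standard_normal((d_model, n_experts)) / np.sqrt(d_model)
experts = [rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
           for _ in range(n_experts)]

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route one token vector to its top-k experts and mix their outputs."""
    logits = x @ W_router
    top = np.argsort(logits)[-top_k:]                # indices of the best experts
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over top-k
    # Only top_k of n_experts actually run: this is the "active parameter" fraction.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(d_model)
out = moe_layer(token)
active_fraction = top_k / n_experts
print(f"active expert fraction per token: {active_fraction:.2f}")
```

With 2 of 8 experts active, only a quarter of the expert parameters are touched per token; Nemotron's roughly 3.6B-of-31.6B active ratio follows the same principle at scale.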
Key Capabilities
- Agentic Reasoning: The model is designed for multi-step workflows, featuring a "Reasoning ON/OFF" mode that allows it to generate internal reasoning traces before providing a final answer. Users can configure a "thinking budget" to control the depth of reasoning versus inference cost.
- Massive Context Window: It supports a context window of up to 1,000,000 tokens, making it suitable for long-document analysis, retrieval-augmented generation (RAG), and persistent memory in AI agent systems.
- Training and Efficiency: Trained on a corpus of 25 trillion tokens, including high-quality synthetic data, the model achieves significant throughput advantages over dense models of similar size while maintaining competitive accuracy across reasoning and coding benchmarks.
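The "thinking budget" described above can be sketched as a bounded generation loop: the reasoning trace is capped at a fixed token count, after which the trace is closed and the final answer is produced. The tag names (`<think>`, `</think>`) and the budget mechanism here are a conceptual illustration, not Nemotron's actual API.

```python
# Conceptual sketch of a thinking budget. generate_token is a stand-in for the
# model's next-token function; the tags and loop structure are assumptions.

def generate_with_budget(generate_token, prompt, think_budget=128, max_answer=256):
    text = prompt + "<think>"
    for _ in range(think_budget):          # reasoning phase, bounded by budget
        tok = generate_token(text)
        if tok == "</think>":
            break
        text += tok
    text += "</think>"                     # budget exhausted: close the trace
    for _ in range(max_answer):            # answer phase
        tok = generate_token(text)
        if tok == "<eos>":
            break
        text += tok
    return text

# Toy stand-in model that would "think" forever, showing the budget cut it off.
def toy_model(text):
    return " step" if "</think>" not in text else "<eos>"

out = generate_with_budget(toy_model, "Q: 2+2? ", think_budget=4)
print(out)  # the trace contains exactly 4 " step" tokens before being closed
```

Raising the budget trades inference cost for deeper reasoning; setting it to zero approximates the "Reasoning OFF" mode.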