NVIDIA Nemotron 3 Nano 30B A3B (Reasoning) is a large language model optimized for agentic reasoning, long-context understanding, and high inference efficiency. Released in December 2025, it is part of NVIDIA's Nemotron 3 family of models designed to balance the reasoning depth of large models with the speed of lightweight architectures. The model is specifically fine-tuned for complex logical tasks, mathematical problem-solving, and code generation.
Architecture
The model uses a hybrid Mamba-Transformer Mixture-of-Experts (MoE) architecture. It comprises 31.6 billion total parameters but, through sparse expert routing, activates only about 3.6 billion parameters per token. The design interleaves Mamba-2 state-space layers, which scale linearly with sequence length for efficient long-context processing, with Transformer attention layers that use Grouped-Query Attention (GQA) for precise reasoning.
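To illustrate why only a fraction of the parameters run per token, the sketch below implements generic top-k MoE routing: a router scores all experts, but only the top-k are evaluated and mixed. The expert count, top-k value, and dimensions here are illustrative stand-ins, not Nemotron's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 8, 2  # toy sizes, NOT the model's real config

# Router: one linear layer producing a score per expert.
W_router = rng.standard_normal((d_model, n_experts)) / np.sqrt(d_model)
experts = [rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
           for _ in range(n_experts)]

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route one token vector to its top-k experts and mix their outputs."""
    logits = x @ W_router
    top = np.argsort(logits)[-top_k:]                # indices of the best experts
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over top-k
    # Only top_k of n_experts actually run: this is the "active parameter" fraction.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(d_model)
out = moe_layer(token)
active_fraction = top_k / n_experts
print(f"active expert fraction per token: {active_fraction:.2f}")
```

With 2 of 8 experts active, only a quarter of the expert parameters are touched per token; Nemotron's roughly 3.6B-of-31.6B active ratio follows the same principle at scale.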
Key Capabilities
- Agentic Reasoning: The model is designed for multi-step workflows, featuring a "Reasoning ON/OFF" mode that allows it to generate internal reasoning traces before providing a final answer. Users can configure a "thinking budget" to control the depth of reasoning versus inference cost.
- Massive Context Window: It supports a context window of up to 1,000,000 tokens, making it suitable for long-document analysis, retrieval-augmented generation (RAG), and persistent memory in AI agent systems.
- Training and Efficiency: Trained on a corpus of 25 trillion tokens, including high-quality synthetic data, the model achieves significant throughput advantages over dense models of similar size while maintaining competitive accuracy across reasoning and coding benchmarks.
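The "thinking budget" described above can be sketched as a bounded generation loop: the reasoning trace is capped at a fixed token count, after which the trace is closed and the final answer is produced. The tag names (`<think>`, `</think>`) and the budget mechanism here are a conceptual illustration, not Nemotron's actual API.

```python
# Conceptual sketch of a thinking budget. generate_token is a stand-in for the
# model's next-token function; the tags and loop structure are assumptions.

def generate_with_budget(generate_token, prompt, think_budget=128, max_answer=256):
    text = prompt + "<think>"
    for _ in range(think_budget):          # reasoning phase, bounded by budget
        tok = generate_token(text)
        if tok == "</think>":
            break
        text += tok
    text += "</think>"                     # budget exhausted: close the trace
    for _ in range(max_answer):            # answer phase
        tok = generate_token(text)
        if tok == "<eos>":
            break
        text += tok
    return text

# Toy stand-in model that would "think" forever, showing the budget cut it off.
def toy_model(text):
    return " step" if "</think>" not in text else "<eos>"

out = generate_with_budget(toy_model, "Q: 2+2? ", think_budget=4)
print(out)  # the trace contains exactly 4 " step" tokens before being closed
```

Raising the budget trades inference cost for deeper reasoning; setting it to zero approximates the "Reasoning OFF" mode.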