
NVIDIA Nemotron 3 Nano 30B A3B (Non-reasoning)

Released Dec 2025

NVIDIA Nemotron 3 Nano 30B A3B is an open-weight large language model (LLM) designed for high-efficiency agentic AI applications and long-context processing. It uses a hybrid architecture that interleaves Mamba-2 state-space layers with traditional Transformer attention layers, structured within a Mixture-of-Experts (MoE) framework. This design retains Mamba's linear scaling on long sequences while relying on attention mechanisms for precise information retrieval and reasoning.
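The MoE part of this design can be sketched as a top-k router: for each token, router scores select a small subset of experts, and only those experts' parameters participate in that token's forward pass. The expert count, scores, and k below are illustrative values, not the model's actual configuration.

```python
import math

def topk_moe_route(logits, k=2):
    """Pick the top-k experts for one token and softmax-normalize their weights.

    `logits` holds one router score per expert (illustrative numbers here;
    a real router computes them from the token's hidden state).
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exps = [math.exp(logits[i]) for i in top]
    total = sum(exps)
    return {i: e / total for i, e in zip(top, exps)}

# Only the k selected experts run for this token; the rest of the
# expert parameters are skipped entirely.
weights = topk_moe_route([0.1, 2.0, -1.0, 1.5], k=2)
```

The returned dictionary maps the chosen expert indices to mixing weights that sum to one; the expert outputs would be combined with these weights.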

The model has 31.6 billion total parameters but activates only approximately 3.2 to 3.6 billion per token during a forward pass. This sparse activation yields significantly higher inference throughput than dense models of similar total size. Nemotron 3 Nano supports a context window of up to 1 million tokens, making it suitable for processing massive datasets, codebases, and long-form documents.
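Working out the arithmetic from the figures above, the per-token active fraction comes to roughly 10-11% of the total parameter count:

```python
# Figures stated in the model description.
total_params = 31.6e9          # total parameters
active_low, active_high = 3.2e9, 3.6e9  # active parameters per token

# Fraction of the network touched per token under sparse MoE activation.
frac_low = active_low / total_params    # ~0.101
frac_high = active_high / total_params  # ~0.114
```

So each forward pass pays the compute cost of a ~3B-parameter dense model while drawing on a much larger pool of expert weights.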

Trained on 25 trillion tokens, the model is optimized for tool-calling, mathematics, and coding. The "Non-reasoning" variant refers to a configuration in which intermediate reasoning traces (thinking steps) are disabled, so the model answers user queries directly with lower token latency. It is released under the NVIDIA Open Model License, which permits commercial use and redistribution.
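Since the model is optimized for tool-calling, one natural way to serve it is behind an OpenAI-compatible chat endpoint with a `tools` schema. The sketch below builds such a request body; the model identifier and the `get_weather` function are hypothetical placeholders, not values taken from the model card.

```python
def build_tool_call_request(user_msg):
    """Assemble an OpenAI-compatible chat request with one tool declared.

    The model name below is a hypothetical deployment id; substitute
    whatever name your serving stack registers for this model.
    """
    return {
        "model": "nemotron-3-nano",  # hypothetical deployment name
        "messages": [{"role": "user", "content": user_msg}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",  # illustrative tool, not a real API
                "description": "Look up the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
    }

req = build_tool_call_request("What's the weather in Santa Clara?")
```

A server hosting the model would respond to such a request either with plain text or with a `tool_calls` entry naming the function and its arguments, which the client then executes.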
