Open Weights

NVIDIA Nemotron 3 Super 120B A12B (Reasoning)

Released Mar 2026

NVIDIA Nemotron 3 Super 120B A12B is an open-weight large language model (LLM) designed for agentic reasoning, complex multi-step planning, and high-throughput conversational AI. Released in March 2026, the model has 120 billion total parameters but, thanks to a sparse Mixture-of-Experts (MoE) architecture, activates only 12 billion of them per forward pass. This design is optimized for multi-agent systems, where high inference speed and long-term memory are critical to maintaining alignment over extended tasks.

Architecture and Innovation

The model uses a hybrid Mamba-Transformer architecture, interleaving Mamba-2 layers for efficient linear-time sequence processing with attention layers that serve as global anchors for precise retrieval. A key innovation is Latent MoE, a technique that projects tokens into a compressed latent space before routing them to experts; because each expert operates at the narrower latent width, the model can route to four times as many experts as a traditional MoE design at the same computational cost. The model also incorporates Multi-Token Prediction (MTP), which enables native speculative decoding to generate multiple tokens per forward pass, significantly increasing throughput for long-context generation.
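The core idea behind Latent MoE can be illustrated with a minimal sketch: compress each token into a latent space, route and run the experts there, then project back to model width. The dimensions, expert count, and top-k value below are illustrative assumptions, not the published configuration of Nemotron 3 Super.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_latent = 64, 16   # assumed sizes: experts run at the narrow latent width
n_experts, top_k = 32, 2     # more experts become affordable at latent width

# Shared down-projection into the latent space and up-projection back out.
W_down = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)
W_up = rng.standard_normal((d_latent, d_model)) / np.sqrt(d_latent)
# Router and experts operate on latent vectors, not full hidden states.
W_router = rng.standard_normal((d_latent, n_experts)) / np.sqrt(d_latent)
experts = rng.standard_normal((n_experts, d_latent, d_latent)) / np.sqrt(d_latent)

def latent_moe(x: np.ndarray) -> np.ndarray:
    """x: (tokens, d_model) -> (tokens, d_model)."""
    z = x @ W_down                                 # compress before routing
    logits = z @ W_router
    top = np.argsort(logits, axis=-1)[:, -top_k:]  # top-k experts per token
    gates = np.take_along_axis(logits, top, axis=-1)
    gates = np.exp(gates - gates.max(-1, keepdims=True))
    gates /= gates.sum(-1, keepdims=True)          # softmax over selected experts
    out = np.zeros_like(z)
    for t in range(x.shape[0]):
        for g, e in zip(gates[t], top[t]):
            out[t] += g * (z[t] @ experts[e])      # expert runs at latent width
    return out @ W_up                              # expand back to model width

tokens = rng.standard_normal((4, d_model))
y = latent_moe(tokens)
print(y.shape)  # (4, 64)
```

The compute saving comes from the expert matrices being `d_latent × d_latent` rather than `d_model × d_model`, which is why the same FLOP budget can fund several times more experts.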

Capabilities and Context

Nemotron 3 Super is built to address the "context explosion" in agentic workflows, featuring a native 1-million-token context window. This allows agents to process massive document sets and retain long-term memory of previous tool outputs without goal drift. It was pre-trained on a corpus of 25 trillion tokens, including a specialized subset of 10 billion tokens for reasoning and 15 million coding problems, and post-trained using multi-environment reinforcement learning (RL) across a variety of tool-use and planning scenarios.

Implementation and Prompting

The reasoning capabilities of the model are accessible through its chat template, which typically involves the generation of a reasoning trace before providing a final response. NVIDIA has released the model weights along with the full pre-training and post-training datasets, evaluation recipes, and training environments, promoting a high degree of transparency and reproducibility for developers building specialized AI agents.
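Since the model emits a reasoning trace before its final answer, client code typically needs to separate the two. The sketch below assumes the trace is wrapped in `<think>...</think>` delimiters; that format is a common convention for reasoning models and an assumption here, not a confirmed detail of the Nemotron chat template.

```python
# Hypothetical response format: the <think> delimiters are an assumed
# convention, not documented tags from the Nemotron chat template.
RESPONSE = (
    "<think>The user asks for 12 * 7. Multiply: 12 * 7 = 84.</think>"
    "The answer is 84."
)

def split_reasoning(text: str,
                    open_tag: str = "<think>",
                    close_tag: str = "</think>") -> tuple[str, str]:
    """Separate a reasoning trace from the final answer, if one is present."""
    if open_tag in text and close_tag in text:
        start = text.index(open_tag) + len(open_tag)
        end = text.index(close_tag)
        return text[start:end].strip(), text[end + len(close_tag):].strip()
    return "", text.strip()  # no trace found: treat everything as the answer

trace, answer = split_reasoning(RESPONSE)
print(answer)  # -> The answer is 84.
```

Keeping the trace separate lets an application log or audit the model's reasoning while showing users only the final response.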
