NVIDIA
Open Weights

Llama 3.3 Nemotron Super 49B v1 (Reasoning)

Released Mar 2025

Intelligence rank: #232
Coding rank: #296
Math rank: #132
Context: 128K
Parameters: 49B

Llama 3.3 Nemotron Super 49B v1 is a large language model developed by NVIDIA, derived from Meta's Llama 3.3 70B Instruct. As part of the Llama Nemotron Collection, it is optimized for advanced reasoning, conversational interactions, retrieval-augmented generation (RAG), and tool-calling tasks. The model employs a Neural Architecture Search (NAS) approach to compress the original 70B architecture into 49 billion parameters, enabling it to operate efficiently on a single high-performance GPU such as the NVIDIA H100 or H200.
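The single-GPU claim follows from simple weight-memory arithmetic. A minimal sketch (ignoring KV cache and activation memory, which add to the totals below):

```python
def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate GPU memory needed for model weights alone, in GB.

    params_billion * 1e9 parameters * bytes_per_param bytes, divided by 1e9.
    Ignores KV cache, activations, and framework overhead.
    """
    return params_billion * bytes_per_param

# BF16 (2 bytes/param): 49B -> 98 GB, fits an H200 (141 GB) but not an H100 (80 GB).
print(weight_memory_gb(49, 2))  # 98.0

# FP8 (1 byte/param): 49B -> 49 GB, fits a single H100 (80 GB).
print(weight_memory_gb(49, 1))  # 49.0

# The original 70B model at BF16 needs ~140 GB, hence the value of compression.
print(weight_memory_gb(70, 2))  # 140.0
```

This is why the 49B parameter count matters: at reduced precision the compressed model fits comfortably on one H100, whereas the 70B parent typically requires multiple GPUs or an H200.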

The model's architecture uses non-standard, non-repetitive blocks, including skip-attention layers and variable expansion ratios in its Feed-Forward Network (FFN) layers. This design was found through block-wise distillation, which produced multiple block variants, each offering a different tradeoff between computational cost and accuracy. The model supports a context window of 128,000 tokens.
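The idea of a heterogeneous, NAS-derived stack can be sketched as a per-block configuration list rather than one repeated block. The values below are purely illustrative, not NVIDIA's actual architecture spec:

```python
from dataclasses import dataclass

@dataclass
class BlockConfig:
    has_attention: bool   # skip-attention blocks set this to False
    ffn_expansion: float  # FFN hidden-dim multiplier, varying block to block

# A uniform transformer repeats one block config throughout:
uniform = [BlockConfig(has_attention=True, ffn_expansion=4.0)] * 4

# A NAS-compressed stack mixes variants chosen per position (hypothetical values):
nas_like = [
    BlockConfig(has_attention=True,  ffn_expansion=4.0),
    BlockConfig(has_attention=False, ffn_expansion=2.5),  # attention skipped, narrower FFN
    BlockConfig(has_attention=True,  ffn_expansion=3.0),
    BlockConfig(has_attention=False, ffn_expansion=5.0),  # attention skipped, wider FFN
]

def approx_block_cost(b: BlockConfig) -> float:
    """Toy cost model: attention ~1 unit, FFN cost proportional to expansion."""
    return (1.0 if b.has_attention else 0.0) + b.ffn_expansion

print(sum(approx_block_cost(b) for b in nas_like))  # 16.5 vs 20.0 for `uniform`
```

Block-wise distillation then trains each candidate variant to mimic the corresponding block of the 70B teacher, and the search picks the per-position combination that meets a compute budget with the least accuracy loss.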

Post-training involved a multi-phase process including supervised fine-tuning (SFT) for math, code, science, and tool-calling. This was followed by multiple reinforcement learning (RL) stages using REINFORCE and Online Reward-aware Preference Optimization (RPO). While the model is trained for general-purpose chat, its specialized reasoning mode is activated or deactivated via the system prompt.
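Toggling the reasoning mode amounts to changing the system message. A minimal sketch of building the chat payload; the strings "detailed thinking on" / "detailed thinking off" are the toggles documented in NVIDIA's model card, but verify them against the current documentation before relying on them:

```python
def build_messages(user_prompt: str, reasoning: bool) -> list[dict]:
    """Build a chat message list, enabling or disabling the reasoning mode
    via the system prompt (per NVIDIA's model-card convention)."""
    system = "detailed thinking on" if reasoning else "detailed thinking off"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_prompt},
    ]

# Reasoning on: the model emits an explicit chain of thought before answering.
msgs = build_messages("Prove that sqrt(2) is irrational.", reasoning=True)
print(msgs[0]["content"])  # detailed thinking on
```

The same message list can then be passed to any OpenAI-compatible chat endpoint serving the model; no separate API flag is needed, since the mode switch lives entirely in the prompt.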

Rankings & Comparison