Llama-3.3-Nemotron-Super-49B-v1.5 is a large language model developed by NVIDIA, derived from Meta's Llama-3.3-70B-Instruct through architectural optimization and specialized post-training. It is designed as a high-efficiency reasoning model that balances computational performance with high accuracy, specifically targeting agentic workflows, complex dialogue, and retrieval-augmented generation (RAG).
Architecture and Optimization
The model's architecture was refined using a distillation-driven Neural Architecture Search (NAS) approach called "Puzzle." This methodology allowed NVIDIA to compress the parameter count from 70B to 49B while maintaining accuracy comparable to, or exceeding, that of the reference model. Key structural modifications include non-standard, non-repetitive blocks, skip-attention mechanisms in specific layers, and variable expansion ratios in the Feed-Forward Network (FFN) layers. These optimizations significantly reduce the memory footprint and increase throughput, enabling the model to fit on a single high-end GPU such as an H100 or H200.
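To make the parameter savings concrete, the sketch below models a heterogeneous block layout of the kind Puzzle produces. All block names, hidden sizes, and expansion ratios here are illustrative assumptions, not NVIDIA's actual configuration, and the parameter counts deliberately ignore grouped-query attention, biases, and normalization weights.

```python
# Hypothetical sketch of a heterogeneous (Puzzle-style) block layout.
# Sizes and ratios are illustrative, not the model's real configuration.
from dataclasses import dataclass

@dataclass
class BlockConfig:
    has_attention: bool   # False = skip-attention block (attention removed)
    ffn_expansion: float  # variable FFN expansion ratio for this block

def ffn_params(hidden: int, expansion: float) -> int:
    """Rough parameter count of a gated FFN (up, gate, down projections)."""
    inter = int(hidden * expansion)
    return 3 * hidden * inter

def attn_params(hidden: int) -> int:
    """Rough parameter count of Q, K, V, O projections (ignoring GQA)."""
    return 4 * hidden * hidden

def layer_params(cfg: BlockConfig, hidden: int) -> int:
    total = ffn_params(hidden, cfg.ffn_expansion)
    if cfg.has_attention:
        total += attn_params(hidden)
    return total

hidden = 8192  # hidden size of the Llama-3.3-70B reference
uniform = BlockConfig(has_attention=True, ffn_expansion=3.5)
pruned = BlockConfig(has_attention=False, ffn_expansion=2.0)

saved = layer_params(uniform, hidden) - layer_params(pruned, hidden)
print(f"params saved per pruned block: {saved / 1e6:.0f}M")
```

Applying such per-block choices non-uniformly across the depth of the network is what lets the search trade a controlled amount of accuracy for a large drop in memory and latency.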
Training and Capabilities
The model underwent a multi-phase post-training regimen that included Supervised Fine-Tuning (SFT) for domains such as math, coding, and science. Alignment and reasoning capabilities were further enhanced using Reinforcement Learning with Verifiable Rewards (RLVR) for multi-step reasoning, Reward-aware Preference Optimization (RPO) for chat alignment, and iterative Direct Preference Optimization (DPO) to improve tool-calling and parameter extraction.
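Of the preference-optimization stages above, DPO has the simplest closed form. The sketch below implements the standard DPO loss on per-sequence log-probabilities; NVIDIA's RPO and iterative-DPO recipe differs in its details (reward-aware weighting, multiple rounds), so this is only the textbook objective, not the exact training procedure.

```python
# Minimal sketch of the standard DPO loss on scalar sequence log-probs.
# This is the textbook objective, not NVIDIA's exact RPO/iterative recipe.
import math

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """-log sigmoid(beta * ((logp_c - ref_c) - (logp_r - ref_r)))."""
    margin = beta * ((logp_chosen - ref_chosen)
                     - (logp_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When the policy favors the chosen response more than the reference
# does, the margin is positive and the loss drops below log(2):
print(dpo_loss(-1.0, -2.0, -1.5, -1.5))  # ~0.644 < log(2) ~0.693
```

The `beta` parameter controls how strongly the policy is penalized for drifting from the reference model while it widens the chosen/rejected margin.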
Llama-3.3-Nemotron-Super-49B-v1.5 supports a context window of 131,072 tokens and features a dual-mode operational capability. By default, it operates in "Reasoning ON" mode, generating chain-of-thought explanations for complex queries; this can be toggled to "Reasoning OFF" via system prompt instructions for standard chat interactions. It supports multiple languages, including English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
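The reasoning toggle can be wired up as a small message-building helper. The helper below is hypothetical; the `/no_think` toggle string follows the published model-card convention for this model family, but should be verified against the current documentation before use.

```python
# Hypothetical helper for toggling reasoning mode via the system prompt.
# The "/no_think" string is the documented convention for this model
# family; verify it against the current model card before relying on it.
def build_messages(user_prompt: str, reasoning: bool = True) -> list[dict]:
    # Reasoning is ON by default, so no special instruction is needed;
    # "/no_think" in the system prompt switches reasoning OFF.
    system = "" if reasoning else "/no_think"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_prompt},
    ]

messages = build_messages("Summarize this document.", reasoning=False)
```

The resulting list can be passed to any chat-template-aware inference stack (e.g. `tokenizer.apply_chat_template` in Hugging Face Transformers) without further changes.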