NVIDIA
Open Weights

Llama 3.1 Nemotron Ultra 253B v1 (Reasoning)

Released Apr 2025

Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) is a 253-billion-parameter large language model developed by NVIDIA, designed for high-accuracy reasoning, scientific problem-solving, and complex instruction following. It is a derivative of Meta's Llama-3.1-405B-Instruct, customized through Neural Architecture Search (NAS) and vertical compression to optimize the tradeoff between model accuracy and inference efficiency.

The model architecture utilizes a dense decoder-only Transformer structure with non-standard, non-repetitive blocks. Key architectural innovations include skip attention, where attention modules in certain layers are replaced with linear layers, and variable Feedforward Network (FFN) ratios. These optimizations allow the 253B model to fit on a single 8xH100 node for inference while maintaining performance levels competitive with frontier-scale models.
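The two optimizations above can be illustrated with a toy sketch. This is not NVIDIA's implementation: the block structure, shapes, and config keys are illustrative assumptions, showing only the idea that some layers swap self-attention for a plain linear projection (skip attention) and that the FFN hidden width varies per block.

```python
import numpy as np


def toy_self_attention(x):
    """Minimal single-head self-attention over a (seq_len, dim) array."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x


def decoder_block(x, cfg, rng):
    """One non-uniform decoder block (illustrative, not the real architecture).

    cfg["skip_attention"]: if True, the attention module is replaced by a
        cheap linear layer, as in the skip-attention optimization.
    cfg["ffn_ratio"]: per-block FFN expansion ratio, so different blocks
        can have different FFN widths (variable FFN ratios).
    """
    d = x.shape[-1]
    if cfg["skip_attention"]:
        attn_out = x @ (rng.standard_normal((d, d)) * 0.02)
    else:
        attn_out = toy_self_attention(x)
    x = x + attn_out  # residual connection

    hidden = int(d * cfg["ffn_ratio"])  # width differs per block
    w1 = rng.standard_normal((d, hidden)) * 0.02
    w2 = rng.standard_normal((hidden, d)) * 0.02
    x = x + np.maximum(x @ w1, 0.0) @ w2  # ReLU FFN with residual
    return x
```

Because each block's config is independent, a NAS procedure can assign a different (skip_attention, ffn_ratio) pair to every layer and trade accuracy against latency layer by layer.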

Post-training involved a multi-phase process including Supervised Fine-Tuning (SFT) for math and coding, followed by Reinforcement Learning (RL) stages using Group Relative Policy Optimization (GRPO). This alignment focuses specifically on enhancing reasoning depth, tool calling, and retrieval-augmented generation (RAG) tasks. The model supports a specialized "reasoning mode" that can be activated via system prompts to trigger step-by-step thinking for complex queries.
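Since reasoning mode is controlled by the system prompt, toggling it is just a matter of how the chat payload is built. A minimal sketch follows; the "detailed thinking on"/"detailed thinking off" strings are the convention documented for the Nemotron family, but you should verify the exact wording against the model card for your deployment.

```python
def build_messages(user_prompt, reasoning=True):
    """Build an OpenAI-style chat message list that toggles reasoning mode.

    The system-prompt strings below are assumptions based on the published
    Nemotron convention; check the official model card before relying on them.
    """
    system = "detailed thinking on" if reasoning else "detailed thinking off"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_prompt},
    ]


# With reasoning on, the model is expected to emit step-by-step thinking
# before its final answer; with it off, it answers directly.
messages = build_messages("Prove that the sum of two even numbers is even.")
```

The resulting list can be passed as the `messages` argument to any OpenAI-compatible chat endpoint serving the model.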

Rankings & Comparison