NVIDIA logo
NVIDIA
Open Weights

NVIDIA Nemotron Nano 12B v2 VL (Reasoning)

Released Oct 2025

Intelligence
#290
Coding
#263
Math
#79
Context128K
Parameters12.6B

NVIDIA Nemotron Nano 12B v2 VL (Reasoning) is a multimodal language model optimized for high-performance visual reasoning and document intelligence. Built upon a hybrid Mamba-Transformer architecture, the model integrates Mamba-2 layers with standard attention mechanisms to achieve significant gains in inference throughput and efficiency. It is designed as a unified model capable of processing text, image, and video inputs, supporting multi-image analysis and complex video understanding.

The model's core capability is its integrated reasoning engine, which allows it to solve logical, spatial, and mathematical problems through internal Chain-of-Thought (CoT) processing. This reasoning behavior can be controlled via system prompts, enabling the model to generate detailed reasoning traces before arriving at a final response. This approach is particularly effective for structured data extraction, chart interpretation, and technical document analysis where multi-step logic is required.

Featuring a 12.6 billion parameter count and a vision encoder based on CRadioV2-H, the model supports an extensive context window of 128,000 tokens. This large context allows for the ingestion of long videos (sampled at 2 FPS) and high-resolution document images (up to 12-tile layouts), making it suitable for enterprise-grade tasks such as invoice processing, manual summarization, and long-form visual Q&A.

Rankings & Comparison