Nanbeige
Open Weights

Nanbeige4.1-3B

Released Feb 2026

Intelligence: #263
Coding: #302
Context: 256K
Parameters: 3B

Nanbeige4.1-3B is a unified generalist language model with 3 billion parameters, developed by Nanbeige LLM Lab (the AI research arm of BOSS Zhipin). Released in February 2026, it is an enhanced iteration of the Nanbeige4 series, optimized to bridge the gap between small-scale efficiency and large-scale reasoning capability. It is designed as a versatile small language model (SLM) that handles complex reasoning, coding, and autonomous agent tasks within a single compact framework.

The model is built on Nanbeige4-3B-Base and refined with a post-training pipeline of large-scale supervised fine-tuning (SFT) followed by multi-stage reinforcement learning (RL). The training recipe combines point-wise and pair-wise reward modeling to produce high-quality, human-aligned responses while mitigating common SLM failure modes such as repetition and redundant thinking. For code generation specifically, the recipe uses complexity-aware rewards that optimize for both functional correctness and computational efficiency.
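
Nanbeige has not published the exact reward formulations, but the objectives named above have conventional forms: pair-wise reward models are commonly trained with a Bradley-Terry loss, point-wise models with a regression loss, and a complexity-aware code reward can be approximated by discounting test-pass credit for slow solutions. The Python sketch below is illustrative only; the function names, the runtime-ratio penalty, and the `lam` weight are assumptions, not Nanbeige's recipe.

```python
import torch
import torch.nn.functional as F

def pairwise_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry objective: the preferred response should score
    # higher than the rejected one. Inputs are scalar reward outputs.
    return -F.logsigmoid(r_chosen - r_rejected).mean()

def pointwise_loss(r_pred: torch.Tensor, r_label: torch.Tensor) -> torch.Tensor:
    # Point-wise variant: regress predicted rewards against
    # absolute quality annotations.
    return F.mse_loss(r_pred, r_label)

def complexity_aware_code_reward(passed: bool, runtime_s: float,
                                 ref_runtime_s: float, lam: float = 0.2) -> float:
    # Hypothetical shaping: full credit requires passing the tests;
    # solutions slower than a reference are penalized in proportion
    # to how much they exceed the reference runtime.
    if not passed:
        return 0.0
    return 1.0 - lam * max(0.0, runtime_s / ref_runtime_s - 1.0)
```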

A primary feature of Nanbeige4.1-3B is its native agentic capability, which enables stable long-horizon tool interactions. The model can reliably execute up to 600 tool-call turns for complex deep-search tasks, a capability typically reserved for models with significantly higher parameter counts. This is supported by an extensive context window of 256,000 tokens, allowing the model to maintain coherence over extremely long reasoning chains and multi-turn interactions.
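
The model card does not describe the agent harness behind the 600-turn figure, but the shape of such a loop is conventional: the transcript is replayed to the model each turn, any requested tool is executed, and its output is appended until the model stops calling tools or the budget runs out. A minimal sketch under those assumptions follows; the message schema, the `llm_step` callable, and the tool registry are all hypothetical.

```python
from typing import Any, Callable

MAX_TURNS = 600  # turn budget the model is reported to sustain

def run_agent(llm_step: Callable[[list[dict]], dict],
              tools: dict[str, Callable[..., Any]],
              task: str) -> str:
    """Generic long-horizon tool loop: feed the transcript to the model,
    execute any tool it requests, append the result, and repeat until
    the model emits a final answer or the turn budget is exhausted."""
    messages = [{"role": "user", "content": task}]
    for _ in range(MAX_TURNS):
        reply = llm_step(messages)          # one model call
        messages.append(reply)
        call = reply.get("tool_call")       # e.g. {"name": "search", "args": {...}}
        if call is None:                    # no tool requested -> final answer
            return reply["content"]
        result = tools[call["name"]](**call["args"])
        messages.append({"role": "tool", "name": call["name"],
                         "content": str(result)})
    return messages[-1].get("content", "")  # budget exhausted
```

Sustaining this loop is where the 256K context window matters: at hundreds of turns, the accumulated transcript of tool calls and results must still fit within the model's window for it to stay coherent.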

In empirical evaluations, Nanbeige4.1-3B frequently surpasses same-scale models such as Qwen3-4B and rivals much larger models, such as Qwen3-32B, on reasoning-heavy benchmarks. It achieves competitive results on challenging tasks including LiveCodeBench-Pro, IMO-Answer-Bench, and AIME 2026 I, and it shows robust preference alignment, scoring highly on human-centric benchmarks like Arena-Hard-v2.

Rankings & Comparison