Nanbeige4.1-3B: A Compact Powerhouse with Strong Reasoning and Agentic Capabilities

Discover Nanbeige4.1-3B, a highly competitive 3B parameter model from Nanbeige Lab that excels in reasoning, preference alignment, and complex agentic behaviors.

by HowAIWorks Team
NanbeigeLLMReasoning ModelsAgentic AIOpen Source AIModelScaleAI Benchmarks

Introduction

In the rapidly evolving landscape of Large Language Models (LLMs), the focus is often on increasing scale. However, the release of Nanbeige4.1-3B by Nanbeige Lab (南北阁实验室) challenges this trend. Built upon Nanbeige4-3B-Base, this enhanced iteration demonstrates that compact models can achieve robust reasoning, exceptional preference alignment, and effective agentic behaviors simultaneously.

Nanbeige4.1-3B is an optimized version achieved through extensive post-training, including supervised fine-tuning (SFT) and reinforcement learning (RL). It fills a significant gap in the small-model ecosystem, where models typically excel at either general reasoning or agentic tasks, but rarely both.

Key Features and Capabilities

Nanbeige4.1-3B stands out for several reasons:

  • Strong Reasoning: Capable of solving complex, multi-step problems with sustained coherence, it achieves impressive results on challenging benchmarks like LiveCodeBench-Pro and AIME 2026 I.
  • Robust Preference Alignment: It outperforms same-scale models and even substantially larger ones like Qwen3-32B on Arena-Hard-v2, showing superior understanding of human preferences.
  • Agentic Capability: As the first general small model to natively support deep-search tasks, it can reliably handle complex problem solving involving hundreds of tool invocations.

Performance Benchmarks

The model's performance across diverse benchmarks is remarkable for its size. In many cases, it not only leads its class but also rivals or exceeds the performance of much larger high-profile models.

General Reasoning Tasks

BenchmarkQwen3-4B-2507Qwen3-32BNanbeige4.1-3B
Live-Code-Bench-V657.455.776.9
AIME 2026 I81.4675.8387.40
GPQA65.868.483.8
Arena-Hard-v234.956.073.2
BFCL-V4 (Tool Use)44.8747.9056.50

Deep Search and Agentic Behavior

Nanbeige4.1-3B represents a qualitative leap in deep-search capability for small foundation models. On the xBench-DeepSearch-2505, it achieved a score of 75, significantly higher than its peers and even exceeding several large foundation models when equipped with tools.

Quickstart: How to Use Nanbeige4.1-3B

You can easily integrate Nanbeige4.1-3B into your projects using the Hugging Face transformers library. Here is a simple example for a chat scenario:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(
  'Nanbeige/Nanbeige4.1-3B',
  use_fast=False,
  trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
  'Nanbeige/Nanbeige4.1-3B',
  torch_dtype='auto',
  device_map='auto',
  trust_remote_code=True
)

# Prepare messages
messages = [
  {'role': 'user', 'content': 'Which number is bigger, 9.11 or 9.8?'}
]

# Generate response
prompt = tokenizer.apply_chat_template(
  messages,
  add_generation_prompt=True,
  tokenize=False
)
input_ids = tokenizer(prompt, add_special_tokens=False, return_tensors='pt').input_ids
output_ids = model.generate(input_ids.to('cuda'), eos_token_id=166101)
resp = tokenizer.decode(output_ids[0][len(input_ids[0]):], skip_special_tokens=True)

print(resp)

Conclusion

Nanbeige4.1-3B is a testament to the power of optimization over scale. By demonstrating top-tier reasoning and agentic performance at the 3B parameter level, it opens up new possibilities for efficient, high-performance AI applications that can run on more accessible hardware. Whether you are building complex agents or need a reliable reasoning engine, Nanbeige4.1-3B is a compelling new choice in the open-source community.

Sources

Frequently Asked Questions

Nanbeige4.1-3B excels in three main areas: strong multi-step reasoning, robust preference alignment (outperforming significantly larger models), and advanced agentic capabilities for complex problem-solving.
Despite its 3B scale, Nanbeige4.1-3B outperforms models like Qwen3-32B in various benchmarks, including Arena-Hard-v2 and several coding/math tasks, demonstrating exceptional efficiency.
It is ideal for reasoning-intensive tasks, code generation, mathematical problem solving, and complex agentic scenarios requiring extensive tool use.

Continue Your AI Journey

Explore our lessons and glossary to deepen your understanding.