Nemotron 3 Super: AI Model Breakthrough in Reasoning
Nemotron 3 Super incorporates 40 million supervised and alignment samples
Why does the data matter for a model billed as an “open hybrid Mamba‑Transformer MoE for agentic reasoning”? Nemotron 3 Super arrives at a time when developers are hunting for LLMs that can handle more than just chatty replies—think code generation, safety checks, and multi‑step planning. While the architecture blends a mixture‑of‑experts design with the Mamba sequence model, the real differentiator lies in what the model sees after the base training phase.
The team behind the release has amassed a sizable collection of supervised and alignment examples, spanning everything from plain‑language instruction following to complex, multi‑turn agent tasks. In practice, that means the model isn’t just fine‑tuned on a handful of prompts; it’s exposed to a breadth of scenarios that mirror real‑world usage. The upcoming details will reveal just how many samples were added, how they were split across fine‑tuning, preference learning, and reinforcement‑learning pipelines, and what proportion fed directly into supervised fine‑tuning.
Here’s the data that underpins the claim.
- Post-training datasets: 40 million new supervised and alignment samples, covering reasoning, instruction following, coding, safety, and multi-step agent tasks across supervised fine-tuning, preference data, and RL trajectories (about 7 million used directly for SFT).
- RL tasks and environments: interactive RL across 21 environment configurations and 37 datasets (~10 of which are being released), including software engineer-style agent training and tool-augmented search/planning tasks. This moves beyond static text into dynamic, verifiable execution workflows and generated ~1.2 million environment rollouts during training.
- Open training and evaluation infrastructure: NVIDIA publishes its development techniques and tools, giving researchers and enterprises the flexibility to customize Nemotron 3 Super or build their own reasoning models.
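The figures above can be tallied in a quick sketch. This uses only the numbers stated in the list (the breakdown of the remaining 33 million samples across preference data and RL trajectories is not specified, so it is not guessed at here):

```python
# Reported post-training data figures for Nemotron 3 Super,
# as stated in the bullet list above.
TOTAL_SAMPLES_M = 40   # supervised + alignment samples (millions)
SFT_SAMPLES_M = 7      # samples used directly for SFT (millions)
ENV_CONFIGS = 21       # interactive RL environment configurations
RL_DATASETS = 37       # RL datasets (~10 being publicly released)
ROLLOUTS_M = 1.2       # environment rollouts generated during training (millions)

# Share of the post-training corpus consumed directly by SFT.
sft_share = SFT_SAMPLES_M / TOTAL_SAMPLES_M
print(f"SFT share of post-training data: {sft_share:.1%}")  # 17.5%

# Rough average rollouts per environment configuration (assumes an
# even spread, which the release notes do not claim).
avg_rollouts = ROLLOUTS_M * 1e6 / ENV_CONFIGS
print(f"Avg rollouts per environment config: {avg_rollouts:,.0f}")
```

The takeaway is that SFT accounts for under a fifth of the 40 million samples; the bulk flows through the preference-learning and RL pipelines instead.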
Nemotron 3 Super arrives with 40 million new supervised and alignment samples, a mix that spans reasoning, instruction following, coding, safety and multi‑step agent tasks. Its hybrid Mamba‑Transformer mixture‑of‑experts architecture is pitched as a way to give agentic systems the depth needed for dense technical problems while staying efficient enough for continuous operation. Multi‑agent deployments, which can emit up to fifteen times the token volume of ordinary chats, often suffer from “context explosion” and the resulting goal drift that nudges agents away from their original aims.
By feeding the model a large volume of preference data and reinforcement‑learning trajectories (with about seven million samples used directly for supervised fine‑tuning), the developers hope to curb that drift. Yet the article does not detail how the RL environments are structured, nor does it present benchmarks that isolate the impact of the new data versus the MoE design. Consequently, it is unclear whether the added samples will translate into measurable improvements in long‑context alignment or coding performance.
The approach is methodical, but its practical benefits remain to be demonstrated.
Further Reading
- Introducing Nemotron 3 Super: An Open Hybrid Mamba-Transformer MoE for Agentic Reasoning - NVIDIA Developer Blog
- Nemotron 3 Super: Open, Efficient Hybrid Mamba-Transformer MoE Technical Report - NVIDIA Research
- Nemotron 3 Super: Pricing, Benchmarks, Architecture & API - LLM Stats
- NVIDIA Drops Nemotron 3 Super With 5x Throughput Gains for AI Agents - MEXC
Common Questions Answered
How many supervised and alignment samples were used in Nemotron 3 Super's training?
Nemotron 3 Super incorporates 40 million supervised and alignment samples across various domains including reasoning, instruction following, coding, safety, and multi-step agent tasks. Approximately 7 million of these samples were directly used for supervised fine-tuning (SFT).
What makes Nemotron 3 Super's architecture unique in the AI model landscape?
Nemotron 3 Super features a hybrid Mamba-Transformer mixture-of-experts (MoE) architecture designed for agentic reasoning. This design is pitched as letting the model handle complex multi-step tasks while staying efficient enough for continuous operation across different computational environments.
What types of interactive environments were used in Nemotron 3 Super's reinforcement learning training?
The model was trained across 21 different environment configurations and 37 datasets, with approximately 10 of these datasets being publicly released. These environments included software engineer-style agent training and tool-augmented search and planning scenarios, demonstrating the model's versatility in complex reasoning tasks.