
Nvidia's Nemotron 3 uses Mamba hybrid, 31.6B params, 3B active per step


Nvidia's latest language model, Nemotron 3, takes a different route from the pure-Transformer designs that dominate most open-source releases. By weaving a Mamba-style component into its core, the team trimmed the active footprint dramatically: of 31.6 billion total parameters, only 3 billion are active on any given processing step. The shift isn't just a curiosity; it shows up on the Artificial Analysis Index benchmark, where Nemotron 3 holds its own against established contenders like gpt-oss-20B and Qwen3-30B in raw accuracy.

Yet the headline numbers tell a fuller story—throughput climbs noticeably, suggesting the model can handle more tokens per second without sacrificing quality. For developers wrestling with the trade‑off between model size and real‑time performance, those results raise a clear question: can a hybrid architecture deliver the efficiency needed for AI agents that must run at scale?

Hybrid architecture boosts efficiency

The Nano model has 31.6 billion total parameters, but only 3 billion are active per processing step. On the Artificial Analysis Index benchmark, the open-source model rivals gpt-oss-20B and Qwen3-30B in accuracy but delivers significantly higher token throughput. However, according to Artificial Analysis, it requires 160 million tokens for a test run, far more than runner-up Qwen3-VL at 110 million.
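
To see how a model can store 31.6 billion parameters yet compute with only about 3 billion per step, here is a back-of-the-envelope sketch of sparse mixture-of-experts accounting, the standard way to get a large total-versus-active gap. Every number below is a hypothetical stand-in, chosen only so the totals land near the reported figures; Nvidia has not published Nemotron 3's layer composition here.

```python
# Back-of-the-envelope MoE parameter accounting. All numbers are
# hypothetical, picked only so the totals land near the article's
# 31.6B total / ~3B active figures; not Nemotron 3's real config.

def moe_param_counts(shared, n_layers, n_experts, top_k, d_model, d_ff):
    expert = 2 * d_model * d_ff                     # up- and down-projection per expert
    total = shared + n_layers * n_experts * expert  # every expert is stored
    active = shared + n_layers * top_k * expert     # only top_k experts run per token
    return total, active

total, active = moe_param_counts(
    shared=1.6e9,      # embeddings + attention/Mamba mixers (assumed)
    n_layers=32, n_experts=56, top_k=3,
    d_model=4096, d_ff=2048)

print(f"total:  {total / 1e9:.1f}B parameters")   # ~31.7B stored
print(f"active: {active / 1e9:.1f}B per token")   # ~3.2B actually computed
```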

Nvidia introduces two architectural changes for the larger Super and Ultra models. The first, LatentMoE, addresses the memory bandwidth cost of routing tokens directly to expert networks in standard MoE models. The new method projects tokens into a compressed, latent representation before processing.
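
As a rough sketch of that idea, the toy PyTorch module below projects tokens down to a smaller latent width, routes and runs the experts there, and projects back up afterwards. The class name, dimensions, and top-k routing are our assumptions for illustration, not Nvidia's published design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentMoESketch(nn.Module):
    """Toy latent-routed MoE: compress tokens before expert processing.
    All dimensions and the routing scheme are illustrative assumptions."""

    def __init__(self, d_model=1024, d_latent=256, n_experts=32, top_k=2):
        super().__init__()
        self.down = nn.Linear(d_model, d_latent)     # compress to latent space
        self.up = nn.Linear(d_latent, d_model)       # expand back afterwards
        self.router = nn.Linear(d_latent, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_latent, 4 * d_latent), nn.GELU(),
                          nn.Linear(4 * d_latent, d_latent))
            for _ in range(n_experts))
        self.top_k = top_k

    def forward(self, x):                            # x: (batch, seq, d_model)
        z = self.down(x)                             # routing happens in the cheap space
        weights = F.softmax(self.router(z), dim=-1)
        topw, topi = weights.topk(self.top_k, dim=-1)
        topw = topw / topw.sum(dim=-1, keepdim=True) # renormalise the kept weights
        out = torch.zeros_like(z)
        for k in range(self.top_k):                  # plain loops for clarity, not speed
            for e, expert in enumerate(self.experts):
                mask = topi[..., k] == e             # tokens sent to expert e in slot k
                if mask.any():
                    out[mask] += topw[..., k][mask].unsqueeze(-1) * expert(z[mask])
        return self.up(out)

layer = LatentMoESketch()
tokens = torch.randn(2, 16, 1024)
print(layer(tokens).shape)                           # torch.Size([2, 16, 1024])
```

Because the router and experts operate at the latent width rather than the full model width, the weights moved per routed token shrink accordingly, which is the bandwidth saving the paragraph above describes.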

Nvidia says this drastically increases expert count and active experts per token without slowing inference. The larger models also use multi-token prediction (MTP), where models predict several future tokens simultaneously during training rather than just the next one.
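
Below is a minimal sketch of what such an objective can look like, assuming one extra output head per future offset; the article does not describe Nemotron 3's exact head design or loss weighting.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def multi_token_loss(hidden, heads, targets):
    """Toy MTP objective: head i predicts the token (i+1) steps ahead.

    hidden:  (batch, seq, d_model) final hidden states from the trunk
    heads:   list of nn.Linear(d_model, vocab), one per future offset
    targets: (batch, seq) ground-truth token ids
    """
    seq = hidden.size(1)
    losses = []
    for i, head in enumerate(heads):
        offset = i + 1
        logits = head(hidden[:, : seq - offset])     # positions that have a label
        labels = targets[:, offset:]                 # shifted by this head's offset
        losses.append(F.cross_entropy(
            logits.reshape(-1, logits.size(-1)), labels.reshape(-1)))
    return torch.stack(losses).mean()

# Tiny smoke test with made-up sizes.
d_model, vocab, horizon = 64, 1000, 4
heads = [nn.Linear(d_model, vocab) for _ in range(horizon)]
hidden = torch.randn(2, 32, d_model)
targets = torch.randint(0, vocab, (2, 32))
print(multi_token_loss(hidden, heads, targets))
```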

Related Topics: #Nvidia #Nemotron 3 #Mamba #Hybrid architecture #Artificial Analysis Index #gpt-oss-20B #Qwen3-30B #LatentMoE #multi-token prediction

Nvidia's Nemotron 3 family arrives with a hybrid Mamba-Transformer design that claims to keep long context windows affordable. The Nano model, already on the market, packs 31.6 billion parameters but activates only three billion per step, a figure that translates into noticeably higher token throughput on the Artificial Analysis Index benchmark. It matches open-source peers gpt-oss-20B and Qwen3-30B in accuracy while moving faster.

Yet the real test will be whether the efficiency gains hold across diverse workloads beyond the benchmark. Super and Ultra, slated for release in the first half of 2026, will extend the lineup, but details on their performance remain sparse. The hybrid architecture is presented as a boost to efficiency, but it is unclear how much the reduced active parameter count will affect model capacity in practice.

A hybrid approach could reshape how agents handle extended tasks, though adoption will depend on developer confidence. Will developers trust a model that keeps most parameters dormant? For now, the numbers speak for themselves, and the trade‑off between active parameters and speed invites cautious optimism.

Common Questions Answered

How does Nemotron 3’s Mamba‑Transformer hybrid architecture affect its active parameter count?

Nemotron 3 integrates a Mamba-style component with a traditional Transformer, which reduces the active footprint to only 3 billion parameters per processing step despite having 31.6 billion total parameters. This selective activation enables more efficient computation while aiming to preserve model capacity.

What performance advantages does Nemotron 3 show on the Artificial Analysis Index benchmark?

On the Artificial Analysis Index benchmark, Nemotron 3 matches the accuracy of open‑source models like gpt‑oss‑20B and Qwen3‑30B but delivers significantly higher token throughput. However, it consumes 160 million tokens for a test run, which is more than the 110 million tokens required by the runner‑up Qwen3‑VL.

Why is the token throughput of Nemotron 3 considered higher than its competitors?

Because only 3 billion of its 31.6 billion parameters are active at any step, Nemotron 3 processes tokens faster than models that activate a larger portion of their parameters. This efficiency translates into noticeably higher token throughput on benchmark tests.

What claim does Nvidia make about Nemotron 3’s ability to handle long context windows?

Nvidia asserts that the hybrid Mamba‑Transformer design keeps long context windows affordable by limiting active parameters per step, which reduces computational load. This design aims to maintain performance on extended sequences without the memory overhead typical of pure‑Transformer models.
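
To make the long-context claim concrete, here is a generic back-of-the-envelope comparison using assumed dimensions, not Nvidia's published figures: a Transformer's key-value cache grows linearly with context length, while a Mamba-style state-space layer carries a fixed-size recurrent state regardless of sequence length.

```python
# Rough memory comparison with generic, assumed dimensions (fp16 cache
# entries, 32 layers); none of these are Nemotron 3's published internals.

def kv_cache_bytes(ctx_len, n_layers=32, n_kv_heads=8, head_dim=128, dtype_bytes=2):
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * dtype_bytes  # 2x: keys + values

def ssm_state_bytes(n_layers=32, d_model=4096, state_dim=16, dtype_bytes=2):
    return n_layers * d_model * state_dim * dtype_bytes  # fixed, context-independent

for ctx in (8_192, 131_072, 1_048_576):
    print(f"{ctx:>9} tokens | KV cache {kv_cache_bytes(ctx) / 2**30:6.1f} GiB"
          f" | SSM state {ssm_state_bytes() / 2**20:4.1f} MiB")
```

At million-token contexts the cache dwarfs the fixed state, which is the overhead a hybrid design tries to sidestep by replacing some attention layers with state-space layers.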