
Nvidia launches Nemotron 3; early adopters include Accenture, Oracle, Zoom


Nvidia’s latest LLM push arrives under the banner “Nemotron 3,” a model family built on the company’s hybrid Mamba‑Transformer mixture‑of‑experts design. The architecture promises to squeeze more compute out of each chip, a claim that matters when enterprises weigh the cost of scaling AI workloads. While the tech is impressive on paper, the real test is whether customers will move beyond pilot projects and embed the models into production pipelines.

That’s why the roster of firms signing on early catches the eye: a mix of consulting powerhouses, cybersecurity specialists, cloud providers and enterprise software vendors. Their participation hints at a broader appetite for a platform that can support “agentic AI” without ballooning hardware bills. It also suggests Nvidia is positioning Nemotron 3 as a more efficient alternative to the massive transformer stacks that dominate today’s market.

The list below spells out exactly who's already on board, and what that could mean for the next wave of AI deployments.


Nvidia said early adopters of the Nemotron 3 models include Accenture, CrowdStrike, Cursor, Deloitte, EY, Oracle Cloud Infrastructure, Palantir, Perplexity, ServiceNow, Siemens and Zoom.

Breakthrough architectures

Nvidia has been using the hybrid Mamba-Transformer mixture-of-experts architecture for many of its models, including Nemotron-Nano-9B-v2. The architecture is based on research from Carnegie Mellon University and Princeton, which weaves selective state-space models into the stack to handle long stretches of input while maintaining state.

It can reduce compute costs even across long contexts. Nvidia noted its design "achieves up to 4x higher token throughput" compared with Nemotron 2 Nano and can significantly lower inference costs by reducing reasoning token generation by up to 60%.
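To see why a cut in reasoning tokens matters for the bill, here is a back-of-the-envelope sketch. The token counts and per-token price below are made-up assumptions for illustration, not Nvidia figures; only the 60% reduction comes from the claim above.

```python
# Hypothetical cost illustration: the token counts and the per-1k-token
# price are invented assumptions; only the 60% reasoning-token reduction
# is taken from Nvidia's stated claim.

def inference_cost(reasoning_tokens, output_tokens, cost_per_1k_tokens):
    """Cost of one request when billed per generated token."""
    total_tokens = reasoning_tokens + output_tokens
    return total_tokens / 1000 * cost_per_1k_tokens

# Assumed baseline: 2,000 reasoning tokens + 500 answer tokens per request.
baseline = inference_cost(2000, 500, cost_per_1k_tokens=0.02)

# Same request with reasoning tokens cut by 60% (i.e. 40% remain).
reduced = inference_cost(2000 * 0.4, 500, cost_per_1k_tokens=0.02)

print(f"baseline: ${baseline:.3f}")            # $0.050
print(f"reduced:  ${reduced:.3f}")             # $0.026
print(f"savings:  {1 - reduced / baseline:.0%}")  # 48%
```

Note that the savings land below 60% because answer tokens are unaffected; the reduction applies only to the reasoning portion of the output.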

Related Topics: #Nvidia #Nemotron 3 #Mamba-Transformer #Accenture #Oracle Cloud Infrastructure #Mixture-of-Experts #Agentic AI #Token throughput

Will Nemotron 3 live up to Nvidia’s claims? The company says its hybrid Mamba‑Transformer mixture‑of‑experts design offers higher accuracy and reliability for agentic AI, yet independent benchmarks are not yet public. Three variants—Nano at 30 billion parameters, Super at 100 billion, and Ultra with an expanded reasoning engine—target different workloads, from narrowly focused tasks to multi‑agent reasoning.

Early adopters listed include Accenture, CrowdStrike, Cursor, Deloitte, EY, Oracle Cloud Infrastructure, Palantir, Perplexity, ServiceNow, Siemens and Zoom, suggesting broad industry interest. However, the extent to which these customers have deployed the models in production remains unclear. Nvidia’s emphasis on efficiency hints at potential cost advantages, but without disclosed performance metrics, the actual savings are uncertain.

The architecture’s hybrid nature may address some scaling challenges, yet the trade‑offs between model size, speed and accuracy have yet to be quantified. As the models become more accessible, further data will be needed to assess whether the promised improvements translate into tangible benefits for end users.


Common Questions Answered

What is the hybrid Mamba‑Transformer mixture‑of‑experts design used in Nemotron 3?

The hybrid Mamba‑Transformer mixture‑of‑experts design combines a Mamba state‑space model with a traditional Transformer, allowing selective activation of expert sub‑networks. This architecture, based on research from Carnegie Mellon University and Princeton, aims to extract more compute from each chip while improving accuracy for agentic AI tasks.
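The "selective activation" idea can be sketched in a few lines: a router scores all expert sub-networks for each token, but only the top-scoring few actually run. This is a generic toy illustration of mixture-of-experts routing with invented dimensions; Nemotron 3's actual router, expert count, and layout are not disclosed in the article.

```python
# Toy mixture-of-experts routing sketch. All sizes are arbitrary
# assumptions; the point is that only top_k of num_experts run per token,
# which is what keeps per-token compute low.
import numpy as np

rng = np.random.default_rng(0)
num_experts, top_k, d = 8, 2, 16

# Each "expert" is a simple linear map; the router scores experts per token.
experts = [rng.standard_normal((d, d)) for _ in range(num_experts)]
router = rng.standard_normal((d, num_experts))

def moe_layer(x):
    """Route token vector x to its top_k experts and mix their outputs."""
    scores = x @ router                    # one score per expert
    chosen = np.argsort(scores)[-top_k:]   # indices of the k best experts
    weights = np.exp(scores[chosen])
    weights /= weights.sum()               # softmax over the chosen experts only
    # Only the chosen experts compute anything; the other 6 stay idle.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

token = rng.standard_normal(d)
out = moe_layer(token)
print(out.shape)  # (16,)
```

With 2 of 8 experts active per token, the expert compute per token is roughly a quarter of a dense layer of the same total parameter count, which is the efficiency argument behind MoE designs like this one.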

Which companies are listed as early adopters of the Nemotron 3 models?

Nvidia announced that early adopters include Accenture, CrowdStrike, Cursor, Deloitte, EY, Oracle Cloud Infrastructure, Palantir, Perplexity, ServiceNow, Siemens, and Zoom. These firms are evaluating the models for integration into their production AI pipelines.

How do the three Nemotron 3 variants differ in size and intended workloads?

Nemotron 3 offers three variants: Nano with 30 billion parameters for narrowly focused tasks, Super with 100 billion parameters for broader applications, and Ultra which adds an expanded reasoning engine for multi‑agent reasoning workloads. Each variant targets different performance and scalability needs across enterprise AI use cases.

What evidence does Nvidia provide to support claims of higher accuracy and reliability for Nemotron 3?

Nvidia states that the hybrid Mamba‑Transformer mixture‑of‑experts design delivers higher accuracy and reliability for agentic AI, but independent benchmark results have not yet been released. The claim rests on internal testing and the architectural advantages of selective state‑space modeling.
