Baidu's Ernie 5.1 Cuts 94% Pre‑Training Costs Using Once‑For‑All Framework

By AI Daily Post. Edited by Brian Petersen, Editor-in-Chief

May 11, 2026 • Updated: May 13, 2026 • 2 min read

Baidu has rolled out Ernie 5.1, a distilled version of its earlier Ernie 5.0. The new model has roughly a third of the total parameters and uses about half the active parameters per query. Baidu says its pre‑training cost just six percent of what comparable models demand, a 94 percent reduction. Training follows a four‑stage pipeline that uses separate specialist expert models for code, logic and agent tasks, so that distinct capabilities don’t clash during learning.
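The two cost figures are the same ratio seen from opposite sides, which a few lines of Python make explicit. This is only a minimal sketch: the function names and the baseline of 100 cost units are illustrative assumptions, not figures published by Baidu.

```python
def reduction_from_fraction(fraction_of_baseline: float) -> float:
    """Convert 'costs X of the baseline' into 'reduces cost by 1 - X'."""
    if not 0.0 <= fraction_of_baseline <= 1.0:
        raise ValueError("fraction_of_baseline must be between 0 and 1")
    return 1.0 - fraction_of_baseline


def scaled_cost(baseline_cost: float, fraction_of_baseline: float) -> float:
    """Cost of the cheaper run, in the same units as the baseline."""
    return baseline_cost * fraction_of_baseline


# Hypothetical baseline: 100 arbitrary cost units for a comparable model.
BASELINE_COST = 100.0
# Reported by Baidu: pre-training at six percent of comparable models.
REPORTED_FRACTION = 0.06

ernie_cost = scaled_cost(BASELINE_COST, REPORTED_FRACTION)
reduction_pct = round(reduction_from_fraction(REPORTED_FRACTION) * 100)
print(f"cost: {ernie_cost:.1f} units, reduction: {reduction_pct}%")
```

The same helper applies to any "fraction of baseline" claim, which makes it easy to compare figures that different vendors quote in different forms.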

As of May 9, Ernie 5.1 logged 1,223 points on the Arena Search Leaderboard, placing it fourth worldwide and first among Chinese offerings. Baidu claims the model outperforms DeepSeek‑V4‑Pro on autonomous‑agent benchmarks like tau3‑bench and SpreadsheetBench‑Verified, and comes close to Google’s Gemini 3.1 Pro on knowledge‑reasoning tests such as GPQA and MMLU‑Pro. On a tough math set (AIME26) with tool access, it trails only Gemini 3.1 Pro.

The model is available through Baidu’s platforms and powers several creative applications, though its weights remain closed, preventing independent verification of the reported results.

In additional benchmarks, Baidu claims Ernie 5.1 beats DeepSeek-V4-Pro on autonomous AI agent tasks (tau3-bench, SpreadsheetBench-Verified) and comes close to Google's Gemini 3.1 Pro on knowledge and reasoning benchmarks (GPQA, MMLU-Pro).

— Baidu, Baidu's Ernie 5.1 cuts 94 percent of pre-training costs while competing with top models - THE DECODER

Why this matters

Ernie 5.1’s headline claim is a 94 percent cut in pre‑training cost, a figure that will interest any team watching its compute bill. By distilling Ernie 5.0 through its “Once‑For‑All elastic training framework,” Baidu avoids separate, costly training runs for each model size. The four‑stage pipeline, which assigns code, logic and agent work to specialist experts, is meant to let each skill develop without interfering with the others.
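The article does not describe how Baidu's framework works internally. As a rough illustration of the general weight‑sharing idea that once‑for‑all style training is usually associated with, the toy sketch below builds one set of layer weights and slices a smaller sub‑model out of it instead of training a second model from scratch. Every name here (`make_layer`, `slice_layer`, the layer sizes) is hypothetical and chosen for illustration only.

```python
import random


def make_layer(n_in: int, n_hidden: int, seed: int = 0) -> list[list[float]]:
    """Create one dense layer: n_hidden rows of n_in weights each."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_hidden)]


def slice_layer(weights: list[list[float]], width: int) -> list[list[float]]:
    """Sub-model that reuses the first `width` hidden units of the full layer.

    The rows are the same list objects as in the full layer, so the
    sub-model shares its parameters rather than copying them.
    """
    if not 0 < width <= len(weights):
        raise ValueError("width must be between 1 and the full layer width")
    return weights[:width]


def forward(weights: list[list[float]], x: list[float]) -> list[float]:
    """Apply the layer with a ReLU activation."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


def param_count(weights: list[list[float]]) -> int:
    """Total number of weights in the layer."""
    return sum(len(row) for row in weights)


# Toy sizes, picked arbitrarily for the example.
full = make_layer(n_in=4, n_hidden=12)
small = slice_layer(full, width=4)

x = [0.5, -1.0, 0.25, 2.0]
print(param_count(full), param_count(small))
print(forward(small, x) == forward(full, x)[:4])
```

Because the smaller layer is a slice of the larger one, its outputs match the corresponding outputs of the full layer. That shared‑weights property is what lets a single training run serve several model sizes in this style of approach.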

For developers, that could mean faster iteration cycles and lower entry barriers to large‑scale language models. Founders may view the cost cut as a lever for tighter budgets, especially since the model already ranks first among Chinese entries on the Arena Search Leaderboard. Researchers, however, should ask whether the efficiency gains translate into comparable performance on diverse tasks beyond those benchmarks.

It remains unclear whether the framework scales to other architectures or datasets outside Baidu’s ecosystem. Until broader validation emerges, we can acknowledge the engineering advance while staying cautious about its real‑world applicability.