
AI Agent Routing Slashes Enterprise Token Costs 90%

AT&T cuts AI orchestration costs 90% after handling 8B tokens daily


AT&T’s internal AI platform was swallowing roughly eight billion tokens each day, a volume that quickly exposed inefficiencies in the company’s orchestration layer. Faced with soaring compute bills, the telecom giant embarked on a systematic overhaul, hunting for a framework that could scale without draining resources. The effort wasn’t just about slashing spend; it required a modular approach that let engineers test new models, swap out services, and retire underperforming pieces without disrupting downstream applications.

To that end, AT&T instituted a “really rigorous” vetting process, pitting third‑party solutions against home‑grown tools. One standout was its Ask Data offering, built on a Relational Knowledge Graph that recently claimed the top spot on the Spider 2.0 text‑to‑SQL accuracy leaderboard. Other components, still under evaluation, have shown promising early results.

The outcome? A re‑engineered pipeline that now runs at a fraction of its former cost—up to a 90 percent reduction—while maintaining the flexibility needed for rapid experimentation. As the team puts it, “We need to be able to pilot, plug in and plug out different components.”
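The "plug in and plug out" idea maps naturally onto a component registry. Below is a minimal sketch of that pattern, not AT&T's actual orchestration layer; the names (`Pipeline`, `plug_in`, `plug_out`) are illustrative:

```python
from typing import Callable

class Pipeline:
    """Sketch of a plug-in/plug-out orchestration layer: components are
    registered by name so any one of them can be piloted, swapped, or
    retired without touching downstream callers."""

    def __init__(self) -> None:
        self._components: dict[str, Callable[[str], str]] = {}

    def plug_in(self, name: str, component: Callable[[str], str]) -> None:
        # Registering under an existing name swaps in the new version.
        self._components[name] = component

    def plug_out(self, name: str) -> None:
        # Retire a component; downstream code only sees a missing name.
        self._components.pop(name, None)

    def run(self, name: str, payload: str) -> str:
        if name not in self._components:
            raise KeyError(f"no component registered under {name!r}")
        return self._components[name](payload)
```

Because callers address components only by name, an underperforming piece can be retired or replaced mid-flight, which is the flexibility the quote above describes.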

"We need to be able to pilot, plug in and plug out different components." They do "really rigorous" evaluations of available options as well as their own; for instance, their Ask Data with Relational Knowledge Graph has topped the Spider 2.0 text to SQL accuracy leaderboard, and other tools have scored highly on the BERT SQL benchmark. In the case of homegrown agentic tools, his team uses LangChain as a core framework, fine-tunes models with standard retrieval-augmented generation (RAG) and other in-house algorithms, and partners closely with Microsoft, using the tech giant's search functionality for their vector store. Ultimately, though, it's important not to just fuse agentic AI or other advanced tools into everything for the sake of it, Markus advised. "Sometimes I've seen a solution over engineered." Instead, builders should ask themselves whether a given tool actually needs to be agentic.

Cutting orchestration spend by 90 percent is a striking figure, especially on a daily flow of eight billion tokens. Markus says the breakthrough came from abandoning a monolithic model pipeline in favor of a LangChain-based multi-agent stack, in which large "super agents" delegate work to smaller components. The new layer lets the team pilot, plug in, and plug out modules as needed, a flexibility they stress is essential at scale.
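The token-cost lever in a delegating stack is routing: a super agent sends cheap, well-bounded requests to a small model and reserves the expensive model for genuinely complex work. A minimal sketch follows; the model names, per-token prices, and the keyword heuristic are all illustrative assumptions, not AT&T's routing logic:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Model:
    name: str
    cost_per_1k_tokens: float          # hypothetical price, not a real rate
    handler: Callable[[str], str]

# Stub handlers stand in for actual model calls.
small = Model("small-fast", 0.0002, lambda p: f"[small] {p[:40]}")
large = Model("large-reasoning", 0.0100, lambda p: f"[large] {p[:40]}")

def route(prompt: str) -> Model:
    """Super-agent routing sketch: escalate only prompts that look
    long or complex; everything else goes to the cheap model."""
    complex_markers = ("explain", "analyze", "multi-step", "plan out")
    if len(prompt) > 500 or any(m in prompt.lower() for m in complex_markers):
        return large
    return small

def answer(prompt: str) -> tuple[str, float]:
    model = route(prompt)
    est_tokens = len(prompt.split()) * 1.3   # rough token estimate
    cost = est_tokens / 1000 * model.cost_per_1k_tokens
    return model.handler(prompt), cost
```

With a 50x price gap between the two tiers, shifting the bulk of traffic to the small model is what makes order-of-magnitude savings arithmetically plausible, though the article does not disclose AT&T's actual traffic mix.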

Rigorous benchmarking backs the design: their Ask Data tool, built on a relational knowledge graph, currently leads the Spider 2.0 text‑to‑SQL leaderboard. Yet the article stops short of explaining how the cost model was calculated or whether the savings are sustainable as token volumes grow. Moreover, it is unclear if the same architecture would deliver comparable gains for firms with different data footprints or regulatory constraints.

Can this model scale beyond AT&T? The results suggest a viable path for large operators, but broader applicability remains an open question. As AT&T continues to refine its orchestration, observers will watch for evidence that the approach can maintain performance without sacrificing accuracy.

Common Questions Answered

How did AT&T reduce its AI orchestration costs by 90%?

AT&T transitioned from a monolithic model pipeline to a LangChain-based multi-agent stack that allows for modular component management. The new approach enables engineers to pilot, plug in, and remove different AI components easily, dramatically reducing computational overhead and increasing flexibility in their AI infrastructure.

What volume of tokens was AT&T processing daily before their AI orchestration overhaul?

AT&T was handling approximately eight billion tokens each day, which exposed significant inefficiencies in their original AI orchestration layer. This massive token volume drove the company to seek a more scalable and cost-effective framework for managing their AI computational resources.

What key strategy did AT&T's chief data officer Andy Markus implement to improve AI orchestration?

Andy Markus implemented a LangChain-based multi-agent architecture where large "super agents" can delegate work to smaller components. This approach provides unprecedented flexibility, allowing the team to rapidly test, swap, and retire different AI services without disrupting the entire system.