
Google, MIT study finds multi‑agent AI often loses context in sequential tasks


Google and MIT researchers have just released a paper that puts a spotlight on a subtle flaw in today’s push toward ever‑larger collections of AI agents. The study, presented at a recent AI conference, examined how groups of bots handle tasks that require a series of interdependent actions—think assembling a piece of furniture or navigating a multi‑step troubleshooting script. While the idea of splitting work across several specialized models sounds efficient, the experiments revealed a recurring hiccup: as each step reshapes the problem space, the hand‑off between agents often drops crucial details.

The authors measured performance drops across dozens of benchmark suites, noting that single‑agent configurations kept a steadier grip on the evolving requirements. Their findings suggest that more agents don’t automatically translate into better outcomes, especially when the workflow demands a tight, continuous thread of context. This raises a practical question for developers building complex pipelines: should they favor a unified model over a fragmented team?

The answer lies in the nuance captured by the researchers’ key observation.


Whenever each step in a task alters the state required for subsequent steps, multi-agent systems tend to struggle. This is because important context can get lost or fragmented as information is passed between agents. In contrast, a single agent maintains a seamless understanding of the evolving situation, ensuring that no critical details are missed or compressed during the process.
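The mechanism can be sketched with a toy example. Everything below is invented for illustration, not from the paper: a task where step one records a detail that step two depends on, and a lossy handoff between agents that compresses that detail away.

```python
# Toy illustration of the handoff problem: each step rewrites shared
# state, and a naive handoff that summarizes (truncates) context drops
# a detail a later step depends on. All names here are invented.

def run_steps(state: dict, handoff) -> dict:
    # Step 1 records which screw size the furniture kit uses.
    state["screw_size"] = "M6"
    state = handoff(state)  # agent boundary
    # Step 2 needs that detail to pick the right tool.
    state["tool"] = "hex key" if state.get("screw_size") == "M6" else "unknown"
    return state

lossless = lambda s: dict(s)  # single agent: full context carried forward
lossy = lambda s: {k: v for k, v in s.items() if k == "task"}  # compressed handoff

print(run_steps({"task": "assemble shelf"}, lossless)["tool"])  # hex key
print(run_steps({"task": "assemble shelf"}, lossy)["tool"])     # unknown
```

The lossy handoff keeps only the top-level task description, so the second step can no longer recover the screw size, which is the failure mode the researchers describe.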

Three factors that tank multi-agent performance

Tasks with many tools, like web search, file retrieval, or coding, suffer most from multi-agent overhead. The researchers say splitting the token budget leaves individual agents too little capacity for complex tool use. Once a single agent hits about a 45 percent success rate, adding agents brings diminishing or negative returns.

Coordination costs eat up any gains, according to the researchers. Without information sharing, errors compound up to 17 times faster than with a single agent. A central coordinator helps; errors "only" increase by a factor of four, but the problem doesn't go away.
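To see what compounding at those rates looks like, here is a toy model (not the paper's): a per-step error rate is inflated by the article's reported factors (17x without information sharing, 4x with a central coordinator) and compounded over a sequence of steps. The baseline error rate and step count are hypothetical stand-ins.

```python
# Toy illustration of error compounding across sequential steps.
# The 2 percent per-step error rate and the 10-step task are invented;
# only the 17x and 4x multipliers come from the article.

def compounded_error(per_step_error: float, steps: int, growth_factor: float) -> float:
    """Probability that at least one step fails, with the per-step error
    inflated by growth_factor to mimic context lost at each handoff."""
    effective = min(per_step_error * growth_factor, 1.0)
    return 1.0 - (1.0 - effective) ** steps

baseline = compounded_error(0.02, steps=10, growth_factor=1)     # single agent
coordinated = compounded_error(0.02, steps=10, growth_factor=4)  # central coordinator
isolated = compounded_error(0.02, steps=10, growth_factor=17)    # no info sharing

print(f"single agent:     {baseline:.2f}")
print(f"with coordinator: {coordinated:.2f}")
print(f"no info sharing:  {isolated:.2f}")
```

Even this crude model shows the shape of the result: the coordinator softens the blow but does not eliminate it.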

The 45 percent threshold

The key rule of thumb: if a single agent solves more than 45 percent of a task correctly, multi-agent systems usually aren't worth it. Multiple agents only help when tasks divide cleanly. For tasks needing around 16 different tools, single agents or decentralized setups work best.
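That rule of thumb fits in a few lines of code. The 0.45 threshold and the "divides cleanly" condition come from the reported findings; the function name and its inputs are our own framing, not an API from the paper.

```python
# Hedged sketch of the article's rule of thumb as a decision helper.

def prefer_multi_agent(single_agent_success: float, divides_cleanly: bool) -> bool:
    """Return True only when multiple agents are likely worth the overhead."""
    if single_agent_success > 0.45:
        # Past the ~45 percent threshold, extra agents bring
        # diminishing or negative returns.
        return False
    # Below the threshold, multiple agents help only if the task
    # splits into largely independent pieces.
    return divides_cleanly

print(prefer_multi_agent(0.60, divides_cleanly=True))   # False
print(prefer_multi_agent(0.30, divides_cleanly=True))   # True
print(prefer_multi_agent(0.30, divides_cleanly=False))  # False
```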

OpenAI did well with hybrid architectures, Anthropic with centralized ones. Google proved most consistent across all multi-agent setups. The researchers also built a framework, which they call "a quantitatively predictive principle of agentic scaling based on measurable task properties," that correctly predicts the best coordination strategy for 87 percent of new configurations.

Single agents use tokens more efficiently

The researchers tracked tasks completed per token budget.

Single agents averaged 67 successful tasks per 1,000 tokens. Centralized multi-agent systems managed just 21, less than a third as many.
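Inverting those throughput figures gives a cost per completed task, a back-of-the-envelope conversion of the article's numbers rather than anything reported directly:

```python
# Convert the reported throughput (tasks per 1,000 tokens) into an
# approximate token cost per completed task. Figures from the article;
# the conversion itself is our own arithmetic.

single_tasks_per_1k = 67
centralized_tasks_per_1k = 21

tokens_per_task_single = 1000 / single_tasks_per_1k        # ~14.9 tokens
tokens_per_task_central = 1000 / centralized_tasks_per_1k  # ~47.6 tokens

print(f"single agent:     {tokens_per_task_single:.1f} tokens per task")
print(f"centralized team: {tokens_per_task_central:.1f} tokens per task")
print(f"ratio:            {tokens_per_task_central / tokens_per_task_single:.1f}x")
```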


Does adding more agents guarantee progress? The study says no. Researchers at Google Research, DeepMind, and MIT examined a range of sequential tasks and found that performance swings dramatically when a team of specialized agents replaces a single, well-trained model.

Because important context gets lost or fragmented as information passes between agents, the authors find that the advantage of multiple agents is limited to scenarios where tasks are largely independent; a single agent's seamless view of the evolving situation often delivers more reliable results.

Yet the paper leaves open how architectural tweaks or communication protocols might mitigate the context‑loss problem. It remains unclear whether future designs can preserve the benefits of specialization without sacrificing continuity. For now, the findings temper the assumption that “more agents is all you need,” suggesting a more nuanced approach to system design.

Common Questions Answered

Why do multi‑agent AI systems lose context in sequential tasks according to the Google‑MIT study?

The study found that when each step of a task changes the required state, information must be passed between agents, which often leads to fragmented or lost context. This breakdown occurs because each specialized model only sees a portion of the overall state, unlike a single agent that retains a continuous understanding.

What types of tasks were used to evaluate the performance of multi‑agent versus single‑agent models?

Researchers tested tasks that involve interdependent actions such as assembling furniture and following multi‑step troubleshooting scripts. These scenarios require state changes at each step, highlighting how context loss impacts multi‑agent performance.

Did the study conclude that adding more specialized agents always improves AI performance?

No, the study explicitly states that adding more agents does not guarantee progress. Performance often swings dramatically, with single, well‑trained models sometimes outperforming a team of specialized agents due to better context retention.

Which institutions collaborated on the research that identified the context‑loss problem in multi‑agent AI?

The research was a joint effort by Google Research, DeepMind, and the Massachusetts Institute of Technology (MIT). Their combined expertise allowed a comprehensive examination of sequential tasks across various AI architectures.
