
Google, MIT study finds multi‑agent AI often loses context in sequential tasks


Google and MIT researchers have just released a paper that puts a spotlight on a subtle flaw in today’s push toward ever‑larger collections of AI agents. The study, presented at a recent AI conference, examined how groups of bots handle tasks that require a series of interdependent actions—think assembling a piece of furniture or navigating a multi‑step troubleshooting script. While the idea of splitting work across several specialized models sounds efficient, the experiments revealed a recurring hiccup: as each step reshapes the problem space, the hand‑off between agents often drops crucial details.

The authors measured performance drops across dozens of benchmark suites, noting that single‑agent configurations kept a steadier grip on the evolving requirements. Their findings suggest that more agents don’t automatically translate into better outcomes, especially when the workflow demands a tight, continuous thread of context. This raises a practical question for developers building complex pipelines: should they favor a unified model over a fragmented team?

The answer lies in the nuance captured by the researchers’ key observation.


Whenever each step in a task alters the state required for subsequent steps, multi-agent systems tend to struggle. This is because important context can get lost or fragmented as information is passed between agents. In contrast, a single agent maintains a seamless understanding of the evolving situation, ensuring that no critical details are missed or compressed during the process.
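The mechanism can be sketched with a toy example. Everything below is invented for illustration, not from the paper: a task where step one records a detail that step two depends on, and a lossy handoff between agents that compresses that detail away.

```python
# Toy illustration of the handoff problem: each step rewrites shared
# state, and a naive handoff that summarizes (truncates) context drops
# a detail a later step depends on. All names here are invented.

def run_steps(state: dict, handoff) -> dict:
    # Step 1 records which screw size the furniture kit uses.
    state["screw_size"] = "M6"
    state = handoff(state)  # agent boundary
    # Step 2 needs that detail to pick the right tool.
    state["tool"] = "hex key" if state.get("screw_size") == "M6" else "unknown"
    return state

lossless = lambda s: dict(s)  # single agent: full context carried forward
lossy = lambda s: {k: v for k, v in s.items() if k == "task"}  # compressed handoff

print(run_steps({"task": "assemble shelf"}, lossless)["tool"])  # hex key
print(run_steps({"task": "assemble shelf"}, lossy)["tool"])     # unknown
```

The lossy handoff keeps only the top-level task description, so the second step can no longer recover the screw size, which is the failure mode the researchers describe.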

Three factors that tank multi-agent performance

Tasks with many tools, like web search, file retrieval, or coding, suffer most from multi-agent overhead. The researchers say splitting the token budget leaves individual agents too little capacity for complex tool use. Once a single agent hits about a 45 percent success rate, adding agents brings diminishing or negative returns.

Coordination costs eat up any gains, according to the researchers. Without information sharing, errors compound up to 17 times faster than with a single agent. A central coordinator helps; errors "only" increase by a factor of four, but the problem doesn't go away.
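To see what compounding at those rates looks like, here is a toy model (not the paper's): a per-step error rate is inflated by the article's reported factors (17x without information sharing, 4x with a central coordinator) and compounded over a sequence of steps. The baseline error rate and step count are hypothetical stand-ins.

```python
# Toy illustration of error compounding across sequential steps.
# The 2 percent per-step error rate and the 10-step task are invented;
# only the 17x and 4x multipliers come from the article.

def compounded_error(per_step_error: float, steps: int, growth_factor: float) -> float:
    """Probability that at least one step fails, with the per-step error
    inflated by growth_factor to mimic context lost at each handoff."""
    effective = min(per_step_error * growth_factor, 1.0)
    return 1.0 - (1.0 - effective) ** steps

baseline = compounded_error(0.02, steps=10, growth_factor=1)     # single agent
coordinated = compounded_error(0.02, steps=10, growth_factor=4)  # central coordinator
isolated = compounded_error(0.02, steps=10, growth_factor=17)    # no info sharing

print(f"single agent:     {baseline:.2f}")
print(f"with coordinator: {coordinated:.2f}")
print(f"no info sharing:  {isolated:.2f}")
```

Even this crude model shows the shape of the result: the coordinator softens the blow but does not eliminate it.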

The 45 percent threshold

The key rule of thumb: if a single agent solves more than 45 percent of a task correctly, multi-agent systems usually aren't worth it. Multiple agents only help when tasks divide cleanly. For tasks needing around 16 different tools, single agents or decentralized setups work best.
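That rule of thumb fits in a few lines of code. The 0.45 threshold and the "divides cleanly" condition come from the reported findings; the function name and its inputs are our own framing, not an API from the paper.

```python
# Hedged sketch of the article's rule of thumb as a decision helper.

def prefer_multi_agent(single_agent_success: float, divides_cleanly: bool) -> bool:
    """Return True only when multiple agents are likely worth the overhead."""
    if single_agent_success > 0.45:
        # Past the ~45 percent threshold, extra agents bring
        # diminishing or negative returns.
        return False
    # Below the threshold, multiple agents help only if the task
    # splits into largely independent pieces.
    return divides_cleanly

print(prefer_multi_agent(0.60, divides_cleanly=True))   # False
print(prefer_multi_agent(0.30, divides_cleanly=True))   # True
print(prefer_multi_agent(0.30, divides_cleanly=False))  # False
```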

OpenAI did well with hybrid architectures, Anthropic with centralized ones. Google proved most consistent across all multi-agent setups. The researchers also built a framework, which they call "a quantitatively predictive principle of agentic scaling based on measurable task properties," that correctly predicts the best coordination strategy for 87 percent of new configurations.

Single agents use tokens more efficiently

The researchers tracked tasks completed per token budget.

Single agents averaged 67 successful tasks per 1,000 tokens. Centralized multi-agent systems managed just 21, less than a third as many.
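Inverting those throughput figures gives a cost per completed task, a back-of-the-envelope conversion of the article's numbers rather than anything reported directly:

```python
# Convert the reported throughput (tasks per 1,000 tokens) into an
# approximate token cost per completed task. Figures from the article;
# the conversion itself is our own arithmetic.

single_tasks_per_1k = 67
centralized_tasks_per_1k = 21

tokens_per_task_single = 1000 / single_tasks_per_1k        # ~14.9 tokens
tokens_per_task_central = 1000 / centralized_tasks_per_1k  # ~47.6 tokens

print(f"single agent:     {tokens_per_task_single:.1f} tokens per task")
print(f"centralized team: {tokens_per_task_central:.1f} tokens per task")
print(f"ratio:            {tokens_per_task_central / tokens_per_task_single:.1f}x")
```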


Does adding more agents guarantee progress? The study says no. Researchers at Google Research, DeepMind, and MIT examined a range of sequential tasks and found that performance swings dramatically when a team of specialized agents replaces a single, well-trained model.

Because important context gets lost or fragmented as information passes between agents, the authors find that the advantage of multiple agents is limited to scenarios where tasks are largely independent; a single agent's seamless view of the evolving situation often delivers more reliable results.

Yet the paper leaves open how architectural tweaks or communication protocols might mitigate the context‑loss problem. It remains unclear whether future designs can preserve the benefits of specialization without sacrificing continuity. For now, the findings temper the assumption that “more agents is all you need,” suggesting a more nuanced approach to system design.

Common Questions Answered

Why do multi‑agent AI systems lose context in sequential tasks according to the Google‑MIT study?

The study found that when each step of a task changes the required state, information must be passed between agents, which often leads to fragmented or lost context. This breakdown occurs because each specialized model only sees a portion of the overall state, unlike a single agent that retains a continuous understanding.

What types of tasks were used to evaluate the performance of multi‑agent versus single‑agent models?

Researchers tested tasks that involve interdependent actions such as assembling furniture and following multi‑step troubleshooting scripts. These scenarios require state changes at each step, highlighting how context loss impacts multi‑agent performance.

Did the study conclude that adding more specialized agents always improves AI performance?

No, the study explicitly states that adding more agents does not guarantee progress. Performance often swings dramatically, with single, well‑trained models sometimes outperforming a team of specialized agents due to better context retention.

Which institutions collaborated on the research that identified the context‑loss problem in multi‑agent AI?

The research was a joint effort by Google Research, DeepMind, and the Massachusetts Institute of Technology (MIT). Their combined expertise allowed a comprehensive examination of sequential tasks across various AI architectures.
