Editorial illustration for Sakana AI launches Sakana Fugu; Fugu Ultra leads coding, reasoning and tests
Sakana AI launches Sakana Fugu; Fugu Ultra leads coding,...
Sakana AI launches Sakana Fugu; Fugu Ultra leads coding, reasoning and tests
Sakana AI rolled out its newest offering, Sakana Fugu, today. The service looks like a single OpenAI‑compatible endpoint, but under the hood it’s a multi‑agent orchestration system that decides how to tackle each request. If a task can be solved directly, Fugu handles it itself; when the problem calls for more expertise, it pulls together a team of specialist models and coordinates their output.
What’s notable is that Fugu isn’t just a router—it’s a language model trained to call other LLMs, even spawning instances of itself recursively. It manages model selection, delegation, verification and synthesis without any hard‑coded roles or workflows, learning on the fly when to delegate and how agents should communicate.
The company frames the architecture as a safeguard against single‑vendor lock‑in. Recent export controls on Anthropic’s Fable and Mythos models, for example, prompted the team to build a system that can reroute around provider restrictions. As newer models emerge, they can be slipped into the pool, keeping the service adaptable without exposing the routing logic to users.
Fugu Ultra tops the four coding benchmarks, CharXiv Reasoning, and Humanity’s Last Exam. Regular Fugu leads SciCode, τ³ Banking, and Long Context Reasoning. GPT 5.5 wins MRCRv2, the only baseline win here.
Its Fugu models stand shoulder-to-shoulder with Anthropic’s Fable 5 and Mythos Preview. Those two are not in Fugu’s pool, since they are not publicly accessible.
Use Cases
Sakana AI ran a beta with close to 500 early users. The published examples favor long, multi-step tasks.
- AutoResearch: An agent improved a small GPT’s training recipe autonomously.
Why this matters
Sakana AI’s Fugu system promises to hide the intricacies of multi‑agent orchestration behind a single endpoint, letting developers submit a request and let the platform decide whether a solo model or a coordinated team is needed. The claim that “the complexity of a multi‑agent system never reaches your code” is appealing, yet it leaves open how much control developers retain when a team of models is assembled. Fugu Ultra’s top scores on four coding benchmarks, CharXiv Reasoning, and Humanity’s Last Exam suggest strong performance in isolated tests, while regular Fugu leads in SciCode, τ³ Banking, and Long Context Reasoning.
GPT 5.5’s lone win on MRCRv2 shows competition is still viable. What remains unclear is how these results translate to production workloads, integration overhead, or cost. For founders, the promise of a plug‑and‑play orchestration layer could reduce engineering effort, but the trade‑offs in latency and predictability need careful evaluation.
Researchers may find a useful testbed for model collaboration, yet the lack of detail on the “swappable pool” of frontier LLMs invites skepticism about reproducibility and long‑term support.
Further Reading
- Sakana Fugu: A Multi-Agent Orchestration System as a Foundation Model - Sakana AI
- How Sakana trained a 7B model to orchestrate GPT, Claude and Gemini - VentureBeat
- Sakana AI Launches Fugu Multi-Agent System - Phemex News
- Sakana AI Launches Commercial Product Fugu Multi-Agent Orchestration System - KuCoin News
- Introducing Sakana Fugu AI Orchestration System Beta - LinkedIn