Poolside AI launches Laguna XS.2 and M.1, hitting 72.5% on SWE-bench Verified

Poolside AI’s latest rollout adds two agentic coding models, Laguna XS.2 and M.1, to its portfolio. The company describes Laguna XS.2 as its second-generation mixture-of-experts (MoE) system, while M.1 builds on the same architecture with a focus on autonomous code generation. Both are positioned as “agentic” because they can initiate and manage coding tasks without step-by-step prompting.
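The announcement doesn’t go deeper on the architecture, but the MoE idea it rests on is standard: a learned gate scores a pool of expert sub-networks and routes each token to only the top-k of them, so just a fraction of the model’s parameters activates per token. Here is a minimal NumPy sketch of top-k gating; every name, shape, and expert count is illustrative and implies nothing about Laguna’s actual design.

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_forward(x, experts, gate_w, top_k=2):
    """Toy MoE layer: the gate scores every expert, but only the
    top-k experts actually run on this token."""
    logits = x @ gate_w                      # one gate score per expert
    chosen = np.argsort(logits)[-top_k:]     # indices of the k best experts
    weights = np.exp(logits[chosen])
    weights /= weights.sum()                 # softmax over the chosen experts
    # Only the selected experts compute; the rest stay idle, which is how
    # MoE keeps per-token compute low despite a large total parameter count.
    return sum(w * experts[i](x) for w, i in zip(weights, chosen))

# Four toy "experts", each just a random linear map over an 8-dim token.
experts = [lambda x, W=rng.normal(size=(8, 8)): x @ W for _ in range(4)]
gate_w = rng.normal(size=(8, 4))             # gate: token -> expert scores
token = rng.normal(size=8)
print(moe_forward(token, experts, gate_w))   # 8-dim combined output
```

The appeal of the gate is that total capacity scales with the number of experts while per-token compute scales only with top_k, which is why MoE architectures are attractive for large coding models.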

That claim matters most when the models are pitted against established benchmarks that simulate real-world software development. SWE-bench, for instance, comes in several variants: Verified, a human-validated subset of real GitHub issues; Multilingual, which spans multiple programming languages; and Pro, a harder professional-grade set. Terminal-Bench 2.0 evaluates broader command-line and system interaction. Seeing how Laguna XS.2 and M.1 perform across these suites gives developers a concrete sense of their practical capabilities, and their limits.

Here’s how the numbers stack up:

- SWE-bench Verified: 72.5%
- SWE-bench Multilingual: 67.3%
- SWE-bench Pro: 46.9%
- Terminal-Bench 2.0: 40.7%

Laguna M.1 and Laguna XS.2 mark Poolside AI’s first public foray into its Laguna line, and the company has paired them with “pool,” a lightweight terminal-based coding agent and dual ACP client-server that mirrors the internal RL training environment. Those scores suggest progress, yet it is unclear how they translate to broader software-development tasks outside the benchmark suites.

The release of “pool” as a research preview may invite external validation, but adoption will depend on how easily researchers can integrate the dual client-server protocol into existing workflows. Beyond the mixture-of-experts label, details about Laguna XS.2’s architecture remain sparse, and without independent replication of the reported scores, the practical impact of these agents remains uncertain.
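The ACP detail is the practical hook here: agents speaking the Agent Client Protocol typically run as subprocesses and exchange JSON-RPC 2.0 messages with an editor or harness over stdin/stdout, and “pool” reportedly sits on both sides of that exchange. The Python sketch below shows what driving such an agent could look like; the `pool --acp` invocation is a placeholder rather than a documented command, and the newline-delimited framing and `initialize` handshake follow the public ACP convention, not anything Poolside has specified.

```python
import json
import subprocess

# Placeholder launch command: the real argv for Poolside's "pool" agent
# isn't documented here, so "--acp" is purely hypothetical.
proc = subprocess.Popen(
    ["pool", "--acp"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)

def rpc(method, params, msg_id):
    """Send one JSON-RPC 2.0 request and read one newline-delimited reply."""
    request = {"jsonrpc": "2.0", "id": msg_id, "method": method, "params": params}
    proc.stdin.write(json.dumps(request) + "\n")
    proc.stdin.flush()
    return json.loads(proc.stdout.readline())

# An ACP session opens with a capability exchange before any prompting;
# the exact fields below mirror the public ACP spec, not pool's docs.
print(rpc("initialize", {"protocolVersion": 1, "clientCapabilities": {}}, 1))
```

If pool really does mirror Poolside’s internal RL training harness, this subprocess-plus-JSON-RPC shape is what researchers would need to slot into their own evaluation loops.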

For now, the announcement provides concrete metrics and tooling, leaving the community to assess whether the performance gains hold up under real‑world coding pressures.
