
MiniMax M2.7: Open Source AI Coding Agent Breaks Records

MiniMax M2.7 Agent Scores 56.22% SWE‑Pro, 57% Terminal Bench 2, ELO 1495


MiniMax just dropped its latest open‑source agent, the M2.7, and the numbers are hard to ignore. A self‑evolving model, it hit 56.22% on the SWE‑Pro suite and 57% on Terminal Bench 2, metrics that many developers watch to gauge code‑generation chops. While those percentages speak for themselves, the real story emerges when you stack M2.7 against the broader field.

Across 45 contenders, a single benchmark tallies domain expertise and task‑delivery skill, assigning an ELO rating that lets you see who's truly ahead. MiniMax's score lands it at the top of the open‑source pack, nudging just behind the heavyweight commercial offerings. And that's not all: another competition focused on tool use also shows the model holding its own.

Why does this matter? Because open‑source agents have rarely breached the performance ceiling set by proprietary systems, and M2.7’s results suggest the gap may be narrowing. The details below lay out exactly where MiniMax stands.

In the GDPval-AA evaluation, which measures domain expertise and task delivery capability across 45 models, MiniMax M2.7 achieved an ELO score of 1495 -- the highest among open-source models, behind only Opus 4.6, Sonnet 4.6, and GPT-5.4, and surpassing GPT-5.3. On Toolathon, MiniMax M2.7 achieved an accuracy of 46.3%, reaching the global top tier. In MM Claw testing -- an evaluation MiniMax built based on real-world usage patterns from the OpenClaw personal agent platform -- MiniMax M2.7 maintained a 97% skill compliance rate across 40 complex skills (each exceeding 2,000 tokens) and achieved an overall accuracy of 62.7%, approaching Sonnet 4.6.
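
GDPval-AA's exact rating method isn't public. As a rough illustration only, the sketch below applies the standard Elo update to pairwise head-to-head results, which is how leaderboards of this kind are typically scored; the model names, starting ratings, and match outcomes are invented for the example.

```python
# Illustrative only: a standard Elo update over pairwise model comparisons.
# GDPval-AA's actual scoring method is not public; ratings and matchups here are made up.

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(r_a: float, r_b: float, a_won: bool, k: float = 32.0) -> tuple[float, float]:
    """Return new ratings for A and B after one head-to-head comparison."""
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    return r_a + k * (s_a - e_a), r_b + k * ((1 - s_a) - (1 - e_a))

# Hypothetical matchups between two models, both starting at 1400.
ratings = {"model_a": 1400.0, "model_b": 1400.0}
for a_won in [True, True, False, True]:
    ratings["model_a"], ratings["model_b"] = update(
        ratings["model_a"], ratings["model_b"], a_won
    )
print(ratings)  # model_a drifts upward after winning 3 of 4 comparisons
```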

In finance, MiniMax M2.7 can autonomously read a company's annual reports and earnings call transcripts, cross-reference multiple research reports, independently design assumptions and build a revenue forecast model, and produce a PPT and Word research report based on templates -- understanding, making judgments, and producing output like a junior analyst.

Key Takeaways

- MiniMax M2.7 is now officially open source, with weights available on Hugging Face, making a frontier-grade agentic model freely accessible for developers to deploy and build on.

- MiniMax M2.7 achieves SOTA performance on real-world software engineering benchmarks, scoring 56.22% on SWE-Pro (matching GPT-5.3-Codex) and 57.0% on Terminal Bench 2 -- tests that measure production-level reasoning, not just code generation.

- MiniMax M2.7 is the first model to actively participate in its own development, running over 100 autonomous rounds of scaffold optimization and achieving a 30% performance improvement -- an early, concrete example of AI-assisted AI development in practice (a rough sketch of what such a loop could look like follows below).
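
MiniMax hasn't published the details of those scaffold-optimization rounds, but the general pattern of an agent improving its own harness can be sketched as a propose-evaluate-keep loop. Everything below -- the function names, the benchmark call, and the acceptance rule -- is an assumption for illustration, not MiniMax's actual pipeline.

```python
# Illustrative sketch of an autonomous scaffold-optimization loop.
# None of this reflects MiniMax's real tooling; run_benchmark() and
# propose_scaffold_change() are hypothetical stand-ins.

def run_benchmark(scaffold: dict) -> float:
    """Score a scaffold configuration on a held-out task suite (stubbed here)."""
    # A real setup would run the agent on SWE-style tasks and return its pass rate.
    return sum(scaffold.get("weights", [0.0]))

def propose_scaffold_change(scaffold: dict, round_idx: int) -> dict:
    """Ask the model to suggest a tweak to its own prompts/tools (stubbed here)."""
    candidate = dict(scaffold)
    candidate["weights"] = scaffold.get("weights", [0.0]) + [0.01 * (round_idx % 3)]
    return candidate

scaffold = {"weights": [0.5]}          # initial prompts, tool wiring, retry policy, etc.
best_score = run_benchmark(scaffold)

for round_idx in range(100):           # "over 100 autonomous rounds" per the article
    candidate = propose_scaffold_change(scaffold, round_idx)
    score = run_benchmark(candidate)
    if score > best_score:             # keep only changes that improve the benchmark
        scaffold, best_score = candidate, score

print(f"best score after self-optimization: {best_score:.3f}")
```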

MiniMax M2.7 is now publicly available on Hugging Face, with its weights released for the first time. Its scores of 56.22% on SWE‑Pro and 57% on Terminal Bench 2 place it ahead of many open‑source peers. In the GDPval‑AA benchmark, the model posted an ELO of 1495, the top figure among open‑source offerings and behind only Opus 4.6, Sonnet 4.6, and GPT‑5.4.
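
With the weights on Hugging Face, a standard transformers workflow should be enough to try the model locally. The repository id below is a guess rather than a confirmed name, and a model of this size will likely need multi-GPU or quantized loading in practice; check the actual model card before running anything.

```python
# Minimal sketch of loading the open weights with Hugging Face transformers.
# The repo id "MiniMaxAI/MiniMax-M2.7" is assumed, not confirmed; consult the real
# model card for the correct name, chat template, and hardware requirements.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "MiniMaxAI/MiniMax-M2.7"  # hypothetical repository id

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    device_map="auto",        # shard across available GPUs
    torch_dtype="auto",       # use the dtype stored in the checkpoint
    trust_remote_code=True,   # MoE checkpoints often ship custom modeling code
)

messages = [{"role": "user", "content": "Write a Python function that reverses a string."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```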

The MoE architecture underpins its performance, and the claim that the model actively participates in its own development cycle marks a noteworthy shift in how such systems are iterated. Yet it is unclear whether this self‑evolving approach will translate into sustained gains beyond the reported benchmarks. The Toolathon result, a 46.3% accuracy reported only as a headline figure, leaves the per‑task picture incomplete.

Overall, MiniMax M2.7 demonstrates measurable progress within the open‑source sphere, but its broader impact remains to be clarified.

Common Questions Answered

What benchmark scores did the MiniMax M2.7 agent achieve?

The MiniMax M2.7 scored 56.22% on the SWE-Pro suite and 57% on Terminal Bench 2, demonstrating strong performance on production-level software engineering tasks, not just code generation. These scores position it as a competitive open-source agent in technical evaluations.

How does the MiniMax M2.7 rank in the GDPval-AA evaluation?

In the GDPval-AA evaluation, the MiniMax M2.7 achieved an ELO score of 1495, the highest among open-source models. This places it behind only Opus 4.6, Sonnet 4.6, and GPT-5.4 in the overall ranking.

Where can developers access the MiniMax M2.7 model?

The MiniMax M2.7 model is now publicly available on Hugging Face, with its weights released for the first time. This open release lets developers download the weights, evaluate the model, and build on it in their own deployments.