LLMs & Generative AI

OpenAI launches GPT‑5.1‑Codex‑Max, completes 24‑hour coding task internally


OpenAI’s latest offering, GPT‑5.1‑Codex‑Max, arrives with a bold claim: it can keep a coding session running for a full day without human intervention. The company says the model was put through an internal marathon, tackling a continuous 24‑hour workload that involved multiple refactors, test‑driven cycles and self‑directed debugging. What catches the eye isn’t just endurance; it’s the reported efficiency gain.

In side‑by‑side runs, the new version reportedly slashed the amount of “thinking” tokens it needed by roughly a third when operating at a moderate reasoning level, all while matching or exceeding the output of its predecessor, GPT‑5.1‑Codex. If those figures hold up, developers could see faster iteration cycles and lower compute costs on long‑running projects. The details of the internal evaluation set the stage for the performance snapshot that follows.

The model has been internally observed to complete tasks lasting more than 24 hours, including multi-step refactors, test-driven iteration, and autonomous debugging. At medium reasoning effort, GPT‑5.1‑Codex‑Max used approximately 30% fewer thinking tokens than GPT‑5.1‑Codex for comparable or better accuracy, which has implications for both cost and latency.

Platform Integration and Use Cases

GPT‑5.1‑Codex‑Max is currently available across multiple Codex-based environments, meaning OpenAI's own integrated tools and interfaces built specifically for code-focused AI agents. These include Codex CLI, OpenAI's official command-line tool (@openai/codex), where GPT‑5.1‑Codex‑Max is already live.
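For readers who want to try the model in Codex CLI, a minimal setup sketch follows. The npm package name (@openai/codex) comes from the article itself; everything else here reflects standard npm usage, and exact flags may differ across CLI versions, so treat this as an assumption-laden starting point rather than official instructions.

```shell
# Install the Codex CLI globally via npm (package name per the article).
npm install -g @openai/codex

# Launch an interactive Codex session in the current project directory.
# Model-selection options may vary by CLI version; check the built-in help.
codex --help
```

Once installed, running `codex` inside a repository starts an interactive session; consult the CLI's own help output for the current way to select or confirm the active model.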

Related Topics: #OpenAI #GPT-5.1-Codex-Max #GPT-5.1-Codex #Codex CLI #thinking tokens #autonomous debugging

The rollout of GPT‑5.1‑Codex‑Max marks OpenAI’s latest push toward an “always‑on” coding assistant. By swapping out GPT‑5.1‑Codex as the default model, the company signals confidence in the new system’s long‑horizon reasoning and real‑time interaction. Internally, the model has already logged a 24‑hour stretch of work, handling multi‑step refactors, test‑driven iteration and autonomous debugging while consuming roughly 30% fewer thinking tokens at medium reasoning effort.

That efficiency gain suggests a tighter token budget for comparable outcomes. Yet the evidence comes from a single, internal use case; it remains unclear whether the same token savings and performance will translate across diverse developer environments. The claim of “better” results lacks quantifiable benchmarks, leaving open questions about consistency and scalability.

Moreover, the shift to a persistent, high‑context agent raises questions about resource management and error handling over extended sessions. In short, GPT‑5.1‑Codex‑Max demonstrates promising internal metrics, but its broader utility for the developer community is still uncertain.


Common Questions Answered

What endurance capability does GPT‑5.1‑Codex‑Max claim to have?

OpenAI claims GPT‑5.1‑Codex‑Max can maintain a coding session for a full 24‑hour period without human intervention. During internal testing, the model performed continuous multi‑step refactors, test‑driven iteration, and autonomous debugging throughout the entire stretch.

How does GPT‑5.1‑Codex‑Max’s token efficiency compare to GPT‑5.1‑Codex?

When operating at medium reasoning effort, GPT‑5.1‑Codex‑Max uses roughly 30% fewer thinking tokens than its predecessor, GPT‑5.1‑Codex. This reduction translates into lower computational cost and reduced latency while maintaining comparable or better accuracy.
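To make the cost implication concrete, here is a minimal back-of-the-envelope sketch. The ~30% reduction is the article's reported figure; the baseline token count and per-1K-token price are invented purely for illustration and are not OpenAI pricing.

```python
# Hypothetical illustration of what a ~30% cut in "thinking" tokens could
# save on a long session. Baseline tokens and price are assumptions.

def estimated_savings(baseline_tokens: int, price_per_1k: float,
                      reduction: float = 0.30) -> float:
    """Return the estimated cost saved when token usage drops by `reduction`."""
    saved_tokens = baseline_tokens * reduction
    return saved_tokens / 1000 * price_per_1k

# Example: a session that would have used 2,000,000 thinking tokens,
# at an assumed $0.01 per 1K tokens.
print(estimated_savings(2_000_000, 0.01))  # → 6.0 (dollars saved)
```

With those assumed numbers, 600,000 fewer tokens at $0.01 per 1K works out to about $6 saved per session; the same arithmetic scales linearly with whatever baseline usage and pricing actually apply.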

What types of coding tasks were included in the internal 24‑hour marathon?

The marathon incorporated multi‑step refactors, test‑driven iteration cycles, and autonomous debugging of code. These tasks simulate real‑world development workflows, demonstrating the model’s ability to handle complex, iterative programming challenges without supervision.

How is GPT‑5.1‑Codex‑Max being deployed across OpenAI’s platforms?

GPT‑5.1‑Codex‑Max is now the default model in multiple Codex‑based environments, replacing the earlier GPT‑5.1‑Codex. This rollout supports the company’s “always‑on” coding assistant vision, enabling developers to benefit from long‑horizon reasoning and real‑time interaction.