OpenAI engineers gather around a wall‑mounted screen displaying scrolling code as a digital clock reads 24:00.

OpenAI launches GPT‑5.1‑Codex‑Max, completes 24‑hour coding task internally


When I first saw the headline for OpenAI’s newest model, GPT-5.1-Codex-Max, I was surprised by the claim that it can sit on a coding task for an entire day without anyone stepping in. According to the company, it ran the model through a 24-hour internal marathon that included a handful of refactors, a series of test-driven loops, and some self-directed debugging. What sticks out isn’t just the stamina; the numbers also suggest a noticeable boost in efficiency.

In side-by-side tests the new version apparently cut the “thinking” tokens it used by about a third when set to a moderate reasoning mode, yet it still produced output that matches, or even tops, what GPT-5.1-Codex delivered. If those early figures are accurate, we might see quicker iteration cycles and lower compute bills on projects that run long. The internal test setup, while not fully disclosed, gives a glimpse of the performance snapshot that follows.

It’ll be interesting to see how it holds up in real-world codebases.

The model has been internally observed to complete tasks lasting more than 24 hours, including multi-step refactors, test-driven iteration, and autonomous debugging. At medium reasoning effort, GPT‑5.1-Codex-Max used approximately 30% fewer thinking tokens than GPT‑5.1-Codex for comparable or better accuracy, which has implications for both cost and latency.

Platform Integration and Use Cases

GPT‑5.1-Codex-Max is currently available across multiple Codex-based environments, which refer to OpenAI's own integrated tools and interfaces built specifically for code-focused AI agents. These include Codex CLI, OpenAI's official command-line tool (@openai/codex), where GPT‑5.1-Codex-Max is already live.
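For readers who want to try the model through Codex CLI, the flow would look roughly like the sketch below. The npm package name comes from the article; the `--model` flag and the exact model identifier string are assumptions about how the CLI selects a model, and may differ in practice:

```shell
# Install OpenAI's Codex CLI globally from npm (package name per the article).
npm install -g @openai/codex

# Start a session pointed at the new model. The "--model" flag and the
# identifier "gpt-5.1-codex-max" are assumptions; check `codex --help`
# for the exact flag and model string your CLI version accepts.
codex --model gpt-5.1-codex-max "refactor the auth module and run the tests"
```

Since GPT‑5.1-Codex-Max is reportedly already the default in Codex CLI, omitting the model flag entirely may give the same result.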


The new GPT-5.1-Codex-Max is OpenAI’s latest attempt at an “always-on” coding buddy. By making it the default model, the company seems to be betting on its longer-horizon reasoning and real-time interaction. Inside OpenAI, the system already logged a full day of work: it tackled multi-step refactors, ran test-driven loops, and even did some autonomous debugging while using about 30% fewer thinking tokens at a medium reasoning level.

That drop hints at a tighter token budget for similar results. Still, the data comes from a single internal scenario, so it’s hard to say whether the same savings will show up in the wild, across the many ways developers actually work. The “comparable or better” claim also lacks public benchmarks, leaving questions about consistency and how well it scales.

On top of that, moving to a persistent, high-context agent brings up worries about resource use and error handling over long sessions. All in all, the internal numbers look encouraging, but whether GPT-5.1-Codex-Max will actually help the broader dev community remains an open question.

Common Questions Answered

What endurance capability does GPT‑5.1‑Codex‑Max claim to have?

OpenAI claims GPT‑5.1‑Codex‑Max can maintain a coding session for a full 24‑hour period without human intervention. During internal testing the model performed continuous multi‑step refactors, test‑driven iteration, and autonomous debugging throughout the entire stretch.

How does GPT‑5.1‑Codex‑Max’s token efficiency compare to GPT‑5.1‑Codex?

When operating at medium reasoning effort, GPT‑5.1‑Codex‑Max uses roughly 30% fewer thinking tokens than its predecessor, GPT‑5.1‑Codex. This reduction translates into lower computational cost and reduced latency while maintaining comparable or better accuracy.
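To make the cost implication concrete, here is a hypothetical back-of-envelope calculation. The token count and per-million-token price below are illustrative assumptions, not published OpenAI figures; only the ~30% reduction comes from the article:

```shell
# Assume a long task that previously burned 1,000,000 thinking tokens.
old_tokens=1000000
new_tokens=$(( old_tokens * 70 / 100 ))            # ~30% fewer thinking tokens

# Assumed price of 150 cents per million tokens (illustrative only).
price_per_million_cents=150
old_cost=$(( old_tokens * price_per_million_cents / 1000000 ))
new_cost=$(( new_tokens * price_per_million_cents / 1000000 ))

echo "tokens: $old_tokens -> $new_tokens"          # tokens: 1000000 -> 700000
echo "cost (cents): $old_cost -> $new_cost"        # cost (cents): 150 -> 105
```

Whatever the real prices turn out to be, the savings scale linearly with the token reduction, which is why a 30% cut matters most on long-horizon tasks.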

What types of coding tasks were included in the internal 24‑hour marathon?

The marathon incorporated multi‑step refactors, test‑driven iteration cycles, and autonomous debugging of code. These tasks simulate real‑world development workflows, demonstrating the model’s ability to handle complex, iterative programming challenges without supervision.

How is GPT‑5.1‑Codex‑Max being deployed across OpenAI’s platforms?

GPT‑5.1‑Codex‑Max is now the default model in multiple Codex‑based environments, replacing the earlier GPT‑5.1‑Codex. This rollout supports the company’s “always‑on” coding assistant vision, enabling developers to benefit from long‑horizon reasoning and real‑time interaction.