LLMs & Generative AI

OpenAI launches GPT‑5.1‑Codex‑Max, completes 24‑hour coding task internally


OpenAI’s latest offering, GPT‑5.1‑Codex‑Max, arrives with a bold claim: it can keep a coding session running for a full day without human intervention. The company says the model was put through an internal marathon, tackling a continuous 24‑hour workload that involved multiple refactors, test‑driven cycles and self‑directed debugging. What catches the eye isn’t just endurance; it’s the reported efficiency gain.

In side‑by‑side runs, the new version reportedly slashed the amount of “thinking” tokens it needed by roughly a third when operating at a moderate reasoning level, all while matching or exceeding the output of its predecessor, GPT‑5.1‑Codex. If those figures hold up, developers could see faster iteration cycles and lower compute costs on long‑running projects. The details of the internal evaluation set the stage for the performance snapshot that follows.

The model has been internally observed to complete tasks lasting more than 24 hours, including multi-step refactors, test-driven iteration, and autonomous debugging. At medium reasoning effort, GPT‑5.1‑Codex‑Max used approximately 30% fewer thinking tokens than GPT‑5.1‑Codex for comparable or better accuracy, which has implications for both cost and latency.

Platform Integration and Use Cases

GPT‑5.1‑Codex‑Max is currently available across multiple Codex-based environments, meaning OpenAI's own integrated tools and interfaces built specifically for code-focused AI agents. These include Codex CLI, OpenAI's official command-line tool (@openai/codex), where GPT‑5.1‑Codex‑Max is already live.
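For readers who want to try the model in Codex CLI, a minimal setup sketch follows. The npm package name (@openai/codex) comes from the article itself; everything else here reflects standard npm usage, and exact flags may differ across CLI versions, so treat this as an assumption-laden starting point rather than official instructions.

```shell
# Install the Codex CLI globally via npm (package name per the article).
npm install -g @openai/codex

# Launch an interactive Codex session in the current project directory.
# Model-selection options may vary by CLI version; check the built-in help.
codex --help
```

Once installed, running `codex` inside a repository starts an interactive session; consult the CLI's own help output for the current way to select or confirm the active model.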

Related Topics: #OpenAI #GPT-5.1-Codex-Max #GPT-5.1-Codex #Codex CLI #thinking tokens #autonomous debugging

The rollout of GPT‑5.1‑Codex‑Max marks OpenAI’s latest push toward an “always‑on” coding assistant. By swapping out GPT‑5.1‑Codex as the default model, the company signals confidence in the new system’s long‑horizon reasoning and real‑time interaction. Internally, the model has already logged a 24‑hour stretch of work, handling multi‑step refactors, test‑driven iteration and autonomous debugging while consuming roughly 30% fewer thinking tokens at medium reasoning effort.

That efficiency gain suggests a tighter token budget for comparable outcomes. Yet the evidence comes from a single, internal use case; it remains unclear whether the same token savings and performance will translate across diverse developer environments. The claim of “better” results lacks quantifiable benchmarks, leaving open questions about consistency and scalability.

Moreover, the shift to a persistent, high‑context agent raises questions about resource management and error handling over extended sessions. In short, GPT‑5.1‑Codex‑Max demonstrates promising internal metrics, but its broader utility for the developer community is still uncertain.


Common Questions Answered

What endurance capability does GPT‑5.1‑Codex‑Max claim to have?

OpenAI claims GPT‑5.1‑Codex‑Max can maintain a coding session for a full 24‑hour period without human intervention. During internal testing, the model performed continuous multi‑step refactors, test‑driven iteration, and autonomous debugging throughout the entire stretch.

How does GPT‑5.1‑Codex‑Max’s token efficiency compare to GPT‑5.1‑Codex?

When operating at medium reasoning effort, GPT‑5.1‑Codex‑Max uses roughly 30% fewer thinking tokens than its predecessor, GPT‑5.1‑Codex. This reduction translates into lower computational cost and reduced latency while maintaining comparable or better accuracy.
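To make the cost implication concrete, here is a minimal back-of-the-envelope sketch. The ~30% reduction is the article's reported figure; the baseline token count and per-1K-token price are invented purely for illustration and are not OpenAI pricing.

```python
# Hypothetical illustration of what a ~30% cut in "thinking" tokens could
# save on a long session. Baseline tokens and price are assumptions.

def estimated_savings(baseline_tokens: int, price_per_1k: float,
                      reduction: float = 0.30) -> float:
    """Return the estimated cost saved when token usage drops by `reduction`."""
    saved_tokens = baseline_tokens * reduction
    return saved_tokens / 1000 * price_per_1k

# Example: a session that would have used 2,000,000 thinking tokens,
# at an assumed $0.01 per 1K tokens.
print(estimated_savings(2_000_000, 0.01))  # → 6.0 (dollars saved)
```

With those assumed numbers, 600,000 fewer tokens at $0.01 per 1K works out to about $6 saved per session; the same arithmetic scales linearly with whatever baseline usage and pricing actually apply.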

What types of coding tasks were included in the internal 24‑hour marathon?

The marathon incorporated multi‑step refactors, test‑driven iteration cycles, and autonomous debugging of code. These tasks simulate real‑world development workflows, demonstrating the model’s ability to handle complex, iterative programming challenges without supervision.

How is GPT‑5.1‑Codex‑Max being deployed across OpenAI’s platforms?

GPT‑5.1‑Codex‑Max is now the default model in multiple Codex‑based environments, replacing the earlier GPT‑5.1‑Codex. This rollout supports the company’s “always‑on” coding assistant vision, enabling developers to benefit from long‑horizon reasoning and real‑time interaction.