OpenAI launches GPT-5.1 API with new coding models; warmer ChatGPT tone raises safety concerns
OpenAI released the GPT-5.1 API earlier this week, promising sharper coding help and a handful of new developer tools. The same model now runs the newest ChatGPT update, and the company says it “follows prompts better” and sounds “warmer and more human.” That softer tone is deliberate, but it also shifts the conversation toward risk management. Developers will probably like the smoother back-and-forth, yet the gentler voice raises questions about how users will relate to the system and what safeguards are still in place.
The balance between approachability and control isn’t new for OpenAI, but the timing, just as the API opens up to a wider audience, makes it feel especially relevant. As the firm nudges conversational fluency forward, stakeholders are left to wonder whether the added friendliness could blur the line between a tool and a companion, and what that might mean for safety protocols.
Warmer responses in ChatGPT raise concerns about safety and emotional attachment
GPT-5.1 is also available in ChatGPT. OpenAI says the model follows prompts better and gives responses that feel warmer and more human. But the friendlier tone comes with new safety tradeoffs: according to OpenAI's latest safety evaluation, more empathetic replies can sometimes make the model less strict on sensitive topics.
The GPT-5.1-thinking variant showed declines in handling harassment, hate speech, violence, and sexual content, with scores dropping by up to seven percentage points. Both variants also became less resistant to fostering emotional dependency; on that measure, the instant model's score fell from 0.986 to 0.945.
Will developers actually get something out of the new variants?
OpenAI has rolled out the GPT-5.1 API, adding gpt-5.1-codex and a smaller gpt-5.1-codex-mini, both aimed at longer coding tasks. Pricing is unchanged from GPT-5, so budgets probably won't shift.
Prompt caching now persists for up to 24 hours, which should trim latency and cut the cost of repeated calls. On the SWE-bench suite the model scored 76.3%, up from 72.8% for GPT-5 - a modest but noticeable bump. In ChatGPT the upgrade feels a bit more attentive; responses come across as warmer, almost human-like.
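For developers weighing the two new variants, here is a minimal sketch of how a request might be assembled with the model names from the announcement. The routing heuristic (full-size codex for long jobs, mini for quick snippets) and the helper itself are illustrative assumptions, not part of OpenAI's API; the resulting dict would be passed to something like `client.chat.completions.create(**params)` in the official `openai` SDK.

```python
# Hypothetical helper: pick between the two new codex variants and
# build keyword arguments for a chat-style completion call.
# Model names come from the GPT-5.1 announcement; the long_task
# threshold and routing logic are made up for illustration.

def build_codex_request(prompt: str, long_task: bool) -> dict:
    """Return request kwargs, routing long jobs to the larger model."""
    model = "gpt-5.1-codex" if long_task else "gpt-5.1-codex-mini"
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a coding assistant."},
            {"role": "user", "content": prompt},
        ],
    }

# Example: a multi-file refactor goes to the full-size variant.
params = build_codex_request(
    "Refactor this module into smaller functions.", long_task=True
)
```

Since pricing matches GPT-5, the choice between the two comes down to task length and latency rather than cost.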
That friendliness, though, brings safety worries: OpenAI acknowledges that users might start to bond emotionally with the model, and it's still unclear how that trade-off will play out. All told, GPT-5.1 adds a few new knobs without raising fees, delivers a small performance lift, and shifts tone in a way that could change how people interact with it. Whether the risks outweigh the upside will hinge on real-world use.
Common Questions Answered
What new coding variants does the GPT‑5.1 API introduce and how do they differ?
The GPT‑5.1 API adds two variants: gpt‑5.1‑codex and gpt‑5.1‑codex‑mini. Both are designed for longer programming tasks, with the full‑size codex offering higher capacity for complex code generation while the mini version balances speed and resource usage for smaller snippets.
How does the warmer tone of ChatGPT powered by GPT‑5.1 affect safety according to OpenAI’s evaluation?
OpenAI reports that the friendlier, more empathetic responses can sometimes reduce strictness on sensitive topics, leading to a measurable decline in handling certain safety‑critical queries. This trade‑off means the model may appear more human‑like but requires additional safeguards to prevent misuse.
What performance improvement does GPT‑5.1 show on the SWE‑bench coding benchmark?
On the SWE‑bench coding test, GPT‑5.1 raises its success rate to 76.3 percent, up from 72.8 percent for the previous GPT‑5 model. The modest lift demonstrates tangible gains in code generation accuracy without a dramatic jump in overall capability.
What changes to prompt caching were introduced with the GPT‑5.1 API and what benefits do they provide?
Prompt caching now persists for up to 24 hours, allowing repeated queries to reuse previously computed context. This reduces latency for frequent calls and lowers the cost of repeated‑query processing, making the API more efficient for developers.
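To benefit from the longer cache window, prompts should keep their large, unchanging context at the front, since OpenAI's prompt caching matches identical leading tokens across requests. The sketch below shows that pattern; the system text and helper are placeholders, not part of any official SDK.

```python
# Sketch: structure prompts so repeated calls share a cacheable prefix.
# Prompt caching keys on identical leading tokens, so the large static
# context goes first and only the final user message varies per call.
# STATIC_CONTEXT is a placeholder for a real style guide or codebase summary.

STATIC_CONTEXT = (
    "You are a code-review assistant. Follow the team style guide, "
    "flag unsafe patterns, and keep suggestions minimal."
)

def build_messages(question: str) -> list:
    """Identical prefix across calls -> eligible for prompt caching."""
    return [
        {"role": "system", "content": STATIC_CONTEXT},  # shared, cacheable
        {"role": "user", "content": question},          # varies per request
    ]

first = build_messages("Review pull request A.")
second = build_messages("Review pull request B.")
```

With the cache now persisting up to 24 hours, even workloads that run a few times a day can reuse the same prefix instead of paying to reprocess it.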