
xAI's Grok 4.1 ranks second in creative writing, scores 1721.9, cuts hallucinations


When xAI rolled out Grok 4.1 last week, the first thing I noticed was the headline about fewer hallucinations in web searches and integrated apps, even though there’s still no public API. The real eye-catchers, though, are the numbers that follow. On a standard creative-writing benchmark, Grok 4.1 jumped roughly 600 points over its previous version, a sizable lift.

Even more intriguing is its standing against the big players: it sits just behind a model called Polaris Alpha, which researchers are calling an early GPT-5.1 variant. Closing that gap suggests xAI might finally be catching up with the leading models, especially when the task calls for nuanced storytelling. A similar pattern shows up on the Arena Expert leaderboard, so the boost isn’t limited to one test.

I’m still waiting to see how these scores translate to real-world use, but the rankings hint that Grok 4.1 is moving into a more competitive zone. Below are the exact rankings and scores.

In creative writing, Grok 4.1 ranks second, behind only Polaris Alpha (an early GPT-5.1 variant), with the "thinking" model scoring 1721.9 on the Creative Writing v3 benchmark. That is roughly a 600-point improvement over previous Grok iterations. On the Arena Expert leaderboard, which aggregates feedback from professional reviewers, Grok 4.1 Thinking leads the field with a score of 1510.

The gains are especially notable given that Grok 4.1 was released only two months after Grok 4 Fast, highlighting the accelerated development pace at xAI.

Core Improvements Over Previous Generations

Technically, Grok 4.1 represents a significant leap in real-world usability.


Grok 4.1 is now live on Grok.com, X, and the iOS and Android apps, but the company still hasn’t opened an API. xAI says the new version cuts hallucinations on the web and in the apps, yet I haven’t seen any independent checks beyond the headline numbers. On the Creative Writing v3 benchmark the model hits 1721.9, which puts it just behind Polaris Alpha and amounts to roughly a 600-point jump from previous Grok releases.

That jump is impressive, but the test covers only one kind of task and says nothing about how the model does on, say, coding or reasoning. The model also leads the Arena Expert leaderboard, scoring 1510. Its launch lands at a time when competitors like Google’s Gemini 3 are getting a lot of buzz, so the timing feels a bit strategic.

Whether the lower hallucination rate and higher writing score will actually make the model more useful is still up in the air, especially since developers can’t poke at it via an API yet.


Common Questions Answered

What score did Grok 4.1 achieve on the Creative Writing v3 benchmark and how does it compare to previous Grok versions?

Grok 4.1 earned a score of 1721.9 on the Creative Writing v3 benchmark, which places it second only to Polaris Alpha. This represents roughly a 600‑point improvement over earlier Grok iterations, indicating a substantial leap in creative writing capability.

Which model currently outranks Grok 4.1 in creative writing, and what variant is it?

Polaris Alpha, an early GPT‑5.1 variant, currently holds the top spot in creative writing. It scored higher than Grok 4.1 on the same Creative Writing v3 benchmark and is the only model ranked ahead of it.

How does Grok 4.1 perform on the Arena Expert leaderboard, and what does this leaderboard measure?

On the Arena Expert leaderboard, Grok 4.1’s “Thinking” version leads the field with a score of 1510, reflecting strong feedback from professional reviewers. The leaderboard aggregates expert evaluations across multiple tasks to gauge overall model quality and usefulness.

Is an API available for developers to access Grok 4.1, and what platforms can users currently use it on?

No public API has been released for Grok 4.1; the model is only accessible through xAI’s own platforms. Users can interact with Grok 4.1 via Grok.com, the X social network, and the iOS and Android applications.