xAI's Grok 4.1 ranks second in creative writing, scores 1721.9, claims fewer hallucinations
Elon Musk’s xAI just rolled out Grok 4.1, the latest version of its in‑house language model. The upgrade arrives with a claim of fewer hallucinations in web search and in its apps, though the company isn’t offering an API just yet. For a tech‑savvy reader, what stands out isn’t the lack of public access but the benchmark numbers that accompany the release.
Benchmarks that pit large models against each other show Grok 4.1 climbing well above its predecessors, with a jump of about 600 points on a standard creative‑writing test. Even more striking is how it measures up against other heavyweight contenders: it lands just behind a model labeled Polaris Alpha, which researchers say represents an early GPT‑5.1 variant. Those figures matter because they hint at a narrowing gap between xAI and the strongest commercial offerings, especially in tasks that require nuanced storytelling.
The same trend shows up on the Arena Expert leaderboard, suggesting the improvements aren’t limited to a single test. The details below lay out the exact rankings and scores.
In creative writing, Grok 4.1 ranks second, behind only Polaris Alpha (an early GPT-5.1 variant), with the "thinking" model earning a score of 1721.9 on the Creative Writing v3 benchmark. That marks a roughly 600-point improvement over previous Grok iterations. On the Arena Expert leaderboard, which aggregates feedback from professional reviewers, Grok 4.1 Thinking leads the field with a score of 1510.
The gains are especially notable given that Grok 4.1 was released only two months after Grok 4 Fast, highlighting how quickly xAI is shipping new models.
Core Improvements Over Previous Generations
Technically, Grok 4.1 represents a significant leap in real-world usability.
Grok 4.1 is now live on Grok.com, X and the iOS and Android apps, but the company has not opened an API yet. It arrives with a claimed lower hallucination rate on the web and in the apps, though no independent verification beyond the headline figures has been published. On the Creative Writing v3 benchmark, the model scores 1721.9, placing it second only to Polaris Alpha and representing roughly a 600‑point jump from earlier Grok versions.
That improvement is notable, yet the creative-writing result comes from a single benchmark and does not speak to performance on other tasks. Similarly, Grok 4.1 Thinking leads the Arena Expert leaderboard with a score of 1510, but the release offers no breakdown of how that score was earned. The launch also coincides with heightened attention on rival offerings such as Google’s Gemini 3, suggesting the timing is as much strategic as technical.
Whether the reduced hallucinations and higher creative‑writing score translate into broader utility remains unclear, especially without API access for developers to test the model in varied contexts.
Common Questions Answered
What score did Grok 4.1 achieve on the Creative Writing v3 benchmark and how does it compare to previous Grok versions?
Grok 4.1 earned a score of 1721.9 on the Creative Writing v3 benchmark, which places it second only to Polaris Alpha. This represents roughly a 600‑point improvement over earlier Grok iterations, indicating a substantial leap in creative writing capability.
Which model currently outranks Grok 4.1 in creative writing, and what variant is it?
Polaris Alpha, an early GPT‑5.1 variant, currently holds the top spot in creative writing. It achieved a higher score on the same Creative Writing v3 benchmark, making it the only model ranked ahead of Grok 4.1.
How does Grok 4.1 perform on the Arena Expert leaderboard, and what does this leaderboard measure?
On the Arena Expert leaderboard, Grok 4.1’s “Thinking” version leads the field with a score of 1510, reflecting strong feedback from professional reviewers. The leaderboard aggregates expert evaluations across multiple tasks to gauge overall model quality and usefulness.
Is an API available for developers to access Grok 4.1, and what platforms can users currently use it on?
No public API has been released for Grok 4.1; the model is only accessible through xAI’s own platforms. Users can interact with Grok 4.1 via Grok.com, the X social network, and the iOS and Android applications.