ChatGPT's 'Nerdy' tweak rewards goblin metaphors in answers, study finds
Why does a whimsical goblin keep popping up in ChatGPT’s answers? A recent analysis of the model’s output uncovered a pattern that traces back to a tiny, optional personality setting. Researchers examined thousands of responses across the “LLMs & Generative AI” category and noticed an outsized frequency of creature‑based metaphors—especially goblins—whenever the so‑called “Nerdy” mode was engaged.
Though the feature accounts for just a sliver of the model’s overall behavior, its impact on metaphor choice proved strikingly disproportionate. The study suggests that the reward mechanism designed to flag high‑quality replies inadvertently nudged the system toward a narrow set of fantasy tropes. This raises questions about how subtle tweaks in training signals can shape the flavor of AI‑generated language, even when the tweak is used in only a minority of interactions.
The findings point to a deeper issue: reward signals can amplify unexpected stylistic quirks, producing a cascade of niche references that may not match user expectations. The culprit was the training of ChatGPT's "Nerdy" personality, a feature that adjusts the model's language style: a reward signal meant to flag good answers accidentally favored creature metaphors.
Though "Nerdy" made up only 2.5 percent of responses, it drove 66.7 percent of all goblin mentions. OpenAI says the case shows how small training incentives can trigger unexpected behaviors in AI models.
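The scale of that disproportion follows directly from the two percentages the study reports: a mode responsible for 2.5 percent of responses but 66.7 percent of goblin mentions over-represents goblins by a factor of roughly 27. A minimal sketch of that arithmetic (variable names are illustrative, not from the study):

```python
# Figures reported in the study: the "Nerdy" mode's share of all
# responses, and its share of all goblin mentions.
nerdy_response_share = 0.025  # 2.5% of responses
nerdy_goblin_share = 0.667    # 66.7% of goblin mentions

# Lift: how over-represented goblin mentions are in Nerdy-mode output,
# relative to what its response share alone would predict.
lift = nerdy_goblin_share / nerdy_response_share
print(f"Goblin mentions are {lift:.1f}x over-represented in 'Nerdy' mode")
# → Goblin mentions are 26.7x over-represented in 'Nerdy' mode
```

In other words, a goblin metaphor was about 27 times more likely to come from the Nerdy mode than its overall usage would suggest.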
Is a goblin a bug or a feature? OpenAI's own analysis suggests the answer lies in a narrow training tweak. Starting with GPT‑5.1, mentions of goblins, gremlins and similar creatures rose 175 percent, a spike that coincides with the rollout of the so‑called “Nerdy” personality.
The Nerdy module, designed to adjust language style, accounts for only 2.5 percent of all responses, yet it generated two‑thirds of all goblin references. A reward signal intended to flag good answers inadvertently favored creature metaphors, turning a stylistic experiment into a noticeable quirk. While the phenomenon is amusing, it highlights how small changes in reward modeling can produce outsized effects on output.
It remains unclear whether future iterations will adjust the signal or retire the Nerdy option altogether. For now, the goblin surge serves as a concrete reminder that AI behavior can hinge on details that are easy to overlook. Readers should watch how OpenAI addresses this specific bias in upcoming releases.