Britannica sues OpenAI, alleging ChatGPT copied its content verbatim

Britannica has taken OpenAI to court, claiming the AI service has been reproducing its articles word for word. The filing alleges that GPT-4, the model behind ChatGPT, pulls directly from the encyclopedia's copyrighted entries and spits them back to users without any licensing agreement. While the technology behind large language models is often described as "learning" from massive text corpora, the complaint says the result is little more than a copy-and-paste operation.

It isn't just Britannica that feels aggrieved; Merriam-Webster joins the lawsuit as a co-plaintiff, suggesting the alleged copying is a pattern rather than an isolated slip. Legal experts note that proving "memorization" in a neural network can be tricky, but the plaintiffs argue the evidence is clear enough to merit a trial. Here's what the company says about the extent of the copying:

The lawsuit accuses OpenAI of outputting near-identical copies of Britannica and Merriam-Webster's content. According to the complaint, OpenAI repeatedly copied Britannica's content without permission: "GPT-4 itself has 'memorized' much of Britannica's copyrighted content and will output near-verbatim copies of significant portions on demand. The memorized examples are unauthorized copies that [OpenAI] used to train their models, including GPT-4." The lawsuit also includes examples of responses from OpenAI's models placed side by side with Britannica's text, in which entire passages appear to match word for word. Britannica further claims that OpenAI has been "cannibalizing" its web traffic by generating responses that "substitute, or directly compete" with Britannica's content, rather than directing users to its website the way a traditional search engine would.

At its core, the complaint, joined by Merriam-Webster, claims that GPT-4 has memorized protected text and reproduces it without authorization. It points to instances where the model allegedly generated responses that are "substantially similar" to the encyclopedia and dictionary entries. OpenAI has not yet responded publicly, leaving it unclear whether the alleged near-verbatim outputs stem from its training data practices or from coincidental overlap.

If the court finds that the model did store and emit copyrighted material, the case could set a precedent for how large language models handle proprietary sources. To get there, however, the plaintiffs bear the burden of proving systematic copying rather than isolated similarity. The outcome will hinge on technical analyses of the model's behavior and on how the court interprets "memorization." Until a judgment is rendered, the extent of OpenAI's liability remains uncertain, as do the broader implications for AI training.

Common Questions Answered

What specific copyright allegations does Britannica make against OpenAI's ChatGPT?

Britannica alleges that ChatGPT's GPT-4 model is reproducing its copyrighted encyclopedia entries word for word without authorization. The lawsuit claims that the AI system has 'memorized' substantial portions of Britannica's content and can output near-verbatim copies of entire passages on demand.

How does Britannica characterize OpenAI's content reproduction in the lawsuit?

Britannica argues that OpenAI's output amounts to little more than a "copy-and-paste operation" rather than genuine learning or transformation of text. The complaint contends that GPT-4 effectively creates unauthorized copies of Britannica's copyrighted material instead of merely learning from it.

Which other organization has joined Britannica in the lawsuit against OpenAI?

Merriam-Webster has joined Britannica in the lawsuit against OpenAI, supporting the claim that the AI model is reproducing copyrighted dictionary and encyclopedic content without permission. Together, they are challenging OpenAI's training data practices and content generation methods.