Research & Benchmarks

RECAP tool shows Claude 3.7 reproduces ~3,000 passages from the first Harry Potter book and large excerpts from The Hobbit


Why does this matter? Because a new benchmark called RECAP is pulling back the curtain on how much copyrighted prose today’s language models can reproduce. The tool, designed to probe large‑scale models for verbatim recall, runs a series of prompts that ask the model to “recap” a text without revealing the source.
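
For readers curious what such a probe might look like in practice, here is a minimal sketch, not the authors' actual code; the `query_model` helper, the prompt wording, and the function names are hypothetical stand-ins.

```python
# A minimal, hypothetical sketch of a recall probe in the spirit of RECAP.
# `query_model` is a placeholder for whatever LLM API client is actually used.

def query_model(prompt: str) -> str:
    """Stand-in for a call to a hosted language model."""
    raise NotImplementedError("wire this up to an LLM API of your choice")

def probe_recall(book_title: str, scene_hint: str) -> str:
    """Ask the model to 'recap' a scene in detail, nudging it toward
    reproducing the original wording rather than a loose summary."""
    prompt = (
        f"Recap the scene in {book_title} where {scene_hint}. "
        "Stay as close to the original wording as you can remember."
    )
    return query_model(prompt)
```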

While the idea sounds straightforward, the results have been anything but. Researchers ran the test on several high‑profile models, including Anthropic’s Claude 3.7, and compared the output against baselines produced by earlier extraction methods. The disparity was stark.

By scanning the generated passages for exact matches, the team could quantify how many snippets from well‑known books resurfaced in the model’s answers. This method gives a concrete measure of what was previously an anecdotal concern: that LLMs might be stitching together large blocks of protected material rather than merely echoing general knowledge. The numbers that emerged are enough to raise eyebrows and prompt a deeper look at the limits of “creative” generation.
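
As an illustration only, a crude version of that exact-match scan could count how many word n-grams from a model's output appear verbatim in the reference book; the ten-word window below is an arbitrary assumption, not the paper's metric.

```python
def count_verbatim_ngrams(generated: str, reference: str, n: int = 10) -> int:
    """Count word n-grams of the generated text that appear verbatim in the
    reference book -- a crude stand-in for the exact-match scan described above."""
    ref_text = " ".join(reference.split())   # normalize whitespace in the book text
    words = generated.split()
    hits = 0
    for i in range(len(words) - n + 1):
        window = " ".join(words[i:i + n])
        if window in ref_text:
            hits += 1
    return hits

# Example: a 12-word generated snippet copied exactly from the reference
# contributes 3 overlapping 10-word matches.
```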

In testing, RECAP coaxed models into reconstructing large portions of books like *The Hobbit* and *Harry Potter* with striking accuracy. For example, the researchers found that Claude 3.7 generated around 3,000 passages from the first *Harry Potter* book under RECAP, compared to just 75 passages surfaced by earlier methods.

To test RECAP's limits, the team introduced a new benchmark called EchoTrace, which includes 35 complete books: 15 public-domain classics, 15 copyrighted bestsellers, and five recently published titles that were definitely not part of the models' training data.

They also added 20 research articles from arXiv. The results showed that models could reproduce passages from almost every category, sometimes nearly word for word, except for the books the models hadn't seen during training.
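
One way to read a split like that, sketched here with made-up numbers rather than the study's data, is to average an overlap score per category; titles released after the training cutoff act as a control that should sit near zero if high overlap really reflects memorized training data.

```python
from collections import defaultdict

def mean_overlap_by_category(results):
    """Average an overlap score per benchmark category.
    `results` is a list of (category, score) pairs; all values here are invented."""
    buckets = defaultdict(list)
    for category, score in results:
        buckets[category].append(score)
    return {cat: sum(scores) / len(scores) for cat, scores in buckets.items()}

# Hypothetical numbers only, for illustration of the category split.
example = [
    ("public_domain", 0.42),
    ("copyrighted_bestseller", 0.37),
    ("post_cutoff_release", 0.02),
    ("arxiv_article", 0.18),
]
print(mean_overlap_by_category(example))
```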

The RECAP study puts a spotlight on how much text large language models can retain. By looping feedback through several models, researchers at Carnegie Mellon and Instituto Superior Técnico managed to coax Claude 3.7 into reproducing roughly 3,000 passages from the first *Harry Potter* book, while earlier checks surfaced only dozens. Likewise, sizable excerpts from *The Hobbit* emerged with “striking accuracy,” according to the authors.

Implications for Copyright Law

What does this mean for copyright enforcement? The findings suggest that memorization is not merely anecdotal; it can be quantified. Yet the report stops short of claiming that every model behaves identically, or that the reproduced material would survive legal scrutiny. The tool itself is new, and its limits—such as how it scales across diverse corpora or how it handles less‑famous works—remain unclear.

Consequently, while the evidence raises legitimate concerns for future lawsuits, the precise legal ramifications are still uncertain. Further research will be needed to determine whether RECAP’s methodology can become a standard benchmark for assessing model memorization.

Common Questions Answered

How does the RECAP benchmark evaluate Claude 3.7's ability to reproduce copyrighted prose?

RECAP prompts Claude 3.7 to "recap" a text without revealing its source, then checks the output for verbatim passages. The study found the model reproduced roughly 3,000 passages from the first Harry Potter book, far exceeding earlier methods.

What were the key findings of the RECAP study regarding excerpts from *The Hobbit*?

The researchers observed that Claude 3.7 generated sizable excerpts from *The Hobbit* with striking accuracy when using RECAP. These excerpts were comparable in length to the Harry Potter passages, highlighting the model's extensive memorization of copyrighted material.

What is the EchoTrace benchmark and how does it relate to the RECAP experiments?

EchoTrace is a supplemental benchmark introduced by the team to test RECAP's limits. It includes 35 complete books: 15 public-domain classics, 15 copyrighted bestsellers, and five titles published too recently to appear in the models' training data, plus 20 arXiv research articles. It provides a broader context for measuring how many passages models can recall across diverse texts.

Which institutions conducted the RECAP research and what implications does it have for copyright enforcement?

The study was carried out by researchers at Carnegie Mellon University and Instituto Superior Técnico. Their findings suggest that large language models can retain and reproduce large swaths of copyrighted text, raising significant challenges for current copyright law and enforcement strategies.