RECAP tool shows Claude 3.7 reproduces ~3,000 words from The Hobbit and Harry Potter
When I first saw the RECAP benchmark, I thought it was just another test. Turns out it actually pulls back the curtain on how much copyrighted prose today’s language models can spit out. The tool asks a model to “recap” a passage without naming the source, then checks whether any sentences line up word-for-word with known books.
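The core check is simple to picture: split the model's recap into sentences and test each one for a word-for-word match against the source book. Here is a minimal sketch of that idea in Python; the function name and the eight-word threshold are my own illustration, not RECAP's actual implementation.

```python
import re

def verbatim_sentences(model_output: str, source_text: str, min_words: int = 8):
    """Return sentences from the model output that appear word-for-word
    in the source text (a rough stand-in for a verbatim-overlap check)."""
    # Naive sentence split on terminal punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", model_output.strip())
    # Collapse whitespace so line breaks don't hide exact matches.
    normalized_source = " ".join(source_text.split())
    matches = []
    for sent in sentences:
        normalized = " ".join(sent.split())
        # Ignore short sentences that could match by coincidence.
        if len(normalized.split()) >= min_words and normalized in normalized_source:
            matches.append(normalized)
    return matches
```

A real evaluation would also need fuzzy matching and normalization of punctuation and casing, but exact substring search is enough to convey how verbatim reproduction gets counted.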
Researchers ran it on a handful of high-profile systems - Claude 3.7 from Anthropic, for example - and compared the output against earlier extraction baselines. The gap was stark: by scanning for exact matches, they could count how many snippets from well-known works resurfaced in the answers.
It's still unclear whether those matches are accidental or systematic, but it gives us a concrete number for something we’d only guessed at before: LLMs may be stitching together sizable blocks of protected text, not just rephrasing general facts. The figures that came out are enough to raise eyebrows, and they suggest we still have a lot to learn about the limits of so-called “creative” generation.
In testing, RECAP was able to reconstruct large portions of books like "The Hobbit" and "Harry Potter" with striking accuracy. For example, the researchers found that Claude 3.7 generated around 3,000 words from the first "Harry Potter" book using RECAP, compared with just 75 using earlier methods.

Implications for Copyright Law

To test RECAP's limits, the team introduced a new benchmark called "EchoTrace" that includes 35 complete books: 15 public-domain classics, 15 copyrighted bestsellers, and five recently published titles that were definitely not part of the models' training data.
They also added 20 research articles from arXiv. The results showed that models could reproduce passages from almost every category, sometimes nearly word for word, except for the books the models hadn't seen during training.
The RECAP study shines a light on just how much text large language models can hold onto. Using an iterative loop in which feedback on each attempt is fed back into the next prompt, a team from Carnegie Mellon University and Instituto Superior Técnico got Claude 3.7 to reproduce roughly 3,000 words from the first *Harry Potter* book - a jump from the few dozen that earlier checks surfaced. The authors also note that sizable chunks of *The Hobbit* came back with a surprisingly high level of detail.
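The iterative loop described above can be sketched generically: generate a candidate recap, score it against the target text, and pass a feedback signal into the next attempt, keeping the best candidate seen. Everything here is a toy stand-in - `generate` represents a model call, and the similarity score via `difflib` is my illustrative choice, not the paper's metric.

```python
from difflib import SequenceMatcher

def refine_with_feedback(generate, target: str, rounds: int = 3) -> str:
    """Toy skeleton of an extract-score-retry loop: each round, the
    candidate is scored against the target, and the score is handed
    back to the (hypothetical) generator as feedback for the next try."""
    best, best_score = "", 0.0
    feedback = None
    for _ in range(rounds):
        candidate = generate(feedback)          # stand-in for a model call
        score = SequenceMatcher(None, candidate, target).ratio()
        if score > best_score:
            best, best_score = candidate, score
        feedback = f"similarity={score:.2f}"    # hint passed to next attempt
    return best
```

The point of the sketch is the control flow, not the scoring: wrapping extraction in a feedback loop is what turned dozens of recovered words into thousands.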
So, what does this mean for copyright enforcement? It seems memorization isn’t just a one-off anecdote; you can actually put a number on it. Still, the paper doesn’t claim every model behaves the same way, nor that the reproduced snippets would hold up in court. The tool is fresh, and we’re not sure how it scales to bigger, messier corpora or to works that aren’t as well-known.
That said, the findings do raise real worries for future lawsuits, even if the exact legal fallout is still murky. We’ll need more work to see whether RECAP’s approach can become a go-to benchmark for measuring model memorization.
Common Questions Answered
How does the RECAP benchmark evaluate Claude 3.7's ability to reproduce copyrighted prose?
RECAP prompts Claude 3.7 to "recap" a text without revealing its source, then checks the output for verbatim passages. The study found the model reproduced roughly 3,000 words from the first Harry Potter book, far exceeding earlier methods.
What were the key findings of the RECAP study regarding excerpts from *The Hobbit*?
The researchers observed that Claude 3.7 generated sizable excerpts from *The Hobbit* with striking accuracy when using RECAP, highlighting the model's extensive memorization of copyrighted material.
What is the EchoTrace benchmark and how does it relate to the RECAP experiments?
EchoTrace is a supplemental benchmark introduced by the team that includes 35 complete books (15 public-domain classics, 15 copyrighted bestsellers, and five recent titles that were not in the models' training data) as well as 20 research articles from arXiv. It was used to probe RECAP's limits, providing a broader context for measuring how many passages models can recall across diverse texts.
Which institutions conducted the RECAP research and what implications does it have for copyright enforcement?
The study was carried out by researchers at Carnegie Mellon University and Instituto Superior Técnico. Their findings suggest that large language models can retain and reproduce large swaths of copyrighted text, raising significant challenges for current copyright law and enforcement strategies.