German court deepens AI‑copyright split, says lyric similarity not coincidental
The ruling from a German court has sharpened the debate over whether machine‑generated text can cross the line into copyright infringement. In this case, a generative model produced a set of song lyrics that a plaintiff claimed mirrored an existing work. The judges examined the two texts side by side, weighing the sheer length of the verses and the intricacy of their phrasing.
While the technology behind the model is impressive, the court focused on the practical outcome: does the output amount to a copy of protected material? That question sits at the heart of a growing legal split, with some tribunals treating AI‑created content as a fresh creation and others seeing it as a derivative. Here, the bench concluded that the overlap was too substantial to dismiss as random.
The decision underscores how courts are beginning to apply traditional copyright standards to algorithms that can mimic human artistry.
After comparing the original lyrics to the model's output, the court said the similarity could not be explained by coincidence, given the length and complexity of the songs, according to its press release. For the judges, this was enough to count as copyright‑relevant reproduction. In the court's view, the lyrics are embedded in the model's parameters and are therefore embodied in the model itself, even if only as probability values.
The court cited the reproduction right in EU copyright law (Article 2 of the InfoSoc Directive, 2001/29/EC), which covers reproduction of works "by any means and in any form, in whole or in part." In general, companies developing large language models can make copies for analysis under text and data mining (TDM) rules. But the court said this exception only covers copies needed to assemble a training dataset, not the reproduction of the lyrics in the model itself.
Yet the decision leaves many questions open. The Munich Regional Court granted the request of GEMA, the German music rights society, for an injunction, ordered disclosure of the model’s training data, and awarded damages, though the ruling is not final. By deeming the similarity between ChatGPT’s output and the original verses “not coincidental,” the judges treated the excerpts as copyright‑relevant reproduction.
This reasoning hinges on the length and complexity of the songs, a point the court emphasized in its press release. Meanwhile, a recent UK judgment took a markedly different approach, underscoring how divergent national courts are on the same issue. If the German ruling stands, developers may face tighter obligations to secure licenses before training language models on lyrical content.
However, the appellate path remains unclear, and the broader legal framework for AI‑generated text is still unsettled. Critics argue that the standard for “coincidence” could be hard to apply consistently. Observers will watch how higher courts respond, but for now the split persists and the practical impact on AI services remains uncertain.
Further Reading
- OpenAI models and outputs infringed lyrics copyright, German court rules - MLex
- German court: OpenAI committed copyright infringement in AI memorization and output of song lyrics. 1st copyright decision v. OpenAI. More to follow. - ChatGPT is Eating the World
Common Questions Answered
What did the Munich Regional Court decide about the similarity between the AI‑generated lyrics and the original song?
The court concluded that the similarity could not be explained by coincidence, citing the length and complexity of the verses. It therefore treated the AI‑generated excerpts as copyright‑relevant reproduction and granted GEMA’s request for an injunction.
How does the court’s ruling define the role of a model’s parameters in copyright infringement?
The judges stated that the lyrics are embedded in the model’s parameters as probability values, meaning the model itself contains the copyrighted material. This embedding was deemed sufficient to count as a reproduction under the EU directive on reproductions.
What actions did the court order regarding the AI model’s training data?
The Munich Regional Court ordered disclosure of the model’s training data, alongside the injunction it granted at GEMA’s request and an award of damages. The ruling is not yet final.
Why did the court emphasize the length and complexity of the songs in its decision?
The court highlighted these factors because they make it unlikely that the similarity arose by chance, strengthening the argument for copyright infringement. According to the press release, this emphasis was crucial for classifying the output as a copyright‑relevant reproduction.