OpenAI, Microsoft Sued Over Unlicensed News Content Scraping

OpenAI, Microsoft sued for scraping newspapers and using text without license

Tech giants are facing fresh legal heat over their AI training practices. A new lawsuit targets OpenAI and Microsoft, accusing the companies of systematically harvesting news content without proper authorization.

The legal challenge centers on how artificial intelligence companies obtain training data, an increasingly contentious issue in the rapidly evolving AI landscape. News organizations have grown concerned about how their intellectual property gets used without compensation.

This lawsuit represents a critical test of copyright boundaries in AI development. By targeting both OpenAI and Microsoft, the legal action suggests a coordinated strategy to challenge how tech companies source and use digital content.

The case could have significant implications for how AI models are built and trained. Newspapers and media outlets are signaling they won't passively watch their content get absorbed into large language models without consent or payment.

Journalists and publishers are watching closely, seeing this as a potential watershed moment for protecting digital content rights in the age of generative AI.

The lawsuit alleges OpenAI and Microsoft simply ignored these rules. The companies supposedly scraped the sites, stripped the copyright notices, and used the text without a license for both training and direct search results. Microsoft is targeted not just as an infrastructure provider, but as a co-designer of the models and a direct beneficiary of the alleged theft.

The plaintiffs are seeking damages exceeding $10 billion, citing US laws that allow for up to $150,000 per work for willful infringement and up to $25,000 for removing copyright information. They argue that because higher-quality datasets were sampled more frequently during training, professional press content had a disproportionate impact on the models. They also want the nuclear option: the destruction of all GPT models and training sets containing their work, a demand the New York Times also made in late 2023.
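To put those figures in proportion, the short Python sketch below works out how many works would have to be at issue for the statutory caps alone to reach $10 billion. The three dollar amounts come from the paragraph above. Everything else is an illustrative assumption: that every work draws the maximum award, and that there is exactly one copyright-information removal per work. The helper name `min_works_needed` is hypothetical, and the complaint's actual count of works is not stated here.

```python
# Figures quoted in the article; everything else here is an illustrative assumption.
CLAIMED_TOTAL = 10_000_000_000   # "exceeding $10 billion"
MAX_WILLFUL_PER_WORK = 150_000   # cap per work for willful infringement
MAX_CMI_REMOVAL = 25_000         # cap for removing copyright information


def min_works_needed(total: int, per_work: int) -> int:
    """Smallest whole number of works whose combined cap reaches `total`."""
    return (total + per_work - 1) // per_work  # integer ceiling division


# Assumption: every work is awarded the maximum, which a court need not do.
infringement_only = min_works_needed(CLAIMED_TOTAL, MAX_WILLFUL_PER_WORK)
both_caps = min_works_needed(CLAIMED_TOTAL, MAX_WILLFUL_PER_WORK + MAX_CMI_REMOVAL)

print(f"Infringement cap only: {infringement_only:,} works")  # 66,667 works
print(f"Both caps per work:    {both_caps:,} works")          # 57,143 works
```

Under these assumptions, a claim of this size implies tens of thousands of articles at the statutory maximum, and more if awards come in below the cap.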

The mystery of the deleted book datasets

It's not just newspapers. OpenAI faces ongoing litigation from authors and publishers over the books used to train its AI. The dispute focuses on internal datasets dubbed "Books1" and "Books2," which allegedly contain massive amounts of e-books downloaded from the pirate library Library Genesis (LibGen).

According to an opinion and order by Magistrate Judge Ona T.

The lawsuit against OpenAI and Microsoft reveals the high-stakes legal battle brewing around AI's data sourcing practices. News organizations are pushing back hard, claiming these tech giants brazenly scraped copyrighted content without permission or compensation.

The $10 billion damages claim signals how seriously publishers view this alleged intellectual property violation. Microsoft is not cast as a passive party: the lawsuit alleges the company helped design the models and benefited directly from the allegedly unauthorized use of content.

The allegations of stripped copyright notices and unlicensed use of text go beyond technical nuance. They strike at core questions of digital content ownership in the AI era.

Publishers are sending a clear message: AI development can't happen at the expense of journalistic intellectual property. The massive damages sought underscore the financial and ethical tensions emerging as artificial intelligence rapidly transforms how information gets collected and repurposed.

This legal action could become a watershed moment for how AI companies approach content acquisition and attribution. The outcome might reshape digital content rights for years to come.

Common Questions Answered

How much in damages are OpenAI and Microsoft facing in this lawsuit?

The lawsuit is seeking damages exceeding $10 billion, with potential penalties of up to $150,000 per copyrighted work for willful infringement. This substantial financial claim underscores the serious nature of the alleged intellectual property violations by the tech companies.

What specific allegations does the lawsuit make about OpenAI and Microsoft's content scraping practices?

The lawsuit alleges that OpenAI and Microsoft systematically harvested news content without proper authorization, stripping copyright notices and using text without licensing. Microsoft is not just viewed as an infrastructure provider, but as a co-designer of AI models and a direct beneficiary of the alleged content theft.

Why are news organizations concerned about AI companies' data training practices?

News organizations are increasingly worried about how their intellectual property is being used without compensation in AI training datasets. The lawsuit highlights the growing tension between tech giants and media companies over the unauthorized use of copyrighted content for artificial intelligence development.