NYT says Microsoft built supercomputer that trained ChatGPT on its articles


The New York Times is taking its 2023 lawsuit against OpenAI a step further, now pointing a finger directly at Microsoft. The paper alleges that the tech giant built a “supercomputer” expressly to train large language models on the Times’ copyrighted articles without permission. According to the updated complaint, the machine was not a generic cloud service but a purpose‑built system fed with training data that “disproportionately featured Times Works” so the resulting AI could mimic high‑quality journalism.

The Times says the same setup let Microsoft seize copyrighted content, then use it to power products that have siphoned clicks and affiliate revenue and even substituted for paid subscriptions. It also claims that deploying Times-trained models across Microsoft’s product line helped boost the company’s market capitalization by a trillion dollars in the past year alone. While the initial filing described Microsoft’s role in broad terms, the new version zeroes in on the hardware itself, describing it as “unusually complex” and designed to train the “most capable LLM in history.” The move underscores a growing clash between publishers and AI developers over the use of copyrighted material.

The prominent newspaper alleged that ChatGPT was illegally trained on its articles, infringed on its copyrights by outputting articles verbatim, and caused market harms by positioning ChatGPT as a substitute for a NYT subscription, as well as reputational harms by falsely attributing claims to NYT reporting. Additionally, ChatGPT outputs summarizing Wirecutter reviews robbed writers of commissions from lost clicks on affiliate links, the NYT alleged.

In the initial complaint, the NYT discussed Microsoft's supercomputing systems as if they were providing generic cloud computing services. The updated complaint specifies that the supercomputer was tailor-made to help OpenAI infringe and alleges that it was built for the explicit purpose of training AI on copyrighted works without permission. The NYT also alleged that its articles were weighted more heavily in training, as both firms hoped to train models on the highest-quality journalism possible so that level of writing could be confidently mimicked in outputs.

By building this "unusually complex" machine, Microsoft not only helped select the works that were infringed but also provided a means to seize copyrighted works without permission, the NYT alleged.

"Microsoft specifically designed it for the purpose of using essentially the whole Internet--curated to disproportionately feature Times Works--to train the most capable LLM in history," the NYT alleged.

And now Microsoft is allegedly profiting unfairly.

"Microsoft's deployment of Times-trained LLMs throughout its product line helped boost its market capitalization by a trillion dollars in the past year alone," the NYT alleged.

Why this matters

The New York Times’ lawsuit puts Microsoft’s role in OpenAI’s training pipeline front and center, alleging that the supercomputer Microsoft built was used to train ChatGPT on the paper’s copyrighted articles. According to the filing, the model reproduced NYT text verbatim, effectively offering a free substitute for a subscription. It also allegedly attributed claims to NYT reporting that the paper never made, causing reputational damage.

For developers, this raises a clear question about the provenance of training data and the legal exposure of using large‑scale compute resources supplied by cloud partners. Founders must weigh the cost of building or renting such infrastructure against the risk of inadvertently violating copyright. Researchers should expect tighter scrutiny of data‑collection practices, and possibly more defensive documentation of provenance.

Whether the claims will hold up in court remains uncertain, but the case underscores that technical ambition alone does not shield projects from legal accountability. Our community would do well to monitor how this dispute shapes policy and partnership choices moving forward.
