Reddit sues Perplexity over illegal scraping of its content from Google

Reddit sued Perplexity, the AI-driven search tool, on Wednesday, alleging that the company obtained Reddit posts by scraping Google’s search results. The filing claims Perplexity teamed up with “several companies” to dodge the anti-scraping measures that both Google and Reddit have invested in. According to the complaint, the defendants “conspired” to pull comments and posts without permission, slipping past the technical safeguards meant to protect the data.

Reddit’s complaint says those anti-scraping measures required “substantial investments” from both Google and Reddit, yet the defendants allegedly found a way around them. The details are still thin, but the core accusation is that user-generated content was taken illegally. It reads as another episode in the growing tug-of-war between sites that host the content and AI services that lean on publicly available information to train and run.

I’m not sure how the judge will read the claims, or if Perplexity will end up paying, but the case is definitely worth watching.

In a lawsuit filed on Wednesday, Reddit accused an AI search engine, Perplexity, of conspiring with several companies to illegally scrape Reddit content from Google search results, allegedly dodging anti-scraping methods that require substantial investments from both Google and Reddit.

Reddit alleged that Perplexity feeds off Reddit and Google, claiming to be “the world’s first answer engine” but really doing “nothing groundbreaking.”

“Its answer engine simply uses a different company’s” large language model “to parse through a massive number of Google search results to see if it can answer a user’s question based on those results,” the lawsuit said. “But Perplexity can only run its ‘answer engine’ by wrongfully accessing and scraping Reddit content appearing in Google’s own search results from Google’s own search engine.”

Likening companies involved in the alleged conspiracy to “bank robbers,” Reddit claimed it caught Perplexity “red-handed” stealing content that its “answer engine” should not have had access to. Baiting Perplexity with “the digital equivalent of marked bills,” Reddit tested out posting content that could only be found in Google search engine results pages (SERPs) and “within hours, queries to Perplexity’s ‘answer engine’ produced the contents of that test post.”

“The only way that Perplexity could have obtained that Reddit content and then used it in its ‘answer engine’ is if it and/or its Co-Defendants scraped Google SERPs for that Reddit content and Perplexity then quickly incorporated that data into its answer engine,” Reddit’s lawsuit said.

Reddit claims Perplexity has been pulling its posts from Google’s search results without permission. The filing says the effort involved several other companies working to dodge the anti-scraping measures that both Google and Reddit have put in place. Perplexity calls itself “the world’s first answer engine,” but Reddit argues it does “nothing groundbreaking” and merely repackages content taken from elsewhere.

If the accusations hold up, they could point to a wider pattern of data-driven services slipping past site-level defenses. The suit, however, doesn’t spell out exactly how Perplexity’s technology works beyond noting that it “simply uses a different company’s” large language model to parse Google search results. It’s unclear whether a court will deem the scraping illegal, or how existing case law will apply.

Either way, the dispute shines a light on the tug-of-war between open-web indexing and the rights of platforms that host user-generated posts. Until a judgment lands, how much Reddit material Perplexity actually relies on will stay up for debate.

Common Questions Answered

What specific method did Perplexity allegedly use to scrape Reddit content?

Perplexity allegedly scraped Reddit content by harvesting it from Google's search results, working with several companies to bypass the anti-scraping defenses. The lawsuit claims they conspired to sidestep the technical barriers that both Google and Reddit have heavily invested in to protect their platforms.

According to the lawsuit, how does Reddit characterize Perplexity's 'answer engine' claims?

Reddit characterizes Perplexity's claim of being 'the world's first answer engine' as misleading, alleging the service does nothing groundbreaking. The complaint states that Perplexity merely repackages content harvested from elsewhere, like Reddit posts obtained from Google search results.

What are the anti-scraping defenses mentioned in the Reddit lawsuit?

The lawsuit refers to anti-scraping defenses as technical barriers that both Google and Reddit have invested heavily in to protect their content. Perplexity is accused of conspiring with other firms to bypass these specific safeguards designed to prevent unauthorized harvesting of posts and comments.