Reddit Sues Perplexity and Others Over Stolen Posts
In a lawsuit filed in New York, Reddit accuses Perplexity and three firms of building their AI businesses on stolen data.
Topics
News
- Meta’s Chief AI Mind Yann LeCun to Exit, Build Own AI Lab
- Meta Expands AI’s Vocabulary to 1,600 Languages
- Google Bridges Cloud Power and Privacy with Private AI Compute
- California’s Genspark Enters AI Unicorn Club with $200 Million Round
- OpenAI Taps Intel’s Sachin Katti to Build Compute Backbone for AGI
- Flipkart Brings Voice-Led Wholesale Ordering to WhatsApp with Sarvam AI
When OpenAI’s ChatGPT arrived, it sparked a gold rush. Tech companies everywhere began chasing their own conversational AI, hungry for the raw material that makes them possible: data. And data, as it turned out, had a new class of miners.
For years, firms known as “data scrapers” quietly extracted information from the web—parsing Google search results, cataloguing Reddit discussions, indexing the substance of the internet—and then selling it to whoever needed it. When the AI boom began, that trade exploded. Scrapers became suppliers to companies racing to train their models on the sprawling archive of human language.
However now, Reddit, one of the web’s largest public forums, is trying to put a stop to it. In a lawsuit filed in the US District Court for the Southern District of New York, the company accused four firms—SerpApi, Oxylabs, AWMProxy, and Perplexity—of illegally harvesting its data by scraping Google search results that contained Reddit posts. Three of them, Reddit claims, resold that data to OpenAI and Meta. The fourth, Perplexity, is a San Francisco start-up that offers an AI-powered search engine and, Reddit alleges, has been a repeat offender.
The case revives an old argument over who truly owns the internet’s words. For decades, scraping was tolerated, even celebrated. Google itself built an empire on it, crawling websites to fuel its search index —a process that, in turn, sent traffic back to publishers. It was a convenience. But with AI, the balance has tipped. Now, the scraped feel stripped of credit and control.
Reddit, used by more than 400 million people weekly, has begun selling what it once gave away. Its chats have become prized linguistic data. Google and OpenAI have already signed licensing deals. Others, Reddit said, chose to take rather than pay.
According to the suit, the company even laid a trap: a “test post” hidden from all but Google’s crawler appeared hours later in Perplexity’s results. “Perplexity’s business model,” Reddit contends, “is to take Reddit’s content… and call it a new product.”
Whether Reddit can prevail is unclear; the scrapers are scattered across borders and jurisdictions. But the company is persistent in fighting for a shift.