Reddit vs. Anthropic: The Complicated Ethics of AI Training

Reddit is suing Anthropic | Credit: Brett Jordan
Social media platform Reddit has filed a lawsuit against Anthropic, creator of Claude AI, alleging the firm trains its models on user posts without consent.

Reddit has filed a lawsuit against Anthropic, the AI company behind the Claude chatbot, amid allegations of unauthorised data scraping.

The social media giant accuses the tech firm of training its large language models (LLMs) on users' posts and comments without the consent of either the users or the platform.

The case, filed in a California court, rests on allegations that Anthropic made more than 100,000 unauthorised requests to Reddit's servers.

“For its part, despite what its marketing material says, Anthropic does not care about Reddit’s rules or users,” the lawsuit states, according to AI News.

“It believes it is entitled to take whatever content it wants and use that content however it desires, with impunity.”

Reddit CEO Steve Huffman | Credit: Reddit

Did Anthropic use Reddit's data without permission?

It is important to note that Reddit does not have a blanket ban on AI developers using its data to train models.

The platform has negotiated content licensing deals with major tech companies such as OpenAI and Google, which include provisions for responsible data use and privacy protections.

However, according to Reuters, Reddit's lawsuit alleges that Anthropic sidestepped the platform's technical barriers, including its robots.txt file: a plain-text file at the root of a website that tells automated crawlers which parts of the site they may access and scrape.
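To illustrate how robots.txt works in practice, here is a minimal Python sketch of how a crawler that chooses to honour the file might check a URL before requesting it, using the standard library's `urllib.robotparser` module. The robots.txt contents, the crawler names `ExampleAIBot` and `FriendlyBot`, the example.com URLs and the helper functions are all hypothetical; they are not taken from Reddit's actual file or from the lawsuit.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents for illustration only; this is NOT
# Reddit's actual file. It blocks one named crawler from the whole site
# and keeps every other crawler out of a /private/ section.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Hypothetical helper: parse robots.txt text held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_fetch_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Hypothetical helper: True if the parsed rules let user_agent request url.

    A well-behaved crawler runs this check before every request; nothing in
    the protocol itself stops a crawler that simply skips it.
    """
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_parser(SAMPLE_ROBOTS_TXT)
    for agent in ("ExampleAIBot/1.0", "FriendlyBot/2.0"):
        for url in ("https://www.example.com/r/python",
                    "https://www.example.com/private/inbox"):
            print(agent, url, polite_fetch_allowed(rules, agent, url))
```

Compliance is voluntary: the file only has an effect if the crawler's operator chooses to run a check like this one. Note also that Python's parser applies rules in the order they appear, whereas RFC 9309 says the longest matching rule wins, so the more specific `Disallow` line is listed first here.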

Reddit claims Anthropic ignored these digital 'do not enter' signs for bots and scrapers, collecting content covertly and in defiance of the platform's terms of service, and says this raises serious concerns over data security and user privacy.

Without a formal licensing deal, Anthropic was allegedly able to avoid paying the associated fees and to bypass the privacy protections built into such agreements, which are crucial for maintaining ethical standards in data handling.

Anthropic CEO Dario Amodei | Credit: Anthropic

A breach of privacy

In the lawsuit, Reddit has presented evidence that Anthropic's Claude chatbot is able to reproduce Reddit posts with remarkable accuracy, including posts and comments that have been deleted from the platform.

Reddit argues that this shows a disregard for users' privacy and demonstrates that Anthropic has failed to put appropriate safeguards in place.

Reddit also argues that Anthropic's conduct amounts to unfair competition, since companies such as Google and OpenAI pay for the access Reddit has granted them.

The lawsuit seeks financial damages and a court order barring Anthropic from using Reddit's content in any future AI models.

Key facts:
  • Reddit is suing Anthropic, alleging it accessed Reddit's servers more than 100,000 times without permission to scrape user posts and comments for training its AI models
  • Reddit claims Anthropic bypassed technical protections, violated terms of service, and refused to enter a licensing agreement
  • The lawsuit highlights broader industry tensions over data rights, user privacy and ethical AI development

Anthropic's ongoing copyright disputes

This case is the latest in a series of ongoing disputes that Anthropic is facing over its AI training methods. In August 2024, a group of authors accused Anthropic of using their copyrighted literature without consent or compensation.

In October 2023, Universal Music Group and other music industry heavyweights opened legal proceedings against Anthropic, claiming Claude's use of copyrighted song lyrics breached their intellectual property rights.

But whereas those earlier cases centred on alleged copyright infringement, Reddit's lawsuit focuses on breach of contract and unfair competition: Reddit maintains that user-generated content on its platform is protected by its terms of service.

Reddit argues that Anthropic has ignored these terms.


The complicated and ambiguous ethics of AI training

This legal battle is just one instance of a conflict playing out across the industry. Training an AI model requires vast amounts of data, but using data without consent to train AI has drawn the ire of companies and individuals alike.

In a recent interview with Lex Fridman, OpenAI CEO Sam Altman addressed the issue of copyright and compensation.

“If I was an artist, a) I would like to be able to opt out of people generating art in my style and b) if they do generate art in my style I’d like to have some kind of economic model associated with that.”

Sam Altman, CEO of OpenAI | Credit: OpenAI

But as things stand, web scraping still exists in a fairly nebulous legal zone.

What counts as authorised access can be shaped by user agreements and technical conventions such as Reddit's robots.txt, but the law governing AI training on copyrighted material remains unsettled.

If Reddit's claims hold, the dispute raises questions about the gap between Anthropic's public ethical commitments and its actual approach to acquiring data, muddying the picture for users and industry peers trying to judge what ethical AI development looks like.

“Reddit’s humanity is uniquely valuable in a world flattened by AI,” says Ben Lee, Chief Legal Officer at Reddit.

Ben Lee, Chief Legal Officer at Reddit | Credit: Reddit

“Now more than ever, people are seeking authentic human-to-human conversation. Reddit hosts nearly 20 years of rich, human discussion on virtually every topic imaginable.

“These conversations don’t happen anywhere else and they’re central to training language models like Claude.”




Technology Magazine is a BizClik brand