Publishers Sue Meta for Systematic Copyright Infringement in Llama Training
Five major publishing houses allege Meta deliberately avoided licensing deals and sourced millions of copyrighted works from pirate sites, testing whether mass-scale LLM training constitutes fair use or willful infringement.
Five major US publishers filed coordinated copyright litigation against Meta and Mark Zuckerberg on 5 May 2026, alleging the company systematically trained its Llama language model on millions of pirated books, journals, and articles without authorization or compensation.
The lawsuit, filed in the US District Court for the Southern District of New York by Elsevier, Cengage, Hachette, Macmillan, and McGraw Hill, represents the first major publisher-led class action against an AI company. It alleges Meta acquired at least 666 copies of plaintiff works via torrenting from Library Genesis, Z-Library, and Anna’s Archive, according to an Authors Guild analysis of the court filings. The complaint claims Meta executives made a calculated decision to rely on pirated content rather than negotiate licensing deals, despite internal discussions about establishing a $200 million dataset licensing budget.
“Meta made calculated decisions to enrich itself with literary properties that it did not create and does not own, when instead it could have partnered with publishers and authors.”
— Maria Pallante, CEO, Association of American Publishers
The Fair Use Gambit
Internal Meta communications disclosed in court filings reveal the company’s strategic reasoning. Between January and April 2023, Meta considered increasing its dataset licensing budget from $17 million to $200 million, but the effort was abandoned after escalation to Zuckerberg, per Publishers Weekly. An internal employee memo stated bluntly: “if we license once single book, we won’t be able to lean into the Fair Use strategy.”
The publishers argue this demonstrates willful infringement, a critical distinction under copyright law. Statutory damages for willful infringement can reach $150,000 per work, creating potentially catastrophic exposure when multiplied across millions of copyrighted titles. Meta takes the opposite position. “AI is powering transformative innovations, productivity and creativity for individuals and companies, and courts have rightly found that training AI on copyrighted material can qualify as fair use,” a Meta spokesperson told The Bookseller. “We will fight this lawsuit aggressively.”
Judge Vince Chhabria granted partial summary judgment to Meta on fair use grounds in the Kadrey v. Meta case on 25 June 2025, ruling that LLM training constitutes transformative use. However, claims related to torrenting pirated works and deliberately stripping copyright information remain active, according to a legal analysis by Norton Rose Fulbright.
The Economics of Avoidance
The lawsuit arrives as the AI training data licensing market rapidly matures. The Association of American Publishers estimated in April 2025 that the licensing market currently stands at $2.5 billion, with projections reaching $30 billion within a decade. Meta’s decision to bypass this emerging market, despite a $1.4 trillion market valuation at the time, forms a central pillar of the publishers’ case.
According to court filings reported by TechCrunch, Zuckerberg personally approved the use of Library Genesis despite internal warnings that reliance on pirated sources could “undermine regulatory position.” The publishers allege this demonstrates both awareness of legal risk and conscious disregard for copyright holders’ rights.
Publisher Strategy and Precedent
The coordinated publisher action differs strategically from previous author-led suits. While individual authors in Kadrey v. Meta emphasized output harm — arguing Llama could generate derivative works that substitute for original books — publishers emphasize institutional market harm. Their complaint foregrounds the existence of viable licensing markets and Meta’s deliberate decision to circumvent them.
This framing aligns with the $1.5 billion settlement Anthropic reached with authors in September 2025, reported by Investing.com. That settlement — which paid approximately $3,000 per infringed work — established a damages baseline for cases involving pirated training data, though final fairness hearings concluded only in April 2026. The Anthropic precedent suggests settlement economics may favor publishers even without definitive fair use resolution.
- Coordinated publisher litigation creates both statutory damages exposure and reputational risk for Meta, with potential liability running into the billions of dollars if courts reject its fair use defense
- Anthropic’s $1.5 billion settlement establishes damages precedent, though fair use jurisprudence remains unsettled across circuits
- Meta’s internal communications reveal strategic avoidance of licensing markets, potentially undermining fair use claims by demonstrating willfulness
- Emerging licensing markets ($2.5B currently, projected $30B by 2035) suggest viable commercial alternatives existed to piracy-based training approaches
Regulatory and Market Ripple Effects
The litigation arrives as AI companies face mounting copyright challenges across jurisdictions. OpenAI faces multidistrict litigation consolidated in the Southern District of New York, while similar suits target Stability AI, Midjourney, and others. The coordinated publisher response signals industry-wide mobilization beyond individual author actions.
For enterprise AI adopters, the lawsuit amplifies indemnification concerns. Many AI vendors offer limited copyright protection, leaving corporate users exposed to downstream infringement claims. Meta’s Llama model, released as open source, creates particularly complex liability chains, as enterprises fine-tuning the model on proprietary data may inherit foundational copyright exposure.
Judge Chhabria’s June 2025 ruling in Kadrey v. Meta offered provisional comfort to AI developers, holding that training itself constitutes transformative fair use. But he noted pointedly: “These products are expected to generate billions, even trillions, of dollars for the companies that are developing them. If using copyrighted works to train the models is as necessary as the companies say, they will figure out a way to compensate copyright holders for it.”
What to Watch
The Southern District of New York will now determine whether to consolidate this publisher suit with pending author claims or proceed on separate tracks. Meta’s defense will likely emphasize the Chhabria fair use ruling while attempting to distinguish torrenting activities from training use. Discovery will focus on internal Meta communications regarding licensing negotiations and Zuckerberg’s personal involvement in training data decisions.
The litigation is likely to run for years, but settlement pressure may accelerate it. Anthropic’s $1.5 billion resolution, reached despite stronger fair use arguments for its Claude model, suggests Meta may face commercial incentives to settle regardless of legal merits. For publishers, the suit represents both financial recovery and strategic positioning as AI transforms content distribution. The outcome will establish whether AI companies must license training data or can rely on fair use defenses, a distinction worth tens of billions of dollars as the licensing market scales.
Parallel developments in EU AI Act implementation and potential US federal AI legislation may overtake judicial resolution, creating regulatory frameworks that supersede copyright jurisprudence. Until then, the publishing industry’s coordinated litigation offensive forces Meta to defend its training practices in the most hostile venue: discovery of internal communications that reveal strategic decisions to prioritize pirate sites over compensation.