
Authors Strike Back: Microsoft Sued for Using Books in AI Training

Deepika Rana / Updated: Jun 26, 2025, 19:51 IST

A group of prominent authors, including Pulitzer Prize winners and bestselling novelists, has filed a lawsuit against Microsoft, alleging the unauthorized use of their copyrighted books in the training of artificial intelligence systems. The lawsuit, filed in a U.S. District Court, asserts that Microsoft and its partner OpenAI scraped and processed vast quantities of literary content without proper licensing or permission.

OpenAI Partnership Adds Complexity

The lawsuit names Microsoft and also points to its close relationship with OpenAI, the developer of ChatGPT. Microsoft has invested billions into OpenAI and integrated its technology into tools like Copilot and Bing AI. The plaintiffs argue that both companies benefitted commercially from the authors’ intellectual labor, raising concerns about ethical AI development and fair compensation for content creators.

Training Data at the Heart of the Case

At the core of the dispute is how large language models (LLMs) are trained. The authors claim that their books were part of datasets used to teach AI systems how to generate human-like text. They argue this constitutes a clear case of copyright infringement, as these models can produce summaries, reviews, or imitations of their works, undermining the works’ market value and the authors’ creative control.

Seeking Accountability and Reform

The authors are seeking monetary damages and an injunction to halt the use of their copyrighted material in AI systems without consent. They’re also calling for greater transparency in how training data is sourced. The lawsuit could become a landmark case in defining how intellectual property law applies to artificial intelligence development.

Industry-Wide Implications

This legal action comes amid a wave of similar lawsuits against AI firms, as creators from various fields push back against data harvesting without their consent. If the court sides with the authors, the outcome could significantly reshape AI training practices, forcing companies to license copyrighted works or rely on public domain and opt-in data sources.