Revealed: The Authors Whose Pirated Books Are Powering Generative AI
Updated at 1:40 p.m. ET on September 25, 2023
Editor’s note: This article is part of The Atlantic’s series on Books3. Check out our searchable Books3 database to find specific authors and titles. A deeper analysis of what is in the database is here.
One of the most troubling issues around generative AI is simple: It’s being made in secret. To produce humanlike answers to questions, systems such as ChatGPT process huge quantities of written material. But few people outside of companies such as Meta and OpenAI know the full extent of the texts these programs have been trained on.
Some of this text comes from Wikipedia and other online writing, but high-quality generative AI requires higher-quality input than is usually found on the internet—that is, it requires the kind found in books. In a lawsuit filed in California last month, the writers Sarah Silverman, Richard Kadrey, and Christopher Golden allege that Meta violated copyright laws by using their books to train LLaMA, a large language model similar to OpenAI’s GPT models—an algorithm that can generate text by mimicking the word patterns it finds in sample texts. But neither the lawsuit itself nor the commentary surrounding it has offered a look under the hood: We have not previously known for certain whether LLaMA was trained on Silverman’s, Kadrey’s, or Golden’s books, or any others.