The Atlantic

What I Found in a Database Meta Uses to Train Generative AI

Nobel-winning authors, Dungeons and Dragons, Christian literature, and erotica all serve as datapoints for the machine.

Editor’s note: This article is part of The Atlantic’s series on Books3. You can search the database for yourself here, and read about its origins here.

This summer, I acquired a data set of more than 191,000 books that were used without permission to train generative-AI systems by Meta, Bloomberg, and others. “Books3,” as it’s called, was based on a collection of pirated ebooks that includes travel guides, self-published erotic fiction, novels by and , and a lot more. It is now at issue in lawsuits brought against Meta by writers who claim that its use amounts to copyright infringement.

