What I Found in a Database Meta Uses to Train Generative AI
Nobel-winning authors, Dungeons and Dragons, Christian literature, and erotica all serve as datapoints for the machine.
by Alex Reisner
Sep 25, 2023
Editor’s note: This article is part of The Atlantic’s series on Books3. You can search the database for yourself here, and read about its origins here.
This summer, I acquired a data set of more than 191,000 books that were used without permission to train generative-AI systems by Meta, Bloomberg, and others. “Books3,” as it’s called, was based on a collection of pirated ebooks that includes travel guides, self-published erotic fiction, novels by well-known authors, and a lot more. It is now at the center of a lawsuit brought against Meta by writers who claim that its use amounts to copyright infringement.