Anthropic possibly purchased 1 million print books for scanning and AI model training

More information about large-scale AI model training, specifically by Anthropic, has been reported this week by the Washington Post. As a reminder, this is the company that’s paying out a $1.5 billion settlement to authors and publishers in the Bartz v. Anthropic case for illegally obtaining millions of ebooks for AI model training. The judge ruled that the training itself was fair use, but that using pirated books for such training was not.

It turns out Anthropic was (maybe still is?) purchasing print books for scanning and AI model training. They bought books from wholesalers and used-book retailers, including Ingram and Baker & Taylor, among others. It’s unclear how many books they purchased and scanned, but the total may be 1 million or more English-language books.

“Having every book [emphasis theirs] is more valuable to us than a smaller volume of the most critically acclaimed literary works when training our models,” Anthropic noted in a document. But they didn’t focus on self-published books, considering them “lower quality.”

While the Washington Post story is getting a lot of attention and offers in-depth reporting, the revelation that print copies were destroyed in the course of scanning was first noticed and reported by Ars Technica in June 2025.