What Authors Don’t Know about the Anthropic Settlement (But Should)

In Bartz v Anthropic, a class-action lawsuit brought by authors in 2024, the judge had to determine whether Anthropic’s training of its AI chatbot, Claude, on 7 million books constituted fair use. The judge ruled that Anthropic’s acts of training were fair use (legal), but that it was not legal for Anthropic to knowingly download and copy pirated works for its training library.

Ultimately, the two sides settled out of court to the tune of $1.5 billion. But there’s a wrinkle in how the class has been defined that has angered some authors: Works eligible for the class must have been registered with the US Copyright Office. Also, even though authors brought this case, the settlement will be divided between authors and publishers.

These legal details have been little discussed and are not well understood by authors—and that includes me. This week I spoke with Dave Hansen, the executive director of the Authors Alliance and an expert on issues of copyright and fair use. The Authors Alliance is a nonprofit organization based in the San Francisco Bay Area that is similar to the Authors Guild in that it advocates for authors. But its advocacy has a different purpose: to advance the interests of authors who want to serve the public good by sharing their creations broadly. Their membership is populated by scholars, professors, and librarians. You’re more likely to find the Authors Alliance arguing that AI model training is fair use, and they don’t look at copyright cases in the same way as the Authors Guild. For starters, they were not in favor of how the class was certified in Anthropic, which we discuss in this interview.

Prior to joining the Authors Alliance, Hansen was an associate university librarian and lead for copyright and information policy at Duke University Libraries. He has testified before Congress, the US Copyright Office, and the USPTO on copyright, AI, open access, and fair-use issues affecting academic authors and universities. He received his JD and MSLS from UNC Chapel Hill.

This interview has been edited for clarity and shortened for this newsletter.

Jane Friedman: So let’s start by discussing the list of works that Anthropic trained on, however many millions that was. That list was run against copyright registrations at the US Copyright Office, which gives us the current class that’s eligible for the $1.5 billion settlement. Is that correct?

Dave Hansen: Correct. Yeah.

So this settlement leaves out many works that were not registered with the US Copyright Office. Can you talk about how many works that might be and what kind of works they are?

Let’s start with the biggest number and work our way down from there.

So what we know is Anthropic used a couple of different data sets to train. One of those was LibGen. That’s the biggest data set. Then they also used a smaller data set called the Pirate Library Mirror, PiLiMi. Then their third data set was the books that they scanned themselves.

So that’s the large corpus that they had, and we’re talking 7 million plus books there.

Now the court’s decision in this case looked at those and treated them pretty differently. The court said, actually, for LLM training, for the pirated books, the LibGen books, and the books you’ve scanned, that’s fair use. But for these copies that Anthropic just retained access to, what the court termed a centralized library, the court could not say that was fair use, and it was sort of barreling toward a trial to determine whether it was infringing or not.

So we’ve taken the universe of material and reduced it at least by the size of the books that Anthropic has scanned, and there’s lots of public information about that right now. There’s a Washington Post article about it. It was several million volumes. So then what we’re looking at is which of those materials are actually covered by the class definition, and this is the part of the suit that I think most people, even people who are used to looking at copyright lawsuits, are a little less familiar with.

This is a class action suit, and the way that class actions work is you don’t get to just waltz up into court and claim that you represent whoever you want. The court has to approve that and say, Yeah, you can fairly and adequately represent the interests of these other groups of people who are just like you. So this is where the case got pretty interesting last summer. This case originally started off as really an authors’ lawsuit, right? It didn’t include publishers.

Two things happened when it got around to class certification. One is, the court narrowed the class pretty dramatically by limiting it only to works that had been registered with the US Copyright Office before the date on which Anthropic used them. The other piece is the court expanded the class by saying that the class includes anyone who has an interest in the reproduction right in one of these books. That includes authors but potentially publishers as well. That’s important especially for what we’re seeing in the objections [to the settlement].

Can you explain why publishers or authors might decide intentionally not to register for copyright?

Well, it’s just kind of a pain to do. There are a whole lot of books that somebody didn’t want to pay the $35 fee or whatever and run it through that process. The other large group of books that are not registered as frequently are books by foreign publishers, because in most other jurisdictions, you don’t have this registration requirement. So it’s just less part of the practice. Now, of course, big international publishers do this more. So if you look at Cambridge University Press or Oxford University Press, they do register. But if you look at the works list, they are disproportionately underrepresented in that list, I would say. And I believe that’s because they just didn’t register as many of their books as the US-based publishers.

Then there’s the mistake part, too, where publishers just forgot to register.

Sometimes you read your publishing contract and it says, I, publisher, promise to register your work with the US Copyright Office, and they don’t always do that. There are a couple of instances that I’m aware of, publishers with egg on their face, where authors’ works would’ve been covered under this class and under this settlement, except the publisher didn’t do what they were supposed to.

Right. The big example I’ve seen is Macmillan’s Tor. I don’t know how broadly they forgot to register, but they’ve actually told authors, We’ll make you whole. I haven’t seen other publishers raising their hand like that, which has made me wonder if there’s maybe grounds for a class action by authors left out.

Possibly, because, you know, they had a contractual obligation to do it, and they were excluded here.

Back to the pared down works list, then.

So we started off with about 7 million. With all of those parings down, the class works list I think ended up at right around 482,000, which is still a really big class. I believe—by far, actually—the largest copyright class action that’s been certified.

In the US, do you need to have the work registered to get damages? I’m looking for the reason the court decided, Okay, registration is a requirement to be part of the class.

So in the US, in order to file a lawsuit, you do have to register your work before you file the lawsuit. But it didn’t have to happen the way it was framed here. Here it was you had to have the work registered as of the date of the harm that was done. But that’s not the rule in the United States. The rule is that you just have to register before you file the suit.

Registering before the infringing action has benefits; one of the things it gets you is access to statutory damages. The US is sort of special in the world in that there are very few countries that offer statutory damages, which makes it really lucrative in some cases to bring a copyright lawsuit, especially like this, because you can say, Hey, you know, the law says you owe me up to $150,000 per work infringed, and I don’t even have to prove how I was harmed.

So that’s one of the benefits and why people often do register. But you don’t actually have to register until you file a suit. And so why the court did this, I think it just administratively makes it way easier. You don’t have to worry about this issue of people wanting to be in the class but not having an opportunity to register before it was filed—that just kind of creates a mess.

And reading between the lines, if you look at how this judge managed this case, efficiency was a really high priority. There are all these other lawsuits that have been kind of clanging around in the courts for a long time. This suit was filed and resolved on class certification and fair use in, let’s see, I guess it was about eight or nine months—by far the fastest trajectory of any of these suits—and the judge just really pushed it forward, and one of the ways that he did that was by making the class much easier to manage.

You mentioned these other cases that are out there with similarities. Do you think that there’s potential for the class to be defined differently in those cases? That another judge will make a different determination?

Possibly. I mean, even in this case, there are people filing objections to the settlement, and they’re saying, I was unfairly excluded. My book was used, as far as I can tell. But yeah, there are these other suits. … There’s also the Google litigation, which is, like, a super-sized class action with similar dynamics. The rest of them are kind of further behind, but there are numerous efforts to start to certify the class in these suits. The reason why may be obvious, but just to state it: It gives the class action attorneys a whole lot more leverage to negotiate a big settlement when they’re able to say, I represent 500,000 people or a million people with aggregate claims in the multiple billions of dollars.

I know some authors who think publishers kind of swooped in and took half the settlement. Do publishers have to be included in it? I mean, the reasoning I hear is that in the Anthropic case, it was piracy, and so that’s why the publishers get part of the settlement.

They don’t have to be included. … There are ways to limit it. I do think it highlights that there really is a tension. The way rights get allocated is typically by a publishing contract, and those publishing contracts can be quite different in terms of what rights they allocate to the author or the publisher for a variety of uses. So I would say at a minimum what would have been better … is if the court had approved two subclasses, one for authors and one for publishers. They really do have different interests. And in some ways they’re potentially antagonistic to each other.

I think the publishers have a strong incentive to say, This is a settlement over rights that are essentially like subsidiary rights. And in a lot of contracts, in subsidiary rights, you are splitting things 50/50. So that sounds, like, really great for the publishers. But another way to look at this is that these are rights that were never granted, and the author holds them all. And [the author] signed a contract that said, I give you the right to publish my book in book form, and that was it, and retained all of those other rights. So a lot of it depends on how you construe the usage that is being litigated and settled over. And this Bartz v Anthropic case makes it especially complicated because the court already ruled that the AI use, the LLM training, was all fair use. So all that’s left is Anthropic making copies of stuff, which does leave open the door for, like, What are they making these copies for?

That’s the messiest part of this whole suit, and one of the reasons why the Authors Alliance filed an amicus brief before the Ninth Circuit, saying, We don’t think this class should be certified. We’re not going to get into the underlying legal claims, but everybody’s contract is different. The interests are very different here. And having this group of even just three authors represent basically the entire publishing industry seems like not the right approach. There is a diversity of interests here. That’s not to even mention that one of the groups of authors that we work with frequently is academic authors who are trying to distribute their work for free and get widespread dissemination. There are works included in this class that are licensed under Creative Commons, and it’s a little bit confusing to sort through who should get paid for those.

You’ve looked pretty extensively at the works data and written about it, noting some oddities.

The analysis that we did was mostly focused on the bigger set of works that were used. So we pulled down metadata associated with LibGen and tried to just understand more about what’s actually there.

But when you look at the approved works list, you do still end up with some oddities. You’re gonna find books by—I believe Charles Dickens is in there and other things where it’s a very old book that’s in the public domain that has a little bit of copyrighted content added around it, like a preface.

Of course those things are protected, and infringing them is infringing them. But what’s kind of curious here is an author or a publisher who added maybe three pages of copyrighted content around a public domain title is gonna get the same payout as someone who wrote a 400-page deeply researched scholarly monograph or a New York Times bestselling novel. The settlement treats all of these works the same, and in some ways maybe that’s good, but in other ways it just seems a little weird.

I just did a search for Charles Dickens [in the works list], and it’s [laughs] giving me 25 pages of results.

I guess at any point the judge in this case—there’s a new judge on the case now because the first one retired—could still throw a wrench into the affair?

The previous judge actually gave sort of like a laundry list when he handed the case over: Here’s some things that you should look into.

And I think one of those items was attorneys’ fees. He had some strong words about how much they were getting paid.

Well, there were two problems. One is the attorneys themselves, like the class-appointed attorneys. Some [forthcoming] hearings are related to how much the attorneys get paid. Maybe this is something that a lot of authors don’t know, but it’s just kind of the way class actions work. They’re really driven by the class action law firms, because they are hoping to get a very big payment out of this. And in this case, they have a motion for $200 million and something as a payout. So it’s a lot of money.

But then the other thing he had a problem with was, it looked like the class-appointed attorneys were farming out some of the work and distributing money to all these other firms, some of whom represented the Authors Guild, some of whom represented the Association of American Publishers. So he said, You’re not the approved class counsel, so, you know, thank you for doing this pro bono, but you’re not gonna be on the take from this $1.5 billion.

So if these other firms get paid, it would have to come from the attorneys who are actually appointed for the classes?

Yeah. But even then, I mean, the court pretty closely manages that. They don’t get to just send a bill and say, This is what it cost us, including stuff we had to pay to other firms. They have to justify it. And that’s why there are all of these hearings and filings, and that’s why we know things, for example, like how many hours the different firms claim that they spent on this matter. And some of them did do a lot of work. I don’t want to [imply] they just showed up and filed a lawsuit and then are getting a $1.5 billion check out of it for the settlement. They work pretty hard, but they’re also asking for a tremendous amount of money, and it would dwarf anything, certainly anything any individual author would get. [See Hansen’s analysis here.]

But also it will dwarf any publisher receipts out of this as well. Some of the largest publishers—at one point I was poking around the works list, and I think we had a couple publishers that were up to the 20,000-work range or something like that. So there are a few publishers who are slated for a payout of potentially $30 million. But nothing compared to what the attorneys would get out of this.

And maybe this isn’t clear to everyone either, but any money that the attorneys get is money that the class members don’t get. Anthropic is writing a $1.5 billion check, and that’s it. Whoever’s going to get paid gets paid out of that pot of money, and if the attorneys get a lot, it means class members get less, and that’s just how it gets divvied up.

A reminder to all authors: Many of the key deadlines in the Anthropic case (like the objection deadline) have passed, but there’s still time to submit a claim. The deadline is March 30. If you haven’t filed an objection and haven’t excluded yourself from the class, and your book is on the works list, you’re automatically included. As Hansen commented to me, “You might as well get paid” and submit a claim.

Jane Friedman

Jane Friedman has spent her entire career working in the publishing industry, with a focus on business reporting and author education. Established in 2015, her newsletter The Bottom Line provides nuanced market intelligence to thousands of authors and industry professionals; in 2023, she was named Publishing Commentator of the Year by Digital Book World.

Jane’s expertise regularly features in major media outlets such as The New York Times, The Atlantic, NPR, The Today Show, Wired, The Guardian, Fox News, and BBC. Her book, The Business of Being a Writer, Second Edition (The University of Chicago Press), is used as a classroom text by many writing and publishing degree programs. She reaches thousands through speaking engagements and workshops at diverse venues worldwide, including NYU’s Advanced Publishing Institute, Frankfurt Book Fair, and numerous MFA programs.