Book Publishers and AI Licensing Deals: The Value Remains Unclear for Everyone

If you survey any group of authors on how they feel about AI licensing deals, the overwhelming response is likely to be negative. Distributor Draft2Digital found this out when it surveyed authors in September: Roughly 50 percent of them said no amount of money would ever be enough for them to license their work for AI training, even if the use case were non-competitive. Authors, as well as much of the general public, find AI companies untrustworthy or unethical, and many believe the technology itself is harmful to creative people and society at large.

In a panel hosted last week by the Book Industry Study Group, book publishing industry experts discussed challenges and opportunities for AI content licensing. Moderator Thad McIlroy began the panel by noting that it’s a “tense landscape” for hosting such a discussion, given the creative community’s predominantly negative views of AI. Yet the panel itself was reasonable, open-minded (perhaps disturbingly so for any authors listening), and pragmatic about opportunities presented by the current AI landscape. Panelists included Pamela Malpas, a literary agent; Paul Sweeting at the RightsTech Project; Kris Kliemann, a book publishing consultant with expertise in licensing deals; and Michael Bommarito, a researcher, academic, and entrepreneur with expertise in AI.

So far, there is no model for valuing data for AI training, and pricing for such rights remains arbitrary, said Sweeting. This licensing challenge is the result of the vast scale at which generative AI operates, which is unprecedented in the media industry. GPT4 (the power behind ChatGPT and other models) was trained on billions of individual works, or something on the order of 10 trillion English words. That means, according to Sweeting, “Data is really only valuable in this context in aggregate. What that further implies is that any given collection of works, any given archive or corpus of works—let alone individual works an author or artist may own the rights to—is weighted against the immense quantity of other content that’s being used to train these models. It doesn’t affect the efficiency or capability of the model if [your work] is not included in the training data.”

His comments echoed the words of analyst Benedict Evans writing on AI in 2023: “Your novel or song or article is just one grain of dust in the Great Pyramid.” The largest AI companies are focused on striking deals for very large, professionally produced bodies of work, not individual works. This makes a collective management or blanket licensing system for text-based work the only feasible path if authors want to see some money for AI training, something that the Authors Guild and others have been working on.

Many large media companies have been striking licensing deals and taking the cash while it’s offered. The New York Times, before it sued OpenAI and Microsoft over copyright infringement, was in negotiations and trying to come to a deal. But when the two sides couldn’t agree on how to value the content for training, the Times became frustrated and sued instead.

Book publishers that have struck licensing deals are from the scholarly corners of the industry, including Wiley, Taylor & Francis, and Oxford University Press; many of these deals have been for backlist titles. While these deals remain challenging to facilitate, they are potentially a major opportunity, said Sweeting, particularly for publishers with shareholders or outside investors. “As a publisher, you’re not really supposed to ignore those sorts of opportunities. It’s something that publishers of all kinds need to figure out. It’s really just too big to simply say no to. What do you get out of saying no?”

Kliemann agreed in theory, saying, “The general perspective I bring is, how can we make a deal here? What’s the right way to move forward with this opportunity? Let’s get to yes. Obviously we’re going to negotiate, but we’re not in the business of keeping our content a secret. We’re all here to get it out to the world. So I’ve always seen that licensing has a role in that.” She added that she’s not a complete Pollyanna about AI companies, but she does see how some of them are working toward good ends (e.g., finding cures to disease by licensing large swaths of content from the scientific publishing community).

Can authors keep their works out of these licensing agreements? Yes, that is possible, Kliemann said. All publishers are aware that some authors are saying “Don’t do it” and asking for contract language that prohibits training; the book’s metadata can automatically keep it out of any collections licensed for AI training. However, she pointed out that this prohibition may not be entirely wise: “This is a marketplace where it is possible to be a part of something and to receive some money.”

The gravest warning during the panel came from academic Michael Bommarito of the ALEA Institute. He believes AI companies are now focused on dividing and conquering people in publishing and creative communities: “The LLM companies kind of surface all of our dirty laundry with each other. We have history, it goes without saying, which makes it easy for us to be divided.” He also argued that, in the end, there will not be any negotiating power downstream for publishers once consumers get accustomed to using AI tools. “It looks like easy cash [for publishers]. If some or all of the industry pursues these short-term licensing deals, is it a bridge to a future of a collaborative partnership, or is it short-term and performative and not part of [AI companies’] long-term strategy?” He believes that if anyone should own these repositories of knowledge, it shouldn’t be the upstart AI companies, but “the publishers who have paid to aggregate the rights over centuries now.”

Bottom line: There remains a tremendous lack of transparency around AI deals and what AI licensing and training ultimately mean now and in the future. Complicating matters, the whole AI industry is a moving target. Malpas, the literary agent, said, “What we were talking about 12 months ago is different from what we were talking about six months ago, to where we were two weeks ago. … From the rightsholder perspective, there’s a feeling that we’re working blind.” She said agents most often learn what publishers are doing from their earnings reports rather than from proactive conversations. “We don’t know as rights holders what kind of material is going to be fit for training. We don’t have a good sense of what’s valuable. Just STEM titles? Just nonfiction? We know that’s not true. … What is going to define the value?”

Just announced: The Authors Guild has partnered with AI licensing agency Created by Humans (founded by Trip Adler, former CEO of Scribd). Learn more in the New York Times. So far, Created by Humans has not announced any licensing agreements with AI companies or a payment model.

Jane Friedman