Why Book Sales Figures Are So Hard to Interpret (and Complete Sales Figures Nearly Impossible to Find)

The most consequential reporting mistake I’ve ever made was to publish an unsourced data point from the DOJ vs Penguin Random House antitrust trial in August 2022.

During this trial, the US government had to prove that anticipated top-selling books, those receiving an advance of $250,000 or higher, are treated differently by publishers, and market harm would occur to authors of such books if PRH (the biggest of the Big Five) acquired Simon & Schuster. The DOJ won the case, and PRH didn’t acquire Simon & Schuster. A private equity firm later did.

During the trial, countless publishing statistics were discussed but not necessarily sourced, and in the dense thicket of my August 31, 2022 article, DOJ vs. PRH: The Key Questions of the Trial, I listed many of them. For example, of the 58,000 trade titles published per year, fully half of those titles sell fewer than one dozen books, and more broadly 90 percent of titles sell fewer than 2,000 units. While the first statistic seemed rather shocking (and I stated “not a typo” in the article), the second one was fairly ordinary—at least for anyone who’s ever accessed Circana BookScan—plus variations of it sometimes appear in the New York Times. For example, in 2021, they shared a BookScan stat that 98 percent of the books that publishers released in 2020 sold fewer than 5,000 copies.

The general public, along with authors and readers, tends to believe books sell far more than they actually do, so these numbers provoke disbelief and lots of online sharing whenever they surface, although their meaning is hotly debated (more on that below). In fact, it is straightforward to generate a piece of clickbait by talking about book sales, especially if you can provide a semblance of sourcing. Some articles are genuinely helpful; the worst are bastions of misinformation and misdirection. In sharing these trial stats, I unwittingly became a source of the latter when a reader innocently screenshotted “of the 58,000 trade titles published per year, fully half of those titles ‘sell fewer than one dozen books’” and posted it on Twitter, where it went viral.

The most immediate and substantial response came from author Lincoln Michel, who went to work debunking this statistic; he has a long history of commenting knowledgeably on the industry for an audience of authors. We both assumed that the stat was likely derived from BookScan figures, which only measure US sales of print books through retail channels, such as bookstores, online retailers, and mass merchandisers. I immediately commented on his article, then available for free, guessing the figure probably included many types of publishers, some of whom may not focus on print sales or bookstore sales. On the other hand, traditionally published authors tend to highly value bookstore sales. I wrote, “So in that regard, I think it’s a helpful reminder that print bookstore sales may not drive a book’s success and a lot depends on the publisher and category of book.”

Then BookScan analyst Kristen McLean jumped into the discussion; she spent many years reporting on BookScan sales figures for publishers and the media. She believed the figure likely came from BookScan via parties involved in the trial, so she tried her best to reverse-engineer it. She wrote: “The data below includes frontlist titles from [top 10 publishers] Penguin Random House, Simon & Schuster, Hachette Book Group, HarperCollins, Scholastic, Disney, Macmillan, Abrams, Sourcebooks, and John Wiley. The figures below only include books published by these publishers themselves, not publishers they distribute. Collectively, 45,571 unique ISBNs appear for these publishers in our frontlist sales data for the last 52 weeks.” Then she listed the dataset:

  • 0.4% or 163 books sold 100,000 copies or more
  • 0.7% or 320 books sold between 50,000–99,999 copies
  • 2.2% or 1,015 books sold between 20,000–49,999 copies
  • 3.4% or 1,572 books sold between 10,000–19,999 copies
  • 5.5% or 2,518 books sold between 5,000–9,999 copies
  • 21.6% or 9,863 books sold between 1,000–4,999 copies
  • 51.4% or 23,419 books sold between 12–999 copies
  • 14.7% or 6,701 books sold under 12 copies

That means 15 percent of big publisher frontlist (new) books sold fewer than 12 copies. That’s not half, but it’s not nothing either. McLean commented that she thinks the “real story” is that roughly 66 percent of those books from the top 10 publishers sold fewer than 1,000 copies over 52 weeks, and less than 2 percent sold more than 50,000 copies. (Seth Godin was so taken with these figures, he made a chart, shown below.) Obviously, this is just a small snapshot of sales, limited to one format in a single year. But notably it’s restricted to publishers who are ranked in the top 10 in their ability to sell print books through conventional retail channels.

Chart by Seth Godin based on analysis by Kristen McLean of data gleaned from the DOJ vs Penguin Random House antitrust trial.
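For anyone who wants to recheck the arithmetic, here is a minimal sketch in Python that recomputes the headline percentages from the bucket counts McLean listed. The counts come straight from her list; the variable names and the `share` helper are my own, purely for illustration, and nothing here touches BookScan itself.

```python
# Bucket counts as listed by McLean (frontlist ISBNs, last 52 weeks).
# Labels are illustrative shorthand for her ranges.
buckets = [
    ("100,000 or more", 163),
    ("50,000-99,999", 320),
    ("20,000-49,999", 1_015),
    ("10,000-19,999", 1_572),
    ("5,000-9,999", 2_518),
    ("1,000-4,999", 9_863),
    ("12-999", 23_419),
    ("under 12", 6_701),
]

# The bucket counts should add up to the 45,571 ISBNs she cites.
total = sum(count for _, count in buckets)


def share(labels):
    """Percent of all ISBNs that fall in the named buckets."""
    picked = sum(count for label, count in buckets if label in labels)
    return 100 * picked / total


under_12 = share({"under 12"})
under_1000 = share({"under 12", "12-999"})
at_least_50000 = share({"100,000 or more", "50,000-99,999"})

print(f"total ISBNs: {total:,}")
print(f"under 12 copies: {under_12:.1f}%")
print(f"under 1,000 copies: {under_1000:.1f}%")
print(f"50,000 copies or more: {at_least_50000:.1f}%")
```

The buckets do sum to 45,571, and the combined shares land at about 14.7 percent, 66.1 percent, and 1.1 percent, which matches the rounded figures quoted above.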

It can’t be stated often enough: Circana BookScan figures are limited to US print retail sales. That means sales through the register into a customer’s hands. BookScan doesn’t include direct sales by publishers and authors. It doesn’t include ebook sales or audiobook sales. It doesn’t include library sales. It doesn’t include reading and listening activity through subscription platforms. It doesn’t include foreign sales. And more. So it’s an incomplete picture, but again, big publishers and commercial authors alike tend to focus on how well books are selling through outlets like Amazon, Barnes & Noble, independent bookstores, and mass-market merchandisers like Walmart. Those sales matter if you’re trying to land on a bestseller list.

Other sources of book sales data are incomplete in their own way. The Association of American Publishers releases StatShot, which reports sales of all formats combined, but they only report sales from AAP members. Notably, their membership does not include Sourcebooks or Entangled Publishing, among many others. (Last year, we all discovered that Sourcebooks now beats Big Five Macmillan when measuring by BookScan units alone.) Publishers Lunch regularly analyzes AAP reports and compares them to sales trends through BookScan to offer a fuller picture of how the industry is performing. BookScan does measure ebook and audiobook sales through a separate service (PubTrack Digital) that relies on publisher-supplied sales data, but they don’t combine all those sales into one figure, at least not for public consumption. Internationally, there’s NielsenIQ BookScan, an entirely different company despite the similar name, that tracks sales in 17 territories, including the UK.

None of these services can tell you much about self-publishing book sales, where ebook sales dominate. Amazon is by far the most important retailer for them, and Amazon doesn’t report on ebook sales. Bookstat, which started off as Author Earnings in 2014, extrapolates book sales figures from Amazon sales rank, among other sources. More than 10 years ago, authors volunteered to share their sales data with Author Earnings, which helped prove that self-publishing ebook sales were not simply a rounding error and deserved attention. Bookstat to this day offers the sharpest view of what’s happening in the self-publishing market and ebook market, an area that’s only become more significant since the antitrust trial. In 2017, Michael Cader at Publishers Lunch attempted to quantify the US market in a four-part series, which he opened by stating, “All of the core publishing statistics are incomplete in various ways.” At the time, he believed that self-published work constituted about 40 percent of Amazon’s ebook unit sales. He also estimated that publishers earned 80 percent of ebook dollar sales but only 53 percent of unit sales.

For authors who want a reality check on print unit sales of new releases, Publishers Weekly bestseller lists are the best resource, as they include BookScan figures. If you want to determine if a particular book might be performing better in another format, check Amazon Charts, which incorporates Kindle and Audible sales and reads. USA Today is notable for trying to offer a bestseller list that accounts for sales across all formats, but it has become less reliable since 2023.

Why am I dissecting all this now? There’s been a resurgence of interest in this topic after it was reported that Lindy West’s memoir sold 1,800 copies in its first week (that’s the BookScan figure), which people couldn’t quite fathom given how much media attention her book has received. Some commentators, including myself, have tried to use this as a teaching moment. Agent Anna Sproul-Latimer explains that West’s sales figures are not bad if you understand how small the publishing industry is in comparison to other businesses. Others have pointed out West likely sold far more copies in audiobook format, as that’s the pattern for celebrity titles. Others, like Lincoln Michel, have argued in a variety of ways that BookScan figures are incomplete and not to be relied on. All of these things are true.

But I’m also writing about this nearly four years later so I can make amends and clarify this claim that half of all books sell fewer than one dozen copies. I recently saw this Thread from novelist Ruth Ware, who wrote, “I keep seeing the PRH trial cited as the source, but I’ve read the judgement and a lot of the coverage and couldn’t find any mention of the figure.”

I first saw the stat mentioned on Twitter by a reporter from Publishers Weekly who was at the trial on the day the assertion was made in court. Soon after, Publishers Lunch reported the greater context. The DOJ’s lawyer was questioning an expert witness for the defense: “The DOJ’s lawyer Mel Schwarz explained later—to the dismay of any who might see it—that of the universe of 58,000 total titles published a year that they have been discussing, fully half of those titles ‘sell fewer than one dozen books.’” Notably, two years later, Publishers Lunch revisited that conversation when the stat made the rounds due to this misleading, clickbait article about how no one buys books.

Bottom line: While McLean admitted that BookScan figures can be misleading because they don’t include all sales, she added [emphasis hers], “It does represent the general reality of the ECONOMICS of the publishing market. In general, most of the revenue that keeps publishers in business comes from the very narrow band of publishing successes in the top 8–10 percent of new books, along with the 70 percent of overall sales that come from BACKLIST books in the current market.”

BookScan: A Brief History

Before 2001, there was no way for anyone to track industry-wide book sales. Publishers tracked their own sales, and that was it. That changed when Nielsen, already known for measuring TV audiences and music sales, launched BookScan. For the first time, industry professionals could see what was actually selling across the market, not just within their own slice of it. In 2017, Nielsen sold its US book business to NPD Group, which renamed it NPD BookScan. Then NPD merged with another firm to create a new company called Circana in 2023. So it’s now called Circana BookScan. Outside the US, the international BookScan service remained with Nielsen.

In the US, authors can access their own BookScan figures for free through Amazon Author Central. Full subscription access, which allows searching across titles, authors, and categories, is available through Publishers Marketplace to literary agents, scouts, authors, and others approved by Circana. It costs about $3,000 per year for a basic, single-user subscription, but ability to pay does not mean you will get access. When I tried to become a BookScan subscriber last year, I was turned down.
