Audiobook popularity is increasing, and now AI could make them ubiquitous. Is this a good thing?
AI and the Future of Audiobooks, featuring Simon Vance
If you’re an audiobook listener, you've probably heard or seen comments that suggest that listening to an audiobook is “cheating,” that it’s not really reading a book. But you know that this is false. When you listen to an audiobook, you listen to a performance of a book, which is as much "the book" as when you view the words. In the voice of a skillful narrator, an audiobook becomes an immersive experience, and can unfold elements of a text that you might not appreciate on paper.
We have reached a fork in the road for audiobooks. While they have never been more popular, as people listen to a wide variety of audio content on the go and at the gym, it is now possible to use synthesized voices to make spoken-word versions of any kind of text that sound sort of real. These voices are much better than the text-to-speech voices you can already access on your computer.
Apple is already offering an AI audiobook option for self-published authors who sell on its Books platform, and it won’t be long before Amazon-owned Audible follows. We will soon have two tiers of audiobooks: those read by experienced narrators, able to tease out the subtleties and nuances of texts, and those read by mechanical voices, which may sound somewhat real but lack humanity. Is using AI to narrate audiobooks a form of cheating?
Should all authors have audiobooks?
Audiobooks are big business, projected to reach $35 billion by 2030. About 65% of audiobooks are fiction, and genre fiction - mystery, romance, thriller, fantasy - is by far the most widely listened to type of book. But personal memoirs, read by their authors, also attract listeners. Think of recent books by Michelle Obama and Prince Harry; there's a sense of personal connection when listening to a well-known person narrate their own book.
For authors looking to maximize their revenue streams, audiobooks are a great way to tap into readers who might not otherwise discover them. But this comes at a price. Having a professional narrator read a book costs several thousand dollars.
As new possibilities arise for book distribution, self-published or even traditionally published authors may be tempted to read their own books. It's cheaper than paying a narrator. In some cases, publishers will provide a studio and producer for novice authors to record their books. In other cases, an author will buy a microphone, connect it to their computer, and read their book, perhaps paying someone to edit the audio file so it is usable. But these books rarely sound the way they should.
The art of narrating audiobooks
Simon Vance is one of the best-known audiobook narrators, and he has narrated more than 1,000 books. "I stopped counting way back," he told me. For him, audiobooks are a sort of intermediary between the author and the reader. A carefully crafted text has certain rhythms, and, "What I try to do is transmit the author's brainwaves to the listener."
I asked Simon if an author who has just published their first novel should record their own audiobook. He laughed, then said, "Very, very rarely. Reading aloud, so that people want to listen to you, is a skill."
Simon trained and worked as an actor before becoming an audiobook narrator, and he brings that training in speaking and rhythm to the books he reads. He stressed the importance of a text's nuances, which synthesized voices can't capture. "I'm not sure that AI can do that, ever, in a way that a human being can, who is immersed in the story in the way a human being is. It's that humanity that AI cannot replace." Simon also said that when he is narrating well, "it feels like singing," and that the musicality of texts is important.
He pointed out how a narrator knows how to express things differently when a phrase is deliberately repeated, "and give it whatever nuance the author requires in the situation it appears to make full sense and allow full enjoyment of the story." AI can't do that, and it also can't shape sentences musically; "I find these narrations painfully monotonous rhythmically, where true narration is like jazz, or at least much more improvisational."
One of the problems with AI narrations is the number of pronunciation errors they make in proper names, place names, and foreign words. When listening to Simon Vance's narration of Anthony Powell's A Dance to the Music of Time, which I had previously read in print, I discovered that the first name of the character St John Clarke was pronounced "sinjin." And as a French speaker, I find hearing French names mispronounced in audiobooks grating; it prevents me from enjoying books. (To be fair, AI will eventually be taught correct pronunciations.)
The future of audiobooks
It's undeniable that AI will have a big role in future audiobooks; the cost and time savings are substantial. Instead of taking weeks to record, edit, and produce a book, it can be done in a day. It takes just minutes to generate the audio, then an editor can prepare it for release in a few hours. But Simon pointed out that "publishers are not passing on the savings to consumers, they can charge pretty much the same price for an AI-narrated book as they do for a regular narrated book."
Self-published authors will benefit from low-cost audiobooks, even if they don't express the nuances that professional narrators bring to them, and this will likely not be detrimental to their reputations. But major authors will still want to use narrators, because they won't want to be seen as skimping and providing sub-standard content. And celebrities will still read their own memoirs, their audiobooks being more attractive because of the author's voice.
New AI tools can take a sample of anyone's voice and create a model, then provide a finished book that sounds like it was read by that person. Maybe authors will model their own voices, or maybe there will be models available based on well-known actors, or even audiobook narrators. But, as Simon points out, "You can listen to an artificial narration, but you're not getting the humanity."
Two audio samples
Here are a couple of samples of text-to-speech voices. The first is one of Apple's Siri voices, with a UK English accent.
This one is from Eleven Labs, which offers AI voices, and the ability to create a voice model from any voice. I created a model from my voice.
While the reading is much more realistic than the Siri voice, it doesn't sound like me. (You can hear my voice on the Write Now with Scrivener podcast.) The Eleven Labs voice sounds much more realistic, but it does lack humanity.
Kirk McElhearn is a writer, podcaster, and photographer. He is the author of Take Control of Scrivener, and host of the podcast Write Now with Scrivener.