Voice in the machine

As text-to-speech technology improves, we shouldn't forget that narration isn't just reading a book: it's a performance

Photo by Gertrūda Valasevičiūtė on Unsplash

Let's talk about supply chains for a moment, shall we?

What happens when you have incredible demand for a product, but constraints on your production? X number of people want to buy from a finite pool of whatever Y is: Tickle-me-Elmos, garden gnomes, Pokémon cards, etc. You get a shortage, and if economics works as theorists say, the price goes up to accommodate demand.

I've been thinking about how demand and supply work when it comes to more intangible things — stuff you can't pick up off a shelf. Working in student support at a university, we had a lot of discussions about how to scale our efforts, reexamine job duties and processes to try and make things more effective, with the idea that if we could shift our skills to things we were better at, automate other aspects of the job, or assign specialists, we could accommodate more people.

In other cases, you build additional infrastructure to meet the demand. In Brian Merchant's excellent book The One Device: The Secret History of the iPhone, he recounted how the explosive growth in smartphone usage fueled an incredible demand for cellular bandwidth. To build out their networks, companies like AT&T and Verizon had to build more towers, which meant more people working in dangerous jobs, and which ended up meaning that there were more deaths in that specific industry. The scaling up of demand for iPhones (and presumably, devices from other makers) meant incredible demands all the way down the supply chain, and onto the workers who produced the devices that we rely on every day. In many cases, those jobs can't be automated to be made safer.

There's another example that came to mind this week: audiobooks. Publishers Weekly posted a lengthy article by Thad McIlroy called "AI Comes to Audiobooks," in which he proposed that publishers should adopt artificial intelligence as a way to cut costs when it comes to the production of audiobooks and audio content.

Audiobooks are one of the fastest-growing segments of the publishing industry, and I've spoken with authors in recent years who've pointed out that in the pie-chart that is their backlist (print, ebook, and audio), audiobooks are taking up an increasingly bigger part of their income.

There are a lot of reasons for this: up until recently, audiobooks were cumbersome: you'd have to cart around a binder of cassette tapes or CDs any time you wanted to listen to an unabridged novel. Audible and smartphones changed that: everyone soon had a device that they could store an entire library on. Digital audio production has been an innovative field, producing high-quality stories that fuel demand even further. Demand for audiobooks and podcasts has skyrocketed in the last decade.

Audiobook and podcast platforms have also experimented beyond the form of the novel: audio dramas, once a mainstay of major radio networks like NPR or the BBC, have made a bit of a comeback, while some books boast full cast readings, new editions featuring the voices of actors from recent adaptations, or stories published exclusively in the medium.

All of that creates an incredible amount of demand for content. Narrators like R.C. Bray, Luke Daniels, and Robin Miles have become stars in their own right, commanding their own audiences for their performances on a wide range of projects. In some instances, the market has become so tight that some authors can't get their preferred narrators, the ones who've voiced their books. Last year, Frontlines author Marko Kloos noted that the audio edition of one of his books had to be delayed because of scheduling conflicts and other complications.

One thing that I've come to understand is that an audiobook is essentially a twin of the original novel in terms of work and effort: an author might spend months or years on a book; a considerable amount of work also goes into that alternate edition: not only does the book need to be narrated, it needs to be edited and mixed, both of which take up time and effort from skilled workers.

This is where McIlroy (a founding partner of Publishing Technology Partners, a consulting firm that specializes in helping publishers utilize new technologies ranging from IT to workflow to content management) posits a solution: with the rise of artificial voice technology, publishers could quickly automate the production process. "Wouldn’t it be great if publishers could get rid of “the talent” and reduce the long production cycles, push a button, and, presto, an instant audiobook, ready for sale?" he asks.

It's an incredibly shitty and condescending solution to this problem, in my view.

Plenty of companies have been working on improving their artificial voice technologies. Back in 2018, Google unveiled a system called Google Duplex, which was designed to conduct natural-sounding conversations with a live person: it was pitched as a digital assistant and demoed in a series of phone calls reserving a hair appointment and a table at a local restaurant, something you could ask your phone to do for you while you're on the go and don't have time to do it yourself. It's a neat idea, and the demo (embedded on the Google announcement linked above) sounded pretty natural, if a little off. The announcement wasn't without some controversy, and the company said that it would make some changes so that the assistant would identify itself as an automated process. Three years later, the technology has rolled out to 49 states (all but Louisiana).

Artificial voice technology has come a long way in recent decades. Indeed, a number of news sites now include widgets that will mechanically read the article to you (here's an example of one of my old posts on The Verge). McIlroy argues that these advances have reached the point where the technology is ready to play a role in serving up content to readers, pointing to digital assistants like Siri or Alexa and to companies operating in the text-to-speech field, like DeepZen, Speechki, and a couple of others.

The problem, he points out, is that there are constraints on this field: policies from places like Audible that specifically prohibit automated text-to-speech technologies on their ACX platform (which is used by freelancers to book narration gigs), and that professional narrators are represented by SAG-AFTRA, which has its own concerns about the introduction of artificial intelligence into this field, and that "The organization offers the full slate of union perks: training, guaranteed minimum rates for recordings, and health and life insurance. Artificial voices require none of these benefits."

The core assumption here is one that's been fostered by the publishing and bookselling industry for decades: that books are easily-shippable commodities that can be produced en masse like anything off of an assembly line. There's truth to that: books are easy to ship and display in stores, while publishers handle the production element. But there are flaws in that assumption: it ignores the gooey mess and quirks that is an author, or a narrator. With audio, that's even more pronounced.

Audiobooks aren't simply a person reading from a book: narration is a performative act, one that requires nuance, emotion, and acting to convey the text and story to the listener. This, I've been told by authors, experts, and narrators, is one reason why audiobooks and podcasts remain popular: it's not just that we have the device to play the stories back in our ears while we take a walk or commute to work; it's because it's an intimate listening experience.

Comparing the text-to-speech technology of digital assistants to that of a full-length audio production also strikes me as problematic. I have an Amazon Echo that we'll interact with for basic things like a shopping list or the radio, and Siri on our iPhones. The technology is passable for short commands like asking them for directions, to add bananas to the shopping list, or to recount some bit of trivia, but those are short snippets and interactions. With an audiobook, you're with that voice for hours at a time.

I tend to read and listen to a book at the same time. I might throw an audiobook on while driving, only to pick up the text a couple of chapters later when I have time (I'm currently doing this with Cadwell Turnbull's excellent novel No Gods, No Monsters), and both experiences are different: they're not entirely interchangeable. I frequently go back and reread chapters because the processes in my head for listening process the text slightly differently than just reading to myself. And a bad narrator can ruin a book for me, pushing me off of it completely. (Or the reverse: I much prefer Ernie Cline's Ready Player One with Wil Wheaton as a narrator to the text.) It's why you see publishers go through the effort to re-record books with the actors from an adaptation, as with J.R.R. Tolkien's Lord of the Rings and The Hobbit (read by Andy Serkis), Neil Gaiman and Terry Pratchett's Good Omens (Michael Sheen and David Tennant), or The Eye of the World (Rosamund Pike): audiences respond to those connections with the actors and narrators.

The technology will get there: there's little doubt in my mind. Look at the rapid developments that we've seen Boston Dynamics undergo with its robots: we laughed at the dumb robot when it was knocked over by a hockey stick, but now, just a couple of years later, they can run obstacle courses and dance. Text-to-speech technology will continue to improve. There is value in this technology: talking to Alexa is a frustrating experience, and for places where there's a lot of regular content like a news publication or newsletter, having the ability to populate a natural-sounding narration would be a cool thing to have.

But I think there's a distinction to be made between something like a novel or long-form nonfiction, and listening to shorter works like news articles and updates: fiction and nonfiction aren't designed just to deliver information to the listener: they're designed to convey a feeling, a vibe, or a tone. Waving that away in the name of cost cutting and expedience not only does a disservice to the people who work on these products, but also feels like it lessens the work itself, diminishing it to a mere commodity to be consumed and disposed of. Good art stays with you, and artists actively improve the work. One author I spoke with recently explained that his narrator for his book had such a compelling take on one of his characters that it influenced how he imagined and wrote him after hearing the performance.

As the technology improves and as demand continues to stress the audiobook supply chain, the pressure on publishers to cut their production costs will grow while companies and labor advocates will be pressured to defend the value of their workers. Unless there are systematic protections for workers, narrators will find themselves out in the cold, and those who remain will likely face working conditions and compensation that won't entice them to remain in the field, unless they're a rare superstar who can command better pay and conditions.

McIlroy's dismissive attitude towards the livelihood of those artists responsible for creating this entire field is enraging and dehumanizing, but not surprising: it's part of a long tradition of disruption that sees human labor as an inconvenience to be solved through automation and systems, rather than recognizing the value that those workers bring to the format, which is arguably the reason why that format is so popular in the first place.

Currently Reading

One book that I finished this week was The Art and Soul of Dune by Tanya Lapointe — I love the big, chunky art books full of concept art and behind the scenes insights, and the look and feel of Dune meant that this was an instant-preorder when it was announced last year. It's a great look at the making of the film, with some interesting insights that helped fill in some thematic gaps that I didn't quite realize existed after seeing it in theaters. (I still plan to jot down some thoughts about Dune soon.)

On the to-read list this week: Cadwell Turnbull's No Gods, No Monsters and John Scalzi's Kaiju Preservation Society are both at the top of the list, along with Chuck Wendig's The Book of Accidents and Douglas Wolk's All of the Marvels: A Journey to the Ends of the Biggest Story Ever Told.

I also recently picked up a new title by Mark McGurl: Everything and Less: The Novel in the Age of Amazon, which looks like an interesting examination of modern publishing (something I'm hoping will be useful for a profile I'm writing.)

In case you missed it yesterday, I released a lengthy interview with P. Djèlí Clark about his books Ring Shout, A Master of Djinn, and a couple of others. I really loved his books when I read them earlier this year, and we chatted about his work as a historian and how that informs his writing, about race, alternate histories, and quite a bit more. You can read the entire interview here.

A Dark Forest

Writing on Medium, Fonda Lee has some thoughts about some memorable moments on Twitter and the way the service facilitates conversations or controversy. Recently, she's attracted the ire of various users after going viral. Most recently, she tweeted about how she loved Dune, but critiqued the "discourse" that was going around about the film, which many interpreted to mean some specific criticisms about the role of people of color in the world.

That's not actually what she was talking about — she had in mind something else: the discourse about how the film stacked up to David Lynch's original movie, the original novel, and so forth. That intention didn't matter.

What she writes about is pretty important, I think: when I joined Twitter more than a decade ago, there was more of a water cooler element to it between SF/F authors, an environment that really fostered community. Now, it's something like Liu Cixin's The Dark Forest, where you worry about sticking your head out too far before getting sniped by a competitor.

Lee points to the ease with which one's thoughts can be mischaracterized and blown up, and how that's really counter to the way that we actually communicate with one another. The result is potentially attracting attention amidst a crowd of around 200 million people, none of whom have any stake in the actual conversation or intent of the author. Sometimes that's useful: shouting down a blatant white supremacist or racist or transphobe (or any mindsets that are actively harmful to the larger community) is good. But if your targeting is off and you can't actually tell if someone's being awful, the result is like a case of friendly fire.

All in all, it's a good reminder to treat one's existence on social media carefully, because the environment is increasingly hazardous and fraught.

Fall TV

The next two months are bringing some highly-anticipated shows to a screen near you, and earlier this week, I put together a bit of a roundup that listed them off by release date, if only to keep them straight in my head. I didn't send this out to any subscribers, so you didn't miss it in your inbox! What are you looking forward to the most?

Three-Body Problem: the other series

If you've read and enjoyed Liu Cixin's science fiction novel The Three-Body Problem, a trailer dropped that should be of interest: one for a live-action TV series.

But Andrew! You might ask: didn't Netflix just announce casting for this? Is this project already filming? The answer? This is a different adaptation of the novel.

The trailer in question comes from Tencent, the Chinese media company that holds the rights to the project, and which has apparently been adapting it locally. I've honestly lost track of the adaptations of this particular novel: there was a big-budget movie directed by Zhang Panpan that had gone into production, but was then shelved (according to people I spoke with, it was a pretty bad adaptation), and then there was an adaptation entirely animated in Minecraft from China called My Three-Body. Then there was an anime adaptation that would apparently run in 2021, which was apparently released to a lot of success online in China. And finally, there's the adaptation that Netflix announced to much fanfare last year, and this locally-produced version from Tencent, directed by Yang Lei.

Word broke about the series back in June, and Global Times reported that this is a separate live-action adaptation from the failed movie, and that it's been in the works for several years, with cameras rolling in July 2020. This particular adaptation will run for 24 episodes, although there isn't a release date just yet. It's also not clear if it'll end up coming to the US: I can't imagine that Netflix or Tencent would want two live-action shows on the same platform, given the confusion that could bring to subscribers.

But hopefully they'll both be available at some point down the road: it would make for an interesting exercise to compare the two.

New look for some old books

Del Rey revealed the next phase of its Star Wars Expanded Universe "Essentials Collection", in which they're dusting off some of the older, non-canon books for a reissue, complete with new covers. Included in that mix? Kenobi by John Jackson Miller, which makes sense, given that the Obi-Wan Kenobi series is coming to Disney+ next year at some point; Darth Plagueis by James Luceno; Darth Bane: Dynasty of Evil (the last of a trilogy that was started with Path of Destruction and Rule of Two) by Drew Karpyshyn; and Wedge's Gamble, the second installment of the X-Wing series by Michael A. Stackpole. No surprise there, given that the first book in the series, Rogue Squadron, came out earlier this year, and given that there's the Rogue Squadron movie coming out in ... 2023 (maybe). I'd guess that the other books in the series will eventually make their way out in this format. Wedge's Gamble is also getting a new audiobook. The new books will all be released on April 5th.

I'm not a huge fan of the covers for the X-Wing novels (the originals were beautiful and iconic), but I dig the overall look of the series.

I'm happy to see that Del Rey is continuing this series: my impression when it was kicked off was that it was a bit of a one-off, but now that they've done two rounds, I'm guessing that it'll continue the project for some of the other high points in the SWEU. There's certainly no shortage of books to release: Zahn's Thrawn duology, the Republic Commando series, Han Solo trilogy, Jedi Academy trilogy, Shadows of the Empire, and others. I can't see them releasing some of those soft middle books that popped up in the mid-1990s, like The Crystal Star.

Fans of the Expanded Universe have dogged the new franchise ever since the EU was wiped away to make way for the new continuity that started with The Force Awakens, and which has slowly been filled in with a handful of comics, games, novels, and TV shows. I think Del Rey whiffed an opportunity by not finding a way to close out that storyline for those long-time fans: nearly thirty years of an ongoing story that was pretty abruptly shifted into non-canon status, especially given that there were some books that had been pitched and announced that were then canceled.

It makes sense in one respect: you have this IP sitting around that's still pretty popular, and even though it isn't canon, there are still people who want to read it between movies or TV seasons. And it throws a bone to those die-hard EU fans who grew up with it: it shows that Del Rey is still aware that they exist and if they continue to sell, maybe they'll add another book or two to that continuity (much like Paramount has done with the soon-to-be-concluded Star Trek: Coda trilogy), to tie off that story.

Related reading:

30 years ago, Timothy Zahn resurrected Star Wars
Michael A. Stackpole's Rogue Squadron novels unlocked Star Wars' vast potential
Returning to a lost galaxy
Building a Galaxy, parts one, two, three, four, five, and six.

November books

In case you missed it: here's the list of new books that are hitting stores in November (and a couple that I missed in October). You can read the genre-sorted list here, and a chronological list here.

Unlocked posts

Over the last year, I've written a bunch of longer posts that went out to paid subscribers, but which I've wanted more people to read — I've been unlocking a handful of those longer features (not the regular news or shorter reports), and this week, that includes the following:

30 years ago, Timothy Zahn resurrected Star Wars. The Thrawn trilogy turned 30 this year, and I took a look at the impact that Heir to the Empire had on the franchise.
Cyberpunk's power came from global dystopian politics. I wrote this last year around the same time that Cyberpunk 2077 came out: a look at the roots of cyberpunk and why it's endured.
Returning to a lost galaxy. Earlier this year, I discovered that someone had printed up and was selling a bunch of copies of a never-published Star Wars novel by Kenneth C. Flint, The Heart of the Jedi. After Amazon shut the listing down, copies started selling for hundreds on eBay. This looks at the story behind that novel and why it was never published.

That work (and the rest of the newsletter!) takes time and effort to produce. If you've read it and found it useful / entertaining / informative, consider supporting Transfer Orbit with a paid subscription.

Paid subscribers get some additional news posts over the course of the week. This week's post on Tuesday included stories about The Book of Boba Fett, Bradley Cooper's long-in-the-works adaptation of Dan Simmons' Hyperion, Netflix's Three-Body Problem and The Witcher, and Amazon's Wheel of Time series.

That's all for this week — thanks for reading. Newsletter supporters can expect a recap of today's episode of Foundation in their inboxes tomorrow, and a news roundup to kick off the week on Monday.

Have a good weekend,

Andrew