Ren Shelburne was fed up with trying to listen to popular podcast episodes her friends recommended. Shelburne, a photographer with partial hearing loss and an auditory processing condition, remembers struggling to finish a particular episode. It was a specific type of show: too many talking heads, complicated overlapping dialogue and, until recently, no transcription. “Those I’m just so lost on because there’s just too much going on at once,” Shelburne says. She couldn’t follow along, so she couldn’t discuss the show with her friends. “Podcasts are such a big part of pop culture and media at this point. I want to be able to be a part of that conversation.”
Weekly podcast listenership in the United States has more than quadrupled in the past decade, according to the Pew Research Center. For some, though, the medium still feels inaccessible.
“Sometimes I miss something because of my hearing loss,” says Alexandra Wong, a Rhodes scholar studying digital accessibility, “and I have to go back and rewind it maybe like five or six times to make sure I can understand what’s going on with it.”
Shelburne and Wong are among the roughly 15% of adults in the US, some 37.5m people, who report difficulty hearing without an aid, many of whom rely on captions and transcripts to follow music, movies and podcasts. Video streaming companies like Netflix, Peacock and Hulu offer captions for nearly all their programming, and time-synced lyric captions have become increasingly standard in music streaming. The prevalence of video captions has been embraced by audiences beyond the disability community; 80% of Netflix viewers turn on captions at least once a month.
By contrast, podcasting companies have been late to the accessibility game. SiriusXM and Gimlet have faced lawsuits claiming they violate the Americans with Disabilities Act by failing to provide transcripts. Spotify, which owns Gimlet, launched podcast transcriptions in September 2023, but the feature is available only for shows the streaming service owns and podcasts hosted on its platform.
Apple announced in March that automatically generated transcripts would be available for any new podcast episode played in its app on iPhones and iPads running the latest Apple operating system.
“Our goal is obviously to make podcasts more accessible, more immersive,” says Ben Cave, Apple’s global head of podcasts.
Sarah Herrlinger, who manages Apple’s accessibility policy, says the development of the transcription tool involved working with both disabled Apple employees and outside organizations. Transcription became a priority for Apple Podcasts, she says, because of increasing demand from disabled users and podcast creators alike.
“This is one of the top requested features from creators,” says Cave. “They’ve been asking for this.”
Apple’s journey to podcast transcripts started with the expansion of a different feature: indexing. It’s a common origin story at a number of tech companies like Amazon and Yahoo – what begins as a search tool evolves into a full transcription initiative. Apple first deployed software that could identify specific words in a podcast back in 2018.
“What we did then is we offered a single line of the transcript to give users context on a result when they’re searching for something in particular,” Cave recalls. “There’s a few different things that we did in the intervening seven years, which all came together into this [transcript] feature.”
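Apple has not described how that search feature was built, but the behavior Cave describes – surfacing a single transcript line as context for a search hit – is easy to illustrate. The sketch below is a hypothetical toy version in Python, not Apple’s actual search stack; the function name and data are invented for illustration:

```python
def snippet_for(query: str, transcript_lines: list[str]) -> str | None:
    """Return the first transcript line containing the query term,
    mimicking a single-line search-result preview."""
    q = query.lower()
    for line in transcript_lines:
        if q in line.lower():
            return line
    return None

lines = ["Today we talk about accessibility in podcasting",
         "Transcripts help listeners with hearing loss"]
print(snippet_for("hearing", lines))
# -> "Transcripts help listeners with hearing loss"
```

A production system would use a prebuilt index rather than a linear scan, but the user-facing result is the same: one line of transcript anchoring a search result.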
Cave says that one of the big hurdles over the years of development was ensuring a high standard of performance, display and accuracy. A number of big leaps forward came from accessibility innovation in other departments at Apple.
“In this case, we took the best of what we learned from reading in Apple Books and lyrics in Apple Music,” says Herrlinger. Apple Podcast transcripts borrow time-synced word-by-word highlighting from Apple Music and make use of Apple Books’ font and contrasting color scheme for the visually impaired.
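Apple has not published implementation details, but the general technique behind time-synced, word-by-word highlighting is well understood: each word in the transcript carries a start timestamp, and the player highlights whichever word’s window contains the current playback position. A minimal sketch of that idea in Python – all names and timings here are hypothetical:

```python
import bisect
from dataclasses import dataclass

@dataclass
class TimedWord:
    text: str
    start: float  # seconds into the episode

def word_at(words: list[TimedWord], playback_time: float) -> TimedWord | None:
    """Return the word to highlight at the given playback time.
    Assumes `words` is sorted by start time, as time-synced
    transcript formats typically are."""
    starts = [w.start for w in words]
    # Index of the last word whose start time is <= playback_time.
    i = bisect.bisect_right(starts, playback_time) - 1
    return words[i] if i >= 0 else None

# Example: the highlight follows the audio as it plays.
transcript = [TimedWord("Welcome", 0.0), TimedWord("to", 0.4),
              TimedWord("the", 0.55), TimedWord("show", 0.7)]
print(word_at(transcript, 0.6).text)  # -> "the"
```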
Apple has attempted to make up for its late arrival by offering a more wide-ranging transcription feature than its competitors’. Amazon Music has offered auto-generated transcripts for podcasts since 2021, but they are available only for its original programs and a smattering of other popular shows, with captions appearing as block text rather than word-by-word highlighting. Spotify’s transcripts, by contrast, do include word-by-word highlighting, but again only for Spotify originals and shows hosted on its platform.
In its streaming war with Apple over music and podcasts, Spotify bungled the launch of an AI-powered translation tool last fall that purported to offer podcasts in French, German and Spanish. The company has seemingly failed to deliver on specific promises made at the time. Currently, the service appears to offer only Spanish translations – and predominantly for just one show: The Lex Fridman Podcast. In its announcement, Spotify namechecked a number of podcasts as part of its translation program – such as The Rewatchables and Trevor Noah’s What Now? – that, at the time of publication nine months later, had no episodes available in languages other than English. Spotify declined to comment when asked about these discrepancies.
Apple’s podcast app will transcribe every newly uploaded episode. “We wanted to do it for all the shows, so it’s not just for like a narrow slice of the catalogue,” says Cave. “It wouldn’t be appropriate for us to put an arbitrary limit on the amount of shows who get it … We think that’s important from an accessibility standpoint because we want to give people the expectation that transcripts are available for everything that they want to interact with.”
Cave adds: “Over time, the entire episode library will be transcribed.” However, he says Apple is prioritizing transcripts for new content, declining to say when the transcriptions of back catalogues might come.
Disability activists and users said they believed Apple’s account that the company lagged behind competitors because it was working to get the feature right; they would rather wait for a good product than see a bad accessibility tool rushed out.
“I respect that. Having captions and transcriptions that are inaccurate just defeats the purpose,” said Shelburne.
“I was knocked out on how accurate it was,” says Larry Goldberg, a media and technology accessibility pioneer who created the first closed captioning system for movie theaters. The fidelity of auto-transcription is something that’s long been lacking, he adds. “It’s improved, it has gotten better … but there are times when it is so wrong.”
Goldberg and other experts pointed to YouTube’s auto-generated closed captioning tool, available since 2009, as a subpar, rushed product. While YouTube says its transcriptions received a notable upgrade in 2023, the tool has drawn frequent criticism over the years for its inaccuracy – producing what some critics have dubbed “craptions”, as when it mistakes words like “concealer” for “zebra” and “wedding” for “lady”.
Goldberg recalls being shocked by one colleague’s reaction while discussing the misinformation risks of unreliable transcription: it’s better than nothing.
“That’s your bar for quality? Better than nothing?” Goldberg said. “No, that’s just not good enough.”
In addition to making sure the transcripts capture speech as accurately as possible, Cave at Apple says a lot of work went into training the software to exclude things, too.
“We also want to make it a pleasure to read … That means we wanted to reduce the relative importance of things like filler words, like ‘ums’ and ‘ahs’.” For podcast creators who want those vocal disfluencies transcribed, Apple says they’ll have to upload a custom transcript.
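In spirit, the cleanup Cave describes amounts to filtering disfluency tokens out of the raw recognizer output before display. The following is a hedged Python sketch of that general idea, not Apple’s actual pipeline; the filler list and function name are assumptions for illustration:

```python
import re

# Hypothetical filler set; a production system would tune this per language.
FILLERS = {"um", "uh", "ah", "er", "hmm"}

def strip_fillers(text: str) -> str:
    """Drop standalone filler words from recognizer output."""
    kept = [t for t in text.split()
            if re.sub(r"\W", "", t).lower() not in FILLERS]
    return " ".join(kept)

print(strip_fillers("So um the feature is uh shipping today"))
# -> "So the feature is shipping today"
```

A real transcription pipeline would more likely suppress these tokens inside the speech model itself rather than post-process strings, which is one reason creators who want disfluencies preserved must supply their own transcript.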
The folks at Apple are already noticing the transcription tool being put to a variety of surprising uses.
“We’re seeing lots of users engaging with transcripts in the language learning space,” Cave says, noting that Apple Podcast transcripts support English, Spanish, French and German-language podcasts.
“We often find that by building for the margins, we make a better product for the masses,” says Herrlinger. “Other communities will find those features and find ways to use them in some cases where we know this could benefit someone else.”
Goldberg, the accessibility expert, wants to see other platforms adopt Apple’s approach to transcription. His dream is that more companies might start treating podcast transcripts with the same priority given to video content.
“I used to refer to my job as chief begging officer. ‘Please, please put captions on your video, please!’ Not anymore. Oh no. Everyone’s doing it,” he said, referring to his work founding the National Center for Accessible Media at the radio station WGBH in Boston. He says the norm now is that “you simply do not put videos online without captions”. He hopes podcasts follow suit.
Wong, the Rhodes scholar, also has praise for the Apple Podcasts transcript feature – but she sees areas for improvement, and is wary of the tool’s handling of complex and unusual terms.
“Since it is automatically generated, there are errors that can happen with really bad name spellings in transcripts and really hard scientific terms,” Wong says. Apple acknowledges that better name recognition is already on its radar and that it plans to expand to more languages.