3Play Media Releases Annual Study, Finds ASR Technology Showing Signs of Plateau

The latest research additionally shows ASR engines outperforming LLMs

While Automatic Speech Recognition (ASR) technologies are maturing and becoming more sophisticated, human review remains essential for meeting accessibility standards, according to the latest State of ASR report, released today by 3Play Media, the leading media accessibility provider in North America.

"Our research continues to show that while ASR technology has made remarkable strides, we're witnessing an increasing plateau in accuracy improvements for English pre-recorded content," Josh Miller, co-CEO and co-Founder, 3Play Media, said. "The gulf between the leading engines and the rest of the field has widened. However, the error rates across all engines still fall short of meeting accessibility requirements, reaffirming that human-in-the-loop workflows remain critical for captioning and transcription use cases."

The study evaluated speech-to-text technology as it applies to captioning and transcription across 205 hours of diverse audio content, representing a 30% increase in testing volume from the previous year. The expanded dataset of over 1.7 million words spans multiple industries and use cases, providing unparalleled insight into real-world ASR performance. The research evaluated eight ASR engines along with Gemini, a multimodal large language model (LLM) prompted to perform transcription.

A key finding from this year's report is that Whisper X behaves very differently from the original Whisper models. It showed no signs of the hallucination behavior observed with Whisper Large V2 and V3, both of which hallucinated at significantly higher rates than other engines. Meanwhile, AssemblyAI's Universal-2 model and Whisper X slightly outperformed Speechmatics on error rates, though all three stood substantially ahead of the other engines tested.

As observed in previous years, ASR accuracy varies significantly across different industries, reinforcing the need for specialized approaches depending on content type and use case. The study also found that LLMs are not yet viable replacements for dedicated ASR engines in transcription tasks. The greatest challenge for ASR technology remains sports content, with error rates 3x higher than in the best-performing industries due to complicated noise environments, unscripted speech, player and coach names, and numerical information with unique phrasing conventions.

Given the plateau in improvements, the report indicates that future ASR innovations are likely to focus less on incremental improvements to English pre-recorded content accuracy and more on real-time applications and non-English language capabilities.

To obtain a free copy of The 2025 State of ASR report, please visit: https://go.3playmedia.com/rs-2025-asr

About 3Play Media

3Play Media provides closed captioning, transcription, and audio description services to make video accessibility easy. We are based in Boston, MA, and have been operating since 2008.

