Quantification of Automatic Speech Recognition System Performance on d/Deaf and Hard of Hearing Speech.

Overview

abstract

OBJECTIVE: To evaluate the performance of commercial automatic speech recognition (ASR) systems on d/Deaf and hard-of-hearing (d/Dhh) speech. METHODS: A corpus containing 850 audio files of d/Dhh and normal-hearing (NH) speech from the University of Memphis Speech Perception Assessment Laboratory was tested on four speech-to-text application programming interfaces (APIs): Amazon Web Services, Microsoft Azure, Google Chirp, and OpenAI Whisper. We quantified the Word Error Rate (WER) of API transcriptions for 24 d/Dhh and nine NH participants and performed subgroup analysis by speech intelligibility classification (SIC), hearing loss (HL) onset, and primary communication mode. RESULTS: Mean WER averaged across APIs was roughly 10 times higher for the d/Dhh group (52.6%) than for the NH group (5.0%). APIs performed significantly worse for the "low" and "medium" SIC groups (85.9% and 46.6% WER, respectively) than for the "high" SIC group (9.5% WER, comparable to the NH group). APIs performed significantly worse for speakers with prelingual HL than for those with postlingual HL (80.5% and 37.1% WER, respectively). APIs performed significantly worse for speakers primarily communicating with sign language (70.2% WER) than for speakers using both oral and sign language communication (51.5%) or oral communication only (19.7%). CONCLUSION: Commercial ASR systems underperform for d/Dhh individuals, especially those with "low" and "medium" SIC, prelingual onset of HL, and sign language as primary communication mode. This contrasts with Big Tech companies' promises of accessibility, indicating the need for ASR systems ethically trained on heterogeneous d/Dhh speech data. LEVEL OF EVIDENCE: 3. Laryngoscope, 135:191-197, 2025.
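The WER metric reported in the abstract is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the ASR transcript into the reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that standard definition, not the authors' evaluation code. The function name `word_error_rate` is hypothetical, and the text normalization (lowercasing and whitespace splitting) is an assumption, since the abstract does not describe how transcripts were normalized.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Assumption: transcripts are normalized by lowercasing and splitting on
    whitespace. The abstract does not state the normalization actually used.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the word-level edit distance between the reference
    # prefix processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors while the denominator is the reference length, WER can exceed 100% when a transcript contains many extra words. A group-level figure such as a mean WER would then be an average of per-file or per-speaker values; which of these the study used is not stated in the abstract.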

authors

Zhao, Robin

Choi, Anna S G

Koenecke, Allison

Rameau, Anais

publication date

August 19, 2024

published in

The Laryngoscope Journal

Research

keywords

Speech Recognition Software

Identity

PubMed Central ID

PMC11637924

Scopus Document Identifier

85201524169

Digital Object Identifier (DOI)

10.1002/lary.31713

PubMed ID

39157956

Additional Document Info

has global citation frequency

10

volume

135

issue

1
