TY - GEN
T1 - Inclusive ASR for Critical Public Services
T2 - 28th International Conference on Text, Speech, and Dialogue, TSD 2025
AU - Torgbi, Melissa
AU - Clayman, Andrew
AU - Speight, Jordan
AU - Hirst, Joe
AU - Tayyar Madabushi, Harish
PY - 2025/8/22
Y1 - 2025/8/22
N2 - Recent advances in automatic speech recognition (ASR) have improved the overall performance of speech recognition, yet regional dialects continue to pose significant challenges. This is particularly critical in public service applications such as legal aid and housing support, where bias in ASR systems inadvertently disadvantages vulnerable groups. While fine-tuning existing models using data from the target application is a common approach to addressing bias, the sensitive nature of these services makes this approach infeasible. To overcome this and ensure inclusivity, we collected over 200 h of actor-generated simulated data, aimed at addressing regional dialects in the United Kingdom, where dialects and accents are interlinked with socioeconomic status. Through a set of rigorous experiments, including fine-tuning several models using simulated data, we demonstrate that simulated data not only improves the real-world performance of models but also provides insights into fine-tuning data configurations that are more effective in practice.
KW - debiasing automatic speech recognition
KW - inclusive speech technologies
KW - simulated data for debiasing
UR - https://www.scopus.com/pages/publications/105014411810
U2 - 10.1007/978-3-032-02548-7_28
DO - 10.1007/978-3-032-02548-7_28
M3 - Chapter in a published conference proceeding
AN - SCOPUS:105014411810
SN - 9783032025470
T3 - Lecture Notes in Computer Science
SP - 331
EP - 342
BT - Text, Speech, and Dialogue - 28th International Conference, TSD 2025, Proceedings
A2 - Ekštein, Kamil
A2 - Konopík, Miloslav
A2 - Pražák, Ondřej
A2 - Pártl, František
PB - Springer
CY - Cham, Switzerland
Y2 - 25 August 2025 through 28 August 2025
ER -