Abstract
Keyword Spotting (KWS) detects a set of pre-defined spoken keywords. Building a KWS system for an arbitrary set requires massive training datasets. We propose to use the text transcripts from an Automatic Speech Recognition (ASR) system alongside triplets for KWS training. The intermediate representation from the ASR system trained on a speech corpus is used as acoustic word embeddings for keywords. Triplet loss is added to the Connectionist Temporal Classification (CTC) loss in the ASR while training. This method achieves an Average Precision (AP) of 0.843 over 344 words unseen by the model trained on the TIMIT dataset. In contrast, the Multi-View recurrent method that learns jointly on the text and acoustic embeddings achieves only 0.218 for out-of-vocabulary words. This method is also applied to low-resource languages such as Tamil by converting Tamil characters to English using transliteration. This is a very challenging novel task for which we provide a dataset of transcripts for the keywords. Despite our model not generalizing well, we achieve a benchmark AP of 0.321 on over 38 words unseen by the model on the MSWC Tamil keyword set. The model also produces an accuracy of 96.2% for classification tasks on the Google Speech Commands dataset.
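As a rough illustration of the training objective and the evaluation metric named in the abstract, the sketch below adds a triplet term to an already-computed CTC loss value and computes Average Precision over a ranked list. It is not the authors' implementation: the cosine distance, the `margin` and `weight` parameters, and all function names are assumptions made for this example, and the CTC loss is taken as a given number rather than computed.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two embedding vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge-style triplet loss: pull the positive closer than the negative."""
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(0.0, margin + d_pos - d_neg)


def combined_loss(ctc_loss, anchor, positive, negative, margin=0.5, weight=1.0):
    """Add the triplet term to a precomputed CTC loss value.

    `weight` is a hypothetical knob; the abstract only says the two are added.
    """
    return ctc_loss + weight * triplet_loss(anchor, positive, negative, margin)


def average_precision(scores, labels):
    """Average Precision: mean of precision at each rank where a match occurs."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0
```

In this sketch, a triplet whose positive already sits closer to the anchor than the negative by at least the margin contributes zero, so the total reduces to the CTC term alone.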
Original language | English |
---|---|
Title of host publication | Proceedings Interspeech 2022 |
Publisher | ISCA |
Pages | 126-130 |
Number of pages | 5 |
Volume | 2022-September |
Publication status | Published - 22 Sept 2022 |
Event | 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022 - Incheon, Korea, Republic of. Duration: 18 Sept 2022 → 22 Sept 2022 |
Publication series
Name | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
---|---|
ISSN (Print) | 2308-457X |
Conference
Conference | 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022 |
---|---|
Country/Territory | Korea, Republic of |
City | Incheon |
Period | 18/09/22 → 22/09/22 |
Keywords
- keyword spotting
- low-resource languages
- speech recognition
ASJC Scopus subject areas
- Language and Linguistics
- Human-Computer Interaction
- Signal Processing
- Software
- Modelling and Simulation
Fingerprint
Dive into the research topics of 'Generalized Keyword Spotting using ASR embeddings'. Together they form a unique fingerprint.
Centre for the Analysis of Motion, Entertainment Research and Applications (CAMERA) - 2.0
Campbell, N. (PI), Cosker, D. (PI), Bilzon, J. (CoI), Campbell, N. (CoI), Cazzola, D. (CoI), Colyer, S. (CoI), Cosker, D. (CoI), Lutteroth, C. (CoI), McGuigan, P. (CoI), O'Neill, E. (CoI), Petrini, K. (CoI), Proulx, M. (CoI) & Yang, Y. (CoI)
Engineering and Physical Sciences Research Council
1/11/20 → 31/10/25
Project: Research council
-
Centre for the Analysis of Motion, Entertainment Research and Applications (CAMERA)
Cosker, D. (PI), Bilzon, J. (CoI), Campbell, N. (CoI), Cazzola, D. (CoI), Colyer, S. (CoI), Fincham Haines, T. (CoI), Hall, P. (CoI), Kim, K. I. (CoI), Lutteroth, C. (CoI), McGuigan, P. (CoI), O'Neill, E. (CoI), Richardt, C. (CoI), Salo, A. (CoI), Seminati, E. (CoI), Tabor, A. (CoI) & Yang, Y. (CoI)
Engineering and Physical Sciences Research Council
1/09/15 → 28/02/21
Project: Research council