Abstract
It is standard practice to use benchmark datasets to compare meaningfully the performance of competing speaker identification systems. Generally, such datasets consist of speech recordings from different speakers made at a single point in time, typically in one language. That is, the training and test sets both consist of speech recorded at the same point in time, in the same language, and over the same recording channel. This is generally not the case in real-life applications. In this paper, we introduce a new database of speech recordings from 105 speakers, made over four sessions, in two languages, and simultaneously over two channels. This database provides scope for studying the loss in efficiency caused by possible mismatch in language, channel, and recording session. Results of experiments with speaker models based on mel-frequency cepstral coefficients (MFCCs) and Gaussian mixture models (GMMs) are presented to highlight the need for such benchmark datasets when identifying robust speaker identification systems.
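The identification step behind MFCC-based GMM speaker models can be sketched as follows. This is a minimal sketch, not the authors' implementation. It assumes that MFCC feature vectors have already been extracted for every frame, that each speaker's GMM has already been trained (for example, with EM), and that the covariances are diagonal. The function names and the toy two-dimensional models are hypothetical. A test utterance is assigned to the speaker whose model gives the highest average per-frame log-likelihood. Classification accuracy is then the fraction of test utterances assigned to the correct speaker.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: a diagonal-covariance GMM as a list of
# (mixture weight, mean vector, variance vector) tuples.
GMM = List[Tuple[float, Sequence[float], Sequence[float]]]


def log_gaussian_diag(x, mean, var):
    """Log density of frame x under a Gaussian with diagonal covariance."""
    d = len(x)
    log_det = sum(math.log(v) for v in var)
    mahal = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    return -0.5 * (d * math.log(2 * math.pi) + log_det + mahal)


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def frame_log_likelihood(x, gmm: GMM):
    """log p(x | speaker model), summing over the mixture components."""
    return log_sum_exp([math.log(w) + log_gaussian_diag(x, mu, var)
                        for w, mu, var in gmm])


def utterance_score(frames, gmm: GMM):
    """Average per-frame log-likelihood (frames treated as independent)."""
    return sum(frame_log_likelihood(x, gmm) for x in frames) / len(frames)


def identify(frames, models: Dict[str, GMM]) -> str:
    """Return the speaker whose GMM scores the utterance highest."""
    return max(models, key=lambda spk: utterance_score(frames, models[spk]))


def accuracy(trials, models: Dict[str, GMM]) -> float:
    """trials: list of (true_speaker, frames) pairs."""
    correct = sum(1 for spk, frames in trials if identify(frames, models) == spk)
    return correct / len(trials)


# Toy usage with hypothetical 2-D "features" in place of real MFCCs.
models = {
    "spk_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "spk_b": [(1.0, [5.0, 5.0], [1.0, 1.0])],
}
trials = [
    ("spk_a", [[0.1, 0.2], [0.9, 1.1]]),
    ("spk_b", [[5.1, 4.8]]),
]
acc = accuracy(trials, models)  # -> 1.0
```

To measure the kind of mismatch this database is designed to expose, one would train the models on recordings from one session, language, or channel, then compute `accuracy` on trials from another, and compare the result with the matched condition.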
Original language | English |
---|---|
Title of host publication | Proceedings of the 2012 International Conference on Image Processing, Computer Vision, and Pattern Recognition, IPCV 2012 |
Subtitle of host publication | Volume 2 |
Pages | 936-939 |
Number of pages | 4 |
Publication status | Published - 1 Dec 2012 |
Event | 2012 International Conference on Image Processing, Computer Vision, and Pattern Recognition, IPCV 2012 - Las Vegas, NV, United States Duration: 16 Jul 2012 → 19 Jul 2012 |
Conference
Conference | 2012 International Conference on Image Processing, Computer Vision, and Pattern Recognition, IPCV 2012 |
---|---|
Country/Territory | United States |
City | Las Vegas, NV |
Period | 16/07/12 → 19/07/12 |
Keywords
- Classification accuracy
- Gaussian mixture models
- Mel frequency cepstral coefficients
- Robust speaker recognition
ASJC Scopus subject areas
- Computer Graphics and Computer-Aided Design
- Computer Vision and Pattern Recognition