ISIS and NISIS: New bilingual dual-channel speech corpora for robust speaker recognition

Amita Pal, Smarajit Bose, Mandar Mitra, Sandipan Roy

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

It is standard practice to use benchmark datasets to compare meaningfully the performance of competing speaker identification systems. Such datasets generally consist of speech recordings from different speakers made at a single point in time, typically in the same language; that is, the training and test sets both consist of speech recorded at the same point in time, in the same language, over the same recording channel. This is generally not the case in real-life applications. In this paper, we introduce a new database of speech recordings from 105 speakers, made over four sessions, in two languages and simultaneously over two channels. This database provides scope for experimentation on the loss in efficiency due to possible mismatches in language, channel and recording session. Results of experiments with MFCC-based GMM speaker models are presented to highlight the need for such benchmark datasets in identifying robust speaker identification systems.
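For readers who want a concrete sense of the kind of baseline the abstract describes, the following is a minimal sketch (not taken from the paper) of MFCC-based GMM speaker identification. It assumes librosa for MFCC extraction and scikit-learn for the Gaussian mixture models; the file names, the 13-coefficient MFCCs and the 32-component diagonal-covariance GMMs are illustrative assumptions, not the authors' configuration.

```python
# Minimal sketch of MFCC + GMM speaker identification (illustrative only).
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_features(wav_path, sr=16000, n_mfcc=13):
    """Load a recording and return frame-level MFCC vectors (frames x n_mfcc)."""
    y, sr = librosa.load(wav_path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def train_speaker_models(train_files, n_components=32):
    """Fit one diagonal-covariance GMM per speaker on that speaker's training MFCCs."""
    models = {}
    for speaker, paths in train_files.items():
        feats = np.vstack([mfcc_features(p) for p in paths])
        gmm = GaussianMixture(n_components=n_components, covariance_type="diag",
                              max_iter=200, random_state=0)
        models[speaker] = gmm.fit(feats)
    return models

def identify(test_path, models):
    """Return the speaker whose GMM gives the highest average log-likelihood."""
    feats = mfcc_features(test_path)
    return max(models, key=lambda spk: models[spk].score(feats))

# Hypothetical usage: train on one session/language/channel and test on another
# to probe the mismatch conditions the corpus is designed to expose.
# models = train_speaker_models({"spk001": ["spk001_ses1_ch1.wav"]})
# print(identify("spk001_ses3_ch2.wav", models))
```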

Language: English
Title of host publication: Proceedings of the 2012 International Conference on Image Processing, Computer Vision, and Pattern Recognition, IPCV 2012
Subtitle of host publication: Volume 2
Pages: 936-939
Number of pages: 4
ISBN (Print): 9781601322258
Status: Published - 1 Dec 2012
Link to publication in Scopus: http://www.scopus.com/inward/record.url?scp=84873303777&partnerID=8YFLogxK
Event: 2012 International Conference on Image Processing, Computer Vision, and Pattern Recognition, IPCV 2012 - Las Vegas, NV, United States
Duration: 16 Jul 2012 - 19 Jul 2012

Conference

Conference: 2012 International Conference on Image Processing, Computer Vision, and Pattern Recognition, IPCV 2012
Country: United States
City: Las Vegas, NV
Period: 16/07/12 - 19/07/12

Keywords

  • Classification accuracy
  • Gaussian mixture models
  • Mel frequency cepstral coefficients
  • Robust speaker recognition

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Computer Vision and Pattern Recognition

Cite this

Pal, A., Bose, S., Mitra, M., & Roy, S. (2012). ISIS and NISIS: New bilingual dual-channel speech corpora for robust speaker recognition. In Proceedings of the 2012 International Conference on Image Processing, Computer Vision, and Pattern Recognition, IPCV 2012: Volume 2 (pp. 936-939)
