Visual speech enhancement without a real visual stream

Sindhu B. Hegde, K. R. Prajwal, Rudrabha Mukhopadhyay, Vinay Namboodiri, C. V. Jawahar

Research output: Chapter or section in a book/report/conference proceeding › Chapter in a published conference proceeding


Abstract

In this work, we re-think the task of speech enhancement in unconstrained real-world environments. Current state-of-the-art methods use only the audio stream and are limited in their performance across a wide range of real-world noises. Recent works that use lip movements as additional cues improve the quality of the generated speech over "audio-only" methods. However, these methods cannot be used in the many applications where the visual stream is unreliable or completely absent. We propose a new paradigm for speech enhancement that exploits recent breakthroughs in speech-driven lip synthesis. Using one such model as a teacher network, we train a robust student network to produce accurate lip movements that mask away the noise, thus acting as a "visual noise filter". The intelligibility of the speech enhanced by our pseudo-lip approach is comparable (< 3% difference) to that obtained with real lips. This implies that we can exploit the advantages of lip movements even in the absence of a real video stream. We rigorously evaluate our model using quantitative metrics as well as human evaluations. Additional ablation studies, together with a demo video on our website containing qualitative comparisons and results, clearly illustrate the effectiveness of our approach.

Original language: English
Title of host publication: 2021 IEEE Winter Conference on Applications of Computer Vision (WACV)
Place of Publication: U.S.A.
Publisher: IEEE
Pages: 1925-1934
Number of pages: 10
ISBN (Electronic): 9780738142661
Publication status: Published - 14 Jun 2021
Event: 2021 IEEE Winter Conference on Applications of Computer Vision, WACV 2021 - Virtual, Online, United States
Duration: 5 Jan 2021 to 9 Jan 2021

Publication series

Name: Proceedings - IEEE Winter Conference on Applications of Computer Vision
Volume: 2021
ISSN (Print): 2472-6737
ISSN (Electronic): 2642-9381

Conference

Conference: 2021 IEEE Winter Conference on Applications of Computer Vision, WACV 2021
Country/Territory: United States
City: Virtual, Online
Period: 5/01/21 to 9/01/21

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Computer Science Applications
