Word Spotting in Silent Lip Videos

Abhishek Jha, Vinay P. Namboodiri, C. V. Jawahar

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

5 Citations (Scopus)

Abstract

Our goal is to spot words in silent speech videos, in which the speaker's lip motion is clearly visible and audio is absent, without explicitly recognizing the spoken words. Existing work in this domain has mainly focused on recognizing a fixed set of words in word-segmented lip videos, which limits the applicability of the learned model because of its limited vocabulary and its heavy dependence on recognition performance. Our contribution is two-fold: 1) we develop a pipeline for recognition-free retrieval and compare its performance against recognition-based retrieval on a large-scale dataset and on a separate set of out-of-vocabulary words; 2) we introduce a query expansion technique using pseudo-relevance feedback and propose a novel re-ranking method based on maximizing the correlation between spatio-temporal landmarks of the query and the top retrieval candidates. Our word spotting method achieves 35% higher mean average precision than the recognition-based method on the large-scale LRW dataset. Finally, we demonstrate an application of the method by spotting words in a popular speech video ('The Great Dictator' by Charlie Chaplin), showing that word retrieval could be used to understand what was spoken in silent movies.
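The retrieval ideas named in the abstract (ranking by similarity, query expansion with pseudo-relevance feedback, and evaluation by average precision) can be illustrated with a minimal sketch. This is not the authors' pipeline: the toy 2-D feature vectors, the cosine similarity measure, the averaging rule for expansion, and all function names are assumptions made for illustration only. The landmark-based re-ranking step is not shown.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rank(query, gallery):
    """Return gallery indices sorted by decreasing similarity to the query."""
    return sorted(range(len(gallery)),
                  key=lambda i: cosine(query, gallery[i]),
                  reverse=True)

def expand_query(query, gallery, k=2):
    """Pseudo-relevance feedback (assumed form): treat the top-k results as
    relevant and average them with the original query."""
    top = rank(query, gallery)[:k]
    vectors = [query] + [gallery[i] for i in top]
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def average_precision(ranking, relevant):
    """Average precision of one ranked list given the set of relevant indices."""
    hits, total = 0, 0.0
    for pos, idx in enumerate(ranking, start=1):
        if idx in relevant:
            hits += 1
            total += hits / pos
    return total / len(relevant) if relevant else 0.0

# Hypothetical features: items 0 and 1 stand for clips of the queried word.
gallery = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
query = [1.0, 0.2]

initial_ranking = rank(query, gallery)
expanded = expand_query(query, gallery, k=2)
final_ranking = rank(expanded, gallery)
ap = average_precision(final_ranking, {0, 1})
```

Mean average precision, the metric quoted in the abstract, is the mean of `average_precision` over all queries. Averaging the query with its top results is only one common way to realize pseudo-relevance feedback; the paper may weight or select candidates differently.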

Original language: English
Title of host publication: Proceedings - 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018
Publisher: IEEE
Pages: 150-159
Number of pages: 10
ISBN (Electronic): 9781538648865
Publication status: Published - 3 May 2018
Event: 18th IEEE Winter Conference on Applications of Computer Vision, WACV 2018 - Lake Tahoe, United States
Duration: 12 Mar 2018 to 15 Mar 2018

Publication series

Name: Proceedings - 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018
Volume: 2018-January

Conference

Conference: 18th IEEE Winter Conference on Applications of Computer Vision, WACV 2018
Country: United States
City: Lake Tahoe
Period: 12/03/18 to 15/03/18

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Computer Science Applications
