TY - GEN
T1 - Word Spotting in Silent Lip Videos
AU - Jha, Abhishek
AU - Namboodiri, Vinay P.
AU - Jawahar, C. V.
PY - 2018/5/3
Y1 - 2018/5/3
N2 - Our goal is to spot words in silent speech videos without explicitly recognizing the spoken words, where the lip motion of the speaker is clearly visible and audio is absent. Existing work in this domain has mainly focused on recognizing a fixed set of words in word-segmented lip videos, which limits the applicability of the learned model due to limited vocabulary and high dependency on the model's recognition performance. Our contribution is two-fold: 1) we develop a pipeline for recognition-free retrieval, and show its performance against recognition-based retrieval on a large-scale dataset and another set of out-of-vocabulary words. 2) We introduce a query expansion technique using pseudo-relevant feedback and propose a novel re-ranking method based on maximizing the correlation between spatio-temporal landmarks of the query and the top retrieval candidates. Our word spotting method achieves 35% higher mean average precision over recognition-based method on large-scale LRWdataset. Finally, we demonstrate the application of the method by word spotting in a popular speech video ('The great dictator' by Charlie Chaplin) where we show that the word retrieval can be used to understand what was spoken perhaps in the silent movies.
AB - Our goal is to spot words in silent speech videos without explicitly recognizing the spoken words, where the lip motion of the speaker is clearly visible and audio is absent. Existing work in this domain has mainly focused on recognizing a fixed set of words in word-segmented lip videos, which limits the applicability of the learned model due to limited vocabulary and high dependency on the model's recognition performance. Our contribution is two-fold: 1) we develop a pipeline for recognition-free retrieval, and show its performance against recognition-based retrieval on a large-scale dataset and another set of out-of-vocabulary words. 2) We introduce a query expansion technique using pseudo-relevant feedback and propose a novel re-ranking method based on maximizing the correlation between spatio-temporal landmarks of the query and the top retrieval candidates. Our word spotting method achieves 35% higher mean average precision over recognition-based method on large-scale LRWdataset. Finally, we demonstrate the application of the method by word spotting in a popular speech video ('The great dictator' by Charlie Chaplin) where we show that the word retrieval can be used to understand what was spoken perhaps in the silent movies.
UR - http://www.scopus.com/inward/record.url?scp=85050975448&partnerID=8YFLogxK
U2 - 10.1109/WACV.2018.00023
DO - 10.1109/WACV.2018.00023
M3 - Chapter in a published conference proceeding
AN - SCOPUS:85050975448
T3 - Proceedings - 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018
SP - 150
EP - 159
BT - Proceedings - 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018
PB - IEEE
T2 - 18th IEEE Winter Conference on Applications of Computer Vision, WACV 2018
Y2 - 12 March 2018 through 15 March 2018
ER -