TY - GEN
T1 - Towards MOOCs for Lipreading
T2 - 23rd IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2023
AU - Agarwal, Aditya
AU - Sen, Bipasha
AU - Mukhopadhyay, Rudrabha
AU - Namboodiri, Vinay
AU - Jawahar, C. V.
PY - 2023/1/7
Y1 - 2023/1/7
N2 - Many people with some form of hearing loss consider lipreading their primary mode of day-to-day communication. However, finding resources to learn or improve one's lipreading skills can be challenging. This was further exacerbated during the COVID-19 pandemic due to restrictions on direct interactions with peers and speech therapists. Today, online MOOC platforms like Coursera and Udemy have become the most effective form of training for many types of skill development. However, online lipreading resources are scarce, as creating such resources is an extensive process needing months of manual effort to record hired actors. Because of the manual pipeline, such platforms are also limited in vocabulary, supported languages, accents, and speakers, and have a high usage cost. In this work, we investigate the possibility of replacing real human talking videos with synthetically generated videos. Synthetic data can easily incorporate larger vocabularies, variations in accent, and even local languages and many speakers. We propose an end-to-end automated pipeline to develop such a platform using state-of-the-art talking head video generator networks, text-to-speech models, and computer vision techniques. We then perform an extensive human evaluation using carefully thought-out lipreading exercises to validate the quality of our designed platform against existing lipreading platforms. Our studies concretely point toward the potential of our approach in developing a large-scale lipreading MOOC platform that can impact millions of people with hearing loss.
AB - Many people with some form of hearing loss consider lipreading their primary mode of day-to-day communication. However, finding resources to learn or improve one's lipreading skills can be challenging. This was further exacerbated during the COVID-19 pandemic due to restrictions on direct interactions with peers and speech therapists. Today, online MOOC platforms like Coursera and Udemy have become the most effective form of training for many types of skill development. However, online lipreading resources are scarce, as creating such resources is an extensive process needing months of manual effort to record hired actors. Because of the manual pipeline, such platforms are also limited in vocabulary, supported languages, accents, and speakers, and have a high usage cost. In this work, we investigate the possibility of replacing real human talking videos with synthetically generated videos. Synthetic data can easily incorporate larger vocabularies, variations in accent, and even local languages and many speakers. We propose an end-to-end automated pipeline to develop such a platform using state-of-the-art talking head video generator networks, text-to-speech models, and computer vision techniques. We then perform an extensive human evaluation using carefully thought-out lipreading exercises to validate the quality of our designed platform against existing lipreading platforms. Our studies concretely point toward the potential of our approach in developing a large-scale lipreading MOOC platform that can impact millions of people with hearing loss.
KW - Applications: Social good
KW - Education
UR - http://www.scopus.com/inward/record.url?scp=85149043145&partnerID=8YFLogxK
U2 - 10.1109/WACV56688.2023.00225
DO - 10.1109/WACV56688.2023.00225
M3 - Chapter in a published conference proceeding
AN - SCOPUS:85149043145
T3 - Proceedings - 2023 IEEE Winter Conference on Applications of Computer Vision, WACV 2023
SP - 2216
EP - 2225
BT - Proceedings - 2023 IEEE Winter Conference on Applications of Computer Vision, WACV 2023
PB - IEEE
Y2 - 3 January 2023 through 7 January 2023
ER -