TY - GEN
T1 - Translating sign language videos to talking faces
AU - Mazumder, Seshadri
AU - Mukhopadhyay, Rudrabha
AU - Namboodiri, Vinay P.
AU - Jawahar, C. V.
PY - 2021/12/19
Y1 - 2021/12/19
N2 - Communication with the deaf community relies heavily on the interpretation of sign languages performed by signers. In light of recent breakthroughs in sign language translation, we propose a pipeline that we term "Translating Sign Language Videos to Talking Faces". In this context, we improve existing sign language translation systems by using POS tags to strengthen language modeling. We further extend the challenge to develop a system that can translate a video of a signer into an avatar speaking a spoken language. We focus on translation systems that attempt to translate sign languages to text without glosses, an expensive form of annotation. We critically analyze two state-of-the-art architectures and, based on their limitations, improve the systems. We propose a two-stage approach that translates sign language into intermediate text, followed by a language model that produces the final predictions. Quantitative evaluations on the challenging RWTH-PHOENIX-Weather 2014T benchmark show that the translation accuracy of the texts generated by our model improves on state-of-the-art models by approximately 3 points. We then build a working text-to-talking-face generation pipeline by bringing together multiple existing modules. The overall pipeline can generate talking face videos with speech from sign language poses. Additional materials for this project, including the code and a demo video, can be found at https://seshadri-c.github.io/SLV2TF/
KW - POS tagging
KW - Sign language
KW - Sign language recognition
KW - Sign language to text
KW - Sign language translation
UR - http://www.scopus.com/inward/record.url?scp=85122034845&partnerID=8YFLogxK
DO - 10.1145/3490035.3490286
M3 - Chapter in a published conference proceeding
AN - SCOPUS:85122034845
T3 - ACM International Conference Proceeding Series
SP - 1
EP - 10
BT - Proceedings of ICVGIP 2021 - 12th Indian Conference on Computer Vision, Graphics and Image Processing
PB - Association for Computing Machinery
CY - New York, NY, USA
T2 - 12th Indian Conference on Computer Vision, Graphics and Image Processing, ICVGIP 2021
Y2 - 20 December 2021 through 22 December 2021
ER -