TY - GEN
T1 - Multimodal Humor Dataset: Predicting Laughter tracks for Sitcoms
T2 - 2021 IEEE Winter Conference on Applications of Computer Vision, WACV 2021
AU - Patro, Badri N.
AU - Lunayach, Mayank
AU - Srivastava, Deepankar
AU - Sarvesh, Sarvesh
AU - Singh, Hunar
AU - Namboodiri, Vinay P.
PY - 2021/6/14
Y1 - 2021/6/14
N2 - A great number of situational comedies (sitcoms) are regularly produced, and adding laughter tracks to them is a critical task. The ability to predict whether something will be humorous to the audience is equally crucial. In this project, we aim to automate this task. Towards doing so, we annotate an existing sitcom ('Big Bang Theory') and use the laughter cues present to obtain a manual annotation for the show. We provide a detailed analysis of the dataset design and further evaluate various state-of-the-art baselines for this task. We observe that existing LSTM- and BERT-based networks operating on text alone do not perform as well as joint text-and-video or video-only networks. Moreover, it is challenging to ascertain that the words attended to while predicting laughter are indeed humorous. The dataset and analysis provided through this paper are a valuable resource towards solving this interesting semantic and practical task. As an additional contribution, we have developed a novel multi-modal self-attention based model that outperforms currently prevalent models on this task. The project page for our paper is https://delta-lab-iitk.github.io/Multimodal-Humor-Dataset/.
AB - A great number of situational comedies (sitcoms) are regularly produced, and adding laughter tracks to them is a critical task. The ability to predict whether something will be humorous to the audience is equally crucial. In this project, we aim to automate this task. Towards doing so, we annotate an existing sitcom ('Big Bang Theory') and use the laughter cues present to obtain a manual annotation for the show. We provide a detailed analysis of the dataset design and further evaluate various state-of-the-art baselines for this task. We observe that existing LSTM- and BERT-based networks operating on text alone do not perform as well as joint text-and-video or video-only networks. Moreover, it is challenging to ascertain that the words attended to while predicting laughter are indeed humorous. The dataset and analysis provided through this paper are a valuable resource towards solving this interesting semantic and practical task. As an additional contribution, we have developed a novel multi-modal self-attention based model that outperforms currently prevalent models on this task. The project page for our paper is https://delta-lab-iitk.github.io/Multimodal-Humor-Dataset/.
UR - http://www.scopus.com/inward/record.url?scp=85116131175&partnerID=8YFLogxK
U2 - 10.1109/WACV48630.2021.00062
DO - 10.1109/WACV48630.2021.00062
M3 - Chapter in a published conference proceeding
AN - SCOPUS:85116131175
T3 - 2021 IEEE Winter Conference on Applications of Computer Vision, WACV 2021
SP - 576
EP - 585
BT - Proceedings - 2021 IEEE Winter Conference on Applications of Computer Vision, WACV 2021
PB - IEEE
CY - U.S.A.
Y2 - 5 January 2021 through 9 January 2021
ER -