VQuAD: Video Question Answering Diagnostic Dataset

Vivek Gupta, Badri N. Patro, Hemant Parihar, Vinay P. Namboodiri

Research output: Chapter in a published conference proceeding (Chapter or section in a book/report/conference proceeding)


Abstract

In this paper, we investigate the task of video-based question answering. We provide a diagnostic dataset that can be used to evaluate the extent of the reasoning abilities of various methods for solving this task, a capability that previously proposed datasets for this task lack. Our dataset is large-scale (around 1.3 million questions across the train and test splits) and evaluates both spatial and temporal properties, as well as the relationships between objects with respect to these properties. We evaluate a state-of-the-art language model (BERT) as a baseline to measure how far the questions can be answered from language features alone. We then use other existing networks that combine video features with language features to solve the task. Unfortunately, we observe that currently prevalent systems do not perform significantly better than the language-only baseline. We hypothesise that this is because we took care to ensure that the dataset is balanced and free of obvious biases. To make progress, learning techniques need to acquire an ability to reason that goes beyond exploiting correlations arising from biases. Our work poses this as an interesting and significant challenge. We release our dataset and the source code for our baseline modules on the following webpage: https://delta-lab-iitk.github.io/vquad/.

Original language: English
Title of host publication: 2022 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW)
Place of Publication: U.S.A.
Publisher: IEEE
Pages: 282-291
Number of pages: 10
Volume: 2022
ISBN (Electronic): 9781665458245
DOIs
Publication status: Published, 15 Feb 2022
Event: 2022 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops, WACVW 2022, Waikoloa, United States
Duration: 4 Jan 2022 to 8 Jan 2022

Publication series

Name: Proceedings - 2022 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops, WACVW 2022
ISSN (Electronic): 2690-621X

Conference

Conference: 2022 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops, WACVW 2022
Country/Territory: United States
City: Waikoloa
Period: 4/01/22 to 8/01/22

ASJC Scopus subject areas

  • Computer Science Applications
  • Computer Vision and Pattern Recognition
