Abstract
In this paper, we investigate the task of video-based question answering. We provide a diagnostic dataset that can be used to evaluate the extent of the reasoning abilities of various methods for solving this task; previous datasets proposed for this task do not offer this capability. Our dataset is large scale (around 1.3 million questions across the train and test splits combined) and evaluates both the spatial and temporal properties of objects and the relationships between various objects with respect to these properties. We evaluate a state-of-the-art language model (BERT) as a baseline to understand how much of the task can be solved through correlations in the language features alone. Other existing networks are then used to combine video features with language features to solve this task. Unfortunately, we observe that the currently prevalent systems do not perform significantly better than the language baseline. We hypothesise that this is due to our efforts to ensure that the dataset is balanced and free of obvious biases. To make progress, learning techniques need to acquire the ability to reason, going beyond exploiting basic correlations arising from biases. This is an interesting and significant challenge posed by our work. We release our dataset and the source code for our baseline modules at the following webpage: https://delta-lab-iitk.github.io/vquad/.
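The abstract's language-only baseline asks how much of the task can be solved from the question text alone, without looking at the video. The paper uses BERT for this; the sketch below swaps in a much simpler question-prior probe (a common bias check, not the authors' method) to show the underlying idea. It assumes a dataset given as hypothetical `(question, answer)` string pairs, and uses the first few words of a question as a hypothetical proxy for its question type. The function names `question_key`, `fit_prior_baseline` and `accuracy` are illustrative only.

```python
from collections import Counter, defaultdict


def question_key(question: str, n_words: int = 3) -> str:
    """Hypothetical proxy for a question type: the first n_words lowercased tokens."""
    return " ".join(question.lower().split()[:n_words])


def fit_prior_baseline(train):
    """Learn the most frequent answer per question key from (question, answer) pairs.

    No video features are used: this measures language-only bias.
    Returns (per-key majority answers, global majority answer as fallback).
    """
    per_key = defaultdict(Counter)
    overall = Counter()
    for question, answer in train:
        per_key[question_key(question)][answer] += 1
        overall[answer] += 1
    majority = {key: counts.most_common(1)[0][0] for key, counts in per_key.items()}
    fallback = overall.most_common(1)[0][0]
    return majority, fallback


def accuracy(model, test):
    """Fraction of test questions answered correctly by the question-only prior."""
    majority, fallback = model
    correct = sum(majority.get(question_key(q), fallback) == a for q, a in test)
    return correct / len(test)


if __name__ == "__main__":
    # Toy biased data: each question type always has the same answer.
    biased = [("Is the cube moving left?", "yes")] * 4 + [("How many spheres appear?", "2")] * 4
    print(accuracy(fit_prior_baseline(biased), biased))
    # Toy balanced data: both answers equally likely for the same question type.
    balanced = [("Is the cube moving left?", "yes"), ("Is the cube moving left?", "no")]
    print(accuracy(fit_prior_baseline(balanced), balanced))
```

On the toy biased data the question-only prior answers everything correctly, while on the balanced toy data it drops to chance for a yes/no question. This is the intuition behind the abstract's observation: if a dataset is balanced so that the question text does not reveal the answer, a model can only do well by actually reasoning over the video.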
Original language | English |
---|---|
Title of host publication | 2022 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW) |
Place of Publication | USA |
Publisher | IEEE |
Pages | 282-291 |
Number of pages | 10 |
Volume | 2022 |
ISBN (Electronic) | 9781665458245 |
DOIs | |
Publication status | Published - 15 Feb 2022 |
Event | 2022 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops, WACVW 2022 - Waikoloa, United States. Duration: 4 Jan 2022 → 8 Jan 2022 |
Publication series
Name | Proceedings - 2022 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops, WACVW 2022 |
---|---|
ISSN (Electronic) | 2690-621X |
Conference
Conference | 2022 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops, WACVW 2022 |
---|---|
Country/Territory | United States |
City | Waikoloa |
Period | 4/01/22 → 8/01/22 |
ASJC Scopus subject areas
- Computer Science Applications
- Computer Vision and Pattern Recognition
Fingerprint
Research topics of 'VQuAD: Video Question Answering Diagnostic Dataset'.
Projects
- Centre for the Analysis of Motion, Entertainment Research and Applications (CAMERA) - 2.0
Campbell, N. (PI), Cosker, D. (PI), Bilzon, J. (CoI), Campbell, N. (CoI), Cazzola, D. (CoI), Colyer, S. (CoI), Cosker, D. (CoI), Lutteroth, C. (CoI), McGuigan, P. (CoI), O'Neill, E. (CoI), Petrini, K. (CoI), Proulx, M. (CoI) & Yang, Y. (CoI)
Engineering and Physical Sciences Research Council
1/11/20 → 31/10/25
Project: Research council
- Centre for the Analysis of Motion, Entertainment Research and Applications (CAMERA)
Cosker, D. (PI), Bilzon, J. (CoI), Campbell, N. (CoI), Cazzola, D. (CoI), Colyer, S. (CoI), Fincham Haines, T. (CoI), Hall, P. (CoI), Kim, K. I. (CoI), Lutteroth, C. (CoI), McGuigan, P. (CoI), O'Neill, E. (CoI), Richardt, C. (CoI), Salo, A. (CoI), Seminati, E. (CoI), Tabor, A. (CoI) & Yang, Y. (CoI)
Engineering and Physical Sciences Research Council
1/09/15 → 28/02/21
Project: Research council