A Review on VQA: Methods, Tools and Datasets

Mayank Agrawal, Anand Singh Jalal, Himanshu Sharma

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

4 Scopus citations

Abstract

Visual question answering (VQA) is a new area that seeks to integrate computer vision (CV) with natural language processing (NLP). It entails creating models that can comprehend both textual questions and visual input, in the form of images or videos, in order to produce correct answers. Applications for VQA systems include content-based image retrieval, medical image analysis, autonomous vehicles, and human-computer interaction. However, developing efficient VQA models poses a number of difficulties, such as ambiguity in questions, sophisticated reasoning, processing multi-modal data, and data bias. To address these issues and enhance the functionality and interpretability of VQA models, researchers continue to investigate novel techniques, such as attention mechanisms, fusion strategies, and transformer-based architectures. This paper offers an overview of VQA methods, datasets, and tools, together with a review of the existing problems and potential future developments in the area. Finally, we discuss potential future directions for research in VQA and image comprehension.

Original language: English
Title of host publication: 2023 International Conference on Computer Science and Emerging Technologies, CSET 2023
Publisher: Institute of Electrical and Electronics Engineers
ISBN (Electronic): 9798350341737
State: Published - 1 Jan 2023
Externally published: Yes
Event: 1st International Conference on Computer Science and Emerging Technologies, CSET 2023 - Hybrid, Bangalore, India
Duration: 10 Oct 2023 to 12 Oct 2023

Publication series

Name: 2023 International Conference on Computer Science and Emerging Technologies, CSET 2023

Conference

Conference: 1st International Conference on Computer Science and Emerging Technologies, CSET 2023
Country/Territory: India
City: Hybrid, Bangalore
Period: 10/10/23 to 12/10/23

Keywords

  • Multimodal Learning
  • VQA
  • Visual Attention

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Decision Sciences (miscellaneous)
  • Information Systems and Management
  • Control and Optimization
  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Science Applications
