TY - GEN
T1 - A Review on VQA
T2 - 1st International Conference on Computer Science and Emerging Technologies, CSET 2023
AU - Agrawal, Mayank
AU - Jalal, Anand Singh
AU - Sharma, Himanshu
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023/1/1
Y1 - 2023/1/1
N2 - Visual question answering (VQA) is an emerging area that seeks to integrate computer vision (CV) with natural language processing (NLP). It entails creating models that can comprehend both textual questions and visual input, represented by images or videos, in order to produce correct answers. Applications of VQA systems include content-based image retrieval, medical image analysis, autonomous vehicles, and human-computer interaction. However, developing efficient VQA models presents a number of difficulties, such as ambiguity in questions, sophisticated reasoning, processing of multi-modal data, and data bias. To address these issues and enhance the functionality and interpretability of VQA models, researchers are consistently investigating novel techniques, such as attention mechanisms, fusion strategies, and transformer-based architectures. In addition to a review of existing problems and potential future developments in visual question answering, this paper offers an overview of VQA methods, datasets, and tools. Finally, we discuss potential directions for future research on VQA and image comprehension.
AB - Visual question answering (VQA) is an emerging area that seeks to integrate computer vision (CV) with natural language processing (NLP). It entails creating models that can comprehend both textual questions and visual input, represented by images or videos, in order to produce correct answers. Applications of VQA systems include content-based image retrieval, medical image analysis, autonomous vehicles, and human-computer interaction. However, developing efficient VQA models presents a number of difficulties, such as ambiguity in questions, sophisticated reasoning, processing of multi-modal data, and data bias. To address these issues and enhance the functionality and interpretability of VQA models, researchers are consistently investigating novel techniques, such as attention mechanisms, fusion strategies, and transformer-based architectures. In addition to a review of existing problems and potential future developments in visual question answering, this paper offers an overview of VQA methods, datasets, and tools. Finally, we discuss potential directions for future research on VQA and image comprehension.
KW - Multimodal Learning
KW - VQA
KW - Visual Attention
UR - http://www.scopus.com/inward/record.url?scp=85182019367&partnerID=8YFLogxK
U2 - 10.1109/CSET58993.2023.10346815
DO - 10.1109/CSET58993.2023.10346815
M3 - Conference contribution
AN - SCOPUS:85182019367
T3 - 2023 International Conference on Computer Science and Emerging Technologies, CSET 2023
BT - 2023 International Conference on Computer Science and Emerging Technologies, CSET 2023
PB - Institute of Electrical and Electronics Engineers
Y2 - 10 October 2023 through 12 October 2023
ER -