TY - GEN
T1 - Multilingual Augmentation for Robust Visual Question Answering in Remote Sensing Images
AU - Yuan, Zhenghang
AU - Mou, Lichao
AU - Zhu, Xiao Xiang
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Visual question answering for remote sensing data (RSVQA), which aims to answer questions based on the content of remotely sensed images, has attracted much attention. However, previous work has paid little attention to the robustness of RSVQA models. To enhance their reliability, the key challenge is learning representations that are robust to new words and to different question templates with the same meaning. The proposed augmented dataset provides additional questions that have the same meaning as the original ones. To make better use of this information, we propose a contrastive learning strategy for training RSVQA models that are robust to diverse question templates and words. Experimental results demonstrate that the proposed augmented dataset is effective in improving the robustness of the RSVQA model. In addition, the contrastive learning strategy performs well on the low-resolution (LR) dataset.
AB - Visual question answering for remote sensing data (RSVQA), which aims to answer questions based on the content of remotely sensed images, has attracted much attention. However, previous work has paid little attention to the robustness of RSVQA models. To enhance their reliability, the key challenge is learning representations that are robust to new words and to different question templates with the same meaning. The proposed augmented dataset provides additional questions that have the same meaning as the original ones. To make better use of this information, we propose a contrastive learning strategy for training RSVQA models that are robust to diverse question templates and words. Experimental results demonstrate that the proposed augmented dataset is effective in improving the robustness of the RSVQA model. In addition, the contrastive learning strategy performs well on the low-resolution (LR) dataset.
KW - Remote sensing
KW - deep learning
KW - robustness
KW - visual question answering (VQA)
UR - https://www.scopus.com/pages/publications/85163735916
U2 - 10.1109/JURSE57346.2023.10144189
DO - 10.1109/JURSE57346.2023.10144189
M3 - Conference contribution
AN - SCOPUS:85163735916
T3 - 2023 Joint Urban Remote Sensing Event, JURSE 2023
BT - 2023 Joint Urban Remote Sensing Event, JURSE 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 Joint Urban Remote Sensing Event, JURSE 2023
Y2 - 17 May 2023 through 19 May 2023
ER -