TY - GEN
T1 - Graphhopper
T2 - 20th International Semantic Web Conference, ISWC 2021
AU - Koner, Rajat
AU - Li, Hang
AU - Hildebrandt, Marcel
AU - Das, Deepan
AU - Tresp, Volker
AU - Günnemann, Stephan
N1 - Publisher Copyright: © 2021, The Author(s).
PY - 2021
Y1 - 2021
AB - Visual Question Answering (VQA) is concerned with answering free-form questions about an image. Since it requires a deep semantic and linguistic understanding of the question and the ability to associate it with various objects that are present in the image, it is an ambitious task and requires multi-modal reasoning from both computer vision and natural language processing. We propose Graphhopper, a novel method that approaches the task by integrating knowledge graph reasoning, computer vision, and natural language processing techniques. Concretely, our method is based on performing context-driven, sequential reasoning based on the scene entities and their semantic and spatial relationships. As a first step, we derive a scene graph that describes the objects in the image, as well as their attributes and their mutual relationships. Subsequently, a reinforcement learning agent is trained to autonomously navigate in a multi-hop manner over the extracted scene graph to generate reasoning paths, which are the basis for deriving answers. We conduct an experimental study on the challenging dataset GQA, based on both manually curated and automatically generated scene graphs. Our results show that we keep up with human performance on manually curated scene graphs. Moreover, we find that Graphhopper outperforms another state-of-the-art scene graph reasoning model on both manually curated and automatically generated scene graphs by a significant margin.
KW - Knowledge graph reasoning
KW - Multi-modal reasoning
KW - Reinforcement learning
KW - Scene graph reasoning
KW - Visual Question Answering (VQA)
UR - http://www.scopus.com/inward/record.url?scp=85116858375&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-88361-4_7
DO - 10.1007/978-3-030-88361-4_7
M3 - Conference contribution
AN - SCOPUS:85116858375
SN - 9783030883607
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 111
EP - 127
BT - The Semantic Web – ISWC 2021 - 20th International Semantic Web Conference, ISWC 2021, Proceedings
A2 - Hotho, Andreas
A2 - Blomqvist, Eva
A2 - Dietze, Stefan
A2 - Fokoue, Achille
A2 - Ding, Ying
A2 - Barnaghi, Payam
A2 - Haller, Armin
A2 - Dragoni, Mauro
A2 - Alani, Harith
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 24 October 2021 through 28 October 2021
ER -