Graphhopper: Multi-hop Scene Graph Reasoning for Visual Question Answering

Rajat Koner, Hang Li, Marcel Hildebrandt, Deepan Das, Volker Tresp, Stephan Günnemann

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

23 Scopus citations

Abstract

Visual Question Answering (VQA) is concerned with answering free-form questions about an image. Since it requires a deep semantic and linguistic understanding of the question and the ability to associate it with various objects that are present in the image, it is an ambitious task and requires multi-modal reasoning from both computer vision and natural language processing. We propose Graphhopper, a novel method that approaches the task by integrating knowledge graph reasoning, computer vision, and natural language processing techniques. Concretely, our method is based on performing context-driven, sequential reasoning based on the scene entities and their semantic and spatial relationships. As a first step, we derive a scene graph that describes the objects in the image, as well as their attributes and their mutual relationships. Subsequently, a reinforcement learning agent is trained to autonomously navigate in a multi-hop manner over the extracted scene graph to generate reasoning paths, which are the basis for deriving answers. We conduct an experimental study on the challenging dataset GQA, based on both manually curated and automatically generated scene graphs. Our results show that we keep up with human performance on manually curated scene graphs. Moreover, we find that Graphhopper outperforms another state-of-the-art scene graph reasoning model on both manually curated and automatically generated scene graphs by a significant margin.
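
The multi-hop navigation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the paper trains a reinforcement learning agent, whereas the code below substitutes a greedy word-overlap heuristic for the learned policy. The toy scene graph, the relation names, and the functions `build_graph`, `score_edge`, and `greedy_walk` are all hypothetical and exist only to show what a reasoning path over a scene graph looks like.

```python
from collections import defaultdict

# Hypothetical toy scene graph as (subject, relation, object) triples.
# Attributes are modeled as edges to attribute nodes (an assumption).
TRIPLES = [
    ("woman", "holding", "umbrella"),
    ("umbrella", "has_color", "red"),
    ("woman", "standing_on", "sidewalk"),
    ("sidewalk", "has_attribute", "wet"),
]


def build_graph(triples):
    """Index outgoing edges by source node."""
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        graph[subj].append((rel, obj))
    return graph


def score_edge(question_tokens, relation, target):
    """Stand-in for a learned policy: count word overlap with the question."""
    words = set(relation.split("_")) | {target}
    return len(words & question_tokens)


def greedy_walk(graph, start, question, max_hops=2):
    """Follow the highest-scoring outgoing edge for up to max_hops steps.

    Returns the reasoning path (a list of triples) and the final node,
    which is read off as the answer.
    """
    tokens = set(question.lower().replace("?", "").split())
    path, node = [], start
    for _ in range(max_hops):
        edges = graph.get(node, [])
        if not edges:
            break
        rel, nxt = max(edges, key=lambda e: score_edge(tokens, *e))
        path.append((node, rel, nxt))
        node = nxt
    return path, node


if __name__ == "__main__":
    graph = build_graph(TRIPLES)
    path, answer = greedy_walk(
        graph, "woman", "What color is the umbrella the woman is holding?"
    )
    for hop in path:
        print(hop)
    print("answer:", answer)
```

In this toy example the walk starts at the node `woman`, hops to `umbrella` via `holding`, and then to `red` via `has_color`, so the final node serves as the answer. A trained agent would replace `score_edge` with a policy conditioned on the question and the path so far, and sample actions from it.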

Original language: English
Title of host publication: The Semantic Web – ISWC 2021 - 20th International Semantic Web Conference, ISWC 2021, Proceedings
Editors: Andreas Hotho, Eva Blomqvist, Stefan Dietze, Achille Fokoue, Ying Ding, Payam Barnaghi, Armin Haller, Mauro Dragoni, Harith Alani
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 111-127
Number of pages: 17
ISBN (Print): 9783030883607
State: Published - 2021
Event: 20th International Semantic Web Conference, ISWC 2021 - Virtual, Online
Duration: 24 Oct 2021 – 28 Oct 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12922 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 20th International Semantic Web Conference, ISWC 2021
City: Virtual, Online
Period: 24/10/21 – 28/10/21

Keywords

  • Knowledge graph reasoning
  • Multi-modal reasoning
  • Reinforcement learning
  • Scene graph reasoning
  • Visual Question Answering (VQA)
