Understanding Spatio-Temporal Relations in Human-Object Interaction using Pyramid Graph Convolutional Network

Hao Xing, Darius Burschka

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

12 Scopus citations

Abstract

Human activity recognition is an important task for an intelligent robot. In human-robot collaboration especially, it requires not only the labels of sub-activities but also the temporal structure of the activity. To automatically recognize both the labels and the temporal structure in sequences of human-object interaction, we propose a novel Pyramid Graph Convolutional Network (PGCN). It employs a pyramidal encoder-decoder architecture consisting of an attention-based graph convolutional network and a temporal pyramid pooling module, which downsample and upsample the interaction sequence along the temporal axis, respectively. The system represents the 2D or 3D spatial relations between humans and objects, taken from detection results in video data, as a graph. To learn the human-object relations, a new attention graph convolutional network is trained to extract condensed information from the graph representation. To segment actions into sub-actions, a novel temporal pyramid pooling module is proposed, which upsamples the compressed features back to the original time scale and classifies actions per frame. We explore various attention layers, namely spatial attention, temporal attention, and channel attention, and combine different upsampling decoders to test performance on action recognition and segmentation. We evaluate our model on two challenging datasets in the field of human-object interaction recognition: the Bimanual Actions and IKEA Assembly datasets. We demonstrate that our classifier significantly improves both framewise action recognition and segmentation; for example, the F1 micro and F1@50 scores on the Bimanual Actions dataset improve by 4.3% and 8.5%, respectively.
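The abstract reports a framewise score (F1 micro) alongside a segmental score (F1@50). The sketch below illustrates how the two can diverge on the same prediction. It assumes the segmental F1@k definition commonly used in the action segmentation literature: a predicted segment counts as a true positive when its temporal IoU with an unmatched ground-truth segment of the same label reaches the threshold. This record does not specify the paper's exact evaluation protocol, and the function names and toy labels here are hypothetical.

```python
from itertools import groupby


def to_segments(labels):
    """Collapse per-frame labels into (label, start, end) runs; end is exclusive."""
    segments, start = [], 0
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((label, start, start + length))
        start += length
    return segments


def segmental_f1(pred, truth, iou_threshold=0.5):
    """Segmental F1@k under the assumed definition (k = iou_threshold * 100)."""
    pred_segs, true_segs = to_segments(pred), to_segments(truth)
    matched = [False] * len(true_segs)
    tp = fp = 0
    for label, ps, pe in pred_segs:
        best_iou, best_idx = 0.0, -1
        for idx, (tl, ts, te) in enumerate(true_segs):
            if tl != label:
                continue
            inter = max(0, min(pe, te) - max(ps, ts))
            union = max(pe, te) - min(ps, ts)  # equals the true union whenever inter > 0
            iou = inter / union
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        # Each ground-truth segment may be claimed by at most one prediction.
        if best_idx >= 0 and best_iou >= iou_threshold and not matched[best_idx]:
            matched[best_idx] = True
            tp += 1
        else:
            fp += 1
    fn = matched.count(False)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def framewise_micro_f1(pred, truth):
    """With exactly one label per frame, micro-averaged F1 equals frame accuracy."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


# Toy sequence: the prediction briefly flips labels inside the "cut" action.
truth = ["idle"] * 4 + ["cut"] * 6 + ["idle"] * 2
pred = ["idle"] * 3 + ["cut"] * 2 + ["idle"] + ["cut"] * 4 + ["idle"] * 2

print(framewise_micro_f1(pred, truth))  # 10 of 12 frames agree
print(segmental_f1(pred, truth))        # 3 matched segments, 2 spurious ones
```

In this toy case only two frames are wrong, yet the short label flip splits the prediction into extra segments that count as false positives. The segmental score penalizes such over-segmentation more heavily than the framewise score does, which is why reporting both gives a fuller picture of recognition and segmentation quality.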

Original language: English
Title of host publication: 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 5195-5201
Number of pages: 7
ISBN (Electronic): 9781665479271
State: Published - 2022
Event: 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2022 - Kyoto, Japan
Duration: 23 Oct 2022 → 27 Oct 2022

Publication series

Name: IEEE International Conference on Intelligent Robots and Systems
Volume: 2022-October
ISSN (Print): 2153-0858
ISSN (Electronic): 2153-0866

Conference

Conference: 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2022
Country/Territory: Japan
City: Kyoto
Period: 23/10/22 → 27/10/22
