TY - GEN
T1 - Modeling Action Spatiotemporal Relationships Using Graph-Based Class-Level Attention Network for Long-Term Action Detection
AU - Wu, Yuankai
AU - Su, Xin
AU - Salihu, Driton
AU - Xing, Hao
AU - Zakour, Marsil
AU - Patsch, Constantin
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
AB - In recent years, action detection has become an active research topic in fields such as human-robot interaction and assistive robotics. Most previous methods focus on temporally processing the action representation without considering the dependencies among action classes. However, actions that occur in a video are inherently related, and this correlation can offer effective clues for detection tasks. In this work, we propose to exploit the information of related action classes with the help of a graph neural network in conjunction with temporal modeling. We introduce the attention-based temporal class (ATC) module, which models the inherent action dependencies on a graph and learns action-specific features along the temporal dimension with a dual-branch attention mechanism. Further, we present the Graph-based Class-level Attention Network (GCAN), which is built upon ATC modules with increasing temporal receptive fields to handle action instances in complex untrimmed videos. Our network is evaluated on two challenging benchmark datasets with dense annotations: Charades and MultiTHUMOS. Experimental results show that our approach achieves highly competitive results with significantly reduced model complexity.
UR - http://www.scopus.com/inward/record.url?scp=85182526477&partnerID=8YFLogxK
U2 - 10.1109/IROS55552.2023.10341409
DO - 10.1109/IROS55552.2023.10341409
M3 - Conference contribution
AN - SCOPUS:85182526477
T3 - IEEE International Conference on Intelligent Robots and Systems
SP - 6719
EP - 6726
BT - 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2023
Y2 - 1 October 2023 through 5 October 2023
ER -