LONG-TERM ACTION ANTICIPATION BASED ON CONTEXTUAL ALIGNMENT

Constantin Patsch, Jinghan Zhang, Yuankai Wu, Marsil Zakour, Driton Salihu, Eckehard Steinbach

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In action anticipation, the model predicts the next future action after a certain observation period. Long-term action anticipation extends this idea to predicting multiple future actions and their respective durations. In this problem setting, the model should therefore not only capture relationships between past actions but also predict several future actions that fit a given context. Unlike autoregressive models, our model employs an encoder-decoder structure to determine future actions and durations in parallel, which prevents the accumulation of prediction errors and reduces inference time. Furthermore, the model ensures that the predicted actions are aligned with a context representation, which resembles the way humans approach this task, since the feasible action set is restricted by the respective context. We evaluate our model on the long-term anticipation benchmark datasets Breakfast and 50Salads, where we achieve state-of-the-art results.

Original language: English
Title of host publication: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 5920-5924
Number of pages: 5
ISBN (Electronic): 9798350344851
State: Published - 2024
Event: 49th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Seoul, Korea, Republic of
Duration: 14 Apr 2024 to 19 Apr 2024

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 49th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024
Country/Territory: Korea, Republic of
City: Seoul
Period: 14/04/24 to 19/04/24

Keywords

  • Action Anticipation
  • Computer Vision
  • Long-term Activity Understanding
