Intention-Conditioned Long-Term Human Egocentric Action Anticipation

Esteve Valls Mascaro, Hyemin Ahn, Dongheui Lee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

6 Scopus citations

Abstract

To anticipate how a person will act in the future, it is essential to understand the person's intention, since it guides the subject towards certain actions. In this paper, we propose a hierarchical architecture which assumes that a sequence of human actions (low-level) can be derived from the human intention (high-level). Based on this, we address the long-term action anticipation task in egocentric videos. Our framework first extracts this low- and high-level human information from the observed human actions in a video through a Hierarchical Multi-task Multi-Layer Perceptrons Mixer (H3M). Then, we constrain the uncertainty of the future through an Intention-Conditioned Variational Auto-Encoder (I-CVAE) that generates multiple stable predictions of the next actions that the observed human might perform. By leveraging human intention as high-level information, we claim that our model is able to anticipate more time-consistent actions in the long term, thus improving the results over the baseline on the Ego4D dataset. This work achieves state-of-the-art results for the Long-Term Anticipation (LTA) task in Ego4D by providing more plausible anticipated sequences, improving the anticipation scores of nouns and actions. Our work ranked first in both the CVPR@2022 and ECCV@2022 Ego4D LTA Challenges.
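The idea of sampling several candidate futures conditioned on an intention can be illustrated with a toy sketch. This is not the authors' I-CVAE: the function name `sample_future_actions`, the single linear decoder, the random (untrained) weights, and all sizes are hypothetical, chosen only to show how one fixed intention vector can be combined with different latent samples to produce multiple action sequences.

```python
import random


def sample_future_actions(intention_id, num_intentions, num_actions,
                          horizon, num_samples, latent_dim=4, seed=0):
    """Toy conditional sampling: one action sequence per latent draw.

    Hypothetical sketch only. The weights are random rather than learned,
    and the "decoder" is a single linear layer per future step.
    """
    rng = random.Random(seed)

    # High-level condition: one-hot encoding of the (assumed known) intention.
    cond = [1.0 if i == intention_id else 0.0 for i in range(num_intentions)]
    in_dim = latent_dim + num_intentions

    # One untrained weight matrix (num_actions x in_dim) per future step.
    weights = [
        [[rng.gauss(0.0, 1.0) for _ in range(in_dim)]
         for _ in range(num_actions)]
        for _ in range(horizon)
    ]

    predictions = []
    for _ in range(num_samples):
        # Each sample draws a fresh latent z ~ N(0, I); the condition is shared.
        z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        x = z + cond
        sequence = []
        for step_weights in weights:
            logits = [sum(w * v for w, v in zip(row, x))
                      for row in step_weights]
            # Pick the highest-scoring action label for this future step.
            sequence.append(max(range(num_actions), key=logits.__getitem__))
        predictions.append(sequence)
    return predictions


if __name__ == "__main__":
    # Arbitrary illustrative sizes, not taken from the paper.
    for seq in sample_future_actions(2, 5, 10, horizon=8, num_samples=3):
        print(seq)
```

In a trained model the decoder weights would be learned and the intention would itself be estimated from the observed video; here both are stand-ins so that the sampling loop can run on its own.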

Original language: English
Title of host publication: Proceedings - 2023 IEEE Winter Conference on Applications of Computer Vision, WACV 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 6037-6046
Number of pages: 10
ISBN (Electronic): 9781665493468
DOIs
State: Published - 2023
Externally published: Yes
Event: 23rd IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2023 - Waikoloa, United States
Duration: 3 Jan 2023 to 7 Jan 2023

Publication series

Name: Proceedings - 2023 IEEE Winter Conference on Applications of Computer Vision, WACV 2023

Conference

Conference: 23rd IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2023
Country/Territory: United States
City: Waikoloa
Period: 3/01/23 to 7/01/23

Keywords

  • Algorithms: Video recognition and understanding (tracking, action recognition, etc.)
  • Machine learning architectures, formulations, and algorithms (including transfer, low-shot, semi-, self-, and un-supervised learning)
  • Robotics
