TY - GEN
T1 - Masked Event Modeling
T2 - 2024 IEEE Winter Conference on Applications of Computer Vision, WACV 2024
AU - Klenk, Simon
AU - Bonello, David
AU - Koestler, Lukas
AU - Araslanov, Nikita
AU - Cremers, Daniel
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Event cameras asynchronously capture brightness changes with low latency, high temporal resolution, and high dynamic range. However, annotation of event data is a costly and laborious process, which limits the use of deep learning methods for classification and other semantic tasks with the event modality. To reduce the dependency on labeled event data, we introduce Masked Event Modeling (MEM), a self-supervised framework for events. Our method pretrains a neural network on unlabeled events, which can originate from any event camera recording. Subsequently, the pretrained model is finetuned on a downstream task, leading to a consistent improvement of the task accuracy. For example, our method reaches state-of-the-art classification accuracy across three datasets, N-ImageNet, N-Cars, and N-Caltech101, increasing the top-1 accuracy of previous work by significant margins. When tested on real-world event data, MEM is even superior to supervised RGB-based pretraining. The models pretrained with MEM are also label-efficient and generalize well to the dense task of semantic image segmentation.
AB - Event cameras asynchronously capture brightness changes with low latency, high temporal resolution, and high dynamic range. However, annotation of event data is a costly and laborious process, which limits the use of deep learning methods for classification and other semantic tasks with the event modality. To reduce the dependency on labeled event data, we introduce Masked Event Modeling (MEM), a self-supervised framework for events. Our method pretrains a neural network on unlabeled events, which can originate from any event camera recording. Subsequently, the pretrained model is finetuned on a downstream task, leading to a consistent improvement of the task accuracy. For example, our method reaches state-of-the-art classification accuracy across three datasets, N-ImageNet, N-Cars, and N-Caltech101, increasing the top-1 accuracy of previous work by significant margins. When tested on real-world event data, MEM is even superior to supervised RGB-based pretraining. The models pretrained with MEM are also label-efficient and generalize well to the dense task of semantic image segmentation.
KW - Algorithms
KW - Algorithms
KW - Image recognition and understanding
KW - Machine learning architectures
KW - and algorithms
KW - formulations
UR - http://www.scopus.com/inward/record.url?scp=85190279769&partnerID=8YFLogxK
U2 - 10.1109/WACV57701.2024.00237
DO - 10.1109/WACV57701.2024.00237
M3 - Conference contribution
AN - SCOPUS:85190279769
T3 - Proceedings - 2024 IEEE Winter Conference on Applications of Computer Vision, WACV 2024
SP - 2367
EP - 2377
BT - Proceedings - 2024 IEEE Winter Conference on Applications of Computer Vision, WACV 2024
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 4 January 2024 through 8 January 2024
ER -