Decoupling Common and Unique Representations for Multimodal Self-supervised Learning

Yi Wang, Conrad M. Albrecht, Nassim Ait Ali Braham, Chenying Liu, Zhitong Xiong, Xiao Xiang Zhu

Publication: Contribution to book/report/conference proceedings › Conference contribution › Peer-reviewed

1 Citation (Scopus)

Abstract

The increasing availability of multi-sensor data has sparked wide interest in multimodal self-supervised learning. However, most existing approaches learn only common representations across modalities while ignoring intra-modal training and modality-unique representations. We propose Decoupling Common and Unique Representations (DeCUR), a simple yet effective method for multimodal self-supervised learning. By distinguishing inter- and intra-modal embeddings through multimodal redundancy reduction, DeCUR can integrate complementary information across different modalities. We evaluate DeCUR in three common multimodal scenarios (radar-optical, RGB-elevation, and RGB-depth) and demonstrate consistent improvements regardless of architecture, in both multimodal and modality-missing settings. With thorough experiments and comprehensive analysis, we hope this work can provide valuable insights and raise more interest in researching the hidden relationships of multimodal representations (https://github.com/zhu-xlab/DeCUR).
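The abstract does not spell out the loss, so the following is only an illustrative sketch of what "multimodal redundancy reduction" with decoupled common and unique dimensions could look like. It assumes a Barlow Twins-style cross-correlation objective: across modalities, the first `num_common` embedding dimensions are pushed towards correlation 1 and the remaining (unique) dimensions towards correlation 0, while within a modality all dimensions are aligned. All function names, the `num_common` split, and the `off_diag_weight` value are hypothetical and are not taken from the paper or the linked repository.

```python
from math import sqrt


def standardize(batch):
    """Normalise each embedding dimension to zero mean and unit variance over the batch."""
    n, dims = len(batch), len(batch[0])
    out = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        col = [row[d] for row in batch]
        mean = sum(col) / n
        std = sqrt(sum((v - mean) ** 2 for v in col) / n) or 1.0  # guard constant columns
        for i in range(n):
            out[i][d] = (col[i] - mean) / std
    return out


def cross_correlation(za, zb):
    """C[i][j]: correlation between dimension i of za and dimension j of zb."""
    za, zb = standardize(za), standardize(zb)
    n, dims = len(za), len(za[0])
    return [[sum(za[k][i] * zb[k][j] for k in range(n)) / n
             for j in range(dims)] for i in range(dims)]


def block_loss(c, start, stop, diag_target, off_diag_weight):
    """Redundancy-reduction loss on the square sub-block c[start:stop][start:stop]."""
    idx = range(start, stop)
    on_diag = sum((c[i][i] - diag_target) ** 2 for i in idx)
    off_diag = sum(c[i][j] ** 2 for i in idx for j in idx if i != j)
    return on_diag + off_diag_weight * off_diag


def inter_modal_loss(z1, z2, num_common, off_diag_weight=0.005):
    """Assumed cross-modal term: align common dimensions, decorrelate unique ones."""
    c = cross_correlation(z1, z2)
    common = block_loss(c, 0, num_common, 1.0, off_diag_weight)
    unique = block_loss(c, num_common, len(c), 0.0, off_diag_weight)
    return common + unique


def intra_modal_loss(za, zb, off_diag_weight=0.005):
    """Assumed within-modality term: align all dimensions of two augmented views."""
    c = cross_correlation(za, zb)
    return block_loss(c, 0, len(c), 1.0, off_diag_weight)


# Toy batch of 4 samples with 2-dimensional embeddings (made-up numbers).
z_mod1 = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
z_mod2 = [[1.0, 1.0], [1.0, 1.0], [-1.0, -1.0], [-1.0, -1.0]]
print(inter_modal_loss(z_mod1, z_mod2, num_common=1))
print(intra_modal_loss(z_mod1, z_mod1))
```

In this toy example the two modalities agree on the first (common) dimension and are uncorrelated on the second (unique) dimension, which is the configuration the assumed cross-modal term rewards. A real training setup would compute these terms on encoder outputs with an automatic-differentiation framework; plain Python lists are used here only to keep the sketch dependency-free.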

Original language: English
Title of host publication: Computer Vision – ECCV 2024 - 18th European Conference, Proceedings
Editors: Aleš Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, Gül Varol
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 286-303
Number of pages: 18
ISBN (Print): 9783031733963
Publication status: Published - 2025
Event: 18th European Conference on Computer Vision, ECCV 2024 - Milan, Italy
Duration: 29 Sept 2024 to 4 Oct 2024

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 15087 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 18th European Conference on Computer Vision, ECCV 2024
Country/Territory: Italy
City: Milan
Period: 29/09/24 to 4/10/24
