Attention Fusion for Audio-Visual Person Verification Using Multi-Scale Features

Stefan Hormann, Abdul Moiz, Martin Knoche, Gerhard Rigoll

Publication: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

7 Citations (Scopus)

Abstract

In the domain of audio-visual person recognition, many approaches use naive fusion techniques, such as score-level fusion or concatenation, to fuse the features obtained by face and audio extraction networks. More sophisticated methods fuse both features while taking into account the quality of their corresponding inputs. In this paper, we propose a novel architecture to improve the prediction of feature quality. In contrast to previous works, which estimate feature quality based on the features themselves, we combine the information obtained from different layers of the feature extraction networks. In our analysis, we show that our approach outperforms state-of-the-art fusion approaches on well-established benchmarks for multimodal person verification. Moreover, we show that our model is robust against degradation of the visual input.
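The fusion styles named in the abstract can be contrasted in a minimal sketch. This is an illustration only, not the architecture proposed in the paper: the function names are hypothetical, the embeddings are plain lists of floats, and the per-modality quality scores are passed in as given scalars, whereas the paper predicts feature quality from several layers of the extraction networks.

```python
import math


def softmax(scores):
    """Turn arbitrary real-valued scores into weights that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def concat_fusion(face, voice):
    """Naive fusion: concatenate both embeddings into one vector."""
    return list(face) + list(voice)


def score_level_fusion(face_score, voice_score):
    """Naive fusion: average the two per-modality verification scores."""
    return 0.5 * (face_score + voice_score)


def attention_fusion(face, voice, face_quality, voice_quality):
    """Quality-aware fusion (assumed form): weight each embedding by a
    softmax over hypothetical scalar quality scores, then sum them.
    Both embeddings must have the same dimensionality."""
    if len(face) != len(voice):
        raise ValueError("embeddings must have the same length")
    w_face, w_voice = softmax([face_quality, voice_quality])
    return [w_face * f + w_voice * v for f, v in zip(face, voice)]
```

With equal quality scores the weighted sum reduces to a plain average of the two embeddings. When one modality's score is much lower, for example for a heavily degraded face image, the fused vector moves toward the other modality's embedding, which is the intuition behind quality-aware fusion.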

Original language: English
Title: Proceedings - 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2020
Editors: Vitomir Struc, Francisco Gomez-Fernandez
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 281-285
Number of pages: 5
ISBN (electronic): 9781728130798
Publication status: Published - Nov. 2020
Event: 15th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2020 - Buenos Aires, Argentina
Duration: 16 Nov. 2020 to 20 Nov. 2020

Publication series

Name: Proceedings - 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2020

Conference

Conference: 15th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2020
Country/Territory: Argentina
City: Buenos Aires
Period: 16/11/20 to 20/11/20
