TY - GEN
T1 - Attention Fusion for Audio-Visual Person Verification Using Multi-Scale Features
AU - Hormann, Stefan
AU - Moiz, Abdul
AU - Knoche, Martin
AU - Rigoll, Gerhard
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/11
Y1 - 2020/11
N2 - In the domain of audio-visual person recognition, many approaches use naive fusion techniques, such as score-level fusion or concatenation, to fuse the features obtained by face and audio extraction networks. More sophisticated methods fuse both features taking into account the quality of their corresponding inputs. In this paper, we propose a novel architecture to improve the prediction of feature quality. In contrast to previous works, which estimate feature quality based on the features themselves, we combine the information obtained from different layers of the feature extraction networks. In our analysis, we show that our approach outperforms state-of-the-art fusion approaches on well-established benchmarks for multimodal person verification. Moreover, we show that our model is robust against degradation of the visual input.
AB - In the domain of audio-visual person recognition, many approaches use naive fusion techniques, such as score-level fusion or concatenation, to fuse the features obtained by face and audio extraction networks. More sophisticated methods fuse both features taking into account the quality of their corresponding inputs. In this paper, we propose a novel architecture to improve the prediction of feature quality. In contrast to previous works, which estimate feature quality based on the features themselves, we combine the information obtained from different layers of the feature extraction networks. In our analysis, we show that our approach outperforms state-of-the-art fusion approaches on well-established benchmarks for multimodal person verification. Moreover, we show that our model is robust against degradation of the visual input.
KW - attention
KW - audio visual
KW - face recognition
KW - fusion
KW - multimodal
KW - person verification
UR - http://www.scopus.com/inward/record.url?scp=85101463730&partnerID=8YFLogxK
U2 - 10.1109/FG47880.2020.00074
DO - 10.1109/FG47880.2020.00074
M3 - Conference contribution
AN - SCOPUS:85101463730
T3 - Proceedings - 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2020
SP - 281
EP - 285
BT - Proceedings - 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2020
A2 - Struc, Vitomir
A2 - Gomez-Fernandez, Francisco
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 15th IEEE International Conference on Automatic Face and Gesture Recognition, FG 2020
Y2 - 16 November 2020 through 20 November 2020
ER -