TY - JOUR
T1 - Dual Attention and Element Recalibration Networks for Automatic Depression Level Prediction
AU - Niu, Mingyue
AU - Zhao, Ziping
AU - Tao, Jianhua
AU - Li, Ya
AU - Schuller, Bjorn W.
N1 - Publisher Copyright: © 2010-2012 IEEE.
PY - 2023/7/1
Y1 - 2023/7/1
AB - Physiological studies have identified facial dynamics as biomarkers of depression severity. Accordingly, this paper develops a Dual Attention and Element Recalibration (DAER) network that extracts facial changes to predict the depression level. The model comprises two blocks: a Dual Attention (DA) block and an Element Recalibration (ER) block. The DA block uses self-attention to capture the dynamic changes in the representation sequence of a facial video segment, and uses bilinear attention to examine how the feature components of that sequence influence depression level prediction. To improve the representation ability of the network, the ER block gathers global information and uses it to recalibrate each element of the tensor. For the depression level prediction task, we first divide the long-term video into fixed-length segments and use a trained ResNet50 to encode each frame, generating the representation sequence of each video segment. Second, the representation sequences are input into the DAER network to obtain depression level scores. Finally, the average of these scores yields the prediction for the long-term video. Experiments on the publicly available AVEC 2013 and AVEC 2014 depression databases illustrate the effectiveness of our method.
KW - DAER network
KW - depression level prediction
KW - dual attention block
KW - element recalibration block
KW - facial differences
UR - http://www.scopus.com/inward/record.url?scp=85174344313&partnerID=8YFLogxK
U2 - 10.1109/TAFFC.2022.3177737
DO - 10.1109/TAFFC.2022.3177737
M3 - Article
AN - SCOPUS:85174344313
SN - 1949-3045
VL - 14
SP - 1954
EP - 1965
JO - IEEE Transactions on Affective Computing
JF - IEEE Transactions on Affective Computing
IS - 3
ER -