TY - GEN
T1 - Dense Coordinate Channel Attention Network for Depression Level Estimation from Speech
AU - Zhao, Ziping
AU - Liu, Shizhao
AU - Niu, Mingyue
AU - Wang, Haishuai
AU - Schuller, Björn W.
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
N2 - Automatic depression level estimation from speech is currently an active research topic in the field of computational emotion recognition. One symptom commonly exhibited by patients with depression is erratic speech volume; thus, patients’ voices can be used as a bio-signature to identify their level of depression. However, speech signals have time-frequency properties; different frequencies and different timestamps contribute to depression detection in different ways. Accordingly, we design a Coordinate Channel Attention (CCA) block to differentiate tensor information according to its contribution. We combine dense blocks, which extract deep speech features, with the CCA blocks to form our proposed Dense Coordinate Channel Attention Network (DCCANet). Subsequently, a vectorization block is utilized to fuse the high-dimensional information. We split the original long speech into short audio segments of equal length, then feed these segments into the network after feature extraction to determine Beck Depression Inventory-II (BDI-II) scores. Ultimately, the mean of the segment scores is used as the individual’s depression level. Experiments on both the AVEC2013 and AVEC2014 datasets demonstrate the effectiveness of DCCANet, which outperforms several existing methods.
AB - Automatic depression level estimation from speech is currently an active research topic in the field of computational emotion recognition. One symptom commonly exhibited by patients with depression is erratic speech volume; thus, patients’ voices can be used as a bio-signature to identify their level of depression. However, speech signals have time-frequency properties; different frequencies and different timestamps contribute to depression detection in different ways. Accordingly, we design a Coordinate Channel Attention (CCA) block to differentiate tensor information according to its contribution. We combine dense blocks, which extract deep speech features, with the CCA blocks to form our proposed Dense Coordinate Channel Attention Network (DCCANet). Subsequently, a vectorization block is utilized to fuse the high-dimensional information. We split the original long speech into short audio segments of equal length, then feed these segments into the network after feature extraction to determine Beck Depression Inventory-II (BDI-II) scores. Ultimately, the mean of the segment scores is used as the individual’s depression level. Experiments on both the AVEC2013 and AVEC2014 datasets demonstrate the effectiveness of DCCANet, which outperforms several existing methods.
KW - Coordinate Channel Attention
KW - Depression Level Estimation
KW - feature extraction
KW - speech signals
KW - time-frequency properties
UR - http://www.scopus.com/inward/record.url?scp=85211792849&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-78201-5_26
DO - 10.1007/978-3-031-78201-5_26
M3 - Conference contribution
AN - SCOPUS:85211792849
SN - 9783031782008
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 402
EP - 413
BT - Pattern Recognition - 27th International Conference, ICPR 2024, Proceedings
A2 - Antonacopoulos, Apostolos
A2 - Chaudhuri, Subhasis
A2 - Chellappa, Rama
A2 - Liu, Cheng-Lin
A2 - Bhattacharya, Saumik
A2 - Pal, Umapada
PB - Springer Science and Business Media Deutschland GmbH
T2 - 27th International Conference on Pattern Recognition, ICPR 2024
Y2 - 1 December 2024 through 5 December 2024
ER -