TY - JOUR
T1 - DGPN
T2 - 25th Interspeech Conference 2024
AU - Ge, Zirui
AU - Xu, Xinzhou
AU - Guo, Haiyan
AU - Wang, Tingting
AU - Yang, Zhen
AU - Schuller, Björn W.
N1 - Publisher Copyright:
© 2024 International Speech Communication Association. All rights reserved.
PY - 2024
Y1 - 2024
N2 - As synthetic speech technologies rapidly advance, accurately classifying these synthesis algorithms has become increasingly critical in speech anti-spoofing. However, in the early stage of an emerging spoofing algorithm, only a limited number of generated speech samples is typically available, which impairs the performance of conventional models. To address this limitation in the Speech Spoofing Algorithm Recognition (SSAR) task, we introduce a novel few-shot learning method named the Dual Graph Prototypical Network (DGPN). The proposed method consists of an intra-speech graph module and an inter-speech graph module: the former employs graph attention networks to model the low-level representations within an utterance, and the latter uses graph neural networks to model the high-level representations of different utterances. Experimental evaluations show that the proposed method outperforms existing models in classification accuracy, demonstrating its effectiveness on the few-shot SSAR task.
AB - As synthetic speech technologies rapidly advance, accurately classifying these synthesis algorithms has become increasingly critical in speech anti-spoofing. However, in the early stage of an emerging spoofing algorithm, only a limited number of generated speech samples is typically available, which impairs the performance of conventional models. To address this limitation in the Speech Spoofing Algorithm Recognition (SSAR) task, we introduce a novel few-shot learning method named the Dual Graph Prototypical Network (DGPN). The proposed method consists of an intra-speech graph module and an inter-speech graph module: the former employs graph attention networks to model the low-level representations within an utterance, and the latter uses graph neural networks to model the high-level representations of different utterances. Experimental evaluations show that the proposed method outperforms existing models in classification accuracy, demonstrating its effectiveness on the few-shot SSAR task.
KW - Few-shot learning
KW - graph neural networks
KW - speech anti-spoofing
KW - speech spoofing algorithm recognition
UR - http://www.scopus.com/inward/record.url?scp=85214804011&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2024-1724
DO - 10.21437/Interspeech.2024-1724
M3 - Conference article
AN - SCOPUS:85214804011
SN - 2308-457X
SP - 1125
EP - 1129
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Y2 - 1 September 2024 through 5 September 2024
ER -