Residual Fusion Probabilistic Knowledge Distillation for Speech Enhancement

Jiaming Cheng, Ruiyu Liang, Lin Zhou, Li Zhao, Chengwei Huang, Bjorn W. Schuller

Research output: Contribution to journal › Article › peer-review


Abstract

In recent years, a great deal of research has focused on developing neural network (NN)-based speech enhancement (SE) models, which have achieved promising results. However, NN-based models typically require expensive computation to achieve remarkable performance, constraining their deployment in real-world scenarios, especially when hardware resources are limited or latency requirements are strict. To reduce this computational burden, we propose a unified residual fusion probabilistic knowledge distillation (KD) method for the SE task, in which knowledge is transferred from a deep teacher to a shallower student model. Previous KD approaches have commonly focused on narrowing the output distances between teachers and students, but research on the intermediate representations of these models is lacking. In this paper, we first study a cross-layer residual feature fusion strategy, which enables the student model to distill knowledge contained in multiple teacher layers, from shallow to deep. Second, a frame-weighting probabilistic distillation loss is proposed to place greater emphasis on frames containing essential information and to preserve pairwise probabilistic similarities in the representation space. The proposed distillation framework is applied to the dual-path dilated convolutional recurrent network (DPDCRN), which won the SE track of the L3DAS23 challenge. Extensive experiments are conducted on single-channel and multichannel SE datasets. Objective evaluations show that the proposed KD strategy outperforms other distillation methods and considerably improves the enhancement performance of the low-complexity student model (with only 17% of the teacher's parameters).
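The abstract does not give the exact formulation of either component. As an illustration only, the following minimal PyTorch sketch shows one plausible reading of the two ideas: residual accumulation of several teacher feature maps (shallow to deep) into a single distillation target, and a frame-weighted loss that matches pairwise frame-similarity distributions between student and teacher. All module names, tensor shapes, the 1x1-convolution projections, and the energy-based frame weighting are assumptions for this sketch, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def residual_fusion(teacher_feats, proj_layers):
    """Fuse multiple teacher layers into one distillation target.

    teacher_feats: list of [B, C_i, T] tensors, shallow to deep.
    proj_layers: matching 1x1 convs mapping each layer to a common
    channel size (a hypothetical design choice, not from the paper).
    """
    fused = proj_layers[0](teacher_feats[0])
    for feat, proj in zip(teacher_feats[1:], proj_layers[1:]):
        fused = fused + proj(feat)  # residual accumulation across layers
    return fused


def frame_weighted_prob_kd(student_feat, teacher_feat, eps=1e-8):
    """Frame-weighted probabilistic distillation loss (illustrative).

    Matches pairwise frame-similarity distributions and up-weights
    high-energy frames; both choices are assumptions.
    """
    s = student_feat.transpose(1, 2)  # [B, C, T] -> [B, T, C]
    t = teacher_feat.transpose(1, 2)

    # Per-utterance pairwise frame-similarity distributions, [B, T, T].
    s_sim = F.log_softmax(torch.bmm(s, s.transpose(1, 2)), dim=-1)
    t_sim = F.softmax(torch.bmm(t, t.transpose(1, 2)), dim=-1)

    # KL divergence between similarity distributions, per frame: [B, T].
    kl = F.kl_div(s_sim, t_sim, reduction="none").sum(-1)

    # Assumed frame weighting: emphasize frames with high teacher energy.
    energy = teacher_feat.pow(2).mean(1)                   # [B, T]
    w = energy / (energy.sum(-1, keepdim=True) + eps)
    return (w * kl).sum(-1).mean()


if __name__ == "__main__":
    B, C, T = 2, 64, 100
    feats = [torch.randn(B, c, T) for c in (32, 64, 128)]
    projs = nn.ModuleList(nn.Conv1d(c, C, 1) for c in (32, 64, 128))
    fused = residual_fusion(feats, projs)
    print(frame_weighted_prob_kd(torch.randn(B, C, T), fused).item())
```

One appeal of comparing similarity matrices rather than raw features is that the loss becomes dimension-agnostic, so a shallow student need not match the teacher's channel width; whether the paper uses this exact mechanism is not stated in the abstract.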

Original language: English
Pages (from-to): 2680-2691
Number of pages: 12
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume: 32
DOIs
State: Published - 2024
Externally published: Yes

Keywords

  • Speech enhancement
  • frame weighting
  • knowledge distillation (KD)
  • low-complexity
  • residual fusion

