Multichannel Speech Enhancement Based on Neural Beamforming and a Context-Focused Post-Filtering Network

Cong Pang, Jingjie Fan, Qifan Shen, Yue Xie, Chengwei Huang, Björn W. Schuller

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Both spatial and temporal contextual information are essential for the multichannel speech enhancement (MCSE) task. In this work, we propose a unified MCSE network composed of neural beamforming and a context-focused post-filtering network in order to fully exploit both types of information. The network estimates optimal complex ideal ratio masks (cIRMs), which exploit the phase information in the frequency domain more effectively when reconstructing the speech waveform. To assign adaptive weights to the channels, we first adopt a dilated convolution-based network that simulates beamforming on the original multichannel input spectrum and serves as the front end of the multichannel acoustic model. Furthermore, we propose a post-filtering network that feeds the output of the proposed U-Net into a convolutional long short-term memory (ConvLSTM) layer, which captures both the contextual information and the spatial correlation of the features. We conduct experiments on the VOICES, CHiME-3, and WMIR data sets. The results show that, across various scenarios, the proposed algorithm improves over previous state-of-the-art algorithms in terms of PESQ, STOI, and SI-SNR.
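To make the pipeline described in the abstract concrete, here is a minimal PyTorch sketch of a network with the same three stages: a dilated-convolution neural beamforming front end, a U-Net-style encoder/decoder, and a ConvLSTM post-filter that estimates a cIRM. All layer widths, kernel sizes, the single-stage "U-Net", and the tensor layout are illustrative assumptions, not the authors' published configuration.

```python
# Hedged sketch, assuming PyTorch and real/imag-stacked STFT inputs of shape
# (batch, 2*mics, frames, freq_bins); sizes below are illustrative only.
import torch
import torch.nn as nn


class ConvLSTMCell(nn.Module):
    """LSTM gates computed with a 1-D convolution over the frequency axis,
    so the recurrence over time frames keeps local spectral structure."""

    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        self.gates = nn.Conv1d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, h, c):  # x, h, c: (batch, channels, freq)
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c


class MCSENet(nn.Module):
    def __init__(self, mics=4, width=32):
        super().__init__()
        # Dilated-conv "neural beamformer": mixes the M-channel complex
        # spectrum (real/imag stacked -> 2*M input channels) with a growing
        # temporal receptive field, standing in for adaptive channel weights.
        self.beamformer = nn.Sequential(
            nn.Conv2d(2 * mics, width, 3, padding=(1, 1), dilation=(1, 1)), nn.PReLU(),
            nn.Conv2d(width, width, 3, padding=(2, 1), dilation=(2, 1)), nn.PReLU(),
            nn.Conv2d(width, width, 3, padding=(4, 1), dilation=(4, 1)), nn.PReLU(),
        )
        # Toy "U-Net": one frequency-downsampling stage, one upsampling stage,
        # and a single skip connection.
        self.enc = nn.Sequential(
            nn.Conv2d(width, 2 * width, 3, stride=(1, 2), padding=1), nn.PReLU())
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(2 * width, width, 3, stride=(1, 2), padding=1), nn.PReLU())
        # ConvLSTM post-filter over time frames, then a 1x1 head predicting
        # the real and imaginary parts of the cIRM.
        self.post = ConvLSTMCell(width, width)
        self.head = nn.Conv1d(width, 2, 1)

    def forward(self, spec):  # spec: (batch, 2*mics, frames, freq_bins), odd freq_bins
        z = self.beamformer(spec)
        u = self.dec(self.enc(z)) + z  # skip connection
        b, _, t_len, f_len = u.shape
        h = u.new_zeros(b, self.post.hid_ch, f_len)
        c = u.new_zeros(b, self.post.hid_ch, f_len)
        masks = []
        for t in range(t_len):  # recur frame by frame
            h, c = self.post(u[:, :, t], h, c)
            masks.append(self.head(h))
        return torch.stack(masks, dim=2)  # cIRM estimate: (batch, 2, frames, freq_bins)


# The estimated cIRM would then be applied to a reference channel's complex
# spectrum and inverted with an iSTFT to reconstruct the enhanced waveform.
if __name__ == "__main__":
    net = MCSENet(mics=4)
    out = net(torch.randn(1, 8, 50, 257))  # 4 mics, 50 frames, 257 bins
    print(out.shape)  # torch.Size([1, 2, 50, 257])
```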

Original language: English
Pages (from-to): 973-983
Number of pages: 11
Journal: IEEE Transactions on Cognitive and Developmental Systems
Volume: 16
Issue number: 3
DOIs
State: Published - 1 Jun 2024
Externally published: Yes

Keywords

  • Convolutional long short-term memory (ConvLSTM)
  • U-Net
  • dilated convolution
  • multichannel speech enhancement (MCSE)
  • neural beamforming
