Exploiting time-frequency patterns with LSTM-RNNs for low-bitrate audio restoration

Jun Deng, Björn Schuller, Florian Eyben, Dagmar Schuller, Zixing Zhang, Holly Francois, Eunmi Oh

Research output: Contribution to journal › Article › peer-review

29 Scopus citations

Abstract

Perceptual audio coding is widely and successfully used for audio compression. However, perceptual audio coders may introduce audible coding artifacts when encoding audio at low bitrates. Low-bitrate audio restoration is a challenging problem: it tries to recover a high-quality audio sample close to the uncompressed original from a low-quality encoded version. In this paper, we propose a novel data-driven method for audio restoration, in which temporal and spectral dynamics are explicitly captured by deep time-frequency LSTM recurrent neural networks. Leveraging the captured temporal and spectral information facilitates learning a nonlinear mapping from the magnitude spectrogram of low-quality audio to that of high-quality audio. The proposed method substantially attenuates audible artifacts caused by codecs and is conceptually straightforward. Extensive experiments show that for low-bitrate audio at 96 kbps (mono), 64 kbps (mono), and 96 kbps (stereo), the proposed method can efficiently generate improved-quality audio that is competitive with, or even superior in perceptual quality to, the audio produced by other state-of-the-art deep neural network methods and the LAME MP3 codec.
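
The magnitude-to-magnitude mapping described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the trained time-frequency LSTM is replaced by a placeholder function (the hypothetical `enhance_magnitudes` argument), the frames are non-overlapping with a rectangular window for brevity, and reusing the phase of the low-quality input for resynthesis is an assumption that the abstract does not state. All function names are illustrative.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of one frame (O(n^2), fine for a demo)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft(); returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def restore(samples, enhance_magnitudes, frame_len=8):
    """Map each frame's magnitudes and rebuild it with the input's own phase.

    Assumption: non-overlapping rectangular frames; trailing samples that do
    not fill a whole frame are dropped.
    """
    out = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        spectrum = dft(samples[start:start + frame_len])
        magnitudes = [abs(c) for c in spectrum]
        phases = [cmath.phase(c) for c in spectrum]
        # Stand-in for the trained network (hypothetical callable).
        new_magnitudes = enhance_magnitudes(magnitudes)
        out.extend(idft([cmath.rect(m, p) for m, p in zip(new_magnitudes, phases)]))
    return out


def identity(magnitudes):
    """Placeholder mapping that leaves the magnitude spectrum unchanged."""
    return list(magnitudes)


# Toy "low-quality" input: a sine wave of 32 samples.
signal = [math.sin(2 * math.pi * 3 * t / 32) for t in range(32)]
restored = restore(signal, identity)
```

With the identity placeholder, the output reproduces the input up to floating-point error, which confirms that the analysis and resynthesis steps are consistent. A learned model would take the place of `identity` and return enhanced magnitudes.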

Original language: English
Pages (from-to): 1095-1107
Number of pages: 13
Journal: Neural Computing and Applications
Volume: 32
Issue number: 4
State: Published - 1 Feb 2020
Externally published: Yes

Keywords

  • Audio restoration
  • Deep learning
  • LSTM
  • MP3
