
Multi-resolution linear prediction based features for audio onset detection with bidirectional LSTM neural networks

  • Erik Marchi
  • Giacomo Ferroni
  • Florian Eyben
  • Leonardo Gabrielli
  • Stefano Squartini
  • Bjorn Schuller
  • Technical University of Munich
  • Università di Ancona
  • Imperial College London

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

83 Scopus citations

Abstract

A plethora of onset detection methods have been proposed in recent years. However, few attempts have been made at widely applicable approaches that achieve superior performance across different types of music with considerable temporal precision. In this paper, we present a multi-resolution approach based on the discrete wavelet transform and linear prediction filtering that improves the time resolution and performance of onset detection in different musical scenarios. In our approach, wavelet coefficients and forward prediction errors are combined with auditory spectral features and then processed by a bidirectional Long Short-Term Memory recurrent neural network, which acts as a reduction function. The network is trained on a large database of onset data covering various genres and onset types. We compare results with state-of-the-art methods on a dataset that includes the Bello, Glover and ISMIR 2004 Ballroom sets, and we conclude that our approach significantly outperforms existing methods in terms of F-measure. For pitched non-percussive music, an absolute improvement of 7.5% is reported.
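
The abstract names forward prediction errors as one of the feature streams but does not say how they are estimated. As a rough illustration only, the sketch below fits a forward linear predictor to a single frame by ordinary least squares and returns the residual. The function name `forward_prediction_error`, the default order of 8, and the least-squares fit are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def forward_prediction_error(frame, order=8):
    """Residual of a forward linear predictor fitted to `frame` by least squares.

    Hypothetical helper for illustration; the estimator and the default
    order are assumptions, not the method described in the paper.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    if n <= order:
        raise ValueError("frame must be longer than the predictor order")
    # Each row holds the `order` samples that precede one target sample,
    # most recent sample first.
    past = np.column_stack(
        [frame[order - k : n - k] for k in range(1, order + 1)]
    )
    target = frame[order:]
    # Least-squares predictor coefficients for this frame.
    coeffs, *_ = np.linalg.lstsq(past, target, rcond=None)
    # Forward prediction error: actual sample minus predicted sample.
    return target - past @ coeffs


if __name__ == "__main__":
    # Silence followed by an abrupt tone: the residual is largest where
    # the signal stops being predictable from its own past.
    signal = np.concatenate([np.zeros(50), np.cos(0.3 * np.arange(50))])
    residual = forward_prediction_error(signal, order=2)
    print(int(np.argmax(np.abs(residual))) + 2)  # sample index of the peak
```

A steady sinusoid is almost perfectly predictable from its two previous samples, so its residual stays near zero, while an abrupt change leaves a large residual. That is the intuition for why prediction error can serve as an onset cue.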

Original language: English
Title of host publication: 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2164-2168
Number of pages: 5
ISBN (Print): 9781479928927
State: Published - 2014
Event: 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014 - Florence, Italy
Duration: 4 May 2014 to 9 May 2014

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014
Country/Territory: Italy
City: Florence
Period: 4/05/14 to 9/05/14

Keywords

  • Audio Onset Detection
  • Bidirectional Long Short-Term Memory
  • Discrete Wavelet Transform
  • Linear Prediction
  • Neural Networks
