Audio onset detection: A wavelet packet based approach with recurrent neural networks

Erik Marchi, Giacomo Ferroni, Florian Eyben, Stefano Squartini, Bjorn Schuller

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

9 Scopus citations

Abstract

This paper concerns the exploitation of multi-resolution time-frequency features via the Wavelet Packet Transform to improve audio onset detection. In our approach, Wavelet Packet Energy Coefficients (WPEC) and Auditory Spectral Features (ASF) are processed by a Bidirectional Long Short-Term Memory (BLSTM) recurrent neural network that yields the onset locations. The combination of the two feature sets, together with the BLSTM-based detector, forms an advanced energy-based approach that takes advantage of the multi-resolution analysis given by the wavelet decomposition of the audio input signal. The neural network is trained with a large database of onset data covering various genres and onset types. Due to its data-driven nature, our approach does not require the onset detection method and its parameters to be tuned to a particular type of music. We compare our approach with other types and sizes of recurrent neural networks and with state-of-the-art methods on the whole onset dataset. We conclude that our approach significantly increases performance in terms of F-measure without any constraints on music genre or onset type.
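
The abstract does not say how the Wavelet Packet Energy Coefficients are computed. As a rough illustration only, the sketch below assumes a Haar wavelet, a full packet tree of a chosen depth, and a plain sum-of-squares energy per terminal node. The function names `haar_step` and `wavelet_packet_energies` are hypothetical and are not taken from the paper, whose actual wavelet, depth, and normalization may differ.

```python
import math


def haar_step(x):
    """One orthonormal Haar analysis step (assumed wavelet, not the paper's).

    Returns (approximation, detail), each half the length of x.
    """
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    return approx, detail


def wavelet_packet_energies(frame, depth):
    """Energy of each of the 2**depth terminal nodes of a full packet tree.

    Unlike a plain wavelet transform, every node (approximation AND detail)
    is split again at each level. Nodes are returned in natural tree order,
    which is not sorted by frequency.
    """
    nodes = [list(frame)]
    for _ in range(depth):
        next_nodes = []
        for node in nodes:
            approx, detail = haar_step(node)
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    return [sum(c * c for c in node) for node in nodes]


if __name__ == "__main__":
    # A constant frame has no detail content at any level.
    print(wavelet_packet_energies([1.0] * 8, depth=2))
```

Because each Haar step here is orthonormal, the node energies add up to the energy of the input frame whenever the frame length is a multiple of 2**depth. In a pipeline like the one the abstract describes, one such energy vector per audio frame would be concatenated with the spectral features and passed to the recurrent network.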

Original language: English
Title of host publication: Proceedings of the International Joint Conference on Neural Networks
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 3585-3591
Number of pages: 7
ISBN (Electronic): 9781479914845
State: Published - 3 Sep 2014
Externally published: Yes
Event: 2014 International Joint Conference on Neural Networks, IJCNN 2014 - Beijing, China
Duration: 6 Jul 2014 to 11 Jul 2014

Publication series

Name: Proceedings of the International Joint Conference on Neural Networks

Conference

Conference: 2014 International Joint Conference on Neural Networks, IJCNN 2014
Country/Territory: China
City: Beijing
Period: 6/07/14 to 11/07/14
