TY - GEN
T1 - Audio onset detection
T2 - 2014 International Joint Conference on Neural Networks, IJCNN 2014
AU - Marchi, Erik
AU - Ferroni, Giacomo
AU - Eyben, Florian
AU - Squartini, Stefano
AU - Schuller, Bjorn
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2014/9/3
Y1 - 2014/9/3
AB - This paper concerns the exploitation of multi-resolution time-frequency features via the Wavelet Packet Transform to improve audio onset detection. In our approach, Wavelet Packet Energy Coefficients (WPEC) and Auditory Spectral Features (ASF) are processed by a Bidirectional Long Short-Term Memory (BLSTM) recurrent neural network that yields the onset locations. The combination of the two feature sets, together with the BLSTM-based detector, forms an advanced energy-based approach that takes advantage of the multi-resolution analysis given by the wavelet decomposition of the audio input signal. The neural network is trained on a large database of onset data covering various genres and onset types. Due to its data-driven nature, our approach does not require the onset detection method and its parameters to be tuned to a particular type of music. We compare our approach with other types and sizes of recurrent neural networks and with state-of-the-art methods on the whole onset dataset. We conclude that our approach significantly increases performance in terms of F-measure without any constraints on music genre or onset type.
UR - http://www.scopus.com/inward/record.url?scp=84908474237&partnerID=8YFLogxK
U2 - 10.1109/IJCNN.2014.6889669
DO - 10.1109/IJCNN.2014.6889669
M3 - Conference contribution
AN - SCOPUS:84908474237
T3 - Proceedings of the International Joint Conference on Neural Networks
SP - 3585
EP - 3591
BT - Proceedings of the International Joint Conference on Neural Networks
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 6 July 2014 through 11 July 2014
ER -