TY - GEN
T1 - Multi-resolution linear prediction based features for audio onset detection with bidirectional LSTM neural networks
AU - Marchi, Erik
AU - Ferroni, Giacomo
AU - Eyben, Florian
AU - Gabrielli, Leonardo
AU - Squartini, Stefano
AU - Schuller, Björn
PY - 2014
Y1 - 2014
N2 - A plethora of onset detection methods have been proposed in recent years. However, few attempts have been made at widely applicable approaches that achieve superior performance across different types of music with considerable temporal precision. In this paper, we present a multi-resolution approach based on the discrete wavelet transform and linear prediction filtering that improves the time resolution and performance of onset detection in different musical scenarios. In our approach, wavelet coefficients and forward prediction errors are combined with auditory spectral features and then processed by a bidirectional Long Short-Term Memory recurrent neural network, which acts as a reduction function. The network is trained on a large database of onset data covering various genres and onset types. We compare results with state-of-the-art methods on a dataset that includes the Bello, Glover, and ISMIR 2004 Ballroom sets, and we conclude that our approach significantly outperforms existing methods in terms of F-measure. For pitched non-percussive music, an absolute improvement of 7.5% is reported.
AB - A plethora of onset detection methods have been proposed in recent years. However, few attempts have been made at widely applicable approaches that achieve superior performance across different types of music with considerable temporal precision. In this paper, we present a multi-resolution approach based on the discrete wavelet transform and linear prediction filtering that improves the time resolution and performance of onset detection in different musical scenarios. In our approach, wavelet coefficients and forward prediction errors are combined with auditory spectral features and then processed by a bidirectional Long Short-Term Memory recurrent neural network, which acts as a reduction function. The network is trained on a large database of onset data covering various genres and onset types. We compare results with state-of-the-art methods on a dataset that includes the Bello, Glover, and ISMIR 2004 Ballroom sets, and we conclude that our approach significantly outperforms existing methods in terms of F-measure. For pitched non-percussive music, an absolute improvement of 7.5% is reported.
KW - Audio Onset Detection
KW - Bidirectional Long Short-Term Memory
KW - Discrete Wavelet Transform
KW - Linear Prediction
KW - Neural Networks
UR - https://www.scopus.com/pages/publications/84905251106
U2 - 10.1109/ICASSP.2014.6853982
DO - 10.1109/ICASSP.2014.6853982
M3 - Conference contribution
AN - SCOPUS:84905251106
SN - 9781479928927
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 2164
EP - 2168
BT - 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014
Y2 - 4 May 2014 through 9 May 2014
ER -