TY - GEN
T1 - Syllabification of conversational speech using bidirectional long-short-term memory neural networks
AU - Landsiedel, Christian
AU - Edlund, Jens
AU - Eyben, Florian
AU - Neiberg, Daniel
AU - Schuller, Björn
PY - 2011
Y1 - 2011
AB - Segmentation of speech signals is a crucial task in many types of speech analysis. We present a novel approach to segmentation at the syllable level, using a Bidirectional Long-Short-Term Memory Neural Network. It estimates syllable nucleus positions by regression of perceptually motivated input features to a smooth target function. Peak selection is then performed to obtain valid nucleus positions. Performance of the model is evaluated on the levels of both syllables and the vowel segments making up the syllable nuclei. The general applicability of the approach is illustrated by good results on two common databases, Switchboard and TIMIT, for both read and spontaneous speech, and by a favourable comparison with other published results.
KW - Dialogue Systems
KW - Recurrent Neural Networks
KW - Speech Analysis
KW - Syllabification
UR - https://www.scopus.com/pages/publications/80051628297
U2 - 10.1109/ICASSP.2011.5947543
DO - 10.1109/ICASSP.2011.5947543
M3 - Conference contribution
AN - SCOPUS:80051628297
SN - 9781457705397
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 5256
EP - 5259
BT - 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Proceedings
T2 - 36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011
Y2 - 22 May 2011 through 27 May 2011
ER -