
Syllabification of conversational speech using bidirectional long-short-term memory neural networks

  • Christian Landsiedel
  • Jens Edlund
  • Florian Eyben
  • Daniel Neiberg
  • Björn Schuller
  • Center for Autonomous Systems
  • Technical University of Munich

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

19 Scopus citations

Abstract

Segmentation of speech signals is a crucial task in many types of speech analysis. We present a novel approach to segmentation at the syllable level, using a Bidirectional Long-Short-Term Memory Neural Network. It estimates syllable nucleus positions by regressing perceptually motivated input features onto a smooth target function. Peak selection is then performed to obtain valid nucleus positions. Performance of the model is evaluated at the level of both syllables and the vowel segments making up the syllable nuclei. The general applicability of the approach is illustrated by good results on two common databases, Switchboard and TIMIT, for both read and spontaneous speech, and by a favourable comparison with other published results.
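
The peak-selection step mentioned in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the Gaussian shape of the smooth target, the 100 frames-per-second rate, the threshold, the minimum peak distance, and the hypothetical helper names `make_target` and `select_peaks`. The network output is stood in for by the ideal target itself.

```python
import numpy as np

def make_target(nuclei_s, n_frames, frame_rate=100.0, width_s=0.03):
    """Build a smooth regression target from nucleus times in seconds.

    Assumption: one Gaussian bump per nucleus; the paper's actual
    target function may differ.
    """
    t = np.arange(n_frames) / frame_rate
    target = np.zeros(n_frames)
    for centre in nuclei_s:
        bump = np.exp(-0.5 * ((t - centre) / width_s) ** 2)
        target = np.maximum(target, bump)
    return target

def select_peaks(y, threshold=0.5, min_distance=5):
    """Return frame indices of local maxima that exceed `threshold`
    and lie at least `min_distance` frames apart (highest peaks win)."""
    candidates = [
        i for i in range(1, len(y) - 1)
        if y[i] > y[i - 1] and y[i] >= y[i + 1] and y[i] >= threshold
    ]
    # Greedy suppression: visit candidates from highest to lowest value.
    candidates.sort(key=lambda i: y[i], reverse=True)
    kept = []
    for i in candidates:
        if all(abs(i - j) >= min_distance for j in kept):
            kept.append(i)
    return sorted(kept)

# Hypothetical nucleus annotations (seconds) for a 0.7 s utterance.
nuclei = [0.10, 0.32, 0.55]
y = make_target(nuclei, n_frames=70)
peaks = select_peaks(y)
print(peaks)
```

In practice `y` would be the frame-wise output of the trained regressor rather than the ideal target, and the threshold and minimum distance would need tuning on held-out data.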

Original language: English
Title of host publication: 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Proceedings
Pages: 5256-5259
Number of pages: 4
DOIs
State: Published - 2011
Event: 36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Prague, Czech Republic
Duration: 22 May 2011 → 27 May 2011

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011
Country/Territory: Czech Republic
City: Prague
Period: 22/05/11 → 27/05/11

Keywords

  • Dialogue Systems
  • Recurrent Neural Networks
  • Speech Analysis
  • Syllabification
