Prosodic and spectral features within segment-based acoustic modeling

Research output: Contribution to journal / Conference article / peer-review

3 Scopus citations

Abstract

Apart from the commonly used MFCC, PLP, and energy features, duration, low-order formants, pitch, and center-of-gravity-based features are also known to carry valuable information for phoneme recognition. This work investigates their individual performance within segment-based acoustic modeling. In addition, experiments are reported that optimize a feature space spanned exclusively by this set, using CFSS feature space optimization and speaker adaptation. All tests are carried out with SVMs on the open IFA corpus of 47 hand-labeled Dutch phonemes with a total of 178k instances. Extensive speaker-dependent vs. speaker-independent test runs are discussed, as well as four speaking styles ranging from informal to formal: informal storytelling, retold storytelling, and reading aloud with fixed and with variable content. Results show the potential of these rather uncommon features, such as those based on F3 or pitch.
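As an illustration of one of the less common features named above, a spectral center of gravity can be computed as the power-weighted mean frequency of a frame's magnitude spectrum. The following is a minimal sketch in plain Python under stated assumptions, not the authors' feature extraction: the naive DFT, the exponent `power=2.0`, and the frame-averaging helper `segment_cog_mean` (with its hypothetical `frame_len` and `hop` defaults) are illustrative choices only.

```python
import math
from typing import Sequence


def spectral_center_of_gravity(
    frame: Sequence[float], sample_rate: float, power: float = 2.0
) -> float:
    """Return the spectral center of gravity (in Hz) of a single frame.

    Assumption: a naive DFT over bins 0..N/2 is used, and the result is the
    weighted mean frequency sum(f_k * |X_k|^power) / sum(|X_k|^power).
    """
    n = len(frame)
    if n == 0:
        raise ValueError("empty frame")
    num = 0.0
    den = 0.0
    for k in range(n // 2 + 1):
        # Real and imaginary parts of DFT bin k.
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        weight = math.hypot(re, im) ** power
        freq = k * sample_rate / n  # centre frequency of bin k in Hz
        num += freq * weight
        den += weight
    if den == 0.0:
        return 0.0  # silent frame: no spectral energy to weight
    return num / den


def segment_cog_mean(
    samples: Sequence[float], sample_rate: float, frame_len: int = 256, hop: int = 128
) -> float:
    """Hypothetical segment-level feature: mean frame CoG over one segment."""
    values = [
        spectral_center_of_gravity(samples[i:i + frame_len], sample_rate)
        for i in range(0, len(samples) - frame_len + 1, hop)
    ]
    if not values:
        raise ValueError("segment shorter than one frame")
    return sum(values) / len(values)
```

Averaging frame values over a hand-labeled phoneme segment is one simple way to obtain a fixed-length, segment-level feature for a classifier such as an SVM; which functionals the paper actually uses is not stated in the abstract.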

Original language: English
Pages (from-to): 2370-2373
Number of pages: 4
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
State: Published - 2008
Event: INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association - Brisbane, QLD, Australia
Duration: 22 Sep 2008 to 26 Sep 2008

Keywords

  • ASR
  • Acoustic modeling
  • Feature space optimization
  • Phoneme-recognition
  • Prosodic features
