TY - GEN
T1 - Compound splitting and lexical unit recombination for improved performance of a speech recognition system for German parliamentary speeches
AU - Larson, Martha
AU - Willett, Daniel
AU - Köhler, Joachim
AU - Rigoll, Gerhard
PY - 2000
Y1 - 2000
N2 - This paper proposes a novel combined compound splitting and phrase recombination method that optimizes the composition of the speech recognition lexicon for a given domain. Data-driven compound word splitting is followed by iterative recombination of high frequency combinations. Language model perplexity and size are the criteria used to identify a balance between compound decomposition, which reduces OOV, and lexical unit recombination, which packs additional context into a fixed-size vocabulary. The method provides a basis for lexicon design for a LVCSR system on the domain of German parliamentary speeches that is to be used as the foundation of a spoken document information retrieval system. The approach achieves a 35% reduction in OOV without a prohibitively large sacrifice in recognition performance.
AB - This paper proposes a novel combined compound splitting and phrase recombination method that optimizes the composition of the speech recognition lexicon for a given domain. Data-driven compound word splitting is followed by iterative recombination of high frequency combinations. Language model perplexity and size are the criteria used to identify a balance between compound decomposition, which reduces OOV, and lexical unit recombination, which packs additional context into a fixed-size vocabulary. The method provides a basis for lexicon design for a LVCSR system on the domain of German parliamentary speeches that is to be used as the foundation of a spoken document information retrieval system. The approach achieves a 35% reduction in OOV without a prohibitively large sacrifice in recognition performance.
UR - http://www.scopus.com/inward/record.url?scp=85009061098&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85009061098
T3 - 6th International Conference on Spoken Language Processing, ICSLP 2000
BT - 6th International Conference on Spoken Language Processing, ICSLP 2000
PB - International Speech Communication Association
T2 - 6th International Conference on Spoken Language Processing, ICSLP 2000
Y2 - 16 October 2000 through 20 October 2000
ER -