TY - JOUR
T1 - Towards Constructing HMM Structure for Speech Recognition with Deep Neural Fenonic Baseform Growing
AU - Li, Lujun
AU - Watzel, Tobias
AU - Kurzinger, Ludwig
AU - Rigoll, Gerhard
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2021
Y1 - 2021
N2 - For decades, acoustic models in speech recognition systems have been built on Hidden Markov Models (HMMs), e.g., Gaussian Mixture Model-HMM and Deep Neural Network-HMM systems, and have achieved remarkable results. However, the most widely used HMM topology is the three-state left-to-right structure, whose superiority has not been established. Multiple studies have addressed the optimization of the HMM structure, but none of them tackles the problem with deep learning algorithms. This paper proposes, for the first time, a new training method based on Deep Neural Fenonic Baseform Growing to optimize the HMM structure; the method is concisely designed and computationally cheap. Moreover, this data-driven method customizes the HMM structure for each phone without external assumptions about the number of states or the transition patterns. Experimental results on both the TIMIT and TEDliumv2 corpora indicate that the proposed HMM structure substantially improves both the monophone and the triphone systems. In addition, its adoption further improves state-of-the-art speech recognition systems while using markedly fewer parameters.
AB - For decades, acoustic models in speech recognition systems have been built on Hidden Markov Models (HMMs), e.g., Gaussian Mixture Model-HMM and Deep Neural Network-HMM systems, and have achieved remarkable results. However, the most widely used HMM topology is the three-state left-to-right structure, whose superiority has not been established. Multiple studies have addressed the optimization of the HMM structure, but none of them tackles the problem with deep learning algorithms. This paper proposes, for the first time, a new training method based on Deep Neural Fenonic Baseform Growing to optimize the HMM structure; the method is concisely designed and computationally cheap. Moreover, this data-driven method customizes the HMM structure for each phone without external assumptions about the number of states or the transition patterns. Experimental results on both the TIMIT and TEDliumv2 corpora indicate that the proposed HMM structure substantially improves both the monophone and the triphone systems. In addition, its adoption further improves state-of-the-art speech recognition systems while using markedly fewer parameters.
KW - Deep neural network
KW - HMM topology
KW - speech recognition
KW - vector quantization
UR - http://www.scopus.com/inward/record.url?scp=85102613622&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2021.3064197
DO - 10.1109/ACCESS.2021.3064197
M3 - Article
AN - SCOPUS:85102613622
SN - 2169-3536
VL - 9
SP - 39098
EP - 39110
JO - IEEE Access
JF - IEEE Access
M1 - 9371697
ER -