TY - GEN
T1 - Two-stage speaker adaptation of hybrid tied-posterior acoustic models
AU - Stadermann, Jan
AU - Rigoll, Gerhard
PY - 2005
Y1 - 2005
N2 - Many established techniques exist for speaker adaptation of acoustic models based on Gaussian distributions. In contrast, there are almost no well-functioning adaptation methods for hybrid systems, which combine HMMs and neural networks. In this paper, strategies are explored for adapting hybrid NN/HMM systems based on the tied-posterior paradigm. We investigate the retraining of selected important parts of the neural network and a gradient-based adaptation strategy for the HMM's mixture coefficients that maximizes the scaled likelihood. The paper presents the following innovations: first, it introduces one of the first adaptation methods for hybrid systems in which the HMM component contributes significantly to the adaptation success. Second, it presents a novel approach to neural network adaptation based on selecting suitable neurons for adaptation. Results on the WSJ speaker adaptation test show that our methods can adapt to new speakers, especially when the neural network is adapted, and that both methods can be combined to further reduce the word error rate in most cases.
AB - Many established techniques exist for speaker adaptation of acoustic models based on Gaussian distributions. In contrast, there are almost no well-functioning adaptation methods for hybrid systems, which combine HMMs and neural networks. In this paper, strategies are explored for adapting hybrid NN/HMM systems based on the tied-posterior paradigm. We investigate the retraining of selected important parts of the neural network and a gradient-based adaptation strategy for the HMM's mixture coefficients that maximizes the scaled likelihood. The paper presents the following innovations: first, it introduces one of the first adaptation methods for hybrid systems in which the HMM component contributes significantly to the adaptation success. Second, it presents a novel approach to neural network adaptation based on selecting suitable neurons for adaptation. Results on the WSJ speaker adaptation test show that our methods can adapt to new speakers, especially when the neural network is adapted, and that both methods can be combined to further reduce the word error rate in most cases.
UR - http://www.scopus.com/inward/record.url?scp=33646794050&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2005.1415279
DO - 10.1109/ICASSP.2005.1415279
M3 - Conference contribution
AN - SCOPUS:33646794050
SN - 0780388747
SN - 9780780388741
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 977
EP - 980
BT - 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 - Proceedings - Image and Multidimensional Signal Processing Multimedia Signal Processing
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05
Y2 - 18 March 2005 through 23 March 2005
ER -