TY - GEN
T1 - Discriminatively trained recurrent neural networks for single-channel speech separation
AU - Weninger, Felix
AU - Hershey, John R.
AU - Le Roux, Jonathan
AU - Schuller, Björn
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2014/2/5
Y1 - 2014/2/5
AB - This paper describes an in-depth investigation of training criteria, network architectures and feature representations for regression-based single-channel speech separation with deep neural networks (DNNs). We use a generic discriminative training criterion corresponding to optimal source reconstruction from time-frequency masks, and introduce its application to speech separation in a reduced feature space (Mel domain). A comparative evaluation of time-frequency mask estimation by DNNs, recurrent DNNs and non-negative matrix factorization on the 2nd CHiME Speech Separation and Recognition Challenge shows consistent improvements by discriminative training, whereas long short-term memory recurrent DNNs obtain the overall best results. Furthermore, our results confirm the importance of fine-tuning the feature representation for DNN training.
KW - Deep neural networks
KW - Discriminative training
KW - Speech enhancement
UR - http://www.scopus.com/inward/record.url?scp=84941334311&partnerID=8YFLogxK
U2 - 10.1109/GlobalSIP.2014.7032183
DO - 10.1109/GlobalSIP.2014.7032183
M3 - Conference contribution
AN - SCOPUS:84941334311
T3 - 2014 IEEE Global Conference on Signal and Information Processing, GlobalSIP 2014
SP - 577
EP - 581
BT - 2014 IEEE Global Conference on Signal and Information Processing, GlobalSIP 2014
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2014 IEEE Global Conference on Signal and Information Processing, GlobalSIP 2014
Y2 - 3 December 2014 through 5 December 2014
ER -