IEEE/CAA Journal of Automatica Sinica
Citation: Dong Yu and Jinyu Li, "Recent Progresses in Deep Learning Based Acoustic Models," IEEE/CAA J. Autom. Sinica, vol. 4, no. 3, pp. 396-409, July 2017. doi: 10.1109/JAS.2017.7510508
[1] D. Yu, L. Deng, and G. E. Dahl, "Roles of pre-training and fine-tuning in context-dependent DBN-HMMs for real-world speech recognition," in Proc. NIPS 2010 Workshop on Deep Learning and Unsupervised Feature Learning, 2010. https://www.researchgate.net/publication/228631482_Roles_of_Pre-Training_and_Fine-Tuning_in_Context-Dependent_DBN-HMMs_for_Real-World_Speech_Recognition
[2] G. E. Dahl, D. Yu, L. Deng, and A. Acero, "Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition," IEEE Trans. Audio Speech Lang. Processing, vol. 20, no. 1, pp. 30-42, Jan. 2012.
[3] D. Yu, F. Seide, and G. Li, "Conversational speech transcription using context-dependent deep neural networks," in Proc. Interspeech, Florence, Italy, 2011, pp. 437-440. http://www.isca-speech.org/archive/archive_papers/interspeech_2011/i11_0437.pdf
[4] G. Hinton, L. Deng, D. Yu, G. E. Dahl, A. R. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, and B. Kingsbury, "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Processing Mag., vol. 29, no. 6, pp. 82-97, Nov. 2012.
[5] O. Abdel-Hamid, A. R. Mohamed, H. Jiang, and G. Penn, "Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition," in Proc. 2012 IEEE Int. Conf. Acoustics, Speech and Signal Processing, Kyoto, Japan, 2012, pp. 4277-4280. https://www.researchgate.net/publication/261119155_Applying_Convolutional_Neural_Networks_concepts_to_hybrid_NN-HMM_model_for_speech_recognition
[6] L. Deng, J. Li, J. T. Huang, K. S. Yao, D. Yu, F. Seide, M. Seltzer, G. Zweig, X. D. He, J. Williams, Y. F. Gong, and A. Acero, "Recent advances in deep learning for speech research at Microsoft," in Proc. 2013 IEEE Int. Conf. Acoustics, Speech and Signal Processing, Vancouver, BC, Canada, 2013, pp. 8604-8608.
[7] O. Abdel-Hamid, A. R. Mohamed, H. Jiang, L. Deng, G. Penn, and D. Yu, "Convolutional neural networks for speech recognition," IEEE/ACM Trans. Audio Speech Lang. Processing, vol. 22, no. 10, pp. 1533-1545, Oct. 2014.
[8] H. Sak, A. Senior, and F. Beaufays, "Long short-term memory recurrent neural network architectures for large scale acoustic modeling," in Proc. Interspeech, Singapore, 2014, pp. 338-342. https://www.researchgate.net/publication/279714069_Long_short-term_memory_recurrent_neural_network_architectures_for_large_scale_acoustic_modeling
[9] H. Sak, A. Senior, K. Rao, O. İrsoy, A. Graves, F. Beaufays, and J. Schalkwyk, "Learning acoustic frame labeling for speech recognition with recurrent neural networks," in Proc. 2015 IEEE Int. Conf. Acoustics, Speech and Signal Processing, Brisbane, QLD, Australia, 2015, pp. 4280-4284. https://www.researchgate.net/publication/304525733_Learning_acoustic_frame_labeling_for_speech_recognition_with_recurrent_neural_networks
[10] T. N. Sainath, O. Vinyals, A. Senior, and H. Sak, "Convolutional, long short-term memory, fully connected deep neural networks," in Proc. 2015 IEEE Int. Conf. Acoustics, Speech and Signal Processing, Brisbane, QLD, Australia, 2015, pp. 4580-4584. https://www.researchgate.net/publication/308872979_Convolutional_Long_Short-Term_Memory_fully_connected_Deep_Neural_Networks
[11] M. X. Bi, Y. M. Qian, and K. Yu, "Very deep convolutional neural networks for LVCSR," in Proc. Interspeech, Dresden, Germany, 2015, pp. 3259-3263. http://www.isca-speech.org/archive/interspeech_2015/i15_3259.html
[12] V. Mitra and H. Franco, "Time-frequency convolutional networks for robust speech recognition," in Proc. 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, Scottsdale, AZ, USA, 2015, pp. 317-323. https://www.sri.com/sites/default/files/publications/time-frequency_convolutional_networks_for_robust_speech_recognition.pdf
[13] V. Peddinti, D. Povey, and S. Khudanpur, "A time delay neural network architecture for efficient modeling of long temporal contexts," in Proc. Interspeech, Dresden, Germany, 2015, pp. 3214-3218. http://www.isca-speech.org/archive/interspeech_2015/papers/i15_3214.pdf
[14] T. Sercu, C. Puhrsch, B. Kingsbury, and Y. LeCun, "Very deep multilingual convolutional neural networks for LVCSR," in Proc. 2016 IEEE Int. Conf. Acoustics, Speech and Signal Processing, Shanghai, China, 2016, pp. 4955-4959. http://cims.nyu.edu/~ts2387/talks/sercu_icassp16_verydeepCNN.pdf
[15] D. Amodei, R. Anubhai, E. Battenberg, C. Case, J. Casper, B. Catanzaro, J. D. Chen, M. Chrzanowski, A. Coates, G. Diamos, E. Elsen, J. Engel, L. X. Fan, C. Fougner, T. Han, A. Hannun, B. Jun, P. LeGresley, L. Lin, S. Narang, A. Ng, S. Ozair, R. Prenger, J. Raiman, S. Satheesh, D. Seetapun, S. Sengupta, Y. Wang, Z. Q. Wang, C. Wang, B. Xiao, D. N. Yogatama, J. Zhan, and Z. Y. Zhu, "Deep Speech 2: End-to-end speech recognition in English and Mandarin," arXiv:1512.02595, 2015. https://www.researchgate.net/publication/286513561_Deep_Speech_2_End-to-End_Speech_Recognition_in_English_and_Mandarin
[16] S. L. Zhang, C. Liu, H. Jiang, S. Wei, L. R. Dai, and Y. Hu, "Feedforward sequential memory networks: A new structure to learn long-term dependency," arXiv:1512.08301, 2015. http://arxiv.org/pdf/1512.08301
[17] D. Yu, W. Xiong, J. Droppo, A. Stolcke, G. L. Ye, J. Li, and G. Zweig, "Deep convolutional neural networks with layer-wise context expansion and attention," in Proc. Interspeech, San Francisco, USA, 2016. http://www.isca-speech.org/archive/Interspeech_2016/pdfs/0251.PDF
[18] H. Soltau, H. Liao, and H. Sak, "Neural speech recognizer: Acoustic-to-word LSTM model for large vocabulary speech recognition," arXiv:1610.09975, 2016. https://www.researchgate.net/publication/309572693_Neural_Speech_Recognizer_Acoustic-to-Word_LSTM_Model_for_Large_Vocabulary_Speech_Recognition
[19] W. Xiong, J. Droppo, X. Huang, F. Seide, M. Seltzer, A. Stolcke, D. Yu, and G. Zweig, "Achieving human parity in conversational speech recognition," arXiv:1610.05256, 2016. https://www.researchgate.net/publication/309207213_Achieving_Human_Parity_in_Conversational_Speech_Recognition
[20] D. Yu, M. Kolbaek, Z. H. Tan, and J. Jensen, "Permutation invariant training of deep models for speaker-independent multi-talker speech separation," arXiv:1607.00325, 2017. https://www.researchgate.net/publication/304758178_Permutation_Invariant_Training_of_Deep_Models_for_Speaker-Independent_Multi-talker_Speech_Separation
[21] M. Kolbaek, D. Yu, Z. H. Tan, and J. Jensen, "Multi-talker speech separation and tracing with permutation invariant training of deep recurrent neural networks," arXiv:1703.06284, 2017. https://www.researchgate.net/publication/315455037_Multi-talker_Speech_Separation_and_Tracing_with_Permutation_Invariant_Training_of_Deep_Recurrent_Neural_Networks
[22] D. Yu and L. Deng, Automatic Speech Recognition: A Deep Learning Approach. London: Springer, 2015. http://www.worldcat.org/title/automatic-speech-recognition-a-deep-learning-approach/oclc/895161787
[23] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735-1780, Nov. 1997.
[24] A. Graves, A. R. Mohamed, and G. Hinton, "Speech recognition with deep recurrent neural networks," in Proc. 2013 IEEE Int. Conf. Acoustics, Speech and Signal Processing, Vancouver, BC, Canada, 2013, pp. 6645-6649. http://www.cs.toronto.edu/~fritz/absps/RNN13.pdf
[25] X. G. Li and X. H. Wu, "Constructing long short-term memory based deep recurrent neural networks for large vocabulary speech recognition," in Proc. 2015 IEEE Int. Conf. Acoustics, Speech and Signal Processing, Brisbane, QLD, Australia, 2015, pp. 4520-4524. https://www.researchgate.net/publication/267046395_Constructing_Long_Short-Term_Memory_based_Deep_Recurrent_Neural_Networks_for_Large_Vocabulary_Speech_Recognition
[26] Y. J. Miao and F. Metze, "On speaker adaptation of long short-term memory recurrent neural networks," in Proc. Interspeech, Dresden, Germany, 2015, pp. 1101-1105. https://www.cs.cmu.edu/~ymiao/pub/is2015_lstm.pdf
[27] Y. J. Miao, J. Li, Y. Q. Wang, S. X. Zhang, and Y. F. Gong, "Simplifying long short-term memory acoustic models for fast training and decoding," in Proc. 2016 IEEE Int. Conf. Acoustics, Speech and Signal Processing, Shanghai, China, 2016. https://www.researchgate.net/publication/304372410_Simplifying_long_short-term_memory_acoustic_models_for_fast_training_and_decoding
[28] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, "Empirical evaluation of gated recurrent neural networks on sequence modeling," arXiv:1412.3555, 2014. https://www.researchgate.net/publication/269416998_Empirical_Evaluation_of_Gated_Recurrent_Neural_Networks_on_Sequence_Modeling
[29] Y. Zhang, G. G. Chen, D. Yu, K. S. Yao, S. Khudanpur, and J. Glass, "Highway long short-term memory RNNs for distant speech recognition," in Proc. 2016 IEEE Int. Conf. Acoustics, Speech and Signal Processing, Shanghai, China, 2016. https://www.researchgate.net/publication/304372363_Highway_long_short-term_memory_RNNS_for_distant_speech_recognition
[30] Y. Y. Zhao, S. Xu, and B. Xu, "Multidimensional residual learning based on recurrent neural networks for acoustic modeling," in Proc. Interspeech, San Francisco, USA, 2016, pp. 3419-3423. https://www.researchgate.net/publication/307889265_Multidimensional_Residual_Learning_Based_on_Recurrent_Neural_Networks_for_Acoustic_Modeling
[31] J. Kim, M. El-Khamy, and J. Lee, "Residual LSTM: Design of a deep recurrent architecture for distant speech recognition," arXiv:1701.03360, 2017. https://www.researchgate.net/publication/312283320_Residual_LSTM_Design_of_a_Deep_Recurrent_Architecture_for_Distant_Speech_Recognition
[32] K. He, X. Y. Zhang, S. Q. Ren, and J. Sun, "Deep residual learning for image recognition," arXiv:1512.03385, 2015. https://www.researchgate.net/publication/286512696_Deep_Residual_Learning_for_Image_Recognition
[33] A. R. Mohamed, G. Hinton, and G. Penn, "Understanding how deep belief networks perform acoustic modelling," in Proc. 2012 IEEE Int. Conf. Acoustics, Speech and Signal Processing, Kyoto, Japan, 2012, pp. 4273-4276. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.224.2314&rep=rep1&type=pdf
[34] J. Li, D. Yu, J. T. Huang, and Y. F. Gong, "Improving wideband speech recognition using mixed-bandwidth training data in CD-DNN-HMM," in Proc. 2012 IEEE Spoken Language Technology Workshop, Miami, FL, USA, 2012, pp. 131-136. https://www.researchgate.net/publication/261421822_Improving_wideband_speech_recognition_using_mixed-bandwidth_training_data_in_CD-DNN-HMM
[35] J. Li, A. Mohamed, G. Zweig, and Y. F. Gong, "LSTM time and frequency recurrence for automatic speech recognition," in Proc. 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, Scottsdale, AZ, USA, 2015. https://www.researchgate.net/publication/300412435_LSTM_time_and_frequency_recurrence_for_automatic_speech_recognition
[36] J. Li, A. Mohamed, G. Zweig, and Y. F. Gong, "Exploring multidimensional LSTMs for large vocabulary ASR," in Proc. 2016 IEEE Int. Conf. Acoustics, Speech and Signal Processing, Shanghai, China, 2016. https://www.researchgate.net/publication/304372654_Exploring_multidimensional_lstms_for_large_vocabulary_ASR
[37] T. N. Sainath and B. Li, "Modeling time-frequency patterns with LSTM vs. convolutional architectures for LVCSR tasks," in Proc. Interspeech, San Francisco, USA, 2016. http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45401.pdf
[38] N. Kalchbrenner, I. Danihelka, and A. Graves, "Grid long short-term memory," arXiv:1507.01526, 2015. https://www.researchgate.net/publication/279864537_Grid_Long_Short-Term_Memory
[39] W. N. Hsu, Y. Zhang, and J. Glass, "A prioritized grid long short-term memory RNN for speech recognition," in Proc. 2016 IEEE Spoken Language Technology Workshop (SLT), San Diego, CA, USA, 2016, pp. 467-473. http://people.csail.mit.edu/jrg/2016/Wei-Ning-SLT-16.pdf
[40] A. Graves and J. Schmidhuber, "Framewise phoneme classification with bidirectional LSTM and other neural network architectures," Neural Netw., vol. 18, no. 5-6, pp. 602-610, Jul.-Aug. 2005.
[41] S. F. Xue and Z. J. Yan, "Improving latency-controlled BLSTM acoustic models for online speech recognition," in Proc. 2017 IEEE Int. Conf. Acoustics, Speech and Signal Processing, New Orleans, USA, 2017.
[42] Y. LeCun and Y. Bengio, "Convolutional networks for images, speech, and time-series," in The Handbook of Brain Theory and Neural Networks, M. A. Arbib, Ed. Cambridge: MIT Press, 1995.
[43] K. J. Lang, A. H. Waibel, and G. E. Hinton, "A time-delay neural network architecture for isolated word recognition," Neural Netw., vol. 3, no. 1, pp. 23-43, Dec. 1990.
[44] O. Abdel-Hamid, L. Deng, and D. Yu, "Exploring convolutional neural network structures and optimization techniques for speech recognition," in Proc. Interspeech, Lyon, France, 2013, pp. 3366-3370. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.703.648
[45] T. N. Sainath, A. R. Mohamed, B. Kingsbury, and B. Ramabhadran, "Deep convolutional neural networks for LVCSR," in Proc. 2013 IEEE Int. Conf. Acoustics, Speech and Signal Processing, Vancouver, BC, Canada, 2013, pp. 8614-8618. https://www.researchgate.net/publication/261153442_Deep_convolutional_neural_networks_for_LVCSR
[46] T. Sercu and V. Goel, "Dense prediction on sequences with time-dilated convolutions for speech recognition," arXiv:1611.09288, 2016. https://www.researchgate.net/publication/311066943_Dense_Prediction_on_Sequences_with_Time-Dilated_Convolutions_for_Speech_Recognition
[47] L. Tóth, "Modeling long temporal contexts in convolutional neural network-based phone recognition," in Proc. 2015 IEEE Int. Conf. Acoustics, Speech and Signal Processing, Brisbane, QLD, Australia, 2015, pp. 4575-4579. https://www.researchgate.net/publication/308862096_Modeling_long_temporal_contexts_in_convolutional_neural_network-based_phone_recognition
[48] T. Zhao, Y. X. Zhao, and X. Chen, "Time-frequency kernel-based CNN for speech recognition," in Proc. Interspeech, Dresden, Germany, 2015. https://www.researchgate.net/publication/293653051_Time-frequency_kernel_based_CNN_for_speech_recognition
[49] N. Jaitly and G. Hinton, "Learning a better representation of speech soundwaves using restricted Boltzmann machines," in Proc. 2011 IEEE Int. Conf. Acoustics, Speech and Signal Processing, Prague, Czech Republic, 2011, pp. 5884-5887. http://www.academia.edu/11686402/Learning_a_better_representation_of_speech_soundwaves_using_restricted_boltzmann_machines
[50] D. Palaz, R. Collobert, and M. Magimai-Doss, "Estimating phoneme class conditional probabilities from raw speech signal using convolutional neural networks," in Proc. Interspeech, Lyon, France, 2013. https://www.researchgate.net/publication/237145551_Estimating_Phoneme_Class_Conditional_Probabilities_from_Raw_Speech_Signal_using_Convolutional_Neural_Networks
[51] Z. Tüske, P. Golik, R. Schlüter, and H. Ney, "Acoustic modeling with deep neural networks using raw time signal for LVCSR," in Proc. Interspeech, Singapore, 2014, pp. 890-894. http://www.academia.edu/18376509/Acoustic_Modeling_with_Deep_Neural_Networks_Using_Raw_Time_Signal_for_LVCSR
[52] T. N. Sainath, R. J. Weiss, A. W. Senior, K. W. Wilson, and O. Vinyals, "Learning the speech front-end with raw waveform CLDNNs," in Proc. Interspeech, Dresden, Germany, 2015, pp. 1-5. http://www.ee.columbia.edu/~ronw/pubs/interspeech2015-waveform_cldnn.pdf
[53] H. Dinkel, N. X. Chen, Y. M. Qian, and K. Yu, "End-to-end spoofing detection with raw waveform CLDNNs," in Proc. 2017 IEEE Int. Conf. Acoustics, Speech and Signal Processing, New Orleans, USA, 2017. https://arxiv.org/pdf/1610.00564v1
[54] T. Yoshioka, N. Ito, M. Delcroix, A. Ogawa, K. Kinoshita, M. Fujimoto, C. Z. Yu, W. J. Fabian, M. Espi, T. Higuchi, S. Araki, and T. Nakatani, "The NTT CHiME-3 system: Advances in speech enhancement and recognition for mobile multi-microphone devices," in Proc. 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), Scottsdale, AZ, USA, 2015, pp. 436-443.
[55] X. Xiao, S. Watanabe, H. Erdogan, L. Lu, J. Hershey, M. L. Seltzer, G. G. Chen, Y. Zhang, M. Mandel, and D. Yu, "Deep beamforming networks for multi-channel speech recognition," in Proc. 2016 IEEE Int. Conf. Acoustics, Speech and Signal Processing, Shanghai, China, 2016, pp. 5745-5749.
[56] T. N. Sainath, R. J. Weiss, K. W. Wilson, A. Narayanan, M. Bacchiani et al., "Speaker location and microphone spacing invariant acoustic modeling from raw multichannel waveforms," in Proc. 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), Scottsdale, AZ, USA, 2015, pp. 30-36. http://www.ee.columbia.edu/~ronw/pubs/asru2015-multichannel_cldnn.pdf
[57] T. N. Sainath, R. J. Weiss, K. W. Wilson, A. Narayanan, and M. Bacchiani, "Factored spatial and spectral multichannel raw waveform CLDNNs," in Proc. 2016 IEEE Int. Conf. Acoustics, Speech and Signal Processing, Shanghai, China, 2016, pp. 5075-5079. http://www.ee.columbia.edu/~ronw/pubs/icassp2016-factored_cldnn.pdf
[58] T. N. Sainath, R. J. Weiss, K. W. Wilson, B. Li, A. Narayanan, E. Variani, M. Bacchiani, I. Shafran, A. Senior, K. W. Chin, A. Misra, and C. Kim, "Multichannel signal processing with deep neural networks for automatic speech recognition," IEEE/ACM Trans. Audio Speech Lang. Processing, vol. 25, no. 5, pp. 965-979, May 2017.
[59] E. Variani, T. N. Sainath, I. Shafran, and M. Bacchiani, "Complex linear projection (CLP): A discriminative approach to joint feature extraction and acoustic modeling," in Proc. Interspeech, San Francisco, USA, 2016, pp. 808-812.
[60] H. Sak, A. Senior, K. Rao, and F. Beaufays, "Fast and accurate recurrent neural network acoustic models for speech recognition," in Proc. Interspeech, Dresden, Germany, 2015. http://www.isca-speech.org/archive/interspeech_2015/papers/i15_1468.pdf
[61] A. Senior, H. Sak, F. de Chaumont Quitry, T. Sainath, and K. Rao, "Acoustic modelling with CD-CTC-SMBR LSTM RNNs," in Proc. 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), Scottsdale, AZ, USA, 2015, pp. 604-609. https://www.researchgate.net/publication/304407558_Acoustic_modelling_with_CD-CTC-SMBR_LSTM_RNNS
[62] A. Graves, S. Fernandez, F. Gomez, and J. Schmidhuber, "Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks," in Proc. 23rd Int. Conf. Machine Learning, Pittsburgh, PA, USA, 2006, pp. 369-376.
[63] A. Hannun, C. Case, J. Casper, B. Catanzaro, G. Diamos, E. Elsen, R. Prenger, S. Satheesh, S. Sengupta, A. Coates, and A. Y. Ng, "Deep Speech: Scaling up end-to-end speech recognition," arXiv:1412.5567, 2014. https://www.researchgate.net/publication/269722411_DeepSpeech_Scaling_up_end-to-end_speech_recognition
[64] Y. Miao, M. Gowayyed, and F. Metze, "EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding," in Proc. 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), Scottsdale, AZ, USA, 2015, pp. 167-174. https://www.researchgate.net/publication/280589906_EESEN_End-to-end_speech_recognition_using_deep_RNN_models_and_WFST-based_decoding
[65] Y. J. Miao, M. Gowayyed, X. Y. Na, T. Ko, F. Metze, and A. Waibel, "An empirical exploration of CTC acoustic models," in Proc. 2016 IEEE Int. Conf. Acoustics, Speech and Signal Processing, Shanghai, China, 2016, pp. 2623-2627. https://www.cs.cmu.edu/~ymiao/pub/icassp2016_ctc.pdf
[66] K. Rao and H. Sak, "Multi-accent speech recognition with hierarchical grapheme based models," in Proc. 2017 IEEE Int. Conf. Acoustics, Speech and Signal Processing, New Orleans, USA, 2017.
[67] G. Zweig, C. Z. Yu, J. Droppo, and A. Stolcke, "Advances in all-neural speech recognition," in Proc. 2017 IEEE Int. Conf. Acoustics, Speech and Signal Processing, New Orleans, USA, 2017.
[68] H. R. Liu, Z. Y. Zhu, X. G. Li, and S. Satheesh, "Gram-CTC: Automatic unit selection and target decomposition for sequence labelling," arXiv:1703.00096, 2017. https://www.researchgate.net/publication/314153409_Gram-CTC_Automatic_Unit_Selection_and_Target_Decomposition_for_Sequence_Labelling
[69] Z. H. Chen, Y. M. Zhuang, Y. M. Qian, and K. Yu, "Phone synchronous speech recognition with CTC lattices," IEEE/ACM Trans. Audio Speech Lang. Processing, vol. 25, no. 1, pp. 90-101, Jan. 2017. doi: 10.1109/TASLP.2016.2625459
[70] D. Povey, V. Peddinti, D. Galvez, P. Ghahremani, V. Manohar, X. Y. Na, Y. M. Wang, and S. Khudanpur, "Purely sequence-trained neural networks for ASR based on lattice-free MMI," in Proc. Interspeech, San Francisco, USA, 2016. http://www.isca-speech.org/archive/Interspeech_2016/pdfs/0595.PDF
[71] D. Bahdanau, J. Chorowski, D. Serdyuk, P. Brakel, and Y. Bengio, "End-to-end attention-based large vocabulary speech recognition," in Proc. 2016 IEEE Int. Conf. Acoustics, Speech and Signal Processing, Shanghai, China, 2016, pp. 4945-4949. http://mirlab.org/conference_papers/International_Conference/ICASSP%202016/pdfs/0004945.pdf
[72] W. Chan, N. Jaitly, Q. Le, and O. Vinyals, "Listen, attend and spell: A neural network for large vocabulary conversational speech recognition," in Proc. 2016 IEEE Int. Conf. Acoustics, Speech and Signal Processing, Shanghai, China, 2016, pp. 4960-4964.
[73] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," arXiv:1409.0473, 2014.
[74] V. Mnih, N. Heess, A. Graves, and K. Kavukcuoglu, "Recurrent models of visual attention," in Advances in Neural Information Processing Systems 27, Montreal, Canada, 2014, pp. 2204-2212.
[75] K. Cho, B. Van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, "Learning phrase representations using RNN encoder-decoder for statistical machine translation," arXiv:1406.1078, 2014. http://www.academia.edu/11462276/Learning_Phrase_Representations_using_RNN_Encoder-Decoder_for_Statistical_Machine_Translation
[76] S. Kim, T. Hori, and S. Watanabe, "Joint CTC-attention based end-to-end speech recognition using multi-task learning," in Proc. 2017 IEEE Int. Conf. Acoustics, Speech and Signal Processing, New Orleans, USA, 2017. https://www.merl.com/publications/docs/TR2017-016.pdf
[77] Y. Zhang, W. Chan, and N. Jaitly, "Very deep convolutional networks for end-to-end speech recognition," in Proc. 2017 IEEE Int. Conf. Acoustics, Speech and Signal Processing, New Orleans, USA, 2017. https://www.researchgate.net/publication/308981069_Very_Deep_Convolutional_Networks_for_End-to-End_Speech_Recognition
[78] J. Li, L. Deng, Y. F. Gong, and R. Haeb-Umbach, "An overview of noise-robust automatic speech recognition," IEEE/ACM Trans. Audio Speech Lang. Processing, vol. 22, no. 4, pp. 745-777, Apr. 2014.
[79] J. Li, L. Deng, R. Haeb-Umbach, and Y. F. Gong, Robust Automatic Speech Recognition: A Bridge to Practical Applications. Waltham: Academic Press, 2015.
[80] F. Seide, G. Li, X. Chen, and D. Yu, "Feature engineering in context-dependent deep neural networks for conversational speech transcription," in Proc. 2011 IEEE Workshop on Automatic Speech Recognition and Understanding, Waikoloa, HI, USA, 2011, pp. 24-29. https://www.researchgate.net/publication/239765773_Feature_engineering_in_Context-Dependent_Deep_Neural_Networks_for_conversational_speech_transcription
[81] H. Liao, "Speaker adaptation of context dependent deep neural networks," in Proc. 2013 IEEE Int. Conf. Acoustics, Speech and Signal Processing, Vancouver, BC, Canada, 2013, pp. 7947-7951.
[82] D. Yu, K. S. Yao, H. Su, G. Li, and F. Seide, "KL-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition," in Proc. 2013 IEEE Int. Conf. Acoustics, Speech and Signal Processing, Vancouver, BC, Canada, 2013, pp. 7893-7897. https://www.researchgate.net/publication/261194718_KL-divergence_regularized_deep_neural_network_adaptation_for_improved_large_vocabulary_speech_recognition
[83] Z. Huang, J. Li, S. M. Siniscalchi, I. F. Chen, J. Wu, and C. H. Lee, "Rapid adaptation for deep neural networks through multi-task learning," in Proc. Interspeech, Dresden, Germany, 2015, pp. 3625-3629.
[84] J. Xue, J. Li, D. Yu, M. Seltzer, and Y. F. Gong, "Singular value decomposition based low-footprint speaker adaptation and personalization for deep neural network," in Proc. 2014 IEEE Int. Conf. Acoustics, Speech and Signal Processing, Florence, Italy, 2014, pp. 6359-6363. https://www.researchgate.net/publication/269295270_Singular_value_decomposition_based_low-footprint_speaker_adaptation_and_personalization_for_deep_neural_network
[85] J. Xue, J. Li, and Y. F. Gong, "Restructuring of deep neural network acoustic models with singular value decomposition," in Proc. Interspeech, Lyon, France, 2013, pp. 2365-2369. http://www.microsoft.com/en-us/research/wp-content/uploads/2013/01/svd_v2.pdf
[86] P. Swietojanski and S. Renals, "Learning hidden unit contributions for unsupervised speaker adaptation of neural network acoustic models," in Proc. 2014 IEEE Spoken Language Technology Workshop, South Lake Tahoe, NV, USA, 2014. http://www.cstr.ed.ac.uk/downloads/publications/2014/ps-slt14.pdf
[87] P. Swietojanski, J. Li, and S. Renals, "Learning hidden unit contributions for unsupervised acoustic model adaptation," IEEE/ACM Trans. Audio Speech Lang. Processing, vol. 24, no. 8, pp. 1450-1463, Aug. 2016.
[88] Y. Zhao, J. Li, J. Xue, and Y. F. Gong, "Investigating online low-footprint speaker adaptation using generalized linear regression and click-through data," in Proc. 2015 IEEE Int. Conf. Acoustics, Speech and Signal Processing, Brisbane, QLD, Australia, 2015, pp. 4310-4314. https://www.researchgate.net/publication/285612590_Investigating_online_low-footprint_speaker_adaptation_using_generalized_linear_regression_and_click-through_data
[89] G. Saon, H. Soltau, D. Nahamoo, and M. Picheny, "Speaker adaptation of neural network acoustic models using i-vectors," in Proc. 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, Olomouc, Czech Republic, 2013, pp. 55-59. https://www.researchgate.net/profile/George_Saon/publication/261485126_Speaker_adaptation_of_neural_network_acoustic_models_using_i-vectors/links/558d70f108ae15962d8939c7.pdf
[90] A. Senior and I. Lopez-Moreno, "Improving DNN speaker independence with i-vector inputs," in Proc. 2014 IEEE Int. Conf. Acoustics, Speech and Signal Processing, Florence, Italy, 2014, pp. 225-229. https://www.researchgate.net/publication/269294930_Improving_DNN_speaker_independence_with_I-vector_inputs
[91] O. Abdel-Hamid and H. Jiang, "Fast speaker adaptation of hybrid NN/HMM model for speech recognition based on discriminative learning of speaker code," in Proc. 2013 IEEE Int. Conf. Acoustics, Speech and Signal Processing, Vancouver, BC, Canada, 2013, pp. 7942-7946. https://www.researchgate.net/publication/261500509_Fast_speaker_adaptation_of_hybrid_NNHMM_model_for_speech_recognition_based_on_discriminative_learning_of_speaker_code
[92] M. L. Seltzer, D. Yu, and Y. Q. Wang, "An investigation of deep neural networks for noise robust speech recognition," in Proc. 2013 IEEE Int. Conf. Acoustics, Speech and Signal Processing, Vancouver, BC, Canada, 2013, pp. 7398-7402. https://www.researchgate.net/publication/261125912_An_investigation_of_deep_neural_networks_for_noise_robust_speech_recognition
[93] D. Yu and L. Deng, "Adaptation of deep neural networks," in Automatic Speech Recognition, D. Yu and L. Deng, Eds. London: Springer, 2015, pp. 193-215.
[94] Y. J. Miao, H. Zhang, and F. Metze, "Towards speaker adaptive training of deep neural network acoustic models," in Proc. Interspeech, Singapore, 2014. http://repository.cmu.edu/cgi/viewcontent.cgi?article=1068&context=lti
[95] J. Li, J. T. Huang, and Y. F. Gong, "Factorized adaptation for deep neural network," in Proc. 2014 IEEE Int. Conf. Acoustics, Speech and Signal Processing, Florence, Italy, 2014. https://www.researchgate.net/publication/271468476_Factorized_adaptation_for_deep_neural_network
[96] T. Tan, Y. M. Qian, M. F. Yin, Y. M. Zhuang, and K. Yu, "Cluster adaptive training for deep neural network," in Proc. 2015 IEEE Int. Conf. Acoustics, Speech and Signal Processing, Brisbane, QLD, Australia, 2015, pp. 4325-4329.
[97] C. Y. Wu and M. J. F. Gales, "Multi-basis adaptive neural network for rapid adaptation in speech recognition," in Proc. 2015 IEEE Int. Conf. Acoustics, Speech and Signal Processing, Brisbane, QLD, Australia, 2015, pp. 4315-4319. https://www.researchgate.net/publication/285612709_Multi-basis_adaptive_neural_network_for_rapid_adaptation_in_speech_recognition
[98] L. Samarakoon and K. C. Sim, "Factorized hidden layer adaptation for deep neural network based acoustic modeling," IEEE/ACM Trans. Audio Speech Lang. Processing, vol. 24, no. 12, pp. 2241-2250, Dec. 2016. doi: 10.1109/TASLP.2016.2601146
[99] L. Samarakoon, K. C. Sim, and B. Mak, "An investigation into learning effective speaker subspaces for robust unsupervised DNN adaptation," in Proc. 2017 IEEE Int. Conf. Acoustics, Speech and Signal Processing, New Orleans, USA, 2017.
[100] R. Kuhn, P. Nguyen, J. C. Junqua, L. Goldwasser, N. Niedzielski, S. Fincke, K. L. Field, and M. Contolini, "Eigenvoices for speaker adaptation," in Proc. 5th Int. Conf. Spoken Language Processing (ICSLP), Sydney, Australia, 1998, pp. 1774-1777. https://wiki.inf.ed.ac.uk/twiki/pub/CSTR/ListenSemester1_2007_8/kuhn-junqua-eigenvoice-icslp1998.pdf
[101] M. J. F. Gales, "Cluster adaptive training for speech recognition," in Proc. 5th Int. Conf. Spoken Language Processing (ICSLP), Sydney, Australia, 1998, Article ID 0375.
[102] M. Delcroix, K. Kinoshita, T. Hori, and T. Nakatani, "Context adaptive deep neural networks for fast acoustic model adaptation," in Proc. 2015 IEEE Int. Conf. Acoustics, Speech and Signal Processing, Brisbane, QLD, Australia, 2015, pp. 4535-4539. http://ieeexplore.ieee.org/document/7178829/
[103] M. Delcroix, K. Kinoshita, C. Z. Yu, A. Ogawa, T. Yoshioka, and T. Nakatani, "Context adaptive deep neural networks for fast acoustic model adaptation in noisy conditions," in Proc. 2016 IEEE Int. Conf. Acoustics, Speech and Signal Processing, Shanghai, China, 2016, pp. 5270-5274. https://www.researchgate.net/publication/304372396_Context_adaptive_deep_neural_networks_for_fast_acoustic_model_adaptation_in_noisy_conditions
[104] Y. Zhao, J. Li, K. Kumar, and Y. Gong, "Extended low-rank plus diagonal adaptation for deep and recurrent neural networks," in Proc. 2017 IEEE Int. Conf. Acoustics, Speech and Signal Processing, New Orleans, USA, 2017.
[105] M. Cooke, J. R. Hershey, and S. J. Rennie, "Monaural speech separation and recognition challenge," Comput. Speech Lang., vol. 24, no. 1, pp. 1-15, Jan. 2010.
[106] C. Weng, D. Yu, M. L. Seltzer, and J. Droppo, "Deep neural networks for single-channel multi-talker speech recognition," IEEE/ACM Trans. Audio Speech Lang. Processing, vol. 23, no. 10, pp. 1670-1679, Oct. 2015.
[107] Y. X. Wang, A. Narayanan, and D. L. Wang, "On training targets for supervised speech separation," IEEE/ACM Trans. Audio Speech Lang. Processing, vol. 22, no. 12, pp. 1849-1858, Dec. 2014.
[108] Y. Xu, J. Du, L. R. Dai, and C. H. Lee, "An experimental study on speech enhancement based on deep neural networks," IEEE Signal Processing Lett., vol. 21, no. 1, pp. 65-68, Jan. 2014.
[109] F. Weninger, H. Erdogan, S. Watanabe, E. Vincent, J. Le Roux, J. R. Hershey, and B. Schuller, "Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR," in Latent Variable Analysis and Signal Separation (LVA/ICA 2015), Lecture Notes in Computer Science, E. Vincent, A. Yeredor, Z. Koldovský, and P. Tichavský, Eds. Cham: Springer, 2015, pp. 91-99. https://www.researchgate.net/publication/278747057_Speech_enhancement_with_LSTM_recurrent_neural_networks_and_its_application_to_noise-robust_ASR
[110] P. S. Huang, M. Kim, M. Hasegawa-Johnson, and P. Smaragdis, "Joint optimization of masks and deep recurrent neural networks for monaural source separation," IEEE/ACM Trans. Audio Speech Lang. Processing, vol. 23, no. 12, pp. 2136-2147, Dec. 2015.
[111] J. R. Hershey, Z. Chen, J. Le Roux, and S. Watanabe, "Deep clustering: Discriminative embeddings for segmentation and separation," in Proc. 2016 IEEE Int. Conf. Acoustics, Speech and Signal Processing, Shanghai, China, 2016, pp. 31-35. http://labrosa.ee.columbia.edu/cuneuralnet/chen111815.pdf
[112] Y. Isik, J. Le Roux, Z. Chen, S. Watanabe, and J. R. Hershey, "Single-channel multi-speaker separation using deep clustering," in Proc. Interspeech, San Francisco, USA, 2016, pp. 545-549. https://www.merl.com/publications/docs/TR2016-073.pdf
[113] M. Cooke, Modelling Auditory Processing and Organisation. Cambridge: Cambridge Univ. Press, 2005.
[114] D. P. Ellis, "Prediction-driven computational auditory scene analysis," Ph.D. dissertation, Massachusetts Inst. Technol., Cambridge, MA, 1996. http://academiccommons.columbia.edu/download/fedora_content/download/ac:144539/CONTENT/dpwe-phd-thesis.pdf
[115] M. Wertheimer, "Laws of organization in perceptual forms," in A Source Book of Gestalt Psychology, W. D. Ellis, Ed. London: Kegan Paul, Trench, Trubner & Company, 1938.
[116] M. N. Schmidt and R. K. Olsson, "Single-channel speech separation using sparse non-negative matrix factorization," in Proc. 9th Int. Conf. Spoken Language Processing (ICSLP), Pittsburgh, PA, USA, 2006. https://www.researchgate.net/publication/221491907_Single-channel_speech_separation_using_sparse_non-negative_matrix_factorization
[117] P. Smaragdis, "Convolutive speech bases and their application to supervised speech separation," IEEE Trans. Audio Speech Lang. Processing, vol. 15, no. 1, pp. 1-12, Jan. 2007.
[118] J. Le Roux, F. Weninger, and J. Hershey, "Sparse NMF - half-baked or well done?," Mitsubishi Electr. Res. Labs (MERL), Cambridge, MA, USA, Tech. Rep. TR2015-023, Mar. 2015.
[119] T. T. Kristjansson, J. R. Hershey, P. A. Olsen, S. J. Rennie, and R. A. Gopinath, "Super-human multi-talker speech recognition: The IBM 2006 speech separation challenge system," in Proc. 9th Int. Conf. Spoken Language Processing (ICSLP), Pittsburgh, PA, USA, 2006, Article ID 1775-Mon1WeS.7. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.116.4496
[120] T. Virtanen, "Speech recognition using factorial hidden Markov models for separation in the feature space," in Proc. 9th Int. Conf. Spoken Language Processing (ICSLP), Pittsburgh, PA, USA, 2006. https://www.researchgate.net/publication/221489983_Speech_recognition_using_factorial_hidden_Markov_models_for_separation_in_the_feature_space
[121] R. J. Weiss and D. P. W. Ellis, "Monaural speech separation using source-adapted models," in Proc. 2007 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, USA, 2007, pp. 114-117. https://www.researchgate.net/publication/4295684_Monaural_Speech_Separation_using_Source-Adapted_Models
[122] Z. Ghahramani and M. I. Jordan, "Factorial hidden Markov models," Mach. Learn., vol. 29, no. 2-3, pp. 245-273, Nov. 1997.
[123] Z. Chen, Y. Luo, and N. Mesgarani, "Deep attractor network for single-microphone speaker separation," in Proc. 2017 IEEE Int. Conf. Acoustics, Speech and Signal Processing, New Orleans, USA, 2017. https://arxiv.org/pdf/1611.08930v1
[124] D. Yu, X. Chang, and Y. M. Qian, "Recognizing multi-talker speech with permutation invariant training," in Proc. Interspeech, Stockholm, Sweden, 2017. https://www.researchgate.net/publication/315835420_Recognizing_Multi-talker_Speech_with_Permutation_Invariant_Training
[125] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Proc. 27th Int. Conf. Neural Information Processing Systems, Montreal, Canada, 2014, pp. 2672-2680.
[126] Y. Shinohara, "Adversarial multi-task learning of deep neural networks for robust speech recognition," in Proc. Interspeech, San Francisco, USA, 2016, pp. 2369-2372. http://www.isca-speech.org/archive/Interspeech_2016/pdfs/0879.PDF
[127] D. Serdyuk, K. Audhkhasi, P. Brakel, B. Ramabhadran, S. Thomas, and Y. Bengio, "Invariant representations for noisy speech recognition," arXiv:1612.01928, 2016. https://www.researchgate.net/publication/311458959_Invariant_Representations_for_Noisy_Speech_Recognition
[128] S. N. Sun, B. B. Zhang, L. Xie, and Y. N. Zhang, "An unsupervised deep domain adaptation approach for robust speech recognition," Neurocomputing, to be published. http://www.sciencedirect.com/science/article/pii/S0925231217301492
[129] Y. Ganin and V. Lempitsky, "Unsupervised domain adaptation by backpropagation," arXiv:1409.7495, 2014. https://www.researchgate.net/publication/266204110_Unsupervised_Domain_Adaptation_by_Backpropagation
[130] R. Lippmann, E. Martin, and D. Paul, "Multi-style training for robust isolated-word speech recognition," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP'87), Dallas, TX, USA, 1987, pp. 705-708. https://www.researchgate.net/publication/224737757_Multi-style_training_for_robust_isolated-word_speech_recognition
[131] T. Ko, V. Peddinti, D. Povey, M. L. Seltzer, and S. Khudanpur, "A study on data augmentation of reverberant speech for robust speech recognition," in Proc. 2017 IEEE Int. Conf. Acoustics, Speech and Signal Processing, New Orleans, USA, 2017.
[132] J. Li, M. L. Seltzer, X. Wang, R. Zhao, and Y. Gong, "Large-scale domain adaptation via teacher-student learning," in Proc. Interspeech, Stockholm, Sweden, 2017.
[133] G. Hinton, O. Vinyals, and J. Dean, "Distilling the knowledge in a neural network," arXiv:1503.02531, 2015. https://www.researchgate.net/publication/273387909_Distilling_the_Knowledge_in_a_Neural_Network
[134] K. Markov and T. Matsui, "Robust speech recognition using generalized distillation framework," in Proc. Interspeech, San Francisco, USA, 2016, pp. 2364-2368. https://www.researchgate.net/profile/Konstantin_Markov/publication/307889099_Robust_Speech_Recognition_Using_Generalized_Distillation_Framework/links/57e0c25608aece48e9e20398.pdf
[135] S. Watanabe, T. Hori, J. Le Roux, and J. R. Hershey, "Student-teacher network learning with enhanced features," in Proc. 2017 IEEE Int. Conf. Acoustics, Speech and Signal Processing, New Orleans, USA, 2017. http://www.merl.com/publications/docs/TR2017-011.pdf
[136] Z. Y. Lu, V. Sindhwani, and T. N. Sainath, "Learning compact recurrent neural networks," in Proc. 2016 IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China, 2016, pp. 5960-5964. https://www.researchgate.net/publication/301876248_Learning_Compact_Recurrent_Neural_Networks
[137] R. Prabhavalkar, O. Alsharif, A. Bruguier, and I. McGraw, "On the compression of recurrent neural networks with an application to LVCSR acoustic modeling for embedded speech recognition," in Proc. 2016 IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China, 2016, pp. 5970-5974. http://adsabs.harvard.edu/abs/2016arXiv160308042P
[138] W. Chan, N. R. Ke, and I. Lane, "Transferring knowledge from a RNN to a DNN," arXiv:1504.01483, 2015. https://www.researchgate.net/publication/274730673_Transferring_Knowledge_from_a_RNN_to_a_DNN
[139] K. J. Geras, A. R. Mohamed, R. Caruana, G. Urban, S. J. Wang, O. Aslan, M. Philipose, M. Richardson, and C. Sutton, "Blending LSTMs into CNNs," arXiv:1511.06433, 2015. https://www.researchgate.net/publication/301548752_Blending_LSTMs_into_CNNs
[140] L. Lu, M. Guo, and S. Renals, "Knowledge distillation for small-footprint highway networks," in Proc. 2017 IEEE Int. Conf. Acoustics, Speech and Signal Processing, New Orleans, USA, 2017. https://www.researchgate.net/publication/305780312_Knowledge_Distillation_for_Small-footprint_Highway_Networks
[141] J. Cui, B. Kingsbury, B. Ramabhadran, G. Saon, T. Sercu, K. Audhkhasi, A. Sethy, M. Nussbaum-Thom, and A. Rosenberg, "Knowledge distillation across ensembles of multilingual models for low-resource languages," in Proc. 2017 IEEE Int. Conf. Acoustics, Speech and Signal Processing, New Orleans, USA, 2017. doi: 10.1109/ICASSP.2017.7953073
[142] J. Li, R. Zhao, J. T. Huang, and Y. F. Gong, "Learning small-size DNN with output-distribution-based criteria," in Proc. Interspeech, Singapore, 2014, pp. 1910-1914. https://wiki.inf.ed.ac.uk/twiki/pub/CSTR/ListenTerm1201415/zhao.pdf
[143] W. Xiong, J. Droppo, X. Huang, F. Seide, M. Seltzer, A. Stolcke, D. Yu, and G. Zweig, "Achieving human parity in conversational speech recognition," arXiv:1610.05256, 2016. https://www.researchgate.net/publication/309207213_Achieving_Human_Parity_in_Conversational_Speech_Recognition
[144] G. Saon, G. Kurata, T. Sercu, K. Audhkhasi, S. Thomas, D. Dimitriadis, X. D. Cui, B. Ramabhadran, M. Picheny, L. L. Lim, B. Roomi, and P. Hall, "English conversational telephone speech recognition by humans and machines," arXiv:1703.02136, 2017. https://www.researchgate.net/publication/314283069_English_Conversational_Telephone_Speech_Recognition_by_Humans_and_Machines
[145] V. Vanhoucke, A. Senior, and M. Z. Mao, "Improving the speed of neural networks on CPUs," in Proc. 2011 Deep Learning and Unsupervised Feature Learning NIPS Workshop, Granada, Spain, 2011. https://www.researchgate.net/publication/267429210_Improving_the_speed_of_neural_networks_on_CPUs
[146] R. Alvarez, R. Prabhavalkar, and A. Bakhtin, "On the efficient representation and execution of deep acoustic models," arXiv:1607.04683v1, 2016. https://www.researchgate.net/publication/307889746_On_the_Efficient_Representation_and_Execution_of_Deep_Acoustic_Models
[147] R. Takeda, K. Nakadai, and K. Komatani, "Acoustic model training based on node-wise weight boundary model for fast and small-footprint deep neural networks," Comput. Speech Lang., to be published. http://www.sciencedirect.com/science/article/pii/S0885230816300699
[148] Y. Q. Wang, J. Li, and Y. F. Gong, "Small-footprint high-performance deep neural network-based speech recognition using split-VQ," in Proc. 2015 IEEE Int. Conf. Acoustics, Speech and Signal Processing, Brisbane, QLD, Australia, 2015, pp. 4984-4988. https://www.researchgate.net/publication/308821136_Small-footprint_high-performance_deep_neural_network-based_speech_recognition_using_split-VQ
[149] V. Vanhoucke, M. Devin, and G. Heigold, "Multiframe deep neural networks for acoustic modeling," in Proc. 2013 IEEE Int. Conf. Acoustics, Speech and Signal Processing, Vancouver, BC, Canada, 2013, pp. 7582-7585. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.308.2471&rep=rep1&type=pdf
[150] G. Pundak and T. N. Sainath, "Lower frame rate neural network acoustic models," in Proc. Interspeech, San Francisco, USA, 2016, pp. 22-26. https://www.researchgate.net/publication/307889457_Lower_Frame_Rate_Neural_Network_Acoustic_Models