

Optimization strategies for deep neural network techniques in acoustic modeling for Mandarin speech recognition

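Cite this article:	XIAO Yeming, ZHANG Qingqing, SONG Liming, PAN Jielin, YAN Yonghong. Optimization strategies for deep neural network techniques in acoustic modeling for Mandarin speech recognition[J]. Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), 2014, 26(3): 373-379.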

Authors:	XIAO Yeming (肖业鸣), ZHANG Qingqing (张晴晴), SONG Liming (宋黎明), PAN Jielin (潘接林), YAN Yonghong (颜永红)

Affiliation (all authors):	Key Laboratory of Speech Acoustics and Content Understanding, Chinese Academy of Sciences, Beijing 100190, China

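Funding:	National Natural Science Foundation of China (10925419, 90920302, 61072124, 11074275, 11161140319, 91120001, 61271426); Strategic Priority Research Program of the Chinese Academy of Sciences (XDA06030100, XDA06030500); National High Technology Research and Development Program of China ("863" Program) (2012AA012503); Key Deployment Project of the Chinese Academy of Sciences (KGZD-EW-103-2)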

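Abstract:	Deep neural networks (DNNs) are introduced as the acoustic model in a speech recognition system for Mandarin conversational telephone speech. To address the high character error rate (CER) on spontaneous speech, a series of optimizations to DNN-based acoustic modeling is carried out, covering the choice of acoustic feature type, the tuning of meta-parameters during training, and the improvement of model generalization. Because the distribution of state prior probabilities in the training samples is sparse, a state prior probability smoothing algorithm is proposed; it partially alleviates this data sparsity, and after smoothing the CER drops by more than 1%. On the three conversational telephone speech test sets used, the optimized model consistently outperforms the DNN model before optimization, with an average relative CER reduction of 15%. The experimental results show that the adopted optimization strategies effectively improve the performance of DNN acoustic models.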
Keywords:	deep neural network; speech recognition; hidden Markov model; probability smoothing
Received:	2014-01-14
Revised:	2014-04-15
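
The abstract does not describe the proposed smoothing algorithm itself. The minimal Python sketch below only illustrates where state priors enter a hybrid DNN-HMM system and why sparse priors cause trouble. The DNN outputs state posteriors p(s|x), which are divided by priors p(s) estimated from a frame-level state alignment to obtain scaled likelihoods. A state with a zero or tiny prior therefore produces an undefined or exploding score. The smoothing step shown here, linear interpolation with a uniform distribution controlled by a hypothetical weight `alpha`, is a generic stand-in chosen for illustration and is not the authors' method. The function names are also hypothetical.

```python
import math
from collections import Counter


def estimate_priors(alignment: list[int], num_states: int) -> list[float]:
    """Relative frequency of each HMM state in a frame-level forced alignment."""
    counts = Counter(alignment)
    total = len(alignment)
    return [counts.get(s, 0) / total for s in range(num_states)]


def smooth_priors(priors: list[float], alpha: float = 0.1) -> list[float]:
    """Hypothetical smoothing (NOT the paper's algorithm): interpolate with a uniform distribution.

    alpha = 0 keeps the raw priors and alpha = 1 makes them uniform. Every state,
    including states never seen in training, ends up with a prior of at least alpha / N.
    """
    n = len(priors)
    return [(1.0 - alpha) * p + alpha / n for p in priors]


def scaled_log_likelihoods(posteriors: list[float], priors: list[float]) -> list[float]:
    """Hybrid DNN-HMM scoring: log p(x|s) = log p(s|x) - log p(s) + const.

    The constant log p(x) is the same for every state, so it is dropped.
    math.log raises ValueError if any prior or posterior is zero.
    """
    return [math.log(q) - math.log(p) for q, p in zip(posteriors, priors)]


if __name__ == "__main__":
    # Toy alignment over 4 states; state 3 never occurs, so its raw prior is 0.
    alignment = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
    raw = estimate_priors(alignment, num_states=4)
    smoothed = smooth_priors(raw, alpha=0.1)
    posteriors = [0.7, 0.1, 0.1, 0.1]  # hypothetical DNN softmax output for one frame
    print(scaled_log_likelihoods(posteriors, smoothed))
```

With the raw priors, the unseen state has prior 0 and `math.log` raises `ValueError`. After smoothing, every state has a prior of at least `alpha / N`. A larger `alpha` flattens the priors more aggressively, so it would presumably be tuned on held-out data.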
Optimization of deep neural network in acoustic modeling for Mandarin speech recognition

XIAO Yeming, ZHANG Qingqing, SONG Liming, PAN Jielin and YAN Yonghong. Optimization of deep neural network in acoustic modeling for Mandarin speech recognition[J]. Journal of Chongqing University of Posts and Telecommunications, 2014, 26(3): 373-379.

Authors:	XIAO Yeming, ZHANG Qingqing, SONG Liming, PAN Jielin and YAN Yonghong

Abstract:	A deep neural network (DNN) is introduced as the acoustic model in a Mandarin conversational telephone speech recognition system. First, because the character error rate (CER) is high for spontaneous speech recognition, a series of optimizations is applied to DNN-based acoustic modeling, covering acoustic feature type selection, meta-parameter tuning during training, and improvement of the model's generalization capability. Second, a smoothing algorithm is proposed to address the sparse distribution of state prior probabilities in the training samples; with this algorithm, the CER is reduced by 1% absolute. Finally, on our three conversational telephone speech test sets, the optimized DNN model achieves a consistent improvement over the baseline DNN model, with an average relative CER reduction of 15%. These experimental results demonstrate that the optimization strategies improve the performance of DNN-based acoustic models.

Keywords:	deep neural network; speech recognition; hidden Markov model; probability smoothing
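
The abstract reports one gain in absolute terms (1% CER) and another in relative terms (15%). As a sketch of that arithmetic, the Python below computes CER as the character-level Levenshtein distance divided by the reference length, together with absolute and relative reductions. Averaging the per-test-set relative reductions is an assumption about how the 15% figure was aggregated; the abstract does not say. All function names are hypothetical.

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: (substitutions + deletions + insertions) / len(reference)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    # One row of the edit-distance table: prev[j] is the distance between the
    # reference prefix processed so far and hypothesis[:j].
    prev = list(range(len(hypothesis) + 1))
    for i, r_char in enumerate(reference, start=1):
        cur = [i] + [0] * len(hypothesis)
        for j, h_char in enumerate(hypothesis, start=1):
            cur[j] = min(
                prev[j] + 1,                       # deletion of r_char
                cur[j - 1] + 1,                    # insertion of h_char
                prev[j - 1] + (r_char != h_char),  # substitution, or match at cost 0
            )
        prev = cur
    return prev[-1] / len(reference)


def absolute_reduction(baseline: float, optimized: float) -> float:
    """Drop in error rate, in the same units as the inputs (e.g. CER points)."""
    return baseline - optimized


def relative_reduction(baseline: float, optimized: float) -> float:
    """Drop in error rate as a fraction of the baseline error rate."""
    return (baseline - optimized) / baseline


def mean_relative_reduction(pairs: list[tuple[float, float]]) -> float:
    """Assumed aggregation: unweighted mean of per-test-set relative reductions."""
    reductions = [relative_reduction(b, o) for b, o in pairs]
    return sum(reductions) / len(reductions)


if __name__ == "__main__":
    print(cer("今天天气很好", "今天天很好"))  # one deleted character out of six
```

The two measures are not interchangeable. The same absolute drop corresponds to a larger relative drop when the baseline error rate is lower.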
