
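Optimization strategies for deep neural network techniques in acoustic modeling for Mandarin speech recognition
Cite this article: XIAO Yeming, ZHANG Qingqing, SONG Liming, PAN Jielin, YAN Yonghong. Optimization strategies for deep neural network techniques in acoustic modeling for Mandarin speech recognition[J]. Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), 2014, 26(3): 373-379.
Authors: XIAO Yeming  ZHANG Qingqing  SONG Liming  PAN Jielin  YAN Yonghong
Affiliation (all five authors): Key Laboratory of Speech Acoustics and Content Understanding, Chinese Academy of Sciences, Beijing 100190, China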
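Funding: National Natural Science Foundation of China (10925419, 90920302, 61072124, 11074275, 11161140319, 91120001, 61271426); Strategic Priority Research Program of the Chinese Academy of Sciences (XDA06030100, XDA06030500); National "863" Program of China (2012AA012503); Key Deployment Project of the Chinese Academy of Sciences (KGZD-EW-103-2)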
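Abstract: A deep neural network (DNN) is introduced as the acoustic model in a recognition system for Mandarin conversational telephone speech. To address the high character error rate in spontaneous speech, DNN-based acoustic modeling is optimized in several respects: the choice of acoustic feature type, the tuning of meta-parameters during training, and the improvement of model generalization. To handle the sparse distribution of state prior probabilities in the training samples, a state prior probability smoothing algorithm is proposed that alleviates this data sparsity to some extent; after smoothing, the character error rate drops by more than 1%. On the three conversational telephone speech test sets used, the optimized model improves consistently over the unoptimized DNN model, with an average relative character error rate reduction of 15%. The experimental results show that the adopted optimization strategies effectively improve the performance of DNN acoustic models.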

Keywords: deep neural network  speech recognition  hidden Markov model  probability smoothing
Received: 2014/1/14
Revised: 2014/4/15

Optimization of deep neural network in acoustic modeling for Mandarin speech recognition
XIAO Yeming, ZHANG Qingqing, SONG Liming, PAN Jielin and YAN Yonghong. Optimization of deep neural network in acoustic modeling for Mandarin speech recognition[J]. Journal of Chongqing University of Posts and Telecommunications, 2014, 26(3): 373-379.
Authors:XIAO Yeming  ZHANG Qingqing  SONG Liming  PAN Jielin and YAN Yonghong
Abstract: A deep neural network (DNN) is introduced as the acoustic model in a Mandarin conversational telephone speech recognition system. First, because the character error rate is high for spontaneous speech, a series of optimizations is applied to DNN-based acoustic modeling, covering acoustic feature type selection, meta-parameter tuning during training, and improvement of the model's generalization capability. Second, a smoothing algorithm is proposed for the sparse distribution of state prior probabilities in the training samples; with this algorithm, the character error rate is reduced by 1% absolute. Finally, on three conversational telephone speech test sets, the optimized DNN model achieves a consistent improvement over the baseline DNN model, with an average relative character error rate reduction of 15%. These experimental results demonstrate that the optimization strategies can improve the performance of DNN-based acoustic models.
Keywords:deep neural network  speech recognition  hidden Markov model  probability smoothing
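
The abstract does not state the formula of the proposed state prior smoothing algorithm. As background, hybrid DNN-HMM systems commonly divide each DNN state posterior by the state's prior to obtain a scaled likelihood, so a state that is rare or absent in the training alignment has a very small (or zero) prior and can produce unstable scores. The following is only a minimal sketch of that idea, assuming simple additive (Laplace-style) smoothing as a stand-in; it is not the authors' algorithm, and the function names and the `alpha` parameter are hypothetical.

```python
import math
from collections import Counter


def smoothed_log_priors(state_alignment, num_states, alpha=1.0):
    """Estimate log state priors from frame-level state labels.

    Assumption: additive smoothing with pseudo-count `alpha`, used here
    as an illustrative substitute for the paper's unspecified method.
    """
    counts = Counter(state_alignment)
    total = len(state_alignment) + alpha * num_states
    # Every state gets a non-zero prior, even if it never occurs.
    return [math.log((counts[s] + alpha) / total) for s in range(num_states)]


def scaled_log_likelihoods(log_posteriors, log_priors):
    """Convert DNN log posteriors to scaled log likelihoods: log p(s|x) - log p(s)."""
    return [lp - pr for lp, pr in zip(log_posteriors, log_priors)]


# Toy alignment over 3 states; state 2 is never observed.
alignment = [0, 0, 0, 1, 0, 0, 1, 0]
log_priors = smoothed_log_priors(alignment, num_states=3, alpha=1.0)
priors = [math.exp(v) for v in log_priors]

# Hypothetical DNN output for one frame (log of 0.7, 0.2, 0.1).
frame_log_post = [math.log(p) for p in (0.7, 0.2, 0.1)]
frame_scores = scaled_log_likelihoods(frame_log_post, log_priors)
```

Without smoothing, the prior of state 2 in this toy alignment would be zero and its scaled likelihood undefined; with the pseudo-count, it stays finite. Larger `alpha` values pull the priors toward a uniform distribution, which is one possible trade-off to tune on held-out data.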