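Chinese speech recognition based on a bidirectional recurrent neural network*
Cite this article:LI Peng, YANG Yuanwei, DU Lihui, GAO Xianjun, ZHOU Yi, JIANG Mengyue, ZHANG Jingbo. Chinese speech recognition based on a bidirectional recurrent neural network*[J]. Applied Acoustics (China), 2020, 39(3): 464-471
Authors:LI Peng  YANG Yuanwei  DU Lihui  GAO Xianjun  ZHOU Yi  JIANG Mengyue  ZHANG Jingbo
Affiliation:Yangtze University, Yangtze University, Yangtze University, College of Geosciences, Yangtze University, Yangtze University, Yangtze University, Yangtze University
Funding:Scientific Research Program of the Hubei Provincial Department of Education (Q20181317); Yangtze University Undergraduate Innovation and Entrepreneurship Fund (2018012); Development Fund of the Key Laboratory for National Geographic Census and Monitoring, National Administration of Surveying, Mapping and Geoinformation (2017NGCM07).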
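Abstract:In current deep neural network (DNN) models, the hidden part can be stacked to many layers and adapts well to complex problems, but the connections between the nodes of each layer are independent of one another. This structure prevents the model from using contextual information in a speech sequence to improve recognition. Conventional recurrent neural networks (RNNs) improve on this, but can use only the preceding context. To address these problems, this paper combines a bidirectional recurrent neural network (Bi-RNN), which can use both the preceding and the following context in a speech sequence, with a DNN, and applies the combined model to speech recognition. The model has five hidden layers: the third is a Bi-RNN layer and the others are DNN layers. The experimental results show that the model with the Bi-RNN layer achieves higher recognition accuracy than the other models; that noise strongly affects Bi-RNN Chinese speech recognition, and in particular, when the training set and the test set carry different types of added noise, a model trained on speech with a single noise type cannot adapt to speech with other noise types; and that, as the number of hidden-layer neurons is varied, recognition accuracy does not keep rising with the neuron count but begins to fall once the count grows beyond a certain point.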

Keywords:Speech recognition  Deep learning  Deep neural network  Recurrent neural network
Received:2019-03-19
Revised:2020-05-03

Study of Chinese speech recognition based on Bi-RNN
LI Peng,YANG Yuanwei,GAO Xianjun,DU Lihui,ZHOU Yi,JIANG Mengyue,ZHANG Jingbo. Study of Chinese speech recognition based on Bi-RNN[J]. Applied Acoustics(China), 2020, 39(3): 464-471
Authors:LI Peng  YANG Yuanwei  GAO Xianjun  DU Lihui  ZHOU Yi  JIANG Mengyue  ZHANG Jingbo
Affiliation:Yangtze University,Yangtze University,Yangtze University,College of Geosciences, Yangtze University,Yangtze University,Yangtze University,Yangtze University
Abstract:In current deep neural network (DNN) models, the hidden part can be stacked to many layers and adapts well to complex problems, but the node connections of each layer are independent of one another. This structure makes it impossible to use contextual information in a speech sequence to improve recognition. A traditional recurrent neural network (RNN) improves on this, but it can use only the preceding context. To solve these problems, this paper combines a bidirectional RNN (Bi-RNN), which can use both the preceding and the following context in a speech sequence, with a DNN, and applies the combined model to speech recognition. A model with five hidden layers was constructed, in which the third layer is a Bi-RNN layer and the other layers are DNN layers. The experimental results show that, compared with other models, the model with the Bi-RNN layer improves recognition accuracy. Noise has an important effect on Bi-RNN Chinese speech recognition: in particular, when the training set and the test set carry different types of added noise, a model trained on speech with a single noise type cannot adapt to speech with other noise types. When the number of neurons in the hidden layers is adjusted, recognition accuracy does not always increase with the number of neurons; it decreases once the number of neurons grows beyond a certain point.
Keywords:Speech recognition  Deep learning  Deep neural network  Recurrent neural network
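
The layer arrangement described in the abstract (five hidden layers, with the third one bidirectional) can be illustrated with a minimal forward-pass sketch. It is not the authors' implementation: the abstract gives only the layer order, so the tanh activations, the equal layer widths, the concatenation of forward and backward states, the linear output layer, and every function and parameter name below are assumptions made for illustration. The weights are random and untrained.

```python
import math
import random

def init_matrix(rows, cols, rng):
    # Small random weights; the initialisation scheme is an assumption.
    scale = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def dense(W, frames):
    # Feed-forward (DNN) layer: every frame is transformed on its own,
    # so this layer sees no context from neighbouring frames.
    return [[math.tanh(a) for a in matvec(W, x)] for x in frames]

def rnn_pass(W_in, W_rec, frames):
    # Simple recurrent pass: the state at frame t depends on frames 0..t.
    h = [0.0] * len(W_rec)
    states = []
    for x in frames:
        a = matvec(W_in, x)
        r = matvec(W_rec, h)
        h = [math.tanh(ai + ri) for ai, ri in zip(a, r)]
        states.append(h)
    return states

def bi_rnn(params, frames):
    # Forward pass reads the sequence left to right (preceding context).
    fwd = rnn_pass(params["W_in_f"], params["W_rec_f"], frames)
    # Backward pass reads it right to left (following context), then the
    # states are put back in time order.
    bwd = rnn_pass(params["W_in_b"], params["W_rec_b"], frames[::-1])[::-1]
    # Assumption: the two directions are combined by concatenation.
    return [f + b for f, b in zip(fwd, bwd)]

def build_model(n_in, n_hidden, n_out, seed=0):
    rng = random.Random(seed)
    return {
        "dense1": init_matrix(n_hidden, n_in, rng),
        "dense2": init_matrix(n_hidden, n_hidden, rng),
        "birnn": {
            "W_in_f": init_matrix(n_hidden, n_hidden, rng),
            "W_rec_f": init_matrix(n_hidden, n_hidden, rng),
            "W_in_b": init_matrix(n_hidden, n_hidden, rng),
            "W_rec_b": init_matrix(n_hidden, n_hidden, rng),
        },
        "dense4": init_matrix(n_hidden, 2 * n_hidden, rng),
        "dense5": init_matrix(n_hidden, n_hidden, rng),
        "out": init_matrix(n_out, n_hidden, rng),
    }

def forward(model, frames):
    h = dense(model["dense1"], frames)   # hidden layer 1 (DNN)
    h = dense(model["dense2"], h)        # hidden layer 2 (DNN)
    h = bi_rnn(model["birnn"], h)        # hidden layer 3 (Bi-RNN)
    h = dense(model["dense4"], h)        # hidden layer 4 (DNN)
    h = dense(model["dense5"], h)        # hidden layer 5 (DNN)
    # Assumed linear output layer producing one score vector per frame.
    return [matvec(model["out"], x) for x in h]

if __name__ == "__main__":
    model = build_model(n_in=4, n_hidden=5, n_out=3)
    rng = random.Random(1)
    frames = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(6)]
    scores = forward(model, frames)
    print(len(scores), "frames,", len(scores[0]), "scores per frame")
```

Because the third layer runs over the sequence in both directions, the output for any frame depends on every frame of the utterance, earlier and later. A stack of feed-forward layers alone would make each output depend only on its own input frame, and a unidirectional RNN only on the current and earlier frames.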