
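Effect of ideal ratio mask using different early and late reverberation partition methods on speech recognition performance
Cite this article: Gao Fei, Huang Zheying, Wang Ziteng, Li Junfeng, Yan Yonghong. Effect of ideal ratio mask using different early and late reverberation partition methods on speech recognition performance[J]. Acta Acustica (声学学报), 2019, 44(4): 788-795.
Authors: Gao Fei, Huang Zheying, Wang Ziteng, Li Junfeng, Yan Yonghong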
Affiliations: 1. Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190; 2. University of Chinese Academy of Sciences, Beijing 100049; 3. Xinjiang Laboratory of Minority Speech and Language Information Processing, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011
Funding: Supported by the National Natural Science Foundation of China (11590770-4, 61650202, 11722437, U1536117, 61671442, 11674352, 11504406, 61601453), the National Key Research and Development Program of China (2016YFB0801203, 2016YFC0800503, 2017YFB1002803) and the Key Science and Technology Program of the Xinjiang Uygur Autonomous Region (2016A03007-1).
Abstract: In real environments, noise and reverberation degrade the performance of speech recognition systems. A reverberant signal in an enclosed space consists of three components, the direct sound, early reflections and late reverberation, and each affects speech recognition systems differently. We study different methods of partitioning early reflections and late reverberation. Taking the early reflections as the target speech, we compute the corresponding ideal ratio masks and evaluate their effect on speech recognition performance. On this basis, we estimate the ideal ratio masks with a bidirectional long short-term memory (BLSTM) network and test how the estimated masks affect speech recognition performance. Experimental results show that, with Abel's method of partitioning early reflections and late reverberation, the ideal ratio mask reduces the word error rate by about 2.8%, whereas the BLSTM-based estimation underestimates the ideal ratio mask and fails to effectively improve speech recognition performance.

Keywords: speech recognition; direct sound; early reflections; ideal ratio mask
Received: 2019-02-19
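The core computation in the abstract, an ideal ratio mask (IRM) that treats the early part of the reverberant signal as the target and late reverberation plus noise as interference, can be sketched in NumPy. This is a minimal sketch under assumptions the abstract does not state: the early/late boundary is a fixed, hypothetical offset `early_ms` after the direct-sound peak (it is not Abel's method or any other partition rule compared in the paper), the early part includes the direct sound, the mask uses the common form (|S|^2 / (|S|^2 + |I|^2))^β with β = 0.5, and the STFT settings (`n_fft=512`, `hop=256`) are illustrative. The helper names `split_rir`, `stft`, `ideal_ratio_mask` and `irm_for_reverberant_speech` are hypothetical.

```python
import numpy as np


def split_rir(rir, fs, early_ms=50.0):
    """Split a room impulse response (RIR) into early and late parts.

    Hypothetical partition: everything up to `early_ms` after the
    direct-sound peak counts as "early" (so it includes the direct sound);
    the rest is "late". Abel's method or other partition rules would
    choose this boundary differently.
    """
    peak = int(np.argmax(np.abs(rir)))
    boundary = min(len(rir), peak + int(round(early_ms * 1e-3 * fs)))
    early = np.zeros_like(rir)
    late = np.zeros_like(rir)
    early[:boundary] = rir[:boundary]
    late[boundary:] = rir[boundary:]
    return early, late


def stft(x, n_fft=512, hop=256):
    """Hann-windowed STFT without padding; returns (frames, n_fft // 2 + 1)."""
    if len(x) < n_fft:
        raise ValueError("signal shorter than one frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def ideal_ratio_mask(target, interference, beta=0.5, eps=1e-12):
    """IRM = (|S|^2 / (|S|^2 + |I|^2)) ** beta for each time-frequency bin."""
    s_pow = np.abs(target) ** 2
    i_pow = np.abs(interference) ** 2
    return (s_pow / (s_pow + i_pow + eps)) ** beta


def irm_for_reverberant_speech(clean, rir, noise, fs, early_ms=50.0, n_fft=512, hop=256):
    """Mask that keeps direct + early speech and suppresses late reverberation and noise."""
    early_rir, late_rir = split_rir(rir, fs, early_ms)
    n = len(clean)
    # Convolution is linear, so early_speech + late_speech equals the full reverberant speech.
    early_speech = np.convolve(clean, early_rir)[:n]
    late_speech = np.convolve(clean, late_rir)[:n]
    target = stft(early_speech, n_fft, hop)
    interference = stft(late_speech + noise[:n], n_fft, hop)
    return ideal_ratio_mask(target, interference)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 16000
    clean = rng.standard_normal(fs)  # 1 s random stand-in for a speech signal
    t = np.arange(4800) / fs
    rir = rng.standard_normal(t.size) * np.exp(-t / 0.05)  # toy exponentially decaying RIR
    rir[0] = 10.0  # strong direct path at t = 0
    noise = 0.01 * rng.standard_normal(fs)
    mask = irm_for_reverberant_speech(clean, rir, noise, fs)
    print(mask.shape)  # (61, 257)
```

Changing `early_ms` moves energy between the target and the interference, which is why the choice of partition method changes the resulting mask. A mask estimator such as the BLSTM in the paper would be trained to predict this mask from features of the reverberant input; that part is not shown here.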