Perception auditory scene analysis for speaker recognition


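Cite this article:	WU Di, TAO Zhi, ZHANG Xiaojun, ZHOU Yan, PAN Xinyu, XIAO Zhongzhe, ZHAO Heming. Perception auditory scene analysis for speaker recognition [J]. Acta Acustica (声学学报), 2016, 41(2): 260-272.

Authors:	WU Di, TAO Zhi, ZHANG Xiaojun, ZHOU Yan, PAN Xinyu, XIAO Zhongzhe, ZHAO Heming

Affiliation:	1 College of Physics, Optoelectronics and Energy, Soochow University, Suzhou 215006

Funding:	Supported by the National Natural Science Foundation of China (61271359, 61372146) and the Jiangsu Provincial Natural Science Youth Fund (BK20140354)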

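Abstract:	To address the loss of robustness of missing-data feature methods in speaker recognition at low signal-to-noise ratios (SNRs), a missing-data feature extraction method based on perception auditory scene analysis is proposed. First, the missing-data feature spectrum of the speech is computed, and the perceptual speech content is derived from the perceptual characteristics of speech. The noisy speech then undergoes perception-based speech enhancement and a two-dimensional enhancement of its spectrogram to obtain the speech distribution, which is combined with the perceptual speech content and a missing-intensity parameter to extract the perception auditory factor. Together with the missing-data feature spectrum, this factor decomposes feature extraction into distinct auditory scenes that are analyzed and processed separately, improving the robustness of the speaker recognition system. Experiments show that, at low SNRs from -10 dB to 10 dB and for four different noise types, the proposed method is more robust than each of five comparison methods, raising the average recognition rate by 26.0%, 19.6%, 12.7%, 4.6% and 6.5%, respectively. The proposed method searches for robust speech features in the time-frequency domain and is better suited to speaker recognition in low-SNR environments.
Received:	2015-03-05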
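
The evaluation condition quoted in the abstract (noisy speech at SNRs from -10 dB to 10 dB) can be illustrated with a minimal sketch of how a noise signal might be scaled to reach a target SNR before it is added to clean speech. This is not taken from the paper: the function names `power`, `mix_at_snr`, and `measured_snr_db`, the sine-wave stand-in for speech, and the Gaussian noise are all assumptions made for illustration.

```python
import math
import random


def power(x):
    """Mean power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals snr_db, then add it."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    # SNR_dB = 10*log10(P_speech / P_noise)  =>  P_noise = P_speech / 10^(SNR_dB/10)
    target_noise_power = power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / power(noise))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, noisy):
    """SNR of `noisy` relative to the known clean `speech`."""
    residual = [y - s for s, y in zip(speech, noisy)]
    return 10 * math.log10(power(speech) / power(residual))


# Hypothetical test signals: a 200 Hz tone stands in for speech (8 kHz sampling).
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(8000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]

for snr in (-10, -5, 0, 5, 10):
    noisy = mix_at_snr(speech, noise, snr)
    print(snr, round(measured_snr_db(speech, noisy), 3))
```

Because the gain is derived directly from the SNR definition, the measured SNR of each mixture matches the requested value up to floating-point error.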
Perception auditory scene analysis for speaker recognition

Institution:	1 College of Physics, Optoelectronics and Energy, Soochow University, Suzhou 215006; 2 School of Electronic Information, Soochow University, Suzhou 215006

Abstract:	To address the decreasing robustness of missing data feature methods for speaker recognition in low-SNR environments, a missing data feature extraction method based on Perception Auditory Scene Analysis is proposed. The missing data feature spectrum is calculated first, and the perception speech content is derived from the perceptual characteristics of speech. After speech enhancement based on auditory perceptual characteristics and a two-dimensional enhancement of the spectrogram, the speech distribution is obtained from the noisy speech and combined with the perception speech content and the missing intensity parameter to extract the Perception Auditory Factor. The Perception Auditory Factor and the missing data feature spectrum resolve the feature extraction process into different auditory scenes, which are treated separately in order to improve the robustness of the speaker recognition system. Experimental results show that the proposed method is more robust than five other methods in four different noisy low-SNR environments from -10 dB to 10 dB, with average recognition rates higher by 26.0%, 19.6%, 12.7%, 4.6% and 6.5%, respectively. The proposed method finds robust features in the time-frequency domain and is more suitable for speaker recognition in low-SNR environments.
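
For readers unfamiliar with missing data features, the general idea of marking time-frequency cells as reliable or missing can be sketched as follows. This is a generic binary-mask illustration based on spectral subtraction and a local SNR threshold, not the Perception Auditory Factor described in the abstract. The function names `estimate_noise` and `reliability_mask`, the 0 dB threshold, the assumption that the first frames contain no speech, and the toy spectrogram are all hypothetical.

```python
import math


def estimate_noise(spec, n_frames):
    """Average the first n_frames (assumed speech-free) in each frequency channel."""
    n_channels = len(spec[0])
    return [
        sum(frame[f] for frame in spec[:n_frames]) / n_frames
        for f in range(n_channels)
    ]


def reliability_mask(spec, noise, threshold_db=0.0, floor=1e-12):
    """Return 1 where the estimated local SNR exceeds threshold_db, else 0."""
    mask = []
    for frame in spec:
        row = []
        for y, n in zip(frame, noise):
            # Spectral subtraction gives a rough speech power estimate.
            speech_est = max(y - n, floor)
            local_snr_db = 10 * math.log10(speech_est / max(n, floor))
            row.append(1 if local_snr_db > threshold_db else 0)
        mask.append(row)
    return mask


# Toy power spectrogram: 4 frames x 3 channels; the first two frames are noise only.
spec = [
    [1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0],
    [9.0, 1.5, 1.0],
    [1.2, 6.0, 3.0],
]
noise = estimate_noise(spec, n_frames=2)
mask = reliability_mask(spec, noise)
for row in mask:
    print(row)
```

Cells marked 0 would be treated as missing (dominated by noise) by a downstream recognizer, while cells marked 1 would be used as reliable evidence.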

Keywords:

This article is indexed in CNKI and other databases.