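Deep learning-based single-channel speech enhancement based on human auditory related cost function

Cite this article:	CHENG Linjuan, PENG Renhua, ZHENG Chengshi, LI Xiaodong. Deep learning-based single-channel speech enhancement based on human auditory related cost function[J]. Applied Acoustics (应用声学), 2022, 41(4): 654-666.

Authors:	CHENG Linjuan, PENG Renhua, ZHENG Chengshi, LI Xiaodong

Affiliation:	Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences, Beijing (all four authors)

Funding:	National Natural Science Foundation of China (General Program, Key Program, Major Program)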

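Abstract:	The mean-square error (MSE) function is one of the most commonly used cost functions in deep learning-based single-channel speech enhancement. However, the MSE value does not fully correlate with speech quality. To improve performance, this paper introduces two classes of cost functions related to human hearing into deep neural network training. The first class is a weighted-Euclidean cost function, which takes the auditory masking effect into account. The second class consists of the Itakura-Saito, COSH, and weighted likelihood ratio cost functions, which emphasize spectral peaks and focus on recovering the spectral-peak information of clean speech. Using a long short-term memory (LSTM) network, the performance of both classes in deep learning-based single-channel speech enhancement is analyzed and compared with that of the MSE cost function. Experimental results show that the deep neural network-based single-channel speech enhancement algorithm with the weighted-Euclidean cost function achieves better speech quality and lower residual noise.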
Keywords:	speech enhancement; deep learning; human hearing
Received:	2021-05-26
Revised:	2022-06-30
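The abstract names the cost functions but does not give their formulas. As a rough illustration only, the sketch below writes MSE and the four perceptual costs in the forms commonly used in the perceptually motivated speech-enhancement literature (for example, Loizou's weighted-Euclidean, Itakura-Saito, COSH and WLR Bayesian estimators). Each cost is applied elementwise to a clean magnitude spectrum `x` and an estimate `x_hat`, then averaged over time-frequency bins. The function names, the floor `EPS`, the exponent `p = -1`, and the choice to evaluate IS, COSH and WLR on power spectra are all assumptions, not the paper's exact definitions.

```python
import numpy as np

EPS = 1e-8  # hypothetical floor that avoids log(0) and division by zero


def mse(x, x_hat):
    """Plain mean-square error between clean and estimated magnitudes."""
    return float(np.mean((x - x_hat) ** 2))


def weighted_euclidean(x, x_hat, p=-1.0):
    """Weighted-Euclidean cost: mean of x^p * (x - x_hat)^2.

    With p < 0, errors in low-energy bins (spectral valleys, where masking
    by strong speech components is weak) get larger weights.
    p = -1 is an illustrative assumption, not a value from the paper.
    """
    weight = (x + EPS) ** p
    return float(np.mean(weight * (x - x_hat) ** 2))


def itakura_saito(x, x_hat):
    """Itakura-Saito divergence on power spectra: r - log(r) - 1, r = P / P_hat."""
    r = (x ** 2 + EPS) / (x_hat ** 2 + EPS)
    return float(np.mean(r - np.log(r) - 1.0))


def cosh_distance(x, x_hat):
    """COSH measure, the symmetrized IS divergence: 0.5 * (r + 1/r) - 1."""
    r = (x ** 2 + EPS) / (x_hat ** 2 + EPS)
    return float(np.mean(0.5 * (r + 1.0 / r) - 1.0))


def weighted_likelihood_ratio(x, x_hat):
    """WLR measure: (log P - log P_hat) * (P - P_hat); both factors share a sign, so it is >= 0."""
    p, p_hat = x ** 2 + EPS, x_hat ** 2 + EPS
    return float(np.mean((np.log(p) - np.log(p_hat)) * (p - p_hat)))


if __name__ == "__main__":
    x = np.array([1.0, 0.1, 2.0])      # toy clean magnitude spectrum
    x_hat = np.array([0.8, 0.3, 2.0])  # toy estimated magnitude spectrum
    for cost in (mse, weighted_euclidean, itakura_saito,
                 cosh_distance, weighted_likelihood_ratio):
        print(cost.__name__, cost(x, x_hat))
```

NumPy is used here only to keep the sketch self-contained. For LSTM training, the same expressions would be written with the deep-learning framework's tensor operations so that gradients flow through them. Note that the IS divergence is asymmetric: it penalizes underestimating a bin's power more than overestimating it. COSH removes that asymmetry.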
