Single-channel speech enhancement combining a deep encoder-decoder network and time-frequency mask estimation


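Citation:	SHI Wenhua, ZHANG Xiongwei, ZOU Xia, SUN Meng, LI Li. Single-channel speech enhancement combining a deep encoder-decoder network and time-frequency mask estimation[J]. Acta Acustica, 2020, 45(3): 299-307. DOI: 10.15949/j.cnki.0371-0025.2020.03.002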

Authors:	SHI Wenhua, ZHANG Xiongwei, ZOU Xia, SUN Meng, LI Li

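Affiliations:	Army Engineering University, Nanjing 210007; Beijing Aeronautical Technology Research Center, Nanjing 210028; Army Engineering University, Nanjing 210007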

Funding:	Supported by the National Natural Science Foundation of China (61471394) and the Jiangsu Province Excellent Young Scholars Fund (BK20180080)

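Abstract:	A speech enhancement method combining a deep encoder-decoder neural network with time-frequency mask estimation is proposed. The method uses the deep encoder-decoder network to estimate a time-frequency mask and, together with the magnitude spectrum of the noisy speech, learns the nonlinear mapping between the magnitude spectra of noisy and clean speech. The deep encoder-decoder network adopts a convolution-deconvolution structure. At the encoder, the local receptive property of convolutional networks is used to model the time-frequency structure of noisy speech, extracting speech features while suppressing background noise. At the decoder, the features extracted by the encoder are used to recover local details layer by layer and reconstruct the speech signal. In addition, skip connections are introduced between corresponding encoder and decoder layers to reduce the loss of low-level detail caused by pooling and fully connected operations. Simulation experiments on the TIMIT speech corpus with a noise set that is not fully matched to training show that the method effectively suppresses noise and recovers speech details well.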
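The central operation described above (an estimated mask multiplied with the noisy magnitude spectrum, with the error measured against the clean magnitude spectrum) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the STFT settings (`n_fft=256`, `hop=128`, Hann window), the mean-squared-error loss, the hypothetical oracle mask `oracle_ratio_mask`, the synthetic tone-plus-noise signal, and reuse of the noisy phase for reconstruction are all choices made only for this example.

```python
import numpy as np

def stft(x, n_fft=256, hop=128):
    """Hann-windowed STFT; returns a complex array of shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def apply_mask(mask, noisy_mag):
    """Enhanced magnitude = mask (clipped to [0, 1]) times noisy magnitude, element-wise."""
    return np.clip(mask, 0.0, 1.0) * noisy_mag

def magnitude_loss(mask, noisy_mag, clean_mag):
    """MSE between the masked noisy magnitude and the clean magnitude (assumed loss)."""
    return float(np.mean((apply_mask(mask, noisy_mag) - clean_mag) ** 2))

def oracle_ratio_mask(clean_mag, noisy_mag, eps=1e-8):
    """Hypothetical oracle mask |S| / |Y| clipped to [0, 1]; a reference point only."""
    return np.clip(clean_mag / (noisy_mag + eps), 0.0, 1.0)

# Synthetic data (illustration only): a 440 Hz tone sampled at 16 kHz plus white noise.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 440 * t)
noisy = clean + 0.3 * rng.standard_normal(t.size)

clean_mag = np.abs(stft(clean))
Y = stft(noisy)
noisy_mag, noisy_phase = np.abs(Y), np.angle(Y)

# In a trained system the mask would come from the network; here the oracle stands in for it.
mask = oracle_ratio_mask(clean_mag, noisy_mag)
# Enhanced complex spectrum, reusing the noisy phase (assumption; an inverse STFT would follow).
enhanced_spec = apply_mask(mask, noisy_mag) * np.exp(1j * noisy_phase)

oracle_loss = magnitude_loss(mask, noisy_mag, clean_mag)
unprocessed_loss = magnitude_loss(np.ones_like(mask), noisy_mag, clean_mag)
```

Comparing `oracle_loss` with `unprocessed_loss` (an all-ones mask, i.e. no enhancement) shows why a mask is a convenient target: it only has to scale each time-frequency bin down, and a good mask lowers the magnitude error relative to leaving the noisy spectrum unchanged.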
Received:	2018-09-10
Revised:	2019-06-11
Time frequency masking based speech enhancement using deep encoder-decoder neural network

SHI Wenhua, ZHANG Xiongwei, ZOU Xia, SUN Meng, LI Li. Time frequency masking based speech enhancement using deep encoder-decoder neural network[J]. ACTA ACUSTICA, 2020, 45(3): 299-307. DOI: 10.15949/j.cnki.0371-0025.2020.03.002

Authors:	SHI Wenhua, ZHANG Xiongwei, ZOU Xia, SUN Meng, LI Li

Affiliation:	1 Army Engineering University, Nanjing 210007; 2 Beijing Aeronautical Technology Research Center, Nanjing 210028

Abstract:	A speech enhancement method based on time-frequency mask estimation with a deep encoder-decoder neural network is presented. In this method, a time-frequency mask is estimated by the deep encoder-decoder neural network and combined with the magnitude spectrum of the noisy speech to learn the nonlinear mapping between noisy and target speech. Convolutional and deconvolutional structures are employed in the deep encoder-decoder neural network. At the encoder, the local perception property of convolution is used to model the typical structural features of noisy speech in the time-frequency domain, extracting speech features and suppressing the influence of background noise. At the decoder, the speech signal is reconstructed from the extracted speech features, and local details of speech are recovered layer by layer. Meanwhile, skip connections are introduced between corresponding encoder and decoder layers to reduce the loss of low-level details induced by pooling and fully connected operations. Experiments are carried out with speech from the TIMIT database and noise from the NOISEX-92 database. The simulation results demonstrate that the proposed method can effectively suppress noise and recover the detailed information of speech.
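The encoder-decoder data flow described above (convolution and pooling on the way down, transposed convolution on the way up, and a skip connection between matching layers) can be illustrated with a toy single-channel forward pass in NumPy. This sketch is not the paper's architecture: the two-level depth, the kernel sizes, element-wise addition for the skip connection (concatenation is the other common choice), the sigmoid output mask, and the random untrained weights in the hypothetical `params` dictionary are all assumptions made for illustration.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv2d_same(x, k):
    """Single-channel 2-D cross-correlation, zero-padded so the output keeps x's shape (odd kernels)."""
    kh, kw = k.shape
    xp = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(x.shape)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def max_pool2x2(x):
    """2x2 max pooling with stride 2 (assumes even height and width)."""
    f, t = x.shape
    return x.reshape(f // 2, 2, t // 2, 2).max(axis=(1, 3))

def deconv2x2_stride2(x, k):
    """Transposed convolution with a 2x2 kernel and stride 2: each input value scales one copy of k."""
    return np.kron(x, k)

def forward(noisy_mag, params):
    """Toy encoder-decoder: returns (mask, masked magnitude) for a (freq, time) patch."""
    e1 = relu(conv2d_same(noisy_mag, params["enc1"]))        # encoder layer 1, full resolution
    e2 = relu(conv2d_same(max_pool2x2(e1), params["enc2"]))  # encoder layer 2, half resolution
    d1 = relu(deconv2x2_stride2(e2, params["dec1"]))         # decoder: upsample back to full resolution
    d1 = d1 + e1                                             # skip connection from the matching encoder layer
    mask = sigmoid(conv2d_same(d1, params["out"]))           # time-frequency mask in [0, 1]
    return mask, mask * noisy_mag                            # enhanced magnitude = mask * noisy magnitude

# Random, untrained weights (hypothetical), used only to exercise the data flow.
rng = np.random.default_rng(1)
params = {
    "enc1": 0.3 * rng.standard_normal((3, 3)),
    "enc2": 0.3 * rng.standard_normal((3, 3)),
    "dec1": 0.3 * rng.standard_normal((2, 2)),
    "out": 0.3 * rng.standard_normal((3, 3)),
}
noisy_patch = np.abs(rng.standard_normal((16, 8)))  # 16 frequency bins x 8 frames
mask, enhanced = forward(noisy_patch, params)
```

Because the upsampled decoder output `d1` has the same shape as the first encoder output `e1`, the skip connection lets full-resolution detail bypass the pooling step, which is the motivation the abstract gives for adding skip connections.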

This article is indexed in CNKI, Wanfang Data, and other databases.
