Teacher-student learning speech enhancement method based on a deep complex convolutional recurrent network model
Citation: BIAN Jinhong, WU Ruiqi, ZHOU Feng, ZHAO Li. Teacher-student learning for speech enhancement based on deep complex convolution recurrent network[J]. Applied Acoustics, 2023, 42(2): 269-275.
Authors: BIAN Jinhong  WU Ruiqi  ZHOU Feng  ZHAO Li
Affiliations: School of Information Engineering, Yancheng Institute of Technology; School of Information Science and Engineering, Southeast University
Funding: (61673108); Major Project of Natural Science Research in Jiangsu Higher Education Institutions (19KJA110002); General Project of Natural Science Research in Jiangsu Higher Education Institutions (19KJB510061); Jiangsu Provincial Natural Science (BK20181050); Jiangsu Industry-University-Research Cooperation Guidance Projects (BY2020358, BY2020335)
Abstract: Deep neural network based methods have been widely applied to speech enhancement, but achieving satisfactory performance generally requires models of large scale and high complexity. This makes deployment difficult on devices with limited computational resources or in latency-critical settings. To address this problem, a teacher-student learning speech enhancement method based on a deep complex convolutional recurrent network is proposed. Real-part and imaginary-part feature streams are extracted from the complex long short-term memory recurrent module in the middle of the teacher and student network structures, and a frame-level teacher-student distance loss is computed on each stream to transfer knowledge. A multi-resolution spectral loss is also used to further improve the performance of the low-complexity student model. Experiments on the public Voice Bank Demand and DNS Challenge datasets show that the proposed method clearly outperforms the baseline student model on all metrics.

Keywords: speech enhancement  recurrent neural network  long short-term memory network  knowledge distillation
Received: 2021-12-02
Revised: 2023-03-02

Teacher-student learning for speech enhancement based on deep complex convolution recurrent network
BIAN Jinhong, WU Ruiqi, ZHOU Feng and ZHAO Li. Teacher-student learning for speech enhancement based on deep complex convolution recurrent network[J]. Applied Acoustics, 2023, 42(2): 269-275.
Authors: BIAN Jinhong  WU Ruiqi  ZHOU Feng  ZHAO Li
Institution: School of Information Engineering, Yancheng Institute of Technology; School of Information Science and Engineering, Southeast University
Abstract: Deep learning-based methods have been widely used in the field of speech enhancement. However, a model of large scale and high complexity is typically required to achieve the desired performance, so deployment difficulties may occur on devices with limited hardware resources or in applications with strict latency requirements. To solve this problem, a teacher-student learning method for speech enhancement based on a deep complex convolution recurrent network (DCCRN) is proposed. Real and imaginary feature streams are extracted from the output of the complex long short-term memory (complex LSTM) module in the middle of the DCCRN model, and a frame-level teacher-student distance loss is calculated on each stream to transfer knowledge. Meanwhile, a multi-resolution spectral loss is used to further improve the performance of the low-complexity student model. The experiments were conducted on the public Voice Bank Demand and DNS Challenge datasets, and the results show that the proposed method achieves a significant improvement over the baseline student model on all metrics.
Keywords: speech enhancement  recurrent neural networks  long short-term memory networks  knowledge distillation
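The two losses described in the abstract can be sketched as follows. This is an illustrative NumPy reconstruction under assumed tensor shapes, not the authors' implementation: `frame_distance_loss` compares teacher and student real/imaginary LSTM feature streams frame by frame (here with an L2 distance, a common but assumed choice), and `multi_resolution_spectral_loss` averages magnitude-spectrum L1 distances over several FFT sizes; the specific resolutions are hypothetical.

```python
import numpy as np

def frame_distance_loss(t_real, t_imag, s_real, s_imag):
    """Mean frame-level L2 distance between teacher and student
    real/imaginary feature streams of shape (frames, features).
    Hypothetical form of the paper's teacher-student distance loss."""
    d_real = np.mean(np.linalg.norm(t_real - s_real, axis=-1))
    d_imag = np.mean(np.linalg.norm(t_imag - s_imag, axis=-1))
    return d_real + d_imag

def stft_mag(x, n_fft, hop):
    """Magnitude STFT via a sliding Hann window (minimal helper)."""
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win
              for i in range(0, len(x) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=-1))

def multi_resolution_spectral_loss(clean, enhanced,
                                   resolutions=((512, 128),
                                                (1024, 256),
                                                (2048, 512))):
    """Average L1 distance between magnitude spectra computed at
    several (n_fft, hop) resolutions; the values are assumptions."""
    losses = []
    for n_fft, hop in resolutions:
        c = stft_mag(clean, n_fft, hop)
        e = stft_mag(enhanced, n_fft, hop)
        losses.append(np.mean(np.abs(c - e)))
    return float(np.mean(losses))
```

In training, the student would minimize a weighted sum of these two terms plus its ordinary enhancement loss; the weighting is not specified in the abstract.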