
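Two-stage complex-spectrum convolutional recurrent network for stereophonic acoustic echo cancellation (两阶段复数谱卷积循环网络立体声回声消除)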

Cite this article:	CHENG Linjuan, PENG Renhua, ZHENG Chengshi, LI Xiaodong. 两阶段复数谱卷积循环网络立体声回声消除[J]. 声学学报 (Acta Acustica), 2023, 48(1): 199-214. DOI: 10.15949/j.cnki.0371-0025.2023.01.028

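Authors:	CHENG Linjuan, PENG Renhua, ZHENG Chengshi, LI Xiaodong

Affiliation:	1 Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190;

Funding:	Supported by the National Natural Science Foundation of China (61801468)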

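Abstract:	A stereophonic acoustic echo cancellation (SAEC) algorithm based on a two-stage complex-spectrum convolutional recurrent network (CRN) is proposed. The algorithm does not need to decorrelate the stereo signals, so it preserves stereo sound quality and spatial perception while avoiding the non-unique solution problem of adaptive-filter-based SAEC algorithms. The proposed algorithm cancels echo in two stages. The first stage estimates the echo signal from the microphone signal and the reference signals. The second stage treats the estimated echo as prior information and combines it with the microphone signal as the input feature to estimate the near-end speech. Compared with a single-stage CRN algorithm, this approach improves the network's ability to discriminate between echo and near-end speech, which helps extract the near-end speech. In addition, both the input features and the training targets of the network are complex spectra, which reduces the phase estimation error of the near-end speech and thus further improves performance. Experiments show that the SAEC algorithm based on the two-stage complex-spectrum CRN clearly outperforms traditional algorithms and the single-stage CRN algorithm in both echo suppression during single-talk and speech quality during double-talk.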
Keywords:	Stereophonic acoustic echo cancellation; Deep learning; Complex spectrum; Two-stage
Received:	2022-04-06
Revised:	2022-10-07
Convolutional recurrent network-based complex stereophonic acoustic echo cancellation with a two-stage approach

CHENG Linjuan, PENG Renhua, ZHENG Chengshi, LI Xiaodong. Convolutional recurrent network-based complex stereophonic acoustic echo cancellation with a two-stage approach[J]. ACTA ACUSTICA, 2023, 48(1): 199-214. DOI: 10.15949/j.cnki.0371-0025.2023.01.028

Authors:	CHENG Linjuan, PENG Renhua, ZHENG Chengshi, LI Xiaodong

Affiliation:	1 Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190; 2 University of Chinese Academy of Sciences, Beijing 100049

Abstract:	We propose a two-stage Convolutional Recurrent Network (CRN) that addresses the Stereophonic Acoustic Echo Cancellation (SAEC) problem using complex spectral input features. Because the proposed algorithm does not decorrelate the far-end signals, it avoids the non-unique solution problem of adaptive-filter-based SAEC while preserving stereo sound quality and spatial perception. The algorithm handles the SAEC problem in two stages. In the first stage, a CRN model estimates the echo signal from the microphone signal and the far-end signals. In the second stage, a CRN model estimates the near-end speech from the microphone signal and the echo signal estimated in the first stage. Using the estimated echo signal as a priori information improves the model's discrimination between echo and near-end speech, which benefits the estimation of the near-end speech. Both the input features and the training targets of the network are the complex spectra of the signals, which allows the phase information of the near-end speech to be recovered. Experimental results show that the SAEC algorithm based on the proposed two-stage CRN model performs significantly better than traditional algorithms and a single-stage CRN model in terms of both echo suppression during single-talk periods and speech quality during double-talk periods.

Keywords:	Stereophonic acoustic echo cancellation; Deep learning; Complex spectrum; Two-stage
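
The two-stage data flow described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the frame length and hop size (`n_fft=320`, `hop=160`), the single-microphone layout with two far-end channels, and the helper names `stft`, `to_channels`, and `two_stage_saec` are all hypothetical, and the `stage1` and `stage2` arguments are stand-in callables where trained CRN models would go.

```python
import numpy as np


def stft(x, n_fft=320, hop=160):
    """Naive Hann-windowed STFT; returns a complex array of shape (frames, bins).

    n_fft and hop are illustrative values, not taken from the paper.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)


def to_channels(spec):
    """Represent a complex spectrum as two real channels: (2, frames, bins)."""
    return np.stack([spec.real, spec.imag])


def two_stage_saec(mic, far_left, far_right, stage1, stage2):
    """Run the two-stage flow; stage1 and stage2 are hypothetical model callables."""
    mic_feat = to_channels(stft(mic))
    far_feat = np.concatenate([to_channels(stft(far_left)),
                               to_channels(stft(far_right))])

    # Stage 1: microphone + far-end (reference) spectra -> estimated echo spectrum.
    echo_est = stage1(np.concatenate([mic_feat, far_feat]))

    # Stage 2: microphone + estimated echo (used as prior information)
    # -> estimated near-end speech spectrum.
    near_est = stage2(np.concatenate([mic_feat, echo_est]))

    # Recombine the real and imaginary channels into complex spectra.
    return near_est[0] + 1j * near_est[1], echo_est[0] + 1j * echo_est[1]


# Dummy stand-ins for the trained networks, used only to exercise the data flow.
def dummy_stage1(feats):
    return 0.5 * feats[:2]          # pretend half of the microphone spectrum is echo


def dummy_stage2(feats):
    return feats[:2] - feats[2:4]   # microphone spectrum minus estimated echo


rng = np.random.default_rng(0)
mic, far_l, far_r = (rng.standard_normal(16000) for _ in range(3))
near_spec, echo_spec = two_stage_saec(mic, far_l, far_r, dummy_stage1, dummy_stage2)
print(near_spec.shape, near_spec.dtype)
```

Because both stages take and return stacked real and imaginary parts, the estimated near-end speech carries its own phase instead of reusing the microphone phase, which is the point the abstract makes about complex-spectrum inputs and targets. In a real system the dummy callables would be replaced by the trained first-stage and second-stage networks.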
