
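Mask estimation method in the spherical harmonic domain used by adaptive beamforming for speech enhancement
Citation: KE Yuxuan, LI Jian, PENG Renhua, ZHENG Chengshi, LI Xiaodong. Mask estimation method in the spherical harmonic domain used by adaptive beamforming for speech enhancement[J]. Acta Acustica, 2021, 46(1): 67-80.
Authors: KE Yuxuan, LI Jian, PENG Renhua, ZHENG Chengshi, LI Xiaodong
Institutions: 1. Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190; 2. University of Chinese Academy of Sciences, Beijing 100049
Funding: Supported by the National Natural Science Foundation of China (61571435, 61801468).
Abstract: A mask estimation method for adaptive beamforming with spherical microphone arrays is proposed. The method extracts a low-dimensional spatial vector from the spherical harmonic coefficients, which carry the spatial information, and estimates the mask with either a complex Gaussian mixture model or a deep learning approach. The estimated mask is then used to design a minimum variance distortionless response (MVDR) beamformer that performs the spatial filtering. Theoretical analysis and simulation experiments show that, for acoustic signals of the same duration, the computational complexity of the spherical-harmonic-domain mask estimation method is an order of magnitude lower than that of the conventional microphone-domain (array-element-domain) method. Moreover, in most acoustic scenarios, and especially at low signal-to-noise ratios, the proposed method achieves clearly higher Perceptual Evaluation of Speech Quality (PESQ) scores, segmental signal-to-noise ratio, and short-time objective intelligibility than the microphone-domain method, with maximum improvements of 1.31 (on the PESQ scale), 4.54 dB, and 35%, respectively. In addition, measurements in a real acoustic environment confirm that the proposed method achieves more noise reduction than the conventional microphone-domain method without degrading intelligibility.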

Keywords: spherical harmonics; microphone array; beamforming; mask
Received: 2019-03-15
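The abstract names a complex Gaussian mixture model (CGMM) as one of the two mask estimators. As an illustration only, the sketch below fits a generic two-class CGMM (target vs. noise) by expectation-maximization, independently in each frequency bin, in the style commonly used for mask-based beamforming. It assumes NumPy and arbitrary C-dimensional observation vectors; it does not reproduce the paper's extraction of low-dimensional spatial vectors from spherical harmonic coefficients. The function name `cgmm_mask`, the initialization heuristic, the synthetic test signal, and the absence of permutation alignment across frequencies are all assumptions of this sketch.

```python
import numpy as np


def cgmm_mask(Y, n_iter=20, eps=1e-8):
    """Hypothetical two-class CGMM mask estimator (class 0 = target, class 1 = noise).

    Y: complex observations of shape (F, T, C) (frequency, frame, channel).
    Returns the class-0 posterior of shape (F, T), used as the target mask.
    Each frequency bin is fitted on its own; no permutation alignment is done.
    """
    F, T, C = Y.shape
    eye = np.eye(C)
    mask = np.empty((F, T))
    for f in range(F):
        y = Y[f]  # (T, C), one observation vector per frame
        # Heuristic initialization: target class from the observed spatial
        # covariance, noise class spatially white.
        R = np.stack([y.T @ y.conj() / T, eye.astype(complex)])
        alpha = np.array([0.5, 0.5])
        for _ in range(n_iter):
            log_p = np.empty((2, T))
            Q = np.empty((2, T))
            for k in range(2):
                Rk = R[k] + eps * eye
                # Quadratic form q_t = y_t^H R_k^{-1} y_t for all frames at once
                Q[k] = np.maximum(
                    np.einsum("tc,cd,td->t", y.conj(), np.linalg.inv(Rk), y).real, eps)
                _, logdet = np.linalg.slogdet(Rk)
                # log N_c(y_t; 0, phi_t R_k) with phi_t at its ML value q_t / C;
                # the constant -C*log(pi) - C is shared by both classes and dropped.
                log_p[k] = np.log(alpha[k] + eps) - C * np.log(Q[k] / C) - logdet
            # E-step: normalized posteriors via a stable log-sum-exp
            log_p -= log_p.max(axis=0, keepdims=True)
            gamma = np.exp(log_p)
            gamma /= gamma.sum(axis=0, keepdims=True)
            # M-step: posterior-weighted, scale-normalized spatial covariances
            for k in range(2):
                w = gamma[k] / (Q[k] / C)
                R[k] = (y.T * w) @ y.conj() / (gamma[k].sum() + eps)
            alpha = gamma.mean(axis=1)
        mask[f] = gamma[0]
    return mask


rng = np.random.default_rng(1)


def cn(*shape):
    """Circular complex Gaussian samples (unit variance per real/imag part)."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)


F, T, C = 3, 300, 4
a = cn(F, C)                           # hypothetical target transfer functions
active = np.arange(T) < T // 2         # target present in the first half only
Y = (active[None, :, None] * cn(F, T)[..., None] * a[:, None, :]
     + 0.1 * cn(F, T, C))              # plus spatially white background noise
mask = cgmm_mask(Y)
print(mask[:, active].mean(), mask[:, ~active].mean())
```

Because each frame's scale is set to its maximum-likelihood value, each class effectively models only the direction of the observation vector, so a strong directional target and a weak, spatially white background are separated by spatial shape rather than level.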

Mask estimation method in the spherical harmonic domain used by adaptive beamforming for speech enhancement
KE Yuxuan, LI Jian, PENG Renhua, ZHENG Chengshi, LI Xiaodong. Mask estimation method in the spherical harmonic domain used by adaptive beamforming for speech enhancement[J]. Acta Acustica, 2021, 46(1): 67-80.
Authors: KE Yuxuan, LI Jian, PENG Renhua, ZHENG Chengshi, LI Xiaodong
Institution:1. Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190;2. University of Chinese Academy of Sciences, Beijing 100049
Abstract: A mask estimation method for adaptive beamforming with spherical microphone arrays is proposed. The method first extracts a low-dimensional spatial vector, which contains the spatial information, from the spherical harmonic coefficients of the received signals, and then employs either a Complex Gaussian Mixture Model (CGMM) or a deep learning network to estimate the mask. Finally, the estimated mask is used to design a Minimum Variance Distortionless Response (MVDR) beamformer, so that directional interference can be suppressed. Simulation results show that the computational complexity of the proposed method is an order of magnitude lower than that of the conventional method operating in the microphone domain, and that the corresponding MVDR beamformer achieves much better Perceptual Evaluation of Speech Quality (PESQ), segmental Signal-to-Noise Ratio (segSNR), and Short-Time Objective Intelligibility (STOI) in most acoustic scenarios, especially when the Signal-to-Noise Ratio (SNR) is relatively low. The maximal improvements in these three objective metrics are about 1.31 (on the PESQ scale), 4.54 dB, and 35%, respectively. In addition, experiments conducted in a real acoustic environment indicate that the proposed method achieves more noise reduction than the conventional method without impairing speech intelligibility.
Keywords: spherical harmonics; microphone array; beamforming; mask
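The final step described in the abstract, turning an estimated mask into an MVDR beamformer, can be sketched as follows. This is a minimal illustration assuming NumPy and one common recipe: mask-weighted spatial covariance matrices for target and noise, a steering vector taken as the principal eigenvector of the target covariance (normalized to a reference channel), and w(f) = Phi_n^{-1} d / (d^H Phi_n^{-1} d). The function names, the diagonal-loading factor, the oracle mask, and the synthetic signals are hypothetical; the paper's exact covariance estimation and steering-vector choice may differ. The C channels can hold either microphone signals or spherical harmonic coefficients, since the MVDR formula has the same form in both domains.

```python
import numpy as np


def masked_scm(Y, mask, eps=1e-10):
    """Mask-weighted spatial covariance matrices, shape (F, C, C)."""
    num = np.einsum("ft,ftc,ftd->fcd", mask, Y, Y.conj())
    den = mask.sum(axis=1)[:, None, None] + eps
    return num / den


def mvdr_weights(phi_s, phi_n, ref=0, diag_load=1e-6):
    """MVDR weights w(f) = Phi_n^{-1} d / (d^H Phi_n^{-1} d) for every frequency."""
    F, C, _ = phi_s.shape
    W = np.empty((F, C), dtype=complex)
    D = np.empty((F, C), dtype=complex)
    for f in range(F):
        # Steering vector: principal eigenvector of the target covariance
        # (eigh sorts eigenvalues ascending), scaled to the reference channel.
        _, vecs = np.linalg.eigh(phi_s[f])
        d = vecs[:, -1] / vecs[ref, -1]
        # Light diagonal loading keeps the noise covariance well conditioned
        load = diag_load * np.trace(phi_n[f]).real / C
        num = np.linalg.solve(phi_n[f] + load * np.eye(C), d)
        W[f] = num / (d.conj() @ num)
        D[f] = d
    return W, D


def apply_beamformer(W, Y):
    """Beamformer output X(f, t) = w(f)^H y(f, t)."""
    return np.einsum("fc,ftc->ft", W.conj(), Y)


rng = np.random.default_rng(0)


def cn(*shape):
    """Circular complex Gaussian samples (unit variance per real/imag part)."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)


F, T, C = 4, 200, 4
a = cn(F, C)                       # hypothetical target transfer functions
b = cn(F, C)                       # hypothetical directional interferer
speech_mask = np.zeros((F, T))
speech_mask[:, : T // 2] = 1.0     # oracle mask: target active in the first half
Y = (speech_mask[..., None] * cn(F, T)[..., None] * a[:, None, :]
     + 0.3 * cn(F, T)[..., None] * b[:, None, :]
     + 0.01 * cn(F, T, C))

phi_s = masked_scm(Y, speech_mask)
phi_n = masked_scm(Y, 1.0 - speech_mask)
W, D = mvdr_weights(phi_s, phi_n)
X = apply_beamformer(W, Y)
```

Estimating the steering vector from the masked target covariance avoids needing the array geometry; a geometry-based steering vector would be an equally valid choice when the source direction is known.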
This article is indexed in databases including CNKI and VIP.