期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

基于修正Mel域掩蔽模型和无语音概率的耳语音增强 总被引：1，自引：0，他引：1

陶智赵鹤鸣吴迪陈大庆张晓俊《声学学报》2009,34(4):370-377

提出了一种基于修正Mel域听觉掩蔽模型和无语音概率的耳语音增强方法。该方法根据耳语音的发音特点对Mel频率进行修正,对每一帧耳语音信号进行Mel域频带滤波,同时通过无语音概率(SAP)动态地确定每个频带的听觉掩蔽阈值,对不同的听觉掩蔽阈值自适应地调整谱减系数来进行耳语音增强。对增强后的耳语音进行客观和主观测试,结果表明,该方法与其它谱减法相比,能将残留噪声和背景噪声控制在人耳掩蔽阈值下,取得更小的语音失真,主观听觉也得到了很大的改善。相似文献

2.

联合深度编解码网络和时频掩蔽估计的单通道语音增强 总被引：1，自引：0，他引：1

下载免费PDF全文

时文华张雄伟邹霞孙蒙李莉《声学学报》2020,45(3):299-307

提出了一种联合深度编解码神经网络和时频掩蔽估计的语音增强方法。该方法利用深度编解码网络估计时频掩蔽表示,并联合带噪语音的幅度谱学习带噪语音与纯净语音幅度谱之间的非线性映射关系。深度编解码网络采用卷积-反卷积网络结构。在编码端,利用卷积网络的局部感知特性,对带噪语音的时频域结构特征进行建模,提取语音特征,同时抑制背景噪声。在解码端,利用编码端提取到的语音特征逐层恢复局部细节信息并重构语音信号。同时,在编解码端对应层之间引入跳跃连接,以减少由于池化和全连接操作导致的低层细节信息丢失的问题。在TIMIT语音库和不完全匹配噪声集下进行仿真实验,实验结果表明,该方法可以有效抑制噪声,且能较好地恢复出语音细节成分。相似文献

3.

时频字典学习的单通道语音增强算法

下载免费PDF全文

黄建军张雄伟张亚非邹霞《声学学报》2012,37(5):539-547

针对以往语音增强算法在非平稳噪声环境下性能急剧下降的问题,基于时频字典学习方法提出了一种新的单通道语音增强算法。首先,提出采用时频字典学习方法对噪声的频谱结构的先验信息进行建模,并将其融入到卷积非负矩阵分解的框架下;然后,在固定噪声时频字典情况下,推导了时变增益和语音时频字典的乘性迭代求解公式;最后,利用该迭代公式更新语音和噪声的时变增益系数以及语音的时频字典,通过语音时频字典和时变增益的卷积运算重构出语音的幅度谱并用二值时频掩蔽方法消除噪声干扰。实验结果表明,在多项语音质量评价指标上,本文算法都取得了更好的结果。在非平稳噪声和低信噪比环境下,相比于多带谱减法和非负稀疏编码去噪算法,本文算法更有效地消除了噪声,增强后的语音具有更好的质量。相似文献

4.

Application of spectral subtraction method on enhancement of electrolarynx speech

Liu H Zhao Q Wan M Wang S 《The Journal of the Acoustical Society of America》2006,120(1):398-406

Although electrolarynx (EL) serves as an important method of phonation for the laryngectomees, the resulting speech is of poor intelligibility due to the presence of a steady background noise caused by the instrument, even worse in the case of additive noise. This paper investigates the problem of EL speech enhancement by taking into account the frequency-domain masking properties of the human auditory system. One approach is incorporating an auditory masking threshold (AMT) for parametric adaptation in a subtractive-type enhancement process. The other is the supplementary AMT (SAMT) algorithm, which applies a cross-correlation spectral subtraction (CCSS) approach as a post-processing scheme to enhancing EL speech dealt with the AMT method. The performance of these two algorithms was evaluated as compared to the power spectral subtraction (PSS) algorithm. The best performance of EL speech enhancement was associated with the SAMT algorithm, followed by the AMT algorithm and the PSS algorithm. Acoustic and perceptual analyses indicated that the AMT and SAMT algorithms achieved the better performances of noise reduction and the enhanced EL speech was more pleasant to human listeners as compared to the PSS algorithm. 相似文献

5.

面向自定义语音唤醒的关键词相关的单通道语音增强

下载免费PDF全文

刘作桢吴愁黎塔赵庆卫《声学学报》2023,48(2):415-424

提出一种面向自定义语音唤醒的单通道语音增强方法。该方法预先将关键词音素信息存入文本编码矩阵,并在常规语音增强模型基础上添加一个基于注意力机制的音素偏置模块。该模块利用语音增强模型中间特征从文本编码矩阵中获取当前帧的音素信息,并将其融入语音增强模型的后续计算中,从而提升语音增强模型对关键词相关音素的增强效果。在不同噪声环境下的实验结果表明,该方法可以更有效地抑制关键词部分噪声。同时所提出方法对比常规语音增强方法与其他文本相关语音增强方法,在自定义语音唤醒性能上可以分别获得14.3%和7.6%的相对提升。相似文献

6.

听觉频域掩蔽效应的自适应β阶贝叶斯感知估计语音增强算法

下载免费PDF全文

王玥李平崔杰《声学学报》2013,38(4):501-508

为了在噪声抑制和语音失真中之间寻找最佳平衡,提出了一种听觉频域掩蔽效应的自适应β阶贝叶斯感知估计语音增强算法,以期提高语音增强的综合性能。算法利用了人耳的听觉掩蔽效应,根据计算得到的频域掩蔽阈自适应调整β阶贝叶斯感知估计语音增强算法中的β值,从而仅将噪声抑制在掩蔽阈之下,保留较多的语音信息,降低语音失真。并分别用客观和主观评价方式,对所提出的算法的性能进行了评估,并与原来基于信噪比的自适应β阶贝叶斯感知估计语音增强算法进行了比较。结果表明,频域掩蔽的β阶贝叶斯感知估计方法的综合客观评价结果在信噪比为-10 dB至5 dB之间时均高于基于信噪比的自适应β阶贝叶斯感知估计语音增强算法。主观评价结果也表明频域掩蔽的β阶贝叶斯感知估计方法能在尽量保留语音信息的同时,较好的抑制背景噪声。相似文献

7.

Blind speech source separation via nonlinear time-frequency masking

XU Shun CHEN Shaorong LIU Yulin 《声学学报：英文版》2008,27(3):203-214

Aim at the underdetermined convolutive mixture model, a blind speech source separation method based on nonlinear time-frequency masking was proposed, where the approximate W-disjoint orthogonality （W-DO） property among independent speech signals in time-frequency domain is utilized. In this method, the observation mixture signal from multimicrophones is normalized to be independent of frequency in the time-frequency domain at first, then the dynamic clustering algorithm is adopted to obtain the active source information in each time-frequency slot, a nonlinear function via deflection angle from the cluster center is selected for time-frequency masking, finally the blind separation of mixture speech signals can be achieved by inverse STFT （short-time Fourier transformation）. This method can not only solve the problem of frequency permutation which may be met in most classic frequency-domain blind separation techniques, but also suppress the spatial direction diffusion of the separation matrix. The simulation results demonstrate that the proposed separation method is better than the typical BLUES method, the signal-noise-ratio gain （SNRG） increases 1.58 dB averagely. 相似文献

8.

U-net网络中融合多头注意力机制的单通道语音增强EI北大核心CSCD

下载免费PDF全文

范君怡杨吉斌张雄伟郑昌艳《声学学报》2022,47(6):703-716

在低信噪比和突发背景噪声条件下,已有的深度学习网络模型在单通道语音增强方面效果并不理想,而人类可以利用语音的长时相关性对不同的语音信号形成综合感知。因此刻画语音的长时依赖关系有助于改进低信噪比和突发背景噪声下的增强性能。受该特性的启发,提出一种融合多头注意力机制和U-net深度网络的增强模型TU-net,实现基于时域的端到端单通道语音增强。TU-net网络模型采用U-net网络的编解码层对带噪语音信号进行多尺度特征融合,并利用多头注意力机制实现双路径Transformer,用于计算语音掩模,更好地建模长时相关性。该模型在时域、时频域和感知域计算损失函数,并通过加权组合损失函数指导训练。仿真实验结果表明,TU-net在低信噪比和突发背景噪声条件下增强语音信号的语音质量感知评估(PESQ)、短时客观可懂度(STOI)和信噪比增益等多个评价指标都优于同类的单通道增强网络模型,且保持相对较少的网络模型参数。相似文献

9.

Extended speech intelligibility index for the prediction of the speech reception threshold in fluctuating noise

Rhebergen KS Versfeld NJ Dreschler WA 《The Journal of the Acoustical Society of America》2006,120(6):3988-3997

The extension to the speech intelligibility index (SII; ANSI S3.5-1997 (1997)) proposed by Rhebergen and Versfeld [Rhebergen, K.S., and Versfeld, N.J. (2005). J. Acoust. Soc. Am. 117(4), 2181-2192] is able to predict for normal-hearing listeners the speech intelligibility in both stationary and fluctuating noise maskers with reasonable accuracy. The extended SII model was validated with speech reception threshold (SRT) data from the literature. However, further validation is required and the present paper describes SRT experiments with nonstationary noise conditions that are critical to the extended model. From these data, it can be concluded that the extended SII model is able to predict the SRTs for the majority of conditions, but that predictions are better when the extended SII model includes a function to account for forward masking. 相似文献

10.

联合精确比值掩蔽与深度神经网络的单通道语音增强方法

下载免费PDF全文

柏浩钧张天骐刘鉴兴叶绍鹏《声学学报》2022,47(3):394-404

针对目前有监督语音增强忽略了纯净语音、噪声与带噪语音之间的幅度谱相似性对增强效果影响等问题,提出了一种联合精确比值掩蔽(ARM)与深度神经网络(DNN)的语音增强方法。该方法利用纯净语音与带噪语音、噪声与带噪语音的幅度谱归一化互相关系数,设计了一种基于时频域理想比值掩蔽的精确比值掩蔽作为目标掩蔽;然后以纯净语音和噪声幅度谱为训练目标的DNN为基线,通过该DNN的输出来估计目标掩蔽,并对基线DNN和目标掩蔽进行联合优化,增强语音由目标掩蔽从带噪语音中估计得到;此外,考虑到纯净语音与噪声的区分性信息,采用一种区分性训练函数代替均方误差(MSE)函数作为基线DNN的目标函数,以使网络输出更加准确。实验表明,区分性训练函数提升了基线DNN以及整个联合优化网络的增强效果;在匹配噪声和不匹配噪声下,相比于其它常见DNN方法,本文方法取得了更高的平均客观语音质量评估(PESQ)和短时客观可懂度(STOI),增强后的语音保留了更多语音成分,同时对噪声的抑制效果更加明显。相似文献

11.

人耳听觉相关代价函数深度学习单通道语声增强算法*

下载免费PDF全文

程琳娟彭任华郑成诗李晓东《应用声学》2022,41(4):654-666

均方误差（Mean-Square Error,MSE）函数是深度学习单通道语声增强算法最常用的一种代价函数。然而,MSE误差值的大小与语声质量好坏并非完全相关。为了提高算法性能,本文在深度神经网络训练中引入了两类与人耳听觉相关的代价函数。第一类是加权欧氏距离代价函数,考虑了人耳听觉掩蔽效应;第二类是Itakura-Satio代价函数、COSH代价函数和加权似然比代价函数,强调语声谱峰的重要性,侧重于恢复干净语声谱峰信息。基于长短期记忆网络结构分析比较了两类代价函数在深度学习单通道语声增强算法中的性能,并与MSE代价函数进行对比。实验结果表明,基于加权欧式距离代价函数的深度神经网络单通道语声增强算法能够获得更好的语声质量和更低的噪声残留。相似文献

12.

在波形网络中融合相位信息的骨导语音增强 总被引：3，自引：0，他引：3

下载免费PDF全文

郑昌艳杨吉斌张雄伟孙蒙《声学学报》2021,46(2):309-320

已有骨导语音增强算法重点关注语音幅度谱增强,在波形合成时会因为相位不匹配导致语音质量下降。为解决该问题,提出了一种融合相位信息的波形网络(WaveNet)模型实现骨导语音增强波形生成。该方法以频带扩展WaveNet为基础,融合骨导语音相位谱信息与增强的语音幅度谱作为模型的条件特征,根据融合特征生成增强语音波形,实现了相位信息的有效利用。仿真实验综合对比了群时延谱和瞬时频率偏差谱相位特征,主客观结果表明,不论是采用串联融合还是卷积融合方式,骨导语音相位信息均有效补充了原有幅度谱条件特征,改善了语音增强效果。利用串联方式融合群时延谱特征可得到最佳结果,相比于原始骨导语音,平均意见得分(MOS)提升了约54.3%。相似文献

13.

语音通信降噪研究

下载免费PDF全文

田玉静左红伟王超《应用声学》2020,39(6):932-939

语音通信系统中，语音通过信道传输将不可避免地引入码间串扰和信号畸变，同时受到噪声污染。本文在分析自适应盲均衡算法CMA(constant modulus algorithm）和改进盲均衡算法的基础上，考虑到自适应盲均衡技术在语音噪声控制方面能力有限，将自适应盲均衡技术与小波包掩蔽阈值降噪算法联合使用，形成一种基带语音增强新方法。仿真试验结果显示自适应盲均衡技术可以使星座图变得清晰而紧凑，有效减小误码率。研究证实该方法在语音信号ISI和畸变严重情况下，在白噪及有色噪声不同的噪声环境中都具有稳定的降噪能力，消噪同时可获得汉语普通话良好的听觉效果。相似文献

14.

基于字典学习和稀疏表示的单通道语音增强算法综述* 总被引：1，自引：0，他引：1

下载免费PDF全文

叶中付朱媛媛贾翔宇《应用声学》2019,38(4):645-652

如何从带噪语音信号中恢复出干净的语音信号一直都是信号处理领域的热点问题。近年来研究者相继提出了一些基于字典学习和稀疏表示的单通道语音增强算法,这些算法利用语音信号在时频域上的稀疏特性,通过学习训练数据样本的结构特征和规律来构造相应的字典,再对带噪语音信号进行投影以估计出干净语音信号。针对训练样本与测试数据不匹配的情况,有监督类的非负矩阵分解方法与基于统计模型的传统语音增强方法相结合,在增强阶段对语音字典和噪声字典进行更新,从而估计出干净语音信号。本文首先介绍了单通道情况下语音增强的信号模型,然后对4种典型的增强方法进行了阐述,最后对未来可能的研究热点进行了展望。相似文献

15.

结合注意力机制的改进U-Net网络在端到端语音增强中的应用

下载免费PDF全文

武瑞沁陈雪勤俞杰王丽荣赵鹤鸣《声学学报》2022,47(2):266-275

设计了一个适用于端到端语音增强的改进的U-Net (Attention Dilated Convolution U-Net,ADC-U-Net)网络模型.与基线U-Net网络相比,一方面通过加入空洞卷积减小由采样带来的信息损失;另一方面引入了注意力机制结构,结合了含噪语音更多的上下文信息,提取更深层次和更丰富的特征信息... 相似文献

16.

波达方向初始化空间混合概率模型的语音增强

下载免费PDF全文

石倩陈航艇张鹏远《声学学报》2022,47(1):139-150

提出了波达方向初始化空间混合概率模型的语音增强算法.通过声源定位估计出声源波达方向,再根据此计算相对传递函数,进而构造空间协方差矩阵来初始化空间混合概率模型.论证了相对传递函数在作为模型参数中语音协方差矩阵的主特征向量时,空间混合概率模型对应的概率分布可达到最大值,进而使期望最大化算法在迭代时更易收敛,以得到期望的掩蔽... 相似文献

17.

Selection of meaningless steady noise for masking of speech

Tetsuro Saeki Takahiro Tamesue Shizuma Yamaguchi Kazuya Sunada 《Applied Acoustics》2004,65(2):203-210

This paper focuses on masking speech with meaningless steady noise as a way of realizing a comfortable sound environment. As a basis for research, meaningless steady noise at minimum sound pressure levels for masking of male or female meaningful speech is considered, based on psychological experiments using a method of adjustment. From the results, band-limited pink noise can be selected as the most effective noise for masking of speech. In the case of speech with a lower sound pressure level, the sound pressure level of the meaningless steady noise needs to be a little higher. 相似文献

18.

采用性别相关的深度神经网络及非负矩阵分解模型用于单通道语音增强 总被引：3，自引：0，他引：3

下载免费PDF全文

李煦王子腾王晓飞付强颜永红《声学学报》2019,44(2):221-230

为了从带噪信号中得到纯净的语音信号,提出了一种采用性别相关模型的单通道语音增强算法。具体而言,在训练阶段,分别训练了与性别相关的深度神经网络-非负矩阵分解模型用于估计非负矩阵分解中的权重参数;在测试阶段,提出了一种基于非负矩阵分解和组稀疏惩罚的算法用于判断测试语音中说话人的性别信息,然后再采用对应的模型估计权重,并结合已训练好的字典进行语音增强。实验结果表明所提算法在噪声抑制量及语音质量上,均优于一些基于非负矩阵分解的算法和基于深度神经网络的算法。相似文献

19.

Predicting speech intelligibility based on the signal-to-noise envelope power ratio after modulation-frequency selective processing

Jørgensen S Dau T 《The Journal of the Acoustical Society of America》2011,130(3):1475-1487

A model for predicting the intelligibility of processed noisy speech is proposed. The speech-based envelope power spectrum model has a similar structure as the model of Ewert and Dau [(2000). J. Acoust. Soc. Am. 108, 1181-1196], developed to account for modulation detection and masking data. The model estimates the speech-to-noise envelope power ratio, SNR(env), at the output of a modulation filterbank and relates this metric to speech intelligibility using the concept of an ideal observer. Predictions were compared to data on the intelligibility of speech presented in stationary speech-shaped noise. The model was further tested in conditions with noisy speech subjected to reverberation and spectral subtraction. Good agreement between predictions and data was found in all cases. For spectral subtraction, an analysis of the model's internal representation of the stimuli revealed that the predicted decrease of intelligibility was caused by the estimated noise envelope power exceeding that of the speech. The classical concept of the speech transmission index fails in this condition. The results strongly suggest that the signal-to-noise ratio at the output of a modulation frequency selective process provides a key measure of speech intelligibility. 相似文献

20.

Single-channel speech enhancement method using reconstructive NMF with spectrotemporal speech presence probabilities

Seongjae Lee David K. Han Hanseok Ko 《Applied Acoustics》2017

In this paper, a novel single microphone channel-based speech enhancement technique is presented. While most of the conventional nonnegative matrix factorization-based approaches focus on generating a basis matrix of speech and noise for enhancement, the proposed algorithm performs an additional process to reconstruct speech from noisy speech when these two elements are highly overlapped in selected spectral bands. This process involves a log-spectral amplitude based estimator, which provides the spectrotemporal speech presence probability to obtain a more accurate reconstruction. Moreover, the proposed algorithm applies an unsupervised learning method to the input noise, so it is adaptable to any type of environmental noise without a pre-trained dictionary. The experimental results demonstrate that the proposed algorithm obtains improved speech enhancement performance compared with conventional single channel-based approaches. 相似文献