Similar literature
 16 similar documents found
1.
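Single-channel speech enhancement algorithm combining a deep neural network and convex optimization   Total citations: 1 (self-citations: 1, citations by others: 0)
The accuracy of noise estimation directly affects the quality of a speech enhancement algorithm. To improve the noise suppression of current speech enhancement algorithms and to solve the unconstrained optimization problem effectively, a time-frequency mask optimization algorithm combining a deep neural network (DNN) and convex optimization is proposed for single-channel speech enhancement. First, the energy spectrum of the noisy speech is extracted as the DNN input feature. Next, the intra-band cross-correlation coefficient (ICC factor) between the noise and the noisy speech is used as the DNN training target. Then, the cross-correlation coefficients produced by the DNN model are used to construct the objective function of the convex optimization. Finally, combining the DNN and convex optimization, a new hybrid conjugate gradient method iteratively refines the initial mask, and the enhanced speech is synthesized with the new mask. Simulation experiments show that, at low signal-to-noise ratios under different background noises, the new mask yields better log-spectral distance (LSD), perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and segmental signal-to-noise ratio (segSNR) scores than the mask before refinement, improving overall speech quality while effectively suppressing noise.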
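Of the four metrics named in this abstract, segSNR is the simplest to compute by hand. The following is a minimal sketch rather than the authors' evaluation code: the function name `segmental_snr`, the frame length of 256 samples, and the clamping range of -10 dB to 35 dB are all assumptions chosen for illustration.

```python
import math


def segmental_snr(clean, enhanced, frame_len=256, lo_db=-10.0, hi_db=35.0):
    """Mean per-frame SNR in dB, each frame clamped to [lo_db, hi_db].

    frame_len, lo_db and hi_db are illustrative defaults, not values
    taken from the abstract.
    """
    if len(clean) != len(enhanced):
        raise ValueError("signals must have the same length")
    eps = 1e-12  # avoids division by zero and log(0)
    frame_snrs = []
    # Non-overlapping frames; a trailing partial frame is ignored.
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        s = clean[start:start + frame_len]
        e = enhanced[start:start + frame_len]
        signal_energy = sum(x * x for x in s)
        error_energy = sum((x - y) ** 2 for x, y in zip(s, e))
        snr_db = 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))
        frame_snrs.append(min(max(snr_db, lo_db), hi_db))
    if not frame_snrs:
        raise ValueError("signal is shorter than one frame")
    return sum(frame_snrs) / len(frame_snrs)
```

Clamping each frame keeps silent or perfectly reconstructed frames from dominating the average, which is why segSNR is usually reported with explicit limits.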

2.
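时文华, 张雄伟, 邹霞, 孙蒙, 李莉. 《声学学报》 2020, 45(3): 299-307
A speech enhancement method combining a deep encoder-decoder neural network with time-frequency mask estimation is proposed. The method uses the deep encoder-decoder network to estimate a time-frequency mask representation and, jointly with the magnitude spectrum of the noisy speech, learns the nonlinear mapping between the magnitude spectra of noisy and clean speech. The network has a convolution-deconvolution structure. At the encoder, the local receptive properties of convolutional networks are used to model the time-frequency structure of the noisy speech, extracting speech features while suppressing background noise. At the decoder, the features extracted by the encoder are used to restore local detail layer by layer and reconstruct the speech signal. In addition, skip connections are introduced between corresponding encoder and decoder layers to reduce the loss of low-level detail caused by pooling and fully connected operations. Simulation experiments on the TIMIT corpus with a not fully matched noise set show that the method suppresses noise effectively and recovers the fine detail of speech well.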

3.
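Psychoacoustic speech enhancement based on the multitaper spectrum   Total citations: 5 (self-citations: 2, citations by others: 5)
吴红卫, 吴镇扬, 赵力. 《声学学报》 2007, 32(3): 275-281
Compared with the conventional periodogram, the multitaper spectrum has a smaller estimation variance. The noise and the noise-to-noisy-speech ratio (NNSR) are estimated from the multitaper spectrum of the noisy speech. NNSR-based magnitude spectral subtraction produces a pre-enhanced speech signal, which is used to compute the auditory masking threshold, and a psychoacoustic weighting rule that incorporates the masking threshold produces the final enhanced speech. The masking offset is corrected to account for the characteristics of the multitaper spectrum; after the correction, the objective measure of the reconstructed speech, the modified Bark spectral distortion, improves somewhat. When the psychoacoustic weighting rule is further constrained so that its maximum value is below 1, the higher the input SNR (above 0 dB), the larger the gains in segmental SNR and overall SNR. Informal listening shows that the reconstructed speech has little distortion, greatly reduced background noise, and no musical noise.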
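The pre-enhancement step can be illustrated with plain magnitude spectral subtraction written in terms of a noise-to-noisy-speech ratio. This is a sketch under stated assumptions, not the paper's rule: it takes NNSR to be the per-bin amplitude ratio of the noise estimate to the noisy spectrum, and the spectral floor of 0.02 and the name `spectral_subtract` are hypothetical.

```python
def spectral_subtract(noisy_mag, noise_mag, floor=0.02):
    """Magnitude spectral subtraction for one frame.

    noisy_mag: magnitude spectrum of the noisy frame, one value per bin.
    noise_mag: estimated noise magnitude spectrum, same length.
    floor: minimum gain (an assumed value) that prevents negative magnitudes.
    """
    enhanced = []
    for y, n in zip(noisy_mag, noise_mag):
        # Assumed definition: NNSR is the noise-to-noisy amplitude ratio.
        nnsr = n / y if y > 0 else 1.0
        # |S| = |Y| - |N| = |Y| * (1 - NNSR), limited from below by the floor.
        gain = max(1.0 - nnsr, floor)
        enhanced.append(gain * y)
    return enhanced
```

Bins where the noise estimate exceeds the noisy magnitude fall back to the floor instead of going negative. Isolated bins that survive this step are the usual source of musical noise, which the abstract's psychoacoustic weighting is reported to avoid.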

4.
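张天骐, 熊梅, 张婷, 杨强. 《声学学报》 2019, 44(3): 393-400
To address the difficulty of separating the singing voice from the accompaniment in music signals, where the two are closely correlated, a music separation method based on a discriminatively trained deep neural network (DNN) is proposed. First, building on the DNN model, an improved objective function is proposed for discriminative training that accounts for both the reconstruction error and the discriminative information between singing voice and accompaniment. Second, an extra layer is added to the DNN model to introduce time-frequency masking, jointly optimizing the estimated singing voice and accompaniment; the corresponding time-domain signals are obtained by the inverse Fourier transform. Finally, the influence of different parameter settings on separation performance is examined, and the method is compared with existing music separation methods. Experimental results show that the improved objective function and time-frequency masking clearly improve the DNN's separation performance, with gains of up to 4 dB over existing music separation methods, confirming that the proposed method is effective for music separation.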
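The time-frequency masking layer mentioned in this abstract is commonly realized as a soft ratio mask built from the two source estimates. The sketch below assumes that form; the abstract does not give the exact mask, and the names `soft_mask` and `separate` are hypothetical. Each list holds one value per time-frequency bin.

```python
def soft_mask(vocal_est, accomp_est, eps=1e-12):
    """Ratio mask for the vocals: |v| / (|v| + |a|) per time-frequency bin."""
    return [abs(v) / (abs(v) + abs(a) + eps)
            for v, a in zip(vocal_est, accomp_est)]


def separate(mixture, vocal_est, accomp_est):
    """Split the mixture spectrum using the vocal mask and its complement."""
    mask = soft_mask(vocal_est, accomp_est)
    vocals = [m * x for m, x in zip(mask, mixture)]
    accomp = [(1.0 - m) * x for m, x in zip(mask, mixture)]
    return vocals, accomp
```

Because the two masks sum to one in every bin, the separated spectra always add back up to the mixture. The mixture values may be complex STFT coefficients, in which case the mixture phase is reused for both sources.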

5.
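Speech enhancement method combining magnitude-spectrum and power-spectrum dictionaries   Total citations: 1 (self-citations: 0, citations by others: 1)
An improved speech enhancement method based on sparse representation of spectral features is proposed from three angles: dual-path dictionary learning, noise power spectrum estimation, and speech magnitude spectrum reconstruction. In the dictionary learning stage, power-spectrum and magnitude-spectrum features are fused, and discriminative dictionaries are used to reduce the coherence between the speech and noise dictionaries. In the enhancement stage, a noise power spectrum estimation method is proposed to track non-stationary noise. Because magnitude-spectrum and power-spectrum features suit different noises to different degrees, a table of speech reconstruction weights is designed. The two signals recovered from the magnitude spectrum and the power spectrum are adaptively weighted and combined, and a phase compensation function is applied to obtain the enhanced speech. Experimental results show that, in stationary and non-stationary noise, the method improves on speech enhancement methods that use a single spectral feature by 31.6% on average.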

6.
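李哲军, 周萍, 景新幸. 《应用声学》 2016, 24(4): 155-157, 162
Additive noise in speech signals degrades the robustness of MFCC features and the performance of recognition systems, and introducing basic spectral subtraction improves the noise robustness of MFCCs only to a limited extent. To make MFCCs more robust to noise, an improved algorithm is proposed that brings spectral entropy into spectral subtraction: the distribution of spectral entropy values is used to estimate the noise frame by frame, so that the noise is subtracted more accurately. Experimental results show that, when the speech contains additive noise, a speaker recognition system using the improved spectral subtraction is more robust to noise than one using basic spectral subtraction.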
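Spectral entropy is high for flat, noise-like spectra and low for peaky, speech-like ones, which is what makes it usable as a frame-wise noise detector. The sketch below shows one plausible way to use it, not the authors' algorithm: the threshold of 0.9, the smoothing factor of 0.9, and the names `spectral_entropy` and `update_noise` are assumptions.

```python
import math


def spectral_entropy(power_spectrum):
    """Shannon entropy of the normalized power spectrum, scaled to [0, 1]."""
    n_bins = len(power_spectrum)
    total = sum(power_spectrum)
    if n_bins < 2:
        return 0.0
    if total <= 0:
        return 1.0  # assumption: treat an all-zero frame as maximally flat
    probs = [p / total for p in power_spectrum]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(n_bins)


def update_noise(noise_est, frame_power, threshold=0.9, alpha=0.9):
    """Recursively update the noise estimate on frames that look like noise."""
    if spectral_entropy(frame_power) >= threshold:
        return [alpha * n + (1.0 - alpha) * p
                for n, p in zip(noise_est, frame_power)]
    return list(noise_est)  # speech-like frame: keep the previous estimate
```

Updating the estimate on every frame judged to be noise lets it follow slowly varying noise, whereas basic spectral subtraction typically fixes the estimate from the first few frames.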

7.
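Additive noise in speech signals degrades the robustness of MFCC features and the performance of recognition systems, and introducing basic spectral subtraction improves the noise robustness of MFCCs only to a limited extent. To make MFCCs more robust to noise, an improved algorithm is proposed that brings spectral entropy into spectral subtraction: the distribution of spectral entropy values is used to estimate the noise frame by frame, so that the noise is subtracted more accurately. Experimental results show that, when the speech contains additive noise, a speaker recognition system using the improved spectral subtraction is more robust to noise than one using basic spectral subtraction.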

8.
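To give wearers of binaural hearing devices better speech intelligibility, a near-field speech enhancement algorithm is proposed that exploits the interaural time difference and the interaural level difference. The method first uses these two cues to estimate the speech power spectrum and the speech coherence function, then computes the ratio of the head-related transfer functions of the interfering noise between the left and right ears, and finally constructs two Wiener filters. Objective measures show that the algorithm removes noise better than the comparison algorithms, while its time-difference and level-difference errors for the target speech are lower. A subjective speech reception threshold test shows that the method effectively improves speech intelligibility. The results indicate that the algorithm removes interfering noise effectively while preserving the spatial information of the target speech.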
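The abstract does not spell out how its two Wiener filters are built, so the following only illustrates the textbook single-channel Wiener gain that such filters are based on. The function name `wiener_gain` and the assumption that per-bin speech and noise power spectra are already available are hypothetical.

```python
def wiener_gain(speech_psd, noise_psd, eps=1e-12):
    """Per-bin Wiener gain G = P_s / (P_s + P_n), always between 0 and 1."""
    return [s / (s + n + eps) for s, n in zip(speech_psd, noise_psd)]
```

Applying the same real-valued gain to the left-ear and right-ear spectra of a bin leaves their ratio unchanged, so the interaural time and level differences of whatever remains in that bin are preserved.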

9.
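A model estimation method for joint compensation of additive noise and the channel function   Total citations: 1 (self-citations: 0, citations by others: 1)
The performance of speech recognition systems drops sharply under the varying additive noise and channel differences of real environments, so limiting the degradation caused by this noise and these differences is important. The authors propose a model compensation algorithm that estimates the additive noise from the non-speech segments of an utterance and then estimates the channel function with the EM algorithm, jointly compensating the mismatched acoustic model in the cepstral domain. Experiments show that the system using the algorithm improves average performance by more than 50% relative to the baseline system. The algorithm can track environmental changes dynamically and outperforms several conventional robustness techniques for speech recognition.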

10.
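To address the sharp performance drop of earlier speech enhancement algorithms in non-stationary noise, a new single-channel speech enhancement algorithm based on time-frequency dictionary learning is proposed. First, time-frequency dictionary learning is used to model prior information about the spectral structure of the noise, which is incorporated into the convolutive non-negative matrix factorization framework. Then, with the noise time-frequency dictionary held fixed, multiplicative iterative update formulas are derived for the time-varying gains and the speech time-frequency dictionary. Finally, these formulas are used to update the time-varying gain coefficients of the speech and the noise and the speech time-frequency dictionary; the speech magnitude spectrum is reconstructed by convolving the speech time-frequency dictionary with the time-varying gains, and binary time-frequency masking removes the noise interference. Experimental results show that the algorithm achieves better results on several speech quality measures. In non-stationary noise and at low SNR, it removes noise more effectively than multi-band spectral subtraction and non-negative sparse coding denoising, and the enhanced speech has better quality.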
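The final binary time-frequency masking step can be sketched on its own, leaving out the convolutive NMF that produces the speech and noise estimates. The decision rule below (keep a bin when the speech estimate exceeds the noise estimate) is a common choice and an assumption here; `binary_mask` and `apply_mask` are hypothetical names.

```python
def binary_mask(speech_mag, noise_mag):
    """1.0 where the speech estimate dominates the noise estimate, else 0.0."""
    return [1.0 if s > n else 0.0 for s, n in zip(speech_mag, noise_mag)]


def apply_mask(noisy_mag, mask):
    """Keep the noisy magnitude in speech-dominated bins and zero the rest."""
    return [m * y for m, y in zip(mask, noisy_mag)]
```

For example, `apply_mask([4.0, 3.0, 4.0], binary_mask([3.0, 1.0, 2.0], [1.0, 2.0, 2.0]))` keeps only the first bin, since that is the only one where the speech estimate is strictly larger.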

11.
To address the difficulty of separating the singing voice from the accompaniment in music signals, an improved music separation method based on a discriminatively trained deep neural network (DNN) is proposed. First, building on the DNN model and considering both the reconstruction errors and the discriminative information between singing voice and accompaniment, an improved objective function is presented for discriminative training. Then, an additional layer is added to the DNN model to introduce time-frequency masking, which optimizes the estimated singing voice and accompaniment, and the corresponding time-domain signals are obtained by the inverse Fourier transform. Finally, the influence of different parameters on separation performance is examined, and the method is compared with existing music separation methods. The experimental results show that the improved objective function and the introduction of time-frequency masking significantly improve the separation performance of the DNN, and that separation performance improves by about 4 dB compared with existing music separation methods, confirming that the proposed method is an effective music separation algorithm.

12.
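王玥, 李平, 崔杰. 《声学学报》 2013, 38(4): 501-508
To find the best balance between noise suppression and speech distortion, an adaptive β-order Bayesian perceptual estimation speech enhancement algorithm based on the auditory frequency-domain masking effect is proposed, with the aim of improving overall enhancement performance. The algorithm exploits the masking effect of human hearing: the β value of the β-order Bayesian perceptual estimator is adapted according to the computed frequency-domain masking threshold, so that noise is suppressed only to below the masking threshold, which preserves more speech information and reduces speech distortion. The performance of the proposed algorithm is evaluated both objectively and subjectively and compared with the original SNR-based adaptive β-order Bayesian perceptual estimation algorithm. The results show that, for SNRs between -10 dB and 5 dB, the composite objective score of the frequency-domain masking method is consistently higher than that of the SNR-based algorithm. Subjective evaluation also shows that the frequency-domain masking method suppresses background noise well while retaining as much speech information as possible.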

13.
In this paper, a novel single microphone channel-based speech enhancement technique is presented. While most of the conventional nonnegative matrix factorization-based approaches focus on generating a basis matrix of speech and noise for enhancement, the proposed algorithm performs an additional process to reconstruct speech from noisy speech when these two elements are highly overlapped in selected spectral bands. This process involves a log-spectral amplitude based estimator, which provides the spectrotemporal speech presence probability to obtain a more accurate reconstruction. Moreover, the proposed algorithm applies an unsupervised learning method to the input noise, so it is adaptable to any type of environmental noise without a pre-trained dictionary. The experimental results demonstrate that the proposed algorithm obtains improved speech enhancement performance compared with conventional single channel-based approaches.

14.
A model for predicting the intelligibility of processed noisy speech is proposed. The speech-based envelope power spectrum model has a structure similar to that of the model of Ewert and Dau [(2000). J. Acoust. Soc. Am. 108, 1181-1196], developed to account for modulation detection and masking data. The model estimates the speech-to-noise envelope power ratio, SNR(env), at the output of a modulation filterbank and relates this metric to speech intelligibility using the concept of an ideal observer. Predictions were compared to data on the intelligibility of speech presented in stationary speech-shaped noise. The model was further tested in conditions with noisy speech subjected to reverberation and spectral subtraction. Good agreement between predictions and data was found in all cases. For spectral subtraction, an analysis of the model's internal representation of the stimuli revealed that the predicted decrease of intelligibility was caused by the estimated noise envelope power exceeding that of the speech. The classical concept of the speech transmission index fails in this condition. The results strongly suggest that the signal-to-noise ratio at the output of a modulation frequency selective process provides a key measure of speech intelligibility.

15.
Many studies of informational masking have shown that interfering speech reduces speech intelligibility. This effect is often used to secure privacy in public spaces, and such applications require estimates of how much masking is needed. In general, masking effects are estimated by using spectrum information as excitation patterns. However, informational masking can hardly be estimated from spectrum information alone. Therefore, we estimated the effects of informational masking using time-domain information. Specifically, we calculated the cepstra of the envelopes' magnitude histograms. If these cepstra differ between the target and the masker, the signals are not similar in the time domain, and the effect of informational masking should be low. We therefore used the histograms' cepstra distances (HCD) to estimate signal similarities. In our first experiment, the HCD was used to estimate the similarity of five maskers to the target: random noise, music, female speech, male speech, and the target speaker's speech. Male and female speech were more similar to the target speech than music and noise, and the same speaker's speech was the most similar in the set of maskers. In the second experiment, a listening test was carried out to verify the HCD. A double masker, which has characteristics similar to reversed speech, was used as an effective informational masker. The listening test results suggest that the double masker's masking effects have the same relation with the HCD. This suggests that informational masking can be estimated from signal similarity using the HCD.

16.
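The mean-square error (MSE) function is the cost function most commonly used in deep-learning-based single-channel speech enhancement. However, the MSE value is not fully correlated with speech quality. To improve performance, this paper introduces two classes of cost functions related to human auditory perception into deep neural network training. The first class is the weighted Euclidean distance cost function, which accounts for auditory masking. The second class comprises the Itakura-Saito cost function, the COSH cost function, and the weighted likelihood ratio cost function, which emphasize spectral peaks and focus on recovering the spectral peak information of clean speech. Using a long short-term memory network, the two classes of cost functions are analyzed and compared in deep-learning single-channel speech enhancement, and both are compared with the MSE cost function. Experimental results show that the deep neural network single-channel speech enhancement algorithm using the weighted Euclidean distance cost function achieves better speech quality and lower residual noise.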
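Two of the cost functions named here have standard closed forms, sketched below next to MSE for comparison. This is an illustration of the textbook definitions on power spectra, not the paper's training code; the function names and the `eps` safeguard are assumptions, and the weighted Euclidean and weighted likelihood ratio costs are omitted because the abstract does not define their weights.

```python
import math


def mse(clean, est):
    """Mean-square error between two spectra."""
    return sum((c - e) ** 2 for c, e in zip(clean, est)) / len(clean)


def itakura_saito(clean, est, eps=1e-12):
    """Itakura-Saito distance: mean of r - log(r) - 1 with r = clean / est."""
    total = 0.0
    for c, e in zip(clean, est):
        r = (c + eps) / (e + eps)
        total += r - math.log(r) - 1.0
    return total / len(clean)


def cosh_distance(clean, est, eps=1e-12):
    """COSH distance: the symmetrized Itakura-Saito distance."""
    total = 0.0
    for c, e in zip(clean, est):
        r = (c + eps) / (e + eps)
        total += 0.5 * (r + 1.0 / r) - 1.0
    return total / len(clean)
```

The Itakura-Saito distance depends on the ratio of the two spectra, and for a given ratio it penalizes underestimating the clean spectrum more heavily than overestimating it. The COSH distance averages the two directions of the Itakura-Saito distance and is therefore symmetric.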
