Similar literature
20 similar articles found (search time: 532 ms)
1.
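Zhou Jian, Zheng Wenming, Wang Qingyun, Zhao Li. Acta Acustica (《声学学报》), 2014, 39(4): 501-508
Two whispered-speech enhancement algorithms based on asymmetric cost functions are proposed; they treat amplification distortion and compression distortion in the enhancement process differently. The Modified Itakura-Saito (MIS) algorithm penalizes amplification distortion more heavily, whereas the Kullback-Leibler (KL) algorithm penalizes compression distortion more heavily. Experiments show that at low signal-to-noise ratios (SNRs) below -6 dB, whispered speech enhanced by the MIS algorithm is significantly more intelligible than with conventional algorithms, while the KL algorithm achieves an intelligibility gain similar to that of the minimum mean-square error (MMSE) speech enhancement algorithm. This confirms that amplification and compression distortion affect the intelligibility of whispered speech differently: at low SNR, larger compression distortion helps improve intelligibility, whereas at high SNR compression distortion has little effect on it.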
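The asymmetry that the abstract relies on can be seen in the textbook Itakura-Saito and generalized Kullback-Leibler divergences. The sketch below is only an illustration under stated assumptions: the abstract does not define the paper's modified cost functions, so the function names, the argument order, and the test values here are hypothetical and are not the authors' MIS or KL estimators.

```python
import math

def itakura_saito(x, x_hat):
    """Textbook Itakura-Saito divergence between a true power x and an estimate x_hat."""
    r = x / x_hat
    return r - math.log(r) - 1.0

def generalized_kl(x, x_hat):
    """Textbook generalized Kullback-Leibler divergence between x and x_hat."""
    return x * math.log(x / x_hat) - x + x_hat

true_power = 1.0
amplified = 2.0    # estimate is twice the true value (amplification distortion)
attenuated = 0.5   # estimate is half the true value (compression distortion)

# Print each divergence for the amplified and the attenuated estimate.
for name, d in (("IS", itakura_saito), ("KL", generalized_kl)):
    print(name, round(d(true_power, amplified), 3), round(d(true_power, attenuated), 3))
```

Neither divergence treats a factor-of-two overestimate and a factor-of-two underestimate alike, and the two disagree about which is worse. Which direction is penalized more depends on the argument order, so the direction shown here need not match the one the abstract reports for the paper's modified definitions.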

2.
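Bian Jinhong, Wu Ruiqi, Zhou Feng, Zhao Li. Journal of Applied Acoustics (《应用声学》), 2023, 42(2): 269-275
Methods based on deep neural networks have been widely applied to speech enhancement, but achieving good performance generally requires large, complex models, which are hard to deploy on devices with limited computing resources or in latency-critical settings. To address this, a teacher-student learning method for speech enhancement based on the deep complex convolution recurrent network is proposed. The complex long short-term memory recurrent module in the middle of the teacher and student network structures extracts real-part and imaginary-part feature streams, and a frame-level teacher-student distance loss is computed for each stream to transfer knowledge. A multi-resolution spectral loss is also used to further improve the low-complexity student model. Experiments on the public Voice Bank Demand and DNS Challenge datasets show that the proposed method clearly improves on the baseline student model on all metrics.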
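A frame-level teacher-student distance loss over separate real and imaginary feature streams might look like the sketch below. It assumes a squared Euclidean distance averaged over frames and equal feature dimensions for teacher and student; the abstract specifies neither, and every name and value here is hypothetical.

```python
def frame_distance_loss(teacher, student):
    """Mean over frames of the squared Euclidean distance between feature vectors."""
    if len(teacher) != len(student):
        raise ValueError("teacher and student must have the same number of frames")
    total = 0.0
    for t_frame, s_frame in zip(teacher, student):
        total += sum((t - s) ** 2 for t, s in zip(t_frame, s_frame))
    return total / len(teacher)

def distillation_loss(teacher_real, teacher_imag, student_real, student_imag):
    """Sum of the per-stream frame-level losses (real part plus imaginary part)."""
    return (frame_distance_loss(teacher_real, student_real)
            + frame_distance_loss(teacher_imag, student_imag))

# Two frames with two features each (hypothetical values).
t_real = [[1.0, 0.0], [0.5, 0.5]]
s_real = [[0.0, 0.0], [0.5, 0.5]]
t_imag = [[0.0, 2.0], [1.0, 1.0]]
s_imag = [[0.0, 0.0], [1.0, 1.0]]
print(distillation_loss(t_real, t_imag, s_real, s_imag))  # 2.5
```

Keeping the two streams separate lets each be weighted independently if one proves harder for the student to match.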

3.
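To obtain clean speech from a noisy signal, a single-channel speech enhancement algorithm using gender-dependent models is proposed. In the training stage, gender-dependent deep neural network and non-negative matrix factorization (DNN-NMF) models are trained to estimate the NMF weight parameters. In the test stage, an algorithm based on NMF with a group-sparsity penalty determines the gender of the speaker in the test utterance; the corresponding model then estimates the weights, which are combined with the pre-trained dictionaries to enhance the speech. Experiments show that the proposed algorithm outperforms several NMF-based and DNN-based algorithms in both noise suppression and speech quality.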

4.
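Room reverberation severely degrades speech quality, so suppressing it is especially important in indoor speech communication. For wireless acoustic sensor networks, this paper proposes a distributed adaptive dereverberation algorithm based on weighted prediction error. By adapting the conventional recursive least squares algorithm, the proposed distributed weighted prediction error algorithm achieves optimal output using only one shared reference signal and the local outputs of the other nodes, rather than all signals, which greatly reduces the number of channels transmitted between nodes and the...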

5.
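Pitch detection algorithm based on a lifting-wavelet-weighted autocorrelation function   (Cited: 1; self-citations: 0; by others: 1)
Wang Chen, Zhang Xiaobing, Liu Meijuan. Journal of Applied Acoustics (《应用声学》), 2018, 37(2): 201-207
With the development of computer technology, speech signal processing has become an important channel for human-computer interaction, and its feature detection algorithms in complex noise environments directly affect computational efficiency. The pitch period is one of the key parameters in speech feature extraction. To address the low accuracy of conventional pitch detection algorithms in noise, a pitch detection algorithm is proposed that is based on the autocorrelation function of the linear prediction error, weighted by an adaptive lifting wavelet transform. The method uses a weighted sum of multi-level lifting-wavelet approximation coefficients to compensate for the decay in autocorrelation amplitude as the time lag increases, and uses the autocorrelation of the linear prediction error to suppress formant interference; the two are then combined to accentuate the peak at the pitch period. Experiments show that, compared with the conventional autocorrelation method and the wavelet-weighting method, the proposed method effectively reduces the influence of formants, accentuates the peak at the pitch period, improves pitch-period detection accuracy, and is more robust.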
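For reference, the conventional autocorrelation baseline that this abstract improves on can be sketched as follows. This is the plain method only, without the lifting-wavelet weighting or the linear prediction error; the function names, the search range, and the synthetic 200 Hz test tone are assumptions made for illustration.

```python
import math

def autocorrelation(x, lag):
    """Biased short-time autocorrelation: fewer terms are summed as the lag grows."""
    return sum(x[n] * x[n + lag] for n in range(len(x) - lag))

def estimate_pitch(x, fs, f_min=50.0, f_max=400.0):
    """Return the pitch in Hz whose lag maximizes the autocorrelation."""
    lag_min = int(fs / f_max)
    lag_max = int(fs / f_min)
    best_lag = max(range(lag_min, lag_max + 1), key=lambda k: autocorrelation(x, k))
    return fs / best_lag

fs = 8000
frame = [math.sin(2 * math.pi * 200 * n / fs) for n in range(800)]  # 200 Hz tone
print(estimate_pitch(frame, fs))  # 200.0
```

Because the sum runs over fewer samples at larger lags, the peaks at multiples of the pitch period shrink with lag. That is the amplitude decay the abstract's weighting scheme is meant to compensate.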

6.
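Speech spectral distortion measure based on neural networks   (Cited: 1; self-citations: 1; by others: 1)
The concept of a neural-network-based speech spectral distortion measure is proposed. Feed-forward neural networks, including multilayer perceptrons and radial basis function networks, can approximate multidimensional non-linear functions, which gives the spectral distortion measure the ability to represent the subjective perceptual behavior of the human auditory system. For objective speech quality assessment, the network is trained using subjective ratings obtained under a large number of distortion conditions as target values. Statistical correlation analysis shows that the objective method based on the neural-network spectral distortion measure correlates significantly better with subjective ratings than the conventional Euclidean and weighted Euclidean distances, and is more robust. The method is also technology-independent.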

7.
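Zhang Junfeng, Hu Shousong. Acta Physica Sinica (《物理学报》), 2007, 56(2): 713-719
A radial basis function (RBF) neural network model is built with a two-stage learning method to predict chaotic time series. For the unsupervised step that determines the hidden-layer centers, a Gaussian-basis distance measure is proposed, combined with an input-output clustering strategy. Designing the penalty factor of the Gaussian-basis distance measure from the Fisher separability ratio improves clustering performance, and the input-output clustering strategy links clustering performance to the network's prediction performance. The resulting network model therefore both trains faster and predicts better. Prediction simulations on the Mackey-Glass, Lorenz, and Logistic chaotic time series demonstrate the effectiveness of the method. Keywords: chaotic time series; prediction; radial basis function neural network; clustering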
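The building blocks named in this abstract can be sketched in a few lines: a Logistic-map series as test data and the forward pass of an RBF network with Gaussian hidden units. The sketch uses the ordinary Euclidean distance, not the paper's Gaussian-basis distance measure or its clustering step, and all names and parameter values are hypothetical.

```python
import math

def logistic_series(x0, n, r=4.0):
    """Generate n samples of the logistic map x[k+1] = r * x[k] * (1 - x[k])."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

def gaussian_rbf(x, center, width):
    """Gaussian basis function of the Euclidean distance between two vectors."""
    dist2 = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-dist2 / (2.0 * width ** 2))

def rbf_predict(x, centers, widths, weights, bias=0.0):
    """RBF network output: bias plus the weighted sum of hidden-unit activations."""
    return bias + sum(w * gaussian_rbf(x, c, s)
                      for w, c, s in zip(weights, centers, widths))

series = logistic_series(0.2, 5)
# One-step-ahead training pairs: input [x[k]], target x[k+1].
pairs = [([series[k]], series[k + 1]) for k in range(len(series) - 1)]
print(pairs[0])
```

In a two-stage scheme, the centers and widths are fixed first by unsupervised clustering, after which the output weights can be fitted by linear least squares.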

8.
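Under low signal-to-noise ratio (SNR) and sudden background noise, existing deep learning models perform poorly at single-channel speech enhancement, whereas humans can exploit the long-term correlation of speech to form an integrated perception of different speech signals. Modeling the long-term dependencies of speech therefore helps improve enhancement under these conditions. Inspired by this, TU-net, an enhancement model that combines multi-head attention with the U-net deep network, is proposed for end-to-end, time-domain, single-channel speech enhancement. TU-net uses the encoder and decoder layers of U-net to fuse multi-scale features of the noisy speech, and uses multi-head attention to implement a dual-path Transformer that computes the speech mask and better models long-term correlation. Loss functions are computed in the time domain, the time-frequency domain, and the perceptual domain, and a weighted combination of them guides training. Simulations show that under low SNR and sudden background noise, TU-net outperforms comparable single-channel enhancement networks on several metrics, including perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and SNR gain, while keeping a relatively small number of model parameters.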

9.
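Zhao Ruge, Feng Peng, Luo Yan, Zhang Song, He Peng, Liu Yanan. Acta Optica Sinica (《光学学报》), 2023, (20): 314-323
X-ray fluorescence CT (XFCT) is a new imaging modality that combines X-ray CT with X-ray fluorescence analysis. It can detect the distribution and mass fraction of modified gold nanoparticles inside tumors and shows good potential for early cancer diagnosis and treatment. Suppressing Compton scattering noise in XFCT imaging is currently an active research topic. This work uses deep learning: a convolutional neural network learns the noise distribution in the images and thereby suppresses the noise. On this basis, an XFCT denoising network (NeCNN) based on noise-level estimation and convolutional neural networks is proposed, consisting of a noise estimation sub-network and a main denoising network. The estimation sub-network uses a denoising convolutional neural network (DnCNN) to estimate the noise level and perform initial denoising; its output is then fed to the main denoising network, a fully convolutional network (FCN), which learns the distribution of Compton scattering. To balance local and global optima, mean squared error (MSE) and structural similarity (SSIM) are used together as the loss function. The dataset is obtained by simulating, in Geant4, scans of air phantoms and polymethyl methacrylate (PMMA) phantoms filled with various metal nanoparticles (Au, Bi, Ru, Gd); different incident X-ray intensities are used to simulate different noise levels and improve generalization. Experiments show that, compared with block-matching and 3D filtering (BM3D) and DnCNN, NeCNN gives the best denoising results, with an SSIM of 0.95066, peak...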

10.
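Axial distance error correction in ptychography based on image information entropy   (Cited: 1; self-citations: 0; by others: 1)
Dou Jiantai, Gao Zhishan, Ma Jun, Yuan Caojin, Yang Zhongming. Acta Physica Sinica (《物理学报》), 2017, 66(16): 164203
In ptychography algorithms in which the axial distance enters the computation, an axial distance error blurs the reconstructed image and lowers its resolution. An axial distance error model is established from Fresnel diffraction theory. Based on how different axial distance errors affect the sharpness of the reconstructed image, image information entropy is proposed for determining the axial distance at which the image is sharpest, from which a sharp ptychographic image is reconstructed. Three sharpness metrics (image energy variation, the Tamura coefficient, and image information entropy) are compared over the course of axial distance error correction; all three are unimodal, and their peaks identify the same axial distance. Image information entropy is more sensitive than the other two metrics. Simulations and experiments both demonstrate the feasibility of entropy-based axial error correction in ptychography.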
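Image information entropy, the sharpness metric used here, is commonly computed as the Shannon entropy of the gray-level histogram. The sketch below assumes that definition and a caller-supplied `reconstruct` function that returns a flat list of gray levels for a candidate axial distance; both are assumptions, since the abstract gives neither the exact formula nor the reconstruction code.

```python
import math
from collections import Counter

def image_entropy(pixels):
    """Shannon entropy in bits of the gray-level histogram of a flat pixel list."""
    counts = Counter(pixels)
    total = len(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def entropy_curve(candidates, reconstruct):
    """Evaluate the entropy of the reconstruction at each candidate axial distance."""
    return [(z, image_entropy(reconstruct(z))) for z in candidates]

print(image_entropy([0, 1, 2, 3]))   # four equally likely levels
print(image_entropy([7, 7, 7, 7]))   # a constant image carries no information
```

Scanning `entropy_curve` over a range of distances and locating its single extremum mirrors the search the abstract describes; the abstract reports that the curve is unimodal, which is what makes such a scan well defined.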

11.
In this paper we present a model called the Modified Phase-Opponency (MPO) model for single-channel speech enhancement when the speech is corrupted by additive noise. The MPO model is based on the auditory PO model, proposed for detection of tones in noise. The PO model includes a physiologically realistic mechanism for processing the information in neural discharge times and exploits the frequency-dependent phase properties of the tuned filters in the auditory periphery by using a cross-auditory-nerve-fiber coincidence detection for extracting temporal cues. The MPO model alters the components of the PO model such that the basic functionality of the PO model is maintained but the properties of the model can be analyzed and modified independently. The MPO-based speech enhancement scheme neither needs to estimate the noise characteristics nor assumes that the noise satisfies any statistical model. The MPO technique leads to the lowest value of the LPC-based objective measures and the highest value of the perceptual evaluation of speech quality measure compared to other methods when the speech signals are corrupted by fluctuating noise. Combining the MPO speech enhancement technique with our aperiodicity, periodicity, and pitch detector further improves its performance.

12.
Speech signals recorded with a distant microphone are usually degraded by room reverberation, which severely reduces the clarity and intelligibility of speech. A speech dereverberation method based on spectral subtraction and spectral line enhancement is proposed in this paper. Following the generalized statistical reverberation model, the power spectrum of late reverberation is estimated and removed from the reverberant speech by the spectral subtraction method. Then, according to the human auditory model, a spectral line enhancement technique based on adaptive post-filtering is adopted to further eliminate the reverberant components between adjacent speech formants. The proposed method can effectively suppress the spatial reverberation and improve the auditory perception of speech. The subjective and objective evaluation results reveal that the perceptual quality of speech is greatly improved by the proposed method.
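The spectral subtraction step can be sketched per frequency bin as below. This is the generic power-domain form with an over-subtraction factor and a spectral floor, both of which are assumptions introduced here; the abstract does not say how the late-reverberation power is estimated or which parameters the authors use.

```python
def spectral_subtraction(noisy_power, interference_power, alpha=1.0, floor=0.01):
    """Subtract an estimated interference power spectrum bin by bin.

    alpha is an over-subtraction factor; floor keeps a small fraction of the
    noisy power so that no bin becomes negative.
    """
    return [max(y - alpha * n, floor * y)
            for y, n in zip(noisy_power, interference_power)]

def gain_from_power(noisy_power, clean_power_estimate):
    """Per-bin amplitude gain to apply to the noisy spectrum."""
    return [(s / y) ** 0.5 if y > 0 else 0.0
            for s, y in zip(clean_power_estimate, noisy_power)]

noisy = [4.0, 1.0]          # hypothetical power in two frequency bins
late_reverb = [1.0, 2.0]    # hypothetical late-reverberation power estimate
clean = spectral_subtraction(noisy, late_reverb)
print(clean, gain_from_power(noisy, clean))
```

The floor matters in the second bin, where the interference estimate exceeds the observed power: without it the result would be negative and the gain undefined.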

13.
Although the electrolarynx (EL) serves as an important method of phonation for laryngectomees, the resulting speech has poor intelligibility because of the steady background noise produced by the instrument, and it is worse still in the presence of additive noise. This paper investigates the problem of EL speech enhancement by taking into account the frequency-domain masking properties of the human auditory system. One approach incorporates an auditory masking threshold (AMT) for parametric adaptation in a subtractive-type enhancement process. The other is the supplementary AMT (SAMT) algorithm, which applies a cross-correlation spectral subtraction (CCSS) approach as a post-processing scheme to EL speech already processed by the AMT method. The performance of these two algorithms was evaluated against the power spectral subtraction (PSS) algorithm. The best EL speech enhancement was obtained with the SAMT algorithm, followed by the AMT algorithm and the PSS algorithm. Acoustic and perceptual analyses indicated that the AMT and SAMT algorithms achieved better noise reduction than the PSS algorithm and that the enhanced EL speech was more pleasant to human listeners.

14.
Abnormalities in cochlear function usually broaden the auditory filters, which reduces speech intelligibility. A spectral enhancement algorithm was applied in an attempt to improve the identification of Polish vowels by subjects with cochlear hearing impairment. Identification scores for natural (unprocessed) vowels and spectrally enhanced (processed) vowels were measured for hearing-impaired subjects. Spectral enhancement was found to improve vowel scores by about 10% for those subjects; however, individual performance varied widely among subjects. The overall vowel identification scores obtained were 85% for natural vowels and 96% for spectrally enhanced vowels.

15.
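Wang Yue, Li Ping, Cui Jie. Acta Acustica (《声学学报》), 2013, 38(4): 501-508
To find the best balance between noise suppression and speech distortion, an adaptive β-order Bayesian perceptual estimation speech enhancement algorithm based on the auditory frequency-domain masking effect is proposed, with the aim of improving overall enhancement performance. The algorithm exploits auditory masking: it adapts the β value of the β-order Bayesian perceptual estimator according to the computed frequency-domain masking threshold, so that noise is suppressed only to just below the masking threshold, more speech information is retained, and speech distortion is reduced. The algorithm was evaluated both objectively and subjectively and compared with the original SNR-based adaptive β-order Bayesian perceptual estimation algorithm. The composite objective scores of the frequency-masking method are higher than those of the SNR-based method at SNRs from -10 dB to 5 dB. Subjective evaluation also shows that the frequency-masking method suppresses background noise well while retaining as much speech information as possible.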

16.
In this paper, a single-channel speech enhancement algorithm based on non-linear and multi-band Adaptive Gain Control (AGC) is proposed. The algorithm requires neither Signal-to-Noise Ratio (SNR) estimation nor noise parameter estimation. It reduces the background noise in the temporal domain rather than the spectral domain using a non-linear and automatically adjustable gain function for multi-band AGC. The gain function varies in time and is deduced from the temporal envelope of each frequency band to highly compress the frequency regions where noise is present and lightly compress the frequency regions where speech is present. Objective evaluation using the PESQ (Perceptual Evaluation of Speech Quality) metric shows that the proposed algorithm performs better than three benchmarks, namely spectral subtraction, the Wiener filter based on a priori SNR estimation, and a band-pass modulation filtering algorithm. In addition, blind subjective tests show that the proposed algorithm introduces less musical noise than the benchmark algorithms and was preferred 78.8% of the time in terms of signal quality. The proposed algorithm is implemented in a miniature low-power digital signal processor to validate its feasibility and complexity for smart hearing protection in noisy environments.

17.

Background

Recent studies have shown that the human right-hemispheric auditory cortex is particularly sensitive to reduction in sound quality, with an increase in distortion resulting in an amplification of the auditory N1m response measured with magnetoencephalography (MEG). Here, we examined whether this sensitivity is specific to the processing of acoustic properties of speech or whether it can be observed also in the processing of sounds with a simple spectral structure. We degraded speech stimuli (vowel /a/), complex non-speech stimuli (a composite of five sinusoids), and sinusoidal tones by decreasing the amplitude resolution of the signal waveform. The amplitude resolution was impoverished by reducing the number of bits used to represent the signal samples. Auditory evoked magnetic fields (AEFs) were measured in the left and right hemispheres of sixteen healthy subjects.

Results

We found that the AEF amplitudes increased significantly with stimulus distortion for all stimulus types, which indicates that the right-hemispheric N1m sensitivity is not related exclusively to degradation of acoustic properties of speech. In addition, the P1m and P2m responses were amplified with increasing distortion similarly in both hemispheres. The AEF latencies were not systematically affected by the distortion.

Conclusions

We propose that the increased activity of AEFs reflects cortical processing of acoustic properties common to both speech and non-speech stimuli. More specifically, the enhancement is most likely caused by spectral changes brought about by the decrease of amplitude resolution, in particular the introduction of periodic, signal-dependent distortion to the original sound. Converging evidence suggests that the observed AEF amplification could reflect cortical sensitivity to periodic sounds.

18.
Relations between perception of suprathreshold speech and auditory functions were examined in 24 hearing-impaired listeners and 12 normal-hearing listeners. The speech intelligibility index (SII) was used to account for audibility. The auditory functions included detection efficiency, temporal and spectral resolution, temporal and spectral integration, and discrimination of intensity, frequency, rhythm, and spectro-temporal shape. All auditory functions were measured at 1 kHz. Speech intelligibility was assessed with the speech-reception threshold (SRT) in quiet and in noise, and with the speech-reception bandwidth threshold (SRBT), previously developed for investigating speech perception in a limited frequency region around 1 kHz. The results showed that the elevated SRT in quiet could be explained on the basis of audibility. Audibility could only partly account for the elevated SRT values in noise and the deviant SRBT values, suggesting that suprathreshold deficits affected intelligibility in these conditions. SII predictions for the SRBT improved significantly by including the individually measured upward spread of masking in the SII model. Reduced spectral resolution, reduced temporal resolution, and reduced frequency discrimination appeared to be related to speech perception deficits. Loss of peripheral compression appeared to have the smallest effect on the intelligibility of suprathreshold speech.
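At its core, the SII is an importance-weighted sum of per-band audibility. The sketch below shows only that final summation; the band importance values are hypothetical placeholders, and the computation of audibility from speech, noise, and threshold levels (including the upward spread of masking mentioned above) is left out.

```python
def speech_intelligibility_index(band_importance, band_audibility):
    """Importance-weighted sum of per-band audibility, each clipped to 0..1."""
    if len(band_importance) != len(band_audibility):
        raise ValueError("one audibility value is needed per band")
    return sum(i * min(max(a, 0.0), 1.0)
               for i, a in zip(band_importance, band_audibility))

importance = [0.25, 0.25, 0.25, 0.25]   # hypothetical weights that sum to 1
print(speech_intelligibility_index(importance, [1.0, 1.0, 1.0, 1.0]))  # 1.0
print(speech_intelligibility_index(importance, [1.0, 1.0, 0.0, 0.0]))  # 0.5
```

Two listeners with the same SII have the same predicted audibility, so any remaining difference in their speech-reception thresholds points to suprathreshold deficits, which is the reasoning the abstract applies.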

19.
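Speech enhancement method combining amplitude-spectrum and power-spectrum dictionaries   (Cited: 1; self-citations: 0; by others: 1)
An improved sparse-representation speech enhancement method based on spectral features is proposed from three angles: dual-path dictionary learning, noise power spectrum estimation, and speech amplitude spectrum reconstruction. In the dictionary learning stage, power-spectrum and amplitude-spectrum features are fused, and discriminative dictionaries are used to reduce the coherence between the speech and noise dictionaries. In the enhancement stage, a noise power spectrum estimation method is proposed to track non-stationary noise. Because amplitude-spectrum and power-spectrum features suit different noises to different degrees, a table of speech reconstruction weights is designed....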

20.
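Effects of single-channel speech enhancement algorithms on Chinese speech intelligibility   (Cited: 1; self-citations: 0; by others: 1)
Yang Lin, Zhang Jianping, Yan Yonghong. Acta Acustica (《声学学报》), 2010, 35(2): 248-253
The effects of several commonly used single-channel speech enhancement algorithms on the intelligibility of Chinese speech are examined. Speech corrupted by different types of noise was processed by five single-channel enhancement algorithms and played to normal-hearing listeners in a listening test to assess the intelligibility of the enhanced speech. The results show that the enhancement algorithms do not improve intelligibility. Analysis of the specific errors shows that listening errors stem mainly from phoneme errors and have little to do with tone. Moreover, compared with recognition results for English, some enhancement algorithms affect Chinese and English intelligibility significantly differently.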
