1.

王玥 李平 崔杰 《Acta Acustica (声学学报)》2013,38(4):501-508

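To find the best balance between noise suppression and speech distortion, this paper proposes an adaptive β-order Bayesian perceptual estimation speech enhancement algorithm based on auditory frequency-domain masking, with the aim of improving overall enhancement performance. The algorithm exploits the masking effect of human hearing. The value of β in the β-order Bayesian perceptual estimator is adjusted adaptively according to the computed frequency-domain masking threshold, so that noise is suppressed only to just below the masking threshold; more speech information is preserved and speech distortion is reduced. The proposed algorithm was evaluated with both objective and subjective measures and compared with the earlier SNR-based adaptive β-order Bayesian perceptual estimation algorithm. For SNRs between -10 dB and 5 dB, the composite objective scores of the frequency-domain-masking method are higher than those of the SNR-based adaptive algorithm. Subjective evaluation also shows that the frequency-domain-masking method suppresses background noise well while preserving as much speech information as possible.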

2.

Single-channel speech enhancement with deep learning using cost functions related to human hearing

程琳娟 彭任华 郑成诗 李晓东 《Journal of Applied Acoustics (应用声学)》2022,41(4):654-666

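Mean squared error (MSE) is the most commonly used cost function in deep-learning-based single-channel speech enhancement. However, the MSE value does not fully correlate with speech quality. To improve performance, this paper introduces two classes of cost functions related to human hearing into deep neural network training. The first is a weighted Euclidean distance cost function, which takes auditory masking into account. The second comprises the Itakura-Saito, COSH and weighted likelihood ratio cost functions, which emphasize spectral peaks and focus on recovering the spectral-peak information of clean speech. Using a long short-term memory (LSTM) network, the two classes are analyzed and compared with each other and with the MSE cost function. Experimental results show that the algorithm trained with the weighted Euclidean distance cost function achieves better speech quality and lower residual noise.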
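The per-frame distance measures named above can be sketched as follows. This is a minimal illustration, not the authors' training code. Spectra are plain Python lists, and the perceptual `weights` passed to `weighted_euclidean` are a hypothetical input (for example, derived from a masking threshold). The weighted likelihood ratio cost is omitted. The Itakura-Saito and COSH functions use the standard spectral distortion definitions; COSH is the symmetric average of the two Itakura-Saito directions.

```python
import math


def mse(clean, est):
    """Plain mean squared error between two magnitude spectra."""
    return sum((c - e) ** 2 for c, e in zip(clean, est)) / len(clean)


def weighted_euclidean(clean, est, weights):
    """Per-bin weighted squared error; `weights` is a hypothetical perceptual weighting."""
    return sum(w * (c - e) ** 2 for c, e, w in zip(clean, est, weights)) / len(clean)


def itakura_saito(clean_pow, est_pow, eps=1e-12):
    """Itakura-Saito divergence between power spectra (asymmetric, zero when equal)."""
    total = 0.0
    for p, q in zip(clean_pow, est_pow):
        r = (p + eps) / (q + eps)
        total += r - math.log(r) - 1.0
    return total / len(clean_pow)


def cosh_distance(clean_pow, est_pow, eps=1e-12):
    """COSH distance: the average of IS(p, q) and IS(q, p), i.e. (r + 1/r)/2 - 1."""
    total = 0.0
    for p, q in zip(clean_pow, est_pow):
        r = (p + eps) / (q + eps)
        total += 0.5 * (r + 1.0 / r) - 1.0
    return total / len(clean_pow)
```

Both spectral measures are ratio-based, so an error on a high-energy spectral peak costs the same as a proportionally equal error in a low-energy valley. This matches the abstract's point that these costs emphasize peaks differently from MSE.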

3.

Whispered speech enhancement based on a modified Mel-domain masking model and speech absence probability

陶智 赵鹤鸣 吴迪 陈大庆 张晓俊 《Acta Acustica (声学学报)》2009,34(4):370-377

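This paper proposes a whispered speech enhancement method based on a modified Mel-domain auditory masking model and speech absence probability. The Mel frequency scale is modified to suit the phonation characteristics of whispered speech, and each frame of whispered speech is filtered into Mel-domain frequency bands. The auditory masking threshold of each band is determined dynamically from the speech absence probability (SAP). The spectral subtraction coefficients are then adjusted adaptively according to these masking thresholds. Objective and subjective tests on the enhanced whispered speech show that, compared with other spectral subtraction methods, the proposed method keeps residual and background noise below the auditory masking threshold. It also yields less speech distortion and substantially improves subjective listening quality.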
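Masking-driven spectral subtraction on Mel-spaced bands can be sketched roughly as follows. The sketch rests on three assumptions:

- It uses the standard (unmodified) Mel formula rather than the whisper-specific modification.
- It takes each band's masking threshold as a given input instead of deriving it from the speech absence probability.
- It maps the threshold to an over-subtraction factor by linear interpolation, in the style of Virag's masking-based spectral subtraction.

The limits `alpha_min` and `alpha_max` and the floor `beta` are illustrative values only.

```python
import math


def hz_to_mel(f):
    """Standard Mel scale (the paper's whisper-specific modification is not reproduced)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(f_low, f_high, n_bands):
    """Band edges equally spaced on the Mel scale."""
    m_lo, m_hi = hz_to_mel(f_low), hz_to_mel(f_high)
    return [mel_to_hz(m_lo + (m_hi - m_lo) * i / n_bands) for i in range(n_bands + 1)]


def adaptive_alpha(threshold, t_min, t_max, alpha_min=1.0, alpha_max=6.0):
    """Map a band's masking threshold to an over-subtraction factor.

    Low threshold (little masking)         -> aggressive subtraction (alpha_max).
    High threshold (residual noise masked) -> gentle subtraction (alpha_min).
    """
    if t_max <= t_min:
        return alpha_min
    x = min(max((threshold - t_min) / (t_max - t_min), 0.0), 1.0)
    return alpha_max - x * (alpha_max - alpha_min)


def subtract_band(noisy_pow, noise_pow, alpha, beta=0.01):
    """Power spectral subtraction with over-subtraction alpha and spectral floor beta * noise."""
    return [max(y - alpha * n, beta * n) for y, n in zip(noisy_pow, noise_pow)]
```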

4.

Dual-channel spectral subtraction algorithms based speech enhancement dedicated to a bilateral cochlear implant

Fathi Kallel Mondher Frikha Mohamed Ghorbel Ahmed Ben Hamida Christian Berger-Vachon 《Applied Acoustics》2012,73(1):12-20

In this paper, two speech enhancement algorithms (SEAs) based on the spectral subtraction (SS) principle are evaluated for bilateral cochlear implant (BCI) users. Specifically, a dual-channel noise power spectral estimation algorithm using the power spectral densities (PSD) and cross power spectral density (CPSD) of the observed signals was studied. The enhanced speech signals were obtained using either the Dual-Channel Non-Linear Spectral Subtraction (DC-NLSS) or the Dual-Channel Multi-Band Spectral Subtraction (DC-MBSS) algorithm. For performance evaluation, objective speech assessment tests based on the Perceptual Evaluation of Speech Quality (PESQ) score and the Itakura-Saito (IS) distortion measure were performed to determine the optimal number of frequency bands for the DC-MBSS algorithm. To evaluate speech intelligibility, subjective listening tests were conducted with 50 normal-hearing listeners using a dedicated BCI simulator and with three deafened BCI patients. The French Lafon database was corrupted by additive babble noise at different signal-to-noise ratios (SNR). On this material, the DC-MBSS algorithm improved speech understanding more than the DC-NLSS algorithm, for both single and multiple interfering noise sources.
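The dual-channel noise PSD idea can be illustrated with a common textbook estimator; whether the paper uses exactly this form is an assumption. Suppose the speech component is identical in both channels and the noise is uncorrelated between them. Then the cross-PSD magnitude approximates the speech PSD, and subtracting it from the mean auto-PSD leaves an estimate of the noise PSD. Expectations are replaced by averages over frames, and `dual_channel_noise_psd` is a hypothetical name.

```python
def dual_channel_noise_psd(x1_frames, x2_frames):
    """Estimate the noise PSD per frequency bin from two channels.

    Model assumption: X_i = S + N_i with S common to both channels and N_1, N_2
    mutually uncorrelated, so |E[X1 X2*]| ~ speech PSD and
    noise PSD ~ (P11 + P22) / 2 - |P12|.
    x1_frames, x2_frames: lists of frames, each a list of complex STFT bins.
    """
    n_frames = len(x1_frames)
    n_bins = len(x1_frames[0])
    noise = []
    for k in range(n_bins):
        p11 = sum(abs(f[k]) ** 2 for f in x1_frames) / n_frames
        p22 = sum(abs(f[k]) ** 2 for f in x2_frames) / n_frames
        p12 = sum(a[k] * b[k].conjugate() for a, b in zip(x1_frames, x2_frames)) / n_frames
        # Clip at zero: finite averaging can make the estimate slightly negative.
        noise.append(max(0.5 * (p11 + p22) - abs(p12), 0.0))
    return noise
```

In practice the averages would be recursive (exponentially smoothed) so the estimate can track changing noise. The block average above just keeps the sketch short.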

5.

Single-channel speech enhancement algorithm based on time-frequency dictionary learning

黄建军 张雄伟 张亚非 邹霞 《Acta Acustica (声学学报)》2012,37(5):539-547

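Earlier speech enhancement algorithms degrade sharply in non-stationary noise. To address this, a new single-channel speech enhancement algorithm based on time-frequency dictionary learning is proposed. First, time-frequency dictionary learning is used to model prior information about the spectral structure of the noise, and this model is incorporated into a convolutive non-negative matrix factorization (NMF) framework. Then, with the noise time-frequency dictionary fixed, multiplicative update rules are derived for the time-varying gains and the speech time-frequency dictionary. Finally, these rules are used to update the time-varying gains of speech and noise and the speech time-frequency dictionary. The speech magnitude spectrum is reconstructed by convolving the speech dictionary with its time-varying gains, and a binary time-frequency mask is applied to remove the noise. Experimental results show that the proposed algorithm achieves better results on several speech quality measures. In non-stationary noise and at low SNRs, it removes noise more effectively than multi-band spectral subtraction and non-negative sparse coding denoising, and the enhanced speech has better quality.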
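The NMF step can be sketched in simplified form. This is not the paper's convolutive NMF. It uses ordinary (non-convolutive) NMF with Lee-Seung Euclidean multiplicative updates and keeps a given noise dictionary fixed. It learns the speech dictionary and all activations, then applies a binary mask. `nmf_fixed_noise` and `binary_mask` are hypothetical helper names; matrices are lists of rows.

```python
import random


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(r) for r in zip(*a)]


def nmf_fixed_noise(V, W_noise, n_speech, n_iter=200, seed=0, eps=1e-12):
    """Factor V ~ [W_s | W_n] [H_s; H_n], keeping W_n fixed (Euclidean cost).

    V: F x T nonnegative magnitude spectrogram; W_noise: F x Kn noise dictionary.
    Returns (W_s, H_s, H_n).
    """
    rng = random.Random(seed)
    F, T = len(V), len(V[0])
    Kn = len(W_noise[0])
    W_s = [[rng.random() + 0.1 for _ in range(n_speech)] for _ in range(F)]
    H = [[rng.random() + 0.1 for _ in range(T)] for _ in range(n_speech + Kn)]
    for _ in range(n_iter):
        W = [ws + wn for ws, wn in zip(W_s, W_noise)]  # F x (Ks + Kn)
        Wt = transpose(W)
        # All activations: H <- H * (W^T V) / (W^T W H)
        num = matmul(Wt, V)
        den = matmul(Wt, matmul(W, H))
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(T)]
             for i in range(len(H))]
        # Speech atoms only: W_s <- W_s * (V H_s^T) / (W H H_s^T); W_n stays fixed
        H_s_t = transpose(H[:n_speech])
        num = matmul(V, H_s_t)
        den = matmul(matmul(W, H), H_s_t)
        W_s = [[W_s[f][k] * num[f][k] / (den[f][k] + eps) for k in range(n_speech)]
               for f in range(F)]
    return W_s, H[:n_speech], H[n_speech:]


def binary_mask(W_s, H_s, W_noise, H_n):
    """1 where the speech reconstruction exceeds the noise reconstruction, else 0."""
    S = matmul(W_s, H_s)
    N = matmul(W_noise, H_n)
    return [[1.0 if s > n else 0.0 for s, n in zip(rs, rn)] for rs, rn in zip(S, N)]
```

The mask would then be multiplied element-wise with the noisy magnitude spectrogram before resynthesis with the noisy phase.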

6.

Multi-subband noise-robust spectral subtraction for speech signals based on ERB-scale band partitioning

周挺挺 曾毓敏 王蓉蓉 卞乐 《Journal of Applied Acoustics (应用声学)》2017,36(3):212-219

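To study the application of psychoacoustics to speech enhancement, this paper proposes a multi-subband noise-robust spectral subtraction algorithm for speech based on equivalent rectangular bandwidth (ERB) scale partitioning. The algorithm divides the spectrum of the noisy signal into several subbands on the ERB scale. It computes the spectral subtraction parameters of each subband from that subband's segmental SNR and from psychoacoustic masking principles, and then performs spectral subtraction separately in each subband. Experimental results show that the new algorithm outperforms the previously proposed multi-subband spectral subtraction algorithm in terms of SNR, Itakura-Saito (IS) distortion and PESQ.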
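The ERB-scale band split and a per-band over-subtraction factor can be sketched as follows. The band edges use the Glasberg and Moore ERB-number formula. The piecewise-linear factor is the rule from Kamath and Loizou's multi-band spectral subtraction. It serves here as an assumed stand-in, because the abstract does not give the paper's psychoacoustic adjustment.

```python
import math


def hz_to_erb_number(f):
    """ERB-number scale (Glasberg & Moore, 1990)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f)


def erb_number_to_hz(e):
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437


def erb_band_edges(f_low, f_high, n_bands):
    """Band edges equally spaced on the ERB-number scale (narrow at low frequencies)."""
    e_lo, e_hi = hz_to_erb_number(f_low), hz_to_erb_number(f_high)
    return [erb_number_to_hz(e_lo + (e_hi - e_lo) * i / n_bands) for i in range(n_bands + 1)]


def band_segmental_snr_db(noisy_pow, noise_pow, eps=1e-12):
    """Band SNR in dB from summed noisy and estimated-noise power in one frame."""
    return 10.0 * math.log10((sum(noisy_pow) + eps) / (sum(noise_pow) + eps))


def oversubtraction_factor(snr_db):
    """Piecewise-linear over-subtraction rule (assumed, after Kamath & Loizou)."""
    if snr_db < -5.0:
        return 4.75
    if snr_db > 20.0:
        return 1.0
    return 4.0 - 0.15 * snr_db
```

Each band's noise estimate would then be scaled by its own factor before subtraction, so low-SNR bands are cleaned more aggressively than high-SNR ones.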

7.

Unsupervised real-time single-channel speech enhancement under a sparse low-rank noise model

李轶南 张雄伟 贾冲 陈亮 曾理 《Acta Acustica (声学学报)》2015,40(4):607-614

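Existing dictionary-learning-based enhancement algorithms require prior information and are hard to run in real time. To address this, an unsupervised single-channel speech enhancement algorithm suited to real-time processing is proposed. First, the algorithm recasts background-noise modeling under unsupervised conditions as a sparse low-rank noise decomposition of the noisy speech magnitude spectrum. Then, an incremental non-negative subspace method is used to learn a noise dictionary online, yielding an adaptive noise dictionary that captures the time-varying characteristics of the background noise. Finally, the noisy speech is processed with this noise dictionary frame by frame, using an iterative scheme that is easy to run in real time. Experimental results show that the proposed algorithm is compared with multi-band spectral subtraction and with an enhancement algorithm based on low-rank sparse matrix decomposition. It performs particularly well in noise suppression and gives better results on several evaluation measures.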

8.

Speech dereverberation method based on spectral subtraction and spectral line enhancement

Zhe Chen Rui Wang Fuliang Yin Bingqian Wang Wenwen Peng 《Applied Acoustics》2016

Speech signals recorded with a distant microphone are usually corrupted by room reverberation, which severely degrades the clarity and intelligibility of speech. This paper proposes a speech dereverberation method based on spectral subtraction and spectral line enhancement. Following the generalized statistical reverberation model, the power spectrum of the late reverberation is estimated and removed from the reverberant speech by spectral subtraction. Then, guided by a model of human hearing, a spectral line enhancement technique based on adaptive post-filtering is applied to further remove the reverberant components between adjacent speech formants. The proposed method effectively suppresses room reverberation and improves the perceived quality of speech. Subjective and objective evaluation results show that the perceptual quality of speech is greatly improved.
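The late-reverberation estimate can be sketched with the widely used exponential-decay model (Lebart/Habets style). Treating this as the paper's "generalized statistical reverberation model" is an assumption. In this model, the late reverberant PSD is a delayed, attenuated copy of the observed PSD, with a decay rate set by the reverberation time T60. A floored spectral subtraction gain then follows. The frame shift, delay and gain floor are illustrative parameters, and the spectral line enhancement post-filter is not shown.

```python
import math


def late_reverb_psd(reverb_pow_frames, t60, frame_shift_s, delay_frames):
    """Estimate the late-reverberation PSD per frame and bin.

    P_late[l] = exp(-2 * Delta * T_d) * P_y[l - delay_frames],
    Delta = 3 ln(10) / T60 (energy falls by 60 dB after T60 seconds),
    T_d   = delay_frames * frame_shift_s.
    """
    delta = 3.0 * math.log(10.0) / t60
    decay = math.exp(-2.0 * delta * delay_frames * frame_shift_s)
    n_bins = len(reverb_pow_frames[0])
    out = []
    for l in range(len(reverb_pow_frames)):
        if l < delay_frames:
            out.append([0.0] * n_bins)  # no history yet: assume no late reverb
        else:
            out.append([decay * p for p in reverb_pow_frames[l - delay_frames]])
    return out


def subtraction_gain(p_y, p_late, floor=0.1):
    """Power spectral subtraction gain 1 - P_late / P_y, limited below by `floor`."""
    return [max(1.0 - pl / py, floor) if py > 0 else floor for py, pl in zip(p_y, p_late)]
```

With a delay equal to T60, the decay factor is exp(-6 ln 10) = 10^-6, which corresponds to the defining 60 dB drop.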

9.

A method of whispered speech enhancement based on speech absence probability and modified mel-domain masking model

TAO Zhi ZHAO Heming WU Di CHEN Daqing ZHANG Xiaojun 《Acta Acustica, English edition (声学学报：英文版)》2011,30(3):345-357

Whispered speech enhancement using an auditory masking model in a modified Mel domain and the speech absence probability (SAP) is proposed. In light of the phonation characteristics of whispered speech, we modify the Mel-frequency scaling model, and whispered speech is filtered by the proposed model. The masking threshold of each frequency band is determined dynamically from the speech absence probability. Whispered speech enhancement is then performed by adaptively adjusting the spectral subtraction coefficients according to the different masking thresholds. Objective and subjective tests on the enhanced whispered signals show that, compared with other methods, the proposed method enhances whispered speech with better subjective auditory quality and less distortion. It achieves this by keeping musical noise and background noise below the masking threshold.

10.

Noise reduction of speech signals using time-varying and multi-band adaptive gain control for smart digital hearing protectors

Narimene Lezzoum Ghyslain Gagnon Jérémie Voix 《Applied Acoustics》2016

In this paper, a single-channel speech enhancement algorithm based on non-linear, multi-band Adaptive Gain Control (AGC) is proposed. The algorithm requires neither Signal-to-Noise Ratio (SNR) estimation nor noise parameter estimation. It reduces background noise in the temporal domain rather than the spectral domain, using a non-linear, automatically adjustable gain function for multi-band AGC. The gain function varies over time and is derived from the temporal envelope of each frequency band. It strongly compresses frequency regions where noise is present and lightly compresses regions where speech is present. Objective evaluation with the PESQ (Perceptual Evaluation of Speech Quality) metric shows that the proposed algorithm outperforms three benchmarks: spectral subtraction, a Wiener filter based on a priori SNR estimation, and a band-pass modulation filtering algorithm. In addition, blind subjective tests show that the proposed algorithm introduces less musical noise than the benchmark algorithms, and listeners preferred it 78.8% of the time in terms of signal quality. The algorithm was implemented on a miniature low-power digital signal processor to validate its feasibility and complexity for smart hearing protection in noisy environments.

11.

Wavelet packet denoising algorithm with adaptive thresholds combining time and frequency information

田玉静 董玉民 左红伟 《Journal of Applied Acoustics (应用声学)》2010,29(4):256-262

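This paper takes full account of the characteristics of human hearing and the statistical properties of the noise. On that basis, it proposes a speech denoising algorithm that combines time and frequency information with Bark-scale adaptive thresholds. Adaptively adjusting the enhancement coefficients in the Bark domain allows more accurate threshold decisions. Simulation experiments show that, at low input SNRs, the time-frequency combined algorithm has clear advantages over traditional speech denoising methods. It removes white Gaussian noise while effectively reducing speech loss, yields the highest SNR and the lowest spectral distortion measure, and markedly raises the MOS (Mean Opinion Score) of the enhanced speech, giving good perceived quality.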
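This is a one-level sketch of wavelet soft thresholding, with a Bark-scale helper. It makes three assumptions:

- It uses a Haar wavelet instead of a full wavelet packet tree.
- It applies the Donoho-Johnstone universal threshold with a median-absolute-deviation noise estimate.
- A hypothetical `scale` argument stands in for the paper's Bark-domain adaptive coefficient.

The `bark` function is the Zwicker-Terhardt approximation.

```python
import math


def bark(f):
    """Zwicker & Terhardt approximation of the Bark scale for frequency f in Hz."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)


def haar_step(x):
    """One level of the orthonormal Haar transform (len(x) must be even)."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail


def inverse_haar_step(approx, detail):
    s = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * s, (a - d) * s])
    return out


def soft_threshold(c, t):
    """Shrink coefficient c toward zero by t (zero if |c| <= t)."""
    return math.copysign(max(abs(c) - t, 0.0), c)


def universal_threshold(detail):
    """sigma * sqrt(2 ln N), with sigma = median(|d|) / 0.6745 (MAD estimate)."""
    mags = sorted(abs(d) for d in detail)
    n = len(mags)
    median = mags[n // 2] if n % 2 else 0.5 * (mags[n // 2 - 1] + mags[n // 2])
    sigma = median / 0.6745
    return sigma * math.sqrt(2.0 * math.log(n))


def denoise_one_level(x, scale=1.0):
    """Soft-threshold the Haar detail coefficients; `scale` is a hypothetical band factor."""
    approx, detail = haar_step(x)
    t = scale * universal_threshold(detail)
    return inverse_haar_step(approx, [soft_threshold(d, t) for d in detail])
```

In a perceptual variant, each wavelet packet node would be assigned to a Bark band, and `scale` would be chosen per band rather than fixed.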

12.

Application of auditory masking in improved multiband excitation model

Soo Ngee Koh Gek Huey Chua 《Applied Acoustics》2002,63(6):693-698

Modifications to improve the performance of the improved multiband excitation (IMBE) model in coding narrow-band speech are presented in this paper. The first modification is based on auditory masking. It helps the model focus on the perceptual aspect of the coder by eliminating components that are inaudible to the human ear during the analysis stage of the IMBE model. In addition, a spectral amplitude enhancement stage is added. Preliminary results indicate that the proposed modifications improve the objective and subjective performance of the IMBE model in coding narrow-band speech at 4.15 kbps.

13.

Research on noise reduction for speech communication

田玉静 左红伟 王超 《Journal of Applied Acoustics (应用声学)》2020,39(6):932-939

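In speech communication systems, speech transmitted over a channel inevitably suffers intersymbol interference (ISI) and signal distortion, and is also contaminated by noise. This paper first analyzes the adaptive blind equalization algorithm CMA (constant modulus algorithm) and an improved blind equalization algorithm. Because adaptive blind equalization alone has limited ability to control speech noise, it is combined with a wavelet packet masking-threshold denoising algorithm to form a new baseband speech enhancement method. Simulation results show that adaptive blind equalization makes the constellation diagram clear and compact and effectively reduces the bit error rate. The study confirms that, under severe ISI and distortion, the method provides stable noise reduction in both white and colored noise environments. It also gives good perceived quality for Mandarin Chinese speech while removing noise.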
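The CMA blind equalizer mentioned above can be sketched as a Godard (p = 2) stochastic-gradient update with a center-spike initialization. The step size, the tap count and the dispersion constant `r2` (1 for unit-modulus symbols such as QPSK) are assumptions. The improved equalizer and the wavelet packet masking stage are not shown.

```python
def cma_equalize(x, n_taps=5, mu=1e-3, r2=1.0):
    """Constant modulus algorithm (Godard, p = 2) on complex baseband samples.

    Minimizes E[(|y|^2 - r2)^2] by stochastic gradient descent.
    x: list of complex received samples. Returns (equalizer outputs, final taps).
    """
    w = [0j] * n_taps
    w[n_taps // 2] = 1.0 + 0j          # center-spike initialization
    buf = [0j] * n_taps                # buf[i] holds x[n - i]
    outputs = []
    for sample in x:
        buf = [sample] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))
        e = y * (abs(y) ** 2 - r2)     # CMA error: zero when |y|^2 == r2
        w = [wi - mu * e * bi.conjugate() for wi, bi in zip(w, buf)]
        outputs.append(y)
    return outputs, w
```

CMA needs no training symbols because it penalizes only deviations of the output modulus from `r2`. This makes it blind, but it also leaves an arbitrary phase rotation that a later stage must resolve.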

14.

The effects of hearing loss on the contribution of high- and low-frequency speech information to speech understanding

Hornsby BW Ricketts TA 《The Journal of the Acoustical Society of America》2003,113(3):1706-1717

The speech understanding of persons with "flat" hearing loss (HI) was compared to a normal-hearing (NH) control group to examine how hearing loss affects the contribution of speech information in various frequency regions. Speech understanding in noise was assessed at multiple low- and high-pass filter cutoff frequencies. Noise levels were chosen to ensure that the noise, rather than quiet thresholds, determined audibility. The performance of HI subjects was compared to a NH group listening at the same signal-to-noise ratio and a comparable presentation level. Although absolute speech scores for the HI group were reduced, performance improvements as the speech and noise bandwidth increased were comparable between groups. These data suggest that the presence of hearing loss results in a uniform, rather than frequency-specific, deficit in the contribution of speech information. Measures of auditory thresholds in noise and speech intelligibility index (SII) calculations were also performed. These data suggest that differences in performance between the HI and NH groups are due primarily to audibility differences between groups. Measures of auditory thresholds in noise showed the "effective masking spectrum" of the noise was greater for the HI than the NH subjects.

15.

Perceptually motivated wavelet packet transform for bioacoustic signal enhancement

Ren Y Johnson MT Tao J 《The Journal of the Acoustical Society of America》2008,124(1):316-327

A significant and often unavoidable problem in bioacoustic signal processing is the presence of background noise due to an adverse recording environment. This paper proposes a new bioacoustic signal enhancement technique that can be applied to a wide range of species. The technique is based on a perceptually scaled wavelet packet decomposition using a species-specific Greenwood scale function. Spectral estimation techniques, similar to those used for human speech enhancement, are used to estimate the clean-signal wavelet coefficients under an additive noise model. The new approach is compared to several other techniques, including basic bandpass filtering as well as classical speech enhancement methods such as spectral subtraction, Wiener filtering and Ephraim-Malah filtering. Vocalizations recorded from several species are used for evaluation, including the ortolan bunting (Emberiza hortulana), rhesus monkey (Macaca mulatta) and humpback whale (Megaptera novaeangliae). Both additive white Gaussian noise and environmental recording noise were added across a range of signal-to-noise ratios (SNRs). Results, measured by both SNR and segmental SNR of the enhanced waveforms, indicate that the proposed method outperforms the other approaches across a wide range of noise conditions.
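A species-specific Greenwood scale can be sketched by fitting the function f = A(10^(a·x) - k) to a hearing range, so that x = 0 maps to the lowest frequency and x = 1 to the highest. Two points are assumptions about how the scale is parameterized: fixing k = 0.88, and deriving A and a from the species' frequency range. The resulting band edges could then define the wavelet packet tree. The function names are hypothetical.

```python
import math


def greenwood_params(f_min, f_max, k=0.88):
    """Fit f(x) = A * (10**(a * x) - k) so that f(0) = f_min and f(1) = f_max."""
    A = f_min / (1.0 - k)             # from f(0) = A * (1 - k)
    a = math.log10(f_max / A + k)     # from f(1) = A * (10**a - k)
    return A, a


def greenwood_band_edges(f_min, f_max, n_bands, k=0.88):
    """Band edges equally spaced in cochlear position x on [0, 1]."""
    A, a = greenwood_params(f_min, f_max, k)
    return [A * (10.0 ** (a * i / n_bands) - k) for i in range(n_bands + 1)]
```

Equal steps in cochlear position give bands that are narrow at low frequencies and wide at high frequencies. The fitted range shifts this layout to match each species' hearing.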

16.

Noise estimation based on time–frequency correlation for speech enhancement

Wenhao Yuan Jiajun Lin Wei An Yu Wang Ning Chen 《Applied Acoustics》2013,74(5):770-781

As a fundamental part of speech enhancement, noise estimation is particularly challenging in highly non-stationary noise environments. In this work, we propose an effective algorithm based on Improved Minima Controlled Recursive Averaging (IMCRA) that aims to improve noise estimation. The main contributions of this work are as follows. (i) A rough speech presence decision is made by calculating the autocorrelation and cross-channel correlation of the T–F (time–frequency) units. (ii) Using this decision, we refine the smoothing parameters for the smoothing of the noisy power spectrum and for the recursive averaging in noise spectrum estimation. The same decision also refines the weighting factor for the a priori SNR (signal-to-noise ratio) estimation in IMCRA. (iii) We improve the search for local minima during spectral bursts by adding a minimum search with a shorter window. Extensive experiments are carried out to evaluate the proposed algorithm. Compared with IMCRA, the proposed approach significantly improves the accuracy of noise spectrum estimation and the quality of enhanced speech in typical noise conditions.
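Two building blocks that the abstract says are refined can be sketched generically. The first is speech-presence-weighted recursive averaging of the noise PSD; the second is the decision-directed a priori SNR estimate. The default constants are illustrative assumptions, and the correlation-based rough speech decision that the paper uses to adapt them is not reproduced.

```python
def update_noise_psd(noise_psd, noisy_pow, speech_prob, alpha_d=0.85):
    """Speech-presence-weighted recursive averaging (MCRA/IMCRA style), per bin.

    alpha_tilde = alpha_d + (1 - alpha_d) * p: when speech is certainly present
    (p = 1) the noise estimate is frozen; when absent (p = 0) it tracks |Y|^2.
    """
    out = []
    for lam, y2, p in zip(noise_psd, noisy_pow, speech_prob):
        a = alpha_d + (1.0 - alpha_d) * p
        out.append(a * lam + (1.0 - a) * y2)
    return out


def decision_directed_xi(prev_gain, prev_post_snr, post_snr, a=0.92, xi_min=10 ** (-25 / 10)):
    """Decision-directed a priori SNR: a*G^2*gamma_prev + (1-a)*max(gamma-1, 0), floored."""
    return [max(a * g * g * gp + (1.0 - a) * max(gc - 1.0, 0.0), xi_min)
            for g, gp, gc in zip(prev_gain, prev_post_snr, post_snr)]
```

The weight `a` trades musical-noise reduction (close to 1) against tracking speed. This is why adapting it to a speech-presence decision, as the abstract describes, can help.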

17.

Auditory masking of a 10 kHz tone with environmental, comodulated, and Gaussian noise in bottlenose dolphins (Tursiops truncatus)

Trickey JS Branstetter BK Finneran JJ 《The Journal of the Acoustical Society of America》2010,128(6):3799-3804

The pattern of auditory masking derived from Gaussian noise is often cited and used to predict the detrimental effects of masking noise on marine mammals. However, environmental noise (both anthropogenic and natural) may not always be Gaussian distributed. Some noise sources are highly structured with complex amplitude fluctuations that extend across frequency regions, which are often termed comodulated noise. Recent evidence with bottlenose dolphins using comodulated noise demonstrated a significant release from masking compared to Gaussian maskers of the same bandwidth and pressure spectral density level, a result known as comodulation masking release. The present study demonstrates a pattern of masking where both temporally fluctuating comodulated noise and environmental noise produce lower masked thresholds compared to Gaussian noise of the same spectral density level and bandwidth. Furthermore, a threshold reduction or "masking release" occurred when the environmental noise bandwidth increased beyond a critical band. These results provide further evidence that conventional models of auditory masking using Gaussian maskers (i.e., the power spectrum model) do not fully describe the masking effects that occur in realistic environments.

18.

Predicting speech intelligibility based on the signal-to-noise envelope power ratio after modulation-frequency selective processing

Jørgensen S Dau T 《The Journal of the Acoustical Society of America》2011,130(3):1475-1487

A model for predicting the intelligibility of processed noisy speech is proposed. The speech-based envelope power spectrum model has a structure similar to that of the model of Ewert and Dau [(2000). J. Acoust. Soc. Am. 108, 1181-1196], which was developed to account for modulation detection and masking data. The model estimates the speech-to-noise envelope power ratio, SNR(env), at the output of a modulation filterbank and relates this metric to speech intelligibility using the concept of an ideal observer. Predictions were compared to data on the intelligibility of speech presented in stationary speech-shaped noise. The model was further tested in conditions with noisy speech subjected to reverberation and spectral subtraction. Good agreement between predictions and data was found in all cases. For spectral subtraction, an analysis of the model's internal representation of the stimuli revealed that the predicted decrease in intelligibility was caused by the estimated noise envelope power exceeding that of the speech. The classical concept of the speech transmission index fails in this condition. The results strongly suggest that the signal-to-noise ratio at the output of a modulation-frequency-selective process provides a key measure of speech intelligibility.
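The core SNR(env) computation can be sketched as below, assuming envelope powers have already been measured at the output of the modulation filterbank. The floor on the numerator is an assumed value that keeps the ratio non-negative. The root-sum-of-squares combination across bands is shown without the ideal-observer mapping to intelligibility.

```python
import math


def snr_env(p_env_mix, p_env_noise, floor=1e-3):
    """Per-band SNRenv = (P_env,S+N - P_env,N) / P_env,N.

    Inputs are envelope powers per modulation band (already computed);
    the numerator is floored at `floor`, an assumed value for illustration.
    """
    return [max(pm - pn, floor) / pn for pm, pn in zip(p_env_mix, p_env_noise)]


def combine_snr_env(values):
    """Combine SNRenv across modulation filters / audio channels as a root-sum-of-squares."""
    return math.sqrt(sum(v * v for v in values))
```

The floor illustrates the abstract's spectral subtraction finding. When processing raises the noise envelope power above that of the mixture, SNR(env) collapses, even though a conventional SNR might improve.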

19.

Single-channel speech enhancement method using reconstructive NMF with spectrotemporal speech presence probabilities

Seongjae Lee David K. Han Hanseok Ko 《Applied Acoustics》2017

In this paper, a novel single-microphone speech enhancement technique is presented. Most conventional nonnegative matrix factorization (NMF)-based approaches focus on generating basis matrices of speech and noise for enhancement. The proposed algorithm adds a process that reconstructs speech from noisy speech where the two are highly overlapped in selected spectral bands. This process involves a log-spectral amplitude estimator, which provides the spectrotemporal speech presence probability needed for a more accurate reconstruction. Moreover, the proposed algorithm applies an unsupervised learning method to the input noise, so it can adapt to any type of environmental noise without a pre-trained dictionary. The experimental results demonstrate that the proposed algorithm achieves better speech enhancement performance than conventional single-channel approaches.

20.

Single-channel speech enhancement combining an accurate ratio mask with a deep neural network

柏浩钧 张天骐 刘鉴兴 叶绍鹏 《Acta Acustica (声学学报)》2022,47(3):394-404

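Current supervised speech enhancement overlooks, among other issues, how the magnitude-spectrum similarity between clean speech, noise and noisy speech affects enhancement. To address this, a speech enhancement method combining an accurate ratio mask (ARM) with a deep neural network (DNN) is proposed. The method uses two normalized cross-correlation coefficients of magnitude spectra: one between clean and noisy speech, and one between noise and noisy speech. From these it designs an accurate ratio mask, based on the time-frequency ideal ratio mask, as the target mask. A DNN trained with the clean-speech and noise magnitude spectra as targets serves as the baseline, and the target mask is estimated from this DNN's outputs. The baseline DNN and the target mask are optimized jointly, and the enhanced speech is obtained from the noisy speech via the target mask. In addition, to exploit discriminative information between clean speech and noise, a discriminative training function replaces the mean squared error (MSE) function as the objective of the baseline DNN, making the network outputs more accurate. Experiments show that the discriminative training function improves the enhancement of both the baseline DNN and the jointly optimized network. Under both matched and mismatched noise, the proposed method achieves higher average Perceptual Evaluation of Speech Quality (PESQ) and Short-Time Objective Intelligibility (STOI) scores than other common DNN methods. The enhanced speech also retains more speech components while suppressing noise more effectively.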