
1.

聂玲子, 陈雪勤, 赵鹤鸣《声学学报》(Acta Acustica) 2021,46(1):81-91

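An improved sparse-representation speech enhancement method based on spectral features is proposed, addressing dual-path dictionary learning, noise power spectrum estimation, and speech magnitude spectrum reconstruction. In the dictionary learning stage, power-spectrum and magnitude-spectrum features are fused, and discriminative dictionaries are used to reduce the coherence between the speech dictionary and the noise dictionary. In the enhancement stage, a noise power spectrum estimation method is proposed to track non-stationary noise. Because magnitude-spectrum and power-spectrum features suit different noise types to different degrees, a speech reconstruction weight table is designed: the two signals recovered from the magnitude spectrum and from the power spectrum are adaptively weighted and combined, and a phase compensation function is applied to obtain the enhanced speech. Experimental results show that, in both stationary and non-stationary noise, the method achieves an average improvement of 31.6% over enhancement methods that use a single spectral feature, improving speech enhancement performance.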
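
The adaptive weighting step can be illustrated with a minimal sketch. It assumes the two paths have already produced per-bin magnitude and power estimates for one frame, and it uses a hypothetical weight table keyed by noise type; the paper's actual table, its noise classification, and its phase compensation function are not given in the abstract and are not reproduced here.

```python
import math

# Hypothetical weight table: noise type -> weight of the magnitude-spectrum path.
# Placeholder values for illustration only; they are not the paper's table.
WEIGHT_TABLE = {"white": 0.7, "babble": 0.4, "factory": 0.5}


def fuse_spectra(mag_est, pow_est, noise_type, default_weight=0.5):
    """Fuse two per-bin estimates of the clean speech magnitude for one frame.

    mag_est: magnitudes recovered via the magnitude-spectrum dictionary path.
    pow_est: powers recovered via the power-spectrum dictionary path
             (converted to magnitudes with a square root before mixing).
    """
    w = WEIGHT_TABLE.get(noise_type, default_weight)
    fused = []
    for a, p in zip(mag_est, pow_est):
        fused.append(w * a + (1.0 - w) * math.sqrt(max(p, 0.0)))
    return fused
```

The fused magnitude would still need a phase (the noisy phase, or a compensated one as in the paper) before the inverse transform.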

2.

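Whispered speech enhancement based on a modified Mel-domain masking model and speech absence probability (total citations: 1; self-citations: 0; citations by others: 1)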

陶智, 赵鹤鸣, 吴迪, 陈大庆, 张晓俊《声学学报》(Acta Acustica) 2009,34(4):370-377

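A whispered speech enhancement method based on a modified Mel-domain auditory masking model and speech absence probability is proposed. The Mel frequency scale is modified according to the articulatory characteristics of whispered speech, and each frame of whispered speech is filtered into Mel-domain frequency bands. The speech absence probability (SAP) is used to determine the auditory masking threshold of each band dynamically, and the spectral subtraction coefficients are adjusted adaptively according to these thresholds. Objective and subjective tests on the enhanced whispered speech show that, compared with other spectral subtraction methods, the method keeps residual and background noise below the masking threshold of the human ear, produces less speech distortion, and substantially improves subjective listening quality.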
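
Mel-domain band filtering can be sketched as follows. This uses the standard Mel formula m = 2595 log10(1 + f/700); the paper's whisper-specific modification of the Mel scale and its SAP-driven masking thresholds are not given in the abstract, so the band layout and function names here are illustrative assumptions.

```python
import math


def hz_to_mel(f):
    # Standard (unmodified) Mel scale.
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(f_low, f_high, n_bands):
    """Return n_bands + 1 edge frequencies (Hz) equally spaced on the Mel scale."""
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / n_bands
    return [mel_to_hz(m_low + i * step) for i in range(n_bands + 1)]


def band_energies(power, sample_rate, edges):
    """Sum one-sided FFT-bin powers (bins 0..N/2) falling inside each band."""
    n_bins = len(power)
    bin_hz = sample_rate / (2.0 * (n_bins - 1))
    energies = [0.0] * (len(edges) - 1)
    for k, p in enumerate(power):
        f = k * bin_hz
        for b in range(len(edges) - 1):
            if edges[b] <= f < edges[b + 1]:
                energies[b] += p
                break
    return energies
```

Per-band masking thresholds and subtraction coefficients would then be computed on these band energies rather than on individual FFT bins.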

3.

Single-channel speech enhancement using convolutive non-negative matrix factorization with an L1/2 sparsity constraint

路成, 田猛, 周健, 王华彬, 陶亮《声学学报》(Acta Acustica) 2017,42(3):377-384

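To capture the inter-frame correlation of speech signals and to represent speech features with fewer speech bases, a convolutive non-negative matrix factorization (NMF) method with an L1/2 sparsity constraint is proposed for single-channel speech enhancement. First, the noise is learned to obtain noise bases. Then, using the noise bases as prior information, L1/2-sparse convolutive NMF learns the speech-basis components of the noisy speech. Finally, the clean speech signal is reconstructed from the learned speech bases and coefficients. Experiments in different noise environments show that the proposed method outperforms convolutive NMF with an L1 sparsity constraint as well as conventional statistical speech enhancement methods.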
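
A simplified sketch of the semi-supervised factorization: the noise bases are learned beforehand and held fixed, while speech bases and activations are updated with multiplicative rules under a Euclidean cost. An L1/2 penalty λ·Σ√h on the activations adds λ/(2√h) to the denominator of the activation update. This is ordinary (non-convolutive) NMF, a deliberate simplification of the convolutive model in the paper, and all parameter values are assumptions.

```python
import random

EPS = 1e-9


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def nmf_fixed_noise(V, Wn, n_speech, lam=0.1, iters=200, seed=0):
    """Learn speech bases Ws and activations H with noise bases Wn held fixed.

    V:  F x T non-negative magnitude spectrogram (list of lists).
    Wn: F x Kn pre-trained noise bases.
    Returns (Ws, H); the first n_speech rows of H are speech activations.
    """
    rng = random.Random(seed)
    F, T, Kn = len(V), len(V[0]), len(Wn[0])
    Ws = [[rng.random() + 0.1 for _ in range(n_speech)] for _ in range(F)]
    H = [[rng.random() + 0.1 for _ in range(T)] for _ in range(n_speech + Kn)]
    for _ in range(iters):
        W = [ws_row + wn_row for ws_row, wn_row in zip(Ws, Wn)]
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        # Activation update; the L1/2 penalty contributes lam / (2 * sqrt(h)).
        H = [[h * n / (d + lam / (2.0 * (h + EPS) ** 0.5) + EPS)
              for h, n, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # Update only the speech bases; the noise bases stay fixed.
        Hst = transpose(H[:n_speech])
        num_w = matmul(V, Hst)
        den_w = matmul(matmul(W, H), Hst)
        Ws = [[w * n / (d + EPS) for w, n, d in zip(wr, nr, dr)]
              for wr, nr, dr in zip(Ws, num_w, den_w)]
    return Ws, H


def reconstruction_error(V, Ws, Wn, H):
    R = matmul([a + b for a, b in zip(Ws, Wn)], H)
    return sum((v - r) ** 2 for vr, rr in zip(V, R) for v, r in zip(vr, rr))


def speech_estimate(Ws, H, n_speech):
    """Speech magnitude estimate from the speech part of the factorization."""
    return matmul(Ws, H[:n_speech])
```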

4.

Application of spectral subtraction method on enhancement of electrolarynx speech

Liu H, Zhao Q, Wan M, Wang S 《The Journal of the Acoustical Society of America》2006,120(1):398-406

Although the electrolarynx (EL) is an important means of phonation for laryngectomees, the resulting speech has poor intelligibility because of a steady background noise caused by the instrument, and it is even worse when additive noise is present. This paper investigates EL speech enhancement by taking into account the frequency-domain masking properties of the human auditory system. One approach incorporates an auditory masking threshold (AMT) for parameter adaptation in a subtractive-type enhancement process. The other is the supplementary AMT (SAMT) algorithm, which applies a cross-correlation spectral subtraction (CCSS) approach as a post-processing step to EL speech already enhanced by the AMT method. The performance of these two algorithms was evaluated against the power spectral subtraction (PSS) algorithm. The best performance of EL speech enhancement was associated with the SAMT algorithm, followed by the AMT algorithm and the PSS algorithm. Acoustic and perceptual analyses indicated that the AMT and SAMT algorithms achieved better noise reduction than the PSS algorithm and that the enhanced EL speech was more pleasant to human listeners.
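
AMT-based parameter adaptation can be sketched as per-band power subtraction whose over-subtraction factor shrinks as the masking threshold rises, since noise that is already masked need not be removed aggressively. With a fixed factor this reduces to plain power spectral subtraction. The linear mapping, the limits `alpha_min`/`alpha_max`, and the spectral floor are assumptions for illustration, not values from the paper; computing the AMT itself is outside this sketch.

```python
def adaptive_alpha(threshold, t_min, t_max, alpha_min=1.0, alpha_max=6.0):
    """Map a band's masking threshold to an over-subtraction factor.

    Low threshold (noise audible) -> alpha_max; high threshold (noise masked) -> alpha_min.
    """
    if t_max <= t_min:
        return alpha_min
    r = min(max((threshold - t_min) / (t_max - t_min), 0.0), 1.0)
    return alpha_max - r * (alpha_max - alpha_min)


def subtract_power(noisy_power, noise_power, alpha, floor=0.01):
    """Power spectral subtraction for one band, with a spectral floor."""
    return max(noisy_power - alpha * noise_power, floor * noise_power)


def masked_subtraction(noisy_power, noise_power, thresholds, floor=0.01):
    """Subtract per band, adapting alpha to that band's masking threshold."""
    t_min, t_max = min(thresholds), max(thresholds)
    return [subtract_power(y, n, adaptive_alpha(t, t_min, t_max), floor)
            for y, n, t in zip(noisy_power, noise_power, thresholds)]
```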

5.

An iterative noise cross-PSD estimation for two-microphone speech enhancement (total citations: 2; self-citations: 0; citations by others: 2)

Mohsen Rahmani, Ahmad Akbari, Beghdad Ayad 《Applied Acoustics》2009,70(3):514-521

Among various speech enhancement methods, dual-microphone methods are of great importance because of their low implementation cost and because they exploit the spatial-filtering benefits of microphone arrays. Coherence-based methods are well known as efficient two-microphone noise reduction techniques. These techniques do not work well when the received noise signals are correlated, but they can be improved when the cross power spectral density (CPSD) of the noise is available. In this paper, we propose an iterative approach for estimating the noise CPSD for use in coherence-based methods. We compare the proposed iterative noise CPSD estimation with a noise CPSD estimation technique based on a voice activity detector (VAD), each employed separately in a two-microphone speech enhancement system. Evaluation results show that the two-microphone speech enhancement scheme using the proposed noise CPSD estimation technique performs better than the enhancement system using the VAD-based noise CPSD estimation.
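
Coherence-based methods rely on auto- and cross-PSD estimates. A common way to obtain them is first-order recursive averaging per frequency bin, Φxy(l) = λΦxy(l-1) + (1-λ)X(l)Y*(l), from which the coherence Γ = Φxy/√(ΦxxΦyy) follows. This generic sketch, with an assumed smoothing factor λ = 0.9, is not the paper's iterative noise CPSD estimator.

```python
def smooth_psd(prev, x, y, lam=0.9):
    """One recursive-averaging step for the (cross-)PSD of STFT values x and y."""
    return lam * prev + (1.0 - lam) * x * y.conjugate()


def coherence(frames_x, frames_y, lam=0.9):
    """Track auto/cross PSDs for one frequency bin across frames and return
    the complex coherence after the last frame."""
    pxx = pyy = pxy = 0.0 + 0.0j
    for x, y in zip(frames_x, frames_y):
        pxx = smooth_psd(pxx, x, x, lam)
        pyy = smooth_psd(pyy, y, y, lam)
        pxy = smooth_psd(pxy, x, y, lam)
    # Auto-PSDs are real (x * conj(x) = |x|^2), so only their real parts matter.
    denom = (pxx.real * pyy.real) ** 0.5
    return pxy / denom if denom > 0 else 0j
```

A fully coherent pair (one channel a scaled or phase-shifted copy of the other) gives |Γ| = 1, while diffuse or uncorrelated noise drives |Γ| toward 0.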

6.

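A review of single-channel speech enhancement algorithms based on dictionary learning and sparse representation (total citations: 1; self-citations: 0; citations by others: 1)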

叶中付, 朱媛媛, 贾翔宇《应用声学》(Journal of Applied Acoustics) 2019,38(4):645-652

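Recovering clean speech from noisy speech has long been an active problem in signal processing. In recent years, researchers have proposed a number of single-channel speech enhancement algorithms based on dictionary learning and sparse representation. These algorithms exploit the sparsity of speech in the time-frequency domain: they construct dictionaries by learning the structural features and regularities of training samples, and then project the noisy speech onto these dictionaries to estimate the clean speech. For the case where the training samples do not match the test data, supervised non-negative matrix factorization methods are combined with conventional statistical-model-based enhancement methods, and the speech and noise dictionaries are updated during the enhancement stage to estimate the clean speech. This paper first introduces the signal model for single-channel speech enhancement, then describes four typical enhancement methods, and finally discusses likely future research directions.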

7.

Perceptually motivated wavelet packet transform for bioacoustic signal enhancement

Ren Y, Johnson MT, Tao J 《The Journal of the Acoustical Society of America》2008,124(1):316-327

A significant and often unavoidable problem in bioacoustic signal processing is the presence of background noise due to an adverse recording environment. This paper proposes a new bioacoustic signal enhancement technique which can be used on a wide range of species. The technique is based on a perceptually scaled wavelet packet decomposition using a species-specific Greenwood scale function. Spectral estimation techniques, similar to those used for human speech enhancement, are used for estimation of clean signal wavelet coefficients under an additive noise model. The new approach is compared to several other techniques, including basic bandpass filtering as well as classical speech enhancement methods such as spectral subtraction, Wiener filtering, and Ephraim-Malah filtering. Vocalizations recorded from several species are used for evaluation, including the ortolan bunting (Emberiza hortulana), rhesus monkey (Macaca mulatta), and humpback whale (Megaptera novaeangliae), with both additive white Gaussian noise and environment recording noise added across a range of signal-to-noise ratios (SNRs). Results, measured by both SNR and segmental SNR of the enhanced waveforms, indicate that the proposed method outperforms other approaches for a wide range of noise conditions.
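
The Greenwood function maps normalized cochlear position x in [0, 1] to frequency as f = A(10^(ax) - k). A species-specific scale can be obtained by choosing A and a so that x = 0 and x = 1 land on the species' lowest and highest frequencies of interest. The sketch below assumes k = 0.88 and takes those frequency limits as inputs; the paper's exact fitting procedure is not stated in the abstract.

```python
import math


def greenwood_params(f_min, f_max, k=0.88):
    """Solve for A and a in f(x) = A * (10**(a*x) - k) so that
    x = 0 maps to f_min and x = 1 maps to f_max (k = 0.88 is an assumption)."""
    A = f_min / (1.0 - k)
    a = math.log10(f_max / A + k)
    return A, a


def freq_to_position(f, A, a, k=0.88):
    """Perceptual position in [0, 1] for frequency f (inverse Greenwood map)."""
    return math.log10(f / A + k) / a


def position_to_freq(x, A, a, k=0.88):
    return A * (10.0 ** (a * x) - k)
```

Band edges for a perceptually scaled decomposition could then be placed at equally spaced positions, e.g. `[position_to_freq(i / 8, A, a) for i in range(9)]`.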

8.

Noise estimation based on time–frequency correlation for speech enhancement

Wenhao Yuan, Jiajun Lin, Wei An, Yu Wang, Ning Chen 《Applied Acoustics》2013,74(5):770-781

As a fundamental part of speech enhancement, noise estimation is particularly challenging in highly non-stationary noise environments. In this work, we propose an effective algorithm based on the "Improved Minima Controlled Recursive Averaging (IMCRA)" with the objective of improving the performance of noise estimation. The main contributions of this work are: (i) in the algorithm, a rough decision about speech presence is proposed by calculating the autocorrelation and cross-channel correlation of the T–F (Time–Frequency) units; (ii) with this decision, we refine the smoothing parameters for the smoothing of noisy power spectrum and the recursive averaging in noise spectrum estimation as well as the weighting factor for the a priori SNR (Signal to Noise Ratio) estimation in the IMCRA; (iii) we improve the search of local minima during spectral bursts by adding a minimum search with a shorter window. Extensive experiments are carried out to evaluate the performance of our proposed algorithm. The experimental results illustrate that, compared with the IMCRA, the proposed approach significantly improves the accuracy of noise spectrum estimation and the quality of enhanced speech in the typical noise situations.
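
IMCRA builds on minimum tracking of the smoothed noisy power. A much simplified per-bin version of that core is shown below; the smoothing factor and window length are assumptions, and the paper's correlation-based speech-presence decision, refined smoothing parameters, and additional short-window minimum search are not reproduced (nor is the bias compensation usually applied to the tracked minimum).

```python
from collections import deque


def minimum_tracking_noise(power_frames, alpha=0.85, window=8):
    """Simplified minimum-tracking noise estimate for one frequency bin.

    power_frames: noisy power |Y|^2 per frame.
    Returns one noise estimate per frame.
    """
    smoothed = None
    history = deque(maxlen=window)
    estimates = []
    for p in power_frames:
        # First-order recursive smoothing of the noisy power.
        smoothed = p if smoothed is None else alpha * smoothed + (1.0 - alpha) * p
        history.append(smoothed)
        # Noise estimate = minimum of the smoothed power over the last `window` frames.
        estimates.append(min(history))
    return estimates
```

Short speech bursts raise the smoothed power but not its running minimum, which is why the estimate stays near the noise floor; the trade-off is that a genuine rise in noise level is only followed after up to `window` frames, the lag that a shorter-window search aims to reduce.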

9.

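Single-channel speech enhancement combining a deep neural network and convex optimization (total citations: 1; self-citations: 1; citations by others: 0)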

张晓艳, 张天骐, 葛宛营, 白杨柳《声学学报》(Acta Acustica) 2021,46(3):471-480

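The accuracy of noise estimation directly affects the quality of speech enhancement algorithms. To improve the noise suppression of current speech enhancement algorithms and to solve the unconstrained optimization problem effectively, a time-frequency mask optimization algorithm combining a deep neural network (DNN) and convex optimization is proposed for single-channel speech enhancement. First, the energy spectrum of the noisy speech is extracted as the DNN input feature. Next, the intra-band cross-correlation coefficient (ICC factor) between the noise and the noisy speech is used as the DNN training target. Then, the correlation coefficients produced by the DNN are used to construct the objective function for convex optimization. Finally, combining the DNN and convex optimization, a new hybrid conjugate gradient method iteratively refines the initial mask, and the enhanced speech is synthesized with the new mask. Simulations show that, at low SNRs under various background noises, the new mask yields better log-spectral distance (LSD), perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and segmental SNR (segSNR) scores than the method before the improvement, raising the overall speech quality while effectively suppressing noise.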
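
The paper refines its initial mask with a new hybrid conjugate gradient method whose details are not given in the abstract. As a stand-in, classic linear conjugate gradient on a convex quadratic objective 0.5·xᵀAx - bᵀx (A symmetric positive-definite) is sketched below; the quadratic form is an assumption, and a mask obtained this way would typically still be clipped to [0, 1].

```python
def conjugate_gradient(A, b, x0, tol=1e-10, max_iter=100):
    """Minimize 0.5 * x^T A x - b^T x for symmetric positive-definite A (linear CG)."""

    def matvec(M, v):
        return [sum(m * vi for m, vi in zip(row, v)) for row in M]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]  # residual = negative gradient
    p = list(r)
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        step = rs / dot(p, Ap)  # exact line search along p
        x = [xi + step * pi for xi, pi in zip(x, p)]
        r = [ri - step * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        # Fletcher-Reeves-type coefficient keeps directions A-conjugate.
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x
```

For an n-dimensional quadratic, linear CG converges in at most n iterations in exact arithmetic; hybrid nonlinear variants such as the one in the paper change the conjugacy coefficient to handle non-quadratic objectives.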

10.

Single-channel speech enhancement combining an accurate ratio mask and a deep neural network

柏浩钧, 张天骐, 刘鉴兴, 叶绍鹏《声学学报》(Acta Acustica) 2022,47(3):394-404

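Current supervised speech enhancement methods ignore how the magnitude-spectrum similarity between clean speech, noise, and noisy speech affects enhancement. To address this, a speech enhancement method combining an accurate ratio mask (ARM) with a deep neural network (DNN) is proposed. The method uses the normalized cross-correlation coefficients of the magnitude spectra between clean speech and noisy speech, and between noise and noisy speech, to design an accurate ratio mask, based on the time-frequency ideal ratio mask, as the target mask. A DNN trained with the clean-speech and noise magnitude spectra as targets serves as the baseline; its output is used to estimate the target mask, the baseline DNN and the target mask are jointly optimized, and the enhanced speech is estimated from the noisy speech via the target mask. In addition, to exploit the discriminative information between clean speech and noise, a discriminative training function replaces the mean squared error (MSE) as the objective function of the baseline DNN, making the network output more accurate. Experiments show that the discriminative training function improves both the baseline DNN and the whole jointly optimized network. Under matched and mismatched noise, compared with other common DNN methods, the proposed method achieves higher average perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI) scores; the enhanced speech retains more speech components while suppressing noise more effectively.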
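
Two ingredients named in the abstract can be sketched separately: the ideal ratio mask IRM = (S²/(S²+N²))^β, commonly used with β = 0.5, and a normalized cross-correlation coefficient between two magnitude spectra. The abstract does not state how the paper combines them into the ARM, so no combination rule is assumed here.

```python
import math


def ideal_ratio_mask(speech_mag, noise_mag, beta=0.5):
    """IRM per time-frequency bin: (S^2 / (S^2 + N^2)) ** beta."""
    mask = []
    for s, n in zip(speech_mag, noise_mag):
        total = s * s + n * n
        mask.append((s * s / total) ** beta if total > 0 else 0.0)
    return mask


def normalized_xcorr(a, b):
    """Normalized cross-correlation (cosine similarity) of two magnitude vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0 else 0.0
```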

11.

Audio-visual enhancement of speech in noise.

L Girin, J L Schwartz, G Feng 《The Journal of the Acoustical Society of America》2001,109(6):3007-3020

A key problem for telecommunication or human-machine communication systems concerns speech enhancement in noise. In this domain, a certain number of techniques exist, all of them based on an acoustic-only approach, that is, the processing of the corrupted audio signal using audio information (from the corrupted signal only or additive audio information). In this paper, an audio-visual approach to the problem is considered, since several studies have demonstrated that viewing the speaker's face improves message intelligibility, especially in noisy environments. A speech enhancement prototype system that takes advantage of visual inputs is developed. A filtering approach is proposed that uses enhancement filters estimated with the help of lip shape information. The estimation process is based on linear regression or simple neural networks using a training corpus. A set of experiments assessed by Gaussian classification and perceptual tests demonstrates that it is indeed possible to enhance simple stimuli (vowel-plosive-vowel sequences) embedded in white Gaussian noise.

12.

Using power level difference for near field dual-microphone speech enhancement

Nima Yousefian, Ahmad Akbari, Mohsen Rahmani 《Applied Acoustics》2009,70(11-12):1412-1421

In this contribution, a novel dual-channel speech enhancement technique is introduced. The proposed approach uses the difference between the powers of the signals received in the two channels as a criterion for speech enhancement and noise reduction. We claim that in near-field conditions, where the distances between the microphones and the sound source are short, the difference in the received power levels at the two microphones is an estimate of the clean speech signal power. We then apply this idea to derive an optimal speech enhancement method. The method can also cope with transient noise and closely spaced microphones, two of the main problems of previously proposed dual-microphone speech enhancement techniques. Using objective speech quality measures and spectrogram analysis, we show that the proposed method results in improved speech quality.
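
Under the near-field assumption, noise reaches both microphones with roughly equal power while speech is markedly louder at the microphone closer to the mouth, so the per-bin difference P1 - P2 tracks a scaled version of the speech power. The sketch assumes microphone 1 is the near one; the gain rule and its floor are hypothetical additions, not the paper's optimal estimator.

```python
def pld_speech_estimate(p1, p2):
    """Per-bin power level difference; mic 1 is assumed closer to the talker.

    Noise with about equal power at both mics cancels in p1 - p2.
    """
    return [max(a - b, 0.0) for a, b in zip(p1, p2)]


def pld_gain(p1, p2, floor=0.05):
    """Hypothetical Wiener-style gain: estimated speech power over mic-1 power."""
    gains = []
    for s, a in zip(pld_speech_estimate(p1, p2), p1):
        g = s / a if a > 0 else 0.0
        gains.append(min(max(g, floor), 1.0))
    return gains
```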

13.

A new dual-microphone noise reduction system for mobile phones

章雒霏, 张铭, 李晨《应用声学》(Journal of Applied Acoustics) 2017,36(1):32-40

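Existing dual-microphone noise reduction systems for mobile phones cannot cope with many complex noise environments and distort speech while removing noise. To address these problems, this paper proposes a new dual-microphone noise reduction system that combines time-domain and frequency-domain processing and improves both noise estimation and noise removal. It combines dual-microphone and single-microphone noise estimation algorithms to improve the accuracy of noise estimation. It also integrates pitch detection with noise reduction: the pitch frequency is estimated in speech frames to identify speech and noise frequency bins, and the Wiener filter parameters are adjusted separately for speech bins and noise bins, so that noise is filtered out while speech bins are preserved as far as possible, reducing speech distortion. Experimental results show that, compared with existing dual-microphone noise reduction systems, the proposed system suppresses noise while effectively reducing the damage that noise reduction does to speech, improving call quality; it also suppresses directional speech interference well.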
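
Treating speech and noise bins differently can be sketched with a parametric Wiener gain G = ξ/(ξ + μ), where ξ is the a priori SNR and a smaller μ attenuates less. Bins near multiples of the estimated pitch are flagged as speech. The μ values, the tolerance, and the helper names are assumptions; pitch detection itself is not shown.

```python
def parametric_wiener_gain(snr_prior, mu):
    """Parametric Wiener gain G = xi / (xi + mu); mu > 1 suppresses more, mu < 1 less."""
    return snr_prior / (snr_prior + mu)


def harmonic_bins(f0, bin_hz, n_bins, tolerance_bins=1):
    """Flag bins within `tolerance_bins` of a multiple of the pitch f0 (Hz)."""
    flags = [False] * n_bins
    if f0 <= 0:
        return flags  # unvoiced frame: no harmonics to protect
    h = 1
    while h * f0 < n_bins * bin_hz:
        center = round(h * f0 / bin_hz)
        for k in range(center - tolerance_bins, center + tolerance_bins + 1):
            if 0 <= k < n_bins:
                flags[k] = True
        h += 1
    return flags


def frame_gains(snr_priors, is_speech_bin, mu_speech=0.5, mu_noise=2.0):
    """Use a gentler parameter at bins flagged as speech."""
    return [parametric_wiener_gain(xi, mu_speech if sp else mu_noise)
            for xi, sp in zip(snr_priors, is_speech_bin)]
```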

14.

Speech enhancement based on the discrete Gabor transform and multi-notch adaptive digital filters

Ergun Erçelebi 《Applied Acoustics》2004,65(8):739-762

This paper presents a new method for speech enhancement based on time-frequency analysis and adaptive digital filtering. The proposed dual-channel method tracks the frequencies of the corrupting signal with the discrete Gabor transform (DGT) and applies a multi-notch adaptive digital filter (MNADF) at those frequencies. Since no a priori knowledge of the noise source statistics is required, this method differs from traditional speech enhancement methods. Specifically, the proposed method was applied to the case where speech quality and intelligibility deteriorate in the presence of background noise. Speech coders and automatic speech recognition (ASR) systems are designed to act on clean speech signals; therefore, speech signals corrupted by noise must be enhanced before such processing. The method uses a primary input containing the corrupted speech signal and a reference input containing only the noise. In this paper, we designed an MNADF instead of a single-notch adaptive digital filter and used the DGT to track the frequencies of the corrupting signal, because fast filtering and fast measurement of the time-dependent noise frequency are of great importance in speech enhancement. Different types of noise from the Noisex-92 database were used to degrade real speech signals. Objective measures (global signal-to-noise ratio (SNR), segmental SNR (segSNR), and the Itakura-Saito distance), spectrogram analysis, and subjective listening tests consistently demonstrated superior enhancement performance of the proposed method over traditional speech enhancement methods such as spectral subtraction. Combining MNADF and DGT, excellent speech enhancement was obtained.
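
A fixed multi-notch filter can be built by cascading second-order IIR notches, each with zeros on the unit circle at the interference frequency and poles just inside it at radius r. In the paper the notch frequencies are tracked by the DGT and the filter is adaptive; in this sketch the frequencies are simply given, and r = 0.95 is an assumed value.

```python
import math


def notch_coeffs(freq_hz, fs, r=0.95):
    """Second-order IIR notch at freq_hz; r < 1 sets the notch width."""
    w = 2.0 * math.pi * freq_hz / fs
    b = [1.0, -2.0 * math.cos(w), 1.0]          # zeros at e^{+/- jw}
    a = [1.0, -2.0 * r * math.cos(w), r * r]    # poles at r * e^{+/- jw}
    return b, a


def biquad(x, b, a):
    """Direct-form I difference equation for one second-order section."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def multi_notch(x, freqs_hz, fs, r=0.95):
    """Cascade one notch per interference frequency."""
    for f in freqs_hz:
        b, a = notch_coeffs(f, fs, r)
        x = biquad(x, b, a)
    return x
```

Choosing r closer to 1 narrows each notch and spares nearby speech energy, at the cost of a longer transient when the tracked frequency changes.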

15.

Single-channel speech enhancement algorithm based on time-frequency dictionary learning

黄建军, 张雄伟, 张亚非, 邹霞《声学学报》(Acta Acustica) 2012,37(5):539-547

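To address the sharp performance degradation of earlier speech enhancement algorithms in non-stationary noise, a new single-channel speech enhancement algorithm based on time-frequency dictionary learning is proposed. First, time-frequency dictionary learning is used to model prior information about the spectral structure of the noise, and this model is incorporated into a convolutive non-negative matrix factorization framework. Then, with the noise time-frequency dictionary fixed, multiplicative update rules are derived for the time-varying gains and the speech time-frequency dictionary. Finally, these rules are used to update the time-varying gain coefficients of speech and noise and the speech time-frequency dictionary; the speech magnitude spectrum is reconstructed by convolving the speech time-frequency dictionary with the time-varying gains, and a binary time-frequency mask removes the noise interference. Experimental results show that the proposed algorithm achieves better results on several speech quality measures. In non-stationary noise and at low SNRs, compared with multiband spectral subtraction and non-negative sparse coding denoising, the proposed algorithm removes noise more effectively and the enhanced speech has better quality.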
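
The final masking step can be sketched as a binary time-frequency mask that keeps a bin only where the reconstructed speech magnitude exceeds the reconstructed noise magnitude, i.e. a 0 dB local-SNR criterion by default (an assumed threshold). The convolutive NMF that produces these estimates is not reproduced here.

```python
def binary_mask(speech_mag, noise_mag, threshold_db=0.0):
    """1 where the estimated local SNR exceeds threshold_db, else 0."""
    factor = 10.0 ** (threshold_db / 10.0)
    return [1 if s * s > factor * n * n else 0 for s, n in zip(speech_mag, noise_mag)]


def apply_mask(noisy_mag, mask):
    """Keep noisy magnitudes in speech-dominated bins, zero the rest."""
    return [y * m for y, m in zip(noisy_mag, mask)]
```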

16.

Adaptive β-order Bayesian perceptual estimation speech enhancement algorithm based on auditory frequency-domain masking

王玥, 李平, 崔杰《声学学报》(Acta Acustica) 2013,38(4):501-508

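To find the best balance between noise suppression and speech distortion, an adaptive β-order Bayesian perceptual estimation speech enhancement algorithm based on auditory frequency-domain masking is proposed, with the aim of improving overall enhancement performance. The algorithm exploits the masking effect of human hearing: the β value of the β-order Bayesian perceptual estimator is adjusted adaptively according to the computed frequency-domain masking threshold, so that noise is suppressed only to below the masking threshold, preserving more speech information and reducing speech distortion. The algorithm was evaluated with both objective and subjective measures and compared with the original SNR-based adaptive β-order Bayesian perceptual estimation algorithm. The results show that, for SNRs from -10 dB to 5 dB, the combined objective score of the masking-based β-order method is consistently higher than that of the SNR-based method. Subjective evaluations also show that the masking-based method suppresses background noise well while preserving as much speech information as possible.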
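
The β-order MMSE spectral-amplitude estimator, a simpler relative of the Bayesian perceptual estimator in the paper, has gain G = (√v/γ)[Γ(β/2+1) M(-β/2, 1; -v)]^(1/β), with v = ξγ/(1+ξ), a priori SNR ξ, a posteriori SNR γ, and M the confluent hypergeometric function. The sketch evaluates M by its power series (adequate for moderate v) and pairs the gain with a hypothetical rule that raises β, and so suppresses less, as the masking threshold rises relative to the noise power. That rule and its limits are assumptions, not the paper's mapping.

```python
import math


def kummer_m(a, b, z, terms=200):
    """Confluent hypergeometric function M(a, b; z) by its power series."""
    total, term = 1.0, 1.0
    for n in range(terms):
        term *= (a + n) / (b + n) * z / (n + 1)
        total += term
        if term == 0.0:  # series terminates when a is a non-positive integer
            break
    return total


def beta_order_gain(xi, gamma_post, beta):
    """Gain of the beta-order MMSE spectral-amplitude estimator (beta > 0).

    For very large v the alternating series loses precision; keep v moderate.
    """
    v = xi / (1.0 + xi) * gamma_post
    inner = math.gamma(beta / 2.0 + 1.0) * kummer_m(-beta / 2.0, 1.0, -v)
    return math.sqrt(v) / gamma_post * inner ** (1.0 / beta)


def beta_from_masking(mask_threshold, noise_power, beta_min=0.5, beta_max=2.0):
    """Hypothetical rule: the more the noise is masked, the larger beta (less suppression)."""
    if noise_power <= 0:
        return beta_max
    ratio = min(max(mask_threshold / noise_power, 0.0), 1.0)
    return beta_min + (beta_max - beta_min) * ratio
```

With β = 1 the gain reduces to the Ephraim-Malah amplitude estimator, and with β = 2 to the MMSE power-spectrum estimator, which is a convenient sanity check.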

17.

Deep-learning single-channel speech enhancement with cost functions related to human auditory perception

程琳娟, 彭任华, 郑成诗, 李晓东《应用声学》(Journal of Applied Acoustics) 2022,41(4):654-666

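Mean squared error (MSE) is the most commonly used cost function in deep-learning single-channel speech enhancement. However, the MSE value is not fully correlated with speech quality. To improve performance, this paper introduces two classes of cost functions related to human auditory perception into deep neural network training. The first is a weighted Euclidean distance cost function that accounts for auditory masking. The second comprises the Itakura-Saito cost function, the COSH cost function, and a weighted likelihood ratio cost function, which emphasize the importance of spectral peaks and focus on recovering the spectral-peak information of clean speech. Using a long short-term memory (LSTM) network, the two classes of cost functions are analyzed and compared with the MSE cost function in deep-learning single-channel speech enhancement. Experimental results show that the deep neural network trained with the weighted Euclidean distance cost function achieves better speech quality and lower residual noise.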
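
The spectral distances mentioned above can be written down directly. The sketch operates on per-bin power spectra; the small `eps` guard and the per-bin weights are assumptions (the paper derives its weights from auditory masking, which is not reproduced), and the weighted likelihood ratio cost is omitted.

```python
import math


def itakura_saito(target, estimate, eps=1e-12):
    """IS divergence d(x|y) = x/y - log(x/y) - 1, summed over bins (powers)."""
    total = 0.0
    for x, y in zip(target, estimate):
        r = (x + eps) / (y + eps)
        total += r - math.log(r) - 1.0
    return total


def cosh_distance(target, estimate, eps=1e-12):
    """Symmetrized IS divergence: per bin 0.5 * (x/y + y/x) - 1."""
    return 0.5 * (itakura_saito(target, estimate, eps) + itakura_saito(estimate, target, eps))


def weighted_euclidean(target, estimate, weights):
    """Sum of w * (x - y)^2; weights would come from a masking model."""
    return sum(w * (x - y) ** 2 for x, y, w in zip(target, estimate, weights))
```

Because the IS divergence depends on the ratio x/y, underestimating a low-energy bin costs as much as underestimating a spectral peak by the same factor, unlike MSE, which is dominated by high-energy bins.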

18.

Contribution of low-frequency acoustic information to Chinese speech recognition in cochlear implant simulations

Luo X, Fu QJ 《The Journal of the Acoustical Society of America》2006,120(4):2260-2266

Chinese sentence recognition strongly relates to the reception of tonal information. For cochlear implant (CI) users with residual acoustic hearing, tonal information may be enhanced by restoring low-frequency acoustic cues in the nonimplanted ear. The present study investigated the contribution of low-frequency acoustic information to Chinese speech recognition in Mandarin-speaking normal-hearing subjects listening to acoustic simulations of bilaterally combined electric and acoustic hearing. Subjects listened to a 6-channel CI simulation in one ear and low-pass filtered speech in the other ear. Chinese tone, phoneme, and sentence recognition were measured in steady-state, speech-shaped noise, as a function of the cutoff frequency for low-pass filtered speech. Results showed that low-frequency acoustic information below 500 Hz contributed most strongly to tone recognition, while low-frequency acoustic information above 500 Hz contributed most strongly to phoneme recognition. For Chinese sentences, speech reception thresholds (SRTs) improved with increasing amounts of low-frequency acoustic information, and significantly improved when low-frequency acoustic information above 500 Hz was preserved. SRTs were not significantly affected by the degree of spectral overlap between the CI simulation and low-pass filtered speech. These results suggest that, for CI patients with residual acoustic hearing, preserving low-frequency acoustic information can improve Chinese speech recognition in noise.

19.

Multiresolution information measures applied to speech recognition

María E. Torres, Hugo L. Rufiner, Diego H. Milone 《Physica A》2007,385(1):319-332

Considerable advances in automatic speech recognition have been made in recent decades, thanks especially to the use of hidden Markov models. In the field of speech signal analysis, different techniques have been developed. However, the performance of speech recognizers deteriorates when they are trained with clean signals and tested with noisy signals. This is still an open problem in this field. Continuous multiresolution entropy has been shown to be robust to additive noise in applications to different physiological signals. In previous works we have included Shannon and Tsallis entropies, and their corresponding divergences, in different speech analysis and recognition systems. In this paper we present an extension of the continuous multiresolution entropy to different divergences and we propose them as new dimensions for the pre-processing stage of a speech recognition system. This approach takes into account information about changes in the dynamics of the speech signal at different scales. The methods proposed here are tested with speech signals corrupted with babble and white noise. Their performance is compared with classical mel cepstral parametrization. The results suggest that these continuous multiresolution entropy related measures provide valuable information to the speech recognition system and that they could be included as an extra component in the pre-processing stage.
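
The basic quantities can be sketched as follows: wavelet energies per scale, normalized into a distribution, then scored with Shannon or Tsallis entropy S_q = (1 - Σp^q)/(q - 1). This uses an orthonormal Haar decomposition of a single window as a simplifying assumption; the paper's continuous multiresolution entropy uses continuous wavelet coefficients over sliding windows, and its divergences are not shown.

```python
import math


def haar_level_energies(signal, levels):
    """Energy of Haar detail coefficients at each level, plus the final approximation.

    Odd-length inputs drop their last sample at each level.
    """
    approx = list(signal)
    energies = []
    for _ in range(levels):
        if len(approx) < 2:
            break
        n = len(approx) // 2 * 2
        detail = [(approx[i] - approx[i + 1]) / math.sqrt(2) for i in range(0, n, 2)]
        approx = [(approx[i] + approx[i + 1]) / math.sqrt(2) for i in range(0, n, 2)]
        energies.append(sum(d * d for d in detail))
    energies.append(sum(a * a for a in approx))
    return energies


def relative(energies):
    total = sum(energies)
    return [e / total for e in energies] if total > 0 else [0.0] * len(energies)


def shannon_entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)


def tsallis_entropy(p, q):
    if q == 1:
        return shannon_entropy(p)  # Tsallis entropy tends to Shannon as q -> 1
    return (1.0 - sum(x ** q for x in p if x > 0)) / (q - 1.0)
```

Tracking these entropies window by window yields the kind of scale-dependent dynamic features that the abstract proposes adding to the pre-processing stage.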

20.

Speech endpoint detection at low SNR using perceptual spectrogram structure boundary parameters

吴迪, 赵鹤鸣, 陶智, 张晓俊, 肖仲喆, 许宜申《声学学报》(Acta Acustica) 2014,39(3):392-399

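A speech endpoint detection algorithm using perceptual spectrogram structure boundary (PSSB) parameters is proposed for preprocessing speech signals in low-SNR environments. After the noisy speech is enhanced with a method based on auditory perception characteristics, the difference between the continuous distribution of speech and the random distribution of residual noise is exploited: the time-frequency spectrogram of the enhanced speech is further enhanced in two dimensions to emphasize the continuously distributed spectrogram structure of clean speech. Two-dimensional boundary detection on this spectrogram structure yields the PSSB parameters, which are then used for endpoint detection. Experimental results show that, in white noise at SNRs from -10 dB to 10 dB, the PSSB-based endpoint detection algorithm detects speech endpoints more effectively than other endpoint detection algorithms. At an extremely low SNR of -10 dB, the proposed method still achieves 75.2% accuracy. The PSSB-based algorithm is therefore better suited to speech endpoint detection in low-SNR white noise.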
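
Once a per-frame feature such as PSSB is available, the endpoint decision itself can be sketched as thresholding with a minimum run length, so that isolated noise spikes are not mistaken for speech. The threshold and the `min_run` value are assumptions; computing PSSB itself is not reproduced.

```python
def detect_endpoints(feature, threshold, min_run=3):
    """Return (start, end) frame indices spanning the first and last runs of at
    least `min_run` consecutive frames with feature > threshold, or None."""
    runs = []
    start = None
    # A -inf sentinel closes any run that is still open at the last frame.
    for i, v in enumerate(list(feature) + [float("-inf")]):
        if v > threshold and start is None:
            start = i
        elif v <= threshold and start is not None:
            if i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    if not runs:
        return None
    return runs[0][0], runs[-1][1]
```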