首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
窄带语音带宽扩展算法研究   总被引:1,自引:0,他引:1  
张勇  刘轶 《声学学报》2014,39(6):764-773
为了降低谱失真,提出了一种基于隐马尔科夫模型的窄带语音带宽扩展算法。首先,算法选取与宽带谱包络互信息大的参数构成特征矢量,并利用隐马尔可夫状态和过去观察特征矢量的联合先验概率估计条件后验概率。其次,以条件后验概率为基础,算法结合贝叶斯条件参数估计法和最小均方差准则估计宽带谱包络。针对宽带激励信号估计,基于信号高频和低频的谐波相关性,提出了一种中频激励扩展算法。实验结果表明,与传统的基于隐马尔可夫模型的带宽扩展算法相比,本文算法可降低0.187 dB的平均谱失真,将谱失真大于10 dB的语音帧减少了34.3%。   相似文献   

2.
基于高斯混合模型的语音带宽扩展算法的研究   总被引:2,自引:0,他引:2  
张勇  胡瑞敏 《声学学报》2009,34(5):471-480
为了降低高带谱失真,研究了带宽扩展算法中特征参数与高带谱包络的互信息和高带谱失真之间的函数关系,并在此基础上提出了一种扩展高斯混合模型带宽扩展算法。首先,算法选择与高带谱包络互信息大的参数构成特征矢量,并根据高斯混合模型计算特征矢量与高带谱包络的联合概率密度。其次,采用Expectation-Maximization(EM)算法估计高斯分量模型参数并计算后验概率。最后,通过后验概率估计高带谱包络。实验结果表明,与传统的高斯混合模型带宽扩展算法相比,本文算法可降低0.3 dB的高带平均谱失真,将谱失真大于10dB的语音帧减少了50%以上。   相似文献   

3.
It is well known that the non-stationary wideband noise is the most difficult to be removed in speech enhancement. In this paper a novel speech enhancement algorithm based on the dyadic wavelet transform and the simplified Karhunen-Loeve transform (KLT) is proposed to suppress the non-stationary wideband noise. The noisy speech is decomposed into components by the wavelet space and KLT-based vector space, and the components are processed and reconstructed, respectively, by distinguishing between voiced speech and unvoiced speech. There are no requirements of noise whitening and SNR pre-calculating. In order to evaluate the performance of this algorithm in more detail, a three-dimensional spectral distortion measure is introduced. Experiments and comparison between different speech enhancement systems by means of the distortion measure show that the proposed method has no drawbacks existing in the previous methods and performs better shaping and suppressing of the non-stationary wideband noise for speech enhancement.  相似文献   

4.
戴明扬  徐柏龄 《应用声学》2001,20(6):6-12,44
本文基于人耳听觉模型提出了一种鲁棒性的话者特征参数提取方法。该种方法中,首先由Gamma tone听觉滤波器组和Meddis内耳毛细胞发放模型获得表征听觉神经活动特性的听觉相关图。由听觉神经脉冲发放的锁相特性和双声抑制特性,我们将听觉相关图每个频带中的幅值最大频率分量作为表征当前频带特性的特征参量,于是所有频带的特征参量便构成了表征当前语音段特性的特征矢量;我们采用DCT交换进一步消除各个特征参量之间的相关性,压缩特征矢量的维数。有效性试验表明,该种特征矢量基本上反映了输入语音的谱包络特性;抗噪声性能实验表明,在高斯白噪声和汽车噪声干扰下,这种特征参数比LPCC和MFCC有较小的相对失真;基于矢量量化的文本无关话者辨识表明,对于三种类型的噪声干扰该种特征参数在低信噪比下都获得了较好的识别结果。  相似文献   

5.
Many hearing-impaired listeners suffer from distorted auditory processing capabilities. This study examines which aspects of auditory coding (i.e., intensity, time, or frequency) are distorted and how this affects speech perception. The distortion-sensitivity model is used: The effect of distorted auditory coding of a speech signal is simulated by an artificial distortion, and the sensitivity of speech intelligibility to this artificial distortion is compared for normal-hearing and hearing-impaired listeners. Stimuli (speech plus noise) are wavelet coded using a complex sinusoidal carrier with a Gaussian envelope (1/4 octave bandwidth). Intensity information is distorted by multiplying the modulus of each wavelet coefficient by a random factor. Temporal and spectral information are distorted by randomly shifting the wavelet positions along the temporal or spectral axis, respectively. Measured were (1) detection thresholds for each type of distortion, and (2) speech-reception thresholds for various degrees of distortion. For spectral distortion, hearing-impaired listeners showed increased detection thresholds and were also less sensitive to the distortion with respect to speech perception. For intensity and temporal distortion, this was not observed. Results indicate that a distorted coding of spectral information may be an important factor underlying reduced speech intelligibility for the hearing impaired.  相似文献   

6.
A method used for objective evaluation of pronunciation of finals in standard Chinese is presented. The formant pattern of final is selected as the mam feature and an improved evaluation algorithm based on Support Vector Machine is proposed. In this algorithm, two-level classification strategy is employed. A full-classification model and a sub-classification model are trained for each final. The pronunciation quality is evaluated based on the classification results of this two-level strategy with scoring model of each final. The new evaluation method is compared with traditional methods such as Hidden Markov Model (HMM) posterior probability scoring method and feature of Mel-Frequency Cepstrum Coefficients (MFCC), and the results show that the performance is effectively improved by the proposed method. The correlation of scores between human testers and machine has achieved 82%.  相似文献   

7.
The just-noticeable-difference in frequency (jndf) for complex signals with triangular spectral envelopes is found to depend on the envelope slope. For shallow slopes (less than 140 dB/oct), jndf increases with decreasing slope. Addition of noise also impairs frequency discrimination within a region of about 20 dB above masked threshold. This is found for both maskers used: a wideband noise and a narrow-band masker which is below the signal in frequency. When wideband noise is used, frequency discrimination of complex signals with shallow slopes deteriorates more rapidly with decreasing signal-to-noise ratio than it does when the signals have steep spectral slopes.  相似文献   

8.
Mel frequency cepstral coefficients (MFCC) are the most widely used speech features in automatic speech recognition systems, primarily because the coefficients fit well with the assumptions used in hidden Markov models and because of the superior noise robustness of MFCC over alternative feature sets such as linear prediction-based coefficients. The authors have recently introduced human factor cepstral coefficients (HFCC), a modification of MFCC that uses the known relationship between center frequency and critical bandwidth from human psychoacoustics to decouple filter bandwidth from filter spacing. In this work, the authors introduce a variation of HFCC called HFCC-E in which filter bandwidth is linearly scaled in order to investigate the effects of wider filter bandwidth on noise robustness. Experimental results show an increase in signal-to-noise ratio of 7 dB over traditional MFCC algorithms when filter bandwidth increases in HFCC-E. An important attribute of both HFCC and HFCC-E is that the algorithms only differ from MFCC in the filter bank coefficients: increased noise robustness using wider filters is achieved with no additional computational cost.  相似文献   

9.
Artificial bandwidth extension methods have been developed to improve the quality and intelligibility of narrowband telephone speech and to reduce the difference with wideband speech. Such methods have commonly been evaluated with objective measures or subjective listening-only tests, but conversational evaluations have been rare. This article presents a conversational evaluation of two methods for the artificial bandwidth extension of telephone speech. Bandwidth-extended narrowband speech is compared with narrowband and wideband speech in a test setting including a simulated telephone connection, realistic conversation tasks, and various background noise conditions. The responses of the subjects indicate that speech processed with one of the methods is preferred to narrowband speech in noise, but wideband speech is superior to both narrowband and bandwidth-extended speech. Bandwidth extension was found to be beneficial for telephone conversation in noisy listening conditions.  相似文献   

10.
语音存在概率的估计是语音增强的核心技术之一,针对传统的存在概率估计方法是启发式的,没有把存在概率的估计统一到一个理论框架之中,不能保证估计最优,提出了一种基于序贯隐马尔可夫模型(SHMM)的存在概率估计方法,在每一子带上构建一个SHMM模型描述对数功率谱包络的时间序列,把谱包络序列看作一个在语音和噪声状态之间转移的动态一阶马尔可夫链,采用单高斯函数构建每一状态的概率模型,语音状态的后验概率即为语音信号的存在概率。为了满足算法实时性要求,SHMM参数估计简化为一阶回归过程,根据极大似然准则逐帧更新模型参数。实验表明:SHMM所描述的时序相关性对存在概率的估计起到关键作用,它优于一般的启发式估计方法;SHMM算法的语音增强分段信噪比(SegSNR)和对数谱失真(LSD)性能优于经典的改进型最小统计量控制递归平均(IMCRA)算法。   相似文献   

11.
为了克服低信噪比输入下,语音增强造成语音清音中的弱分量损失,造成重构信号包络失真的问题。论文提出了一种新的语音增强方法。该方法根据语音感知模型,采用不完全小波包分解拟合语音临界频带,并对语音按子带能量进行清浊音区分处理,在阈值计算上,提出了一种清浊音分离,基于子带信号能量的小波包自适应阈值算法。通过仿真实验,客观评测和听音测试表明,该算法在低信噪比输入时较传统算法,能够更加有效地减少重构信号包络失真,在不损伤语音清晰度和自然度的前提下,使输出信噪比明显提高。将该算法与能量谱减法结合,进行二次增强能进一步提高降噪输出的语音质量。  相似文献   

12.
陈雪勤  赵鹤鸣 《声学学报》2013,38(2):195-200
为了改善耳语音转换中声道系统的转换性能,针对定值转换方法在非特定人耳语音转换系统中效果不理想的情况,提出使用通用背景模型建立独立于说话人的声道系统转换模型。进一步针对在通用背景模型中由于较大分量数产生的声学概率密度统计模型的误差问题,提出基于最小谱失真度的后验概率和有效高斯分量选择方法优化特征矢量的转换性能。定义了板仓一斋田谱失真测度的性能指标对该模型进行分析比较,实验表明,基于通用背景模型的转换特征矢量平均谱失真度性能指标优于定值偏移方法,且稳定性明显好于定值偏移方法。通用背景模型基础上有效高斯分量选择方法可进一步将性能指标提高5.11%,主观听觉测试表明本文方法可改善转换语音的清晰度和准确度。   相似文献   

13.
提出了复Contourlet域(CCT)中有向图与高斯混合模型的声呐图像增强算法。采用复Contourlet分析提取各尺度中声呐图像每一方向的弱特征信息;为建立特征信息间的联系,考虑复Contourlet域相邻尺度间子带系数的状态具有Markov性,子节点系数的状态依赖于父节点系数状态,构建有向概率图模型反映复系数的这种持续性;尺度内,构建高斯混合模型来建立同尺度中特性信息的联系,以两状态高斯混合模型来表征子带系数的非高斯边缘分布;最后,采用期望最大(EM)算法训练模型参数估计增强图像的系数,实现声呐图像增强。实验结果表明,本文算法与小波域隐马尔可夫树(HMT)算法、Contourlet域HMT算法相比,峰值信噪比(PSNR)增大4 dB以上,结构相似(SSIM)指数增加0.3;本文算法不仅能较好地抑制了声呐图像的强噪声,同时保留了图像边缘和轮廓等弱特征信息。   相似文献   

14.
王玥  李平  崔杰 《声学学报》2013,38(4):501-508
为了在噪声抑制和语音失真中之间寻找最佳平衡,提出了一种听觉频域掩蔽效应的自适应β阶贝叶斯感知估计语音增强算法,以期提高语音增强的综合性能。算法利用了人耳的听觉掩蔽效应,根据计算得到的频域掩蔽阈自适应调整β阶贝叶斯感知估计语音增强算法中的β值,从而仅将噪声抑制在掩蔽阈之下,保留较多的语音信息,降低语音失真。并分别用客观和主观评价方式,对所提出的算法的性能进行了评估,并与原来基于信噪比的自适应β阶贝叶斯感知估计语音增强算法进行了比较。结果表明,频域掩蔽的β阶贝叶斯感知估计方法的综合客观评价结果在信噪比为-10 dB至5 dB之间时均高于基于信噪比的自适应β阶贝叶斯感知估计语音增强算法。主观评价结果也表明频域掩蔽的β阶贝叶斯感知估计方法能在尽量保留语音信息的同时,较好的抑制背景噪声。   相似文献   

15.
Two experiments investigated the effects of critical bandwidth and frequency region on the use of temporal envelope cues for speech. In both experiments, spectral details were reduced using vocoder processing. In experiment 1, consonant identification scores were measured in a condition for which the cutoff frequency of the envelope extractor was half the critical bandwidth (HCB) of the auditory filters centered on each analysis band. Results showed that performance is similar to those obtained in conditions for which the envelope cutoff was set to 160 Hz or above. Experiment 2 evaluated the impact of setting the cutoff frequency of the envelope extractor to values of 4, 8, and 16 Hz or to HCB in one or two contiguous bands for an eight-band vocoder. The cutoff was set to 16 Hz for all the other bands. Overall, consonant identification was not affected by removing envelope fluctuations above 4 Hz in the low- and high-frequency bands. In contrast, speech intelligibility decreased as the cutoff frequency was decreased in the midfrequency region from 16 to 4 Hz. The behavioral results were fairly consistent with a physical analysis of the stimuli, suggesting that clearly measurable envelope fluctuations cannot be attenuated without affecting speech intelligibility.  相似文献   

16.
The dual-microphone voice activity detection (VAD) technique is proposed by applying discriminative weight training to achieve optimal weighting of spatial features available within the dual-microphone VAD. Since the motivation behind our method is to use the relevant spatial information available from the two microphones, we employ the phase difference, coherence, and power level difference ratio (PLDR) as a feature vector, and then use this feature vector to derive the maximum a posteriori (MAP) probabilities. Then, we combine each MAP probability based on a discriminative weight training, i.e., the minimum classification error (MCE) method to offer an optimal VAD decision in a spectral domain, which successfully represents the dynamic evolution of speech over time even in the non-stationary noise environments. The proposed dual-microphone VAD algorithm outperforms conventional dual-microphone VAD methods based on only single feature among the PLDR, phase difference, and spectral coherence.  相似文献   

17.
Speech waveform envelope cues for consonant recognition   总被引:4,自引:0,他引:4  
This study investigated the cues for consonant recognition that are available in the time-intensity envelope of speech. Twelve normal-hearing subjects listened to three sets of spectrally identical noise stimuli created by multiplying noise with the speech envelopes of 19(aCa) natural-speech nonsense syllables. The speech envelope for each of the three noise conditions was derived using a different low-pass filter cutoff (20, 200, and 2000 Hz). Average consonant identification performance was above chance for the three noise conditions and improved significantly with the increase in envelope bandwidth from 20-200 Hz. SINDSCAL multidimensional scaling analysis of the consonant confusions data identified three speech envelope features that divided the 19 consonants into four envelope feature groups ("envemes"). The enveme groups in combination with visually distinctive speech feature groupings ("visemes") can distinguish most of the 19 consonants. These results suggest that near-perfect consonant identification performance could be attained by subjects who receive only enveme and viseme information and no spectral information.  相似文献   

18.
一种基于支持向量机的海底声学参数快速统计反演方法   总被引:1,自引:0,他引:1  
高伟  王宁 《声学学报》2010,35(3):343-352
匹配场统计反演海底声参数的根本目的是求解未知参数的后验概率分布(PPD)。针对现有各种求解参数PPD的数值方法如穷举搜索、Markov Chain Monte Carlo采样、最近邻域插值近似算法普遍存在计算速度慢、时间长、难以满足实际应用的问题,本文提出了一种基于支持向量机的快速求解参数PPD的新算法。该算法利用了支持向量机强大的小样本学习能力,通过训练学习拟合未知海底声参数和后验概率之间存在的函数关系,从而在求解参数PPD时简化了利用声场传播模型计算后验概率的复杂过程,减少了计算时间。数值仿真算例和海上实验数据的处理结果验证了该算法在低维匹配场统计反演海底声参数问题中的有效性。   相似文献   

19.
The last decade has seen increasing interest in techniques for the enhancement of digital speech signals. Significant gains have been made in terms of signal-to-noise ratio (SNR) and quality, but few techniques have produced improvements in intelligibility. A method for speech enhancement based on nonlinear expansion of the spectral envelope is presented. The expansion is consistent with both the long-term spectrum of the speech and with the probability that speech is present in a given sample. Objective SNR measures are used to compare this algorithm with the well-known spectral subtraction method, with an alternative expansion scheme, and with limiting SNRs resulting from perfect recovery of the amplitude spectrum. For the purpose of intelligibility assessments, a simplified version of the algorithm has been implemented on a Texas Instruments TMS320-C25 system. Listening trials with this real-time system, conducted using a modified rhyme test, have produced small, but consistent, improvements in articulation scores.  相似文献   

20.
为了研究心理声学在语声增强方面的应用,本文提出了一种基于等效矩阵带宽(ERB)尺度划分的多子带语声信号抗噪谱减算法。此算法根据ERB尺度将带噪信号的频谱划分成多个子带,然后再根据每个子带的分段信噪比以及心理声学掩蔽原则分别计算每个子带的谱减参数,最后在每个子带中分别进行谱减算法处理。实验结果表明,应用新算法所获得的语声增强结果在信噪比、IS失真以及PESQ方面均优于之前提出的多子带语声信号抗噪谱减算法。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号