Similar Articles
 20 similar articles found (search time: 218 ms)
1.
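To address the difficulty of speech recognition in high-noise environments, a speech denoising preprocessing method is proposed that combines blind source separation based on independent component analysis (ICA/BSS) with wavelets. Experiments were designed for different noise types and different input signal-to-noise ratios (SNRs). The results show that the ICA-based preprocessing method for speech recognition is highly robust and performs well for many kinds of noise at low input SNR, a conclusion that is significant for signal analysis and speech recognition in real-world high-noise environments.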

2.
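Speaker recognition with vector Taylor series feature compensation   Total citations: 2 (self-citations: 0, citations by others: 2)
The vector Taylor series (VTS) feature compensation algorithm is applied to speaker recognition. An approximate closed-form solution for the variance of the convolutional noise is given, and a framework is built for jointly and rapidly estimating the means and variances of the convolutional and additive noise. Without requiring prior information about the mismatched environment, the algorithm estimates the means and variances of the convolutional and additive noise directly from the mismatched speech and thereby compensates for the environmental mismatch. Experimental results show that on wireless channels with large channel variation, compensating the convolutional noise variance reduces the error rate by up to 3.24%, improving the recognition performance of the system. On wireless channels with additive noise, the proposed algorithm reduces the error rate by up to 49.65% and 68.06% compared with a feature mapping algorithm based on a linear distortion model and with cepstral mean subtraction, respectively, making it suitable for compensating mismatched environments with large channel variation.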
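For context, VTS compensation is usually derived from a nonlinear model of how convolutional and additive noise distort cepstral features. The form below is the model commonly given in the VTS literature, not an equation taken from this abstract. The symbols are labels chosen here: x, h, n and y for the clean-speech, convolutional-noise, additive-noise and observed cepstra, and C for the DCT matrix (C^{-1} is a pseudo-inverse in practice).

```latex
\mathbf{y} = \mathbf{x} + \mathbf{h}
  + \mathbf{C}\,\log\!\left(\mathbf{1}
  + \exp\!\left(\mathbf{C}^{-1}\,(\mathbf{n} - \mathbf{x} - \mathbf{h})\right)\right)
```

VTS approaches linearize this mapping with a first-order Taylor expansion around the current noise estimates, which is what makes it possible to estimate noise means and variances from the distorted speech itself.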

3.
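Li Zhejun, Zhou Ping, Jing Xinxing. Applied Acoustics (《应用声学》), 2016, 24(4): 155-157, 162
Additive noise in speech signals degrades the robustness of MFCCs and the performance of recognition systems, and introducing basic spectral subtraction gives only a limited improvement in the noise immunity of MFCCs. To make MFCCs more noise-resistant, an improved algorithm is proposed that introduces spectral entropy into spectral subtraction: the distribution of spectral entropy values is used to estimate the noise frame by frame, so that the noise can be subtracted more accurately. Experimental results show that when the speech contains additive noise, a speaker recognition system using the improved spectral subtraction is more noise-resistant and robust than one using basic spectral subtraction.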
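The idea of steering spectral subtraction with spectral entropy can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the abstract does not give the estimation rule, so the sketch assumes that the frames with the highest entropy (flattest spectra) are noise-like and averages their power spectra as the noise estimate. The function names, the `entropy_quantile` threshold and the spectral `floor` are all hypothetical, and the frames are assumed to be already windowed.

```python
import numpy as np


def frame_spectral_entropy(frames, eps=1e-12):
    """Spectral entropy of each frame (one frame per row)."""
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Normalise each power spectrum so it can be read as a probability mass.
    prob = power / (power.sum(axis=1, keepdims=True) + eps)
    return -(prob * np.log(prob + eps)).sum(axis=1)


def entropy_spectral_subtraction(frames, entropy_quantile=0.8, floor=0.01):
    """Power spectral subtraction with an entropy-selected noise estimate.

    Assumption (not from the source): frames whose entropy is at or above
    the given quantile are treated as noise-only frames.
    """
    spectrum = np.fft.rfft(frames, axis=1)
    power = np.abs(spectrum) ** 2
    entropy = frame_spectral_entropy(frames)
    noise_like = entropy >= np.quantile(entropy, entropy_quantile)
    noise_power = power[noise_like].mean(axis=0)
    # Subtract the noise estimate, keeping a small spectral floor.
    clean_power = np.maximum(power - noise_power, floor * power)
    gain = np.sqrt(clean_power / (power + 1e-12))
    # Reuse the noisy phase and return to the time domain.
    return np.fft.irfft(gain * spectrum, n=frames.shape[1], axis=1)
```

Flat, noise-like spectra have high entropy while voiced speech concentrates its energy in a few bins, which is why entropy can serve as a per-frame noise indicator. A full system would add overlapping windows and overlap-add resynthesis before MFCC extraction.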

4.
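Additive noise in speech signals degrades the robustness of MFCCs and the performance of recognition systems, and introducing basic spectral subtraction gives only a limited improvement in the noise immunity of MFCCs. To make MFCCs more noise-resistant, an improved algorithm is proposed that introduces spectral entropy into spectral subtraction: the distribution of spectral entropy values is used to estimate the noise frame by frame, so that the noise can be subtracted more accurately. Experimental results show that when the speech contains additive noise, a speaker recognition system using the improved spectral subtraction is more noise-resistant and robust than one using basic spectral subtraction.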

5.
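Shi Wenhua, Zhang Xiongwei, Zou Xia, Sun Meng, Li Li. Acta Acustica (《声学学报》), 2020, 45(3): 299-307
A speech enhancement method is proposed that combines a deep encoder-decoder neural network with time-frequency mask estimation. The method uses the deep encoder-decoder network to estimate a time-frequency mask representation and, together with the magnitude spectrum of the noisy speech, learns the nonlinear mapping between the magnitude spectra of noisy and clean speech. The network has a convolution-deconvolution structure. At the encoder, the local perception property of convolutional networks is used to model the time-frequency structure of the noisy speech, extracting speech features while suppressing background noise. At the decoder, the speech features extracted by the encoder are used to restore local detail layer by layer and reconstruct the speech signal. Skip connections are introduced between corresponding encoder and decoder layers to reduce the loss of low-level detail caused by pooling and fully connected operations. Simulation experiments on the TIMIT speech corpus with a not fully matched noise set show that the method suppresses noise effectively and recovers speech detail well.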

6.
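Single-channel speech enhancement algorithm combining deep neural networks and convex optimization   Total citations: 1 (self-citations: 1, citations by others: 0)
The accuracy of noise estimation directly affects the quality of a speech enhancement algorithm. To improve the noise suppression of current speech enhancement algorithms and to solve the unconstrained optimization problem effectively, a time-frequency mask optimization algorithm combining a deep neural network (DNN) and convex optimization is proposed for single-channel speech enhancement. First, the energy spectrum of the noisy speech is extracted as the DNN input feature. Next, the intra-band cross-correlation coefficient (ICC Factor) between the noise and the noisy speech is used as the DNN training target. Then, the cross-correlation coefficients produced by the DNN model are used to construct the objective function of the convex optimization. Finally, combining the DNN and convex optimization, a new hybrid conjugate gradient method iteratively processes the initial mask, and the enhanced speech is synthesized with the new mask. Simulation experiments show that, at low SNR under different background noises, the new mask gives the enhanced speech better log-spectral distance (LSD), perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI) and segmental SNR (segSNR) scores than before the improvement, raising overall speech quality while suppressing noise effectively.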

7.
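A speech endpoint detection algorithm using the perceptual spectrogram structure boundary (PSSB) parameter is proposed for preprocessing speech signals in low-SNR environments. After the noisy speech is enhanced on the basis of auditory perception properties, the time-frequency spectrogram of the enhanced speech is further enhanced in two dimensions, exploiting the difference between the continuous distribution of speech and the random distribution of residual noise, so that the spectrogram structure of the continuously distributed clean speech stands out further. The PSSB parameter is derived from two-dimensional boundary detection on the spectrogram structure of the enhanced speech and is used for endpoint detection. Experimental results show that in white noise at SNRs from -10 dB to 10 dB, the PSSB-based algorithm detects speech endpoints more effectively than other endpoint detection algorithms. At the very low SNR of -10 dB, the proposed method still achieves 75.2% accuracy. The PSSB-based algorithm is therefore better suited to speech endpoint detection in low-SNR white-noise environments.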

8.
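To address the sharp performance drop of earlier speech enhancement algorithms in non-stationary noise, a new single-channel speech enhancement algorithm based on time-frequency dictionary learning is proposed. First, time-frequency dictionary learning is used to model prior information about the spectral structure of the noise, and this is incorporated into the convolutive non-negative matrix factorization framework. Then, with the noise time-frequency dictionary fixed, multiplicative iterative update formulas are derived for the time-varying gains and the speech time-frequency dictionary. Finally, these formulas are used to update the time-varying gain coefficients of the speech and the noise as well as the speech time-frequency dictionary; the magnitude spectrum of the speech is reconstructed by convolving the speech time-frequency dictionary with the time-varying gains, and binary time-frequency masking removes the noise interference. Experimental results show that the proposed algorithm achieves better results on several speech quality measures. In non-stationary noise and at low SNR, it removes noise more effectively than multi-band spectral subtraction and non-negative sparse coding denoising, and the enhanced speech is of better quality.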

9.
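To capture the inter-frame correlation of speech signals and to represent speech features with fewer speech bases, a convolutive non-negative matrix factorization method with an L_(1/2) sparsity constraint is proposed for single-channel speech enhancement. First, noise learning is performed to obtain the noise bases. Then, with the noise bases as prior information, the L_(1/2)-sparsity-constrained convolutive non-negative matrix factorization learns the speech basis components in the noisy speech. Finally, the clean speech signal is reconstructed from the learned speech bases and coefficients. Experiments in different noise environments show that the proposed method outperforms convolutive non-negative matrix factorization with an L_1 sparsity constraint and conventional statistical speech enhancement methods.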
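The three steps above (learn noise bases, learn speech bases with the noise bases fixed, reconstruct) can be sketched with ordinary NMF. This is a simplified stand-in rather than the paper's method: it uses plain (non-convolutive) NMF with a Euclidean cost, and the optional L_(1/2) penalty on the speech activations follows the usual multiplicative-update form for that penalty. All names (`nmf_with_fixed_noise_basis`, `sparsity`, and so on) are hypothetical, and basis normalisation is omitted for brevity.

```python
import numpy as np


def nmf_with_fixed_noise_basis(V, W_noise, n_speech, n_iter=200,
                               sparsity=0.0, seed=0):
    """Factorise V ~= [W_speech, W_noise] @ H with W_noise held fixed.

    V        : non-negative magnitude spectrogram, shape (freq, frames)
    W_noise  : pre-trained noise bases, shape (freq, n_noise)
    sparsity : weight of an L_(1/2) penalty on the speech activations
               (assumed form: sparsity * sum(sqrt(H_speech)))
    """
    eps = 1e-12
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    n_noise = W_noise.shape[1]
    W_speech = rng.random((n_freq, n_speech)) + eps
    H = rng.random((n_speech + n_noise, n_frames)) + eps
    for _ in range(n_iter):
        W = np.hstack([W_speech, W_noise])
        # Gradient of the L_(1/2) penalty, applied to speech rows only.
        penalty = np.zeros_like(H)
        penalty[:n_speech] = 0.5 * sparsity / np.sqrt(H[:n_speech] + eps)
        # Multiplicative update of all activations.
        H *= (W.T @ V) / (W.T @ W @ H + penalty + eps)
        # Multiplicative update of the speech bases; noise bases stay fixed.
        H_speech = H[:n_speech]
        W_speech *= (V @ H_speech.T) / (W @ H @ H_speech.T + eps)
    return W_speech, H
```

Under these assumptions the speech magnitude estimate is `W_speech @ H[:n_speech]`. The convolutive variant in the abstract replaces each basis vector with a short time-frequency patch, which is how it captures inter-frame correlation.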

10.
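Current supervised speech enhancement ignores, among other issues, how the magnitude-spectrum similarity among clean speech, noise and noisy speech affects enhancement. To address this, a speech enhancement method combining an accurate ratio mask (ARM) with a deep neural network (DNN) is proposed. Using the normalized cross-correlation coefficients between the magnitude spectra of clean speech and noisy speech and of noise and noisy speech, the method designs an accurate ratio mask, based on the time-frequency ideal ratio mask, as the target mask. A DNN whose training targets are the clean speech and noise magnitude spectra then serves as the baseline; the target mask is estimated from the output of this DNN, the baseline DNN and the target mask are jointly optimized, and the enhanced speech is estimated from the noisy speech with the target mask. In addition, to take account of the discriminative information between clean speech and noise, a discriminative training function replaces the mean squared error (MSE) function as the objective of the baseline DNN, making the network output more accurate. Experiments show that the discriminative training function improves the enhancement performance of both the baseline DNN and the whole jointly optimized network. Under matched and unmatched noise, the proposed method achieves higher average perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI) scores than other common DNN methods; the enhanced speech retains more speech components while the noise is suppressed more noticeably.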
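The two ingredients named in this abstract, the ideal ratio mask and the normalized cross-correlation between magnitude spectra, can be sketched as below. The abstract does not give the ARM formula, so this sketch stops at the widely used ideal ratio mask and does not attempt to combine the two. The function names and the exponent `beta=0.5` are assumptions made here.

```python
import numpy as np


def ideal_ratio_mask(speech_mag, noise_mag, beta=0.5, eps=1e-12):
    """Ideal ratio mask per time-frequency bin, with values in [0, 1]."""
    s2 = speech_mag ** 2
    n2 = noise_mag ** 2
    return (s2 / (s2 + n2 + eps)) ** beta


def normalized_cross_correlation(a_mag, b_mag, eps=1e-12):
    """Per-frame normalized cross-correlation of two magnitude spectrograms.

    Both inputs have shape (freq, frames); the result has one value per frame.
    """
    num = (a_mag * b_mag).sum(axis=0)
    den = np.sqrt((a_mag ** 2).sum(axis=0) * (b_mag ** 2).sum(axis=0)) + eps
    return num / den
```

In mask-based enhancement the estimated mask multiplies the noisy magnitude spectrum, for example `enhanced_mag = mask * noisy_mag`, and the noisy phase is reused for resynthesis.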

11.
12.
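Yin Hui, Xie Xiang, Kuang Jingming. Acta Acustica (《声学学报》), 2012, 37(1): 97-103
The fractional Fourier transform has unique advantages for processing non-stationary signals, chirp signals in particular, and the human auditory system performs far better than automatic speech recognition systems. In this paper, a Gammatone auditory filterbank is used for front-end time-domain filtering of the speech signal, and acoustic features are then extracted from each output sub-band signal with the fractional Fourier transform. The order of the fractional Fourier transform strongly affects its performance; for the sub-band time-domain signals, a method is proposed that obtains the order by fitting the instantaneous frequency curve, and it is compared with a method based on the ambiguity function. Speech recognition results on clean and noisy Mandarin isolated-digit corpora show that the proposed acoustic features give a significantly higher recognition accuracy than an MFCC baseline system. Compared with the ambiguity function method, the algorithm that searches for the order from the instantaneous frequency curve greatly reduces computation, and the acoustic features extracted with it achieve the highest average recognition accuracy.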

13.
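To address the reduced robustness of missing-data feature methods in low-SNR speaker recognition, a missing-data feature extraction method using perceptual auditory scene analysis is proposed. First, the missing-data feature spectrum of the speech is computed, and the perceptual speech content is derived from the perceptual properties of speech. The noisy speech undergoes perceptual speech enhancement and two-dimensional enhancement of its spectrogram, from which the distribution of the speech is obtained; the perceptual speech content and a missing-intensity parameter are then combined to extract a perceptual auditory factor. Together with the missing-data feature spectrum, feature extraction is decomposed into different auditory scenes that are analyzed and processed separately, to strengthen the robustness of the speaker recognition system. Experimental results show that at low SNRs from -10 dB to 10 dB and for four different noises, the proposed method is more robust than all five comparison methods, with average recognition rates higher by 26.0%, 19.6%, 12.7%, 4.6% and 6.5%, respectively. The proposed method searches for robust speech features in the time-frequency domain and is better suited to speaker recognition in low-SNR environments.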

14.
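Xiao Hanchun, Guo Junfeng, Zhang Li. Applied Acoustics (《应用声学》), 2018, 37(6): 909-915
Mel-frequency cepstral coefficient (MFCC) feature extraction converts an acoustic signal from the linear frequency domain to the mel domain according to the perceptual properties of the human ear, and is widely used in speech recognition. This paper applies MFCCs to acoustic feature extraction for small low-altitude aircraft. Because propeller-driven small low-altitude aircraft have stable, strong harmonics, the mel filters used in MFCC extraction are modified: the slope of the linear-frequency-to-mel conversion curve at these harmonics is replaced by projection, raising the filters' sensitivity to the signal at those harmonics. Simulation results show that the improved MFCC feature extraction yields a lower equal error rate for small low-altitude aircraft and has better noise immunity in low-SNR environments.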
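The linear-frequency-to-mel conversion that this abstract modifies can be illustrated with the commonly used mapping below. The sketch shows only the standard, unmodified curve; the paper's slope replacement at harmonic frequencies is not reproduced, since the abstract does not specify it. The helper `mel_filter_centers` is a hypothetical name introduced for illustration.

```python
import numpy as np


def hz_to_mel(f_hz):
    """Common mel-scale mapping: mel = 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_filter_centers(n_filters, f_min, f_max):
    """Centre frequencies (Hz) of filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_filters + 2)
    # Drop the two edge points; the interior points are the filter centres.
    return mel_to_hz(mel_points)[1:-1]
```

Because the curve flattens at high frequencies, evenly spaced mel points map to increasingly wide gaps in Hz. A steeper local slope around a chosen frequency would pack filters more densely there, which appears to be the intuition behind increasing sensitivity at propeller harmonics.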

15.
A robust feature extraction technique for phoneme recognition is proposed which is based on deriving modulation frequency components from the speech signal. The modulation frequency components are computed from syllable-length segments of sub-band temporal envelopes estimated using frequency domain linear prediction. Although the baseline features provide good performance in clean conditions, the performance degrades significantly in noisy conditions. In this paper, a technique for noise compensation is proposed where an estimate of the noise envelope is subtracted from the noisy speech envelope. The noise compensation technique suppresses the effect of additive noise in speech. The robustness of the proposed features is further enhanced by the gain normalization technique. The normalized temporal envelopes are compressed with static (logarithmic) and dynamic (adaptive loops) compression and are converted into modulation frequency features. These features are used in an automatic phoneme recognition task. Experiments are performed in mismatched train/test conditions where the test data are corrupted with various environmental distortions like telephone channel noise, additive noise, and room reverberation. Experiments are also performed on large amounts of real conversational telephone speech. In these experiments, the proposed features show substantial improvements in phoneme recognition rates compared to other speech analysis techniques. Furthermore, the contribution of various processing stages for robust speech signal representation is analyzed.

16.
Pitch detection is an important part of speech recognition and speech processing. In this paper, a pitch detection algorithm based on the second generation wavelet transform was developed. The proposed algorithm reduces the computational load of algorithms based on the classical wavelet transform. It was tested on both real and synthetic speech signals. Experiments were carried out in noisy conditions to evaluate the accuracy and robustness of the proposed algorithm. Results showed that the proposed algorithm was robust to noise and provided accurate estimates of the pitch period for both low-pitched and high-pitched speakers. Moreover, different wavelet filters obtained using the second generation wavelet transform were considered to examine their effect on the proposed algorithm. The Haar filter performed well compared with the other wavelet filters.

17.
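Wang Weiwei, Zhang Xiuzai. Applied Acoustics (《应用声学》), 2019, 38(2): 237-244
To address the poor performance of conventional speech emotion feature parameters in emotion classification, this paper proposes a speech emotion recognition method based on variational mode decomposition (VMD). Intrinsic mode functions (IMFs) are first extracted from the emotional speech signal by VMD; the selected dominant IMFs are then recombined, and mel-frequency cepstral coefficients and the Hilbert marginal spectrum of each IMF are extracted. To verify the proposed features, experiments are run on two speech databases (EMODB and RAVDESS): features are extracted with the proposed method and an extreme learning machine performs the emotion classification. The results show that the proposed features give better recognition performance than speech emotion features based on empirical mode decomposition and ensemble empirical mode decomposition, verifying the practicality of the method.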

18.
Numerous attempts have been made to find low-dimensional, formant-related representations of speech signals that are suitable for automatic speech recognition. However, it is often not known how these features behave in comparison with true formants. The purpose of this study was to compare two sets of automatically extracted formant-like features, i.e., robust formants and HMM2 features, to hand-labeled formants. The robust formant features were derived by means of the split Levinson algorithm while the HMM2 features correspond to the frequency segmentation of speech signals obtained by two-dimensional hidden Markov models. Mel-frequency cepstral coefficients (MFCCs) were also included in the investigation as an example of state-of-the-art automatic speech recognition features. The feature sets were compared in terms of their performance on a vowel classification task. The speech data and hand-labeled formants that were used in this study are a subset of the American English vowels database presented in Hillenbrand et al. [J. Acoust. Soc. Am. 97, 3099-3111 (1995)]. Classification performance was measured on the original, clean data and in noisy acoustic conditions. When using clean data, the classification performance of the formant-like features compared very well to the performance of the hand-labeled formants in a gender-dependent experiment, but was inferior to the hand-labeled formants in a gender-independent experiment. The results that were obtained in noisy acoustic conditions indicated that the formant-like features used in this study are not inherently noise robust. For clean and noisy data as well as for the gender-dependent and gender-independent experiments, the MFCCs achieved results equal or superior to those of the formant features, but at the price of a much higher feature dimensionality.

19.
This letter points out that, although in the audio signal domain low-pass filtering has been used to prevent aliasing noise from entering the baseband of speech signals, an antialias process in the speech feature domain is still needed to prevent high modulation frequency components from entering the baseband of speech features. The existence of aliasing noise in speech features is revealed via spectral analysis of speech feature streams. A method for suppressing such aliasing noise is proposed. Experiments on large vocabulary speech recognition show that antialias processing of speech features can improve speech recognition, especially for noisy speech.

20.
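Whispered-speech speaker recognition based on multiband demodulation analysis and instantaneous frequency estimation   Total citations: 4 (self-citations: 0, citations by others: 4)
Wang Min, Zhao Heming. Acta Acustica (《声学学报》), 2010, 35(4): 471-476
To improve the robustness of speaker recognition for whispered speech, a whispered-speech feature parameter based on the amplitude modulation-frequency modulation (AM-FM) model, the instantaneous frequency estimate (IFE), is proposed. Following the formant modulation theory of speech production, multiband demodulation analysis (MDA) is used to obtain the instantaneous envelope and frequency of the speech; a weighted estimate from the envelope amplitude and frequency then gives the IFE feature, which describes the frequency structure of the speech. The feature is applied to whispered-speech speaker recognition and compared with conventional mel-frequency cepstral coefficients (MFCCs). Experimental results show that as the number of test speakers increases, IFE performs slightly better than MFCC, and that when the test channel changes, IFE is markedly more robust than MFCC.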
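The demodulation step described above can be sketched for a single band as follows. This is an illustration under stated assumptions, not the paper's MDA front end: it demodulates with the analytic signal (a Hilbert-transform approach computed through the FFT), omits the band-pass filterbank, and assumes the weighted estimate is a squared-envelope-weighted mean of the instantaneous frequency. The function names are hypothetical.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal of a real sequence, computed through the FFT."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep DC once
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep Nyquist once
        weights[1:n // 2] = 2.0           # double positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)


def weighted_instantaneous_frequency(x, fs):
    """Envelope, instantaneous frequency (Hz) and their weighted mean.

    Assumption: the weighting is by squared envelope, so frequencies
    observed at high amplitude dominate the estimate.
    """
    z = analytic_signal(x)
    envelope = np.abs(z)
    phase = np.unwrap(np.angle(z))
    inst_freq = np.diff(phase) * fs / (2.0 * np.pi)
    w = envelope[:-1] ** 2
    return envelope, inst_freq, np.sum(w * inst_freq) / np.sum(w)
```

In a multiband setting this routine would be applied to each band-pass filtered signal, giving one weighted frequency per band and frame.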
