1.

Li Zhejun, Zhou Ping, Jing Xinxing 《应用声学》2016,24(4):155-157, 162

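Additive noise in speech signals degrades both the robustness of MFCC and the performance of recognition systems, and basic spectral subtraction yields only limited improvement in the noise robustness of MFCC. To give MFCC better noise robustness, an improved algorithm is proposed: it introduces spectral entropy into spectral subtraction and uses the distribution of spectral entropy values to estimate the noise frame by frame, which allows more accurate denoising by spectral subtraction. Experimental results show that when speech contains additive noise, the speaker recognition system using the improved spectral subtraction has better noise immunity and robustness than the one using basic spectral subtraction.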
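The idea of letting spectral entropy drive a frame-by-frame noise estimate can be sketched as follows. This is only an illustration under stated assumptions, not the authors' algorithm: the function names, the fixed entropy `threshold`, the smoothing factor `alpha`, the spectral `floor`, and the choice to seed the noise estimate from the first frame are all hypothetical. The sketch relies on the fact that a flat, noise-like spectrum has high entropy while a peaky, speech-like spectrum has low entropy.

```python
import math


def spectral_entropy(power):
    """Shannon entropy (in nats) of one frame's power spectrum."""
    total = sum(power)
    if total <= 0.0:
        return 0.0
    probs = [p / total for p in power if p > 0.0]
    return -sum(q * math.log(q) for q in probs)


def entropy_guided_subtraction(frames, threshold, alpha=0.9, floor=0.01):
    """Power spectral subtraction with an entropy-gated noise update.

    frames: list of power spectra (one list of floats per frame).
    threshold, alpha and floor are illustrative parameters (assumptions).
    """
    if not frames:
        return []
    noise = list(frames[0])  # assumption: the first frame contains noise only
    cleaned = []
    for power in frames:
        # High entropy suggests a noise-like frame: refresh the noise estimate.
        if spectral_entropy(power) >= threshold:
            noise = [alpha * n + (1.0 - alpha) * p for n, p in zip(noise, power)]
        # Subtract the noise power, keeping a small spectral floor.
        cleaned.append([max(p - n, floor * p) for p, n in zip(power, noise)])
    return cleaned
```

MFCCs would then be computed from the cleaned power spectra instead of the noisy ones.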

2.

Speaker feature extraction based on an auditory model and its application to speaker identification in noisy backgrounds

Dai Mingyang, Xu Bailing 《应用声学》2001,20(6):6-12,44

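This paper proposes a robust speaker feature extraction method based on a model of human hearing. First, a Gammatone auditory filterbank and the Meddis inner hair cell firing model are used to obtain an auditory correlogram that characterizes auditory nerve activity. Based on the phase-locking and two-tone suppression properties of auditory nerve firing, the frequency component with the largest amplitude in each band of the correlogram is taken as the feature parameter of that band, and the feature parameters of all bands together form the feature vector of the current speech segment. A DCT is then applied to further decorrelate the feature parameters and reduce the dimension of the feature vector. Validity tests show that this feature vector largely reflects the spectral envelope of the input speech. Noise robustness experiments show that, under Gaussian white noise and car noise, these features exhibit smaller relative distortion than LPCC and MFCC. Text-independent speaker identification based on vector quantization shows that the features achieve good recognition results at low signal-to-noise ratios for all three types of noise.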
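The final DCT step, which decorrelates the per-band parameters and shortens the feature vector, can be illustrated with a small sketch. It assumes an unnormalized DCT-II and a hypothetical `keep` argument for truncation; the abstract does not say which DCT variant or how many coefficients were used.

```python
import math


def dct2(x, keep=None):
    """Unnormalized DCT-II of x, optionally truncated to the first `keep` terms.

    The choice of DCT-II and of truncation are assumptions for illustration.
    """
    n = len(x)
    keep = n if keep is None else keep
    return [
        sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(keep)
    ]
```

Keeping only the low-order coefficients retains the smooth trend across bands while discarding fine detail, which is how the dimension reduction arises.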

3.

Speaker recognition based on the fusion of deep and shallow features and models (cited 4 times: 0 self-citations, 4 by others)

Zhong Weifeng, Fang Xiang, Fan Cunhang, Wen Zhengqi, Tao Jianhua 《声学学报》2018,43(2):263-272

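To further improve the performance of speaker recognition systems, speaker recognition based on the fusion of deep and shallow features and on I-Vector-based model fusion is proposed. The deep and shallow feature fusion method takes full account of the complementarity between features at different levels and, by fusing them, describes speaker information more comprehensively. The I-Vector-based model fusion method fuses the I-Vector features extracted by different speaker recognition systems and then computes distances, combining the strengths of the different systems at the level of overall system structure. In tests on the CASIA northern and southern dialect corpus, with equal error rate (EER) as the metric, the deep and shallow feature fusion method reduced the EER by 54.8% relative to the baseline system, and the I-Vector-based model fusion method reduced it by 69.5% relative to the baseline. The experimental results show that fusing deep and shallow features and models is effective.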
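One possible reading of "fuse the I-Vectors, then compute distances" is sketched below. The abstract does not specify the fusion rule or the distance, so the concatenation of length-normalized vectors and the cosine score used here are assumptions, and all function names are hypothetical.

```python
import math


def l2_normalise(v):
    """Scale v to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0.0 else list(v)


def fuse(ivectors):
    """Assumed fusion rule: concatenate the length-normalized I-Vectors
    produced by the different systems for the same utterance."""
    fused = []
    for v in ivectors:
        fused.extend(l2_normalise(v))
    return fused


def cosine_score(a, b):
    """Cosine similarity between two fused vectors (higher means more similar)."""
    a, b = l2_normalise(a), l2_normalise(b)
    return sum(x * y for x, y in zip(a, b))
```

A verification trial would compare `cosine_score(fuse(enrol_vectors), fuse(test_vectors))` against a threshold, and the EER is the operating point at which false acceptances and false rejections are equally frequent.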

4.

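A speech endpoint detection algorithm using perceptual spectrogram structure boundary parameters at low signal-to-noise ratios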

Wu Di, Zhao Heming, Tao Zhi, Zhang Xiaojun, Xiao Zhongzhe, Xu Yishen 《声学学报》2014,39(3):392-399

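A speech endpoint detection algorithm using the perceptual spectrogram structure boundary (PSSB) parameter is proposed for preprocessing speech signals in low signal-to-noise ratio (SNR) environments. After the noisy speech is enhanced on the basis of auditory perception characteristics, the time-frequency spectrogram of the enhanced speech is further enhanced in two dimensions, exploiting the difference between the continuous distribution of the speech signal and the random distribution of the residual noise, so that the spectrogram structure of the continuously distributed clean speech stands out more clearly. The PSSB parameter is derived by two-dimensional boundary detection on the spectrogram structure of the enhanced speech and is used for endpoint detection. Experimental results show that, in white noise at SNRs from -10 dB to 10 dB, the PSSB-based algorithm detects speech endpoints more effectively than other endpoint detection algorithms. At the very low SNR of -10 dB, the proposed method still achieves 75.2% accuracy. The PSSB-based algorithm is therefore better suited to speech endpoint detection in low-SNR white noise.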

5.

Speaker normalization of static and dynamic vowel spectral features

S A Zahorian, A J Jagharghi 《The Journal of the Acoustical Society of America》1991,90(1):67-75

Two methods are described for speaker normalizing vowel spectral features: one is a multivariable linear transformation of the features and the other is a polynomial warping of the frequency scale. Both normalization algorithms minimize the mean-square error between the transformed data of each speaker and vowel target values obtained from a "typical speaker." These normalization techniques were evaluated both for formants and a form of cepstral coefficients (DCTCs) as spectral parameters, for both static and dynamic features, and with and without fundamental frequency (F0) as an additional feature. The normalizations were tested with a series of automatic classification experiments for vowels. For all conditions, automatic vowel classification rates increased for speaker-normalized data compared to rates obtained for nonnormalized parameters. Typical classification rates for vowel test data for nonnormalized and normalized features respectively are as follows: static formants--69%/79%; formant trajectories--76%/84%; static DCTCs--75%/84%; DCTC trajectories--84%/91%. The linear transformation methods increased the classification rates slightly more than the polynomial frequency warping. The addition of F0 improved the automatic recognition results for nonnormalized vowel spectral features by as much as 5.8%. However, the addition of F0 to speaker-normalized spectral features resulted in much smaller increases in automatic recognition rates.
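The first of the two methods, a linear transformation chosen to minimize mean-square error against target values, amounts to a least-squares fit. The sketch below is an illustration only: it assumes an affine map (a matrix plus an offset) solved through the normal equations, and the function names are hypothetical. It also assumes the data are not degenerate, since a singular system would cause a division by zero.

```python
def solve(a, b):
    """Solve a @ w = b by Gauss-Jordan elimination with partial pivoting.

    a is an m x m list of lists, b is an m x k list of lists.
    """
    m = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(m):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[m:] for row in aug]


def fit_normaliser(features, targets):
    """Least-squares affine map from one speaker's features to target values."""
    xs = [list(x) + [1.0] for x in features]  # append 1 to model the offset
    m, k = len(xs[0]), len(targets[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(m)] for i in range(m)]
    xtt = [[sum(x[i] * t[j] for x, t in zip(xs, targets)) for j in range(k)]
           for i in range(m)]
    return solve(xtx, xtt)  # (d + 1) x k weight matrix


def apply_normaliser(w, x):
    """Apply the fitted map to a single feature vector."""
    xa = list(x) + [1.0]
    return [sum(xa[i] * w[i][j] for i in range(len(xa))) for j in range(len(w[0]))]
```

In use, `features` would hold one speaker's vowel measurements and `targets` the corresponding "typical speaker" values for the same vowels, with one fit per speaker.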

6.

Conversion from whispered speech to normal speech using the extended bilinear transformation method

TAO Zhi, ZHAO Heming, TAN Xuedan, GU Jihua, ZHANG Xiaojun, WU Di 《声学学报：英文版》2013,(4):425-438

A method for converting whispered speech to normal speech using an extended bilinear transformation is proposed. Because the formants of whispered speech deviate by different amounts in different frequency bands, the spectrum of the whispered speech is processed in separate bands, and on this basis a conversion function from whispered speech to normal speech is established. Because whispered speech is offset nonlinearly relative to normal speech, an expansion factor is introduced into the bilinear transform function so that it corresponds more closely to the actual demands of converting whispered speech to normal speech. This factor accounts for both the nonlinear shift of the spectrum and the compression of the formant bandwidth, and thus effectively reduces the spectral distortion distance in the conversion. The experimental results show that the proposed conversion effectively improves both the sound quality and the intelligibility of whispered speech.
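For orientation, the standard (non-extended) bilinear frequency warp, as produced by a first-order all-pass substitution, can be written in a few lines. This sketch shows only that textbook mapping; the expansion factor and the band-wise processing described in the abstract are not specified there and are not reproduced here. The function name and the parameter name `alpha` are assumptions.

```python
import math


def bilinear_warp(omega, alpha):
    """Warp a normalized angular frequency omega (0..pi radians).

    Standard first-order all-pass (bilinear) warp:
        omega' = omega + 2 * atan(alpha * sin(omega) / (1 - alpha * cos(omega)))
    alpha must satisfy |alpha| < 1; alpha = 0 leaves the axis unchanged.
    """
    if not -1.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between -1 and 1")
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))
```

The endpoints 0 and pi stay fixed, a positive `alpha` moves interior frequencies upward, and a negative `alpha` moves them downward, which is why a single parameter can shift formant positions along the frequency axis.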

7.

Speaker sex identification from voiced, whispered, and filtered isolated vowels.

N J Lass, K R Hughes, M D Bowyer, L T Waters, V T Bourne 《The Journal of the Acoustical Society of America》1976,59(3):675-678

8.

Conversion of whispered speech to normal speech using the extended bilinear transformation method

Tao Zhi, Zhao Heming, Tan Xuedan, Gu Jihua, Zhang Xiaojun, Wu Di 《声学学报》2012,37(6):651-658

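A method is proposed for converting whispered speech to normal speech using an extended bilinear transformation. Because the formant shift of whispered speech differs across frequency bands, the whispered speech spectrum is processed band by band, and on this basis a conversion function from whispered speech to normal speech is established. Because whispered speech is shifted nonlinearly relative to normal speech in each band, an expansion factor is introduced into the bilinear transform function so that its nonlinear shifting of the spectrum and its compression of formant bandwidths better match the actual requirements of converting whispered speech to normal speech, which effectively reduces the spectral distortion distance between the converted speech and normal speech. Experimental results show that the converted speech is effectively improved in both sound quality and intelligibility.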

9.

Study of the formant and duration in Chinese whispered vowel speech

Yue Zhao, Wei Lin 《Applied Acoustics》2016

Studying acoustic characteristics is important for speech and speaker recognition in Chinese whispered speech. In this paper, the characteristics of whispered speech are introduced and the acoustic characteristics of Chinese whispered speech are discussed. Whispered speech has no fundamental frequency, so other characteristics, such as duration and formant frequency, are extracted and analyzed. Experiments with six simple Chinese whispered vowels show that duration and formant frequency can be used as the main acoustic characteristics in Chinese whispered speech recognition.

10.

A method of whispered speech enhancement based on speech absence probability and modified mel-domain masking model

TAO Zhi, ZHAO Heming, WU Di, CHEN Daqing, ZHANG Xiaojun 《声学学报：英文版》2011,30(3):345-357

Whispered speech enhancement using an auditory masking model in a modified Mel domain and Speech Absence Probability (SAP) is proposed. In light of the phonation characteristics of whisper, the Mel-frequency scaling model is modified, and whispered speech is filtered by the proposed model. Meanwhile, the masking threshold for each frequency band is dynamically determined by the speech absence probability. Whispered speech enhancement is then conducted by adaptively rectifying the spectral subtraction coefficients using different masking threshold values. Results of objective and subjective tests on the enhanced whispered signal show that, compared with other methods, the proposed method can enhance whispered signals with better subjective auditory quality and less distortion by reducing the musical noise and background noise under the masking threshold value.

11.

Measurement and calculation of the standard spectrum of Chinese whispered speech (cited 1 time: 0 self-citations, 1 by others)

Sun Fei, Shen Yong, Li Ju, An Kang 《声学学报》2010,35(4):477-480

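A relationship between the power spectral density level of Chinese whispered speech and frequency is proposed that differs from the one in GB7348-87 《耳语标准频谱》 (Standard Spectrum of Whispered Speech). Measurements were made in an anechoic chamber to improve the measurement signal-to-noise ratio. A real-time analyzer was used to measure the long-term sound pressure spectrum of each individual's whispered speech, and each person's long-term spectrum was self-normalized. A mathematical method is then applied to the multiple samples ...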

12.

Initial/final segmentation of whispered speech based on an auditory model (cited 5 times: 0 self-citations, 5 by others)

Ding Hui, Li Xueli, Xu Bailing 《应用声学》2004,23(2):20-25,44

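This paper analyzes the characteristics of whispered speech and, drawing on the basic theory and experimental data of physiological acoustics and psychoacoustics, proposes a method that uses an auditory model to segment whispered speech into initials and finals. The auditory perception model suited to this segmentation mainly comprises four levels: the cochlea's mechanism for decomposing sound by frequency; the time-domain and frequency-domain nonlinear transformations of the auditory system; and the lateral inhibition mechanism of the central nervous system. The model reflects how humans perceive low-energy speech in noisy environments and is therefore suitable for whispered speech recognition. It gave satisfactory results in experiments on initial/final segmentation of whispered speech.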

13.

Small-sample speech emotion recognition using global features and a weak-scale fusion strategy

Huang Yongming, Zhang Guobao, Li Xiong, Da Feipeng 《声学学报》2012,37(3):330-338

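Speech is a short-time stationary time-frequency signal, so most researchers extract emotional features by framing. However, features extracted after framing are local features that cannot accurately reflect the dynamic characteristics of emotional speech, so local features alone often cannot yield a robust emotion recognition system. To address this problem, utterance-level global features are first extracted from the unframed speech signal by multi-scale optimal wavelet packet decomposition. After framing, 384-dimensional utterance-level local features are extracted and reduced in dimension using the Fisher criterion. Finally, a weak-scale fusion strategy is proposed to fuse the two kinds of utterance-level features, and an SVM is used for emotion classification. Experiments on the Berlin emotional speech database show that the proposed method improves the final recognition rate by 4.2% to 13.8% over using utterance-level local features alone, and that the recognition rate fluctuates little, particularly with small samples.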
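The Fisher-criterion dimension reduction step can be illustrated with a per-feature Fisher ratio (between-class scatter divided by within-class scatter), keeping the highest-scoring features. The abstract does not state which form of the criterion was used, so this per-feature ranking, the function names, and the parameter `k` are assumptions.

```python
def fisher_scores(samples, labels):
    """Per-feature Fisher ratio: between-class over within-class scatter."""
    dims = len(samples[0])
    classes = sorted(set(labels))
    scores = []
    for d in range(dims):
        values = [s[d] for s in samples]
        overall = sum(values) / len(values)
        between = 0.0
        within = 0.0
        for c in classes:
            vc = [s[d] for s, lab in zip(samples, labels) if lab == c]
            mean_c = sum(vc) / len(vc)
            between += len(vc) * (mean_c - overall) ** 2
            within += sum((v - mean_c) ** 2 for v in vc)
        scores.append(between / within if within > 0.0 else float("inf"))
    return scores


def select_top(samples, labels, k):
    """Keep the k features with the largest Fisher ratio (illustrative rule)."""
    scores = fisher_scores(samples, labels)
    ranked = sorted(range(len(scores)), key=lambda d: scores[d], reverse=True)
    keep = sorted(ranked[:k])
    return keep, [[s[d] for d in keep] for s in samples]
```

Applied to the 384-dimensional local features, such a ranking would retain the dimensions that best separate the emotion classes before fusion and SVM classification.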

14.

Identification of speaker sex from isolated, whispered vowels (cited 2 times: 0 self-citations, 2 by others)

M F Schwartz, H E Rine 《The Journal of the Acoustical Society of America》1968,44(6):1736-1737
