41.
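Speech Signal Endpoint Detection Method Based on Fisher Linear Discriminant Analysis (Total citations: 1, self-citations: 0, citations by others: 1)
Traditional speech endpoint detection methods separate consonants poorly from background noise, especially unvoiced segments contaminated by noise. To address this, the paper proposes an endpoint detection method that uses Mel-frequency cepstral coefficients based on Fisher linear discriminant analysis (F-MFCC). Unvoiced speech and background noise are treated as a two-class classification problem, and the Fisher criterion is used to find the optimal projection direction carrying discriminative information, so that the projected feature parameters have minimum within-class scatter and maximum between-class scatter, which increases the separability of unvoiced speech from background noise. Experiments on different speech corpora show that F-MFCC improves the accuracy of speech endpoint detection under different signal-to-noise ratios and background noise conditions.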
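The two-class Fisher projection described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's F-MFCC implementation: the function names, the small ridge term, and the synthetic 2-D data standing in for MFCC frames are all assumptions.

```python
import numpy as np

def fisher_direction(class_a, class_b):
    """Return a unit-norm Fisher projection vector for two classes.

    class_a, class_b: arrays of shape (n_frames, n_features), e.g. feature
    frames labeled as unvoiced speech and as background noise.
    """
    a = np.asarray(class_a, dtype=float)
    b = np.asarray(class_b, dtype=float)
    mean_a, mean_b = a.mean(axis=0), b.mean(axis=0)
    # Within-class scatter: sum of the two per-class scatter matrices.
    s_w = (a - mean_a).T @ (a - mean_a) + (b - mean_b).T @ (b - mean_b)
    # Small ridge term (an assumption) keeps S_w safely invertible.
    s_w += 1e-6 * np.eye(s_w.shape[0])
    # The Fisher criterion is maximized by w proportional to S_w^{-1}(m_a - m_b).
    w = np.linalg.solve(s_w, mean_a - mean_b)
    return w / np.linalg.norm(w)

def fisher_ratio(values_a, values_b):
    """Squared mean gap divided by summed variances, for 1-D projections."""
    gap = (values_a.mean() - values_b.mean()) ** 2
    return gap / (values_a.var() + values_b.var())

rng = np.random.default_rng(0)
# Synthetic, strongly correlated 2-D data; NOT real MFCC features.
cov = [[1.0, 0.9], [0.9, 1.0]]
unvoiced = rng.multivariate_normal([0.0, 0.0], cov, size=500)
noise = rng.multivariate_normal([1.0, 0.0], cov, size=500)

w = fisher_direction(unvoiced, noise)
ratio_fisher = fisher_ratio(unvoiced @ w, noise @ w)
ratio_axis0 = fisher_ratio(unvoiced[:, 0], noise[:, 0])
print(ratio_fisher, ratio_axis0)
```

On this toy data the projection onto `w` separates the two classes better than the raw first feature does, which is the effect the abstract attributes to the Fisher criterion.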
42.
43.
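Properly equalizing the headphone-to-eardrum transfer function can effectively improve headphone sound reproduction. The magnitude peaks and notches caused by the filtering effects of the pinna and ear canal aid auditory perception, but magnitude equalization that targets a flat magnitude response cannot preserve them. The paper proposes methods for smoothing the headphone-to-eardrum transfer function based on roex filters and Mel-frequency cepstral coefficients. These methods model the characteristics of human auditory perception and smooth the transfer function, so that the equalized magnitude response retains the corresponding peaks and notches and over-equalization of them is avoided. Experimental results show that magnitude equalization with a smoothed headphone-to-eardrum transfer function significantly improves headphone timbre, and that equalization based on Mel-frequency cepstral coefficient smoothing improves it the most.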
44.
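Speech Recognition Based on a Variance-Normalized Distortion Measure (Total citations: 1, self-citations: 0, citations by others: 1)
In a speech recognition system, different codewords have different discriminative power, and this can be expressed through weights. The paper adopts a variance-normalized distortion measure that introduces this difference in effectiveness as a weight vector in the distortion formula, which can effectively improve the recognition performance of speech recognition systems based on MFCC (Mel-frequency cepstral coefficients).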
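One common way to write a variance-normalized distortion is a squared distance in which each dimension is weighted by the inverse of its variance. The sketch below assumes that form and assumes each codeword carries its own per-dimension variances; the abstract does not give the paper's exact formula, and the codebook values are hypothetical.

```python
def variance_normalized_distortion(x, codeword, variances):
    """Squared distance with each dimension weighted by 1 / variance."""
    return sum((xi - ci) ** 2 / vi for xi, ci, vi in zip(x, codeword, variances))

def nearest_codeword(x, codebook, variances):
    """Index of the codeword with the smallest distortion to x."""
    return min(
        range(len(codebook)),
        key=lambda k: variance_normalized_distortion(x, codebook[k], variances[k]),
    )

# Hypothetical 2-D codebook; variances[k] belongs to codebook[k].
codebook = [[0.0, 0.0], [4.0, 0.0]]
variances = [[1.0, 1.0], [16.0, 1.0]]
frame = [1.5, 0.0]

best = nearest_codeword(frame, codebook, variances)
print(best)
```

With plain squared Euclidean distance the frame would be closer to the first codeword (2.25 versus 6.25), but the second codeword is spread much more widely along the first dimension, so the weighted measure assigns the frame to it instead.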
45.
46.
Dysarthria is a degenerative disorder of the central nervous system that affects the control of articulation and pitch; it therefore affects the uniqueness of the sounds a speaker produces, which makes dysarthric speaker recognition a challenging task. In this paper, a feature-extraction method based on deep belief networks is presented for identifying a speaker who has dysarthria. The effectiveness of the proposed method is demonstrated and compared with that of the well-known Mel-frequency cepstral coefficient features. For classification, a multi-layer perceptron neural network with two structures is proposed. Evaluations on the Universal Access speech database produced promising results and outperformed the baseline methods. In addition, speaker identification under both text-dependent and text-independent conditions is explored. The highest accuracy achieved by the proposed system is 97.3%.
47.
48.
Comparison of Khasi Speech Representations with Different Spectral Features and Hidden Markov States
Bronson Syiem, Sushanta Kabir Dutta, Juwesh Binong, Lairenlakpam Joyprakash Singh 《电子科技学刊:英文版》 2021, 19(2): 155-162
In this paper, we present a comparison of Khasi speech representations using four different spectral features, together with a novel extension towards the development of Khasi speech corpora. The four features are linear predictive coding (LPC), linear prediction cepstrum coefficient (LPCC), perceptual linear prediction (PLP), and Mel frequency cepstral coefficient (MFCC). Ten hours of speech data were used for training and three hours for testing. For each spectral feature, hidden Markov model (HMM) based recognizers were built with varying numbers of HMM states and different Gaussian mixture models (GMMs). Performance was evaluated using the word error rate (WER). The experimental results show that MFCC provides a better representation of Khasi speech than the other three spectral features.
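For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference transcript. The sketch below implements that standard definition; the function name and the example sentences are illustrative and are not taken from the paper or its corpus.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)

# One substitution ("cat" -> "bat") and one deletion ("down") over 4 words.
print(word_error_rate("the cat sat down", "the bat sat"))  # 0.5
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.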
49.
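The paper studies an important preprocessing task in audio information processing: speech/music classification. To address practical problems encountered in speech signal processing, suitable audio features and classifiers are selected to classify audio data as speech or music. A two-stage system is adopted, selecting the Modified Low Energy Ratio (MLER) and Mel-frequency cepstral coefficients (Mel Frequency Cepstral Coef...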
50.