1.
2.
3.
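Distributed speech recognition (DSR) is a technology that has emerged in recent years and has broad application prospects. Drawing on the latest ETSI standards for DSR, this paper describes the components of a DSR system and analyzes its main techniques, including front-end feature extraction algorithms, feature compression with error detection and correction, and server-side speech reconstruction algorithms. It concludes with a brief outlook on applications of DSR.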
4.
5.
6.
7.
8.
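To address bird sound recognition under the various background noises found in real environments, a recognition technique based on a new noise-robust feature extraction method is proposed. First, the noise power spectrum is estimated with a noise estimation algorithm suited to highly non-stationary environments. Second, multi-band spectral subtraction is applied to denoise the sound power spectrum. Next, anti-noise power-normalized cepstral coefficients (APNCC) are extracted from the denoised power spectrum. Finally, a support vector machine (SVM) is used to compare the extracted APNCC with power-normalized cepstral coefficients (PNCC) and Mel-frequency cepstral coefficients (MFCC) on the sounds of 34 bird species under different environments and signal-to-noise ratios (SNRs). The experiments show that APNCC achieves better average recognition performance and stronger noise robustness, and is better suited to bird sound recognition in environments where the SNR is below 30 dB.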
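The spectral subtraction step in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the band layout, the per-band over-subtraction factors, and the spectral floor are all hypothetical parameters chosen for illustration, and the noise power spectrum is assumed to have been estimated beforehand.

```python
from typing import List, Sequence, Tuple


def multiband_spectral_subtraction(
    noisy_power: Sequence[float],
    noise_power: Sequence[float],
    bands: Sequence[Tuple[int, int]],
    over_subtraction: Sequence[float],
    floor: float = 0.002,
) -> List[float]:
    """Subtract a scaled noise estimate from each frequency band of one frame.

    bands holds half-open (start, stop) bin ranges; over_subtraction holds one
    factor per band. Both are illustrative assumptions, not values from the paper.
    """
    if len(noisy_power) != len(noise_power):
        raise ValueError("noisy_power and noise_power must have equal length")
    if len(bands) != len(over_subtraction):
        raise ValueError("need exactly one over-subtraction factor per band")

    cleaned = list(noisy_power)  # bins outside every band are left unchanged
    for (start, stop), alpha in zip(bands, over_subtraction):
        for k in range(start, stop):
            diff = noisy_power[k] - alpha * noise_power[k]
            # Keep a small fraction of the noisy power instead of going negative.
            cleaned[k] = max(diff, floor * noisy_power[k])
    return cleaned
```

Using a larger factor in bands where the noise dominates removes more noise there, while the floor limits the isolated spectral peaks that plain subtraction tends to leave behind.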
9.
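A hybrid speech recognition model is built by combining the dynamic time-series modeling capability of the hidden Markov model (HMM) with the pattern classification capability of neural networks. Because speech signals are non-stationary, wavelet analysis is used to extract the speech feature vectors. A time-normalization method converts all variable-length speech feature vectors into feature vectors of the same dimension, which simplifies the structure of the neural network. Simulation results show that the hybrid model together with time normalization not only improves the recognition rate but also greatly shortens training time, yielding good recognition performance.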
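The abstract does not say which time-normalization method is used. As one possible illustration, the sketch below resamples a variable-length sequence of feature frames to a fixed number of frames by linear interpolation along the time axis; the function name and the choice of linear interpolation are assumptions, not the authors' method.

```python
from typing import List, Sequence


def time_normalize(frames: Sequence[Sequence[float]], target_len: int) -> List[List[float]]:
    """Resample a sequence of feature frames to exactly target_len frames."""
    if not frames:
        raise ValueError("frames must not be empty")
    if target_len < 1:
        raise ValueError("target_len must be at least 1")

    n = len(frames)
    if n == 1 or target_len == 1:
        # Nothing to interpolate between: repeat the first frame.
        return [list(frames[0]) for _ in range(target_len)]

    out = []
    for k in range(target_len):
        # Position of output frame k on the input time axis, in [0, n - 1].
        pos = k * (n - 1) / (target_len - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out
```

After this step every utterance yields `target_len` frames, so the flattened input to a neural network has the same size regardless of how long the utterance was.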
10.
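A Parallel Hidden Markov Model Suited to Speaker-Independent Speech Recognition (total citations: 2, self-citations: 0, citations by others: 2)
For speaker-independent speech recognition, a parallel HMM (Parallel Hidden Markov Model, PHMM) composed of multiple parallel Markov chains is proposed. It fuses the templates built for each class in classification-based speech recognition, which improves recognition performance. Crossings are allowed between chains, so that states are shared among the fused templates. The PHMM also completes clustering automatically during training, and the output for a test utterance is drawn from all classes, so no cluster analysis or class decision is needed. These properties reduce both storage and computation. Experiments on speaker-independent isolated Mandarin digit recognition show that the PHMM improves both recognition performance and noise robustness over the conventional CHMM.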
11.
12.
Statistical Model‐Based Noise Reduction Approach for Car Interior Applications to Speech Recognition
Sung Joo Lee Byung Ok Kang Ho‐Young Jung Yunkeun Lee Hyung Soon Kim 《ETRI Journal》2010,32(5):801-809
This paper presents a statistical model‐based noise suppression approach for voice recognition in a car environment. In order to alleviate the spectral whitening and signal distortion problems of the traditional decision‐directed Wiener filter, we combine a decision‐directed method with an original spectrum reconstruction method and develop a new two‐stage noise reduction filter estimation scheme. When the tradeoff between performance and computational efficiency on resource‐constrained automotive devices is considered, the ETSI standard advanced distributed speech recognition front‐end (ETSI‐AFE) can be an effective solution, and ETSI‐AFE is also based on the decision‐directed Wiener filter. Thus, a series of voice recognition and computational complexity tests are conducted to compare the proposed approach with ETSI‐AFE. The experimental results show that the proposed approach is superior to the conventional method in terms of speech recognition accuracy, while the computational cost and frame latency are significantly reduced.
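For orientation, the textbook decision‐directed Wiener filter that this abstract takes as its starting point can be sketched as follows. This is neither the proposed two‐stage scheme nor the ETSI‐AFE implementation: the function name, the smoothing constant `alpha`, and the lower bound `xi_min` are illustrative assumptions, and the noise power is assumed to be strictly positive in every bin.

```python
from typing import List, Sequence


def decision_directed_wiener_gains(
    noisy_power: Sequence[float],
    noise_power: Sequence[float],
    prev_clean_power: Sequence[float],
    alpha: float = 0.98,
    xi_min: float = 1e-3,
) -> List[float]:
    """Per-bin Wiener gains for one frame from a decision-directed SNR estimate.

    prev_clean_power is the clean-speech power estimated for the previous frame.
    alpha and xi_min are assumed tuning constants, not values from the paper.
    """
    gains = []
    for y2, n2, s2_prev in zip(noisy_power, noise_power, prev_clean_power):
        gamma = y2 / n2  # a posteriori SNR of the current frame
        # A priori SNR: blend the previous frame's estimate with the current one.
        xi = alpha * (s2_prev / n2) + (1.0 - alpha) * max(gamma - 1.0, 0.0)
        xi = max(xi, xi_min)  # keep the gain from collapsing to exactly zero
        gains.append(xi / (1.0 + xi))  # Wiener gain
    return gains
```

Multiplying each noisy spectral amplitude by its gain gives the enhanced spectrum for the frame, and the resulting clean-power estimate feeds the next frame's call as `prev_clean_power`.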
13.
QIN Wei WEI Gang 《中国电子科技》2006,4(1):43-46
As a statistical method, the Hidden Markov Model (HMM) is widely used for speech recognition. To train HMMs more effectively with much less data, the Subspace Distribution Clustering Hidden Markov Model (SDCHMM), derived from the Continuous Density Hidden Markov Model (CDHMM), is introduced. A new method for training SDCHMMs with parameter tying is described. Compared with the conventional training method, an SDCHMM recognizer trained with the new method achieves higher accuracy and speed. Experimental results show that the SDCHMM recognizer outperforms the CDHMM recognizer on speech recognition of Chinese digits.
14.
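Conventional Gaussian mixture models (GMMs) are sensitive to initial values and, in practical training, very easily converge to locally optimal parameters. To address this problem, this paper proposes a new method for optimizing GMM parameters. Niche techniques and maximum likelihood estimation are incorporated into the genetic training process to form a new hybrid algorithm, which alleviates the "premature convergence" of genetic algorithms and improves local search capability. An adaptive strategy controls the crossover and mutation operators, and discriminative information from other users is incorporated into the fitness evaluation, which improves the classification accuracy of the model and strengthens the generalization ability of the GMM. Experiments show that, compared with both the conventional method and an improved method, the proposed method obtains better model parameters and further raises the recognition rate of the system.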
15.
This letter proposes an effective and robust speech feature extraction method based on statistical analysis of Pitch Frequency Distributions (PFD) for speaker identification. Compared with the conventional cepstrum, PFD is relatively insensitive to Additive White Gaussian Noise (AWGN), but it does not perform well for speaker identification, even in clean environments. To compensate for this shortcoming, PFD and the conventional cepstrum are combined to make the final decision, instead of relying on only one kind of feature. Experimental results indicate that the hybrid approach gives a substantial improvement for text-independent speaker identification in noisy environments corrupted by AWGN.
16.
Zhang Linghua Zheng Baoyu Yang Zhen 《电子科学学刊(英文版)》2005,22(4):437-442
This letter proposes an effective and robust speech feature extraction method based on statistical analysis of Pitch Frequency Distributions (PFD) for speaker identification. Compared with the conventional cepstrum, PFD is relatively insensitive to Additive White Gaussian Noise (AWGN), but it does not perform well for speaker identification, even in clean environments. To compensate for this shortcoming, PFD and the conventional cepstrum are combined to make the final decision, instead of relying on only one kind of feature. Experimental results indicate that the hybrid approach gives a substantial improvement for text-independent speaker identification in noisy environments corrupted by AWGN.
17.
Comparison of Khasi Speech Representations with Different Spectral Features and Hidden Markov States
Bronson Syiem Sushanta Kabir Dutta Juwesh Binong Lairenlakpam Joyprakash Singh 《电子科技学刊:英文版》2021,19(2):155-162
In this paper, we present a comparison of Khasi speech representations using four different spectral features, together with a novel extension towards the development of Khasi speech corpora. The four features are linear predictive coding (LPC), linear prediction cepstrum coefficient (LPCC), perceptual linear prediction (PLP), and Mel frequency cepstral coefficient (MFCC). Ten hours of speech data were used for training and three hours for testing. For each spectral feature, different hidden Markov model (HMM) based recognizers were built with varying numbers of HMM states and different Gaussian mixture models (GMMs). Performance was evaluated using the word error rate (WER). The experimental results show that MFCC provides a better representation of Khasi speech than the other three spectral features.
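The word error rate used as the evaluation metric here is conventionally defined as the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. A minimal sketch under that standard definition follows; the function name and the whitespace tokenization are assumptions and not details taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.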
18.
19.
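Moving Object Detection Method Based on a Sliding-Window Gaussian Mixture Model (total citations: 1, self-citations: 0, citations by others: 1)
In complex scenes, the conventional Gaussian mixture model detects moving objects fairly well, but over time its parameters converge slowly and it struggles to adapt to real-time changes in the true background, which increases the false detection rate for moving objects. Exploiting the short-term historical memory of the sliding-window technique, this paper proposes a novel moving object detection method based on a sliding-window Gaussian mixture model. The method remedies the inability of the conventional Gaussian mixture background model to form a new background promptly, improves the completeness of motion detection, and further reduces the sensitivity of the algorithm to illumination changes in the scene. Comparative experiments across multiple scenes show that the method detects moving objects more accurately and completely and adapts better to the environment.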
20.
We consider the feature recombination technique in a multiband approach to speaker identification and verification. To overcome the ineffectiveness of conventional feature recombination in broadband noisy environments, we propose a new subband feature recombination which uses subband likelihoods and a subband reliable‐feature selection technique with an adaptive noise model. In the decision step of speaker recognition, a few very low likelihood scores from unreliable features can cause a speaker recognition system to make an incorrect decision. To overcome this problem, reliable‐feature selection adjusts the likelihood scores of an unreliable feature by comparison with those of an adaptive noise model, which is estimated by the maximum a posteriori adaptation technique using noise features obtained directly from noisy test speech. To evaluate the effectiveness of the proposed methods in noisy environments, we use the TIMIT database and the NTIMIT database, which is the corresponding telephone version of the TIMIT database. The proposed subband feature recombination with subband reliable‐feature selection achieves better performance than the conventional feature recombination system with reliable‐feature selection.