Similar literature
 Found 20 similar articles (search time: 31 ms)
1.
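肖寒春, 郭俊峰, 张丽. 《应用声学》 (Applied Acoustics), 2018, 37(6): 909-915
Mel-frequency cepstral coefficient (MFCC) feature extraction maps an acoustic signal from the linear frequency domain to the mel domain according to the perceptual characteristics of the human ear, and it is widely used in speech recognition. This paper applies MFCCs to extract features from the acoustic signals of small low-altitude aircraft. Because propeller-driven small low-altitude aircraft exhibit stable, strong harmonics, the mel filter bank used in MFCC extraction is modified: the slope of the linear-frequency-to-mel conversion curve at these harmonics is replaced by projection, which increases the filters' sensitivity to the signal at those harmonics. Simulation results show that the improved MFCC method yields a lower equal error rate when extracting features of small low-altitude aircraft and is more robust to noise in low signal-to-noise-ratio environments.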
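
The linear-to-mel conversion curve and its slope, which this abstract refers to, can be sketched as follows. This is a minimal illustration that assumes the commonly used 2595/700 mel formula; the abstract does not say which variant the authors use, and their slope-replacement step is not reproduced here. All function names are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Map a frequency in Hz to the mel scale (assumed 2595/700 variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_slope(f_hz):
    """Analytic slope d(mel)/d(Hz) of the conversion curve at f_hz."""
    return 2595.0 / (math.log(10.0) * (700.0 + f_hz))


def filter_centres_hz(f_low, f_high, n_filters):
    """Centre frequencies (Hz) of filters spaced equally on the mel axis."""
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / (n_filters + 1)
    return [mel_to_hz(m_low + step * (i + 1)) for i in range(n_filters)]
```

The slope falls as frequency rises, so a standard mel filter bank resolves high-frequency harmonics more coarsely than low-frequency ones; that is the property the abstract's modification targets.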

2.
This paper introduces a combinational feature extraction approach to improve speech recognition systems. The main idea is to benefit simultaneously from features obtained from the Poincaré section applied to the speech reconstructed phase space (RPS) and from typical Mel frequency cepstral coefficients (MFCCs), which have a proven role in the speech recognition field. With an appropriate dimension, the reconstructed phase space of a speech signal is assured to be topologically equivalent to the dynamics of the speech production system, and could therefore include information that may be absent in linear analysis approaches. Moreover, complicated systems such as the speech production system can present cyclic and oscillatory patterns, and Poincaré sections could be used as an effective tool in the analysis of such trajectories. In this research, a statistical modeling approach based on Gaussian mixture models (GMMs) is applied to Poincaré sections of the speech RPS. A final pruned feature set is obtained by applying an efficient feature selection approach to the combination of the parameters of the GMM model and the MFCC-based features. A hidden Markov model-based speech recognition system and the TIMIT speech database are used to evaluate the performance of the proposed feature set in isolated and continuous speech recognition experiments. With the proposed feature set, a 5.7% absolute improvement in isolated phoneme recognition is obtained over MFCC-based features alone.

3.
《Optik》2014,125(22):6678-6680
Facial expression recognition plays an important role in a variety of real-world applications such as human–computer interaction, robot control, smart meetings, and visual surveillance. One critical step in facial expression recognition is to extract emotional features accurately. In this article, a facial expression recognition approach based on two-stage local facial texture extraction is proposed. At the first stage, the threshold local binary pattern is used to transform a facial image into a feature image. The most discriminative features are then extracted from the feature image using the block-based center-symmetric local binary pattern. Finally, these features are classified by a support vector machine. Experimental results are provided to show that the proposed approach is effective compared with other similar methods.
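
As a rough illustration of the center-symmetric local binary pattern (CS-LBP) named in this abstract, the sketch below computes a 4-bit code per pixel by comparing the four pairs of opposite neighbours in a 3x3 window, then histograms the codes over one block. The neighbour ordering, the strict `>` comparison, and the default threshold are assumptions; the article's exact two-stage pipeline is not reproduced.

```python
def cs_lbp_code(patch, threshold=0.0):
    """Return the CS-LBP code (0..15) of a 3x3 patch given as nested lists."""
    # Eight neighbours in clockwise order, starting at the top-left corner.
    neighbours = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                  patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for i in range(4):
        # Compare each neighbour with the one diametrically opposite to it.
        if neighbours[i] - neighbours[i + 4] > threshold:
            code |= 1 << i
    return code


def cs_lbp_histogram(block, threshold=0.0):
    """16-bin histogram of CS-LBP codes over the interior pixels of a block."""
    hist = [0] * 16
    for r in range(1, len(block) - 1):
        for c in range(1, len(block[0]) - 1):
            patch = [row[c - 1:c + 2] for row in block[r - 1:r + 2]]
            hist[cs_lbp_code(patch, threshold)] += 1
    return hist
```

In a block-based scheme, the histograms of all blocks would typically be concatenated into one feature vector before classification.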

4.
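Ear recognition method fusing scale-invariant features and geometric features   Cited by: 3 (self-citations: 1, citations by others: 2)
田莹, 苑玮琦. 《光学学报》 (Acta Optica Sinica), 2008, 28(8): 1485-1491
The key to improving the ear recognition rate lies in feature extraction and representation. The scale-invariant feature transform (SIFT) is a local point-feature extraction algorithm that searches for extrema in scale space and extracts feature vectors that are invariant to image scale and rotation and that adapt well to illumination changes and image deformation. SIFT is used to extract structural feature points from outer-ear images to form stable feature descriptors. To overcome the problem that several local descriptors in one image may be similar, a geometric feature of the auricle is incorporated into the SIFT descriptor. Finally, the Euclidean distance between feature vectors is used as the similarity measure between two images for ear recognition. Experiments on an ear image database show that the method not only extracts ear features effectively and achieves a high recognition rate with a small number of features, but is also robust to rigid transformations of the ear image.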
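
The final matching step, Euclidean distance as the similarity measure, can be sketched as a nearest-template search. This is only an illustration: the `gallery` layout (one feature vector per enrolled identity) and the function names are hypothetical, and the SIFT and geometric feature extraction is assumed to have been done already.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(probe, gallery):
    """Return (label, distance) of the gallery vector closest to the probe.

    gallery maps an identity label to its enrolled feature vector.
    """
    best = min(gallery, key=lambda label: euclidean(probe, gallery[label]))
    return best, euclidean(probe, gallery[best])
```

For example, `identify([3, 3], {"a": [0, 0], "b": [3, 4]})` picks `"b"`, whose enrolled vector lies at distance 1 from the probe.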

5.
Lip contour tracking is an integral part of lip reading applications, and fast, accurate lip tracking is an important step in lip reading. This paper uses a novel active contour model for lip tracking and proposes a geometrical feature extraction approach for lip reading. The effects of individual features are compared, and a joint feature model is obtained by combining the weighted decisions obtained from a feature vector of differences in the inner area, height, and width of the lip. An ergodic hidden Markov model (HMM) is used as the classifier. For each digit, the Markov model is tested with 3 states and 5 states. Videos of the English digits 0 to 9 were recorded for the recognition test. The CUAVE database is used for comparison along with an in-house database. When computing the feature vectors, only significant frames are used, to reduce computational complexity. Experimental results on digit utterances are given to show that the most reliably recognized digits can be used for important programming commands of computerized numerical control machines.

6.
7.
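Feature extraction and classification are the key to terahertz spectral recognition. Some substances have no distinct absorption peaks in the terahertz band, which makes it difficult to define and extract features manually and to classify them. To address this, a terahertz spectral recognition method based on a deep belief network (DBN) is proposed that combines the strengths of the DBN and the K-Nearest Neighbors (KNN) classifier. First, the terahertz transmission spectra of ATP, acetylcholine_bromide, bifenthrin, buprofezin, carbazole, bleomycin, buckminster, and cyclotriphosphazene in the 0.9 to 6 THz range are normalized using Savitzky-Golay (S-G) filtering and cubic spline interpolation. A DBN model is then built from two layers of restricted Boltzmann machines (RBMs) and trained layer by layer without supervision to extract terahertz spectral features automatically. Finally, a KNN classifier is used to classify the terahertz transmission spectra of the eight substances. The results show that, with the spectral features extracted automatically by the DBN, the KNN classifier, BP neural network, SOM neural network, and RBF neural network all reach classification accuracies above 90%, and the KNN classifier achieves a higher recognition rate than the other three classifiers. Extracting terahertz spectral features automatically with a DBN greatly reduces the workload and has broad application prospects for recognition in massive spectral data sets.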

8.
Knowledge-based speech recognition systems extract acoustic cues from the signal to identify speech characteristics. For channel-deteriorated telephone speech, acoustic cues, especially those for stop consonant place, are expected to be degraded or absent. To investigate the use of knowledge-based methods in degraded environments, feature extrapolation of acoustic-phonetic features based on Gaussian mixture models is examined. This process is applied to a stop place detection module that uses burst release and vowel onset cues for consonant-vowel tokens of English. Results show that classification performance is enhanced in telephone channel-degraded speech, with extrapolated acoustic-phonetic features reaching or exceeding performance using estimated Mel-frequency cepstral coefficients (MFCCs). Results also show acoustic-phonetic features may be combined with MFCCs for best performance, suggesting these features provide information complementary to MFCCs.

9.
Based on the practical needs of speech application research such as speech recognition and voiceprint recognition, the acoustic characteristics and recognition of the Hotan dialect were studied for the first time. First, Hotan dialect speech was selected for manual multi-level annotation, and the formants, duration, and intensity of the vowels were analyzed to describe statistically the main patterns of the Hotan dialect and the pronunciation characteristics of male and female speakers. Then, analysis of variance and nonparametric analysis were used to test the formant samples of the three dialects of the Uyghur language; the results show significant differences in the formant distribution patterns of male vowels, female vowels, and all vowels across the three dialects. Finally, GMM-UBM (Gaussian Mixture Model-Universal Background Model), DNN-UBM (Deep Neural Network-Universal Background Model), and LSTM-UBM (Long Short-Term Memory Network-Universal Background Model) Uyghur dialect recognition models were constructed. Using Mel-frequency cepstral coefficients, alone and combined with formant frequencies, as input features, a comparative experiment on the distinctiveness of dialect i-vectors was carried out. The experimental results show that the features combined with formant frequencies improve dialect recognition, and that the LSTM-UBM model extracts more discriminative dialect i-vectors than GMM-UBM and DNN-UBM.

10.
11.
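To address the reduced robustness of missing-data feature methods in speaker recognition at low signal-to-noise ratios (SNRs), a missing-data feature extraction method based on perceptual auditory scene analysis is proposed. First, the missing-data feature spectrum of the speech is computed, and the perceptual speech content is derived from the perceptual characteristics of speech. After perceptual speech enhancement of the noisy speech and two-dimensional enhancement of its spectrogram, the distribution of the speech is obtained; a perceptual auditory factor is then extracted by combining the perceptual speech content with a missing-intensity parameter. Together with the missing-data feature spectrum, the feature extraction process is decomposed into different auditory scenes that are analyzed and processed separately, which enhances the robustness of the speaker recognition system. Experimental results show that, in low-SNR environments from -10 dB to 10 dB and for four different types of noise, the proposed method is more robust than all five comparison methods, with average recognition rate improvements of 26.0%, 19.6%, 12.7%, 4.6%, and 6.5%, respectively. The proposed method searches for robust speech features in the time-frequency domain and is better suited to speaker recognition in low-SNR environments.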

12.
In an attempt to increase the robustness of automatic speech recognition (ASR) systems, a feature extraction scheme is proposed that takes spectro-temporal modulation frequencies (MF) into account. This physiologically inspired approach uses a two-dimensional filter bank based on Gabor filters, which limits the redundant information between feature components, and also results in physically interpretable features. Robustness against extrinsic variation (different types of additive noise) and intrinsic variability (arising from changes in speaking rate, effort, and style) is quantified in a series of recognition experiments. The results are compared to reference ASR systems using Mel-frequency cepstral coefficients (MFCCs), MFCCs with cepstral mean subtraction (CMS) and RASTA-PLP features, respectively. Gabor features are shown to be more robust against extrinsic variation than the baseline systems without CMS, with relative improvements of 28% and 16% for two training conditions (using only clean training samples or a mixture of noisy and clean utterances, respectively). When used in a state-of-the-art system, improvements of 14% are observed when spectro-temporal features are concatenated with MFCCs, indicating the complementarity of those feature types. An analysis of the importance of specific MF shows that temporal MF up to 25 Hz and spectral MF up to 0.25 cycles/channel are beneficial for ASR.
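
Cepstral mean subtraction (CMS), used by one of the baselines in this abstract, removes the per-coefficient mean over an utterance. The sketch below shows only that generic operation under an assumed frames-by-coefficients list layout; it is not the configuration used in the study, and the function name is hypothetical.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the utterance-level mean from each cepstral coefficient.

    frames is a list of equal-length lists, one list of coefficients per frame.
    """
    if not frames:
        return []
    n_frames = len(frames)
    dim = len(frames[0])
    # Mean of each coefficient across all frames of the utterance.
    means = [sum(frame[d] for frame in frames) / n_frames for d in range(dim)]
    return [[frame[d] - means[d] for d in range(dim)] for frame in frames]
```

Because a stationary convolutional channel adds a constant offset to every frame in the cepstral domain, subtracting the mean leaves the features unchanged by that offset, which is why CMS is a common robustness baseline.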

13.
In the work described here, the backpropagation neural network learning procedure is applied to the analysis and recognition of speech. This procedure takes a set of input/output pattern pairs and attempts to learn their functional relationship; it develops the necessary representational features during the course of learning. A series of computer simulation studies was carried out to assess the ability of these networks to accurately label sounds, to learn to recognize sounds without labels, and to learn feature representations of continuous speech. These studies demonstrated that the networks can learn to label presegmented test tokens with accuracies of up to 95%. Networks trained on segmented sounds using a strategy that requires no external labels were able to recognize and delineate sounds in continuous speech. These networks developed rich internal representations that included units which corresponded to such traditional distinctions as vowels and consonants, as well as units that were sensitive to novel and nonstandard features. Networks trained on a large corpus of unsegmented, continuous speech without labels also developed interesting feature representations, which may be useful in both segmentation and label learning. The results of these studies, while preliminary, demonstrate that backpropagation learning can be used with complex, natural data to identify a feature structure that can serve as the basis for both analysis and nontrivial pattern recognition.

14.
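In qualitative near-infrared (NIR) spectral analysis, the recognition performance of an existing model is not stable enough when new varieties are added for modeling. To address this, a feature extraction method is proposed that adds historical spectral data of the same kind of material to the modeling samples. First, the NIR spectra of the modeling samples are collected, and historical NIR spectra of samples of the same material are added. All NIR spectra are then preprocessed. Next, partial least squares (PLS) feature extraction is applied to all sample data to obtain a PLS space, and only the modeling sample data are projected onto this space. Finally, orthogonal linear discriminant analysis (OLDA) feature extraction is applied to the projected modeling data. Using NIR spectra of maize seeds as the object of study, features were extracted both with and without historical NIR spectra added to the modeling data, and models were built with the biomimetic pattern recognition (BPR) method for validation. The experimental results show that, compared with the method without historical spectra, building the PLS space with historical NIR spectra has two advantages: first, the recognition rate of the original varieties remains essentially unchanged when the number of varieties in the modeling set increases; second, at the same PLS dimensionality, the resulting models perform essentially the same on test sets collected at different times. This demonstrates that the method improves model robustness. In practice, the feature extraction dimensionality can therefore be set to a fixed value in variety identification software, sparing users the trouble of reselecting the optimal PLS parameters to maintain the best recognition performance whenever varieties are added to the modeling set.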

15.
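Many woods in the genus Pterocarpus are valuable, and different species closely resemble one another. Traditional wood identification relies mainly on wood anatomy, determining the species by observing the structural features of wood sections. Although such methods are accurate, the identification procedure is complicated and technically demanding. In contrast, species identification methods that use image or spectral information involve a simpler procedure but often perform poorly on similar species within the same genus. A wood species identification method is proposed that fuses the spectral features and texture features of wood sections; it is simple to carry out, highly automated, and accurate. First, image and spectral information of the wood sections is collected with a digital camera and a spectrometer. Texture and spectral feature extraction methods are then used to obtain the feature vectors of the two feature types. Next, the two feature vectors are fused with a feature-level fusion method based on canonical correlation analysis. Finally, a support vector machine classifies the fused feature vectors. To verify the method, three sections of each of five Pterocarpus species common on the market were used as the objects of study. The experimental results show that the highest recognition accuracy is 80.00% using texture features alone, 94.40% using spectral features alone, and 99.20% using the fused features. The five species were also mixed with 30 other wood species, giving 1750 wood samples in total. Further experiments show that the method can identify the 35 wood species, including those of Pterocarpus, with an accuracy of 98.29%. In summary, the texture and spectral features of wood complement each other effectively and thereby further improve recognition accuracy. Finally, the proposed method was compared with current mainstream methods and was found to outperform them.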

16.
A new feature extraction model, generalized perceptual linear prediction (gPLP), is developed to calculate a set of perceptually relevant features for digital signal analysis of animal vocalizations. The gPLP model is a generalized adaptation of the perceptual linear prediction model, popular in human speech processing, which incorporates perceptual information such as frequency warping and equal loudness normalization into the feature extraction process. Since such perceptual information is available for a number of animal species, this new approach integrates that information into a generalized model to extract perceptually relevant features for a particular species. To illustrate, qualitative and quantitative comparisons are made between the species-specific gPLP model and the original PLP model using a set of vocalizations collected from captive African elephants (Loxodonta africana) and wild beluga whales (Delphinapterus leucas). The models that incorporate perceptual information outperform the original human-based models in both visualization and classification tasks.

17.
This letter points out that, although in the audio signal domain low-pass filtering has been used to prevent aliasing noise from entering the baseband of speech signals, an antialias process in the speech feature domain is still needed to prevent high modulation frequency components from entering the baseband of speech features. The existence of aliasing noise in speech features is revealed via spectral analysis of speech feature streams. A method for suppressing such aliasing noise is proposed. Experiments on large vocabulary speech recognition show that antialias processing of speech features can improve speech recognition, especially for noisy speech.

18.
In this work, a simple technique is presented for obtaining information about the species of wood samples using stress-wave sounds in the audible range. However, the spectra of wood sounds generated by pendulum impact are very complex, and feature extraction for classification purposes is very difficult. Polyspectral techniques have been applied successfully to several problems, from radar pattern recognition to medical signal processing. Following this approach, the sounds of four different impacts are convolved. This makes it possible to extract clear polyspectral features with minimal intersample variability that are suitable for wood species recognition, with possible applications to both human-assisted and automatic wood identification systems. Results indicate that, with this technique, the two most intense polyspectral peaks alone are enough for species recognition.

19.
A robust feature extraction technique for phoneme recognition is proposed which is based on deriving modulation frequency components from the speech signal. The modulation frequency components are computed from syllable-length segments of sub-band temporal envelopes estimated using frequency domain linear prediction. Although the baseline features provide good performance in clean conditions, the performance degrades significantly in noisy conditions. In this paper, a technique for noise compensation is proposed where an estimate of the noise envelope is subtracted from the noisy speech envelope. The noise compensation technique suppresses the effect of additive noise in speech. The robustness of the proposed features is further enhanced by the gain normalization technique. The normalized temporal envelopes are compressed with static (logarithmic) and dynamic (adaptive loops) compression and are converted into modulation frequency features. These features are used in an automatic phoneme recognition task. Experiments are performed in mismatched train/test conditions where the test data are corrupted with various environmental distortions like telephone channel noise, additive noise, and room reverberation. Experiments are also performed on large amounts of real conversational telephone speech. In these experiments, the proposed features show substantial improvements in phoneme recognition rates compared to other speech analysis techniques. Furthermore, the contribution of various processing stages for robust speech signal representation is analyzed.

20.
A probabilistic framework for a landmark-based approach to speech recognition is presented for obtaining multiple landmark sequences in continuous speech. The landmark detection module uses as input acoustic parameters (APs) that capture the acoustic correlates of some of the manner-based phonetic features. The landmarks include stop bursts, vowel onsets, syllabic peaks and dips, fricative onsets and offsets, and sonorant consonant onsets and offsets. Binary classifiers of the manner phonetic features (syllabic, sonorant, and continuant) are used for probabilistic detection of these landmarks. The probabilistic framework exploits two properties of the acoustic cues of phonetic features: (1) sufficiency of the acoustic cues of a phonetic feature for a probabilistic decision on that feature and (2) invariance of the acoustic cues of a phonetic feature with respect to other phonetic features. Probabilistic landmark sequences are constrained using manner class pronunciation models for isolated word recognition with a known vocabulary. The performance of the system is compared with (1) the same probabilistic system but with mel-frequency cepstral coefficients (MFCCs), (2) a hidden Markov model (HMM) based system using APs, and (3) an HMM based system using MFCCs.
