Similar Literature
1.
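This paper proposes normalizing the parameters of different speakers with speaker-specific perceptual warping factors during parameter extraction, thereby achieving speaker normalization in speaker-independent speech recognition. The perceptual warping factor is estimated from the subglottal resonance frequency produced by the coupling between the supraglottal and subglottal systems. Compared with methods that use the third formant of the vocal tract as the reference frequency, it filters out more of the influence of the linguistic content and better reflects the individual characteristics of the speaker. Perceptual minimum variance distortionless response (PMVDR) parameters, which are more robust to noise than Mel cepstral parameters, are extracted as recognition features, and the classical hidden Markov model (HMM) is used as the speech model. Experiments show that, compared with conventional speech recognition parameters and with spectral warping based on the third formant of the vocal tract, the proposed method reduces the word error rate by 4% and 3%, respectively, for clean speech and by 9% and 5%, respectively, in noisy environments, effectively improving the performance of speaker-independent speech recognition systems.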

2.
To reduce the degradation in speech recognition caused by the varied characteristics of different speakers, a method of perceptual frequency warping based on subglottal resonances is proposed for speaker normalization. The warping factor is extracted from the second subglottal resonance using the acoustic coupling between the subglottis and the vocal tract. The second subglottal resonance is independent of the speech content and reflects speaker characteristics better than the third formant. The perceptual minimum variance distortionless response (PMVDR) coefficients, which are more robust and have better anti-noise capability than MFCC, are normalized. The normalized coefficients are used in speech-model training and speech recognition. Experiments show that the word error rate, as compared with MFCC and with spectrum warping by the third formant, decreases by 4% and 3%, respectively, in clean speech recognition and by 9% and 5%, respectively, in a noisy environment. The results indicate that the proposed method can improve the word recognition accuracy of a speaker-independent recognition system.

3.
OBJECTIVE: To assess whether magnetic resonance imaging (MRI) allows the vocal tract (VT) area function to be determined for a normal male speaker. METHOD: VT shapes were acquired using MRI during sustained production of the French point vowels /i/, /a/, and /u/. Cross-sectional areas were measured from a series of planes spaced at intervals of 1 cm along the length of the VT and were used as input to a previously described VT model to simulate the vowels. The first three formant frequencies, F1, F2, and F3, computed from the MRI-measured VT model were compared with the subject's natural formant frequencies. RESULTS: Including piriform sinuses, calculated formants differed from measured formants F1, F2, and F3, respectively, for /i/ by -3.5%, +7.7%, and +27.5%; for /a/ by +11%, +19.5%, and -4.3%; and for /u/ by +0.9%, +23.4%, and +9.6%. Excluding piriform sinuses, calculated formants differed from measured formants F1, F2, and F3, respectively, for /i/ by -3.5%, +12%, and +28%, and for /u/ by +10.1%, +26.8%, and +13.7%. The piriform sinuses were not visualized for /a/ on MRI. CONCLUSIONS: MRI is a noninvasive technique that allows VT imaging and determination of the VT area function for a normal male speaker. Several possible sources of discrepancies are as follows: variability of the articulation, difficulties in assessment of VT wall boundaries, role of the piriform sinuses, and VT length.

4.
A comparison has been made of the transition properties of six types of speech synthesizer parameters: serial resonance, prediction coefficients, reflection coefficients, area functions, parallel resonance, and, finally, a simple set of articulatory parameters. The first four synthesizers are formally equivalent and can be made to produce identical steady-state sounds (targets). The last two involve approximations, but achieve similar targets. Formant paths between targets will differ according to the parameter type used during interpolation. Each type was tested on nonsense words spanning a wide range of parameter values. Linear interpolation of synthesizer parameters was used to determine a path between target values. The resultant data were then converted to formant values and plotted as a spectrographic (frequency versus time) representation. Small differences in formant frequency (versus linear transitions of formant frequency and bandwidth) were common, and some quite large differences in formant bandwidths were observed in certain cases.
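As a rough illustration of why the path between two targets depends on the parameter type, the sketch below interpolates a hypothetical two-section tube either in the area domain or in the reflection-coefficient domain. The tube areas, the function names, and the sign convention k = (A2 - A1) / (A2 + A1) are assumptions made for this example and are not taken from the abstract.

```python
def reflection_coefficient(a1, a2):
    """Reflection coefficient at the junction of two tube sections.

    Assumed sign convention: k = (a2 - a1) / (a2 + a1).
    """
    return (a2 - a1) / (a2 + a1)


def lerp(start, end, t):
    """Linear interpolation between start and end for t in [0, 1]."""
    return start + (end - start) * t


def midpoint_paths(areas_a, areas_b):
    """Junction reflection coefficient halfway between two targets,
    obtained by interpolating areas and by interpolating k directly."""
    # Path 1: interpolate each section area, then convert to k.
    mid_areas = [lerp(a, b, 0.5) for a, b in zip(areas_a, areas_b)]
    k_via_area = reflection_coefficient(*mid_areas)
    # Path 2: convert each target to k, then interpolate k.
    k_a = reflection_coefficient(*areas_a)
    k_b = reflection_coefficient(*areas_b)
    k_direct = lerp(k_a, k_b, 0.5)
    return k_via_area, k_direct


# Hypothetical targets (section areas in cm^2), chosen only for illustration.
target_a = (1.0, 4.0)
target_b = (1.0, 0.25)
k_via_area, k_direct = midpoint_paths(target_a, target_b)
```

Both paths start and end at the same targets, yet the midpoint junction coefficient is about 0.36 when areas are interpolated and about 0 when reflection coefficients are interpolated, so the intermediate tube shapes (and hence the formant paths) would differ.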

5.
6.
Since its introduction, the Sundberg model of the laryngeal system as the resonance source of the singer's formant has gained wide acceptance. However, no studies directly testing this hypothesis in vivo have previously been reported. Thus, the present study was undertaken to test this hypothesis on three classically trained professional male singers. The vocal behaviors of the singer-subjects were evaluated during modal and pulse register phonation via magnetic resonance imaging, strobolaryngoscopy, and acoustic analysis. Results indicated the subjects did not achieve the laryngopharyngeal/laryngeal outlet cross-sectional area ratio requisite to the model and that the formant remained robust in pulse register phonation. It was concluded that these subjects' behaviors were not consistent with Sundberg's model and that the model was inadequate to account for the generation of the singer's formant in these three subjects.

7.
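Whispered speaker identification based on multiband demodulation analysis and instantaneous frequency estimation   Total citations: 4, self-citations: 0, citations by others: 4
Wang Min (王敏), Zhao Heming (赵鹤鸣). Acta Acustica (《声学学报》), 2010, 35(4): 471-476
To improve the robustness of whispered-speech speaker identification, a whispered-speech feature based on the amplitude modulation-frequency modulation (AM-FM) model, the instantaneous frequency estimate (IFE), is proposed. Following the formant modulation theory of speech production, multiband demodulation analysis (MDA) is used to obtain the instantaneous envelope and frequency of the speech signal; a weighted estimate based on the envelope amplitude and frequency then yields the IFE feature, which describes the frequency structure of the speech. The feature is applied to whispered speaker identification and compared with conventional Mel-frequency cepstral coefficients (MFCC). Experimental results show that, as the number of test speakers increases, IFE performs slightly better than MFCC, and that when the test channel changes, IFE is markedly more robust than MFCC.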

8.
The purpose of this study was to investigate the relation between vocal tract deformation patterns obtained from statistical analyses of a set of area functions representative of a vowel repertoire, and the acoustic properties of a neutral vocal tract shape. Acoustic sensitivity functions were calculated for a mean area function based on seven different speakers. Specific linear combinations of the sensitivity functions corresponding to the first two formant frequencies were shown to possess essentially the same amplitude variation along the vocal tract length as the statistically derived deformation patterns reported in previous studies.

9.
The acoustic effects of the laryngeal cavity on the vocal tract resonance were investigated by using vocal tract area functions for the five Japanese vowels obtained from an adult male speaker. Transfer functions were examined with the laryngeal cavity eliminated from the whole vocal tract, volume velocity distribution patterns were calculated, and susceptance matching analysis was performed between the laryngeal cavity and the vocal tract excluding the laryngeal cavity (vocal tract proper). It was revealed that the laryngeal cavity generates one of the formants of the vocal tract, which is the fourth in the present study. At this formant, the resonance of the laryngeal cavity (the 1/4 wavelength resonance) induces the open-tube resonance of the vocal tract proper (the 3/2 wavelength resonance). At the other formants, on the other hand, the vocal tract proper acts as a closed tube, because the laryngeal cavity has only a small contribution to generating these formants and the effective closed end of the whole vocal tract is the junction between the laryngeal cavity and the vocal tract proper.
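The two resonance types named above can be made concrete with the textbook formulas for ideal, lossless uniform tubes. The sketch below is only an idealization: the speed of sound and both tube lengths are hypothetical values chosen so that the two resonances coincide, and the abstract itself does not report any dimensions.

```python
SPEED_OF_SOUND_M_S = 350.0  # assumed value for warm, humid air


def quarter_wave_resonance(length_m, c=SPEED_OF_SOUND_M_S):
    """Lowest resonance of a uniform tube closed at one end: f = c / (4 L)."""
    return c / (4.0 * length_m)


def open_tube_resonance(length_m, n, c=SPEED_OF_SOUND_M_S):
    """n-th resonance of a uniform tube open at both ends: f = n c / (2 L).

    n = 3 is the mode with 3/2 wavelengths along the tube.
    """
    return n * c / (2.0 * length_m)


# Hypothetical lengths, not taken from the abstract.
laryngeal_cavity_m = 0.025  # 2.5 cm
tract_proper_m = 0.15       # 15 cm

laryngeal_hz = quarter_wave_resonance(laryngeal_cavity_m)
tract_proper_hz = open_tube_resonance(tract_proper_m, n=3)
```

With these assumed lengths both values come out at roughly 3500 Hz, which illustrates how a short cavity's quarter-wavelength resonance can line up with the 3/2-wavelength mode of a much longer tube.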

10.
A technique for modifying vocal tract area functions is developed by using sum and difference combinations of acoustic sensitivity functions to perturb an initial vocal tract configuration. First, sensitivity functions [e.g., Fant and Pauli, Proc. Speech Comm. Sem. 74, 1975] are calculated for a given area function, at its specific formant frequencies. The sensitivity functions are then multiplied by scaling coefficients that are determined from the difference between a desired set of formant frequencies and those supported by the current area function. The scaled sensitivity functions are then summed together to generate a perturbation of the area function. This produces a new area function whose associated formant frequencies are closer to the desired values than the previous one. This process is repeated iteratively until the coefficients are equal to zero or are below a threshold value.

11.
Key voice features, namely fundamental frequency (F0) and formant frequencies, can vary extensively between individuals. Much of the variation can be traced to differences in the size of the larynx and vocal-tract cavities, but whether these differences in turn simply reflect differences in speaker body size (i.e., neutral vocal allometry) remains unclear. Quantitative analyses were therefore undertaken to test the relationship between speaker body size and voice F0 and formant frequencies for human vowels. To test the taxonomic generality of the relationships, the same analyses were conducted on the vowel-like grunts of baboons, whose phylogenetic proximity to humans and similar vocal production biology and voice acoustic patterns recommend them for such comparative research. For adults of both species, males were larger than females and had lower mean voice F0 and formant frequencies. However, beyond this, F0 variation did not track body-size variation between the sexes in either species, nor within sexes in humans. In humans, formant variation correlated significantly with speaker height but only in males and not in females. Implications for general vocal allometry are discussed as are implications for speech origins theories, and challenges to them, related to laryngeal position and vocal tract length.

12.
Three-dimensional vocal tract shapes and consequent area functions representing the vowels [i, ae, a, u] have been obtained from one male and one female speaker using magnetic resonance imaging (MRI). The two speakers were trained vocal performers and both were adept at manipulation of vocal tract shape to alter voice quality. Each vowel was performed three times, each with one of the three voice qualities: normal, yawny, and twangy. The purpose of the study was to determine some ways in which the vocal tract shape can be manipulated to alter voice quality while retaining a desired phonetic quality. To summarize any overall tract shaping tendencies, mean area functions were subsequently computed across the four vowels produced within each specific voice quality. Relative to normal speech, both the vowel area functions and mean area functions showed, in general, that the oral cavity is widened and tract length increased for the yawny productions. The twangy vowels were characterized by shortened tract length, widened lip opening, and a slightly constricted oral cavity. The resulting acoustic characteristics of these articulatory alterations consisted of the first two formants (F1 and F2) being close together for all yawny vowels and far apart for all the twangy vowels.

13.
Studying acoustic characteristics is important for speech and speaker recognition in Chinese whispered speech. This paper introduces the characteristics of whispered speech and discusses the acoustic characteristics of Chinese whispered speech. Because whispered speech has no fundamental frequency, other characteristics, such as duration and formant frequency, are extracted and analyzed. Experiments with six simple Chinese whispered vowels show that duration and formant frequency can serve as the main acoustic characteristics for Chinese whispered speech recognition.

14.
Over the last few decades, researchers have been investigating the mechanisms involved in speech production. Image analysis can be a valuable aid in the understanding of the morphology of the vocal tract. The application of magnetic resonance imaging to study these mechanisms has been proven to be reliable and safe. We have applied deformable models in magnetic resonance images to conduct an automatic study of the vocal tract; mainly, to evaluate the shape of the vocal tract in the articulation of some European Portuguese sounds, and then to successfully automatically segment the vocal tract's shape in new images. Thus, a point distribution model has been built from a set of magnetic resonance images acquired during artificially sustained articulations of 21 sounds, which successfully extracts the main characteristics of the movements of the vocal tract. The combination of that statistical shape model with the gray levels of its points is subsequently used to build active shape models and active appearance models. Those models have then been used to segment the modeled vocal tract into new images in a successful and automatic manner. The computational models have thus been revealed to be useful for the specific area of speech simulation and rehabilitation, namely to simulate and recognize the compensatory movements of the articulators during speech production.

15.
This paper announces the availability of the magnetic resonance imaging (MRI) subset of the mngu0 corpus, a collection of articulatory speech data from one speaker containing different modalities. This subset comprises volumetric MRI scans of the speaker's vocal tract during sustained production of vowels and consonants, as well as dynamic mid-sagittal scans of repetitive consonant-vowel (CV) syllable production. For reference, high-quality acoustic recordings of the speech material are also available. The raw data are made freely available for research purposes.

16.
Five premier male country singers involved in our previous studies spoke and sang the words of both the national anthem and a country song of their choice. Long-term-average spectra were made of the spoken and sung material of each singer. The spectral characteristics of country singers' speech and singing were similar. A prominent peak in the upper part of the spectrum, previously described as the "speaker's formant," was found in the country singers' speech and singing. The singer's formant, a strong spectral peak near 2.8 kHz, an important part of the spectrum of classically trained singers, was not found in the spectra of the country singers. The results support the conclusion that the resonance characteristics in speech and singing are similar in country singing and that country singing is not characterized by a singer's formant.

17.
The effect of speaking rate variations on second formant (F2) trajectories was investigated for a continuum of rates. F2 trajectories for the schwa preceding a voiced bilabial stop, and one of three target vocalic nuclei following the stop, were generated for utterances of the form "Put a bV here," where V was /i/, /ae/, or /oI/. Discrete spectral measures at the vowel-consonant and consonant-vowel interfaces, as well as vowel target values, were examined as potential parameters of rate variation; several different whole-trajectory analyses were also explored. Results suggested that a discrete measure at the vowel-consonant (schwa-consonant) interface, the F2off value, was in many cases a good index of rate variation, provided the rates were not unusually slow (vowel durations less than 200 ms). The spectral measure at the consonant-vowel interface, F2 onset, as well as the "target" for this vowel, was less clearly related to rate variation. Whole-trajectory analyses indicated that the rate effect cannot be captured by linear compressions and expansions of some prototype trajectory. Moreover, the effect of rate manipulation on formant trajectories interacts with speaker and vocalic nucleus type, making it difficult to specify general rules for these effects. However, there is evidence that a small number of speaker strategies may emerge from a careful qualitative and quantitative analysis of whole formant trajectories. Results are discussed in terms of models of speech production and a group of speech disorders that is usually associated with anomalies of speaking rate, and hence of formant frequency trajectories.

18.
The purpose of this study was to investigate the spatial similarity of vocal tract shaping patterns across speakers and the similarity of their acoustic effects. Vocal tract area functions for 11 American English vowels were obtained from six speakers, three female and three male, using magnetic resonance imaging (MRI). Each speaker's set of area functions was then decomposed into mean area vectors and representative modes (eigenvectors) using principal components analysis (PCA). Three modes accounted for more than 90% of the variance in the original data sets for each speaker. The general shapes of the first two modes were found to be highly correlated across all six speakers. To demonstrate the acoustic effects of each mode, both in isolation and combined, a mapping between the mode scaling coefficients and [F1, F2] pairs was generated for each speaker. The mappings were unique for all six speakers in terms of the exact shape of the [F1, F2] vowel space, but the general effect of the modes was the same in each case. The results support the idea that the modes provide a common system for perturbing a unique underlying neutral vocal tract shape.

19.
Auditory feedback influences human speech production, as demonstrated by studies using rapid pitch and loudness changes. Feedback has also been investigated using the gradual manipulation of formants in adaptation studies with whispered speech. In the work reported here, the first formant of steady-state isolated vowels was unexpectedly altered within trials for voiced speech. This was achieved using a real-time formant tracking and filtering system developed for this purpose. The first formant of vowel /epsilon/ was manipulated 100% toward either /ae/ or /I/, and participants responded by altering their production with average F1 compensation as large as 16.3% and 10.6% of the applied formant shift, respectively. Compensation was estimated to begin <460 ms after stimulus onset. The rapid formant compensations found here suggest that auditory feedback control is similar for both F0 and formants.

20.
An extensive developmental acoustic study of the speech patterns of children and adults was reported by Lee and colleagues [Lee et al., J. Acoust. Soc. Am. 105, 1455-1468 (1999)]. This paper presents a reexamination of selected fundamental frequency and formant frequency data presented in their report for ten monophthongs by investigating sex-specific and developmental patterns using two different approaches. The first of these includes the investigation of age- and sex-specific formant frequency patterns in the monophthongs. The second is the investigation of fundamental frequency and formant frequency data using the critical band rate (bark) scale and a number of acoustic-phonetic dimensions of the monophthongs from an age- and sex-specific perspective. These acoustic-phonetic dimensions include: vowel spaces and distances from speaker centroids; frequency differences between the formant frequencies of males and females; vowel openness/closeness and frontness/backness; the degree of vocal effort; and formant frequency ranges. Both approaches reveal age- and sex-specific development patterns which also appear to be dependent on whether vowels are peripheral or nonperipheral. The developmental emergence of these sex-specific differences is discussed with reference to anatomical, physiological, sociophonetic, and culturally determined factors. Some directions for further investigation into the age-linked sex differences in speech across the lifespan are also proposed.
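For readers unfamiliar with the critical band rate scale, the sketch below converts frequencies in hertz to bark. The abstract does not say which conversion the authors used; this example assumes Traunmüller's (1990) approximation, without its end corrections for very low and very high frequencies, and the formant values are hypothetical.

```python
def hz_to_bark(freq_hz):
    """Approximate critical band rate in bark.

    Assumed formula (Traunmueller, 1990): z = 26.81 f / (1960 + f) - 0.53.
    The end corrections of the original formula are omitted here.
    """
    return 26.81 * freq_hz / (1960.0 + freq_hz) - 0.53


def bark_difference(f_low_hz, f_high_hz):
    """Distance in bark between two frequencies, e.g. F2 - F1."""
    return hz_to_bark(f_high_hz) - hz_to_bark(f_low_hz)


# Hypothetical formant values in Hz, chosen only for illustration.
f1_hz, f2_hz = 500.0, 1500.0
f2_minus_f1_bark = bark_difference(f1_hz, f2_hz)
```

Differences such as F2 - F1 expressed in bark are one common way to quantify acoustic-phonetic dimensions like frontness/backness on a perceptually motivated scale; whether the paper uses exactly this measure is not stated in the abstract.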
