Similar Documents
20 similar documents retrieved (search time: 15 ms)
1.
2.
Questions exist as to the intelligibility of vowels sung at extremely high fundamental frequencies, especially when the fundamental frequency (F0) produced lies above the region where the first vowel formant (F1) would normally occur. Can such vowels be correctly identified and, if so, does context provide the necessary information or are acoustical elements also operative? To this end, 18 professional singers (5 males and 13 females) were recorded singing 3 isolated vowels at high and low pitches, at both loud and soft levels. Aural-perceptual studies employing four types of auditors were carried out to determine the identity of these vowels and the nature of their confusions with other vowels. Subsequent acoustical analysis focused on the actual fundamental frequencies sung, plus the frequencies of the first two vowel formants. F0 change was found to have a profound effect on vowel perception; one of the more important observations was that the perceived target tended to shift toward vowels with an F1 just above the sung frequency.

3.
Peta White, Journal of Voice, 1999, 13(4): 570-582
High-pitched productions present difficulties for formant frequency analysis because of wide harmonic spacing and poorly defined formants. As a consequence, there are few reliable data on children's spoken or sung vowel formants. Twenty-nine 11-year-old Swedish children were asked to produce 4 sustained spoken and sung vowels. To circumvent the problem of wide harmonic spacing, F1 and F2 measurements were taken from vowels produced with a sweeping F0. Experienced choir singers were selected as subjects in order to minimize the larynx-height adjustments associated with pitch variation in less skilled subjects. Results showed significantly higher formant frequencies for speech than for singing. Formants were consistently higher in girls than in boys, suggesting longer vocal tracts in these preadolescent boys. Furthermore, formant scaling demonstrated vowel-dependent differences between boys and girls, suggesting non-uniform differences in male and female vocal tract dimensions. These vowel-dependent sex differences were not consistent with adult data.
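
A rough illustration of the sweeping-F0 idea follows: as the fundamental glides, each harmonic sweeps in frequency, and pooling harmonic amplitudes across the sweep traces out the spectral envelope, whose first two peaks approximate F1 and F2. This is a minimal sketch under assumed settings (an STFT with a 1024-point window, 50-Hz envelope bins, a pre-computed f0 track); neither the function names nor these parameters come from the paper.

```python
import numpy as np
from scipy.signal import stft, find_peaks

def formants_from_sweep(x, fs, f0_track, frame_hop=0.01):
    """x: vowel produced with a sweeping f0; f0_track: f0 in Hz per analysis frame."""
    freqs, times, Z = stft(x, fs, nperseg=1024,
                           noverlap=1024 - int(frame_hop * fs))
    mag = np.abs(Z)
    bin_width = freqs[1] - freqs[0]

    # Sample the magnitude spectrum at each harmonic of the time-varying f0;
    # pooled over the sweep, the harmonics trace out the spectral envelope.
    points = []
    for i in range(min(len(times), len(f0_track))):
        f0 = float(f0_track[i])
        if f0 <= 0:
            continue
        for k in range(1, int((fs / 2) // f0)):
            points.append((k * f0, mag[int(round(k * f0 / bin_width)), i]))
    points = np.array(points)

    # Average the pooled (frequency, level) points into 50-Hz bins and take the
    # first two peaks of the resulting envelope as rough F1 and F2 estimates.
    edges = np.arange(0.0, 4000.0, 50.0)
    env = np.zeros(len(edges))
    for j, lo in enumerate(edges):
        sel = (points[:, 0] >= lo) & (points[:, 0] < lo + 50.0)
        if np.any(sel):
            env[j] = points[sel, 1].mean()
    peaks, _ = find_peaks(env, distance=4)
    return (edges[peaks] + 25.0)[:2]
```

In practice the function would be run on a recorded vowel glide together with an f0 contour tracked by any standard pitch estimator.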

4.
The ability to recognize synthetic, two-formant vowels with equal duration and similar loudness was measured in five subjects with the Cochlear and five subjects with the Symbion cochlear implants. In one set of test stimuli, vowel pairs differed only in the first-formant frequency (F1). In another set, vowel pairs differed only in the second-formant frequency (F2). When F1 differed, four of five Cochlear subjects and four of five Symbion subjects recognized the vowels significantly above chance. When F2 differed, two of five Cochlear subjects and three of five Symbion subjects scored above chance. These results suggest that implanted subjects can utilize both "place" information across different electrodes and "rate" information on a single electrode to derive information about the spectral content of the stimulus.

5.
6.
The present investigation assessed the simultaneous and temporal masking produced by computer-generated synthetic vowels. Two durations (100 and 200 ms) were employed for each of four vowel-like maskers, presented at 70 dB SPL. The probe signals were three filtered noise bursts whose spectral distributions corresponded to regions of high spectral energy in three English stop consonants. Quiet and masked thresholds were determined using the method of adjustment. Data are reported for two experienced listeners who participated in all listening conditions. The results were generally in accord with masking experiments using nonspeech signals, in that both the frequency specificity of masking and temporal masking effects were demonstrated.

7.
This work investigated the measurement of vibrato and tremor extent values. Related work has not explored the possibility of measuring extent in the spectra of the low-frequency undulations of the fundamental frequency (f0). It is shown here that by canceling average (DC) values and baseline drifts of f0 contours, and by weighting the respective spectra by the DC value of the time window, extent measures can be obtained directly in the frequency domain. The method is illustrated with measurements from synthetic and human data.
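
The extent-measurement recipe sketched in the abstract (remove DC and baseline drift from the f0 contour, take a windowed spectrum, weight it by the window's DC value, and read extent at the vibrato rate) can be illustrated in a few lines. This is a hedged sketch, not the paper's implementation: the cents conversion, the Hann window, and the 3-9 Hz search band are assumptions.

```python
import numpy as np

def vibrato_rate_and_extent(f0_hz, fs_contour=100.0):
    """f0_hz: sampled f0 contour in Hz; fs_contour: contour sample rate in Hz."""
    t = np.arange(len(f0_hz)) / fs_contour

    # Convert to cents so that extent is read in a musically meaningful unit.
    f0_cents = 1200.0 * np.log2(f0_hz / np.mean(f0_hz))

    # Remove the mean (DC) and a linear baseline drift.
    coeffs = np.polyfit(t, f0_cents, 1)
    detrended = f0_cents - np.polyval(coeffs, t)

    # Window, then weight the spectrum by the window's DC value so that the
    # peak magnitude approximates the undulation's half peak-to-peak extent.
    win = np.hanning(len(detrended))
    spectrum = 2.0 * np.abs(np.fft.rfft(detrended * win)) / np.sum(win)
    freqs = np.fft.rfftfreq(len(detrended), d=1.0 / fs_contour)

    # Look for the strongest component in a plausible vibrato range (3-9 Hz).
    band = (freqs >= 3.0) & (freqs <= 9.0)
    peak = np.argmax(spectrum[band])
    return freqs[band][peak], spectrum[band][peak]  # (rate in Hz, extent in cents)

# Example with a synthetic 6-Hz vibrato of +/-100 cents around 220 Hz:
if __name__ == "__main__":
    fs = 100.0
    t = np.arange(0, 2.0, 1.0 / fs)
    f0 = 220.0 * 2.0 ** (100.0 / 1200.0 * np.sin(2 * np.pi * 6.0 * t))
    print(vibrato_rate_and_extent(f0, fs))  # approximately (6.0, 100)
```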

8.
9.
A computer analysis of the acoustic characteristics of the seven Mongolian vowels
The performance of the speech and language models chosen for automatic conversion (machine translation) of Mongolian documents in their various scripts via spoken input directly affects recognition accuracy. Analyzing and classifying the acoustic characteristics of the vowels of the Mongolian varieties is therefore not only important for Mongolian speech recognition, but also of considerable significance for research in Mongolian linguistics and dialectology. This paper presents the results of an acoustic analysis of the vowels as pronounced in the Khalkha, Chahar, and Oirat varieties of Mongolian, and uses a speech-model recognition procedure to verify the measured formant-frequency distributions of the different pronunciations.

10.
The ability of subjects to identify vowels in vibrotactile transformations of consonant-vowel syllables was measured for two types of displays: a spectral display (frequency by intensity), and a vocal tract area function display (vocal tract location by cross-sectional area). Both displays were presented to the fingertip via the tactile display of the Optacon transducer. In the first experiments the spectral display was effective for identifying vowels in /bV/ context when as many as 24 or as few as eight spectral channels were presented to the skin. However, performance fell when the 12- and 8-channel displays were reduced in size to occupy 1/2 or 1/3 of the 24-row tactile matrix. The effect of reducing the size of the display was greater when the spectrum was represented as a solid histogram ("filled" patterns) than when it was represented as a simple spectral contour ("unfilled" patterns). Spatial masking within the filled pattern was postulated as the cause for this decline in performance. Another experiment measured the utility of the spectral display when the syllables were produced by multiple speakers. The resulting increase in response confusions was primarily attributable to variations in the tactile patterns caused by differences in vocal tract resonances among the speakers. The final experiment found an area function display to be inferior to the spectral display for identification of vowels. The results demonstrate that a two-dimensional spectral display is worthy of further development as a basic vibrotactile display for speech.
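
As a loose illustration of how a frequency-by-intensity spectrum might be rendered on a 24-row tactile matrix, and of the "filled" versus "unfilled" distinction, here is a small sketch. The 24-row geometry and channel counts follow the abstract; the dynamic range and quantization are illustrative assumptions, and the function is hypothetical rather than the actual Optacon driver used in the study.

```python
import numpy as np

def spectrum_to_tactile(levels_db, n_rows=24, dynamic_range_db=30.0, filled=True):
    """levels_db: one level per spectral channel (e.g., 8-24 channels)."""
    # Quantize each channel level to a row height between 0 and n_rows.
    norm = np.clip((levels_db - levels_db.max() + dynamic_range_db)
                   / dynamic_range_db, 0.0, 1.0)
    heights = np.round(norm * n_rows).astype(int)

    matrix = np.zeros((n_rows, len(levels_db)), dtype=int)
    for col, h in enumerate(heights):
        if h == 0:
            continue
        if filled:
            matrix[n_rows - h:, col] = 1   # solid bar up to the channel level
        else:
            matrix[n_rows - h, col] = 1    # contour: only the topmost pin
    return matrix

# Example: an 8-channel spectrum displayed as a filled pattern.
print(spectrum_to_tactile(np.array([40, 55, 62, 50, 45, 58, 44, 35.0]), filled=True))
```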

11.
12.
There is a significant body of research examining the intelligibility of sinusoidal replicas of natural speech. Discussion has followed about what the sinewave speech phenomenon might imply about the mechanisms underlying phonetic recognition. However, most of this work has been conducted using sentence material, making it unclear how much of listeners' performance reflects linguistic constraints versus lower-level phonetic mechanisms. This study was designed to measure vowel intelligibility using sinusoidal replicas of naturally spoken vowels. The sinusoidal signals were modeled after 300 /hVd/ syllables spoken by men, women, and children. Students enrolled in an introductory phonetics course served as listeners. Recognition rates for the sinusoidal vowels averaged 55%, which is much lower than the ~95% intelligibility of the original signals. Attempts to improve performance using three different training methods met with modest success, with post-training recognition rates rising by ~5-11 percentage points. Follow-up work showed that more extensive training produced further improvements, with performance leveling off at ~73%-74%. Finally, modeling work showed that a fairly simple pattern-matching algorithm trained on naturally spoken vowels classified sinewave vowels with 78.3% accuracy, showing that the sinewave speech phenomenon does not necessarily rule out template matching as a mechanism underlying phonetic recognition.
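
Since the abstract reports that a fairly simple pattern-matching algorithm handled sinewave vowels well, a minimal sketch of that general idea may help: a nearest-template classifier over (F1, F2), trained on natural vowels and then applied to sinewave replicas whose component sinusoids track the formants. The class, the bark transform, and the toy formant values are illustrative assumptions, not the authors' model.

```python
import numpy as np

class TemplateVowelClassifier:
    def __init__(self):
        self.templates = {}   # vowel label -> mean (F1, F2) template in barks
        self._hz_to_bark = lambda f: 26.81 * f / (1960.0 + f) - 0.53

    def fit(self, formants_hz, labels):
        """formants_hz: (N, 2) array of F1/F2 in Hz; labels: length-N list."""
        barks = self._hz_to_bark(np.asarray(formants_hz, dtype=float))
        for label in set(labels):
            rows = [b for b, l in zip(barks, labels) if l == label]
            self.templates[label] = np.mean(rows, axis=0)

    def predict(self, formants_hz):
        barks = self._hz_to_bark(np.asarray(formants_hz, dtype=float))
        out = []
        for b in np.atleast_2d(barks):
            # Pick the template with the smallest Euclidean distance in barks.
            out.append(min(self.templates,
                           key=lambda v: np.linalg.norm(b - self.templates[v])))
        return out

# Toy usage with made-up (hypothetical) formant values for /i/ and /a/:
clf = TemplateVowelClassifier()
clf.fit([[270, 2290], [300, 2240], [730, 1090], [700, 1220]],
        ["i", "i", "a", "a"])
print(clf.predict([[290, 2300], [710, 1150]]))  # -> ['i', 'a']
```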

13.
If two vowels with different fundamental frequencies (f0's) are presented simultaneously and monaurally, listeners often hear two talkers producing different vowels on different pitches. This paper describes the evaluation of four computational models of the auditory and perceptual processes which may underlie this ability. Each model involves four stages: (i) frequency analysis using an "auditory" filter bank, (ii) determination of the pitches present in the stimulus, (iii) segregation of the competing speech sources by grouping energy associated with each pitch to create two derived spectral patterns, and (iv) classification of the derived spectral patterns to predict the probabilities of listeners' vowel-identification responses. The "place" models carry out the operations of pitch determination and spectral segregation by analyzing the distribution of rms levels across the channels of the filter bank. The "place-time" models carry out these operations by analyzing the periodicities of the waveforms in each channel. In their "linear" versions, the place and place-time models operate directly on the waveforms emerging from the filters. In their "nonlinear" versions, analogous operations are applied to the output of an additional stage which applies a compressive nonlinearity to the filtered waveforms. Compared to the other three models, the nonlinear place-time model provides the most accurate estimates of the f0's of pairs of concurrent synthetic vowels and comes closest to predicting the identification responses of listeners to such stimuli. Although the model has several limitations, the results are compatible with the idea that a place-time analysis is used to segregate competing sound sources.
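
The four stages listed above can be sketched as a small "place-time"-style pipeline: a crude filter bank, per-channel periodicity estimation by autocorrelation, and grouping of channel energy into two derived spectral patterns. Everything here (Butterworth filters in place of auditory filters, the pitch search range, the median-split grouping rule) is a simplifying assumption, and the classification stage is omitted.

```python
import numpy as np
from scipy.signal import butter, lfilter

def bandpass_bank(x, fs, centers_hz, bw_ratio=0.25):
    """Stage (i): a crude 'auditory' filter bank built from band-pass filters."""
    channels = []
    for fc in centers_hz:
        lo, hi = fc * (1 - bw_ratio), fc * (1 + bw_ratio)
        b, a = butter(2, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        channels.append(lfilter(b, a, x))
    return np.array(channels)

def channel_pitch(chan, fs, f0_range=(80.0, 300.0)):
    """Stage (ii): periodicity of one channel estimated via autocorrelation."""
    ac = np.correlate(chan, chan, mode="full")[len(chan) - 1:]
    lags = np.arange(int(fs / f0_range[1]), int(fs / f0_range[0]))
    return fs / lags[np.argmax(ac[lags])]

def segregate(x, fs, centers_hz):
    """Stages (ii)-(iii): assign channels to one of two pitch groups and return
    two derived spectral patterns (per-channel rms)."""
    chans = bandpass_bank(x, fs, centers_hz)
    pitches = np.array([channel_pitch(c, fs) for c in chans])
    rms = np.sqrt(np.mean(chans ** 2, axis=1))

    # Simplistic grouping: split channels around the median estimated pitch;
    # a real model would pool periodicity evidence across channels first.
    low_group = pitches <= np.median(pitches)
    pattern_a = np.where(low_group, rms, 0.0)
    pattern_b = np.where(~low_group, rms, 0.0)
    return pattern_a, pattern_b   # stage (iv), classification, is omitted here
```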

14.
In an early experiment using synthetic speech, it was shown that raising or lowering the formants in an introductory sentence affected the identification of the vowel in a following test word [P. Ladefoged and D. Broadbent, J. Acoust. Soc. Am. 29, 98-104 (1957)]. This experiment has now been replicated using natural speech produced by a phonetician using two different overall settings of the vocal tract.

15.
This study examined the effects of mild-to-moderate sensorineural hearing loss on the vowel perception abilities of young, hearing-impaired (YHI) adults. Stimuli were presented at a low conversational level with a flat frequency response (approximately 60 dB SPL), and in two gain conditions: (a) high-level gain with a flat frequency response (95 dB SPL), and (b) frequency-specific gain shaped according to each listener's hearing loss (designed to simulate the frequency response provided by a linear hearing aid to an input signal of 60 dB SPL). Listeners discriminated changes in the vowels /ɪ e ɛ ʌ æ/ when F1 or F2 varied, and later categorized the vowels. YHI listeners performed better in the two gain conditions than in the conversational-level condition. Performance in the two gain conditions was similar, suggesting that upward spread of masking was not seen at these signal levels for these tasks. Results were compared with those from a group of elderly, hearing-impaired (EHI) listeners reported in Coughlin, Kewley-Port, and Humes [J. Acoust. Soc. Am. 104, 3597-3607 (1998)]. Comparisons revealed no significant differences between the EHI and YHI groups, suggesting that hearing impairment, not age, is the primary contributor to decreased vowel perception in these listeners.
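
Condition (b), frequency-specific gain shaped by the listener's hearing loss, can be illustrated by filtering the stimulus with a gain curve derived from an audiogram. This sketch uses the textbook half-gain rule and an FIR filter designed with firwin2; these choices are assumptions, not the study's actual prescription procedure.

```python
import numpy as np
from scipy.signal import firwin2, lfilter

def shape_by_audiogram(x, fs, audiogram_freqs_hz, thresholds_db_hl):
    """Apply frequency-specific gain of roughly half the hearing loss at each frequency."""
    gains_db = 0.5 * np.asarray(thresholds_db_hl, dtype=float)   # half-gain rule

    # Build a target magnitude response on a 0..fs/2 grid for firwin2.
    grid = np.linspace(0.0, fs / 2.0, 256)
    gain_lin = 10.0 ** (np.interp(grid, audiogram_freqs_hz, gains_db) / 20.0)
    fir = firwin2(255, grid / (fs / 2.0), gain_lin)
    return lfilter(fir, [1.0], x)

# Example: a mild low-frequency loss rising to 40 dB HL above 2 kHz.
fs = 16000
x = np.random.randn(fs)          # stand-in for a 1-s vowel stimulus
y = shape_by_audiogram(x, fs, [250, 500, 1000, 2000, 4000, 8000],
                       [10, 15, 25, 40, 40, 40])
```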

16.
Vowel intelligibility during singing is an important aspect of communication in performance. The intelligibility of isolated vowels sung by Western classically trained singers has been found to be relatively low, decreasing as pitch rises, and lower for women than for men; the lack of contextual cues significantly degrades vowel intelligibility. This study postulated that the reduced intelligibility of isolated sung vowels may stem partly from the set of vowels singers use in their daily vocalises. More specifically, if classically trained singers sang only a few American English vowels during their vocalises, their intelligibility for American English vowels would be lower than that of classically trained singers who usually vocalize on most American English vowels. The 21 subjects (15 women, 6 men) were all Western classically trained performers as well as teachers of classical singing. They sang 11 words containing 11 different American English vowels, on two pitches a musical fifth apart. Subjects were divided into two groups: those who normally vocalize on 4, 5, or 6 vowels, and those who sing all 11 vowels during their daily vocalises. The sung words were cropped to isolate the vowels, and listening tapes were created. Two listening groups, four singing teachers and five speech-language pathologists, were asked to identify the vowels intended by the singers. Results suggest that singing fewer vowels during daily vocalises does not decrease intelligibility compared with singing all 11 American English vowels. In general, vowel intelligibility was lower at the higher pitch, and vowels sung by the women were less intelligible than those sung by the men. Identification accuracy was about the same for the singing-teacher listeners and the speech-language pathologist listeners, except at the lower pitch, where the singing teachers were more accurate.

17.
The ability of listeners to identify pairs of simultaneous synthetic vowels has been investigated in the first of a series of studies on the extraction of phonetic information from multiple-talker waveforms. Both members of the vowel pair had the same onset and offset times and a constant fundamental frequency of 100 Hz. Listeners identified both vowels with an accuracy significantly greater than chance. The pattern of correct responses and confusions was similar for vowels generated by (a) cascade formant synthesis and (b) additive harmonic synthesis that replaced each of the lowest three formants with a single pair of harmonics of equal amplitude. In order to choose an appropriate model for describing listeners' performance, four pattern-matching procedures were evaluated. Each predicted the probability that (i) any individual vowel would be selected as one of the two responses, and (ii) any pair of vowels would be selected. These probabilities were estimated from measures of the similarities of the auditory excitation patterns of the double vowels to those of single-vowel reference patterns. Up to 88% of the variance in individual responses and up to 67% of the variance in pairwise responses could be accounted for by procedures that highlighted spectral peaks and shoulders in the excitation pattern. Procedures that assigned uniform weight to all regions of the excitation pattern gave poorer predictions. These findings support the hypothesis that the auditory system pays particular attention to the frequencies of spectral peaks, and possibly also of shoulders, when identifying vowels. One virtue of this strategy is that the spectral peaks and shoulders can indicate the frequencies of formants when other aspects of spectral shape are obscured by competing sounds.
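
The finding that procedures highlighting spectral peaks and shoulders predicted responses best suggests a similarity measure that up-weights peak regions of the excitation pattern. The sketch below shows one such peak-weighted distance; the smoothing, prominence threshold, and weighting factor are illustrative assumptions rather than any of the four procedures evaluated in the paper.

```python
import numpy as np
from scipy.signal import find_peaks

def excitation_like_pattern(spectrum_db, smooth_bins=5):
    """Very rough stand-in for an excitation pattern: a smoothed dB spectrum."""
    kernel = np.ones(smooth_bins) / smooth_bins
    return np.convolve(spectrum_db, kernel, mode="same")

def peak_weighted_distance(pattern, template, peak_weight=4.0):
    """Distance that up-weights channels at and near the peaks of `pattern`."""
    weights = np.ones_like(pattern)
    peaks, _ = find_peaks(pattern, prominence=3.0)   # dB prominence threshold
    for p in peaks:
        weights[max(0, p - 2):p + 3] = peak_weight
    return np.sqrt(np.mean(weights * (pattern - template) ** 2))

def identify(double_vowel_pattern, reference_patterns):
    """Pick the reference vowel whose template is closest under the weighted metric."""
    return min(reference_patterns,
               key=lambda v: peak_weighted_distance(double_vowel_pattern,
                                                    reference_patterns[v]))
```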

18.
19.
Dynamic specification of coarticulated vowels
An adequate theory of vowel perception must account for perceptual constancy over variations in the acoustic structure of coarticulated vowels contributed by speakers, speaking rate, and consonantal context. We modified recorded consonant-vowel-consonant syllables electronically to investigate the perceptual efficacy of three types of acoustic information for vowel identification: (1) static spectral "targets," (2) duration of syllabic nuclei, and (3) formant transitions into and out of the vowel nucleus. Vowels in /b/-vowel-/b/ syllables spoken by one adult male (experiment 1) and by two females and two males (experiment 2) served as the corpus, and seven modified syllable conditions were generated in which different parts of the digitized waveforms of the syllables were deleted and the temporal relationships of the remaining parts were manipulated. Results of identification tests by untrained listeners indicated that dynamic spectral information, contained in initial and final transitions taken together, was sufficient for accurate identification of vowels even when vowel nuclei were attenuated to silence. Furthermore, the dynamic spectral information appeared to be efficacious even when durational parameters specifying intrinsic vowel length were eliminated.
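
The waveform manipulations described (deleting the vowel nucleus while retaining the initial and final transitions, with or without preserving the original timing) amount to simple splicing of the digitized syllable. A minimal sketch, assuming hand-specified segment boundaries:

```python
import numpy as np

def silent_center(x, fs, trans_in_s, nucleus_s, trans_out_s, keep_duration=True):
    """x: digitized syllable; the three durations partition its vocalic portion."""
    n_in, n_nuc, n_out = (int(round(d * fs))
                          for d in (trans_in_s, nucleus_s, trans_out_s))
    onset = x[:n_in]                                  # transition into the vowel
    offset = x[n_in + n_nuc:n_in + n_nuc + n_out]     # transition out of the vowel

    if keep_duration:
        gap = np.zeros(n_nuc)                         # nucleus attenuated to silence
        return np.concatenate([onset, gap, offset])
    # Otherwise abut the transitions, removing the intrinsic-duration cue.
    return np.concatenate([onset, offset])
```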

20.
Perceived pitch of whispered vowels
