Similar Articles
20 similar articles found (search time: 0 ms)
1.
2.
Auditory perception of vowels and consonants in speech
This paper reviews research on the auditory perception of vowels and consonants in speech. More than 80 years ago, influential experiments based on nonsense syllables indicated that consonants matter more for speech perception; owing to the experimenters' academic standing and authority, this conclusion became received wisdom until, nearly 20 years ago, experiments based on natural sentences challenged it and triggered a new round of research. The paper surveys the relative importance of vowels and consonants for speech perception, the contributions of steady-state vowel and consonant information versus dynamic boundary information, and potential applications of this line of research, and closes with a summary and outlook.

3.
4.
Three experiments examined the ability of listeners to identify steady-state synthetic vowel-like sounds presented concurrently in pairs to the same ear. Experiment 1 confirmed earlier reports that listeners identify the constituents of such pairs more accurately when they differ in fundamental frequency (f0) by about a half semitone or more, compared to the condition where they have the same f0. When the constituents have different f0's, corresponding harmonics of the two vowels are misaligned in frequency and corresponding pitch periods are asynchronous in time. These differences provide cues that might aid identification. Experiments 2 and 3 determined whether listeners can use these cues, divorced from a difference in f0, to improve their accuracy of identification. Harmonic misalignment was beneficial when the constituents had an f0 of 200 Hz so that the harmonics of each constituent were well separated in frequency. Pitch-period asynchrony was beneficial when the constituents had an f0 of 50 Hz so that the onsets of the pitch periods of each constituent were well separated in time. Neither cue was beneficial when both constituents had an f0 of 100 Hz. It is unlikely, therefore, that either cue contributed to the improvement in performance found in Experiment 1 where the constituents were given different f0's close to 100 Hz. Rather, it is argued that performance improved in Experiment 1 primarily because the two f0's specified two pitches that could be used to segregate the contributions of each vowel in the composite waveform.
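The harmonic-misalignment and pitch-period-asynchrony cues described above follow from simple arithmetic on the two harmonic series. A minimal numpy sketch (the specific f0 values are illustrative, not taken from the study's stimuli):

```python
import numpy as np

# Two concurrent "vowels" whose f0s differ by half a semitone (a factor of 2**(1/24)).
f0_a = 100.0                   # Hz, illustrative value
f0_b = f0_a * 2 ** (1 / 24)    # ~102.9 Hz

n = np.arange(1, 21)           # harmonic numbers 1..20
harm_a = n * f0_a              # harmonic frequencies of vowel A
harm_b = n * f0_b              # harmonic frequencies of vowel B

# Frequency misalignment between corresponding harmonics grows with harmonic number,
# so higher harmonics of the two vowels fall into increasingly different filter channels.
print(np.round(harm_b - harm_a, 1))

# Pitch-period asynchrony: the period difference accumulates over successive cycles.
period_a, period_b = 1 / f0_a, 1 / f0_b
print(f"period difference per cycle: {1e3 * (period_a - period_b):.3f} ms")
```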

5.
Vowel matching and identification experiments were carried out to investigate the perceptual contribution of harmonics in the first formant region of synthetic front vowels. In the first experiment, listeners selected the best phonetic match from an F1 continuum, for reference stimuli in which a band of two to five adjacent harmonics of equal intensity replaced the F1 peak; F1 values of best matches were near the frequency of the highest frequency harmonic in the band. Attenuation of the highest harmonic in the band resulted in lower F1 matches. Attenuation of the lowest harmonic had no significant effects, except in the case of a 2-harmonic band, where higher F1 matches were selected. A second experiment investigated the shifts in matched F1 resulting from an intensity increment to either one of a pair of harmonics in the F1 region. These shifts were relatively invariant over different harmonic frequencies and proportional to the fundamental frequency. A third experiment used a vowel identification task to determine phoneme boundaries on an F1 continuum. These boundaries were not substantially altered when the stimuli comprised only the two most prominent harmonics in the F1 region, or these plus either the higher or lower frequency subset of the remaining F1 harmonics. The results are consistent with an estimation procedure for the F1 peak which assigns greatest weight to the two most prominent harmonics in the first formant region.
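One common way to model "greatest weight to the two most prominent harmonics" is an amplitude-weighted average of their frequencies. The sketch below uses that weighting as an assumption for illustration; it is not the authors' exact estimation procedure, and the example harmonic levels are invented:

```python
import numpy as np

def estimate_f1(harm_freqs_hz, harm_levels_db):
    """Estimate the F1 peak as an amplitude-weighted average of the two most
    prominent harmonics in the F1 region (illustrative weighting scheme)."""
    freqs = np.asarray(harm_freqs_hz, dtype=float)
    levels = np.asarray(harm_levels_db, dtype=float)
    top2 = np.argsort(levels)[-2:]          # indices of the two strongest harmonics
    amps = 10 ** (levels[top2] / 20)        # convert dB to linear amplitude
    return float(np.sum(freqs[top2] * amps) / np.sum(amps))

# Example: harmonics of a 125-Hz voice near a first-formant peak (invented values).
print(estimate_f1([250, 375, 500, 625], [55, 62, 60, 48]))   # ~430 Hz, between 375 and 500
```

Boosting either of the two strongest harmonics pulls the estimate toward it, which is consistent with the matching shifts reported above.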

6.
7.
In this study we assessed age-related differences in the perception and production of American English (AE) vowels by native Mandarin speakers as a function of the amount of exposure to the target language. Participants included three groups of native Mandarin speakers: 87 children, adolescents and young adults living in China, 77 recent arrivals who had lived in the U.S. for two years or less, and 54 past arrivals who had lived in the U.S. between three and five years. The latter two groups arrived in the U.S. between the ages of 7 and 44 years. Discrimination of six AE vowel pairs /i-i/, /i-e(I)/, /e-ae/, /ae-a/, /a-(symbol see text)/, and /u-a/ was assessed with a categorial AXB task. Production of the eight vowels /i, i, e(I), e, ae, (symbol see text), a, u/ was assessed with an immediate imitation task. Age-related differences in performance accuracy changed from an older-learner advantage among participants in China, to no age differences among recent arrivals, and to a younger-learner advantage among past arrivals. Performance on individual vowels and vowel contrasts indicated the influence of the Mandarin phonetic/phonological system. These findings support a combined environmental and L1 interference/transfer theory as an explanation of the long-term younger-learner advantage in mastering L2 phonology.

8.
The ability of subjects to identify vowels in vibrotactile transformations of consonant-vowel syllables was measured for two types of displays: a spectral display (frequency by intensity), and a vocal tract area function display (vocal tract location by cross-sectional area). Both displays were presented to the fingertip via the tactile display of the Optacon transducer. In the first experiments the spectral display was effective for identifying vowels in /b/V/ context when as many as 24 or as few as eight spectral channels were presented to the skin. However, performance fell when the 12- and 8-channel displays were reduced in size to occupy 1/2 or 1/3 of the 24-row tactile matrix. The effect of reducing the size of the display was greater when the spectrum was represented as a solid histogram ("filled" patterns) than when it was represented as a simple spectral contour ("unfilled" patterns). Spatial masking within the filled pattern was postulated as the cause for this decline in performance. Another experiment measured the utility of the spectral display when the syllables were produced by multiple speakers. The resulting increase in response confusions was primarily attributable to variations in the tactile patterns caused by differences in vocal tract resonances among the speakers. The final experiment found an area function display to be inferior to the spectral display for identification of vowels. The results demonstrate that a two-dimensional spectral display is worthy of further development as a basic vibrotactile display for speech.
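The "filled" versus "unfilled" distinction in the spectral display reduces to how channel levels are mapped onto the rows of the tactile matrix. A minimal sketch under assumptions (channel count, level scaling, and the example values are illustrative; only the 24-row matrix comes from the description above):

```python
import numpy as np

N_ROWS = 24  # rows of the tactile matrix, as in the display described above

def tactile_pattern(channel_levels, filled=True):
    """Convert per-channel levels (0..1) into a binary row-by-channel pattern.

    filled=True  -> solid histogram: all rows from the bottom up to the level are on.
    filled=False -> contour: only the single row at the level is on.
    """
    levels = np.clip(np.asarray(channel_levels, dtype=float), 0.0, 1.0)
    rows = np.rint(levels * (N_ROWS - 1)).astype(int)   # row index per channel
    pattern = np.zeros((N_ROWS, len(rows)), dtype=int)
    for ch, r in enumerate(rows):
        if filled:
            pattern[: r + 1, ch] = 1
        else:
            pattern[r, ch] = 1
    return pattern

# Eight spectral channels with two formant-like peaks (invented values).
levels = [0.2, 0.7, 0.9, 0.5, 0.3, 0.6, 0.8, 0.4]
print(tactile_pattern(levels, filled=True).sum(), tactile_pattern(levels, filled=False).sum())
```

The filled pattern activates many more vibrators per channel, which is the basis of the spatial-masking explanation offered above.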

9.
If two vowels with different fundamental frequencies (f0's) are presented simultaneously and monaurally, listeners often hear two talkers producing different vowels on different pitches. This paper describes the evaluation of four computational models of the auditory and perceptual processes which may underlie this ability. Each model involves four stages: (i) frequency analysis using an "auditory" filter bank, (ii) determination of the pitches present in the stimulus, (iii) segregation of the competing speech sources by grouping energy associated with each pitch to create two derived spectral patterns, and (iv) classification of the derived spectral patterns to predict the probabilities of listeners' vowel-identification responses. The "place" models carry out the operations of pitch determination and spectral segregation by analyzing the distribution of rms levels across the channels of the filter bank. The "place-time" models carry out these operations by analyzing the periodicities in the waveforms in each channel. In their "linear" versions, the place and place-time models operate directly on the waveforms emerging from the filters. In their "nonlinear" versions, analogous operations are applied to the output of an additional stage which applies a compressive nonlinearity to the filtered waveforms. Compared to the other three models, the nonlinear place-time model provides the most accurate estimates of the f0's of pairs of concurrent synthetic vowels and comes closest to predicting the identification responses of listeners to such stimuli. Although the model has several limitations, the results are compatible with the idea that a place-time analysis is used to segregate competing sound sources.
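A very reduced stand-in for stage (ii) of the place-time models is an autocorrelation-based search for the periods present in a mixture. The sketch below skips the filter bank, segregation, and classification stages and works on the raw waveform rather than per channel; the sample rate, f0 values, and stimulus construction are assumptions for illustration only:

```python
import numpy as np

FS = 16000  # sample rate, Hz (assumed)

def harmonic_complex(f0, dur=0.2, n_harm=20):
    """Equal-amplitude harmonic complex standing in for a synthetic vowel."""
    t = np.arange(int(dur * FS)) / FS
    return sum(np.sin(2 * np.pi * f0 * k * t) for k in range(1, n_harm + 1))

def autocorr_pitches(x, fmin=60.0, fmax=300.0):
    """Estimate two candidate f0s from the normalized autocorrelation of the
    (here unfiltered) mixture; the real place-time models do this per filter
    channel and then combine channels."""
    x = x - x.mean()
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]
    acf /= acf[0]
    lo, hi = int(FS / fmax), int(FS / fmin)
    lags = np.arange(lo, hi)
    seg = acf[lo:hi]
    # local maxima within the plausible pitch-period range
    peaks = lags[1:-1][(seg[1:-1] > seg[:-2]) & (seg[1:-1] > seg[2:])]
    best = peaks[np.argsort(acf[peaks])[::-1][:2]]   # two strongest period candidates
    return sorted(FS / best)

mix = harmonic_complex(100.0) + harmonic_complex(112.0)   # f0s about two semitones apart
print(autocorr_pitches(mix))   # expect estimates near 100 and 112 Hz
```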

10.
The ability of listeners to identify pairs of simultaneous synthetic vowels has been investigated in the first of a series of studies on the extraction of phonetic information from multiple-talker waveforms. Both members of the vowel pair had the same onset and offset times and a constant fundamental frequency of 100 Hz. Listeners identified both vowels with an accuracy significantly greater than chance. The pattern of correct responses and confusions was similar for vowels generated by (a) cascade formant synthesis and (b) additive harmonic synthesis that replaced each of the lowest three formants with a single pair of harmonics of equal amplitude. In order to choose an appropriate model for describing listeners' performance, four pattern-matching procedures were evaluated. Each predicted the probability that (i) any individual vowel would be selected as one of the two responses, and (ii) any pair of vowels would be selected. These probabilities were estimated from measures of the similarities of the auditory excitation patterns of the double vowels to those of single-vowel reference patterns. Up to 88% of the variance in individual responses and up to 67% of the variance in pairwise responses could be accounted for by procedures that highlighted spectral peaks and shoulders in the excitation pattern. Procedures that assigned uniform weight to all regions of the excitation pattern gave poorer predictions. These findings support the hypothesis that the auditory system pays particular attention to the frequencies of spectral peaks, and possibly also of shoulders, when identifying vowels. One virtue of this strategy is that the spectral peaks and shoulders can indicate the frequencies of formants when other aspects of spectral shape are obscured by competing sounds.
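The contrast between peak-weighted and uniform-weighted matching can be sketched as a weighted distance between excitation patterns that is then converted into response probabilities. Everything below is an illustrative assumption (the peak weight, the peak definition, and the softmax-style mapping from distance to probability are not the paper's procedure, and the level vectors are invented):

```python
import numpy as np

def peak_weights(excitation_db, peak_weight=3.0):
    """Weight vector that emphasizes channels at local maxima ("peaks") of an
    excitation pattern; all other channels get unit weight (assumed values)."""
    e = np.asarray(excitation_db, dtype=float)
    w = np.ones_like(e)
    interior = (e[1:-1] >= e[:-2]) & (e[1:-1] >= e[2:])
    w[1:-1][interior] = peak_weight
    return w

def match_probabilities(double_vowel_db, references_db):
    """Weighted rms distance to each single-vowel reference pattern, mapped to
    response probabilities via a softmax over negative distance (assumption)."""
    x = np.asarray(double_vowel_db, dtype=float)
    w = peak_weights(x)
    d = np.array([np.sqrt(np.mean(w * (x - np.asarray(r, dtype=float)) ** 2))
                  for r in references_db])
    score = np.exp(-d)
    return score / score.sum()

refs = {"i": [50, 70, 40, 65, 45], "a": [55, 60, 72, 50, 42]}   # invented patterns
mix = [53, 66, 60, 58, 44]
print(dict(zip(refs, match_probabilities(mix, list(refs.values())))))
```

Setting peak_weight to 1.0 recovers the uniform-weighting procedure that, in the study above, predicted listeners' responses less well.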

11.
Dynamic specification of coarticulated vowels
An adequate theory of vowel perception must account for perceptual constancy over variations in the acoustic structure of coarticulated vowels contributed by speakers, speaking rate, and consonantal context. We modified recorded consonant-vowel-consonant syllables electronically to investigate the perceptual efficacy of three types of acoustic information for vowel identification: (1) static spectral "targets," (2) duration of syllabic nuclei, and (3) formant transitions into and out of the vowel nucleus. Vowels in /b/-vowel-/b/ syllables spoken by one adult male (experiment 1) and by two females and two males (experiment 2) served as the corpus, and seven modified syllable conditions were generated in which different parts of the digitized waveforms of the syllables were deleted and the temporal relationships of the remaining parts were manipulated. Results of identification tests by untrained listeners indicated that dynamic spectral information, contained in initial and final transitions taken together, was sufficient for accurate identification of vowels even when vowel nuclei were attenuated to silence. Furthermore, the dynamic spectral information appeared to be efficacious even when durational parameters specifying intrinsic vowel length were eliminated.
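The "silent-center" style manipulation described above (keep the initial and final transitions, attenuate the nucleus to silence while preserving overall duration) amounts to simple sample-domain editing of the digitized waveform. A minimal sketch; the transition durations and the stand-in waveform are hypothetical:

```python
import numpy as np

def silent_center(x, fs, onset_ms, offset_ms):
    """Keep the first onset_ms and last offset_ms of a syllable waveform and
    attenuate the remaining center portion to silence, preserving duration
    (the transition durations are illustrative parameters)."""
    y = np.array(x, dtype=float)
    a = int(round(onset_ms * fs / 1000))
    b = len(y) - int(round(offset_ms * fs / 1000))
    y[a:b] = 0.0   # vowel nucleus attenuated to silence
    return y

fs = 10000
syllable = np.random.randn(3000)                     # stand-in for a digitized /bVb/ token
edited = silent_center(syllable, fs, onset_ms=60, offset_ms=60)
```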

12.
Perceived pitch of whispered vowels

13.
A series of experiments on the detectability of vowels in isolation has been completed. Stimuli consisted of three sets of ten vowels: one synthetic, one from a male talker, and one from a female talker. Vowel durations ranged from 20-160 ms for each of the sets. Thresholds for detecting the vowels in isolation were obtained from well-trained, normal-hearing listeners using an adaptive-tracking paradigm. For a given duration, detection thresholds for vowels calibrated for equal rms sound pressure at the earphones differed by 22 dB across the 30 vowels. In addition, an orderly decrease in vowel thresholds was obtained for increased duration, as predicted from previous data on temporal integration. Several different analyses were performed in an attempt to explain the differential detectability across the 30 vowels. Analyses accounting for audibility reduced threshold variability significantly, but vowel thresholds still ranged over 15 dB. Vowel spectra were subsequently modeled as excitation patterns, and several detection hypotheses were examined. A simple average of excitation levels across excited critical bands provided the best prediction of the level variations needed to maintain threshold-level loudness across all vowels.
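The best-performing predictor above, "a simple average of excitation levels across excited critical bands," can be written down directly. In the sketch below the criterion for a band counting as "excited" and the example excitation values are assumptions for illustration:

```python
import numpy as np

def mean_excited_level(excitation_db, floor_db=0.0):
    """Average excitation level across 'excited' critical bands, i.e., bands
    whose excitation exceeds floor_db (the threshold criterion is assumed).
    Used here as a simple predictor of the level adjustment needed to equate
    detectability across vowels."""
    e = np.asarray(excitation_db, dtype=float)
    excited = e[e > floor_db]
    return float(excited.mean()) if excited.size else float("-inf")

# Two vowels at equal rms level may excite the critical bands very differently:
vowel_a = [12, 30, 42, 38, 20, 5, -3]   # dB excitation per critical band (invented)
vowel_b = [25, 28, 26, 24, 22, 18, 10]
print(mean_excited_level(vowel_a), mean_excited_level(vowel_b))
```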

14.
15.
This paper seeks to characterize the nature, size, and range of acoustic amplitude variation in naturally produced coarticulated vowels in order to determine its potential contribution and relevance to vowel perception. The study is a partial replication and extension of the pioneering work by House and Fairbanks [J. Acoust. Soc. Am. 22, 105-113 (1953)], who reported large variation in vowel amplitude as a function of consonantal context. Eight American English vowels spoken by men and women were recorded in ten symmetrical CVC consonantal contexts. Acoustic amplitude measures included overall rms amplitude, amplitude of the rms peak along with its relative location in the CVC-word, and the amplitudes of individual formants F1-F4 along with their frequencies. House and Fairbanks' amplitude results were not replicated: Neither the overall rms nor the rms peak varied appreciably as a function of consonantal context. However, consonantal context was shown to affect significantly and systematically the amplitudes of individual formants at the vowel nucleus. These effects persisted in the auditory representation of the vowel signal. Auditory spectra showed that the pattern of spectral amplitude variation as a function of contextual effects may still be encoded and represented at early stages of processing by the peripheral auditory system.
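The first two amplitude measures named above (overall rms, and the rms peak with its relative location in the word) are straightforward to compute from the waveform. A minimal sketch under assumptions (the 20-ms frame length and the synthetic test token are illustrative, not the paper's analysis settings):

```python
import numpy as np

def amplitude_measures(x, fs, frame_ms=20.0):
    """Overall rms level, level of the rms peak, and the peak's relative
    location within the token (simplified; frame length is an assumption)."""
    x = np.asarray(x, dtype=float)
    n = int(round(frame_ms * fs / 1000))
    frames = [x[i:i + n] for i in range(0, len(x) - n + 1, n)]
    frame_rms = np.array([np.sqrt(np.mean(f ** 2)) for f in frames])
    overall_db = 20 * np.log10(np.sqrt(np.mean(x ** 2)) + 1e-12)
    peak_idx = int(np.argmax(frame_rms))
    peak_db = 20 * np.log10(frame_rms[peak_idx] + 1e-12)
    rel_location = peak_idx / max(len(frames) - 1, 1)   # 0 = word onset, 1 = offset
    return overall_db, peak_db, rel_location

fs = 16000
t = np.arange(int(0.3 * fs)) / fs
tone = np.sin(2 * np.pi * 200 * t) * np.hanning(t.size)   # amplitude peaks mid-token
print(amplitude_measures(tone, fs))                        # rel_location near 0.5
```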

16.
Questions exist as to the intelligibility of vowels sung at extremely high fundamental frequencies and, especially, when the fundamental frequency (F0) produced is above the region where the first vowel formant (F1) would normally occur. Can such vowels be correctly identified and, if so, does context provide the necessary information or are acoustical elements also operative? To this end, 18 professional singers (5 males and 13 females) were recorded when singing 3 isolated vowels at high and low pitches at both loud and soft levels. Aural-perceptual studies employing four types of auditors were carried out to determine the identity of these vowels, and the nature of the confusions with other vowels. Subsequent acoustical analysis focused on the actual fundamental frequencies sung plus those defining the first 2 vowel formants. It was found that F0 change had a profound effect on vowel perception; one of the more important observations was that the target tended to shift toward vowels with an F1 just above the sung frequency.

17.
18.
It has been suggested [e.g., Strange et al., J. Acoust. Soc. Am. 74, 695-705 (1983); Verbrugge and Rakerd, Language Speech 29, 39-57 (1986)] that the temporal margins of vowels in consonantal contexts, consisting mainly of the rapid CV and VC transitions of CVC's, contain dynamic cues to vowel identity that are not available in isolated vowels and that may be perceptually superior in some circumstances to cues which are inherent to the vowels proper. However, this study shows that vowel-inherent formant targets and cues to vowel-inherent spectral change (measured from nucleus to offglide sections of the vowel itself) persist in the margins of /bVb/ syllables, confirming a hypothesis of Nearey and Assmann [J. Acoust. Soc. Am. 80, 1297-1308 (1986)]. Experiments were conducted to test whether listeners might be using such vowel-inherent, rather than coarticulatory information to identify the vowels. In the first experiment, perceptual tests using "hybrid silent center" syllables (i.e., syllables which contain only brief initial and final portions of the original syllable, and in which speaker identity changes from the initial to the final portion) show that listeners' error rates and confusion matrices for vowels in /bVb/ syllables are very similar to those for isolated vowels. These results suggest that listeners are using essentially the same type of information in essentially the same way to identify both kinds of stimuli. Statistical pattern recognition models confirm the relative robustness of nucleus and vocalic offglide cues and can predict reasonably well listeners' error patterns in all experimental conditions, though performance for /bVb/ syllables is somewhat worse than for isolated vowels. The second experiment involves the use of simplified synthetic stimuli, lacking consonantal transitions, which are shown to provide information that is nearly equivalent phonetically to that of the natural silent center /bVb/ syllables (from which the target measurements were extracted). Although no conclusions are drawn about other contexts, for speakers of Western Canadian English coarticulatory cues appear to play at best a minor role in the perception of vowels in /bVb/ context, while vowel-inherent factors dominate listeners' perception.
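The "statistical pattern recognition models" mentioned above classify a token from formant measurements taken at the nucleus and the offglide. The sketch below is a much-simplified nearest-reference stand-in, not the authors' models; the reference formant values are invented for illustration:

```python
import numpy as np

def classify_vowel(token, references):
    """Nearest-reference classification from nucleus and offglide formant
    measurements (F1, F2 at two time points); illustrative only."""
    t = np.asarray(token, dtype=float)
    dists = {v: np.linalg.norm(t - np.asarray(r, dtype=float)) for v, r in references.items()}
    return min(dists, key=dists.get)

# (nucleus F1, nucleus F2, offglide F1, offglide F2) in Hz -- invented reference values
refs = {"i": (300, 2300, 320, 2250), "e": (450, 2100, 380, 2200), "a": (750, 1200, 700, 1300)}
print(classify_vowel((460, 2080, 390, 2180), refs))   # -> "e"
```

Because the classifier uses only vowel-inherent measurements (nucleus plus offglide), its ability to reproduce listeners' error patterns is the kind of evidence cited above against a major role for coarticulatory cues in this context.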

19.
This paper investigated how foreign-accented stress cues affect on-line speech comprehension in British speakers of English. While unstressed English vowels are usually reduced to /ə/, Dutch speakers of English only slightly centralize them. Speakers of both languages differentiate stress by suprasegmentals (duration and intensity). In a cross-modal priming experiment, English listeners heard sentences ending in monosyllabic prime fragments--produced by either an English or a Dutch speaker of English--and performed lexical decisions on visual targets. Primes were either stress-matching ("ab" excised from absurd), stress-mismatching ("ab" from absence), or unrelated ("pro" from profound) with respect to the target (e.g., ABSURD). Results showed a priming effect for stress-matching primes only when produced by the English speaker, suggesting that vowel quality is a more important cue to word stress than suprasegmental information. Furthermore, for visual targets with word-initial secondary stress that do not require vowel reduction (e.g., CAMPAIGN), resembling the Dutch way of realizing stress, there was a priming effect for both speakers. Hence, our data suggest that Dutch-accented English is not harder to understand in general, but it is in instances where the language-specific implementation of lexical stress differs across languages.

20.
