Similar Articles
20 similar articles found (search time: 15 ms)
1.
Cross-language perception studies report influences of speech style and consonantal context on perceived similarity and discrimination of non-native vowels by inexperienced and experienced listeners. Detailed acoustic comparisons of distributions of vowels produced by native speakers of North German (NG), Parisian French (PF) and New York English (AE) in citation (di)syllables and in sentences (surrounded by labial and alveolar stops) are reported here. Results of within- and cross-language discriminant analyses reveal striking dissimilarities across languages in the spectral/temporal variation of coarticulated vowels. As expected, vocalic duration was most important in differentiating NG vowels; it did not contribute to PF vowel classification. Spectrally, NG long vowels showed little coarticulatory change, but back/low short vowels were fronted/raised in alveolar context. PF vowels showed greater coarticulatory effects overall; back and front rounded vowels were fronted, and low and mid-low vowels were raised in both sentence contexts. AE mid to high back vowels were extremely fronted in alveolar contexts, with little change in mid-low and low long vowels. Cross-language discriminant analyses revealed varying patterns of spectral (dis)similarity across speech styles and consonantal contexts that could, in part, account for AE listeners' perception of German and French front rounded vowels, and "similar" mid-high to mid-low vowels.

2.
This paper seeks to characterize the nature, size, and range of acoustic amplitude variation in naturally produced coarticulated vowels in order to determine its potential contribution and relevance to vowel perception. The study is a partial replication and extension of the pioneering work by House and Fairbanks [J. Acoust. Soc. Am. 22, 105-113 (1953)], who reported large variation in vowel amplitude as a function of consonantal context. Eight American English vowels spoken by men and women were recorded in ten symmetrical CVC consonantal contexts. Acoustic amplitude measures included overall rms amplitude, amplitude of the rms peak along with its relative location in the CVC word, and the amplitudes of individual formants F1-F4 along with their frequencies. House and Fairbanks' amplitude results were not replicated: neither the overall rms nor the rms peak varied appreciably as a function of consonantal context. However, consonantal context was shown to significantly and systematically affect the amplitudes of individual formants at the vowel nucleus. These effects persisted in the auditory representation of the vowel signal. Auditory spectra showed that the pattern of spectral amplitude variation as a function of contextual effects may still be encoded and represented at early stages of processing by the peripheral auditory system.
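
As a rough illustration of two of the amplitude measures named above (overall rms amplitude, and the rms peak with its relative location in the word), here is a minimal Python sketch. The function names, the non-overlapping analysis window, and the dB reference of 1.0 are assumptions for illustration, not the study's procedure; formant-amplitude measures would additionally need spectral analysis and are not shown.

```python
import math

def rms_db(samples, ref=1.0):
    """Overall RMS amplitude of a list of samples, in dB re `ref`."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # Floor avoids log10(0) for all-zero (silent) frames.
    return 20.0 * math.log10(max(rms, 1e-12) / ref)

def rms_peak(samples, win):
    """Peak short-term RMS level (dB) and its relative location (0..1) in the word.

    Uses non-overlapping windows of `win` samples (an assumed analysis choice);
    any trailing partial window is ignored.
    """
    starts = range(0, len(samples) - win + 1, win)
    levels = [rms_db(samples[i:i + win]) for i in starts]
    k = max(range(len(levels)), key=levels.__getitem__)
    center = k * win + win / 2  # center of the loudest window, in samples
    return levels[k], center / len(samples)

# A toy "CVC" envelope: quiet consonant, loud vowel, quiet consonant.
word = [0.1] * 100 + [1.0] * 100 + [0.1] * 100
print(round(rms_db(word), 2))   # overall level in dB
print(rms_peak(word, 100))      # loudest window level and where it sits
```

A real analysis would use overlapping, tapered windows on sampled audio; the fixed-window version keeps the two measures easy to compare.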

3.
Static, dynamic, and relational properties in vowel perception (total citations: 2; self-citations: 0; citations by others: 2)
The present work reviews theories and empirical findings, including results from two new experiments, that bear on the perception of English vowels, with an emphasis on the comparison of data analytic "machine recognition" approaches with results from speech perception experiments. Two major sources of variability (viz., speaker differences and consonantal context effects) are addressed from the classical perspective of overlap between vowel categories in F1 x F2 space. Various approaches to the reduction of this overlap are evaluated. Two types of speaker normalization are considered. "Intrinsic" methods based on relationships among the steady-state properties (F0, F1, F2, and F3) within individual vowel tokens are contrasted with "extrinsic" methods, involving the relationships among the formant frequencies of the entire vowel system of a single speaker. Evidence from a new experiment supports Ainsworth's (1975) conclusion [W. Ainsworth, Auditory Analysis and Perception of Speech (Academic, London, 1975)] that both types of information have a role to play in perception. The effects of consonantal context on formant overlap are also considered. A new experiment is presented that extends Lindblom and Studdert-Kennedy's finding [B. Lindblom and M. Studdert-Kennedy, J. Acoust. Soc. Am. 43, 840-843 (1967)] of perceptual effects of consonantal context on vowel perception to /dVd/ and /bVb/ contexts. Finally, the role of vowel-inherent dynamic properties, including duration and diphthongization, is briefly reviewed. All of the above factors are shown to have reliable influences on vowel perception, although the relative weight of such effects and the circumstances that alter these weights remain far from clear. It is suggested that the design of more complex perceptual experiments, together with the development of quantitative pattern recognition models of human vowel perception, will be necessary to resolve these issues.
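
To make the intrinsic/extrinsic distinction concrete, the sketch below pairs one common example of each. These are swapped-in techniques, not necessarily the ones evaluated in the review: the intrinsic cues are Bark differences computed within a single token (in the style of Syrdal and Gopal), and the extrinsic method is Lobanov-style z-scoring over one speaker's whole vowel set. The Traunmüller Hz-to-Bark formula and the formant values are assumptions for illustration.

```python
import statistics

def hz_to_bark(f):
    """Traunmüller (1990) Hz-to-Bark approximation (an assumed choice of scale)."""
    return 26.81 * f / (1960.0 + f) - 0.53

def intrinsic_cues(f0, f1, f2, f3):
    """Token-internal relations: Bark differences Z1-Z0 and Z3-Z2."""
    z0, z1, z2, z3 = (hz_to_bark(f) for f in (f0, f1, f2, f3))
    return z1 - z0, z3 - z2

def extrinsic_normalize(tokens):
    """Lobanov-style z-scores of (F1, F2) over one speaker's whole vowel set."""
    f1s = [f1 for f1, _ in tokens]
    f2s = [f2 for _, f2 in tokens]
    m1, s1 = statistics.mean(f1s), statistics.stdev(f1s)
    m2, s2 = statistics.mean(f2s), statistics.stdev(f2s)
    return [((f1 - m1) / s1, (f2 - m2) / s2) for f1, f2 in tokens]

speaker = [(300, 2300), (700, 1200), (500, 1700)]  # hypothetical /i/, /a/, /e/ tokens
print(extrinsic_normalize(speaker))
print(intrinsic_cues(120, 300, 2300, 3000))
```

The extrinsic method needs the speaker's whole system before any one token can be normalized, while the intrinsic cues are available from a single token; that asymmetry is the crux of the comparison described above. Uniformly scaling a speaker's formants leaves the z-scores unchanged, which is why such methods reduce speaker-size overlap.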

4.
Dynamic specification of coarticulated vowels (total citations: 1; self-citations: 0; citations by others: 1)
An adequate theory of vowel perception must account for perceptual constancy over variations in the acoustic structure of coarticulated vowels contributed by speakers, speaking rate, and consonantal context. We modified recorded consonant-vowel-consonant syllables electronically to investigate the perceptual efficacy of three types of acoustic information for vowel identification: (1) static spectral "targets," (2) duration of syllabic nuclei, and (3) formant transitions into and out of the vowel nucleus. Vowels in /b/-vowel-/b/ syllables spoken by one adult male (experiment 1) and by two females and two males (experiment 2) served as the corpus, and seven modified syllable conditions were generated in which different parts of the digitized waveforms of the syllables were deleted and the temporal relationships of the remaining parts were manipulated. Results of identification tests by untrained listeners indicated that dynamic spectral information, contained in initial and final transitions taken together, was sufficient for accurate identification of vowels even when vowel nuclei were attenuated to silence. Furthermore, the dynamic spectral information appeared to be efficacious even when durational parameters specifying intrinsic vowel length were eliminated.

5.
The amount of acoustic information that native and non-native listeners need for syllable identification was investigated by comparing the performance of monolingual English speakers and native Spanish speakers with either an earlier or a later age of immersion in an English-speaking environment. Duration-preserved silent-center syllables retaining 10, 20, 30, or 40 ms of the consonant-vowel and vowel-consonant transitions were created for the target vowels /i, ɪ, eɪ, ɛ, æ/ and /a/, spoken by two males in /bVb/ context. Duration-neutral syllables were created by editing the silent portion to equate the duration of all vowels. Listeners identified the syllables in a six-alternative forced-choice task. The earlier learners identified the whole-word and 40 ms duration-preserved syllables as accurately as the monolingual listeners, but identified the silent-center syllables significantly less accurately overall. Only the monolingual listener group identified syllables significantly more accurately in the duration-preserved than in the duration-neutral condition, suggesting that the non-native listeners were unable to recover from the syllable disruption sufficiently to access the duration cues in the silent-center syllables. This effect was most pronounced for the later learners, who also showed the most vowel confusions and the greatest decrease in performance from the whole word to the 40 ms transition condition.
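
The silent-center manipulation described above can be sketched on a list of samples as follows. This is a simplified illustration under stated assumptions: the vowel boundaries `v_start`/`v_end` are taken as already hand-labeled, the retained stretch is measured inward from those boundaries, and the function and parameter names are hypothetical rather than the authors' editing procedure.

```python
def silent_center(samples, fs, v_start, v_end, keep_ms, neutral_gap=None):
    """Replace the vowel center with silence, keeping `keep_ms` of each transition.

    samples: the whole /bVb/ token; fs: sampling rate in Hz.
    v_start, v_end: sample indices bounding the vocalic portion (assumed given).
    neutral_gap: if None, the silence fills exactly the deleted stretch
    (duration-preserved); otherwise it is set to this fixed number of samples
    for every vowel (duration-neutral).
    """
    keep = round(fs * keep_ms / 1000)
    deleted = (v_end - v_start) - 2 * keep
    if deleted < 0:
        raise ValueError("vowel shorter than the two retained transitions")
    head = samples[:v_start + keep]   # initial consonant + CV transition
    tail = samples[v_end - keep:]     # VC transition + final consonant
    gap = deleted if neutral_gap is None else neutral_gap
    return head + [0.0] * gap + tail

# Toy token at 1 kHz (1 sample per ms): vowel spans samples 20..79.
token = [float(i) for i in range(1, 101)]
preserved = silent_center(token, 1000, 20, 80, keep_ms=10)
neutral = silent_center(token, 1000, 20, 80, keep_ms=10, neutral_gap=25)
print(len(preserved), len(neutral))
```

In the duration-preserved version the edited token keeps its original length, so intrinsic vowel duration is still cued by the gap; the duration-neutral version removes that cue by giving every vowel the same gap.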

6.
Amplitude change at consonantal release has been proposed as an invariant acoustic property distinguishing between the classes of stops and glides [Mack and Blumstein, J. Acoust. Soc. Am. 73, 1739-1750 (1983)]. Following procedures of Mack and Blumstein, we measured the amplitude change in the vicinity of the consonantal release for two speakers. The results for one speaker matched those of Mack and Blumstein, while those for the second speaker showed some differences. In a subsequent experiment, we tested the hypothesis that a difference in amplitude change serves as an invariant perceptual cue for distinguishing between continuants and noncontinuants, and more specifically, as a critical cue for identifying stops and glides [Shinn and Blumstein, J. Acoust. Soc. Am. 75, 1243-1252 (1984)]. Interchanging the amplitude envelopes of natural /bV/ and /wV/ syllables containing the same vowel had little effect on perception: 97% of all syllables were identified as originally produced. Thus, although amplitude change in the vicinity of consonantal release may distinguish acoustically between stops and glides with some consistency, the change is not fully invariant, and certainly does not seem to be a critical perceptual cue in natural speech.

7.
The conditions under which listeners do and do not compensate for coarticulatory vowel nasalization were examined through a series of experiments on listeners' perception of naturally produced American English oral and nasal vowels spliced into three contexts: oral (C_C), nasal (N_N), and isolation. Two perceptual paradigms, a rating task in which listeners judged the relative nasality of stimulus pairs and a 4IAX discrimination task in which listeners judged vowel similarity, were used with two listener groups, native English speakers and native Thai speakers. Thai and English speakers were chosen because their languages differ in the temporal extent of anticipatory vowel nasalization. Listeners' responses were highly context dependent. For both perceptual paradigms and both language groups, listeners were less accurate at judging vowels in nasal than in non-nasal (oral or isolation) contexts; nasal vowels in nasal contexts were the most difficult to judge. Response patterns were generally consistent with the hypothesis that, given an appropriate and detectable nasal consonant context, listeners compensate for contextual vowel nasalization and attribute the acoustic effects of the nasal context to their coarticulatory source. However, the results also indicated that listeners do not hear nasal vowels in nasal contexts as oral; listeners retained some sensitivity to vowel nasalization in all contexts, indicating partial compensation for coarticulatory vowel nasalization. Moreover, there were small but systematic differences between the native Thai- and native English-speaking groups. These differences are as expected if perceptual compensation is partial and the extent of compensation is linked to patterns of coarticulatory nasalization in the listeners' native language.

8.
The purpose of this study was to provide data on the ability of elderly listeners to estimate the age group of women (25-35, 45-55, 70-80) from phonated and whispered vowel productions. Further, comparisons were made between the performance of these elderly listeners and results for young listeners reported previously [S. E. Linville and H. Fisher, J. Acoust. Soc. Am. 78, 40-48 (1985)]. Tape recordings of whispered and normally phonated /ae/ vowels were played to 23 elderly women for relative age judgments. Results suggest that elderly women are not as accurate as young women in estimating age from sustained vowel productions, although the two listener groups tend to categorize individual speakers similarly. Further, listener age appears to influence the acoustic cues used in making age judgments.

9.
Multichannel cochlear implant users vary greatly in their word-recognition abilities. This study examined whether their word recognition was related to the use of either highly dynamic or relatively steady-state vowel cues contained in /bVb/ and /wVb/ syllables. Nine conditions were created containing different combinations of formant transition, steady-state, and duration cues. Because processor strategies differ, the ability to perceive static and dynamic information may depend on the type of cochlear implant used. Ten Nucleus and ten Ineraid subjects participated, along with 12 normal-hearing control subjects. Vowel identification did not differ between implanted groups, but both were significantly poorer at identifying vowels than the normal-hearing group. Vowel identification was best when at least two kinds of cues were available. When only one type of cue was available, performance was better with excised vowels containing steady-state formants than in "vowelless" syllables, where the center vocalic portion was deleted and transitions were joined. In the latter syllable type, Nucleus subjects identified vowels significantly better when /b/ was the initial consonant; the other two groups were not affected by specific consonantal context. Cochlear implant subjects' word recognition was positively correlated with the use of dynamic vowel cues, but not with steady-state cues.

10.
F1 structure provides information for final-consonant voicing (total citations: 1; self-citations: 0; citations by others: 1)
Previous research has shown that F1 offset frequencies are generally lower for vowels preceding voiced consonants than for vowels preceding voiceless consonants. Furthermore, it has been shown that listeners use these differences in offset frequency in making judgments about final-consonant voicing. A recent production study [W. Summers, J. Acoust. Soc. Am. 82, 847-863 (1987)] reported that F1 frequency differences due to postvocalic voicing are not limited to the final transition or offset region of the preceding vowel. Vowels preceding voiced consonants showed lower F1 onset frequencies and lower F1 steady-state frequencies than vowels preceding voiceless consonants. The present study examined whether F1 frequency differences in the initial transition and steady-state regions of preceding vowels affect final-consonant voicing judgments in perception. The results suggest that F1 frequency differences in these early portions of preceding vowels do, in fact, influence listeners' judgments of postvocalic consonantal voicing.

11.
This study examined the effects of mild-to-moderate sensorineural hearing loss on vowel perception abilities of young, hearing-impaired (YHI) adults. Stimuli were presented at a low conversational level with a flat frequency response (approximately 60 dB SPL), and in two gain conditions: (a) high level gain with a flat frequency response (95 dB SPL), and (b) frequency-specific gain shaped according to each listener's hearing loss (designed to simulate the frequency response provided by a linear hearing aid to an input signal of 60 dB SPL). Listeners discriminated changes in the vowels /ɪ e ɛ ʌ æ/ when F1 or F2 varied, and later categorized the vowels. YHI listeners performed better in the two gain conditions than in the conversational level condition. Performances in the two gain conditions were similar, suggesting that upward spread of masking was not seen at these signal levels for these tasks. Results were compared with those from a group of elderly, hearing-impaired (EHI) listeners, reported in Coughlin, Kewley-Port, and Humes [J. Acoust. Soc. Am. 104, 3597-3607 (1998)]. Comparisons revealed no significant differences between the EHI and YHI groups, suggesting that hearing impairment, not age, is the primary contributor to decreased vowel perception in these listeners.

12.
Listeners' ability to understand speech in adverse listening conditions is partially due to the redundant nature of speech. Natural redundancies are often lost or altered when speech is filtered, as is done in AI/SII experiments. It is important to study how listeners recognize speech when the speech signal is unfiltered and the entire broadband spectrum is present. A correlational method [R. A. Lutfi, J. Acoust. Soc. Am. 97, 1333-1334 (1995); V. M. Richards and S. Zhu, J. Acoust. Soc. Am. 95, 423-424 (1994)] has been used to determine how listeners use spectral cues to perceive nonsense syllables when the full speech spectrum is present [K. A. Doherty and C. W. Turner, J. Acoust. Soc. Am. 100, 3769-3773 (1996); C. W. Turner et al., J. Acoust. Soc. Am. 104, 1580-1585 (1998)]. The experiments in this study measured spectral-weighting strategies for more naturally occurring speech stimuli, specifically sentences, using a correlational method for normal-hearing listeners. Results indicate that listeners placed the greatest weight on spectral information within band 2 (562-1113 Hz) and band 5 (2807-11,000 Hz). Spectral-weighting strategies for sentences were also compared to weighting strategies for nonsense syllables measured in a previous study (C. W. Turner et al., 1998). Spectral-weighting strategies for sentences were different from those reported for nonsense syllables.
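
The core of a correlational weighting analysis can be sketched as: on each trial every frequency band receives an independent random perturbation, and a band's weight is the correlation between its perturbation and trial-by-trial correctness, normalized across bands. The sketch below is a generic illustration of that idea, not the exact procedure of the cited studies; the data layout, the use of a level perturbation, and normalization by the sum of correlations are assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation; with a 0/1 `y` this is the point-biserial correlation."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def band_weights(perturb, correct):
    """Relative spectral weights from trial-by-trial data.

    perturb[t][b]: perturbation (e.g., level change in dB) applied to band b on trial t.
    correct[t]: 1 if the listener responded correctly on trial t, else 0.
    Returns per-band correlations normalized to sum to 1.
    """
    nbands = len(perturb[0])
    r = [pearson([trial[b] for trial in perturb], correct) for b in range(nbands)]
    total = sum(r)
    return [ri / total for ri in r]

# Balanced toy design: 3 bands, each perturbed by -1 or +1; a simulated listener
# is correct only when band 1 (0-based) was boosted, i.e., relies on band 1 alone.
trials = [(a, b, c) for a in (-1, 1) for b in (-1, 1) for c in (-1, 1)]
correct = [1 if t[1] > 0 else 0 for t in trials]
print(band_weights(trials, correct))
```

With real data the trial count is much larger and the correlations are noisy, so weights are usually reported with some estimate of their variability.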

13.
Coarticulatory influences on the perceived height of nasal vowels (total citations: 1; self-citations: 0; citations by others: 1)
Certain of the complex spectral effects of vowel nasalization bear a resemblance to the effects of modifying the tongue or jaw position with which the vowel is produced. Perceptual evidence suggests that listener misperceptions of nasal vowel height arise as a result of this resemblance. Whereas previous studies examined isolated nasal vowels, this research focused on the role of phonetic context in shaping listeners' judgments of nasal vowel height. Identification data obtained from native American English speakers indicated that nasal coupling does not necessarily lead to listener misperceptions of vowel quality when the vowel's nasality is coarticulatory in nature. The perceived height of contextually nasalized vowels (in a [bVnd] environment) did not differ from that of oral vowels (in a [bVd] environment) produced with the same tongue-jaw configuration. In contrast, corresponding noncontextually nasalized vowels (in a [bVd] environment) were perceived as lower in quality than vowels in the other two conditions. Presumably the listeners' lack of experience with distinctive vowel nasalization prompted them to resolve the spectral effects of noncontextual nasalization in terms of tongue or jaw height, rather than velic height. The implications of these findings with respect to sound changes affecting nasal vowel height are also discussed.

14.
This study examines the neural representation of the vowel /ɛ/ in the auditory nerve of acoustically traumatized cats and asks whether spectral modifications of the vowel can restore a normal neural representation. Four variants of /ɛ/, which differed primarily in the frequency of the second formant (F2), were used as stimuli. Normally, the rate-place code provides a robust representation of F2 for these vowels, in the sense that rate changes encode changes in F2 frequency [Conley and Keilson, J. Acoust. Soc. Am. 98, 3223 (1995)]. This representation is lost after acoustic trauma [Miller et al., J. Acoust. Soc. Am. 105, 311 (1999)]. Here it is shown that an improved representation of the F2 frequency can be gained by a form of high-frequency emphasis that is determined by both the hearing-loss profile and the spectral envelope of the vowel. Essentially, the vowel was high-pass filtered so that the F2 and F3 peaks were amplified without amplifying frequencies in the trough between F1 and F2. This modification improved the quality of the rate and temporal tonotopic representations of the vowel and restored sensitivity to the F2 frequency. Although a completely normal representation was not restored, this method shows promise as an approach to hearing-aid signal processing.

15.
This study investigated the extent to which adult Japanese listeners' perceived phonetic similarity of American English (AE) and Japanese (J) vowels varied with consonantal context. Four AE speakers produced multiple instances of the 11 AE vowels in six syllabic contexts /b-b, b-p, d-d, d-t, g-g, g-k/ embedded in a short carrier sentence. Twenty-four native speakers of Japanese were asked to categorize each vowel utterance as most similar to one of 18 Japanese categories [five one-mora vowels, five two-mora vowels, plus /ei, ou/ and one-mora and two-mora vowels in palatalized consonant CV syllables, C(j)a(a), C(j)u(u), C(j)o(o)]. They then rated the "category goodness" of the AE vowel to the selected Japanese category on a seven-point scale. None of the 11 AE vowels was assimilated unanimously to a single J response category in all context/speaker conditions; consistency in selecting a single response category ranged from 77% for /eI/ to only 32% for /ae/. Median ratings of category goodness for modal response categories were somewhat restricted overall, ranging from 5 to 3. Results indicated that temporal assimilation patterns (judged similarity to one-mora versus two-mora Japanese categories) differed as a function of the voicing of the final consonant, especially for the AE vowels, /see text/. Patterns of spectral assimilation (judged similarity to the five J vowel qualities) of /see text/ also varied systematically with consonantal context and speakers. On the basis of these results, it was predicted that relative difficulty in the identification and discrimination of AE vowels by Japanese speakers would vary significantly as a function of the contexts in which they were produced and presented.
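
The two summary statistics reported above (consistency of the modal response category, and the median goodness rating for that category) can be computed per AE vowel roughly as follows. This is a minimal sketch assuming each trial is stored as a (Japanese category, goodness rating) pair; the data layout and function name are hypothetical, and the toy responses are invented for illustration.

```python
from collections import Counter
from statistics import median

def summarize(trials):
    """Summarize one AE vowel's assimilation responses.

    trials: list of (japanese_category, goodness_rating) pairs, ratings on a 1-7 scale.
    Returns (modal category, proportion of trials choosing it,
             median goodness rating among modal-category trials).
    """
    counts = Counter(cat for cat, _ in trials)
    modal, n = counts.most_common(1)[0]
    ratings = [r for cat, r in trials if cat == modal]
    return modal, n / len(trials), median(ratings)

# Invented responses for a single AE vowel token set.
print(summarize([("e", 5), ("e", 6), ("ee", 3), ("e", 4)]))
```

Ties between equally frequent categories are broken by first occurrence in `Counter.most_common`, which a real analysis would want to handle explicitly.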

16.
A recent study [Smith and Patterson, J. Acoust. Soc. Am. 118, 3177-3186 (2005)] demonstrated that both the glottal-pulse rate (GPR) and the vocal-tract length (VTL) of vowel sounds have a large effect on the perceived sex and age (or size) of a speaker. The vowels for all of the "different" speakers in that study were synthesized from recordings of the sustained vowels of one adult male speaker. This paper presents a follow-up study in which a range of vowels were synthesized from recordings of four different speakers (an adult man, an adult woman, a young boy, and a young girl) to determine whether the sex and age of the original speaker would have an effect upon listeners' judgments of whether a vowel was spoken by a man, woman, boy, or girl, after they were equated for GPR and VTL. The sustained vowels of the four speakers were scaled to produce the same combinations of GPR and VTL, which covered the entire range normally encountered in everyday life. The results show that listeners readily distinguish children from adults based on their sustained vowels but that they struggle to distinguish the sex of the speaker.
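
Why VTL scaling moves the whole spectral envelope can be seen from the textbook uniform-tube approximation: a tube closed at the glottis and open at the lips resonates at F_k = (2k - 1)c / 4L, so every formant scales by the same factor 1/L, independently of GPR (F0). The sketch below is only that approximation, not the synthesis method used in the study; the speed-of-sound constant and the example lengths are assumptions.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # approx. value in warm, humid air; an assumed constant

def tube_formants(vtl_cm, n=3, c=SPEED_OF_SOUND_CM_S):
    """First n resonances of a uniform tube closed at one end: F_k = (2k - 1) c / (4 L)."""
    return [(2 * k - 1) * c / (4.0 * vtl_cm) for k in range(1, n + 1)]

def rescale_for_vtl(formants, vtl_from, vtl_to):
    """Scale a formant pattern to a new VTL; all formants shift by the same ratio."""
    return [f * vtl_from / vtl_to for f in formants]

adult = tube_formants(17.5)                  # a commonly cited adult-male length
child = rescale_for_vtl(adult, 17.5, 12.0)  # hypothetical shorter tract
print(adult)
print(child)
```

Because formant ratios are preserved, vowel identity survives the scaling while the absolute frequencies carry the size information the listeners in these studies responded to.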

17.
The goal of this study was to establish the ability of normal-hearing listeners to discriminate formant frequency in vowels in everyday speech. Vowel formant discrimination in syllables, phrases, and sentences was measured for high-fidelity (nearly natural) speech synthesized by STRAIGHT [Kawahara et al., Speech Commun. 27, 187-207 (1999)]. Thresholds were measured for changes in F1 and F2 for the vowels /ɪ, ɛ, æ, ʌ/ in /bVd/ syllables. Experimental factors manipulated included phonetic context (syllables, phrases, and sentences), sentence discrimination with the addition of an identification task, and word position. Results showed that neither longer phonetic context nor the addition of the identification task significantly affected thresholds, while thresholds for word final position showed significantly better performance than for either initial or middle position in sentences. Results suggest that an average of 0.37 barks is required for normal-hearing listeners to discriminate vowel formants in modest length sentences, elevated by 84% compared to isolated vowels. Vowel formant discrimination in several phonetic contexts was slightly elevated for STRAIGHT-synthesized speech compared to formant-synthesized speech stimuli reported in the study by Kewley-Port and Zheng [J. Acoust. Soc. Am. 106, 2945-2958 (1999)]. These elevated thresholds appeared related to greater spectral-temporal variability for high-fidelity speech produced by STRAIGHT than for formant-synthesized speech.
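
To relate the 0.37-Bark threshold to Hz, a Hz-to-Bark conversion is needed. The sketch below assumes the Traunmüller (1990) formula and its exact inverse; the study may have used a different conversion, so the printed Hz values are only indicative. It also reads "elevated by 84%" as 0.37 = 1.84 times the isolated-vowel threshold, which implies roughly 0.20 Bark for isolated vowels.

```python
def hz_to_bark(f):
    """Traunmüller (1990) approximation; the study may have used another formula."""
    return 26.81 * f / (1960.0 + f) - 0.53

def bark_to_hz(z):
    """Exact inverse of hz_to_bark."""
    return 1960.0 * (z + 0.53) / (26.28 - z)

def threshold_hz(f_ref, delta_bark):
    """Frequency change (Hz) above f_ref that corresponds to delta_bark."""
    return bark_to_hz(hz_to_bark(f_ref) + delta_bark) - f_ref

sentence_thr = 0.37                 # Bark, from the abstract
isolated_thr = sentence_thr / 1.84  # assumed reading of "elevated by 84%"
for f in (500, 1500, 2500):         # illustrative formant frequencies in Hz
    print(f, round(threshold_hz(f, sentence_thr), 1))
```

A constant threshold in Bark corresponds to a threshold in Hz that grows with formant frequency, which is why Bark units are convenient for comparing F1 and F2 results.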

18.
Perceived breathiness in vowels depends on multiple acoustic cues, including changes in aspiration noise (AH) and the open quotient (OQ) [Klatt and Klatt, J. Acoust. Soc. Am. 87(2), 820-857 (1990)]. A loudness model can be used to determine the extent to which AH masks the harmonic components in voice. The resulting "partial loudness" (PL) and loudness of AH ["noise loudness" (NL)] have been shown to be good predictors of perceived breathiness [Shrivastav and Sapienza, J. Acoust. Soc. Am. 114(1), 2217-2224 (2003)]. The levels of AH and OQ were systematically manipulated for ten synthetic vowels. Perceptual judgments of breathiness were obtained and regression functions to predict breathiness from the ratio of NL to PL (η) were derived. Results show that breathiness can be modeled as a power function of η. The power parameter of this function appears to be affected by the fundamental frequency of the vowel. A second experiment was conducted to determine if the resulting power function could estimate breathiness in a different set of voices. The breathiness of these stimuli, both natural and synthetic, was determined in a listening test. The model estimates of breathiness were highly correlated with perceptual data but the absolute predicted values showed some discrepancies.
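
A power function B = a·η^b, with η = NL/PL as defined above, becomes a straight line in log-log coordinates, so a simple way to fit it is ordinary least squares on log B versus log η. The sketch below uses that approach on synthetic data; it is not necessarily the regression procedure used in the study, and the parameter values are invented.

```python
import math

def eta_ratio(noise_loudness, partial_loudness):
    """eta = NL / PL, as defined in the abstract."""
    return noise_loudness / partial_loudness

def fit_power(eta, breathiness):
    """Fit B = a * eta**b by least squares on log B versus log eta; returns (a, b).

    Requires all values > 0. Fitting in log space weights errors differently
    from a direct nonlinear fit; the study's exact regression is not specified.
    """
    xs = [math.log(e) for e in eta]
    ys = [math.log(v) for v in breathiness]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = math.exp(my - b * mx)
    return a, b

# Synthetic, noise-free ratings generated with a = 3.0 and b = 0.6.
eta = [0.1, 0.2, 0.5, 1.0, 2.0]
ratings = [3.0 * e ** 0.6 for e in eta]
a, b = fit_power(eta, ratings)
print(a, b)
```

Fitting separate exponents b for vowels with different fundamental frequencies would be one way to examine the F0 dependence the abstract reports.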

19.
Monolingual Peruvian Spanish listeners identified natural tokens of the Canadian French (CF) and Canadian English (CE) /?/ and /?/, produced in five consonantal contexts. The results demonstrate that while the CF vowels were mapped to two different native vowels, /e/ and /a/, in all consonantal contexts, the CE contrast was mapped to the single native vowel /a/ in four out of five contexts. Linear discriminant analysis revealed that acoustic similarity between native and target language vowels was a very good predictor of context-specific perceptual mappings. Predictions are made for Spanish learners of the /?/-/?/ contrast in CF and CE.

20.
Naive listeners' perceptual assimilations of non-native vowels to first-language (L1) categories can predict difficulties in the acquisition of second-language vowel systems. This study demonstrates that listeners having two slightly different dialects as their L1s can differ in the perception of foreign vowels. Specifically, the study shows that Bohemian Czech and Moravian Czech listeners assimilate Dutch high front vowels differently to L1 categories. Consequently, the listeners are predicted to follow different paths in acquiring these Dutch vowels. These findings underscore the importance of carefully considering the specific dialect background of participants in foreign- and second-language speech perception studies.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417