Similar Documents
20 similar documents found.
1.
The conditions under which listeners do and do not compensate for coarticulatory vowel nasalization were examined through a series of experiments on listeners' perception of naturally produced American English oral and nasal vowels spliced into three contexts: oral (C_C), nasal (N_N), and isolation. Two perceptual paradigms, a rating task in which listeners judged the relative nasality of stimulus pairs and a 4IAX discrimination task in which listeners judged vowel similarity, were used with two listener groups, native English speakers and native Thai speakers. Thai and English speakers were chosen because their languages differ in the temporal extent of anticipatory vowel nasalization. Listeners' responses were highly context dependent. For both perceptual paradigms and both language groups, listeners were less accurate at judging vowels in nasal than in non-nasal (oral or isolation) contexts; nasal vowels in nasal contexts were the most difficult to judge. Response patterns were generally consistent with the hypothesis that, given an appropriate and detectable nasal consonant context, listeners compensate for contextual vowel nasalization and attribute the acoustic effects of the nasal context to their coarticulatory source. However, the results also indicated that listeners do not hear nasal vowels in nasal contexts as oral; listeners retained some sensitivity to vowel nasalization in all contexts, indicating partial compensation for coarticulatory vowel nasalization. Moreover, there were small but systematic differences between the native Thai- and native English-speaking groups. These differences are as expected if perceptual compensation is partial and the extent of compensation is linked to patterns of coarticulatory nasalization in the listeners' native language.

2.
For each of five vowels [i e a o u] following [t], a continuum from non-nasal to nasal was synthesized. Nasalization was introduced by inserting a pole-zero pair in the vicinity of the first formant in an all-pole transfer function. The frequencies and spacing of the pole and zero were systematically varied to change the degree of nasalization. The selection of stimulus parameters was determined from acoustic theory and the results of pilot experiments. The stimuli were presented for identification and discrimination to listeners whose language included a non-nasal/nasal vowel opposition (Gujarati, Hindi, and Bengali) and to American listeners. There were no significant differences between language groups in the 50% crossover points of the identification functions. Some vowels were more influenced by range and context effects than were others. The language groups showed some differences in the shape of the discrimination functions for some vowels. On the basis of the results, it is postulated that (1) there is a basic acoustic property of nasality, independent of the vowel, to which the auditory system responds in a distinctive way regardless of language background; and (2) there are one or more additional acoustic properties that may be used to various degrees in different languages to enhance the contrast between a nasal vowel and its non-nasal congener. A proposed candidate for the basic acoustic property is a measure of the degree of prominence of the spectral peak in the vicinity of the first formant. Additional secondary properties include shifts in the center of gravity of the low-frequency spectral prominence, leading to a change in perceived vowel height, and changes in overall spectral balance.
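As an illustration of the synthesis technique described here, a pole-zero pair can be inserted near F1 of an all-pole transfer function. This is a minimal sketch, not the authors' synthesizer; all formant frequencies, bandwidths, and the F0 value are invented placeholders.

```python
# Hedged sketch: nasalizing an all-pole vowel by adding a pole-zero pair
# near F1. All numeric values are illustrative, not from the study.
import numpy as np
from scipy import signal

fs = 10000  # sampling rate in Hz

def conj_pair(freq, bw):
    """z-plane complex-conjugate pair at a given frequency and bandwidth."""
    r = np.exp(-np.pi * bw / fs)
    w = 2 * np.pi * freq / fs
    return [r * np.exp(1j * w), r * np.exp(-1j * w)]

# Hypothetical oral [a]: F1-F4 (Hz) with bandwidths (Hz).
oral_poles = sum((conj_pair(f, b) for f, b in
                  [(700, 90), (1220, 110), (2600, 170), (3300, 250)]), [])

# Degree of nasalization: vary the spacing of the inserted pole and zero
# around F1 (here the zero sits below the inserted pole).
nasal_pole = conj_pair(950, 100)
nasal_zero = conj_pair(820, 100)

b, a = signal.zpk2tf(np.array(nasal_zero),
                     np.array(oral_poles + nasal_pole), 1.0)
b, a = b.real, a.real  # conjugate pairing makes these real up to rounding

# Simple impulse train as a stand-in for the glottal source (F0 = 120 Hz).
src = np.zeros(fs // 2)
src[:: fs // 120] = 1.0
vowel = signal.lfilter(b, a, src)
```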

3.
This study explored how across-talker differences influence non-native vowel perception. American English (AE) and Korean listeners were presented with recordings of 10 AE vowels in /bVd/ context. The stimuli were mixed with noise and presented for identification in a 10-alternative forced-choice task. The two listener groups heard recordings of the vowels produced by 10 talkers at three signal-to-noise ratios. Overall, the AE listeners identified the vowels 22% more accurately than the Korean listeners. There was a wide range of identification accuracy scores across talkers for both AE and Korean listeners. At each signal-to-noise ratio, the across-talker intelligibility scores were highly correlated for AE and Korean listeners. Acoustic analysis was conducted for two vowel pairs that exhibited variable accuracy across talkers for Korean listeners but high identification accuracy for AE listeners. Results demonstrated that Korean listeners' error patterns for these four vowels were strongly influenced by variability in vowel production that was within the normal range for AE talkers. These results suggest that non-native listeners are strongly influenced by across-talker variability, perhaps because of the difficulty they have forming native-like vowel categories.
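The across-talker correlation analysis can be pictured in a few lines; the per-talker scores below are invented, and only the shape of the computation follows the abstract.

```python
# Invented per-talker proportion-correct scores for the two listener groups
# at one SNR; the Pearson correlation across the 10 talkers mirrors the
# across-talker intelligibility analysis described above.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
ae_scores = rng.uniform(0.60, 0.95, size=10)                       # AE listeners
kr_scores = np.clip(ae_scores - 0.22 + rng.normal(0, 0.04, 10), 0, 1)  # Korean
r, p = stats.pearsonr(ae_scores, kr_scores)
print(f"across-talker intelligibility correlation: r = {r:.2f} (p = {p:.3f})")
```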

4.
There is extensive evidence that in the same phonetic environment the voice fundamental frequency (Fo) of vowels varies directly with vowel "height." This Fo difference between vowels could be caused by acoustic interaction between the first vowel formant and the vibrating vocal folds. Since higher vowels have lower first formants than low vowels, the acoustic interaction should be greatest for high vowels, whose first formant frequencies are closer in frequency to Fo. Ten speakers were recorded to see if acoustic interaction could cause the Fo differences. The consonant [m] was recorded in the utterances [umu] and [ama]. Although the formant structure of [m] in [umu] and [ama] should not differ significantly, the Fo of each [m] allophone was significantly different. However, the Fo of each [m] allophone did not differ significantly from the Fo of the following vowel. These results did not support acoustic interaction. However, it is quite reasonable to conclude that the Fo variation of [m] was caused by coarticulatory anticipation of the tongue and jaw for the following vowel. Another experiment is offered to help explain the physical causes of intrinsic vowel Fo. In this experiment, Fo lowering was found at the beginning of vowels following Arabic pharyngeal approximants. This finding indicates that the Fo of pharyngeal constricting vowels, e.g., [æ] and [a], might be lowered as a result of similar articulatory movements, viz. tongue compression and active pharyngeal constriction.

5.
In vowel perception, nasalization and height (the inverse of the first formant, F1) interact. This paper asks whether the interaction results from a sensory process, a decision mechanism, or both. Two experiments used vowels varying in height, degree of nasalization, and three other stimulus parameters: the frequency region of F1, the location of the nasal pole/zero complex relative to F1, and whether a consonant following the vowel was oral or nasal. A fixed-classification experiment, designed to estimate basic sensitivity between stimuli, measured accuracy for discriminating stimuli differing in F1, in nasalization, and on both dimensions. A configuration derived by a multidimensional scaling analysis revealed a perceptual interaction that was stronger for stimuli in which the nasal pole/zero complex was below rather than above the oral pole, and that was present before both nasal and oral consonants. Phonetic identification experiments, designed to measure trading relations between the two dimensions, required listeners to identify height and nasalization in vowels varying in both. Judgments of nasalization depended on F1 as well as on nasalization, whereas judgments of height depended primarily on F1, and on nasalization more when the nasal complex was below than above the oral pole. This pattern was interpreted as a decision-rule interaction that is distinct from the interaction in basic sensitivity. Final-consonant nasality had little effect in the classification experiment; in the identification experiment, nasal judgments were more likely when the following consonant was nasal.
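Deriving a perceptual configuration from discrimination accuracy is the standard multidimensional-scaling move; here is a minimal sketch, assuming a precomputed matrix of perceptual distances (the toy numbers are invented).

```python
# Minimal MDS sketch: 'dist' would hold pairwise perceptual distances
# (e.g., derived from d' scores) between the vowel stimuli.
import numpy as np
from sklearn.manifold import MDS

dist = np.array([[0.0, 1.2, 2.0],
                 [1.2, 0.0, 1.1],
                 [2.0, 1.1, 0.0]])   # toy 3-stimulus example

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
config = mds.fit_transform(dist)     # 2-D coordinates, one row per stimulus
```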

6.
In this study, vocal tract area functions for one American English speaker, recorded using magnetic resonance imaging, were used to simulate and analyze the acoustics of vowel nasalization. Computer vocal tract models and susceptance plots were used to study the three most important sources of acoustic variability involved in the production of nasalized vowels: velar coupling area, asymmetry of the nasal passages, and the sinus cavities. Analysis of the susceptance plots of the pharyngeal and oral cavities, -(B(p)+B(o)), and the nasal cavity, B(n), helped in understanding the movement of poles and zeros with varying coupling areas. Simulations using two nasal passages clearly showed the introduction of extra pole-zero pairs due to the asymmetry between the passages. Simulations with the inclusion of maxillary and sphenoidal sinuses showed that each sinus can potentially introduce one pole-zero pair in the spectrum. Further, the right maxillary sinus introduced a pole-zero pair at the lowest frequency. The effective frequencies of these poles and zeros due to the sinuses in the sum of the oral and nasal cavity outputs change with a change in the configuration of the oral cavity, which may happen due to a change in the coupling area or in the vowel being articulated.
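The susceptance analysis can be sketched numerically: model each cavity as a uniform tube and locate coupled-system poles where the curve -(B(p)+B(o)) crosses B(n). The tube areas and lengths below are invented, and real tracts have non-uniform area functions.

```python
# Uniform-tube susceptance sketch (invented dimensions). Poles of the
# coupled system lie where -(B_p + B_o) = B_n.
import numpy as np

c = 354.0      # speed of sound in warm, moist air (m/s)
rho_c = 404.0  # rho * c for body-temperature air (kg / (m^2 s))

def b_closed(f, area, length):
    """Input susceptance of a uniform tube rigidly closed at the far end."""
    k = 2 * np.pi * f / c
    return (area / rho_c) * np.tan(k * length)

def b_open(f, area, length):
    """Input susceptance of a uniform tube ideally open at the far end."""
    k = 2 * np.pi * f / c
    return -(area / rho_c) / np.tan(k * length)

f = np.linspace(60.0, 3000.0, 40000)
lhs = -(b_closed(f, 4.0e-4, 0.09) + b_open(f, 3.0e-4, 0.08))  # pharynx + oral
rhs = b_open(f, 2.0e-4, 0.115)                                # nasal tract

# Sign changes of (lhs - rhs) bracket the crossings; note that tan/cot
# singularities also flip sign, so real use must screen those out.
diff = lhs - rhs
poles = f[:-1][np.sign(diff[:-1]) != np.sign(diff[1:])]
```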

7.
The goal of this study is to investigate coarticulatory resistance and aggressiveness for the jaw in Catalan consonants and vowels and, more specifically, for the alveolopalatal nasal /ɲ/ and for dark /l/, for which there is little or no data on jaw position and coarticulation. Jaw movement data for symmetrical vowel-consonant-vowel sequences with the consonants /p, n, l, s, ʃ, ɲ, k/ and the vowels /i, a, u/ were recorded by three Catalan speakers with a midsagittal magnetometer. Data reveal that jaw height is greater for /s, ʃ/ than for /p, ɲ/, which is greater than for /n, l, k/ during the consonant, and for /i, u/ than for /a/ during the vowel. Differences in coarticulatory variability among consonants and vowels are inversely related to differences in jaw height, i.e., fricatives and high vowels are most resistant, and /n, l, k/ and the low vowel are least resistant. Moreover, coarticulation-resistant phonetic segments exert more prominent effects and, thus, are more aggressive than segments specified for a lower degree of coarticulatory resistance. Data are discussed in the light of the degree of articulatory constraint model of coarticulation.
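One crude way to operationalize the resistance measure implied here (my illustration, not the authors' statistic) is the inverse of a segment's cross-context variability in jaw height.

```python
# Illustrative resistance index: consonants whose mean jaw height varies
# little across vowel contexts count as more coarticulation-resistant.
import numpy as np

def resistance_index(jaw_heights_by_context):
    """Map {vowel context: [jaw heights in mm]} to a scalar index."""
    context_means = [np.mean(h) for h in jaw_heights_by_context.values()]
    return 1.0 / np.std(context_means)

# Toy, invented numbers for one consonant across /i a u/ contexts.
print(resistance_index({"i": [14.1, 14.3], "a": [13.8, 14.0], "u": [14.2, 14.1]}))
```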

8.
Four experiments explored the relative contributions of spectral content and phonetic labeling in effects of context on vowel perception. Two 10-step series of CVC syllables ([bVb] and [dVd]) varying acoustically in F2 midpoint frequency and varying perceptually in vowel height from [delta] to [epsilon] were synthesized. In a forced-choice identification task, listeners more often labeled vowels as [delta] in [dVd] context than in [bVb] context. To examine whether spectral content predicts this effect, nonspeech-speech hybrid series were created by appending 70-ms sine-wave glides following the trajectory of the CVC F2s to 60-ms members of a steady-state vowel series varying in F2 frequency. In addition, a second hybrid series was created by appending constant-frequency sine-wave tones equivalent in frequency to the CVC F2 onset/offset frequencies. Vowels flanked by frequency-modulated glides or steady-state tones modeling [dVd] were more often labeled as [delta] than were the same vowels surrounded by nonspeech modeling [bVb]. These results suggest that spectral content is important in understanding vowel context effects. A final experiment tested whether spectral content can modulate vowel perception when phonetic labeling remains intact. Voiceless consonants, with lower-amplitude, more diffuse spectra, were found to exert less of an influence on vowel perception than do their voiced counterparts. The data are discussed in terms of a general perceptual account of context effects in speech perception.
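A sketch of the hybrid-stimulus construction: a 70-ms sine glide tracks an F2-like trajectory and is spliced onto a steady-state vowel. The glide frequencies below are invented; only the durations come from the abstract.

```python
# Nonspeech-speech hybrid sketch: sine-wave glide + steady-state vowel.
import numpy as np

fs = 16000

def sine_glide(f_start, f_end, dur_s):
    """Sine whose instantaneous frequency moves linearly f_start -> f_end."""
    n = int(dur_s * fs)
    f_inst = np.linspace(f_start, f_end, n)
    phase = 2 * np.pi * np.cumsum(f_inst) / fs
    return np.sin(phase)

onset = sine_glide(1800.0, 1400.0, 0.070)   # [dVd]-like falling F2 (invented)
offset = sine_glide(1400.0, 1800.0, 0.070)
# vowel = ...  # a 60-ms steady-state vowel from a formant synthesizer
# stimulus = np.concatenate([onset, vowel, offset])
```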

9.
There is increasing evidence that fine articulatory adjustments are made by speakers to reinforce and sometimes counteract the acoustic consequences of nasality. However, it is difficult to attribute the acoustic changes in nasal vowel spectra to either oral cavity configuration or to velopharyngeal opening (VPO). This paper takes the position that it is possible to disambiguate the effects of VPO and oropharyngeal configuration on the acoustic output of the vocal tract by studying the position and movement of the tongue and lips during the production of oral and nasal vowels. This paper uses simultaneously collected articulatory, acoustic, and nasal airflow data during the production of all oral and phonemically nasal vowels in Hindi (four speakers) to understand the consequences of the movements of oral articulators on the spectra of nasal vowels. For Hindi nasal vowels, the tongue body is generally lowered for back vowels, fronted for low vowels, and raised for front vowels (with respect to their oral congeners). These movements are generally supported by accompanying changes in the vowel spectra. In Hindi, the lowering of back nasal vowels may have originally served to enhance the acoustic salience of nasality, but has since engendered a nasal vowel chain shift.

10.
Cross-language perception studies report influences of speech style and consonantal context on perceived similarity and discrimination of non-native vowels by inexperienced and experienced listeners. Detailed acoustic comparisons of distributions of vowels produced by native speakers of North German (NG), Parisian French (PF), and New York English (AE) in citation (di)syllables and in sentences (surrounded by labial and alveolar stops) are reported here. Results of within- and cross-language discriminant analyses reveal striking dissimilarities across languages in the spectral/temporal variation of coarticulated vowels. As expected, vocalic duration was most important in differentiating NG vowels; it did not contribute to PF vowel classification. Spectrally, NG long vowels showed little coarticulatory change, but back/low short vowels were fronted/raised in alveolar context. PF vowels showed greater coarticulatory effects overall; back and front rounded vowels were fronted, low and mid-low vowels were raised in both sentence contexts. AE mid to high back vowels were extremely fronted in alveolar contexts, with little change in mid-low and low long vowels. Cross-language discriminant analyses revealed varying patterns of spectral (dis)similarity across speech styles and consonantal contexts that could, in part, account for AE listeners' perception of German and French front rounded vowels, and "similar" mid-high to mid-low vowels.
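The within- and cross-language discriminant analyses follow a common recipe; here is a minimal sketch with scikit-learn, assuming a per-token feature vector (the feature set and the toy data are assumptions, not the study's).

```python
# Cross-language discriminant-analysis sketch: train on one language's
# vowel tokens, then see which native categories another language's
# tokens fall into. Feature columns (e.g., F1, F2, F3, duration) assumed.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def cross_language_classification(X_native, y_native, X_foreign):
    lda = LinearDiscriminantAnalysis()
    lda.fit(X_native, y_native)      # e.g., AE tokens and their vowel labels
    return lda.predict(X_foreign)    # e.g., NG or PF tokens

# Toy demonstration with random features.
rng = np.random.default_rng(0)
X_native = rng.normal(size=(40, 4))
y_native = ["i"] * 20 + ["u"] * 20
X_foreign = rng.normal(size=(5, 4))
print(cross_language_classification(X_native, y_native, X_foreign))
```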

11.
This study investigated the extent to which adult Japanese listeners' perceived phonetic similarity of American English (AE) and Japanese (J) vowels varied with consonantal context. Four AE speakers produced multiple instances of the 11 AE vowels in six syllabic contexts /b-b, b-p, d-d, d-t, g-g, g-k/ embedded in a short carrier sentence. Twenty-four native speakers of Japanese were asked to categorize each vowel utterance as most similar to one of 18 Japanese categories [five one-mora vowels, five two-mora vowels, plus /ei, ou/, and one-mora and two-mora vowels in palatalized-consonant CV syllables, C(j)a(a), C(j)u(u), C(j)o(o)]. They then rated the "category goodness" of the AE vowel to the selected Japanese category on a seven-point scale. None of the 11 AE vowels was assimilated unanimously to a single J response category in all context/speaker conditions; consistency in selecting a single response category ranged from 77% for /eɪ/ to only 32% for /æ/. Median ratings of category goodness for modal response categories were somewhat restricted overall, ranging from 5 to 3. Results indicated that temporal assimilation patterns (judged similarity to one-mora versus two-mora Japanese categories) differed as a function of the voicing of the final consonant, especially for the AE vowels /see text/. Patterns of spectral assimilation (judged similarity to the five J vowel qualities) of /see text/ also varied systematically with consonantal context and speakers. On the basis of these results, it was predicted that relative difficulty in the identification and discrimination of AE vowels by Japanese speakers would vary significantly as a function of the contexts in which they were produced and presented.

12.
Zhao Qinghua, Yang Junjie. Applied Acoustics (《应用声学》), 2021, 40(6): 937-945
To construct vowel acoustic space maps from nasalized vowels in forensic speaker identification, the oral and nasal formants of those vowels must first be distinguished accurately. In this paper, the formants of speech samples were edited on a computer speech workstation, and the resulting samples were used to build contrast groups for listening tests. The results show that attenuating the oral and the nasal formants produces characteristically different, regular changes in the speech, so the method can reliably determine which formant orders of a nasalized vowel are oral and which are nasal. The discrimination method established here, combining "formant editing" with "auditory perception," can provide a basis for identifying oral and nasal formants in models that study acoustic features through vowel acoustic space maps, in forensic speaker identification, speech perception, speech recognition, and related fields.
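A rough illustration of the formant-editing step: the paper used a dedicated forensic speech workstation, while this sketch merely attenuates a narrow band around a hypothetical candidate formant with an IIR notch filter.

```python
# Attenuate a candidate oral or nasal formant so listeners can compare
# the edited token against the original. Center frequency is hypothetical.
from scipy import signal

def attenuate_formant(x, fs, f_center, q=8.0):
    """Notch out a narrow band around f_center (Hz) in signal x."""
    b, a = signal.iirnotch(f_center, q, fs=fs)
    return signal.lfilter(b, a, x)
```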

13.
A significant body of evidence has accumulated indicating that vowel identification is influenced by spectral change patterns. For example, a large-scale study of vowel formant patterns showed substantial improvements in category separability when a pattern classifier was trained on multiple samples of the formant pattern rather than a single sample at steady state [J. Hillenbrand et al., J. Acoust. Soc. Am. 97, 3099-3111 (1995)]. However, in the earlier study all utterances were recorded in a constant /hVd/ environment. The main purpose of the present study was to determine whether a close relationship between vowel identity and spectral change patterns is maintained when the consonant environment is allowed to vary. Recordings were made of six men and six women producing eight vowels (see text) in isolation and in CVC syllables. The CVC utterances consisted of all combinations of seven initial consonants (/h,b,d,g,p,t,k/) and six final consonants (/b,d,g,p,t,k/). Formant frequencies for F1-F3 were measured every 5 ms during the vowel using an interactive editing tool. Results showed highly significant effects of phonetic environment. As with an earlier study of this type, particularly large shifts in formant patterns were seen for rounded vowels in alveolar environments [K. Stevens and A. House, J. Speech Hear. Res. 6, 111-128 (1963)]. Despite these context effects, substantial improvements in category separability were observed when a pattern classifier incorporated spectral change information. Modeling work showed that many aspects of listener behavior could be accounted for by a fairly simple pattern classifier incorporating F0, duration, and two discrete samples of the formant pattern.
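A hedged sketch of the kind of pattern classifier described (the abstract does not specify the exact model): a quadratic discriminant over F0, duration, and F1-F3 sampled at two points in the vowel. The feature layout and the 20%/80% sampling points are my reading, not confirmed by the source.

```python
# Vowel classifier over [F0, duration, F1/F2/F3 at two time points]: one
# possible reading of "two discrete samples of the formant pattern".
# Training data are assumed to be supplied elsewhere.
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

def train_vowel_classifier(X, y):
    """X: rows of [F0, dur, F1_a, F2_a, F3_a, F1_b, F2_b, F3_b]; y: labels."""
    qda = QuadraticDiscriminantAnalysis()
    qda.fit(X, y)
    return qda

# classifier = train_vowel_classifier(X_train, vowel_labels)
# predicted = classifier.predict(X_test)
```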

14.
15.
The goal of this study was to determine whether acoustic properties could be derived for English labial and alveolar nasal consonants that remain stable across vowel contexts, speakers, and syllable positions. In experiment I, critical band analyses were conducted of five tokens each of [m] and [n] followed by the vowels [i e a o u] spoken by three speakers. Comparison of the nature of the changes in the spectral patterns from the murmur to the release showed that, for labials, there was a greater change in energy in the region of Bark 5-7 relative to that of Bark 11-14, whereas, for alveolars, there was a greater change in energy from the murmur to the release in the region of Bark 11-14 relative to that of Bark 5-7. Quantitative analyses of each token indicated that over 89% of the utterances could be appropriately classified for place of articulation by comparing the proportion of energy change in these spectral regions. In experiment II, the spectral patterns of labial and alveolar nasals produced in the context of [s] + nasal ([m n]) + vowel ([i e a o u]) by two speakers were explored. The same analysis procedures were used as in experiment I. Eighty-four percent of the utterances were appropriately classified, although labial consonants were less consistently classified than in experiment I. The properties associated with nasal place of articulation found in this study are discussed in relation to those associated with place of articulation in stop consonants and are considered from the viewpoint of a more general theory of acoustic invariance.
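My reading of the energy-change metric, as a sketch: compare the murmur-to-release energy change in the Bark 5-7 region against the Bark 11-14 region. Windowing and the decision rule's exact form are assumptions.

```python
# Classify nasal place from the relative energy change in two Bark bands
# between a murmur frame and a release frame.
import numpy as np

def hz_to_bark(f):
    """Zwicker's approximation to the Bark scale."""
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def band_energy(frame, fs, lo, hi):
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    bark = hz_to_bark(np.fft.rfftfreq(len(frame), 1.0 / fs))
    return spec[(bark >= lo) & (bark < hi)].sum()

def nasal_place(murmur, release, fs):
    d_low = band_energy(release, fs, 5, 7) - band_energy(murmur, fs, 5, 7)
    d_high = band_energy(release, fs, 11, 14) - band_energy(murmur, fs, 11, 14)
    return "labial" if d_low > d_high else "alveolar"
```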

16.
There is limited documentation available on how sensorineurally hearing-impaired listeners use the various sources of phonemic information that are known to be distributed across time in the speech waveform. In this investigation, a group of normally hearing listeners and a group of sensorineurally hearing-impaired listeners (with and without the benefit of amplification) identified various consonant and vowel productions that had been systematically varied in duration. The consonants (presented in a /haCa/ environment) and the vowels (presented in a /bVd/ environment) were truncated in steps to eliminate various segments from the end of the stimulus. The results indicated that normally hearing listeners could extract more phonemic information, especially cues to consonant place, from the earlier occurring portions of the stimulus waveforms than could the hearing-impaired listeners. The use of amplification partially decreased the performance differences between the normally hearing listeners and the unaided hearing-impaired listeners. The results are relevant to current models of normal speech perception that emphasize the need for the listener to make phonemic identifications as quickly as possible.
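The truncation manipulation is straightforward to reproduce; a sketch with an assumed step size (the abstract does not give one):

```python
# Generate successively shorter versions of a stimulus by removing
# fixed-size segments from its end. step_ms is an assumption.
def truncated_versions(x, fs, step_ms=20):
    step = int(fs * step_ms / 1000)
    return [x[:n] for n in range(len(x), 0, -step)]  # full signal first
```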

17.
Formant dynamics in vowel nuclei contribute to vowel classification in English. This study examined listeners' ability to discriminate dynamic second formant transitions in synthetic high front vowels. Acoustic measurements were made from the nuclei (steady state and 20% and 80% of vowel duration) for the vowels /i, ɪ, e, ɛ, æ/ spoken by a female in /bVd/ context. Three synthesis parameters were selected to yield twelve discrimination conditions: initial frequency value for F2 (2525, 2272, or 2068 Hz), slope direction (rising or falling), and duration (110 or 165 ms). F1 frequency was roved. In the standard stimuli, F0 and F1-F4 were steady state. In the comparison stimuli, only F2 frequency varied linearly to reach a final frequency. Five listeners were tested under adaptive tracking to estimate the threshold for frequency extent, the minimal detectable difference in frequency between the initial and final F2 values, called ΔF extent. Analysis showed that initial F2 frequency and direction of movement for some F2 frequencies contributed to significant differences in ΔF extent. Results suggested that listeners attended to differences in the stimulus property of frequency extent (hertz), not formant slope (hertz/second). Formant extent thresholds were at least four times smaller than extents measured in the natural speech tokens, and 18 times smaller than for the diphthongized vowel /e/.
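Adaptive tracking of a threshold like the ΔF extent typically means a transformed staircase; below is a generic 2-down/1-up sketch. The rule, step size, and stopping criterion are assumptions, as the abstract does not specify them.

```python
# 2-down/1-up staircase converging near 70.7% correct on the tracked
# quantity, here the F2 frequency extent (Hz). 'respond' is a callback
# returning True when the listener discriminates correctly.
import numpy as np

def track_threshold(respond, start=100.0, step=8.0, n_reversals=8):
    delta, n_correct, last_move, reversals = start, 0, 0, []
    while len(reversals) < n_reversals:
        if respond(delta):
            n_correct += 1
            if n_correct < 2:
                continue          # need two in a row before stepping down
            n_correct, move = 0, -1
        else:
            n_correct, move = 0, +1
        if last_move and move != last_move:
            reversals.append(delta)   # direction flipped: record a reversal
        last_move = move
        delta = max(delta + move * step, 1.0)
    return float(np.mean(reversals[-6:]))  # mean of the last reversals
```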

18.
This study investigated the role of sensory feedback during the production of front vowels. A temporary aftereffect induced by tongue loading was employed to modify the somatosensory-based perception of tongue height. Following the removal of tongue loading, tongue height during vowel production was estimated by measuring the frequency of the first formant (F1) from the acoustic signal. In experiment 1, the production of front vowels following tongue loading was investigated either in the presence or absence of auditory feedback. With auditory feedback available, the tongue height of front vowels was not modified by the aftereffect of tongue loading. By contrast, speakers did not compensate for the aftereffect of tongue loading when they produced vowels in the absence of auditory feedback. In experiment 2, the characteristics of the masking noise were manipulated such that it masked energy either in the F1 region or in the region of the second and higher formants. The results showed that the adjustment of tongue height during the production of front vowels depended on information about F1 in the auditory feedback. These findings support the idea that speech goals include both auditory and somatosensory targets and that speakers are able to make use of information from both sensory modalities to maximize the accuracy of speech production.
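Estimating F1 from the acoustic signal is commonly done by LPC root-solving; this is a minimal sketch of that standard approach (the paper's exact measurement procedure is not specified in the abstract, and the order and search range are assumptions).

```python
# Rough F1 estimate: fit an LPC polynomial, take the lowest-frequency
# complex root in a plausible F1 range.
import numpy as np
import librosa

def estimate_f1(frame, fs, order=12):
    a = librosa.lpc(frame.astype(np.float64), order=order)
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]                  # keep upper half-plane
    freqs = np.sort(np.angle(roots) * fs / (2 * np.pi))
    candidates = freqs[(freqs > 200) & (freqs < 1100)]  # plausible F1 band
    return float(candidates[0]) if candidates.size else None
```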

19.
The contribution of the nasal murmur and the vocalic formant transitions to perception of the [m]-[n] distinction in utterance-initial position preceding [i,a,u] was investigated, extending the recent work of Kurowski and Blumstein [J. Acoust. Soc. Am. 76, 383-390 (1984)]. A variety of waveform-editing procedures were applied to syllables produced by six different talkers. Listeners' judgments of the edited stimuli confirmed that the nasal murmur makes a significant contribution to place of articulation perception. Murmur and transition information appeared to be integrated at a genuinely perceptual, not an abstract cognitive, level. This was particularly evident in [-i] context, where only the simultaneous presence of murmur and transition components permitted accurate place of articulation identification. The perceptual information seemed to be purely relational in this case. It also seemed to be context specific, since the spectral change from the murmur to the vowel onset did not follow an invariant pattern across front and back vowels.
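The waveform-editing manipulations can be pictured as simple splices at the murmur/vowel boundary; in the sketch below, the boundary indices are assumed to be hand-marked, and the function itself is illustrative rather than the study's procedure.

```python
# Cross-splice sketch: combine the nasal murmur of one token with the
# vocalic (transition) portion of another to probe their joint role.
import numpy as np

def cross_splice(murmur_token, transition_token, murmur_end, vowel_start):
    """murmur_end / vowel_start: hand-marked sample indices per token."""
    return np.concatenate([murmur_token[:murmur_end],
                           transition_token[vowel_start:]])
```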

20.