Similar Literature
20 similar documents retrieved (search time: 406 ms)
1.
Acoustic and perceptual similarities between Japanese and American English (AE) vowels were investigated in two studies. In study 1, a series of discriminant analyses were performed to determine acoustic similarities between Japanese and AE vowels, each spoken by four native male speakers, using F1, F2, and vocalic duration as input parameters. In study 2, the Japanese vowels were presented to native AE listeners in a perceptual assimilation task, in which the listeners categorized each Japanese vowel token as most similar to an AE category and rated its goodness as an exemplar of the chosen AE category. Results showed that the majority of AE listeners assimilated all Japanese vowels into long AE categories, apparently ignoring temporal differences between 1- and 2-mora Japanese vowels. In addition, not all perceptual assimilation patterns reflected context-specific spectral similarity patterns established by discriminant analysis. It was hypothesized that this incongruity between acoustic and perceptual similarity may be due to differences in distributional characteristics of native and non-native vowel categories that affect the listeners' perceptual judgments.
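To make the kind of analysis in study 1 concrete, here is a minimal sketch of a discriminant analysis on F1, F2, and vocalic duration using scikit-learn; the category means, token counts, and Japanese test tokens are invented placeholders, not the study's data:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# Hypothetical AE vowel categories described by mean F1 (Hz), F2 (Hz), and
# duration (ms); a real analysis would use measured tokens, not simulated ones.
ae_means = {"i": [280, 2250, 240], "ae": [660, 1720, 260],
            "a": [720, 1100, 250], "u": [320, 900, 230]}

X, y = [], []
for label, mean in ae_means.items():
    X.append(rng.normal(loc=mean, scale=[30, 80, 20], size=(20, 3)))
    y += [label] * 20
X = np.vstack(X)

lda = LinearDiscriminantAnalysis().fit(X, y)

# Classify (hypothetical) Japanese tokens in terms of the AE categories; the
# predicted label and its posterior probability index acoustic similarity.
jp_tokens = np.array([[300, 2200, 120],    # short /i/-like token
                      [310, 2210, 245]])   # long /ii/-like token
print(lda.predict(jp_tokens))
print(lda.predict_proba(jp_tokens).round(2))
```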

2.
Current speech perception models propose that relative perceptual difficulties with non-native segmental contrasts can be predicted from cross-language phonetic similarities. Japanese (J) listeners performed a categorical discrimination task in which nine contrasts (six adjacent height pairs, three front/back pairs) involving eight American English (AE) vowels [iː, ɪ, ε, æː, ɑː, ʌ, ʊ, uː] in /hVbə/ disyllables were tested. The listeners also completed a perceptual assimilation task (categorization as J vowels with category goodness ratings). Perceptual assimilation patterns (quantified as categorization overlap scores) were highly predictive of discrimination accuracy (r_s = 0.93). Results suggested that J listeners used both spectral and temporal information in discriminating vowel contrasts.
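The link between assimilation overlap and discrimination accuracy can be illustrated with a short sketch; the overlap formula below (summed minimum categorization proportions) is one common way to quantify shared categorization, and all numbers are invented:

```python
import numpy as np
from scipy.stats import spearmanr

def overlap_score(p_a, p_b):
    """Shared categorization of two vowels: sum over response categories of the
    smaller assimilation proportion (0 = no overlap, 1 = identical patterns)."""
    return float(np.minimum(p_a, p_b).sum())

# Hypothetical assimilation proportions over three Japanese response categories
# for each member of a contrast, plus that contrast's discrimination accuracy.
contrasts = {
    "i-I":  ([0.9, 0.1, 0.0], [0.7, 0.3, 0.0], 0.72),
    "E-ae": ([0.1, 0.8, 0.1], [0.0, 0.2, 0.8], 0.95),
    "U-u":  ([0.0, 0.1, 0.9], [0.0, 0.2, 0.8], 0.68),
}

overlaps, accuracies = [], []
for p_a, p_b, acc in contrasts.values():
    overlaps.append(overlap_score(np.array(p_a), np.array(p_b)))
    accuracies.append(acc)

# Greater overlap is expected to go with poorer discrimination (negative rho).
rho, p = spearmanr(overlaps, accuracies)
print(round(rho, 2), round(p, 2))
```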

3.
Cross-language perception studies report influences of speech style and consonantal context on perceived similarity and discrimination of non-native vowels by inexperienced and experienced listeners. Detailed acoustic comparisons of distributions of vowels produced by native speakers of North German (NG), Parisian French (PF) and New York English (AE) in citation (di)syllables and in sentences (surrounded by labial and alveolar stops) are reported here. Results of within- and cross-language discriminant analyses reveal striking dissimilarities across languages in the spectral/temporal variation of coarticulated vowels. As expected, vocalic duration was most important in differentiating NG vowels; it did not contribute to PF vowel classification. Spectrally, NG long vowels showed little coarticulatory change, but back/low short vowels were fronted/raised in alveolar context. PF vowels showed greater coarticulatory effects overall; back and front rounded vowels were fronted, low and mid-low vowels were raised in both sentence contexts. AE mid to high back vowels were extremely fronted in alveolar contexts, with little change in mid-low and low long vowels. Cross-language discriminant analyses revealed varying patterns of spectral (dis)similarity across speech styles and consonantal contexts that could, in part, account for AE listeners' perception of German and French front rounded vowels, and "similar" mid-high to mid-low vowels.

4.
Current theories of cross-language speech perception claim that patterns of perceptual assimilation of non-native segments to native categories predict relative difficulties in learning to perceive (and produce) non-native phones. Cross-language spectral similarity of North German (NG) and American English (AE) vowels produced in isolated hVC(a) (di)syllables (study 1) and in hVC syllables embedded in a short sentence (study 2) was determined by discriminant analyses, to examine the extent to which acoustic similarity was predictive of perceptual similarity patterns. The perceptual assimilation of NG vowels to native AE vowel categories by AE listeners with no German language experience was then assessed directly. Both studies showed that acoustic similarity of AE and NG vowels did not always predict perceptual similarity, especially for "new" NG front rounded vowels and for "similar" NG front and back mid and mid-low vowels. Both acoustic and perceptual similarity of NG and AE vowels varied as a function of the prosodic context, although vowel duration differences did not affect perceptual assimilation patterns. When duration and spectral similarity were in conflict, AE listeners assimilated vowels on the basis of spectral similarity in both prosodic contexts.

5.
This paper examines four acoustic properties (duration, F0, F1, and F2) of the monophthongal vowels of Iberian Spanish (IS) from Madrid and Peruvian Spanish (PS) from Lima in various consonantal contexts (/s/, /f/, /t/, /p/, and /k/) and in various phrasal contexts (in isolated words and sentence-internally). Acoustic measurements on 39 speakers, balanced by dialect and gender, can be generalized to the following differences between the two dialects. The vowel /a/ has a lower first formant in PS than in IS by 6.3%. The vowels /e/ and /o/ have more peripheral second-formant (F2) values in PS than in IS by about 4%. The consonant /s/ causes more centralization of the F2 of neighboring vowels in IS than in PS. No dialectal differences are found for the effect of phrasal context. Next to the between-dialect differences in the vowels, the present study finds that /s/ has a higher spectral center of gravity in PS than in IS by about 10%, that PS speakers speak slower than IS speakers by about 9%, and that Spanish-speaking women speak slower than Spanish-speaking men by about 5% (irrespective of dialect).
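The spectral center of gravity reported for /s/ is simply the power-weighted mean frequency of the spectrum; a minimal numpy/scipy sketch, with high-passed noise standing in for a real /s/ recording:

```python
import numpy as np
from scipy.signal import butter, lfilter

def spectral_center_of_gravity(signal, sample_rate):
    """Power-weighted mean frequency (Hz) of the signal's spectrum."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return float(np.sum(freqs * power) / np.sum(power))

# Synthetic stand-in for an /s/ frication segment: white noise high-passed at 4 kHz.
sr = 22050
noise = np.random.default_rng(1).normal(size=sr // 2)      # 500 ms of noise
b, a = butter(4, 4000 / (sr / 2), btype="highpass")
sibilant_like = lfilter(b, a, noise)

print(round(spectral_center_of_gravity(sibilant_like, sr)), "Hz")
```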

6.
The conditions under which listeners do and do not compensate for coarticulatory vowel nasalization were examined through a series of experiments of listeners' perception of naturally produced American English oral and nasal vowels spliced into three contexts: oral (C_C), nasal (N_N), and isolation. Two perceptual paradigms, a rating task in which listeners judged the relative nasality of stimulus pairs and a 4IAX discrimination task in which listeners judged vowel similarity, were used with two listener groups, native English speakers and native Thai speakers. Thai and English speakers were chosen because their languages differ in the temporal extent of anticipatory vowel nasalization. Listeners' responses were highly context dependent. For both perceptual paradigms and both language groups, listeners were less accurate at judging vowels in nasal than in non-nasal (oral or isolation) contexts; nasal vowels in nasal contexts were the most difficult to judge. Response patterns were generally consistent with the hypothesis that, given an appropriate and detectable nasal consonant context, listeners compensate for contextual vowel nasalization and attribute the acoustic effects of the nasal context to their coarticulatory source. However, the results also indicated that listeners do not hear nasal vowels in nasal contexts as oral; listeners retained some sensitivity to vowel nasalization in all contexts, indicating partial compensation for coarticulatory vowel nasalization. Moreover, there were small but systematic differences between the native Thai- and native English-speaking groups. These differences are as expected if perceptual compensation is partial and the extent of compensation is linked to patterns of coarticulatory nasalization in the listeners' native language.

7.
Medial movements of the lateral pharyngeal wall at the level of the velopharyngeal port were examined by using a computerized ultrasound system. Subjects produced CVNVC sequences involving all combinations of the vowels /a/ and /u/ and the nasal consonants /n/ and /m/. The effects of both vowels on the CVN and NVC gestures (opening and closing of the velopharyngeal port, respectively) were assessed in terms of movement amplitude, duration, and movement onset time. The amplitude of both opening and closing gestures of the lateral pharyngeal wall was less in the context of the vowel /u/ than the vowel /a/. In addition, the onset of the opening gesture towards the nasal consonant was related to the identity of both the initial and the final vowels. The characteristics of the functional coupling of the velum and lateral pharyngeal wall in speech are discussed.

8.
The perceptual mechanisms of assimilation and contrast in the phonetic perception of vowels were investigated. In experiment 1, 14 stimulus continua were generated using an /i/-/e/-/a/ vowel continuum. They ranged from a continuum with both ends belonging to the same phonemic category in Japanese, to a continuum with both ends belonging to different phonemic categories. The AXB method was employed and the temporal position of X was changed under three conditions. In each condition ten subjects were required to judge whether X was similar to A or to B. The results demonstrated that assimilation to the temporally closer sound occurs if the phonemic categories of A and B are the same and that contrast to the temporally closer sound occurs if A and B belong to different phonemic categories. It was observed that the transition from assimilation to contrast is continuous except in the /i'/-X-/e/ condition. In experiment 2, the total duration of t1 (between A and X) and t2 (between X and B) was changed under five conditions. One stimulus continuum consisted of the same phonemic category in Japanese and the other consisted of different phonemic categories. Six subjects were required to make similarity judgements of X. The results demonstrated that the occurrence of assimilation and contrast to the temporally closer sound seemed to be constant under each of the five conditions. The present findings suggest that assimilation and contrast are determined by three factors: the temporal position of the three stimuli, the acoustic distance between the three stimuli on the stimulus continuum, and the phonemic categories of the three stimuli.

9.
This paper seeks to characterize the nature, size, and range of acoustic amplitude variation in naturally produced coarticulated vowels in order to determine its potential contribution and relevance to vowel perception. The study is a partial replication and extension of the pioneering work by House and Fairbanks [J. Acoust. Soc. Am. 22, 105-113 (1953)], who reported large variation in vowel amplitude as a function of consonantal context. Eight American English vowels spoken by men and women were recorded in ten symmetrical CVC consonantal contexts. Acoustic amplitude measures included overall rms amplitude, amplitude of the rms peak along with its relative location in the CVC-word, and the amplitudes of individual formants F1-F4 along with their frequencies. House and Fairbanks' amplitude results were not replicated: Neither the overall rms nor the rms peak varied appreciably as a function of consonantal context. However, consonantal context was shown to affect significantly and systematically the amplitudes of individual formants at the vowel nucleus. These effects persisted in the auditory representation of the vowel signal. Auditory spectra showed that the pattern of spectral amplitude variation as a function of contextual effects may still be encoded and represented at early stages of processing by the peripheral auditory system.
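A rough sketch of how overall rms, the rms peak, and its relative location within a word can be computed from a waveform (the waveform here is synthetic, and the 20 ms / 5 ms analysis windows are illustrative choices, not the study's settings):

```python
import numpy as np

def rms_contour(signal, sample_rate, win_ms=20.0, hop_ms=5.0):
    """Short-time rms amplitude contour of a waveform."""
    win = int(sample_rate * win_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    frames = [signal[i:i + win] for i in range(0, len(signal) - win + 1, hop)]
    return np.array([np.sqrt(np.mean(f ** 2)) for f in frames]), hop

# Synthetic stand-in for a CVC word: quiet margins, louder vocalic center.
sr = 16000
t = np.arange(int(0.4 * sr)) / sr                     # 400 ms "word"
waveform = np.hanning(t.size) * np.sin(2 * np.pi * 150 * t)

overall_rms = np.sqrt(np.mean(waveform ** 2))
contour, hop = rms_contour(waveform, sr)
peak_idx = int(np.argmax(contour))
relative_peak_location = peak_idx * hop / len(waveform)  # 0 = word onset, 1 = offset

print(round(float(overall_rms), 3), round(float(contour.max()), 3),
      round(relative_peak_location, 2))
```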

10.
This study explores the following hypothesis: forward looping movements of the tongue that are observed in VCV sequences are due partly to the anatomical arrangement of the tongue muscles, how they are used to produce a velar closure, and how the tongue interacts with the palate during consonantal closure. The study uses an anatomically based two-dimensional biomechanical tongue model. Tissue elastic properties are accounted for in finite-element modeling, and movement is controlled by constant-rate control parameter shifts. Tongue raising and lowering movements are produced by the model mainly with the combined actions of the genioglossus, styloglossus, and hyoglossus. Simulations of V1CV2 movements were made, where C is a velar consonant and V is [a], [i], or [u]. Both vowels and consonants are specified in terms of targets, but for the consonant the target is virtual, and cannot be reached because it is beyond the surface of the palate. If V1 is the vowel [a] or [u], the resulting trajectory describes a movement that begins to loop forward before consonant closure and continues to slide along the palate during the closure. This pattern is very stable when moderate changes are made to the specification of the target consonant location and agrees with data published in the literature. If V1 is the vowel [i], looping patterns are also observed, but their orientation was quite sensitive to small changes in the location of the consonant target. These findings also agree with patterns of variability observed in measurements from human speakers, but they contradict data published by Houde [Ph.D. dissertation (1967)]. These observations support the idea that the biomechanical properties of the tongue could be the main factor responsible for the forward loops when V1 is a back vowel, regardless of whether V2 is a back vowel or a front vowel. In the [i] context it seems that additional factors have to be taken into consideration in order to explain the observations made on some speakers.

11.
An analysis is presented of regional variation patterns in the vowel system of Standard Dutch as spoken in the Netherlands (Northern Standard Dutch) and Flanders (Southern Standard Dutch). The speech material consisted of read monosyllabic utterances in a neutral consonantal context (i.e., /sVs/). The analyses were based on measurements of the duration and the frequencies of the first two formants of the vowel tokens. Recordings were made for 80 Dutch and 80 Flemish speakers, who were stratified for the social factors gender and region. These 160 speakers were distributed across four regions in the Netherlands and four regions in Flanders. Differences between regional varieties were found for duration, steady-state formant frequencies, and spectral change of formant frequencies. Variation patterns in the spectral characteristics of the long mid vowels /e o ø/ and the diphthongal vowels /ɛi œy ɔu/ were in accordance with a recent theory of pronunciation change in Standard Dutch. Finally, it was found that regional information was present in the steady-state formant frequency measurements of vowels produced by professional language users.
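Spectral change of a formant, as used in studies of this kind, is often quantified as the difference between formant values sampled early and late in the vowel; a small sketch with a hypothetical F2 track (the 25%/75% sampling points are an illustrative convention, not necessarily the one used here):

```python
import numpy as np

def spectral_change(formant_track_hz):
    """Formant value at 75% of vowel duration minus the value at 25%."""
    track = np.asarray(formant_track_hz, dtype=float)
    i25 = int(0.25 * (len(track) - 1))
    i75 = int(0.75 * (len(track) - 1))
    return track[i75] - track[i25]

# Hypothetical F2 track (Hz) across an /sVs/ token, sampled at equal intervals.
f2_track = [1750, 1820, 1900, 1960, 2010, 2050, 2030, 1980, 1900]
print(spectral_change(f2_track), "Hz")
```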

12.
The purpose of this investigation was to study the effects of consonant environment on vowel duration for normally hearing males, hearing-impaired males with intelligible speech, and hearing-impaired males with semi-intelligible speech. The results indicated that the normally hearing and intelligible hearing-impaired speakers exhibited similar trends with respect to consonant influence on vowel duration; i.e., vowels were longer in duration in a voiced environment than in a voiceless one, and in a fricative environment than in a plosive one. The semi-intelligible hearing-impaired speakers, however, failed to demonstrate a consonant effect on vowel duration, and produced the vowels with significantly longer durations when compared with the other two groups of speakers. These data provide information regarding temporal conditions which may contribute to the decreased intelligibility of hearing-impaired persons.

13.
This study explored how across-talker differences influence non-native vowel perception. American English (AE) and Korean listeners were presented with recordings of 10 AE vowels in /bVd/ context. The stimuli were mixed with noise and presented for identification in a 10-alternative forced-choice task. The two listener groups heard recordings of the vowels produced by 10 talkers at three signal-to-noise ratios. Overall the AE listeners identified the vowels 22% more accurately than the Korean listeners. There was a wide range of identification accuracy scores across talkers for both AE and Korean listeners. At each signal-to-noise ratio, the across-talker intelligibility scores were highly correlated for AE and Korean listeners. Acoustic analysis was conducted for 2 vowel pairs that exhibited variable accuracy across talkers for Korean listeners but high identification accuracy for AE listeners. Results demonstrated that Korean listeners' error patterns for these four vowels were strongly influenced by variability in vowel production that was within the normal range for AE talkers. These results suggest that non-native listeners are strongly influenced by across-talker variability perhaps because of the difficulty they have forming native-like vowel categories.
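Mixing stimuli with noise at a fixed signal-to-noise ratio is a simple scaling operation; a hedged sketch (the vowel-like waveform is synthetic, and the SNR values are examples, not necessarily those used in the study):

```python
import numpy as np

def mix_at_snr(signal, noise, snr_db):
    """Scale `noise` so the mixture has the requested SNR (dB), then add it."""
    target_noise_power = np.mean(signal ** 2) / (10 ** (snr_db / 10))
    scaled_noise = noise * np.sqrt(target_noise_power / np.mean(noise ** 2))
    return signal + scaled_noise

rng = np.random.default_rng(0)
sr = 16000
t = np.arange(int(0.3 * sr)) / sr
vowel_like = np.sin(2 * np.pi * 150 * t) + 0.5 * np.sin(2 * np.pi * 300 * t)
noise = rng.normal(size=vowel_like.size)

for snr in (-5, 0, 5):                         # three example signal-to-noise ratios
    mixed = mix_at_snr(vowel_like, noise, snr)
    achieved = 10 * np.log10(np.mean(vowel_like ** 2) /
                             np.mean((mixed - vowel_like) ** 2))
    print(snr, round(float(achieved), 1))      # requested vs. achieved SNR
```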

14.
Dynamic specification of coarticulated vowels
An adequate theory of vowel perception must account for perceptual constancy over variations in the acoustic structure of coarticulated vowels contributed by speakers, speaking rate, and consonantal context. We modified recorded consonant-vowel-consonant syllables electronically to investigate the perceptual efficacy of three types of acoustic information for vowel identification: (1) static spectral "targets," (2) duration of syllabic nuclei, and (3) formant transitions into and out of the vowel nucleus. Vowels in /b/-vowel-/b/ syllables spoken by one adult male (experiment 1) and by two females and two males (experiment 2) served as the corpus, and seven modified syllable conditions were generated in which different parts of the digitized waveforms of the syllables were deleted and the temporal relationships of the remaining parts were manipulated. Results of identification tests by untrained listeners indicated that dynamic spectral information, contained in initial and final transitions taken together, was sufficient for accurate identification of vowels even when vowel nuclei were attenuated to silence. Furthermore, the dynamic spectral information appeared to be efficacious even when durational parameters specifying intrinsic vowel length were eliminated.
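A "silent-center"-style manipulation of this kind (attenuating the vocalic nucleus to silence while keeping the initial and final transitions and the original timing) can be sketched as follows; the retained proportions are illustrative, not the study's editing parameters:

```python
import numpy as np

def silence_nucleus(syllable, keep_initial=0.25, keep_final=0.25):
    """Zero out the central portion of `syllable`, preserving the initial and
    final transitional portions and the overall duration."""
    out = np.array(syllable, dtype=float, copy=True)
    n = len(out)
    start = int(n * keep_initial)
    end = n - int(n * keep_final)
    out[start:end] = 0.0            # nucleus attenuated to silence
    return out

# Synthetic stand-in for a /b/-vowel-/b/ syllable.
sr = 16000
t = np.arange(int(0.3 * sr)) / sr
syllable = np.hanning(t.size) * np.sin(2 * np.pi * 180 * t)

silent_center = silence_nucleus(syllable)
print(len(silent_center) == len(syllable))   # temporal relationships preserved
```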

15.
Monolingual Peruvian Spanish listeners identified natural tokens of the Canadian French (CF) and Canadian English (CE) /?/ and /?/, produced in five consonantal contexts. The results demonstrate that while the CF vowels were mapped to two different native vowels, /e/ and /a/, in all consonantal contexts, the CE contrast was mapped to the single native vowel /a/ in four out of five contexts. Linear discriminant analysis revealed that acoustic similarity between native and target language vowels was a very good predictor of context-specific perceptual mappings. Predictions are made for Spanish learners of the /?/-/?/ contrast in CF and CE.

16.
It has been suggested [e.g., Strange et al., J. Acoust. Soc. Am. 74, 695-705 (1983); Verbrugge and Rakerd, Language Speech 29, 39-57 (1986)] that the temporal margins of vowels in consonantal contexts, consisting mainly of the rapid CV and VC transitions of CVC's, contain dynamic cues to vowel identity that are not available in isolated vowels and that may be perceptually superior in some circumstances to cues which are inherent to the vowels proper. However, this study shows that vowel-inherent formant targets and cues to vowel-inherent spectral change (measured from nucleus to offglide sections of the vowel itself) persist in the margins of /bVb/ syllables, confirming a hypothesis of Nearey and Assmann [J. Acoust. Soc. Am. 80, 1297-1308 (1986)]. Experiments were conducted to test whether listeners might be using such vowel-inherent, rather than coarticulatory information to identify the vowels. In the first experiment, perceptual tests using "hybrid silent center" syllables (i.e., syllables which contain only brief initial and final portions of the original syllable, and in which speaker identity changes from the initial to the final portion) show that listeners' error rates and confusion matrices for vowels in /bVb/ syllables are very similar to those for isolated vowels. These results suggest that listeners are using essentially the same type of information in essentially the same way to identify both kinds of stimuli. Statistical pattern recognition models confirm the relative robustness of nucleus and vocalic offglide cues and can predict reasonably well listeners' error patterns in all experimental conditions, though performance for /bVb/ syllables is somewhat worse than for isolated vowels. The second experiment involves the use of simplified synthetic stimuli, lacking consonantal transitions, which are shown to provide information that is nearly equivalent phonetically to that of the natural silent center /bVb/ syllables (from which the target measurements were extracted). Although no conclusions are drawn about other contexts, for speakers of Western Canadian English coarticulatory cues appear to play at best a minor role in the perception of vowels in /bVb/ context, while vowel-inherent factors dominate listeners' perception.

17.
Static, dynamic, and relational properties in vowel perception
The present work reviews theories and empirical findings, including results from two new experiments, that bear on the perception of English vowels, with an emphasis on the comparison of data analytic "machine recognition" approaches with results from speech perception experiments. Two major sources of variability (viz., speaker differences and consonantal context effects) are addressed from the classical perspective of overlap between vowel categories in F1 x F2 space. Various approaches to the reduction of this overlap are evaluated. Two types of speaker normalization are considered. "Intrinsic" methods based on relationships among the steady-state properties (F0, F1, F2, and F3) within individual vowel tokens are contrasted with "extrinsic" methods, involving the relationships among the formant frequencies of the entire vowel system of a single speaker. Evidence from a new experiment supports Ainsworth's (1975) conclusion [W. Ainsworth, Auditory Analysis and Perception of Speech (Academic, London, 1975)] that both types of information have a role to play in perception. The effects of consonantal context on formant overlap are also considered. A new experiment is presented that extends Lindblom and Studdert-Kennedy's finding [B. Lindblom and M. Studdert-Kennedy, J. Acoust. Soc. Am. 43, 840-843 (1967)] of perceptual effects of consonantal context on vowel perception to /dVd/ and /bVb/ contexts. Finally, the role of vowel-inherent dynamic properties, including duration and diphthongization, is briefly reviewed. All of the above factors are shown to have reliable influences on vowel perception, although the relative weight of such effects and the circumstances that alter these weights remain far from clear. It is suggested that the design of more complex perceptual experiments, together with the development of quantitative pattern recognition models of human vowel perception, will be necessary to resolve these issues.
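The intrinsic/extrinsic distinction can be made concrete with two common formulations: a token-internal measure (log ratios among F0 and the formants of a single vowel) versus talker-wide Lobanov-style z-scoring, which requires the talker's whole vowel system. The formant values below are invented, and these are only two of many possible normalization schemes:

```python
import numpy as np

def intrinsic_normalize(f0, f1, f2, f3):
    """Token-internal cues only: log ratios among F0 and formants of one vowel."""
    return np.log([f1 / f0, f2 / f1, f3 / f2])

def extrinsic_normalize(formants_by_token):
    """Talker-wide (Lobanov-style) z-scores, computed over the talker's vowels."""
    formants = np.asarray(formants_by_token, dtype=float)
    return (formants - formants.mean(axis=0)) / formants.std(axis=0)

# Hypothetical tokens from one talker: rows are vowels, columns are F1 and F2 (Hz).
talker_vowels = [[300, 2300], [430, 1900], [600, 1700], [750, 1200], [350, 900]]

print(intrinsic_normalize(120, 300, 2300, 3000).round(2))
print(extrinsic_normalize(talker_vowels).round(2))
```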

18.
Changes in the speech spectrum of vowels and consonants before and after tonsillectomy were investigated to find out the impact of the operation on speech quality. Speech recordings obtained from patients were analyzed using the Kay Elemetrics Multi-Dimensional Voice Processing (MDVP Advanced) software. Examination of the time-course changes after the operation revealed that certain speech parameters changed. These changes were mainly F3 (formant center frequency) and B3 (formant bandwidth) for the vowel /o/ and a slight decrease in B1 and B2 for the vowel /a/. The noise-to-harmonic ratio (NHR) also decreased slightly, suggesting less nasalized vowels. It was also observed that the fricative glottal consonant /h/ was affected. The larger the tonsil had been, the more changes were seen in the speech spectrum. The changes in the speech characteristics (except F3 and B3 for the vowel /o/) tended to recover, suggesting an involvement of auditory feedback and/or the replacement of the tonsils by new soft tissue. Although the changes were minimal and, therefore, have little effect on the extracted acoustic parameters, they cannot be disregarded for those relying on their voice for professional reasons, that is, singers, professional speakers, and so forth.

19.
During voice evaluation and treatment it is customary for clinicians to elicit samples of the vowel /a/ from clients using various elicitation techniques. The purpose of this study was to compare the effects of four commonly used stimulation tasks on the laryngeal mechanism. Eleven female singing students, studying at a university music school, served as subjects for the study. The subjects phonated the vowel /a/ using 4 vocal stimulation techniques: yawn-sigh, gentle onset, focus, and the use of the voiceless fricative. Videoendoscopic and acoustic evaluations of their productions were done. Results show that, in the first 100 ms following the end of the formant transition, these techniques affected voice differently. The fundamental frequency was found to be highest in the yawn-sigh condition, whereas the maximum frequency perturbation was obtained for the voiceless fricative condition. Planned comparisons were made by comparing the data across 2 dimensions: (1) vowels elicited with voiced contexts versus those elicited with voiceless consonantal contexts and (2) vowels elicited with obstruent versus vowels elicited with nonobstruent consonantal contexts. Some changes in acoustic parameters brought about by these stimulation techniques may be explained on the basis of coarticulatory effects of the consonantal context.
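Frequency perturbation (jitter) is conventionally computed from cycle-to-cycle period differences; a minimal sketch of local jitter over pitch periods taken from the first 100 ms of a token, with invented period values (a real analysis would obtain the periods from a pitch tracker):

```python
import numpy as np

def local_jitter_percent(periods_s):
    """Mean absolute difference between consecutive pitch periods, expressed
    as a percentage of the mean period (local jitter)."""
    periods = np.asarray(periods_s, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(periods))) / np.mean(periods)

# Hypothetical pitch periods (s) from the first 100 ms after the formant transition.
periods = [0.00450, 0.00455, 0.00448, 0.00452, 0.00460, 0.00449]

mean_f0 = 1.0 / np.mean(periods)
print(round(mean_f0, 1), "Hz,", round(local_jitter_percent(periods), 2), "% local jitter")
```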

20.
Multichannel cochlear implant users vary greatly in their word-recognition abilities. This study examined whether their word recognition was related to the use of either highly dynamic or relatively steady-state vowel cues contained in /bVb/ and /wVb/ syllables. Nine conditions were created containing different combinations of formant transition, steady-state, and duration cues. Because processor strategies differ, the ability to perceive static and dynamic information may depend on the type of cochlear implant used. Ten Nucleus and ten Ineraid subjects participated, along with 12 normal-hearing control subjects. Vowel identification did not differ between implanted groups, but both were significantly poorer at identifying vowels than the normal-hearing group. Vowel identification was best when at least two kinds of cues were available. Using only one type of cue, performance was better with excised vowels containing steady-state formants than in "vowelless" syllables, where the center vocalic portion was deleted and transitions were joined. In the latter syllable type, Nucleus subjects identified vowels significantly better when /b/ was the initial consonant; the other two groups were not affected by specific consonantal context. Cochlear implant subjects' word recognition was positively correlated with the use of dynamic vowel cues, but not with steady-state cues.
