Similar articles
20 similar articles found (search time: 531 ms)
1.
Current theories of cross-language speech perception claim that patterns of perceptual assimilation of non-native segments to native categories predict relative difficulties in learning to perceive (and produce) non-native phones. Cross-language spectral similarity of North German (NG) and American English (AE) vowels produced in isolated hVC(a) (di)syllables (study 1) and in hVC syllables embedded in a short sentence (study 2) was determined by discriminant analyses, to examine the extent to which acoustic similarity was predictive of perceptual similarity patterns. The perceptual assimilation of NG vowels to native AE vowel categories by AE listeners with no German language experience was then assessed directly. Both studies showed that acoustic similarity of AE and NG vowels did not always predict perceptual similarity, especially for "new" NG front rounded vowels and for "similar" NG front and back mid and mid-low vowels. Both acoustic and perceptual similarity of NG and AE vowels varied as a function of the prosodic context, although vowel duration differences did not affect perceptual assimilation patterns. When duration and spectral similarity were in conflict, AE listeners assimilated vowels on the basis of spectral similarity in both prosodic contexts.
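Several of the studies listed here (items 1, 2, and 14) quantify cross-language acoustic similarity with discriminant analyses over formant (and duration) measures. The sketch below is a minimal numpy stand-in for that idea, not the studies' actual procedure: the (F1, F2) centroids and spreads are invented for illustration, and tokens are assigned to the native category whose mean is nearest by Mahalanobis distance under a pooled within-class covariance, which is the core of linear discriminant classification.

```python
import numpy as np

# Illustrative AE vowel centroids in (F1, F2) Hz -- invented values,
# not measurements from any of the studies above.
rng = np.random.default_rng(0)
ae_centroids = {"i": (300, 2300), "u": (320, 900), "a": (750, 1200)}

# Simulate native training tokens scattered around each centroid.
X, y = [], []
for label, (f1, f2) in ae_centroids.items():
    X.append(rng.normal([f1, f2], [40, 80], size=(50, 2)))
    y += [label] * 50
X = np.vstack(X)

# Class means and pooled within-class covariance.
means = {lab: X[[i for i, l in enumerate(y) if l == lab]].mean(axis=0)
         for lab in ae_centroids}
resid = np.vstack([X[i] - means[l] for i, l in enumerate(y)])
cov_inv = np.linalg.inv(resid.T @ resid / (len(y) - len(means)))

def classify(token):
    """Assign a token to the nearest class mean by Mahalanobis distance."""
    d = {lab: float((token - m) @ cov_inv @ (token - m))
         for lab, m in means.items()}
    return min(d, key=d.get)

# A hypothetical non-native front rounded vowel: F1 like /i/, F2 in between.
front_rounded = classify(np.array([290.0, 1800.0]))
```

A full replication would add F3 and vocalic duration as input parameters and use cross-validated classification rates as the similarity measure.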

2.
Monolingual Peruvian Spanish listeners identified natural tokens of the Canadian French (CF) and Canadian English (CE) /ɛ/ and /æ/, produced in five consonantal contexts. The results demonstrate that while the CF vowels were mapped to two different native vowels, /e/ and /a/, in all consonantal contexts, the CE contrast was mapped to the single native vowel /a/ in four out of five contexts. Linear discriminant analysis revealed that acoustic similarity between native and target language vowels was a very good predictor of context-specific perceptual mappings. Predictions are made for Spanish learners of the /ɛ/-/æ/ contrast in CF and CE.

3.
During voice evaluation and treatment it is customary for clinicians to elicit samples of the vowel /a/ from clients using various elicitation techniques. The purpose of this study was to compare the effects of four commonly used stimulation tasks on the laryngeal mechanism. Eleven female singing students, studying at a university music school, served as subjects for the study. The subjects phonated the vowel /a/ using four vocal stimulation techniques: yawn-sigh, gentle onset, focus, and the use of the voiceless fricative. Videoendoscopic and acoustic evaluations of their productions were done. Results show that, in the first 100 ms following the end of the formant transition, these techniques affected voice differently. The fundamental frequency was found to be highest in the yawn-sigh condition, whereas the maximum frequency perturbation was obtained for the voiceless fricative condition. Planned comparisons were made by comparing the data across two dimensions: (1) vowels elicited with voiced contexts versus those elicited with voiceless consonantal contexts and (2) vowels elicited with obstruent versus vowels elicited with nonobstruent consonantal contexts. Some changes in acoustic parameters brought about by these stimulation techniques may be explained on the basis of coarticulatory effects of the consonantal context.
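The two acoustic measures compared above, fundamental frequency and frequency perturbation (jitter), are both derived from successive glottal period durations. A small self-contained illustration with invented period values (not data from the study); absolute jitter here is the mean difference between consecutive periods, one of several perturbation measures in common use:

```python
# Hypothetical glottal period durations in seconds for one /a/ production;
# the numbers are invented for illustration only.
periods = [0.00450, 0.00452, 0.00449, 0.00455, 0.00451, 0.00453]

mean_period = sum(periods) / len(periods)
f0 = 1.0 / mean_period  # fundamental frequency in Hz

# Absolute jitter: mean absolute difference between consecutive periods.
jitter_abs = sum(abs(a - b) for a, b in zip(periods, periods[1:])) / (len(periods) - 1)

# Relative jitter, expressed as a percentage of the mean period.
jitter_percent = 100.0 * jitter_abs / mean_period
```

In practice the periods themselves would be extracted from the recording by a pitch-tracking step, which is where most of the measurement difficulty lies.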

4.
It has been suggested [e.g., Strange et al., J. Acoust. Soc. Am. 74, 695-705 (1983); Verbrugge and Rakerd, Language Speech 29, 39-57 (1986)] that the temporal margins of vowels in consonantal contexts, consisting mainly of the rapid CV and VC transitions of CVC's, contain dynamic cues to vowel identity that are not available in isolated vowels and that may be perceptually superior in some circumstances to cues which are inherent to the vowels proper. However, this study shows that vowel-inherent formant targets and cues to vowel-inherent spectral change (measured from nucleus to offglide sections of the vowel itself) persist in the margins of /bVb/ syllables, confirming a hypothesis of Nearey and Assmann [J. Acoust. Soc. Am. 80, 1297-1308 (1986)]. Experiments were conducted to test whether listeners might be using such vowel-inherent, rather than coarticulatory information to identify the vowels. In the first experiment, perceptual tests using "hybrid silent center" syllables (i.e., syllables which contain only brief initial and final portions of the original syllable, and in which speaker identity changes from the initial to the final portion) show that listeners' error rates and confusion matrices for vowels in /bVb/ syllables are very similar to those for isolated vowels. These results suggest that listeners are using essentially the same type of information in essentially the same way to identify both kinds of stimuli. Statistical pattern recognition models confirm the relative robustness of nucleus and vocalic offglide cues and can predict reasonably well listeners' error patterns in all experimental conditions, though performance for /bVb/ syllables is somewhat worse than for isolated vowels. 
The second experiment involves the use of simplified synthetic stimuli, lacking consonantal transitions, which are shown to provide information that is nearly equivalent phonetically to that of the natural silent center /bVb/ syllables (from which the target measurements were extracted). Although no conclusions are drawn about other contexts, for speakers of Western Canadian English coarticulatory cues appear to play at best a minor role in the perception of vowels in /bVb/ context, while vowel-inherent factors dominate listeners' perception.

5.
This study investigated the extent to which adult Japanese listeners' perceived phonetic similarity of American English (AE) and Japanese (J) vowels varied with consonantal context. Four AE speakers produced multiple instances of the 11 AE vowels in six syllabic contexts /b-b, b-p, d-d, d-t, g-g, g-k/ embedded in a short carrier sentence. Twenty-four native speakers of Japanese were asked to categorize each vowel utterance as most similar to one of 18 Japanese categories [five one-mora vowels, five two-mora vowels, plus /ei, ou/ and one-mora and two-mora vowels in palatalized consonant CV syllables, C(j)a(a), C(j)u(u), C(j)o(o)]. They then rated the "category goodness" of the AE vowel to the selected Japanese category on a seven-point scale. None of the 11 AE vowels was assimilated unanimously to a single J response category in all context/speaker conditions; consistency in selecting a single response category ranged from 77% for /eɪ/ to only 32% for /æ/. Median ratings of category goodness for modal response categories were somewhat restricted overall, ranging from 5 to 3. Results indicated that temporal assimilation patterns (judged similarity to one-mora versus two-mora Japanese categories) differed as a function of the voicing of the final consonant, especially for the AE vowels, /see text/. Patterns of spectral assimilation (judged similarity to the five J vowel qualities) of /see text/ also varied systematically with consonantal context and speakers. On the basis of these results, it was predicted that relative difficulty in the identification and discrimination of AE vowels by Japanese speakers would vary significantly as a function of the contexts in which they were produced and presented.

6.
The aim of the study was to establish whether /u/-fronting, a sound change in progress in standard southern British, could be linked synchronically to the fronting effects of a preceding anterior consonant both in speech production and speech perception. For the production study, which consisted of acoustic analyses of isolated monosyllables produced by two different age groups, it was shown for younger speakers that /u/ was phonetically fronted and that the coarticulatory influence of consonants on /u/ was less than in older speakers. For the perception study, responses were elicited from the same subjects to two minimal word-pair continua that differed in the direction of the consonants' coarticulatory fronting effects on /u/. Consistent with their speech production, young listeners' /u/ category boundary was shifted toward /i/ and they compensated perceptually less for the fronting effects of the consonants on /u/ than older listeners. The findings support Ohala's model in which certain sound changes can be linked to the listener's failure to compensate for coarticulation. The results are also shown to be consistent with episodic models of speech perception in which phonological frequency effects bring about a realignment of the variants of a phonological category in speech production and perception.

7.
The conditions under which listeners do and do not compensate for coarticulatory vowel nasalization were examined through a series of experiments of listeners' perception of naturally produced American English oral and nasal vowels spliced into three contexts: oral (C_C), nasal (N_N), and isolation. Two perceptual paradigms, a rating task in which listeners judged the relative nasality of stimulus pairs and a 4IAX discrimination task in which listeners judged vowel similarity, were used with two listener groups, native English speakers and native Thai speakers. Thai and English speakers were chosen because their languages differ in the temporal extent of anticipatory vowel nasalization. Listeners' responses were highly context dependent. For both perceptual paradigms and both language groups, listeners were less accurate at judging vowels in nasal than in non-nasal (oral or isolation) contexts; nasal vowels in nasal contexts were the most difficult to judge. Response patterns were generally consistent with the hypothesis that, given an appropriate and detectable nasal consonant context, listeners compensate for contextual vowel nasalization and attribute the acoustic effects of the nasal context to their coarticulatory source. However, the results also indicated that listeners do not hear nasal vowels in nasal contexts as oral; listeners retained some sensitivity to vowel nasalization in all contexts, indicating partial compensation for coarticulatory vowel nasalization. Moreover, there were small but systematic differences between the native Thai- and native English-speaking groups. These differences are as expected if perceptual compensation is partial and the extent of compensation is linked to patterns of coarticulatory nasalization in the listeners' native language.

8.
Current speech perception models propose that relative perceptual difficulties with non-native segmental contrasts can be predicted from cross-language phonetic similarities. Japanese (J) listeners performed a categorical discrimination task in which nine contrasts (six adjacent height pairs, three front/back pairs) involving eight American English (AE) vowels [iː, ɪ, ɛ, æ, ɑː, ʌ, ʊ, uː] in /hVbə/ disyllables were tested. The listeners also completed a perceptual assimilation task (categorization as J vowels with category goodness ratings). Perceptual assimilation patterns (quantified as categorization overlap scores) were highly predictive of discrimination accuracy (Spearman's r_s = 0.93). Results suggested that J listeners used both spectral and temporal information in discriminating vowel contrasts.
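The key statistic in this abstract is a rank correlation between per-contrast categorization overlap scores and discrimination performance. A minimal sketch of that analysis with invented numbers (the study's actual scores are not reproduced): Spearman's rho is just the Pearson correlation of the ranks. Here higher overlap goes with lower accuracy, so the illustrative rho comes out negative; the sign in a real analysis depends on whether accuracy or difficulty is correlated with overlap.

```python
import numpy as np

# Invented example: overlap scores for nine vowel contrasts and the
# corresponding discrimination accuracies (proportions correct).
overlap = np.array([0.05, 0.10, 0.20, 0.30, 0.45, 0.55, 0.60, 0.75, 0.90])
accuracy = np.array([0.97, 0.95, 0.90, 0.88, 0.80, 0.72, 0.70, 0.60, 0.52])

def rankdata(x):
    """Ranks starting at 1 (this illustration has no ties)."""
    order = np.argsort(x)
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return float(np.corrcoef(rankdata(x), rankdata(y))[0, 1])

rho = spearman(overlap, accuracy)
```

With tied scores, standard practice assigns each tied group the mean of the ranks it spans; the simple `rankdata` above would need that extension for real data.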

9.
This study addresses three issues that are relevant to coarticulation theory in speech production: whether the degree of articulatory constraint model (DAC model) accounts for patterns of the directionality of tongue dorsum coarticulatory influences; the extent to which those patterns in tongue dorsum coarticulatory direction are similar to those for the tongue tip; and whether speech motor control and phonemic planning use a fixed or a context-dependent temporal window. Tongue dorsum and tongue tip movement data on vowel-to-vowel coarticulation are reported for Catalan VCV sequences with vowels /i/, /a/, and /u/, and consonants /p/, /n/, dark /l/, /s/, /ʃ/, alveolopalatal /ɲ/ and /k/. Electromidsagittal articulometry recordings were carried out for three speakers using the Carstens articulograph. Trajectory data are presented for the vertical dimension for the tongue dorsum, and for the horizontal dimension for tongue dorsum and tip. In agreement with predictions of the DAC model, results show that directionality patterns of tongue dorsum coarticulation can be accounted for to a large extent based on the articulatory requirements on consonantal production. While dorsals exhibit analogous trends in coarticulatory direction for all articulators and articulatory dimensions, this is mostly so for the tongue dorsum and tip along the horizontal dimension in the case of lingual fricatives and apicolaminal consonants. This finding results from different articulatory strategies: while dorsal consonants are implemented through homogeneous tongue body activation, the tongue tip and tongue dorsum act more independently for more anterior consonantal productions. Discontinuous coarticulatory effects reported in the present investigation suggest that phonemic planning is adaptive rather than context independent.

10.
Static, dynamic, and relational properties in vowel perception
The present work reviews theories and empirical findings, including results from two new experiments, that bear on the perception of English vowels, with an emphasis on the comparison of data analytic "machine recognition" approaches with results from speech perception experiments. Two major sources of variability (viz., speaker differences and consonantal context effects) are addressed from the classical perspective of overlap between vowel categories in F1 x F2 space. Various approaches to the reduction of this overlap are evaluated. Two types of speaker normalization are considered. "Intrinsic" methods based on relationships among the steady-state properties (F0, F1, F2, and F3) within individual vowel tokens are contrasted with "extrinsic" methods, involving the relationships among the formant frequencies of the entire vowel system of a single speaker. Evidence from a new experiment supports Ainsworth's (1975) conclusion [W. Ainsworth, Auditory Analysis and Perception of Speech (Academic, London, 1975)] that both types of information have a role to play in perception. The effects of consonantal context on formant overlap are also considered. A new experiment is presented that extends Lindblom and Studdert-Kennedy's finding [B. Lindblom and M. Studdert-Kennedy, J. Acoust. Soc. Am. 43, 840-843 (1967)] of perceptual effects of consonantal context on vowel perception to /dVd/ and /bVb/ contexts. Finally, the role of vowel-inherent dynamic properties, including duration and diphthongization, is briefly reviewed. All of the above factors are shown to have reliable influences on vowel perception, although the relative weight of such effects and the circumstances that alter these weights remain far from clear. It is suggested that the design of more complex perceptual experiments, together with the development of quantitative pattern recognition models of human vowel perception, will be necessary to resolve these issues.

11.
A number of studies, involving English, Swedish, French, and Spanish, have shown that, for sequences of rounded vowels separated by nonlabial consonants, both EMG activity and lip protrusion diminish during the intervocalic consonant interval, producing a "trough" pattern. A two-part study was conducted to (a) compare patterns of protrusion movement (upper and lower lip) and EMG activity (orbicularis oris) for speakers of English and Turkish, a language where phonological rules constrain vowels within a word to agree in rounding and (b) determine which of two current models of coarticulation, the "look-ahead" and "coproduction" models, best explained the data. Results showed Turkish speakers producing "plateau" patterns of movement rather than troughs, and unimodal rather than bimodal patterns of EMG activity. In the second part of the study, one prediction of the coproduction model, that articulatory gestures have stable profiles across contexts, was tested by adding and subtracting movement data signals to synthesize naturally occurring patterns. Results suggest English and Turkish may have different modes of coarticulatory organization.

12.
This study investigated whether F2 and F3 transition onsets could encode the vowel place feature as well as F2 and F3 "steady-state" measures [Syrdal and Gopal, J. Acoust. Soc. Am. 79, 1086-1100 (1986)]. Multiple comparisons were made using (a) scatterplots in multidimensional space, (b) critical band differences, and (c) linear discriminant functional analyses. Four adult male speakers produced /bVt/, /dVt/, and /gVt/ tokens with medial vowel contexts /i, ɪ, ɛ, eɪ, æ, ɑ, ʌ, ɔ, o, u/. Each token was repeated in a random order five times, yielding a total of 150 tokens per subject. Formant measurements were taken at four loci: F2 onset, F2 vowel, F3 onset, and F3 vowel. Onset points coincided with the first glottal pulse following the release burst and steady-state measures were taken approximately 60-70 ms post-onset. Graphic analyses revealed two distinct, minimally overlapping subsets grouped by front versus back. This dichotomous grouping was also seen in two-dimensional displays using only "onset" data as coordinates. Conversion to a critical band (bark) scale confirmed that front vowels were characterized by F3-F2 bark differences within a critical 3-bark distance, while back vowels exceeded the 3-bark critical distance. Using the critical distance metric, onset values categorized front vowels as well as steady-state measures, but showed a 20% error rate for back vowels. Front vowels had less variability than back vowels. Statistical separability was quantified with linear discriminant function analysis. Percent correct classification into vowel place groups was 87.5% using F2 and F3 onsets as input variables, and 95.7% using F2 and F3 vowel. Acoustic correlates of the vowel place feature are already present at second and third formant transition onsets.
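The 3-bark critical-distance criterion above amounts to converting F2 and F3 from Hz to the bark scale and thresholding their difference. A sketch using Traunmüller's (1990) bark approximation, which is one of several published conversion formulas (the abstract does not say which one the study used), with illustrative formant values rather than the study's measurements:

```python
def hz_to_bark(f):
    """Traunmueller's (1990) approximation of the bark (critical band) scale."""
    return 26.81 * f / (1960.0 + f) - 0.53

def vowel_place(f2_hz, f3_hz, critical=3.0):
    """Classify as front if F3-F2 falls within the 3-bark critical distance."""
    return "front" if hz_to_bark(f3_hz) - hz_to_bark(f2_hz) < critical else "back"

# Illustrative formant values in Hz.
front_example = vowel_place(2200, 2900)  # high F2 close to F3
back_example = vowel_place(900, 2300)    # low F2, far from F3
```

The same thresholding can be applied either to onset measurements or to steady-state measurements, which is exactly the comparison the study makes.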

13.
This paper seeks to characterize the nature, size, and range of acoustic amplitude variation in naturally produced coarticulated vowels in order to determine its potential contribution and relevance to vowel perception. The study is a partial replication and extension of the pioneering work by House and Fairbanks [J. Acoust. Soc. Am. 22, 105-113 (1953)], who reported large variation in vowel amplitude as a function of consonantal context. Eight American English vowels spoken by men and women were recorded in ten symmetrical CVC consonantal contexts. Acoustic amplitude measures included overall rms amplitude, amplitude of the rms peak along with its relative location in the CVC-word, and the amplitudes of individual formants F1-F4 along with their frequencies. House and Fairbanks' amplitude results were not replicated: Neither the overall rms nor the rms peak varied appreciably as a function of consonantal context. However, consonantal context was shown to affect significantly and systematically the amplitudes of individual formants at the vowel nucleus. These effects persisted in the auditory representation of the vowel signal. Auditory spectra showed that the pattern of spectral amplitude variation as a function of contextual effects may still be encoded and represented at early stages of processing by the peripheral auditory system.

14.
Acoustic and perceptual similarities between Japanese and American English (AE) vowels were investigated in two studies. In study 1, a series of discriminant analyses were performed to determine acoustic similarities between Japanese and AE vowels, each spoken by four native male speakers using F1, F2, and vocalic duration as input parameters. In study 2, the Japanese vowels were presented to native AE listeners in a perceptual assimilation task, in which the listeners categorized each Japanese vowel token as most similar to an AE category and rated its goodness as an exemplar of the chosen AE category. Results showed that the majority of AE listeners assimilated all Japanese vowels into long AE categories, apparently ignoring temporal differences between 1- and 2-mora Japanese vowels. In addition, not all perceptual assimilation patterns reflected context-specific spectral similarity patterns established by discriminant analysis. It was hypothesized that this incongruity between acoustic and perceptual similarity may be due to differences in distributional characteristics of native and non-native vowel categories that affect the listeners' perceptual judgments.

15.
This study assessed the acoustic coarticulatory effects of phrasal accent on [V1.CV2] sequences, when separately applied to V1 or V2, surrounding the voiced stops [b], [d], and [g]. Three adult speakers each produced 360 tokens (six V1 contexts x ten V2 contexts x three stops x two emphasis conditions). Realizing that anticipatory coarticulation of V2 onto the intervocalic C can be influenced by prosodic effects, as well as by vowel context effects, a modified locus equation regression metric was used to isolate the effect of phrasal accent on consonantal F2 onsets, independently of prosodically induced vowel expansion effects. The analyses revealed two main emphasis-dependent effects: systematic differences in F2 onset values and the expected expansion of vowel space. By accounting for the confounding variable of stress-induced vowel space expansion, a small but consistent coarticulatory effect of emphatic stress on the consonant was uncovered in lingually produced stops, but absent in labial stops. Formant calculations based on tube models indicated similarly increased F2 onsets when stressed /d/ and /g/ were simulated with deeper occlusions resulting from more forceful closure movements during phrasal accented speech.
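A locus equation is a linear regression of F2 at consonant release (F2 onset) on F2 at the vowel midpoint, fitted per consonant; the slope indexes the degree of coarticulation (near 1 means the onset tracks the vowel, near 0 means a fixed consonantal locus). A minimal sketch with invented (F2 vowel, F2 onset) pairs, not the study's data:

```python
import numpy as np

# Invented (F2 vowel midpoint, F2 onset) pairs in Hz for one stop context.
f2_vowel = np.array([2300, 2000, 1700, 1400, 1100, 900], dtype=float)
f2_onset = np.array([2050, 1850, 1650, 1450, 1250, 1150], dtype=float)

# Locus equation: F2_onset = slope * F2_vowel + intercept.
slope, intercept = np.polyfit(f2_vowel, f2_onset, 1)

# The extrapolated "locus" frequency is where the fit crosses the line
# F2_onset == F2_vowel: locus = intercept / (1 - slope).
locus = intercept / (1.0 - slope)
```

The "modified" metric in the study additionally controls for accent-induced vowel-space expansion; the plain fit above would confound the two effects, which is precisely the problem the paper addresses.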

16.
There is increasing evidence that fine articulatory adjustments are made by speakers to reinforce and sometimes counteract the acoustic consequences of nasality. However, it is difficult to attribute the acoustic changes in nasal vowel spectra to either oral cavity configuration or to velopharyngeal opening (VPO). This paper takes the position that it is possible to disambiguate the effects of VPO and oropharyngeal configuration on the acoustic output of the vocal tract by studying the position and movement of the tongue and lips during the production of oral and nasal vowels. This paper uses simultaneously collected articulatory, acoustic, and nasal airflow data during the production of all oral and phonemically nasal vowels in Hindi (four speakers) to understand the consequences of the movements of oral articulators on the spectra of nasal vowels. For Hindi nasal vowels, the tongue body is generally lowered for back vowels, fronted for low vowels, and raised for front vowels (with respect to their oral congeners). These movements are generally supported by accompanying changes in the vowel spectra. In Hindi, the lowering of back nasal vowels may have originally served to enhance the acoustic salience of nasality, but has since engendered a nasal vowel chain shift.

17.
It was explored how three types of intensive cognitive load typical of military aviation (load on situation awareness, information processing, or decision-making) affect speech. The utterances of 13 male military pilots were recorded during simulated combat flights. Articulation rate was calculated from the speech samples, and the first formant (F1) and second formant (F2) were tracked from first-syllable short vowels in pre-defined phoneme environments. Articulation rate was found to correlate negatively (albeit with low coefficients) with loads on situation awareness and decision-making but not with changes in F1 or F2. Changes were seen in the spectrum of the vowels: mean F1 of front vowels usually increased and their mean F2 decreased as a function of cognitive load, and both F1 and F2 of back vowels increased. The strongest associations were seen between the three types of cognitive load and F1 and F2 changes in back vowels. Because fluent and clear radio speech communication is vital to safety in aviation and temporal and spectral changes may affect speech intelligibility, careful use of standard aviation phraseology and training in the production of clear speech during a high level of cognitive load are important measures that diminish the probability of possible misunderstandings.

18.
Dynamic specification of coarticulated vowels spoken in sentence context
According to a dynamic specification account, coarticulated vowels are identified on the basis of time-varying acoustic information, rather than solely on the basis of "target" information contained within a single spectral cross section of an acoustic syllable. Three experiments utilizing digitally segmented portions of consonant-vowel-consonant (CVC) syllables spoken rapidly in a carrier sentence were designed to examine the relative contribution of (1) target information available in vocalic nuclei, (2) intrinsic duration information specified by syllable length, and (3) dynamic spectral information defined over syllable onsets and offsets. In experiments 1 and 2, vowels produced in three consonantal contexts by an adult male were examined. Results showed that vowels in silent-center (SC) syllables (in which vocalic nuclei were attentuated to silence leaving initial and final transitional portions in their original temporal relationship) were perceived relatively accurately, although not as well as unmodified syllables (experiment 1); random versus blocked presentation of consonantal contexts did not affect performance. Error rates were slightly greater for vowels in SC syllables in which intrinsic duration differences were neutralized by equating the duration of silent intervals between initial and final transitional portions. However, performance was significantly better than when only initial transitions or final transitions were presented alone (experiment 2). Experiment 3 employed CVC stimuli produced by another adult male, and included six consonantal contexts. Both SC syllables and excised syllable nuclei with appropriate intrinsic durations were identified no less accurately than unmodified controls. Neutralizing duration differences in SC syllables increased identification errors only slightly, while truncating excised syllable nuclei yielded a greater increase in errors. 
These results demonstrate that time-varying information is necessary for accurate identification of coarticulated vowels. Two hypotheses about the nature of the dynamic information specified over syllable onsets and offsets are discussed.
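The silent-center (SC) manipulation described in this abstract is, at the waveform level, a simple operation: keep the initial and final transitional portions in their original temporal relationship and attenuate the vocalic nucleus to silence. A self-contained sketch in which random noise stands in for a recorded CVC syllable (real stimuli were digitally segmented natural speech, and the cut points were set at the ends of the formant transitions):

```python
import numpy as np

# Stand-in "syllable": 3000 samples of noise in place of a recorded CVC.
rng = np.random.default_rng(1)
syllable = rng.normal(size=3000)

def silent_center(signal, onset_len, offset_len):
    """Zero out the vocalic nucleus, preserving the onset and offset
    portions and the overall syllable duration (intrinsic duration cue)."""
    out = np.array(signal, dtype=float)
    out[onset_len:len(signal) - offset_len] = 0.0
    return out

sc = silent_center(syllable, onset_len=500, offset_len=500)
```

The duration-neutralized condition in the experiments would additionally equate the length of the silent interval across vowels, which removes the intrinsic duration cue this version preserves.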

19.
Naive listeners' perceptual assimilations of non-native vowels to first-language (L1) categories can predict difficulties in the acquisition of second-language vowel systems. This study demonstrates that listeners having two slightly different dialects as their L1s can differ in the perception of foreign vowels. Specifically, the study shows that Bohemian Czech and Moravian Czech listeners assimilate Dutch high front vowels differently to L1 categories. Consequently, the listeners are predicted to follow different paths in acquiring these Dutch vowels. These findings underscore the importance of carefully considering the specific dialect background of participants in foreign- and second-language speech perception studies.

20.
This study investigated the role of sensory feedback during the production of front vowels. A temporary aftereffect induced by tongue loading was employed to modify the somatosensory-based perception of tongue height. Following the removal of tongue loading, tongue height during vowel production was estimated by measuring the frequency of the first formant (F1) from the acoustic signal. In experiment 1, the production of front vowels following tongue loading was investigated either in the presence or absence of auditory feedback. With auditory feedback available, the tongue height of front vowels was not modified by the aftereffect of tongue loading. By contrast, speakers did not compensate for the aftereffect of tongue loading when they produced vowels in the absence of auditory feedback. In experiment 2, the characteristics of the masking noise were manipulated such that it masked energy either in the F1 region or in the region of the second and higher formants. The results showed that the adjustment of tongue height during the production of front vowels depended on information about F1 in the auditory feedback. These findings support the idea that speech goals include both auditory and somatosensory targets and that speakers are able to make use of information from both sensory modalities to maximize the accuracy of speech production.
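The dependent measure here is F1 extracted from the acoustic signal. A standard way to estimate formant frequencies is linear predictive coding (LPC): fit an all-pole model and read resonance frequencies from the pole angles. The sketch below is a toy version (an order-2 analysis of a synthetic damped 500-Hz oscillation), not the study's measurement procedure; real formant tracking adds pre-emphasis, windowing, higher model orders, and frame-by-frame analysis.

```python
import numpy as np

def lpc(x, order):
    """Autocorrelation-method LPC coefficients via Levinson-Durbin."""
    n = len(x)
    r = [float(np.dot(x[:n - i], x[i:])) for i in range(order + 1)]
    a, e = [1.0], r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / e
        a = [1.0] + [a[j] + k * a[i - j] for j in range(1, i)] + [k]
        e *= 1.0 - k * k
    return np.array(a)

def first_resonance(x, fs, order=2):
    """Lowest resonance frequency (an F1-style measure) from the LPC poles."""
    roots = np.roots(lpc(x, order))
    freqs = [np.angle(z) * fs / (2 * np.pi) for z in roots if z.imag > 0]
    return min(freqs)

# Synthetic damped oscillation at 500 Hz standing in for a vowel resonance.
fs = 10000
n = np.arange(1000)
x = 0.998 ** n * np.sin(2 * np.pi * 500 * n / fs)
f1_est = first_resonance(x, fs)
```

On real speech the analysis would be run on short windowed frames, with the model order chosen roughly as the sampling rate in kHz plus 2.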


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号