Similar literature
 Found 20 similar documents (search time: 62 ms)
1.
In the experiments reported here, perceived speaker identity was controlled by manipulating the fundamental frequency (F0) range of carrier phrases in which speech tokens were embedded. In the first experiment, words from two "hood"-"hud" continua were synthesized with different F0. The words were then embedded in synthetic carrier phrases with intonation contours which reduced perceived speaker identity differences for test items with different F0. The results indicated that when perceived speaker identity differences were reduced, the effect of F0 on vowel identification was also reduced. Experiment 2 indicated that when items presented in carrier phrases are matched for speaker identity and F0 with items in isolation, there is no effect for presentation in a carrier phrase. Experiment 3 involved the presentation of vowels from the "hood"-"hud" continuum in two different intonational contexts which were judged to have been produced by different speakers, even though the F0 of the test word was identical in the two contexts. There was a shift in identification as a result of the intonational context which was interpreted as evidence for the role of perceived identity in vowel normalization. Overall, the experiments suggest that perceived speaker identity is a better predictor of vowel normalization effects than is intrinsic F0. This indicates that the role of F0 in vowel normalization is mediated through perceived speaker identity.  相似文献   

2.
The perception of subphonemic differences between vowels was investigated using multidimensional scaling techniques. Three experiments were conducted with natural-sounding synthetic stimuli generated by linear predictive coding (LPC) formant synthesizers. In the first experiment, vowel sets near the pairs (i-I), (epsilon-ae), or (u-U) were synthesized containing 11 vowels each. Listeners judged the dissimilarities between all pairs of vowels within a set several times. These perceptual differences were mapped into distances between the vowels in an n-dimensional space using two-way multidimensional scaling. Results for each vowel set showed that the physical stimulus space, which was specified by the two parameters F1 and F2, was always mapped into a two-dimensional perceptual space. The best metric for modeling the perceptual distances was the Euclidean distance between F1 and F2 in barks. The second experiment investigated the perception of the same vowels from the first experiment, but embedded in a consonantal context. Following the same procedures as experiment 1, listeners' perception of the (bv) dissimilarities was not different from their perception of the isolated vowel dissimilarities. The third experiment investigated dissimilarity judgments for the three vowels (ae-alpha-lambda) located symmetrically in the F1 X F2 vowel space. While the perceptual space was again two dimensional, the influence of phonetic identity on vowel difference judgments was observed. Implications for determining metrics for subphonemic vowel differences using multidimensional scaling are discussed.  相似文献   

3.
The primary aim of this study was to determine if adults whose native language permits neither voiced nor voiceless stops to occur in word-final position can master the English word-final /t/-/d/ contrast. Native English-speaking listeners identified the voicing feature in word-final stops produced by talkers in five groups: native speakers of English, experienced and inexperienced native Spanish speakers of English, and experienced and inexperienced native Mandarin speakers of English. Contrary to hypothesis, the experienced second language (L2) learners' stops were not identified significantly better than stops produced by the inexperienced L2 learners; and their stops were correctly identified significantly less often than stops produced by the native English speakers. Acoustic analyses revealed that the native English speakers made vowels significantly longer before /d/ than /t/, produced /t/-final words with a higher F1 offset frequency than /d/-final words, produced more closure voicing in /d/ than /t/, and sustained closure longer for /t/ than /d/. The L2 learners produced the same kinds of acoustic differences between /t/ and /d/, but theirs were usually of significantly smaller magnitude. Taken together, the results suggest that only a few of the 40 L2 learners examined in the present study had mastered the English word-final /t/-/d/ contrast. Several possible explanations for this negative finding are presented. Multiple regression analyses revealed that the native English listeners made perceptual use of the small, albeit significant, vowel duration differences produced in minimal pairs by the nonnative speakers. A significantly stronger correlation existed between vowel duration differences and the listeners' identifications of final stops in minimal pairs when the perceptual judgments were obtained in an "edited" condition (where post-vocalic cues were removed) than in a "full cue" condition. 
This suggested that listeners may modify their identification of stops based on the availability of acoustic cues.

4.
A quantitative perceptual model of human vowel recognition based upon psychoacoustic and speech perception data is described. At an intermediate auditory stage of processing, the specific bark difference level of the model represents the pattern of peripheral auditory excitation as the distance in critical bands (barks) between neighboring formants and between the fundamental frequency (F0) and first formant (F1). At a higher, phonetic stage of processing, represented by the critical bark difference level of the model, the transformed vowels may be dichotomously classified based on whether the difference between formants in each dimension falls within or exceeds the critical distance of 3 bark for the spectral center of gravity effect [Chistovich et al., Hear. Res. 1, 185-195 (1979)]. Vowel transformations and classifications correspond well to several major phonetic dimensions and features by which vowels are perceived and traditionally classified. The F1-F0 dimension represents vowel height, and high vowels have F1-F0 differences within 3 bark. The F3-F2 dimension corresponds to vowel place of articulation, and front vowels have F3-F2 differences of less than 3 bark. As an inherent, speaker-independent normalization procedure, the model provides excellent vowel clustering while it greatly reduces between-speaker variability. It offers robust normalization through feature classification because gross binary categorization allows for considerable acoustic variability. There was generally less formant and bark difference variability for closely spaced formants than for widely spaced formants. These findings agree with independently observed perceptual results and support Stevens' quantal theory of vowel production and perceptual constraints on production predicted from the critical bark difference level of the model.  相似文献   
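
The binary classification described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which hertz-to-bark formula the model uses, so the Zwicker and Terhardt (1980) approximation is assumed here, and the function names and the example formant values are hypothetical.

```python
import math

CRITICAL_DISTANCE_BARK = 3.0  # critical distance quoted in the abstract


def hz_to_bark(f_hz):
    """Convert hertz to bark (assumed formula: Zwicker & Terhardt, 1980)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def classify_vowel(f0_hz, f1_hz, f2_hz, f3_hz):
    """Classify a vowel on the two bark-difference dimensions named in the abstract."""
    f1_minus_f0 = hz_to_bark(f1_hz) - hz_to_bark(f0_hz)
    f3_minus_f2 = hz_to_bark(f3_hz) - hz_to_bark(f2_hz)
    return {
        "f1_minus_f0_bark": f1_minus_f0,
        "f3_minus_f2_bark": f3_minus_f2,
        # High vowels: F1-F0 difference within the critical distance.
        "high": f1_minus_f0 < CRITICAL_DISTANCE_BARK,
        # Front vowels: F3-F2 difference within the critical distance.
        "front": f3_minus_f2 < CRITICAL_DISTANCE_BARK,
    }


# Illustrative (hypothetical) formant values, not taken from the abstract.
print(classify_vowel(120, 270, 2290, 3010))  # an /i/-like vowel
print(classify_vowel(124, 730, 1090, 2440))  # an /a/-like vowel
```

Because each dimension is reduced to a within-or-beyond-3-bark decision, moderate shifts in the raw formant values leave the classification unchanged, which is the robustness the abstract attributes to gross binary categorization.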

5.
Peruvian Spanish (PS) and Iberian Spanish (IS) learners were tested on their ability to categorically discriminate and identify Dutch vowels. It was predicted that the acoustic differences between the vowel productions of the two dialects, which compare differently to Dutch vowels, would manifest in differential L2 perception for listeners of these two dialects. The results show that although PS learners had higher general L2 proficiency, IS learners were more accurate at discriminating all five contrasts and at identifying six of the L2 Dutch vowels. These findings confirm that acoustic differences in native vowel production lead to differential L2 vowel perception.  相似文献   

6.
This paper examines four acoustic properties (duration, F0, F1, and F2) of the monophthongal vowels of Iberian Spanish (IS) from Madrid and Peruvian Spanish (PS) from Lima in various consonantal contexts (/s/, /f/, /t/, /p/, and /k/) and in various phrasal contexts (in isolated words and sentence-internally). Acoustic measurements on 39 speakers, balanced by dialect and gender, can be generalized to the following differences between the two dialects. The vowel /a/ has a lower first formant (F1) in PS than in IS by 6.3%. The vowels /e/ and /o/ have more peripheral second-formant (F2) values in PS than in IS by about 4%. The consonant /s/ causes more centralization of the F2 of neighboring vowels in IS than in PS. No dialectal differences are found for the effect of phrasal context. In addition to the between-dialect differences in the vowels, the present study finds that /s/ has a higher spectral center of gravity in PS than in IS by about 10%, that PS speakers speak more slowly than IS speakers by about 9%, and that Spanish-speaking women speak more slowly than Spanish-speaking men by about 5% (irrespective of dialect).

7.
This study assessed the acoustic coarticulatory effects of phrasal accent on [V1.CV2] sequences, when separately applied to V1 or V2, surrounding the voiced stops [b], [d], and [g]. Three adult speakers each produced 360 tokens (six V1 contexts x ten V2 contexts x three stops x two emphasis conditions). Realizing that anticipatory coarticulation of V2 onto the intervocalic C can be influenced by prosodic effects, as well as by vowel context effects, a modified locus equation regression metric was used to isolate the effect of phrasal accent on consonantal F2 onsets, independently of prosodically induced vowel expansion effects. The analyses revealed two main emphasis-dependent effects: systematic differences in F2 onset values and the expected expansion of vowel space. By accounting for the confounding variable of stress-induced vowel space expansion, a small but consistent coarticulatory effect of emphatic stress on the consonant was uncovered in lingually produced stops, but absent in labial stops. Formant calculations based on tube models indicated similarly increased F2 onsets when stressed /d/ and /g/ were simulated with deeper occlusions resulting from more forceful closure movements during phrasal accented speech.  相似文献   

8.
This study assessed the acoustic and perceptual effect of noise on vowel and stop-consonant spectra. Multi-talker babble and speech-shaped noise were added to vowel and stop stimuli at -5 to +10 dB S/N, and the effect of noise was quantified in terms of (a) spectral envelope differences between the noisy and clean spectra in three frequency bands, (b) presence of reliable F1 and F2 information in noise, and (c) changes in burst frequency and slope. Acoustic analysis indicated that F1 was detected more reliably than F2 and the largest spectral envelope differences between the noisy and clean vowel spectra occurred in the mid-frequency band. This finding suggests that in extremely noisy conditions listeners must be relying on relatively accurate F1 frequency information along with partial F2 information to identify vowels. Stop consonant recognition remained high even at -5 dB despite the disruption of burst cues due to additive noise, suggesting that listeners must be relying on other cues, perhaps formant transitions, to identify stops.  相似文献   

9.
The objective of this study was to assess the difference in voice quality, as defined by acoustical analysis of a sustained vowel, between laryngectomized patients and normal volunteers. This was designed as a retrospective single-center cohort study. An adult tertiary referral unit formed the setting of this study. Fifty patients (40 males) who underwent total laryngectomy and 31 normal volunteers (18 male) participated. Group comparisons of the first three formant frequencies (F1, F2, and F3) using linear predictive coding (LPC) (Laryngograph Ltd, London, UK) were performed. The existence of any significant difference in F1, F2, and F3 between the two groups using the sustained vowel /i/, and the effects of other factors, namely tumor stage (T), chemoradiotherapy, pharyngectomy, cricothyroid myotomy, closure of the pharyngoesophageal segment, and postoperative complication, were analyzed. Formant frequencies F1, F2, and F3 were significantly different in male laryngectomees compared to controls: F1 (P<0.001, Mann-Whitney U test), F2 (P<0.001, Student's t test), and F3 (P=0.008, Student's t test). There was no significant difference between females in the two groups for any of the three formant frequencies. Chemoradiotherapy and postoperative complications (pharyngocutaneous fistula) caused a significantly lower formant F1 in men, but showed little effect on F2 and F3. Laryngectomized males produced significantly higher formant frequencies, F1, F2, and F3, compared to normal volunteers, and this is consistent with the literature. Chemoradiotherapy and postoperative complications significantly influenced the formant scores in the laryngectomee population. This study shows that robust and reliable data could be obtained using electroglottography and LPC in normal volunteers and laryngectomees using a sustained vowel.

10.
This study investigated whether F2 and F3 transition onsets could encode the vowel place feature as well as F2 and F3 "steady-state" measures [Syrdal and Gopal, J. Acoust. Soc. Am. 79, 1086-1100 (1986)]. Multiple comparisons were made using (a) scatterplots in multidimensional space, (b) critical band differences, and (c) linear discriminant functional analyses. Four adult male speakers produced /b/(v)/t/, /d/(v)/t/, and /g/(v)/t/ tokens with medial vowel contexts /i,I, E, ey, ae, a, v, c, o, u/. Each token was repeated in a random order five times, yielding a total of 150 tokens per subject. Formant measurements were taken at four loci: F2 onset, F2 vowel, F3 onset, and F3 vowel. Onset points coincided with the first glottal pulse following the release burst and steady-state measures were taken approximately 60-70 ms post-onset. Graphic analyses revealed two distinct, minimally overlapping subsets grouped by front versus back. This dichotomous grouping was also seen in two-dimensional displays using only "onset" data as coordinates. Conversion to a critical band (bark) scale confirmed that front vowels were characterized by F3-F2 bark differences within a critical 3-bark distance, while back vowels exceeded the 3-bark critical distance. Using the critical distance metric onset values categorized front vowels as well as steady-state measures, but showed a 20% error rate for back vowels. Front vowels had less variability than back vowels. Statistical separability was quantified with linear discriminant function analysis. Percent correct classification into vowel place groups was 87.5% using F2 and F3 onsets as input variables, and 95.7% using F2 and F3 vowel. Acoustic correlates of the vowel place feature are already present at second and third formant transition onsets.  相似文献   

11.
The contribution of extraneous sounds to the perceptual estimation of the first-formant (F1) frequency of voiced vowels was investigated using a continuum of vowels perceived as changing from /I/ to /epsilon/ as F1 was increased. Any phonetic effects of adding extraneous sounds were measured as a change in the position of the phoneme boundary on the continuum. Experiments 1-5 demonstrated that a pair of extraneous tones, mistuned from harmonic values of the fundamental frequency of the vowel, could influence perceived vowel quality when added in the F1 region. Perceived F1 frequency was lowered when the tones were added on the lower skirt of F1, and raised when they were added on the upper skirt. Experiments 6 and 7 demonstrated that adding a narrow-band noise in the F1 region could produce a similar pattern of boundary shifts, despite the differences in temporal properties and timbre between a noise band and a voiced vowel. The data are interpreted using the concept of the harmonic sieve [Duifhuis et al., J. Acoust. Soc. Am. 71, 1568-1580 (1982)]. The results imply a partial failure of the harmonic sieve to exclude extraneous sounds from the perceptual estimation of F1 frequency. Implications for the nature of the hypothetical harmonic sieve are discussed.

12.
Native speakers of Mandarin Chinese have difficulty producing native-like English stress contrasts. Acoustically, English lexical stress is multidimensional, involving manipulation of fundamental frequency (F0), duration, intensity and vowel quality. Errors in any or all of these correlates could interfere with perception of the stress contrast, but it is unknown which correlates are most problematic for Mandarin speakers. This study compares the use of these correlates in the production of lexical stress contrasts by 10 Mandarin and 10 native English speakers. Results showed that Mandarin speakers produced significantly less native-like stress patterns, although they did use all four acoustic correlates to distinguish stressed from unstressed syllables. Mandarin and English speakers' use of amplitude and duration were comparable for both stressed and unstressed syllables, but Mandarin speakers produced stressed syllables with a higher F0 than English speakers. There were also significant differences in formant patterns across groups, such that Mandarin speakers produced English-like vowel reduction in certain unstressed syllables, but not in others. Results suggest that Mandarin speakers' production of lexical stress contrasts in English is influenced partly by native-language experience with Mandarin lexical tones, and partly by similarities and differences between Mandarin and English vowel inventories.  相似文献   

13.
The effect of diminished auditory feedback on monophthong and diphthong production was examined in postlingually deafened Australian-English speaking adults. The participants were 4 female and 3 male speakers with severe to profound hearing loss, who were compared to 11 age- and accent-matched normally hearing speakers. The test materials were 5 repetitions of hVd words containing 18 vowels. Acoustic measures that were studied included F1, F2, discrete cosine transform coefficients (DCTs), and vowel duration information. The durational analyses revealed increased total vowel durations with a maintenance of the tense/lax vowel distinctions in the deafened speakers. The deafened speakers preserved a differentiated vowel space, although there were some gender-specific differences seen. For example, there was a retraction of F2 in the front vowels for the female speakers that did not occur in the males. However, all deafened speakers showed a close correspondence between the monophthong and diphthong formant movements that did occur. Gaussian classification highlighted vowel confusions resulting from changes in the deafened vowel space. The results support the view that postlingually deafened speakers maintain reasonably good speech intelligibility, in part by employing production strategies designed to bolster auditory feedback.  相似文献   

14.
This study examined whether individuals with a wide range of first-language vowel systems (Spanish, French, German, and Norwegian) differ fundamentally in the cues that they use when they learn the English vowel system (e.g., formant movement and duration). All subjects: (1) identified natural English vowels in quiet; (2) identified English vowels in noise that had been signal processed to flatten formant movement or equate duration; (3) perceptually mapped best exemplars for first- and second-language synthetic vowels in a five-dimensional vowel space that included formant movement and duration; and (4) rated how natural English vowels assimilated into their L1 vowel categories. The results demonstrated that individuals with larger and more complex first-language vowel systems (German and Norwegian) were more accurate at recognizing English vowels than were individuals with smaller first-language systems (Spanish and French). However, there were no fundamental differences in what these individuals learned. That is, all groups used formant movement and duration to recognize English vowels, and learned new aspects of the English vowel system rather than simply assimilating vowels into existing first-language categories. The results suggest that there is a surprising degree of uniformity in the ways that individuals with different language backgrounds perceive second language vowels.  相似文献   

15.
Formant dynamics in vowel nuclei contribute to vowel classification in English. This study examined listeners' ability to discriminate dynamic second formant transitions in synthetic high front vowels. Acoustic measurements were made from the nuclei (steady state and 20% and 80% of vowel duration) for the vowels /i, I, e, epsilon, ae/ spoken by a female in /bVd/ context. Three synthesis parameters were selected to yield twelve discrimination conditions: initial frequency value for F2 (2525, 2272, or 2068 Hz), slope direction (rising or falling), and duration (110 or 165 ms). F1 frequency was roved. In the standard stimuli, F0 and F1-F4 were steady state. In the comparison stimuli only F2 frequency varied linearly to reach a final frequency. Five listeners were tested under adaptive tracking to estimate the threshold for frequency extent, the minimal detectable difference in frequency between the initial and final F2 values, called deltaF extent. Analysis showed that initial F2 frequency and direction of movement for some F2 frequencies contributed to significant differences in deltaF extent. Results suggested that listeners attended to differences in the stimulus property of frequency extent (hertz), not formant slope (hertz/second). Formant extent thresholds were at least four times smaller than extents measured in the natural speech tokens, and 18 times smaller than for the diphthongized vowel /e/.  相似文献   
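
The distinction the abstract above draws between frequency extent (hertz) and formant slope (hertz/second) can be made concrete with a small calculation. This is only a sketch: the helper names and the final F2 value of 2372 Hz are hypothetical, while the initial F2 of 2272 Hz and the two durations (110 and 165 ms) come from the abstract.

```python
def f2_extent_hz(initial_hz, final_hz):
    """Frequency extent: absolute difference between initial and final F2."""
    return abs(final_hz - initial_hz)


def f2_slope_hz_per_s(initial_hz, final_hz, duration_ms):
    """Formant slope: signed frequency change divided by duration in seconds."""
    return (final_hz - initial_hz) / (duration_ms / 1000.0)


initial_f2 = 2272.0  # one of the initial F2 values listed in the abstract
final_f2 = 2372.0    # hypothetical final value, chosen only for illustration

for duration_ms in (110, 165):
    extent = f2_extent_hz(initial_f2, final_f2)
    slope = f2_slope_hz_per_s(initial_f2, final_f2, duration_ms)
    print(f"{duration_ms} ms: extent = {extent:.0f} Hz, slope = {slope:.1f} Hz/s")
```

The same extent yields different slopes at the two durations, so a threshold that stays constant in hertz across durations points to extent rather than slope as the property listeners attend to.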

16.
Two auditory feedback perturbation experiments were conducted to examine the nature of control of the first two formants in vowels. In the first experiment, talkers heard their auditory feedback with either F1 or F2 shifted in frequency. Talkers altered production of the perturbed formant by changing its frequency in the opposite direction to the perturbation but did not produce a correlated alteration of the unperturbed formant. Thus, the motor control system is capable of fine-grained independent control of F1 and F2. In the second experiment, a large meta-analysis was conducted on data from talkers who received feedback where both F1 and F2 had been perturbed. A moderate correlation was found between individual compensations in F1 and F2 suggesting that the control of F1 and F2 is processed in a common manner at some level. While a wide range of individual compensation magnitudes were observed, no significant correlations were found between individuals' compensations and vowel space differences. Similarly, no significant correlations were found between individuals' compensations and variability in normal vowel production. Further, when receiving normal auditory feedback, most of the population exhibited no significant correlation between the natural variation in production of F1 and F2.  相似文献   

17.
A perceptual analysis of the French vowel [u] produced by 10 speakers under normal and perturbed conditions (Savariaux et al., 1995) is presented which aims at characterizing in the perceptual domain the task of a speaker for this vowel, and, then, at understanding the strategies developed by the speakers to deal with the lip perturbation. Identification and rating tests showed that the French [u] is perceptually fairly well described in the [F1, (F2-F0)] plane, and that the parameter (((F2-F0) + F1)/2) (all frequencies in bark) provides a good overall correlate of the "grave" feature classically used to describe the vowel [u] in all languages. This permitted reanalysis of the behavior of the speakers during the perturbation experiment. Three of them succeed in producing a good [u] in spite of the lip tube, thanks to a combination of limited changes on F1 and (F2-F0), but without producing the strong backward movement of the tongue, which would be necessary to keep the [F1,F2] pattern close to the one measured in normal speech. The only speaker who strongly moved his tongue back and maintained F1 and F2 at low values did not produce a perceptually well-rated [u], but additional tests demonstrate that this gesture allowed him to preserve the most important phonetic features of the French [u], which is primarily a back and rounded vowel. It is concluded that speech production is clearly guided by perceptual requirements, and that the speakers have a good representation of them, even if they are not all able to meet them in perturbed conditions.  相似文献   

18.
Acoustic and perceptual similarities between Japanese and American English (AE) vowels were investigated in two studies. In study 1, a series of discriminant analyses were performed to determine acoustic similarities between Japanese and AE vowels, each spoken by four native male speakers using F1, F2, and vocalic duration as input parameters. In study 2, the Japanese vowels were presented to native AE listeners in a perceptual assimilation task, in which the listeners categorized each Japanese vowel token as most similar to an AE category and rated its goodness as an exemplar of the chosen AE category. Results showed that the majority of AE listeners assimilated all Japanese vowels into long AE categories, apparently ignoring temporal differences between 1- and 2-mora Japanese vowels. In addition, not all perceptual assimilation patterns reflected context-specific spectral similarity patterns established by discriminant analysis. It was hypothesized that this incongruity between acoustic and perceptual similarity may be due to differences in distributional characteristics of native and non-native vowel categories that affect the listeners' perceptual judgments.  相似文献   

19.
Acoustic characteristics of American English sentence stress produced by native Mandarin speakers are reported. Fundamental frequency (F0), vowel duration, and vowel intensity in the sentence-level stress produced by 40 Mandarin speakers were compared to those of 40 American English speakers. Results obtained from two methods of stress calculation indicated that Mandarin speakers of American English are able to differentiate stressed and unstressed words according to features of F0, duration, and intensity. Although the group of Mandarin speakers were able to signal stress in their sentence productions, the acoustic characteristics of stress were not identical to the American speakers. Mandarin speakers were found to produce stressed words with a significantly higher F0 and shorter duration compared to the American speakers. The groups also differed in production of unstressed words with Mandarin speakers using a higher F0 and greater intensity compared to American speakers. Although the acoustic differences observed may reflect an interference of L1 Mandarin in the production of L2 American English, the outcome of this study suggests no critical divergence between these speakers in the way they implement American English sentence stress.  相似文献   

20.
This paper examines vowel production in Swedish adolescents with cochlear implants. Twelve adolescents with cochlear implants and 11 adolescents with normal hearing participated. Measurements were made of the first and second formants in all the nine long Swedish vowels. The values in hertz were bark-transformed, and two measures of the size of the vowel space were obtained. The first of them was the average Euclidean distance in the F1-F2 plane between the nine vowels and the mean F1 and F2 values of all the vowels. The second was the mean Euclidean distance in the F1-F2 plane between all the vowels. The results showed a significant difference for both vowel space measures between the two groups of adolescents. The cochlear implant users had a smaller space than the adolescents with normal hearing. In general, the size of the vowel space showed no correlations with measures of receptive and productive linguistic abilities. However, the results of an identification test showed that the listeners made more confusions of the vowels produced by speakers who had a small mean distance in the F1-F2 plane between all the vowels.  相似文献   
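
The two vowel space measures described in the abstract above can be sketched as follows. This is an illustration under stated assumptions rather than the study's own procedure: the abstract does not name its hertz-to-bark formula, so the Zwicker and Terhardt (1980) approximation is assumed, and all function names are hypothetical.

```python
import math
from itertools import combinations


def hz_to_bark(f_hz):
    """Convert hertz to bark (assumed formula: Zwicker & Terhardt, 1980)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def mean_distance_to_centroid(points):
    """Measure 1: average Euclidean distance from each vowel to the mean (F1, F2)."""
    mean_f1 = sum(f1 for f1, _ in points) / len(points)
    mean_f2 = sum(f2 for _, f2 in points) / len(points)
    return sum(math.dist(p, (mean_f1, mean_f2)) for p in points) / len(points)


def mean_pairwise_distance(points):
    """Measure 2: mean Euclidean distance over all pairs of vowels."""
    pairs = list(combinations(points, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def vowel_space_measures(formants_hz):
    """Bark-transform (F1, F2) pairs given in hertz, then compute both measures."""
    points = [(hz_to_bark(f1), hz_to_bark(f2)) for f1, f2 in formants_hz]
    return mean_distance_to_centroid(points), mean_pairwise_distance(points)
```

Both functions take points that are already in bark, so they can be checked on simple geometry: for the four corners of a square with side 2, the first measure is the square root of 2 and the second is the mean of four sides and two diagonals.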
