Similar literature
 20 similar documents found (search time: 718 ms)
1.
Measurements of the temporal characteristics of word-initial stressed syllables in CVCV-type words in Modern Greek showed that the timing of the initial consonant in terms of its closure duration and voice onset time (VOT) is dependent on place and manner of articulation. This is contrary to recent accounts of word-initial voiceless consonants in English which propose that closure and VOT together comprise a voiceless interval independent of place and manner of articulation. The results also contribute to the development of a timing model for Modern Greek which generates closure, VOT, and vowel durations for word-initial, stressed CV syllables. The model is made up of a series of rules operating in an ordered fashion on a given word duration to derive first a stressed syllable duration and then all intrasyllabic acoustic intervals.

2.
The purpose of this investigation was to study the effects of consonant environment on vowel duration for normally hearing males, hearing-impaired males with intelligible speech, and hearing-impaired males with semi-intelligible speech. The results indicated that the normally hearing and intelligible hearing-impaired speakers exhibited similar trends with respect to consonant influence on vowel duration; i.e., vowels were longer in a voiced environment than in a voiceless one, and in a fricative environment than in a plosive one. The semi-intelligible hearing-impaired speakers, however, failed to demonstrate a consonant effect on vowel duration, and produced the vowels with significantly longer durations when compared with the other two groups of speakers. These data provide information regarding temporal conditions which may contribute to the decreased intelligibility of hearing-impaired persons.

3.
A production study was conducted to investigate the effect of vowel lengthening before voiced obstruents, and the possible influence that the openness versus closedness of syllables has on the temporal structure of vowels in some languages. The results revealed that vowels were significantly longer when followed by voiced consonants than when followed by voiceless consonants. Vowel duration did not vary with syllable structure, although vowels in open syllables followed by [+voiced] consonants tended to be longer than those followed by [-voiced] consonants. These results are discussed in the context of current knowledge of other languages.

4.
Most investigators agree that the acoustic information for American English vowels includes dynamic (time-varying) parameters as well as static "target" information contained in a single cross section of the syllable. Using the silent-center (SC) paradigm, the present experiment examined the case in which the initial and final portions of stop consonant-vowel-stop consonant (CVC) syllables containing the same vowel but different consonants were recombined into mixed-consonant SC syllables and presented to listeners for vowel identification. Ten vowels were spoken in six different syllables, /bVb, bVd, bVt, dVb, dVd, dVt/, embedded in a carrier sentence. Initial and final transitional portions of these syllables were cross-matched in: (1) silent-center syllables with original syllable durations (silences) preserved (mixed-consonant SC condition) and (2) mixed-consonant SC syllables with syllable duration equated across the ten vowels (fixed duration mixed-consonant SC condition). Vowel-identification accuracy in these two mixed-consonant SC conditions was compared with performance on the original SC and fixed duration SC stimuli, and in initial and final control conditions in which initial and final transitional portions were each presented alone. Vowels were identified highly accurately in both mixed-consonant SC and original syllable SC conditions (only 7%-8% overall errors). Neutralizing duration information led to small, but significant, increases in identification errors in both mixed-consonant and original fixed-duration SC conditions (14%-15% errors), but performance was still much more accurate than for initial and final control conditions (35% and 52% errors, respectively). Acoustical analysis confirmed that direction and extent of formant change from initial to final portions of mixed-consonant stimuli differed from that of original syllables, arguing against a target + offglide explanation of the perceptual results. Results do support the hypothesis that temporal trajectories specifying "style of movement" provide information for the differentiation of American English tense and lax vowels, and that this information is invariant over the place of articulation and voicing of the surrounding stop consonants.

5.
Slopes and y-intercepts of locus equations have previously been shown to successfully classify place of articulation for English voiced stop consonants when derived from measurements at vowel onset and vowel midpoint. However, listeners are capable of identifying English voiced stops when less than 30 ms of vowel is presented. The present results show that modified locus equation measurements made within the first several pitch periods of a vowel following an English voiced stop were also successful at classifying place of articulation, consistent with the amount of vocalic information necessary for perceptual identification of English voiced stops /b d g/.
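
As an illustration of the slope and y-intercept mentioned above, a locus equation is conventionally a straight-line fit of the second-formant (F2) frequency at vowel onset against F2 at the vowel midpoint, computed per stop consonant across vowel contexts. The following is a minimal sketch under that assumption; the function name and the sample frequencies are hypothetical and are not taken from the study.

```python
def fit_locus_equation(f2_midpoint_hz, f2_onset_hz):
    """Least-squares fit of F2_onset = slope * F2_midpoint + intercept.

    Both arguments are equal-length sequences of F2 values in Hz,
    one pair per token (hypothetical input layout).
    """
    n = len(f2_midpoint_hz)
    if n < 2 or n != len(f2_onset_hz):
        raise ValueError("need two equal-length sequences with at least 2 tokens")
    mean_x = sum(f2_midpoint_hz) / n
    mean_y = sum(f2_onset_hz) / n
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in f2_midpoint_hz)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(f2_midpoint_hz, f2_onset_hz))
    if sxx == 0:
        raise ValueError("F2 midpoint values must not all be identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up tokens for one stop consonant across four vowel contexts.
midpoints = [1000.0, 1500.0, 2000.0, 2500.0]
onsets = [1400.0, 1650.0, 1900.0, 2150.0]
slope, intercept = fit_locus_equation(midpoints, onsets)
print(f"slope={slope:.2f}, intercept={intercept:.0f} Hz")
```

Each stop consonant yields one (slope, intercept) pair, and those pairs are the values used to classify place of articulation.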

6.
An acoustic analysis of whispered consonants in comparison to normally phonated consonants was conducted in time and intensity domains. Consonant duration and average root mean square intensity were measured for six speakers in both articulation modes. Each of 25 Serbian consonants (C) was placed between two /a/ vowels, forming a syllable of /aCa/ type. Such a syllable was placed in initial, medial, and final position in the carrier sentence. Results showed that whispered consonants have a prolonged duration of about 10% on average (statistically significant, ANOVA test), and that the unvoiced consonants have a smaller time dimension extension (5.8%) than voiced ones (15.3%). Examination at subphonemic level showed that there is no difference in voice-onset-time and affrication duration in unvoiced plosives and affricates, in both whispered and phonated mode of articulation, but the difference is significant for voiced ones. Analysis of consonant duration versus place of articulation showed that palatal place is most sensitive in the process of whispering. In all experiments, the results are very consistent with respect to the subjects and test material (Pearson's correlation was between 0.6 and 0.9). In intensity domain, all unvoiced consonants in whispered mode of articulation have almost unchanged intensity in comparison to phonated mode (the difference is maximum 3.5 dB). On the contrary, voiced consonants in the whispered mode were reduced in intensity by as much as 25 dB, as nasals and semivowels. Average intensity of whispered consonants is lowered by 12 dB in comparison to phonated ones, and does not depend on syllabic position inside the sentences.
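
The two kinds of figures reported above (a percentage lengthening and an intensity difference in dB) can be sketched as follows. This is only an illustrative calculation under the usual definitions of relative change and of RMS level difference; the helper names and sample values are hypothetical, and the study's actual measurement procedure is not described in the abstract.

```python
import math


def percent_change(phonated_ms, whispered_ms):
    """Relative lengthening of the whispered duration, in percent."""
    return (whispered_ms - phonated_ms) / phonated_ms * 100.0


def rms(samples):
    """Root mean square amplitude of a non-empty sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def level_difference_db(phonated, whispered):
    """RMS level of the whispered segment relative to the phonated one, in dB.

    Negative values mean the whispered segment is weaker.
    """
    return 20.0 * math.log10(rms(whispered) / rms(phonated))


# Made-up values: a consonant lengthened from 100 ms to 110 ms,
# and a whispered segment with half the amplitude of the phonated one.
print(round(percent_change(100.0, 110.0), 1))
print(round(level_difference_db([1.0, -1.0, 1.0, -1.0],
                                [0.5, -0.5, 0.5, -0.5]), 2))
```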

7.
This paper reports acoustic measurements and results from a series of perceptual experiments on the voiced-voiceless distinction for syllable-final stop consonants in absolute final position and in the context of a following syllable beginning with a different stop consonant. The focus is on temporal cues to the distinction, with vowel duration and silent closure duration as the primary and secondary dimensions, respectively. The main results are that adding a second syllable to a monosyllable increases the number of voiced stop consonant responses, as does shortening of the closure duration in disyllables. Both of these effects are consistent with temporal regularities in speech production: vowel durations are shorter in the first syllable of disyllables than in monosyllables, and closure durations are shorter for voiced than for voiceless stops in disyllabic utterances of this type. While the perceptual effects thus may derive from two separate sources of tacit phonetic knowledge available to listeners, the data are also consistent with an interpretation in terms of a single effect: one of temporal proximity of following context.

8.
The perception of voicing in final velar stop consonants was investigated by systematically varying vowel duration, change in offset frequency of the final first formant (F1) transition, and rate of frequency change in the final F1 transition for several vowel contexts. Consonant-vowel-consonant (CVC) continua were synthesized for each of three vowels, [i,I,ae], which represent a range of relatively low to relatively high-F1 steady-state values. Subjects responded to the stimuli under both an open- and closed-response condition. Results of the study show that both vowel duration and F1 offset properties influence perception of final consonant voicing, with the salience of the F1 offset property higher for vowels with high-F1 steady-state frequencies than low-F1 steady-state frequencies, and the opposite occurring for the vowel duration property. When F1 onset and offset frequencies were controlled, rate of the F1 transition change had inconsistent and minimal effects on perception of final consonant voicing. Thus the findings suggest that it is the termination value of the F1 offset transition rather than rate and/or duration of frequency change, which cues voicing in final velar stop consonants during the transition period preceding closure.

9.
Synthesis (carrier) signals in acoustic models embody assumptions about perception of auditory electric stimulation. This study compared speech intelligibility of consonants and vowels processed through a set of nine acoustic models that used Spectral Peak (SPEAK) and Advanced Combination Encoder (ACE)-like speech processing, using synthesis signals which were representative of signals used previously in acoustic models as well as two new ones. Performance of the synthesis signals was determined in terms of correspondence with cochlear implant (CI) listener results for 12 attributes of phoneme perception (consonant and vowel recognition; F1, F2, and duration information transmission for vowels; voicing, manner, place of articulation, affrication, burst, nasality, and amplitude envelope information transmission for consonants) using four measures of performance. Modulated synthesis signals produced the best correspondence with CI consonant intelligibility, while sinusoids, narrow noise bands, and varying noise bands produced the best correspondence with CI vowel intelligibility. The signals that performed best overall (in terms of correspondence with both vowel and consonant attributes) were modulated and unmodulated noise bands of varying bandwidth that corresponded to a linearly varying excitation width of 0.4 mm at the apical to 8 mm at the basal channels.

10.
This study was designed to examine the temporal acoustic differences between male trained singers and nonsingers during speaking and singing across voiced and voiceless English stop consonants. Recordings were made of 5 trained singers and 5 nonsingers, and acoustically analyzed for voice onset time (VOT). A mixed analysis of variance showed that the male trained singers had significantly longer mean VOT than did the nonsingers during voiceless stop production. Sung productions of voiceless stops had significantly longer mean VOTs than did the spoken productions. No significant differences were observed for the voiced stops, nor were any interactions observed. These results indicated that vocal training and phonatory task have a significant influence on VOT.

11.
Fundamental frequency (F0) and voice onset time (VOT) were measured in utterances containing voiceless aspirated [ph, th, kh], voiceless unaspirated [sp, st, sk], and voiced [b, d, g] stop consonants produced in the context of [i, e, u, o, a] by 8- to 9-year-old subjects. The results revealed that VOT reliably differentiated voiceless aspirated from voiceless unaspirated and voiced stops, whereas F0 significantly contrasted voiced with voiceless aspirated and unaspirated stops, except for the first glottal period, where voiceless unaspirated stops contrasted with the other two categories. Fundamental frequency consistently differentiated vowel height in alveolar and velar stop consonant environments only. In comparing the results of these children and of adults, it was observed that the acoustic correlates of stop consonant voicing and vowel quality were different not only in absolute values, but also in terms of variability. Further analyses suggested that children were more variable in production due to inconsistency in achieving specific targets. The findings also suggest that, of the acoustic correlates of the voicing feature, the primary distinction of VOT is strongly developed by 8-9 years of age, whereas the secondary distinction of F0 is still in an emerging state.

12.
This study examined the temporal phasing of tongue and lip movements in vowel-consonant-vowel sequences where the consonant is a bilabial stop consonant /p, b/ and each vowel is one of /i, a, u/; only asymmetrical vowel contexts were included in the analysis. Four subjects participated. Articulatory movements were recorded using a magnetometer system. The onset of the tongue movement from the first to the second vowel almost always occurred before the oral closure. Most of the tongue movement trajectory from the first to the second vowel took place during the oral closure for the stop. For all subjects, the onset of the tongue movement occurred earlier with respect to the onset of the lip closing movement as the tongue movement trajectory increased. The influence of consonant voicing and vowel context on interarticulator timing and tongue movement kinematics varied across subjects. Overall, the results are compatible with the hypothesis that there is a temporal window before the oral closure for the stop during which the tongue movement can start. A very early onset of the tongue movement relative to the stop closure together with an extensive movement before the closure would most likely produce an extra vowel sound before the closure.

13.
The cricothyroid muscle in voicing control   Total citations: 1, self-citations: 0, citations by others: 1
Initiation and maintenance of vibrations of the vocal folds require suitable conditions of adduction, longitudinal tension, and transglottal airflow. Thus manipulation of adduction/abduction, stiffening/slackening, or degree of transglottal flow may, in principle, be used to determine the voicing status of a speech segment. This study explores the control of voicing and voicelessness in speech with particular reference to the role of changes in the longitudinal tension of the vocal folds, as indicated by cricothyroid (CT) muscle activity. Electromyographic recordings were made from the CT muscle in two speakers of American English and one speaker of Dutch. The linguistic material consisted of reiterant speech made up of CV syllables where the consonants were voiced and voiceless stops, fricatives, and affricates. Comparison of CT activity associated with the voiced and voiceless consonants indicated a higher level for the voiceless consonants than for their voiced cognates. Measurements of the fundamental frequency (F0) at the beginning of a vowel following the consonant show the common pattern of higher F0 after voiceless consonants. For one subject, there was no difference in cricothyroid activity for voiced and voiceless affricates; in this case, the consonant-induced variations in the F0 of the following vowel were also less robust. Consideration of timing relationships between the EMG curves for voiced and voiceless consonants suggests that the differences most likely reflect control of vocal-fold tension for maintenance or suppression of phonatory vibrations. The same mechanism also seems to contribute to the well-known difference in F0 at the beginning of vowels following voiced and voiceless consonants.

14.
An important speech cue is that of voice onset time (VOT), a cue for the perception of voicing and aspiration in word-initial stops. Preaspiration, an [h]-like sound between a vowel and the following stop, can be cued by voice offset time (VOffT), a cue which in most respects mirrors VOT. In Icelandic, VOffT is much more sensitive to the duration of the preceding vowel than VOT is to the duration of the following vowel. This has been explained by noting that preaspiration can only follow a phonemically short vowel. Lengthening of the vowel, either by changing its duration or by moving the spectrum towards that appropriate for a long vowel, will thus demand a longer VOffT to cue preaspiration. An experiment is reported showing that the greater effect of vowel quantity on the perception of VOffT than on the perception of VOT cannot be explained by the effect of F1 frequency at vowel offset.

15.
The primary aim of this study was to determine if adults whose native language permits neither voiced nor voiceless stops to occur in word-final position can master the English word-final /t/-/d/ contrast. Native English-speaking listeners identified the voicing feature in word-final stops produced by talkers in five groups: native speakers of English, experienced and inexperienced native Spanish speakers of English, and experienced and inexperienced native Mandarin speakers of English. Contrary to the hypothesis, the experienced second language (L2) learners' stops were not identified significantly better than stops produced by the inexperienced L2 learners, and their stops were correctly identified significantly less often than stops produced by the native English speakers. Acoustic analyses revealed that the native English speakers made vowels significantly longer before /d/ than /t/, produced /t/-final words with a higher F1 offset frequency than /d/-final words, produced more closure voicing in /d/ than /t/, and sustained closure longer for /t/ than /d/. The L2 learners produced the same kinds of acoustic differences between /t/ and /d/, but theirs were usually of significantly smaller magnitude. Taken together, the results suggest that only a few of the 40 L2 learners examined in the present study had mastered the English word-final /t/-/d/ contrast. Several possible explanations for this negative finding are presented. Multiple regression analyses revealed that the native English listeners made perceptual use of the small, albeit significant, vowel duration differences produced in minimal pairs by the nonnative speakers. A significantly stronger correlation existed between vowel duration differences and the listeners' identifications of final stops in minimal pairs when the perceptual judgments were obtained in an "edited" condition (where post-vocalic cues were removed) than in a "full cue" condition. This suggested that listeners may modify their identification of stops based on the availability of acoustic cues.

16.
Intelligibility of average talkers in typical listening environments   Total citations: 1, self-citations: 0, citations by others: 1
Intelligibility of conversationally produced speech for normal hearing listeners was studied for three male and three female talkers. Four typical listening environments were used. These simulated a quiet living room, a classroom, and social events in two settings with different reverberation characteristics. For each talker, overall intelligibility and intelligibility for vowels, consonant voicing, consonant continuance, and consonant place were quantified using the speech pattern contrast (SPAC) test. Results indicated that significant intelligibility differences are observed among normal talkers even in listening environments that permit essentially full intelligibility for everyday conversations. On the whole, talkers maintained their relative intelligibility across the four environments, although there was one exception which suggested that some voices may be particularly susceptible to degradation due to reverberation. Consonant place was the most poorly perceived feature, followed by continuance, voicing, and vowel intelligibility. However, there were numerous significant interactions between talkers and speech features, indicating that a talker of average overall intelligibility may produce certain speech features with intelligibility that is considerably higher or lower than average. Neither long-term rms speech spectrum nor articulation rate was found to be an adequate single criterion for selecting a talker of average intelligibility. Ultimately, an average talker was chosen on the basis of four speech contrasts: initial consonant place, and final consonant place, voicing, and continuance.

17.
The voice onset time (VOT) of a stop consonant is the interval between its burst onset and voicing onset. Among a variety of research topics on VOT, one that has been studied for years is how to measure VOTs efficiently. Manual annotation is feasible, but it becomes time-consuming when the corpus is large. This paper proposes an automatic VOT estimation method based on an onset detection algorithm. First, forced alignment is applied to identify the locations of stop consonants. Then a random-forest-based onset detector searches each stop segment for its burst and voicing onsets to estimate a VOT. The proposed onset detector can detect the onsets efficiently and accurately with only a small amount of training data. The evaluation data, extracted from the TIMIT corpus, were 2344 words with a word-initial stop. The experimental results showed that 83.4% of the estimations deviate by less than 10 ms from their manually labeled values, and 96.5% of the estimations deviate by less than 20 ms. Some factors that influence the proposed estimation method, such as place of articulation, voicing of a stop consonant, and quality of the succeeding vowel, were also investigated.
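
The VOT definition and the tolerance-based accuracy figures above can be sketched in a few lines. This is only an illustrative calculation, not the paper's detector: the forced alignment and random forest stages are omitted, and the function names and sample values are hypothetical.

```python
def vot_ms(burst_onset_s, voicing_onset_s):
    """VOT as the interval from burst onset to voicing onset, in milliseconds."""
    return (voicing_onset_s - burst_onset_s) * 1000.0


def fraction_within(estimated_ms, manual_ms, tolerance_ms):
    """Share of estimates deviating by less than tolerance_ms from manual labels."""
    if not estimated_ms or len(estimated_ms) != len(manual_ms):
        raise ValueError("need two equal-length, non-empty sequences")
    hits = sum(1 for est, ref in zip(estimated_ms, manual_ms)
               if abs(est - ref) < tolerance_ms)
    return hits / len(estimated_ms)


# Made-up VOT values in ms for four word-initial stops.
estimated = [20.0, 55.0, 80.0, 31.0]
manual = [25.0, 40.0, 78.0, 60.0]
print(fraction_within(estimated, manual, 10.0))  # share within 10 ms
print(fraction_within(estimated, manual, 20.0))  # share within 20 ms
```

Reporting the share at two tolerances, as the abstract does with 10 ms and 20 ms, shows how quickly the estimation errors fall off rather than giving a single mean error.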

18.
Research on children's speech perception and production suggests that consonant voicing and place contrasts may be acquired early in life, at least in word-onset position. However, little is known about the development of the acoustic correlates of later-acquired, word-final coda contrasts. This is of particular interest in languages like English where many grammatical morphemes are realized as codas. This study therefore examined how various non-spectral acoustic cues vary as a function of stop coda voicing (voiced vs. voiceless) and place (alveolar vs. velar) in the spontaneous speech of 6 American-English-speaking mother-child dyads. The results indicate that children as young as 1;6 exhibited many adult-like acoustic cues to voicing and place contrasts, including longer vowels and more frequent use of voice bar with voiced codas, and a greater number of bursts and longer post-release noise for velar codas. However, 1;6-year-olds overall exhibited longer durations and more frequent occurrence of these cues compared to mothers, with decreasing values by 2;6. Thus, English-speaking 1;6-year-olds already exhibit adult-like use of some of the cues to coda voicing and place, though implementation is not yet fully adult-like. Physiological and contextual correlates of these findings are discussed.

19.
This study investigated the extent to which adult Japanese listeners' perceived phonetic similarity of American English (AE) and Japanese (J) vowels varied with consonantal context. Four AE speakers produced multiple instances of the 11 AE vowels in six syllabic contexts /b-b, b-p, d-d, d-t, g-g, g-k/ embedded in a short carrier sentence. Twenty-four native speakers of Japanese were asked to categorize each vowel utterance as most similar to one of 18 Japanese categories [five one-mora vowels, five two-mora vowels, plus /ei, ou/ and one-mora and two-mora vowels in palatalized consonant CV syllables, C(j)a(a), C(j)u(u), C(j)o(o)]. They then rated the "category goodness" of the AE vowel to the selected Japanese category on a seven-point scale. None of the 11 AE vowels was assimilated unanimously to a single J response category in all context/speaker conditions; consistency in selecting a single response category ranged from 77% for /eI/ to only 32% for /ae/. Median ratings of category goodness for modal response categories were somewhat restricted overall, ranging from 5 to 3. Results indicated that temporal assimilation patterns (judged similarity to one-mora versus two-mora Japanese categories) differed as a function of the voicing of the final consonant, especially for the AE vowels, /see text/. Patterns of spectral assimilation (judged similarity to the five J vowel qualities) of /see text/ also varied systematically with consonantal context and speakers. On the basis of these results, it was predicted that relative difficulty in the identification and discrimination of AE vowels by Japanese speakers would vary significantly as a function of the contexts in which they were produced and presented.

20.
This study examined whether individuals with a wide range of first-language vowel systems (Spanish, French, German, and Norwegian) differ fundamentally in the cues that they use when they learn the English vowel system (e.g., formant movement and duration). All subjects: (1) identified natural English vowels in quiet; (2) identified English vowels in noise that had been signal processed to flatten formant movement or equate duration; (3) perceptually mapped best exemplars for first- and second-language synthetic vowels in a five-dimensional vowel space that included formant movement and duration; and (4) rated how natural English vowels assimilated into their L1 vowel categories. The results demonstrated that individuals with larger and more complex first-language vowel systems (German and Norwegian) were more accurate at recognizing English vowels than were individuals with smaller first-language systems (Spanish and French). However, there were no fundamental differences in what these individuals learned. That is, all groups used formant movement and duration to recognize English vowels, and learned new aspects of the English vowel system rather than simply assimilating vowels into existing first-language categories. The results suggest that there is a surprising degree of uniformity in the ways that individuals with different language backgrounds perceive second language vowels.
