Similar literature
 20 similar articles found (search time: 46 ms)
1.
Adults whose native languages permit syllable-final obstruents, and show a vocalic length distinction based on the voicing of those obstruents, consistently weight vocalic duration strongly in their perceptual decisions about the voicing of final stops, at least in laboratory studies using synthetic speech. Children, on the other hand, generally disregard such signal properties in their speech perception, favoring formant transitions instead. These age-related differences led to the prediction that children learning English as a native language would weight vocalic duration less than adults, but weight syllable-final transitions more in decisions of final-consonant voicing. This study tested that prediction. In the first experiment, adults and children (eight- and six-year-olds) labeled synthetic and natural CVC words with voiced or voiceless stops in final C position. Predictions were strictly supported for synthetic stimuli only. With natural stimuli it appeared that adults and children alike weighted syllable-offset transitions strongly in their voicing decisions. The predicted age-related difference in the weighting of vocalic duration was seen for these natural stimuli almost exclusively when syllable-final transitions signaled a voiced final stop. A second experiment with adults and children (seven and five years old) replicated these results for natural stimuli with four new sets of natural stimuli. It was concluded that acoustic properties other than vocalic duration might play more important roles in voicing decisions for final stops than commonly asserted, sometimes even taking precedence over vocalic duration.

2.
Fundamental frequency (F0) and voice onset time (VOT) were measured in utterances containing voiceless aspirated [pʰ, tʰ, kʰ], voiceless unaspirated [sp, st, sk], and voiced [b, d, g] stop consonants produced in the context of [i, e, u, o, a] by 8- to 9-year-old subjects. The results revealed that VOT reliably differentiated voiceless aspirated from voiceless unaspirated and voiced stops, whereas F0 significantly contrasted voiced with voiceless aspirated and unaspirated stops, except for the first glottal period, where voiceless unaspirated stops contrasted with the other two categories. Fundamental frequency consistently differentiated vowel height in alveolar and velar stop consonant environments only. In comparing the results of these children and of adults, it was observed that the acoustic correlates of stop consonant voicing and vowel quality were different not only in absolute values, but also in terms of variability. Further analyses suggested that children were more variable in production due to inconsistency in achieving specific targets. The findings also suggest that, of the acoustic correlates of the voicing feature, the primary distinction of VOT is strongly developed by 8-9 years of age, whereas the secondary distinction of F0 is still in an emerging state.

3.
Learning to speak involves both mastering the requisite articulatory gestures of one's native language and learning to coordinate those gestures according to the rules of the language. Voice onset time (VOT) acquisition illustrates this point: The child must learn to produce the necessary upper vocal tract and laryngeal gestures and to coordinate them with very precise timing. This longitudinal study examined the acquisition of English VOT by audiotaping seven children at 2-month intervals from first words (around 15 months) to the appearance of three-word sentences (around 30 months) in spontaneous speech. Words with initial stops were excerpted, and (1) the numbers of words produced with intended voiced and voiceless initial stops were counted; (2) VOT was measured; and (3) within-child standard deviations of VOT were measured. Results showed that children (1) initially avoided saying words with voiceless initial stops, (2) initially did not delay the onset of the laryngeal adduction relative to the release of closure as long as adults do for voiceless stops, and (3) were more variable in VOT for voiceless than for voiced stops. Overall these results support a model of acquisition that focuses on the mastery of gestural coordination as opposed to the acquisition of segmental contrasts.

4.
The primary aim of this study was to determine if adults whose native language permits neither voiced nor voiceless stops to occur in word-final position can master the English word-final /t/-/d/ contrast. Native English-speaking listeners identified the voicing feature in word-final stops produced by talkers in five groups: native speakers of English, experienced and inexperienced native Spanish speakers of English, and experienced and inexperienced native Mandarin speakers of English. Contrary to hypothesis, the experienced second language (L2) learners' stops were not identified significantly better than stops produced by the inexperienced L2 learners; and their stops were correctly identified significantly less often than stops produced by the native English speakers. Acoustic analyses revealed that the native English speakers made vowels significantly longer before /d/ than /t/, produced /t/-final words with a higher F1 offset frequency than /d/-final words, produced more closure voicing in /d/ than /t/, and sustained closure longer for /t/ than /d/. The L2 learners produced the same kinds of acoustic differences between /t/ and /d/, but theirs were usually of significantly smaller magnitude. Taken together, the results suggest that only a few of the 40 L2 learners examined in the present study had mastered the English word-final /t/-/d/ contrast. Several possible explanations for this negative finding are presented. Multiple regression analyses revealed that the native English listeners made perceptual use of the small, albeit significant, vowel duration differences produced in minimal pairs by the nonnative speakers. A significantly stronger correlation existed between vowel duration differences and the listeners' identifications of final stops in minimal pairs when the perceptual judgments were obtained in an "edited" condition (where post-vocalic cues were removed) than in a "full cue" condition. This suggested that listeners may modify their identification of stops based on the availability of acoustic cues.

5.
Because laboratory studies are conducted in optimal listening conditions, often with highly stylized stimuli that attenuate or eliminate some naturally occurring cues, results may have constrained applicability to the "real world." Such studies show that English-speaking adults weight vocalic duration greatly and formant offsets slightly in voicing decisions for word-final obstruents. Using more natural stimuli, Nittrouer [J. Acoust. Soc. Am. 115, 1777-1790 (2004)] found different results, raising questions about what would happen if experimental conditions were even more like the real world. In this study noise was used to simulate the real world. Edited natural words with voiced and voiceless final stops were presented in quiet and noise to adults and children (4 to 8 years) for labeling. Hypotheses tested were (1) Adults (and perhaps older children) would weight vocalic duration more in noise than in quiet; (2) Previously reported age-related differences in cue weighting might not be found in this real-world simulation; and (3) Children would experience greater masking than adults. Results showed: (1) no increase in the weighting of vocalic duration in noise for listeners of any age; (2) age-related differences in the weighting of cues in both quiet and noise; and (3) masking effects for all listeners, but more so for children than adults.

6.
A method for distinguishing burst onsets of voiceless stop consonants in terms of place of articulation is described. Four speakers produced the voiceless stops in word-initial position in six vowel contexts. A metric was devised to extract the characteristic burst-friction components at burst onset. The burst-friction components, derived from the metric as sensory formants, were then transformed into log frequency ratios and plotted as points in an auditory-perceptual space (APS). In the APS, each place of articulation was seen to be associated with a distinct region, or target zone. The metric was then applied to a test set of words with voiceless stops preceding ten different vowel contexts as produced by eight new speakers. The present method of analyzing voiceless stops in English enabled us to distinguish place of articulation in these new stimuli with 70% accuracy.

7.
Traditional accounts of speech perception generally hold that listeners use isolable acoustic "cues" to label phonemes. For syllable-final stops, duration of the preceding vocalic portion and formant transitions at syllable's end have been considered the primary cues to voicing decisions. The current experiment tried to extend traditional accounts by asking two questions concerning voicing decisions by adults and children: (1) What weight is given to vocalic duration versus spectral structure, both at syllable's end and across the syllable? (2) Does the naturalness of stimuli affect labeling? Adults and children (4, 6, and 8 years old) labeled synthetic stimuli that varied in vocalic duration and spectral structure, either at syllable's end or earlier in the syllable. Results showed that all listeners weighted dynamic spectral structure, both at syllable's end and earlier in the syllable, more than vocalic duration, and listeners performed with these synthetic stimuli as listeners had performed previously with natural stimuli. The conclusion for accounts of human speech perception is that rather than simply gathering acoustic cues and summing them to derive strings of phonemic segments, listeners are able to attend to global spectral structure, and use it to help recover explicitly phonetic structure.

8.
Several types of measurements were made to determine the acoustic characteristics that distinguish between voiced and voiceless fricatives in various phonetic environments. The selection of measurements was based on a theoretical analysis that indicated the acoustic and aerodynamic attributes at the boundaries between fricatives and vowels. As expected, glottal vibration extended over a longer time in the obstruent interval for voiced fricatives than for voiceless fricatives, and there were more extensive transitions of the first formant adjacent to voiced fricatives than for the voiceless cognates. When two fricatives with different voicing were adjacent, there were substantial modifications of these acoustic attributes, particularly for the syllable-final fricative. In some cases, these modifications led to complete assimilation of the voicing feature. Several perceptual studies with synthetic vowel-consonant-vowel stimuli and with edited natural stimuli examined the role of consonant duration, extent and location of glottal vibration, and extent of formant transitions on the identification of the voicing characteristics of fricatives. The perceptual results were in general consistent with the acoustic observations and with expectations based on the theoretical model. The results suggest that listeners base their voicing judgments of intervocalic fricatives on an assessment of the time interval in the fricative during which there is no glottal vibration. This time interval must exceed about 60 ms if the fricative is to be judged as voiceless, except that a small correction to this threshold is applied depending on the extent to which the first-formant transitions are truncated at the consonant boundaries.

9.
Voice onset time (VOT) is a perceptual cue to the voicing contrast of stops in word-initial position. The current study aims to characterize VOT acoustically and perceptually in one of the major South Indian languages, Tulu. Stimuli consisted of 2 pairs of meaningful words with bilabial [/p/-/b/] and velar [/k/-/g/] stops in the initial position. These words were uttered by 8 normal native speakers of Tulu and recorded using Praat software. Both spectrogram and waveform views were used to identify the VOT. For the perceptual experiment, 4 adult native speakers of Tulu were asked to identify stimuli in which voicing was truncated in steps of 5 to 7 ms until lead VOT was 0, and silence was added after the burst in 5 ms steps until lag VOT was 50 ms. The reaction time and the accuracy of identification were measured. Results of the acoustic measurements showed no significant mean difference between the lead VOTs of the two voiced consonants. However, there was a significant difference between the mean lag VOTs of the voiceless consonants. Results of the perceptual measurements showed that as lead VOT decreases, the probability of identification of /g/ responses decreases, whereas changing VOT had little effect on reaction time and identification of /b/ responses. These results probably indicate that VOT is not necessary to perceive voiceless consonants in Tulu but is necessary in the perception of voiced consonants. Thus VOT is a consonant-specific cue in Tulu.

10.
Acoustic duration and degree of vowel reduction are known to correlate with a word's frequency of occurrence. The present study broadens the research on the role of frequency in speech production to voice assimilation. The test case was regressive voice assimilation in Dutch. Clusters from a corpus of read speech were more often perceived as unassimilated in lower-frequency words and as either completely voiced (regressive assimilation) or, unexpectedly, as completely voiceless (progressive assimilation) in higher-frequency words. Frequency did not predict the voice classifications over and above important acoustic cues to voicing, suggesting that the frequency effects on the classifications were carried exclusively by the acoustic signal. The duration of the cluster and the period of glottal vibration during the cluster decreased while the duration of the release noises increased with frequency. This indicates that speakers reduce articulatory effort for higher-frequency words, with some acoustic cues signaling more voicing and others less voicing. A higher frequency leads not only to acoustic reduction but also to more assimilation.

11.
Research on children's speech perception and production suggests that consonant voicing and place contrasts may be acquired early in life, at least in word-onset position. However, little is known about the development of the acoustic correlates of later-acquired, word-final coda contrasts. This is of particular interest in languages like English where many grammatical morphemes are realized as codas. This study therefore examined how various non-spectral acoustic cues vary as a function of stop coda voicing (voiced vs. voiceless) and place (alveolar vs. velar) in the spontaneous speech of 6 American-English-speaking mother-child dyads. The results indicate that children as young as 1;6 exhibited many adult-like acoustic cues to voicing and place contrasts, including longer vowels and more frequent use of voice bar with voiced codas, and a greater number of bursts and longer post-release noise for velar codas. However, 1;6-year-olds overall exhibited longer durations and more frequent occurrence of these cues compared to mothers, with decreasing values by 2;6. Thus, English-speaking 1;6-year-olds already exhibit adult-like use of some of the cues to coda voicing and place, though implementation is not yet fully adult-like. Physiological and contextual correlates of these findings are discussed.

12.
This study was designed to examine the temporal acoustic differences between male trained singers and nonsingers during speaking and singing across voiced and voiceless English stop consonants. Recordings were made of 5 trained singers and 5 nonsingers, and acoustically analyzed for voice onset time (VOT). A mixed analysis of variance showed that the male trained singers had significantly longer mean VOT than did the nonsingers during voiceless stop production. Sung productions of voiceless stops had significantly longer mean VOTs than did the spoken productions. No significant differences were observed for the voiced stops, nor were any interactions observed. These results indicated that vocal training and phonatory task have a significant influence on VOT.

13.
This paper reports acoustic measurements and results from a series of perceptual experiments on the voiced-voiceless distinction for syllable-final stop consonants in absolute final position and in the context of a following syllable beginning with a different stop consonant. The focus is on temporal cues to the distinction, with vowel duration and silent closure duration as the primary and secondary dimensions, respectively. The main results are that adding a second syllable to a monosyllable increases the number of voiced stop consonant responses, as does shortening of the closure duration in disyllables. Both of these effects are consistent with temporal regularities in speech production: Vowel durations are shorter in the first syllable of disyllables than in monosyllables, and closure durations are shorter for voiced than for voiceless stops in disyllabic utterances of this type. While the perceptual effects thus may derive from two separate sources of tacit phonetic knowledge available to listeners, the data are also consistent with an interpretation in terms of a single effect: one of temporal proximity of following context.

14.
The cricothyroid muscle in voicing control (total citations: 1; self-citations: 0; citations by others: 1)
Initiation and maintenance of vibrations of the vocal folds require suitable conditions of adduction, longitudinal tension, and transglottal airflow. Thus manipulation of adduction/abduction, stiffening/slackening, or degree of transglottal flow may, in principle, be used to determine the voicing status of a speech segment. This study explores the control of voicing and voicelessness in speech with particular reference to the role of changes in the longitudinal tension of the vocal folds, as indicated by cricothyroid (CT) muscle activity. Electromyographic recordings were made from the CT muscle in two speakers of American English and one speaker of Dutch. The linguistic material consisted of reiterant speech made up of CV syllables where the consonants were voiced and voiceless stops, fricatives, and affricates. Comparison of CT activity associated with the voiced and voiceless consonants indicated a higher level for the voiceless consonants than for their voiced cognates. Measurements of the fundamental frequency (F0) at the beginning of a vowel following the consonant show the common pattern of higher F0 after voiceless consonants. For one subject, there was no difference in cricothyroid activity for voiced and voiceless affricates; in this case, the consonant-induced variations in the F0 of the following vowel were also less robust. Consideration of timing relationships between the EMG curves for voiced and voiceless consonants suggests that the differences most likely reflect control of vocal-fold tension for maintenance or suppression of phonatory vibrations. The same mechanism also seems to contribute to the well-known difference in F0 at the beginning of vowels following voiced and voiceless consonants.

15.
We have examined the effects of the relative amplitude of the release burst on perception of the place of articulation of utterance-initial voiceless and voiced stop consonants. The amplitude of the burst, which occurs within the first 10-15 ms following consonant release, was systematically varied in 5-dB steps from -10 to +10 dB relative to a "normal" burst amplitude for two labial-to-alveolar synthetic speech continua, one comprising voiceless stops and the other, voiced stops. The distribution of spectral energy in the bursts for the labial and alveolar stops at the ends of the continuum was consistent with the spectrum shapes observed in natural utterances, and intermediate shapes were used for intermediate stimuli on the continuum. The results of identification tests with these stimuli showed that the relative amplitude of the burst significantly affected the perception of the place of articulation of both voiceless and voiced stops, but the effect was greater for the former than the latter. The results are consistent with a view that two basic properties contribute to the labial-alveolar distinction in English. One of these is determined by the time course of the change in amplitude in the high-frequency range (above 2500 Hz) in the few tens of ms following consonantal release, and the other is determined by the frequencies of spectral peaks associated with the second and third formants in relation to the first formant.

16.
Acoustic analyses were undertaken to explore the durational characteristics of the fricatives [f, θ, s, v, ð, z] as cues to initial consonant voicing in English. Based on reports on the perception of voiced-voiceless fricatives, it was expected that there would be clear-cut duration differences distinguishing voiced and voiceless fricatives. Preliminary results for three speakers indicate that, although differences emerged in the overall mean duration of voiced and voiceless fricatives, contrary to expectations, there was a great deal of overlap in the duration distribution of voiced and voiceless fricative tokens. Further research is needed to examine the role of duration as a cue to syllable-initial fricative consonant voicing in English.

17.
Auditory-nerve fiber spike trains were recorded in response to spoken English stop consonant-vowel syllables, both voiced (/b,d,g/) and unvoiced (/p,t,k/), in the initial position of syllables with the vowels /i,a,u/. Temporal properties of the neural responses and stimulus spectra are displayed in a spectrographic format. The responses were categorized in terms of the fibers' characteristic frequencies (CF) and spontaneous rates (SR). High-CF, high-SR fibers generally synchronize to formants throughout the syllables. High-CF, low/medium-SR fibers may also synchronize to formants; however, during the voicing, there may be sufficient low-frequency energy present to suppress a fiber's synchronized response to a formant near its CF. Low-CF fibers, from both SR groups, synchronize to energy associated with voicing. Several proposed acoustic correlates to perceptual features of stop consonant-vowel syllables, including the initial spectrum, formant transitions, and voice-onset time, are represented in the temporal properties of auditory-nerve fiber responses. Nonlinear suppression affects the temporal features of the responses, particularly those of low/medium-spontaneous-rate fibers.

18.
Durations of the vocalic portions of speech are influenced by a large number of linguistic and nonlinguistic factors (e.g., stress and speaking rate). However, each factor affecting vowel duration may influence articulation in a unique manner. The present study examined the effects of stress and final-consonant voicing on the detailed structure of articulatory and acoustic patterns in consonant-vowel-consonant (CVC) utterances. Jaw movement trajectories and F1 trajectories were examined for a corpus of utterances differing in stress and final-consonant voicing. Jaw lowering and raising gestures were more rapid, longer in duration, and spatially more extensive for stressed versus unstressed utterances. At the acoustic level, stressed utterances showed more rapid initial F1 transitions and more extreme F1 steady-state frequencies than unstressed utterances. In contrast to the results obtained in the analysis of stress, decreases in vowel duration due to devoicing did not result in a reduction in the velocity or spatial extent of the articulatory gestures. Similarly, at the acoustic level, the reductions in formant transition slopes and steady-state frequencies demonstrated by the shorter, unstressed utterances did not occur for the shorter, voiceless utterances. The results demonstrate that stress-related and voicing-related changes in vowel duration are accomplished by separate and distinct changes in speech production with observable consequences at both the articulatory and acoustic levels.

19.
As part of an investigation of the temporal implementation rules of English, measurements were made of voice-onset time for initial English stops and the duration of the following voiced vowel in monosyllabic words for New York City speakers. It was found that the VOT of a word-initial consonant was longer before a voiceless final cluster than before a single nasal, and longer before tense vowels than lax vowels. The vowels were also longer in environments where VOT was longer, but VOT did not maintain a constant ratio with the vowel duration, even for a single place of articulation. VOT was changed by a smaller proportion than the following voiced vowel in both cases. VOT changes associated with the vowel were consistent across place of articulation of the stop. In the final experiment, when vowel tensity and final consonant effects were combined, it was found that the proportion of vowel duration change that carried over to the preceding VOT is different for the two phonetic changes. These results imply that temporal implementation rules simultaneously influence several acoustic intervals including both VOT and the "inherent" interval corresponding to a segment, either by independent control of the relevant articulatory variables or by some unknown common mechanism.

20.
Interaction of Korean and English stop systems in Korean-English bilinguals as a function of age of acquisition (AOA) of English was investigated. It was hypothesized that early bilinguals (mean AOA=3.8 years) would more likely be native-like in production of English and Korean stops and maintain greater independence between Korean and English stop systems than late bilinguals (mean AOA=21.4 years). Production of Korean and English stops was analyzed in terms of three acoustic-phonetic properties: voice-onset time, amplitude difference between the first two harmonics, and fundamental frequency. Late bilinguals were different from English monolinguals for English voiceless and voiced stops in all three properties. As for Korean stops, late bilinguals were different from Korean monolinguals for fortis stops in voice-onset time. Early bilinguals were not different from the monolinguals of either language. Considering the independence of the two stop systems, late bilinguals seem to have merged English voiceless and Korean aspirated stops and produced English voiced stops with similarities to both Korean fortis and lenis stops, whereas early bilinguals produced five distinct stop types. Thus, the early bilinguals seem to have two independent stop systems, whereas the late bilinguals likely have a merged Korean-English system.
