Similar literature
1.
Measurements were made of sagittal plane movements of the larynx, soft palate, and portions of the tongue, from a high-speed cinefluorographic film of utterances produced by one adult male speaker of American English. These measures were then used to approximate the temporal variations in supraglottal cavity volume during the closures of voiced and voiceless stop consonants. All data were subsequently related to a synchronous acoustic recording of the utterances. Instances of /p,t,k/ were always accompanied by silent closures, and sometimes accompanied by decreases in supraglottal volume. In contrast, instances of /b,d,g/ were always accompanied both by significant intervals of vocal fold vibration during closure and by relatively large increases in supraglottal volume. However, the magnitudes of volume increments during the voiced stops, and the means by which those increments were achieved, differed considerably across place of articulation and phonetic environment. These results are discussed in the context of a well-known model of the breath-stream control mechanism, and their relevance for a general theory of speech motor control is considered.

2.
Vowel durations typically vary according to both intrinsic (segment-specific) and extrinsic (contextual) specifications. It can be argued that such variations are due to both predisposition and cognitive learning. The present report utilizes acoustic phonetic measurements from Swedish and American children aged 24 and 30 months to investigate the hypothesis that default behaviors may precede language-specific learning effects. The predicted pattern is the presence of final consonant voicing effects in both languages as a default, and subsequent learning of intrinsic effects most notably in the Swedish children. The data, from 443 monosyllabic tokens containing high-front vowels and final stop consonants, are analyzed in statistical frameworks at group and individual levels. The results confirm that Swedish children show an early tendency to vary vowel durations according to final consonant voicing, followed only six months later by a stage at which the intrinsic influence of vowel identity grows relatively more robust. Measures of vowel formant structure from selected 30-month-old children also revealed a tendency for children of this age to focus on particular acoustic contrasts. In conclusion, the results indicate that early acquisition of vowel specifications involves an interaction between language-specific features and articulatory predispositions associated with phonetic context.  相似文献   

3.
This paper describes acoustic cues for classification of consonant voicing in a distinctive feature-based speech recognition system. Initial acoustic cues are selected by studying consonant production mechanisms. Spectral representations, band-limited energies, and correlation values, along with Mel-frequency cepstral coefficient (MFCC) features, are also examined. Analysis of variance is performed to assess the relative significance of features. Overall, 82.2%, 80.6%, and 78.4% classification rates are obtained on the TIMIT database for stops, fricatives, and affricates, respectively. Combining acoustic parameters with MFCCs shows performance improvement in all cases. Also, performance on the NTIMIT telephone channel speech shows that acoustic parameters are more robust than MFCCs.

4.
Acoustic measurements were conducted to determine the degree to which vowel duration, closure duration, and their ratio distinguish voicing of word-final stop consonants across variations in sentential and phonetic environments. Subjects read CVC test words containing three different vowels and ending in stops of three different places of articulation. The test words were produced either in nonphrase-final or phrase-final position and in several local phonetic environments within each of these sentence positions. Our measurements revealed that vowel duration most consistently distinguished voicing categories for the test words. Closure duration failed to consistently distinguish voicing categories across the contextual variables manipulated, as did the ratio of closure and vowel duration. Our results suggest that vowel duration is the most reliable correlate of voicing for word-final stops in connected speech.  相似文献   

5.
The speech production skills of 12 dysphasic children and of 12 normal children were compared. The dysphasic children were found to have significantly greater difficulty than the normal children in producing stop consonants. In addition, it was found that seven of the dysphasic children, who had difficulty in perceiving initial stop consonants, had greater difficulty in producing stop consonants than the remaining five dysphasic children who showed no such perceptual difficulty. A detailed phonetic analysis indicated that the dysphasic children seldom omitted stops or substituted nonstop for stop consonants. Instead, their errors were predominantly of voicing or place of articulation. Acoustic analyses suggested that the voicing errors were related to lack of precise control over the timing of speech events, specifically, voice onset time for initial stops and vowel duration preceding final stops. The number of voicing errors on final stops, however, was greater than expected on the basis of lack of differentiation of vowel duration alone. They appeared also to be related to a tendency in the dysphasic children to produce final stops with exaggerated aspiration. The possible relationship of poor timing control in speech production in these children and auditory temporal processing deficits in speech perception is discussed.  相似文献   

6.
In obstruent consonants, a major constriction in the upper vocal tract yields an increase in intraoral pressure (P(io)). Phonation requires that subglottal pressure (P(sub)) exceed P(io) by a threshold value, so as the transglottal pressure reaches the threshold, phonation will cease. This work investigates how P(io) levels at phonation offset and onset vary before and after different German voiceless obstruents (stops, fricatives, affricates, clusters), and with following high vs. low vowels. Articulatory contacts, measured using electropalatography, were recorded simultaneously with P(io) to clarify how supraglottal constrictions affect P(io). Effects of consonant type on phonation thresholds could be explained mainly in terms of the magnitude and timing of vocal-fold abduction. Phonation offset occurred at lower values of P(io) before fricative-initial sequences than stop-initial sequences, and onset occurred at higher levels of P(io) following the unaspirated stops of clusters compared to fricatives, affricates, and aspirated stops. The vowel effects were somewhat surprising: high vowels had an inhibitory effect at voicing offset (phonation ceasing at lower values of P(io)) in short-duration consonant sequences, but a facilitating effect on phonation onset that was consistent across consonantal contexts. The vowel influences appear to reflect a combination of vocal-fold characteristics and vocal-tract impedance.
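The pressure condition stated in this abstract (phonation requires P(sub) to exceed P(io) by a threshold value) can be illustrated with a minimal sketch. The function names, the pressure values, and the threshold below are hypothetical and chosen only for illustration; they are not measurements or methods from the study.

```python
def phonation_possible(p_sub, p_io, threshold):
    """True while subglottal pressure exceeds intraoral pressure by more than the threshold."""
    return (p_sub - p_io) > threshold


def phonation_offset_index(p_sub, p_io_trace, threshold):
    """Index of the first sample at which phonation can no longer be sustained.

    p_io_trace is a sequence of intraoral pressure samples rising during an
    obstruent closure. Returns None if the condition never fails.
    """
    for i, p_io in enumerate(p_io_trace):
        if not phonation_possible(p_sub, p_io, threshold):
            return i
    return None


# Hypothetical values (arbitrary pressure units), for illustration only.
rising_p_io = [0.0, 2.0, 4.0, 5.0, 6.0]
offset = phonation_offset_index(p_sub=8.0, p_io_trace=rising_p_io, threshold=3.0)
print(offset)
```

In this sketch the transglottal pressure falls from 8 to 2 units as P(io) rises, and the offset is the first sample where it no longer exceeds the assumed threshold.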

7.
This study examines English speakers' relative weighting of two voicing cues in production and perception. Participants repeated words differing in initial consonant voicing ([b] or [p]) and labeled synthesized tokens ranging between [ba] and [pa] orthogonally according to voice onset time (VOT) and onset f0. Discriminant function analysis and logistic regression were used to calculate individuals' relative weighting of each cue. Production results showed a significant negative correlation of VOT and onset f0, while perception results showed a trend toward a positive correlation. No significant correlations were found across perception and production, suggesting a complex relationship between the two domains.  相似文献   

8.
Important issues in selective adaptation research concern the relative contribution of response-related (perceptual) and stimulus-related (acoustic) effects of the adaptor in the adaptive process. Two response-related issues pertain to the effects of the adaptor percept and verbal transformations on adaptation. This investigation systematically examined perceptual and acoustic contributions of the adaptor to the adaptation of the voicing feature. Subjects rated the degree of voicing/voicelessness of end-point VOT adaptors, i.e., 5- and 55-ms VOT, and an acoustically neutral adaptor, i.e., 25-ms VOT, during periods of repetitions. The number of adaptor repetitions during each of ten trials was either 5, 32, or 95, and the intensity of the adapting stimulus was either 50, 70, or 90 dB SPL. The major findings were as follows: (1) No significant correlations were found between ratings of voicing percept of the adaptor and magnitude of boundary shift; (2) Increases in repetitions and relative intensity level of end-point adaptors produced significantly greater phonetic boundary shifts and generally greater effects on ratings of test stimuli; and (3) The end-point adaptors produced significant shifts in rating of boundary and nonboundary stimuli. The findings indicate that neither the adaptor percept nor verbal transformations affected the magnitude of adaptation. These results and those for acoustic parameters strongly suggest an acoustic as opposed to a phonetic basis of adaptation of the voicing feature. Furthermore, the effects of end-point adaptors on boundary and nonboundary stimuli support the generalized change in feature sensitivity assumed by a fatigue model of adaptation.

9.
This study assessed the acoustic and perceptual effect of noise on vowel and stop-consonant spectra. Multi-talker babble and speech-shaped noise were added to vowel and stop stimuli at -5 to +10 dB S/N, and the effect of noise was quantified in terms of (a) spectral envelope differences between the noisy and clean spectra in three frequency bands, (b) presence of reliable F1 and F2 information in noise, and (c) changes in burst frequency and slope. Acoustic analysis indicated that F1 was detected more reliably than F2 and the largest spectral envelope differences between the noisy and clean vowel spectra occurred in the mid-frequency band. This finding suggests that in extremely noisy conditions listeners must be relying on relatively accurate F1 frequency information along with partial F2 information to identify vowels. Stop consonant recognition remained high even at -5 dB despite the disruption of burst cues due to additive noise, suggesting that listeners must be relying on other cues, perhaps formant transitions, to identify stops.  相似文献   
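Adding noise at a specified S/N, as described in this abstract, amounts to scaling the noise so that the ratio of signal power to noise power matches the target in dB. The following is a minimal sketch of that standard scaling; the function name `mix_at_snr` and the toy sample values are hypothetical, and the study's actual stimulus preparation is not described in the abstract.

```python
import math


def mean_power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def mix_at_snr(signal, noise, snr_db):
    """Scale `noise` so that signal power / noise power equals snr_db, then add it.

    Assumes both sequences have the same length and nonzero power.
    Returns the noisy signal and the gain applied to the noise.
    """
    p_sig = mean_power(signal)
    p_noise = mean_power(noise)
    gain = math.sqrt(p_sig / (p_noise * 10 ** (snr_db / 10)))
    noisy = [s + gain * n for s, n in zip(signal, noise)]
    return noisy, gain


# Toy samples, for illustration only.
signal = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]
for snr_db in (-5, 0, 5, 10):  # the range of S/N values named in the abstract
    noisy, gain = mix_at_snr(signal, noise, snr_db)
    achieved = 10 * math.log10(mean_power(signal) / (gain ** 2 * mean_power(noise)))
    print(snr_db, round(achieved, 6))
```

The loop recomputes the achieved ratio from the scaled noise as a sanity check on the gain.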

10.
11.
I. Introduction. The F0 patterns of speech are important not only for the prosodic features but also for voice source characteristics. Now more and more speech scientists recognize that the voice excitation source in text-to-speech systems plays an important role in both the intelligibility and naturalness of synthetic speech. Especially for Chinese, a tone language with a multi-tone system, the tonal patterns, which are mainly demonstrated in the F0 contours, carry lexical meaning. Some comparative studies of the F0 patterns in between tone language (Chinese) and stress language (En…

12.
Adults whose native languages permit syllable-final obstruents, and show a vocalic length distinction based on the voicing of those obstruents, consistently weight vocalic duration strongly in their perceptual decisions about the voicing of final stops, at least in laboratory studies using synthetic speech. Children, on the other hand, generally disregard such signal properties in their speech perception, favoring formant transitions instead. These age-related differences led to the prediction that children learning English as a native language would weight vocalic duration less than adults, but weight syllable-final transitions more in decisions of final-consonant voicing. This study tested that prediction. In the first experiment, adults and children (eight- and six-year-olds) labeled synthetic and natural CVC words with voiced or voiceless stops in final C position. Predictions were strictly supported for synthetic stimuli only. With natural stimuli it appeared that adults and children alike weighted syllable-offset transitions strongly in their voicing decisions. The predicted age-related difference in the weighting of vocalic duration was seen for these natural stimuli almost exclusively when syllable-final transitions signaled a voiced final stop. A second experiment with adults and children (seven- and five-year-olds) replicated these results with four new sets of natural stimuli. It was concluded that acoustic properties other than vocalic duration might play more important roles in voicing decisions for final stops than commonly asserted, sometimes even taking precedence over vocalic duration.

13.
A production study was conducted to investigate the effect of vowel lengthening before voiced obstruents, and the possible influence that the openness versus closedness of syllables has on the temporal structure of vowels in some languages. The results revealed that vowels were significantly longer when followed by voiced consonants than by voiceless consonants. Vowel duration did not vary with syllable structure, although vowels in open syllables followed by [+ voiced] consonants tended to be longer than when the following consonants were [- voiced]. These results are discussed in the context of current knowledge of other languages.

14.
This study focuses on the initial component of the stop consonant release burst, the release transient. In theory, the transient, because of its impulselike source, should contain much information about the vocal tract configuration at release, but it is usually weak in intensity and difficult to isolate from the accompanying frication in natural speech. For this investigation, a human talker produced isolated release transients of /b,d,g/ in nine vocalic contexts by whispering these syllables very quietly. He also produced the corresponding CV syllables with regular phonation for comparison. Spectral analyses showed the isolated transients to have a clearly defined formant structure, which was not seen in natural release bursts, whose spectra were dominated by the frication noise. The formant frequencies varied systematically with both consonant place of articulation and vocalic context. Perceptual experiments showed that listeners can identify both consonants and vowels from isolated transients, though not very accurately. Knowing one of the two segments in advance did not help, but when the transients were followed by a compatible synthetic, steady-state vowel, consonant identification improved somewhat. On the whole, isolated transients, despite their clear formant structure, provided only partial information for consonant identification, but no less so, it seems, than excerpted natural release bursts. The information conveyed by artificially isolated transients and by natural (frication-dominated) release bursts appears to be perceptually equivalent.  相似文献   

15.
Previous work has shown that the lips are moving at a high velocity when the oral closure occurs for bilabial stop consonants, resulting in tissue compression and mechanical interactions between the lips. The present experiment recorded tongue movements in four subjects during the production of velar and alveolar stop consonants to examine kinematic events before, during, and after the stop closure. The results show that, similar to the lips, the tongue is often moving at a high velocity at the onset of closure. The tongue movements were more complex, with both horizontal and vertical components. Movement velocities at closure and release were influenced by both the preceding and the following vowel. During the period of oral closure, the tongue moved through a trajectory of usually less than 1 cm; again, the magnitude of the movement was context dependent. Overall, the tongue moved in forward-backward curved paths. The results are compatible with the idea that the tongue is free to move during the closure as long as an airtight seal is maintained. A new interpretation of the curved movement paths of the tongue in speech is also proposed. This interpretation is based on the principle of cost minimization that has been successfully applied in the study of hand movements in reaching.

16.
Speech perception requires the integration of information from multiple phonetic and phonological dimensions. A sizable literature exists on the relationships between multiple phonetic dimensions and single phonological dimensions (e.g., spectral and temporal cues to stop consonant voicing). A much smaller body of work addresses relationships between phonological dimensions, and much of this has focused on sequences of phones. However, strong assumptions about the relevant set of acoustic cues and/or the (in)dependence between dimensions limit previous findings in important ways. Recent methodological developments in the general recognition theory framework enable tests of a number of these assumptions and provide a more complete model of distinct perceptual and decisional processes in speech sound identification. A hierarchical Bayesian Gaussian general recognition theory model was fit to data from two experiments investigating identification of English labial stop and fricative consonants in onset (syllable initial) and coda (syllable final) position. The results underscore the importance of distinguishing between conceptually distinct processing levels and indicate that, for individual subjects and at the group level, integration of phonological information is partially independent with respect to perception and that patterns of independence and interaction vary with syllable position.  相似文献   

17.
The speech signal contains many acoustic properties that may contribute differently to spoken word recognition. Previous studies have demonstrated that the importance of properties present during consonants or vowels is dependent upon the linguistic context (i.e., words versus sentences). The current study investigated three potentially informative acoustic properties that are present during consonants and vowels for monosyllabic words and sentences. Natural variations in fundamental frequency were either flattened or removed. The speech envelope and temporal fine structure were also investigated by limiting the availability of these cues via noisy signal extraction. Thus, this study investigated the contribution of these acoustic properties, present during either consonants or vowels, to overall word and sentence intelligibility. Results demonstrated that all processing conditions displayed better performance for vowel-only sentences. Greater performance with vowel-only sentences remained, despite removing dynamic cues of the fundamental frequency. Word and sentence comparisons suggest that the speech envelope may be at least partially responsible for additional vowel contributions in sentences. Results suggest that speech information transmitted by the envelope is responsible, in part, for greater vowel contributions in sentences, but is not predictive for isolated words.  相似文献   

18.
A model of the vocal-tract area function is described that consists of four tiers. The first tier is a vowel substrate defined by a system of spatial eigenmodes and a neutral area function determined from MRI-based vocal-tract data. The input parameters to the first tier are coefficient values that, when multiplied by the appropriate eigenmode and added to the neutral area function, construct a desired vowel. The second tier consists of a consonant shaping function defined along the length of the vocal tract that can be used to modify the vowel substrate such that a constriction is formed. Input parameters consist of the location, area, and range of the constriction. Location and area roughly correspond to the standard phonetic specifications of place and degree of constriction, whereas the range defines the amount of vocal-tract length over which the constriction will influence the tract shape. The third tier allows length modifications for articulatory maneuvers such as lip rounding/spreading and larynx lowering/raising. Finally, the fourth tier provides control of the level of acoustic coupling of the vocal tract to the nasal tract. All parameters can be specified either as static or time varying, which allows for multiple levels of coarticulation or coproduction.  相似文献   
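The first two tiers described in this abstract can be sketched in a few lines. The first function follows the abstract's wording directly: each coefficient multiplies an eigenmode and the result is added to the neutral area function. The second function is only an assumption: the abstract names the constriction parameters (location, area, range) but not the shape of the shaping function, so a Gaussian-shaped blend toward the target area is used here as a stand-in. All names and numbers are hypothetical.

```python
import math


def vowel_substrate(neutral, modes, coeffs):
    """Tier 1: neutral area function plus a weighted sum of eigenmodes.

    neutral: area values along the tract; modes: list of eigenmodes, each the
    same length as neutral; coeffs: one coefficient per eigenmode.
    """
    return [
        n + sum(q * mode[i] for q, mode in zip(coeffs, modes))
        for i, n in enumerate(neutral)
    ]


def apply_constriction(area, location, target_area, extent):
    """Tier 2 (assumed form): blend the vowel area toward target_area near `location`.

    `extent` plays the role of the abstract's "range": it sets how much of the
    tract length the constriction influences. The Gaussian weight is an
    illustrative choice, not the model's actual shaping function.
    """
    shaped = []
    for i, a in enumerate(area):
        w = math.exp(-(((i - location) / extent) ** 2))
        shaped.append((1.0 - w) * a + w * target_area)
    return shaped


# Toy five-section tract with two made-up eigenmodes, for illustration only.
neutral = [2.0, 2.0, 2.0, 2.0, 2.0]
modes = [[1.0, 0.0, -1.0, 0.0, 1.0], [0.0, 1.0, 0.0, -1.0, 0.0]]
vowel = vowel_substrate(neutral, modes, coeffs=[0.5, -0.5])
stop_like = apply_constriction(vowel, location=2, target_area=0.1, extent=1.0)
print(vowel)
print(stop_like)
```

Because every input is an ordinary value, each one could equally be supplied per time step, which is how the abstract's time-varying parameters would enter a sketch like this.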

19.
Durations of the vocalic portions of speech are influenced by a large number of linguistic and nonlinguistic factors (e.g., stress and speaking rate). However, each factor affecting vowel duration may influence articulation in a unique manner. The present study examined the effects of stress and final-consonant voicing on the detailed structure of articulatory and acoustic patterns in consonant-vowel-consonant (CVC) utterances. Jaw movement trajectories and F1 trajectories were examined for a corpus of utterances differing in stress and final-consonant voicing. Jaw lowering and raising gestures were more rapid, longer in duration, and spatially more extensive for stressed versus unstressed utterances. At the acoustic level, stressed utterances showed more rapid initial F1 transitions and more extreme F1 steady-state frequencies than unstressed utterances. In contrast to the results obtained in the analysis of stress, decreases in vowel duration due to devoicing did not result in a reduction in the velocity or spatial extent of the articulatory gestures. Similarly, at the acoustic level, the reductions in formant transition slopes and steady-state frequencies demonstrated by the shorter, unstressed utterances did not occur for the shorter, voiceless utterances. The results demonstrate that stress-related and voicing-related changes in vowel duration are accomplished by separate and distinct changes in speech production with observable consequences at both the articulatory and acoustic levels.

20.