Similar articles
 Found 20 similar articles (search time: 390 ms)
1.
This study addresses three issues that are relevant to coarticulation theory in speech production: whether the degree of articulatory constraint model (DAC model) accounts for patterns of the directionality of tongue dorsum coarticulatory influences; the extent to which those patterns in tongue dorsum coarticulatory direction are similar to those for the tongue tip; and whether speech motor control and phonemic planning use a fixed or a context-dependent temporal window. Tongue dorsum and tongue tip movement data on vowel-to-vowel coarticulation are reported for Catalan VCV sequences with vowels /i/, /a/, and /u/, and consonants /p/, /n/, dark /l/, /s/, /ʃ/, alveolopalatal /ɲ/ and /k/. Electromagnetic midsagittal articulometry recordings were carried out for three speakers using the Carstens articulograph. Trajectory data are presented for the vertical dimension for the tongue dorsum, and for the horizontal dimension for tongue dorsum and tip. In agreement with predictions of the DAC model, results show that directionality patterns of tongue dorsum coarticulation can be accounted for to a large extent based on the articulatory requirements on consonantal production. While dorsals exhibit analogous trends in coarticulatory direction for all articulators and articulatory dimensions, this is mostly so for the tongue dorsum and tip along the horizontal dimension in the case of lingual fricatives and apicolaminal consonants. This finding results from different articulatory strategies: while dorsal consonants are implemented through homogeneous tongue body activation, the tongue tip and tongue dorsum act more independently for more anterior consonantal productions. Discontinuous coarticulatory effects reported in the present investigation suggest that phonemic planning is adaptive rather than context independent.

2.
Dynamic specification of coarticulated vowels   Cited by: 1 (self-citations: 0, citations by others: 1)
An adequate theory of vowel perception must account for perceptual constancy over variations in the acoustic structure of coarticulated vowels contributed by speakers, speaking rate, and consonantal context. We modified recorded consonant-vowel-consonant syllables electronically to investigate the perceptual efficacy of three types of acoustic information for vowel identification: (1) static spectral "targets," (2) duration of syllabic nuclei, and (3) formant transitions into and out of the vowel nucleus. Vowels in /b/-vowel-/b/ syllables spoken by one adult male (experiment 1) and by two females and two males (experiment 2) served as the corpus, and seven modified syllable conditions were generated in which different parts of the digitized waveforms of the syllables were deleted and the temporal relationships of the remaining parts were manipulated. Results of identification tests by untrained listeners indicated that dynamic spectral information, contained in initial and final transitions taken together, was sufficient for accurate identification of vowels even when vowel nuclei were attenuated to silence. Furthermore, the dynamic spectral information appeared to be efficacious even when durational parameters specifying intrinsic vowel length were eliminated.  相似文献   

3.
Dynamic specification of coarticulated vowels spoken in sentence context   Cited by: 3 (self-citations: 0, citations by others: 3)
According to a dynamic specification account, coarticulated vowels are identified on the basis of time-varying acoustic information, rather than solely on the basis of "target" information contained within a single spectral cross section of an acoustic syllable. Three experiments utilizing digitally segmented portions of consonant-vowel-consonant (CVC) syllables spoken rapidly in a carrier sentence were designed to examine the relative contribution of (1) target information available in vocalic nuclei, (2) intrinsic duration information specified by syllable length, and (3) dynamic spectral information defined over syllable onsets and offsets. In experiments 1 and 2, vowels produced in three consonantal contexts by an adult male were examined. Results showed that vowels in silent-center (SC) syllables (in which vocalic nuclei were attenuated to silence leaving initial and final transitional portions in their original temporal relationship) were perceived relatively accurately, although not as well as unmodified syllables (experiment 1); random versus blocked presentation of consonantal contexts did not affect performance. Error rates were slightly greater for vowels in SC syllables in which intrinsic duration differences were neutralized by equating the duration of silent intervals between initial and final transitional portions. However, performance was significantly better than when only initial transitions or final transitions were presented alone (experiment 2). Experiment 3 employed CVC stimuli produced by another adult male, and included six consonantal contexts. Both SC syllables and excised syllable nuclei with appropriate intrinsic durations were identified no less accurately than unmodified controls. Neutralizing duration differences in SC syllables increased identification errors only slightly, while truncating excised syllable nuclei yielded a greater increase in errors. These results demonstrate that time-varying information is necessary for accurate identification of coarticulated vowels. Two hypotheses about the nature of the dynamic information specified over syllable onsets and offsets are discussed.

4.
Anticipatory velar lowering: a coproduction account   Cited by: 1 (self-citations: 0, citations by others: 1)
Feature spreading and coproduction models make fundamentally different assumptions about the nature and organization of speech motor control, and yet each model is supported by some, but not all, of the existing empirical data. This has led some researchers to conclude that speakers probably use alternative strategies at different times. This study suggests that the identification of coarticulatory influences requires the concurrent identification of intrinsic articulatory characteristics of the segment. Moreover, the evidence for feature spreading or variable coarticulation strategies derives from the misidentification of such intrinsic characteristics as context effects. This velar coarticulation study used a controlled comparison between CVnN and CVnC minimal pairs, where C is an oral consonant, Vn is any number of vowels, and N is a nasal consonant. Vocalic string duration was manipulated by varying the number of segments and speech rate, allowing us to alter the time between the onsets of vocalic and subsequent consonantal gestures. Velar lowering occurred in CVn sequences, whether or not a nasal consonant followed, and similar vocalic gestures were observed across minimally contrastive environments with and without the nasal consonant. Moreover, velar lowering for the nasal consonant began in close temporal proximity to the nasal murmur. These results strongly support the coproduction model and provide insight into previously conflicting reports.  相似文献   

5.
Control of rate and duration of speech movements   Cited by: 4 (self-citations: 0, citations by others: 4)
A computerized pulsed-ultrasound system was used to monitor tongue dorsum movements during the production of consonant-vowel sequences in which speech rate, vowel, and consonant were varied. The kinematics of tongue movement were analyzed by measuring the lowering gesture of the tongue to give estimates of movement amplitude, duration, and maximum velocity. All three subjects in the study showed reliable correlations between the amplitude of the tongue dorsum movement and its maximum velocity. Further, the ratio of the maximum velocity to the extent of the gesture, a kinematic indicator of articulator stiffness, was found to vary inversely with the duration of the movement. This relationship held both within individual conditions and across all conditions in the study such that a single function was able to accommodate a large proportion of the variance due to changes in movement duration. As similar findings have been obtained both for abduction and adduction gestures of the vocal folds and for rapid voluntary limb movements, the data suggest that a wide range of changes in the duration of individual movements might all have a similar origin. The control of movement rate and duration through the specification of biomechanical characteristics of speech articulators is discussed.  相似文献   
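
The kinematic measures named in this abstract (amplitude, duration, maximum velocity, and the velocity-to-extent ratio used as a stiffness indicator) can be illustrated with a minimal sketch. It assumes evenly sampled position data; the function name, the sample values, and the sampling intervals are hypothetical and are not taken from the study.

```python
def gesture_kinematics(positions_mm, dt_s):
    """Summarize one lowering gesture sampled at a fixed interval dt_s (seconds)."""
    # Finite-difference velocity between consecutive samples (mm/s).
    velocities = [(b - a) / dt_s for a, b in zip(positions_mm, positions_mm[1:])]
    amplitude = abs(positions_mm[-1] - positions_mm[0])  # movement extent (mm)
    duration = dt_s * (len(positions_mm) - 1)            # movement duration (s)
    peak_velocity = max(abs(v) for v in velocities)      # maximum speed (mm/s)
    return {
        "amplitude": amplitude,
        "duration": duration,
        "peak_velocity": peak_velocity,
        # Ratio of maximum velocity to extent, the kinematic stiffness indicator (1/s).
        "stiffness_index": peak_velocity / amplitude,
    }


# Two made-up gestures with the same extent but different durations.
fast = gesture_kinematics([10.0, 7.0, 3.0, 1.0], dt_s=0.01)
slow = gesture_kinematics([10.0, 9.0, 7.0, 4.0, 2.0, 1.0], dt_s=0.02)
print(fast["stiffness_index"], slow["stiffness_index"])
```

In this toy data the shorter gesture has the larger ratio, which is the direction of the inverse relation with duration that the abstract reports.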

6.
The role of auditory feedback in speech production was investigated by examining speakers' phonemic contrasts produced under increases in the noise-to-signal ratio (N/S). Seven cochlear implant users and seven normal-hearing controls pronounced utterances containing the vowels /i/, /u/, /e/ and /æ/ and the sibilants /s/ and /ʃ/ while hearing their speech mixed with noise at seven equally spaced levels between their thresholds of detection and discomfort. Speakers' average vowel duration and SPL generally rose with increasing N/S. Average vowel contrast was initially flat or rising; at higher N/S levels, it fell. A contrast increase is interpreted as reflecting speakers' attempts to maintain clarity under degraded acoustic transmission conditions. As N/S increased, speakers could detect the extent of their phonemic contrasts less effectively, and the competing influence of economy of effort led to contrast decrements. The sibilant contrast was more vulnerable to noise; it decreased over the entire range of increasing N/S for controls and was variable for implant users. The results are interpreted as reflecting the combined influences of a clarity constraint, economy of effort and the effect of masking on achieving auditory phonemic goals, with implant users less able to increase contrasts in noise than controls.

7.
Traditional accounts of speech perception generally hold that listeners use isolable acoustic "cues" to label phonemes. For syllable-final stops, duration of the preceding vocalic portion and formant transitions at syllable's end have been considered the primary cues to voicing decisions. The current experiment tried to extend traditional accounts by asking two questions concerning voicing decisions by adults and children: (1) What weight is given to vocalic duration versus spectral structure, both at syllable's end and across the syllable? (2) Does the naturalness of stimuli affect labeling? Adults and children (4, 6, and 8 years old) labeled synthetic stimuli that varied in vocalic duration and spectral structure, either at syllable's end or earlier in the syllable. Results showed that all listeners weighted dynamic spectral structure, both at syllable's end and earlier in the syllable, more than vocalic duration, and listeners performed with these synthetic stimuli as listeners had performed previously with natural stimuli. The conclusion for accounts of human speech perception is that rather than simply gathering acoustic cues and summing them to derive strings of phonemic segments, listeners are able to attend to global spectral structure, and use it to help recover explicitly phonetic structure.  相似文献   

8.
The present study investigated the relationship between functionally relevant compound gestures and single-articulator component movements of the jaw and the constrictors lower lip and tongue tip during rate-controlled syllable repetitions. In nine healthy speakers, the effects of speaking rate (3 vs 5 Hz), place of articulation, and vowel type during stop consonant-vowel repetitions (/pa/, /pi/, /ta/, /ti/) on the amplitude and peak velocity of differential jaw and constrictor opening-closing movements were measured by means of electromagnetic articulography. Rather than homogeneously scaled compound gestures, the results suggest distinct control mechanisms for the jaw and the constrictors. In particular, jaw amplitude was closely linked to vowel height during bilabial articulation, whereas the lower lip component amplitude turned out to be predominantly rate sensitive. However, the observed variability across subjects and conditions does not support the assumption that single-articulator gestures directly correspond to basic phonological units. The nonhomogeneous effects of speech rate on articulatory subsystem parameters indicate that single structures are differentially rate sensitive. On average, an increase in speech rate resulted in a more or less proportional increase of the steepness of peak velocity/amplitude scaling for jaw movements, whereas the constrictors were less rate sensitive in this respect. Negative covariation across repetitions between jaw and constrictor amplitudes has been considered an indicator of motor equivalence. Although significant in some cases, such a relationship was not consistently observed across subjects. Considering systematic sources of variability such as vowel height, speech rate, and subjects, jaw-constrictor amplitude correlations showed a nonhomogeneous pattern strongly depending on place of articulation.  相似文献   

9.
The timing of upper lip protrusion movements and accompanying acoustic events was examined for multiple repetitions of word pairs such as "lee coot" and "leaked coot" by four speakers of American English. The duration of the intervocalic consonant string was manipulated by using various combinations of /s/, /t/, /k/, /h/, and /#/. Pairwise comparisons were made of consonant string duration (acoustic /i/ offset to acoustic /u/ onset) with durations of: protrusion movement beginning to acoustic /u/ onset, maximum acceleration of the movement to acoustic /u/ onset, and acoustic /u/ onset to movement end. There were some consonant-specific protrusion effects, primarily on the movement beginning event for /s/. Inferences from measures of the maximum acceleration and movement end events for the non-/s/ subset suggested the simultaneous and variable expression of three competing constraints: (1) end the protrusion movement during the voiced part of the /u/; (2) use a preferred movement duration; and (3) begin the /u/-related protrusion movement when permitted by relaxation of the perceptually motivated constraint that the preceding /i/ be unrounded. The subjects differed in the degree of expression of each constraint, but the results generally indicate that anticipatory coarticulation of lip protrusion is influenced both by acoustic-phonetic context dependencies and dynamical properties of movements. Because of the extensive variation in the data and the small number of subjects, these ideas are tentative; additional work is needed to explore them further.  相似文献   

10.
This study was designed to test the hypothesis that the kinematic manipulations used by speakers in different speaking conditions are influenced by kinematic performance limits. A range of kinematic parameter values was elicited by having seven subjects produce cyclical CV movements of lips, tongue blade and tongue dorsum (/ba/, /da/, /ga/), at rates ranging from 1 to 6 Hz. The resulting measures were used to establish speaker- and articulator-specific kinematic performance spaces, defined by movement duration, displacement and peak speed. These data were compared with speech movement data produced by the subjects in several different speaking conditions in the companion study (Perkell et al., 2002). The amount of overlap of the speech data and cyclical data varied across speakers, from almost no overlap to complete overlap. Generally, for a given movement duration, speech movements were larger than cyclical movements, indicating that the speech movements were faster and were produced with greater effort, according to the performance space analysis. It was hypothesized that the cyclical movements of the tongue and lips were slower than the speech movements because they were more constrained by (coupled to) the relatively massive mandible. To test this hypothesis, a comparison was made of cyclical movements in maxillary versus mandibular frames of reference. The results indicate that the cyclical movements were not strongly constrained by mandible movements. The overall results generally indicate that the cyclical task did not succeed in defining the upper limits of kinematic performance spaces within which the speech data were confined. Thus, the hypothesis that performance limits influence speech kinematics could not be tested effectively. The differences between the speech and cyclical movements may be due to other factors, such as differences in speakers' "skill" with the two types of movement, or the size of the movements: the speech movements were larger, probably because of a well-defined target for the primary, stressed vowel.

11.
In this paper we present efforts to characterize the three-dimensional (3-D) movements of the right hand and the face of a French female speaker during the audiovisual production of cued speech. The 3-D trajectories of 50 hand and 63 facial flesh points during the production of 238 utterances were analyzed. These utterances were carefully designed to cover all possible diphones of the French language. Linear and nonlinear statistical models of the articulations and the postures of the hand and the face have been developed using separate and joint corpora. Automatic recognition of hand and face postures at targets was performed to verify a posteriori that key hand movements and postures imposed by cued speech had been well realized by the subject. Recognition results were further exploited in order to study the phonetic structure of cued speech, notably the phasing relations between hand gestures and sound production. The hand and face gestural scores are studied with reference to the acoustic segmentation. Finally, a first implementation of a concatenative audiovisual text-to-cued speech synthesis system is described that employs this unique and extensive data on cued speech in action.

12.
This study investigated the role of the amplitude envelope in the vicinity of consonantal release in the perception of the stop-glide contrast. Three sets of acoustic [b-w] continua, each in the vowel environments [a] and [i], were synthesized using parameters derived from natural speech. In the first set, amplitude, formant frequency, and duration characteristics were interpolated between exemplar stop and glide endpoints. In the second set, formant frequency and duration characteristics were interpolated, but all stimuli were given a stop amplitude envelope. The third set was like the second, except that all stimuli were given a glide amplitude envelope. Subjects were given both forced-choice and free-identification tasks. The results of the forced-choice task indicated that amplitude cues were able to override transition slope, duration, and formant frequency cues in the perception of the stop-glide contrast. However, results from the free-identification task showed that, although presence of a stop amplitude envelope turned all stimuli otherwise labeled as glides to stops, the presence of a glide amplitude envelope changed stimuli labeled otherwise as stops to fricatives rather than to glides. These results support the view that the amplitude envelope in the vicinity of the consonantal release is a critical acoustic property for the continuant / noncontinuant contrast. The results are discussed in relation to a theory of acoustic invariance.  相似文献   

13.
Recent studies have demonstrated that mothers exaggerate phonetic properties of infant-directed (ID) speech. However, these studies focused on a single acoustic dimension (frequency), whereas speech sounds are composed of multiple acoustic cues. Moreover, little is known about how mothers adjust phonetic properties of speech to children with hearing loss. This study examined mothers' production of frequency and duration cues to the American English tense/lax vowel contrast in speech to profoundly deaf (N = 14) and normal-hearing (N = 14) infants, and to an adult experimenter. First and second formant frequencies and vowel duration of tense (/i/, /u/) and lax (/ɪ/, /ʊ/) vowels were measured. Results demonstrated that for both infant groups mothers hyperarticulated the acoustic vowel space and increased vowel duration in ID speech relative to adult-directed speech. Mean F2 values were decreased for the /u/ vowel and increased for the /ɪ/ vowel, and vowel duration was longer for the /i/, /u/, and /ɪ/ vowels in ID speech. However, neither acoustic cue differed in speech to hearing-impaired or normal-hearing infants. These results suggest that both formant frequencies and vowel duration that differentiate American English tense/lax vowel contrasts are modified in ID speech regardless of the hearing status of the addressee.

14.
Cross-language perception studies report influences of speech style and consonantal context on perceived similarity and discrimination of non-native vowels by inexperienced and experienced listeners. Detailed acoustic comparisons of distributions of vowels produced by native speakers of North German (NG), Parisian French (PF) and New York English (AE) in citation (di)syllables and in sentences (surrounded by labial and alveolar stops) are reported here. Results of within- and cross-language discriminant analyses reveal striking dissimilarities across languages in the spectral/temporal variation of coarticulated vowels. As expected, vocalic duration was most important in differentiating NG vowels; it did not contribute to PF vowel classification. Spectrally, NG long vowels showed little coarticulatory change, but back/low short vowels were fronted/raised in alveolar context. PF vowels showed greater coarticulatory effects overall; back and front rounded vowels were fronted, low and mid-low vowels were raised in both sentence contexts. AE mid to high back vowels were extremely fronted in alveolar contexts, with little change in mid-low and low long vowels. Cross-language discriminant analyses revealed varying patterns of spectral (dis)similarity across speech styles and consonantal contexts that could, in part, account for AE listeners' perception of German and French front rounded vowels, and "similar" mid-high to mid-low vowels.  相似文献   

15.
The complexities of how prosodic structure, both at the phrasal and syllable levels, shapes speech production have begun to be illuminated through studies of articulatory behavior. The present study contributes to an understanding of prosodic signatures on articulation by examining the joint effects of phrasal and syllable position on the production of consonants. Articulatory kinematic data were collected for five subjects using electromagnetic articulography (EMA) to record target consonants (labial, labiodental, and tongue tip), located in (1) either syllable final or initial position and (2) either at a phrase edge or phrase medially. Spatial and temporal characteristics of the consonantal constriction formation and release were determined based on kinematic landmarks in the articulator velocity profiles. The results indicate that syllable and phrasal position consistently affect the movement duration; however, effects on displacement were more variable. For most subjects, the boundary-adjacent portions of the movement (constriction release for a preboundary coda and constriction formation for a postboundary onset) are not differentially affected in terms of phrasal lengthening: both lengthen comparably.

16.
Tongue-surface movement patterns during speech and swallowing   Cited by: 4 (self-citations: 0, citations by others: 4)
The tongue has been frequently characterized as being composed of several functionally independent articulators. The question of functional regionality within the tongue was examined by quantifying the strength of coupling among four different tongue locations across a large number of consonantal contexts and participants. Tongue behavior during swallowing was also described. Vertical displacements of pellets affixed to the tongue were extracted from the x-ray microbeam database. Forty-six participants recited 20 vowel-consonant-vowel (VCV) combinations and swallowed 10 ccs of water. Tongue-surface movement patterns were quantitatively described by computing the covariance between the vertical time-histories of all possible pellet pairs. Phonemic differentiation in vertical tongue motions was observed as coupling varied predictably across pellet pairs with place of articulation. Moreover, tongue displacements for speech and swallowing clustered into distinct groups based on their coupling profiles. Functional independence of anterior tongue regions was evidenced by a wide range of movement coupling relations between anterior tongue pellets. The strengths and weaknesses of the covariance-based analysis for characterizing tongue movement are considered.  相似文献   
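
The covariance-based coupling measure described in this abstract can be sketched as follows. This is only an illustration under assumptions: the pellet labels (T1 to T3) and the displacement values are hypothetical, three pellets are used instead of four for brevity, and the unbiased sample covariance is assumed because the abstract does not say which estimator was used.

```python
from itertools import combinations


def sample_covariance(x, y):
    """Unbiased sample covariance of two equal-length time histories."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def pairwise_coupling(tracks):
    """Covariance of vertical displacement for every possible pellet pair."""
    return {
        (p, q): sample_covariance(tracks[p], tracks[q])
        for p, q in combinations(sorted(tracks), 2)
    }


# Made-up vertical time histories (mm) for three hypothetical pellets.
tracks = {
    "T1": [1.0, 2.0, 3.0, 4.0],
    "T2": [2.0, 4.0, 6.0, 8.0],
    "T3": [4.0, 3.0, 2.0, 1.0],
}
coupling = pairwise_coupling(tracks)
for pair, value in coupling.items():
    print(pair, round(value, 3))
```

A positive value means the two pellets rise and fall together, and a negative value means they move in opposition. The set of values across all pairs is the kind of coupling profile the abstract compares across consonants and between speech and swallowing.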

17.
This study examined the temporal phasing of tongue and lip movements in vowel-consonant-vowel sequences where the consonant is a bilabial stop consonant /p, b/ and the vowels one of /i, a, u/; only asymmetrical vowel contexts were included in the analysis. Four subjects participated. Articulatory movements were recorded using a magnetometer system. The onset of the tongue movement from the first to the second vowel almost always occurred before the oral closure. Most of the tongue movement trajectory from the first to the second vowel took place during the oral closure for the stop. For all subjects, the onset of the tongue movement occurred earlier with respect to the onset of the lip closing movement as the tongue movement trajectory increased. The influence of consonant voicing and vowel context on interarticulator timing and tongue movement kinematics varied across subjects. Overall, the results are compatible with the hypothesis that there is a temporal window before the oral closure for the stop during which the tongue movement can start. A very early onset of the tongue movement relative to the stop closure together with an extensive movement before the closure would most likely produce an extra vowel sound before the closure.  相似文献   

18.
A method for synthesizing vocal-tract spectra from phoneme sequences by mimicking the speech production process of humans is presented. The model consists of four main processes and is particularly characterized by an adaptive formation of articulatory movements. First, our model determines the time when each phoneme is articulated. Next, it generates articulatory constraints that must be met for the production of each phoneme, and then it generates trajectories of the articulatory movements that satisfy the constraints. Finally, the time sequence of spectra is estimated from the produced articulatory trajectories. The articulatory constraint of each phoneme does not change with the phonemic context, but the contextual variability of speech is reproduced because of the dynamic articulatory model. The accuracy of the synthesis model was evaluated using data collected by the simultaneous measurement of speech and articulatory movements. The accuracy of the phonemic timing estimates was measured, and the synthesized results were compared with the measured results. Experimental results showed that the model captured the contextual variability of both the articulatory movements and speech acoustics.

19.
This study assessed the acoustic coarticulatory effects of phrasal accent on [V1.CV2] sequences, when separately applied to V1 or V2, surrounding the voiced stops [b], [d], and [g]. Three adult speakers each produced 360 tokens (six V1 contexts × ten V2 contexts × three stops × two emphasis conditions). Realizing that anticipatory coarticulation of V2 onto the intervocalic C can be influenced by prosodic effects, as well as by vowel context effects, a modified locus equation regression metric was used to isolate the effect of phrasal accent on consonantal F2 onsets, independently of prosodically induced vowel expansion effects. The analyses revealed two main emphasis-dependent effects: systematic differences in F2 onset values and the expected expansion of vowel space. By accounting for the confounding variable of stress-induced vowel space expansion, a small but consistent coarticulatory effect of emphatic stress on the consonant was uncovered in lingually produced stops, but absent in labial stops. Formant calculations based on tube models indicated similarly increased F2 onsets when stressed /d/ and /g/ were simulated with deeper occlusions resulting from more forceful closure movements during phrasal accented speech.
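
For readers unfamiliar with locus equations, the sketch below fits the standard form, F2 onset = slope × F2 vowel + intercept, by ordinary least squares for one stop consonant. It does not reproduce the modified metric used in the study, which the abstract does not specify. The function name and the formant values (in Hz) are hypothetical.

```python
def fit_locus_equation(f2_vowel_hz, f2_onset_hz):
    """Least-squares fit of F2 onset against F2 at the vowel; returns (slope, intercept)."""
    n = len(f2_vowel_hz)
    mean_v = sum(f2_vowel_hz) / n
    mean_o = sum(f2_onset_hz) / n
    # Sum of squares of the predictor and cross-product with the response.
    sxx = sum((v - mean_v) ** 2 for v in f2_vowel_hz)
    sxy = sum((v - mean_v) * (o - mean_o) for v, o in zip(f2_vowel_hz, f2_onset_hz))
    slope = sxy / sxx
    intercept = mean_o - slope * mean_v
    return slope, intercept


# Made-up tokens of one stop in four vowel contexts.
f2_vowel = [1000.0, 1500.0, 2000.0, 2500.0]
f2_onset = [1400.0, 1650.0, 1900.0, 2150.0]
slope, intercept = fit_locus_equation(f2_vowel, f2_onset)
print(slope, intercept)
```

In the standard interpretation, a slope near 1 means the F2 onset tracks the following vowel closely (strong anticipatory coarticulation), while a slope near 0 means the onset stays near a fixed consonant locus.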

20.
If more than one articulator is involved in the execution of a phonetic task, then the individual articulators have to be temporally coordinated with each other in a lawful manner. The present study aims at analyzing tongue-jaw cohesion in the temporal domain for the German coronal consonants [s, ʃ, t, d, n, l], i.e., consonants produced with the same set of articulators (the tongue blade and the jaw) but differing in manner of articulation. The stability of obtained interaction patterns is evaluated by varying the degree of vocal effort: comfortable and loud. Tongue and jaw movements of five speakers of German were recorded by means of electromagnetic midsagittal articulography (EMMA) during [aCa] sequences. The results indicate that (1) tongue-jaw coordination varies with manner of articulation, i.e., a later onset and offset of the jaw target for the stops compared to the fricatives, the nasal and the lateral; (2) the obtained patterns are stable across vocal effort conditions; (3) the sibilants are produced with smaller standard deviations for latencies and target positions; and (4) adjustments to the lower jaw positions during the surrounding vowels in loud speech occur during the closing and opening movement intervals and not the consonantal target phases.
