Similar literature
20 similar documents found (search time: 31 ms)
1.
This study addresses three issues that are relevant to coarticulation theory in speech production: whether the degree of articulatory constraint model (DAC model) accounts for patterns of the directionality of tongue dorsum coarticulatory influences; the extent to which those patterns in tongue dorsum coarticulatory direction are similar to those for the tongue tip; and whether speech motor control and phonemic planning use a fixed or a context-dependent temporal window. Tongue dorsum and tongue tip movement data on vowel-to-vowel coarticulation are reported for Catalan VCV sequences with vowels /i/, /a/, and /u/, and consonants /p/, /n/, dark /l/, /s/, /ʃ/, alveolopalatal /ɲ/ and /k/. Electromidsagittal articulometry recordings were carried out for three speakers using the Carstens articulograph. Trajectory data are presented for the vertical dimension for the tongue dorsum, and for the horizontal dimension for tongue dorsum and tip. In agreement with predictions of the DAC model, results show that directionality patterns of tongue dorsum coarticulation can be accounted for to a large extent based on the articulatory requirements on consonantal production. While dorsals exhibit analogous trends in coarticulatory direction for all articulators and articulatory dimensions, this is mostly so for the tongue dorsum and tip along the horizontal dimension in the case of lingual fricatives and apicolaminal consonants. This finding results from different articulatory strategies: while dorsal consonants are implemented through homogeneous tongue body activation, the tongue tip and tongue dorsum act more independently for more anterior consonantal productions. Discontinuous coarticulatory effects reported in the present investigation suggest that phonemic planning is adaptive rather than context independent.

2.
SUMMARY: This study identified that physiologically the superior pharyngeal constrictor muscle at the level of the base of the tongue contributes to retrusive movement of the tongue with constriction of the mid-pharyngeal cavity and possesses unique properties in terms of motor speech control along with the genioglossus muscle. From a kinematic study involving trans-nasal fiberscopy and lateral X-ray fluorography, retrusive movement of the tongue was highly correlated with constrictive movement of the mid-pharyngeal cavity. An electromyographic study revealed that the superior pharyngeal constrictor muscle at the level of the base of the tongue contributes to retrusive movement of the tongue and that the genioglossus muscle contributes to protrusive movement. We also noted that this relationship between the activities of these two muscles was in response to postural changes during vowel productions without changes in the acoustic features. These findings suggest that these two muscles not only act antagonistically to produce retrusive and protrusive movement of the tongue, but also complement each other to conserve the shape of the vocal tract for speech production. The functional relationship between these two muscles could contribute to the consecutive movement of human speech production under various conditions and might be useful when applying rehabilitation approaches for patients with neurological speech and swallowing disorders.

3.
An unconstrained optimization technique is used to find the parameter values of a combined articulatory and vocal tract model that minimize the difference between model spectra and natural speech spectra. The articulatory model is anatomically realistic and the vocal tract model is a "lossy" Webster equation for which a method of solution is given. For English vowels in the steady state, anatomically reasonable articulatory configurations whose corresponding spectra match those of human speech to within 2 dB have been computed in fewer than ten iterations. Results are also given which demonstrate a limited ability of the system to track the articulatory dynamics of voiced speech.

4.
A model is presented which predicts the movements of flesh points on the tongue, lips, and jaw during speech production, from time-aligned phonetic strings. Starting from a database of x-ray articulator trajectories, means and variances of articulator positions and curvatures at the midpoints of phonemes are extracted from the data set. During prediction, the amount of articulatory effort required in a particular phonetic context is estimated from the relative local curvature of the articulator trajectory concerned. Correlations between position and curvature are used to directly predict variations from mean articulator positions due to coarticulatory effects. Use of the explicit coarticulation model yields a significant increase in articulatory modeling accuracy with respect to x-ray traces, as compared with the use of mean articulator positions alone.

5.
In this paper, a quantitative study of acoustic-to-articulatory inversion for vowel speech sounds by analysis-by-synthesis using the Maeda articulatory model is performed. For chain matrix calculation of vocal tract (VT) acoustics, the chain matrix derivatives with respect to area function are calculated and used in a quasi-Newton method for optimizing articulatory trajectories. The cost function includes a distance measure between natural and synthesized first three formants, and parameter regularization and continuity terms. Calibration of the Maeda model to two speakers, one male and one female, from the University of Wisconsin x-ray microbeam (XRMB) database, using a cost function, is discussed. Model adaptation includes scaling the overall VT and the pharyngeal region and modifying the outer VT outline using measured palate and pharyngeal traces. The inversion optimization is initialized by a fast search of an articulatory codebook, which was pruned using XRMB data to improve inversion results. Good agreement between estimated midsagittal VT outlines and measured XRMB tongue pellet positions was achieved for several vowels and diphthongs for the male speaker, with average pellet-VT outline distances around 0.15 cm, smooth articulatory trajectories, and less than 1% average error in the first three formants.
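The three-part cost function described in this abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact distance measure or the term weights, so the relative squared formant error, the quadratic penalties, and the names `inversion_cost`, `w_reg`, and `w_cont` are all assumptions made for illustration, not the authors' formulation.

```python
def inversion_cost(nat_formants, syn_formants, params, prev_params,
                   w_reg=0.1, w_cont=1.0):
    """Hypothetical per-frame cost for analysis-by-synthesis inversion.

    nat_formants / syn_formants: natural and synthesized formants in Hz.
    params / prev_params: articulatory parameters for this frame and the
    previous frame. The weights w_reg and w_cont are illustrative only.
    """
    # Distance term: relative squared error over the first three formants
    # (an assumed form; the abstract only says "a distance measure").
    dist = sum(((s - n) / n) ** 2
               for n, s in zip(nat_formants[:3], syn_formants[:3]))
    # Regularization term: penalizes parameters far from the neutral value 0.
    reg = sum(p ** 2 for p in params)
    # Continuity term: penalizes jumps between consecutive frames.
    cont = sum((p - q) ** 2 for p, q in zip(params, prev_params))
    return dist + w_reg * reg + w_cont * cont
```

In a sketch like this, an optimizer such as a quasi-Newton method would minimize the summed cost over all frames; the continuity term is what ties neighboring frames together and encourages smooth trajectories.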

6.
This study reports an investigation of the well-known context-dependent variation in English /r/ using a biomechanical tongue-jaw-hyoid model. The simulation results show that preferred /r/ variants require less volume displacement, relative strain, and relative muscle stress than variants that are not preferred. This study also uncovers a previously unknown mechanism in tongue biomechanics for /r/ production: Torque in the sagittal plane about the mental spine. This torque enables raising of the tongue anterior for retroflexed [Symbol: see text] by activation of hyoglossus and relaxation of anterior genioglossus. The results provide a deeper understanding of the articulatory factors that govern contextual phonetic variation.

7.
A method for synthesizing vocal-tract spectra from phoneme sequences by mimicking the speech production process of humans is presented. The model consists of four main processes and is particularly characterized by an adaptive formation of articulatory movements. First, our model determines the time when each phoneme is articulated. Next, it generates articulatory constraints that must be met for the production of each phoneme, and then it generates trajectories of the articulatory movements that satisfy the constraints. Finally, the time sequence of spectra is estimated from the produced articulatory trajectories. The articulatory constraint of each phoneme does not change with the phonemic context, but the contextual variability of speech is reproduced because of the dynamic articulatory model. The accuracy of the synthesis model was evaluated using data collected by the simultaneous measurement of speech and articulatory movements. The accuracy of the phonemic timing estimates was measured, and the synthesized results were compared with the measured results. Experimental results showed that the model captured the contextual variability of both the articulatory movements and speech acoustics.

8.
This paper investigates the functional relationship between articulatory variability and stability of acoustic cues during American English /r/ production. The analysis of articulatory movement data on seven subjects shows that the extent of intrasubject articulatory variability along any given articulatory direction is strongly and inversely related to a measure of acoustic stability (the extent of acoustic variation that displacing the articulators in this direction would produce). The presence and direction of this relationship are consistent with a speech motor control mechanism that uses a third formant frequency (F3) target; i.e., the final articulatory variability is lower for those articulatory directions most relevant to determining the F3 value. In contrast, no consistent relationship across speakers and phonetic contexts was found between hypothesized vocal-tract target variables and articulatory variability. Furthermore, simulations of two speakers' productions using the DIVA model of speech production, in conjunction with a novel speaker-specific vocal-tract model derived from magnetic resonance imaging data, mimic the observed range of articulatory gestures for each subject, while exhibiting the same articulatory/acoustic relations as those observed experimentally. Overall, these results provide evidence for a common control scheme that utilizes an acoustic, rather than articulatory, target specification for American English /r/.

9.
A cine series of tagged magnetic resonance (MR) images of the tongue is used to measure tongue motion and its internal deformation during speech. Tagged images are collected in three slice orientations (sagittal, coronal, and axial) during repetitions of the utterance "disouk" (/disuk/). A new technique called harmonic phase MRI (HARP-MRI) is used to process the tagged MR images to measure the internal deformation of the tongue. The measurements include displacement and velocity of tissue points, principal strains, and strain in the line-of-action of specific muscles. These measurements are not restricted to tag intersections, but can be calculated at every pixel in the image. The different motion measurements complement each other in understanding the tongue kinematics and in hypothesizing the internal muscle activity of the tongue.

10.
Point-tracking techniques provide timing information about structural movements of the tongue. Imaging techniques provide information about cross-sectional and pharyngeal tongue shape and movement. This study joined these techniques in a single subject. Five pellets on the tongue surface were tracked using x-ray microbeam, and the midsagittal and coronal planes of the tongue were imaged using real-time ultrasound. The speech materials were the consonants [s] and [l] and the vowels [i], [a], and [o] combined in VCVCe utterances. Analyses concentrated on the difference in tongue movements related to the two consonants. A model of tongue movement was developed, in which critical features of consonant shape and position dominated the tongue opening movement. In this model, the tongue is divided into subdivisions termed "functional segments" in both the sagittal and coronal planes. Movements of the functional segments created observable opening movement patterns.

11.
Finding the control parameters of an articulatory model that result in given acoustics is an important problem in speech research. However, one should also be able to derive the same parameters from measured articulatory data. In this paper, a method to estimate the control parameters of the model by Maeda from electromagnetic articulography (EMA) data, which allows the derivation of full sagittal vocal tract slices from sparse flesh-point information, is presented. First, the articulatory grid system involved in the model's definition is adapted to the speaker involved in the experiment, and EMA data are registered to it automatically. Then, articulatory variables that correspond to measurements defined by Maeda on the grid are extracted. An initial solution for the articulatory control parameters is found by a least-squares method, under constraints ensuring vocal tract shape naturalness. Dynamic smoothness of the parameter trajectories is then imposed by a variational regularization method. Generated vocal tract slices for vowels are compared with slices appearing in magnetic resonance images of the same speaker or found in the literature. Formants synthesized on the basis of these generated slices are adequately close to those tracked in real speech recorded concurrently with EMA.

12.
Understanding how the human speech production system is related to the human auditory system has been a perennial subject of inquiry. To investigate the production-perception link, in this paper, a computational analysis has been performed using the articulatory movement data obtained during speech production with concurrently recorded acoustic speech signals from multiple subjects in three different languages: English, Cantonese, and Georgian. The form of articulatory gestures during speech production varies across languages, and this variation is considered to be reflected in the articulatory position and kinematics. The auditory processing of the acoustic speech signal is modeled by a parametric representation of the cochlear filterbank which allows for realizing various candidate filterbank structures by changing the parameter value. Using mathematical communication theory, it is found that the uncertainty about the articulatory gestures in each language is maximally reduced when the acoustic speech signal is represented using the output of a filterbank similar to the empirically established cochlear filterbank in the human auditory system. Possible interpretations of this finding are discussed.
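In communication-theoretic terms, the reduction in uncertainty about one variable given another is their mutual information. The abstract does not say how this quantity was estimated, so the following Python sketch uses a simple plug-in estimate over paired discrete symbols purely as an illustration; the function name `mutual_information` and the assumption that gestures and filterbank outputs have already been quantized into discrete labels are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(pairs):
    """Plug-in estimate of I(X;Y) in bits from (x, y) symbol pairs.

    Here x could stand for a quantized articulatory gesture label and y
    for a quantized filterbank output (both assumed, for illustration).
    """
    n = len(pairs)
    joint = Counter(pairs)                 # counts of each (x, y) pair
    px = Counter(x for x, _ in pairs)      # marginal counts of x
    py = Counter(y for _, y in pairs)      # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi
```

Under this sketch, comparing candidate filterbanks would amount to recomputing the estimate for each candidate's quantized output and keeping the one with the largest value.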

13.
The complexities of how prosodic structure, both at the phrasal and syllable levels, shapes speech production have begun to be illuminated through studies of articulatory behavior. The present study contributes to an understanding of prosodic signatures on articulation by examining the joint effects of phrasal and syllable position on the production of consonants. Articulatory kinematic data were collected for five subjects using electromagnetic articulography (EMA) to record target consonants (labial, labiodental, and tongue tip), located in (1) either syllable final or initial position and (2) either at a phrase edge or phrase medially. Spatial and temporal characteristics of the consonantal constriction formation and release were determined based on kinematic landmarks in the articulator velocity profiles. The results indicate that syllable and phrasal position consistently affect the movement duration; however, effects on displacement were more variable. For most subjects, the boundary-adjacent portions of the movement (constriction release for a preboundary coda and constriction formation for a postboundary onset) are not differentially affected in terms of phrasal lengthening: both lengthen comparably.

14.
Tongue contact patterns for /s/ and /l/ were investigated using dynamic palatography. Both spatial and temporal asymmetries were commonly found extending into the vocalic transitions for these consonants. Implications for the adequacy of tongue motion data taken in a single midsagittal plane are discussed, as well as for articulatory interpretation of speech signals and speaker recognition applications.

15.
The departure point of the present paper is our effort to characterize and understand the spatiotemporal structure of articulatory patterns in speech. To do so, we removed segmental variation as much as possible while retaining the spoken act's stress and prosodic structure. Subjects produced two sentences from the "rainbow passage" using reiterant speech in which normal syllables were replaced by /ba/ or /ma/. This task was performed at two self-selected rates, conversational and fast. Infrared LEDs were placed on the jaw and lips and monitored using a modified SELSPOT optical tracking system. As expected, when pauses marking major syntactic boundaries were removed, a high degree of rhythmicity within rate was observed, characterized by well-defined periodicities and small coefficients of variation. When articulatory gestures were examined geometrically on the phase plane, the trajectories revealed a scaling relation between a gesture's peak velocity and displacement. Further quantitative analysis of articulator movement as a function of stress and speaking rate was indicative of a language-modulated dynamical system with linear stiffness and equilibrium (or rest) position as key control parameters. Preliminary modeling was consonant with this dynamical perspective which, importantly, does not require that time per se be a controlled variable.
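The link between linear stiffness and the peak velocity-displacement scaling mentioned in this abstract can be illustrated with an idealized system. The Python sketch below simulates an undamped, unit-mass linear spring, which is an assumed textbook model rather than the authors' fitted system; the name `simulate_gesture`, the unit mass, and the absence of damping are all illustrative choices. For such a spring released from rest, peak velocity equals sqrt(k)/2 times the total displacement of the half-cycle, so the ratio depends on stiffness alone.

```python
import math


def simulate_gesture(k, x_start, x_rest=0.0, dt=1e-4):
    """Simulate one half-cycle of an undamped unit-mass linear spring.

    k is the stiffness, x_rest the equilibrium position, and x_start the
    position from which the gesture is released at zero velocity.
    Returns (peak_speed, displacement) for the gesture.
    """
    x, v = x_start, 0.0
    peak_speed = 0.0
    half_period = math.pi / math.sqrt(k)   # time from one extreme to the other
    steps = int(round(half_period / dt))
    for _ in range(steps):
        v += -k * (x - x_rest) * dt        # semi-implicit Euler: velocity first
        x += v * dt                        # then position with updated velocity
        peak_speed = max(peak_speed, abs(v))
    displacement = abs(x - x_start)
    return peak_speed, displacement
```

Calling `simulate_gesture` with the same `k` but different `x_start` values yields the same peak-speed-to-displacement ratio, while raising `k` steepens it. Note that no explicit timing variable is controlled in this toy model; duration follows from the stiffness.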

16.
The acoustical consequences of articulatory maneuvers of [y] are studied in model experiments in order to obtain insights into articulator programming and speech motor control by elucidating the role of each component maneuver of a speech segment in setting up vocal tract resonance conditions for the spectral features of the speech wave. The maneuvers of [y] are found to provide a maximum and stable plain-flat spectral contrast with [i]. The results can be generalized to different vocal tract sizes. Tongue retraction and larynx depression are rejected as compensations to counteract labial undershoot. Larynx depression is complementary to lip rounding and restores spectral sensitivity to palatal and pharyngeal tongue movements otherwise disturbed by the labial activity. Spectral sensitivity then remains the same for [i] and [y], and there is no need for separate compensation programs for each of these phones.

17.
A new technique, tagged Cine-Magnetic Resonance Imaging (tMRI), was used to develop a mechanical model that represented local, homogeneous, internal tongue deformation during speech. The goal was to infer muscle activity within the tongue from tissue deformations seen on tMRI. Measurements were made in three sagittal slices (left, middle, right) during production of the syllable /ka/. Each slice was superimposed with a grid of tag lines, and the approximately 40 tag line intersections were tracked at 7 time-phases during the syllable. A local model, similar to a finite element analysis, represented planar stretch and shear between the consonant and vowel at 110 probed locations within the tongue. Principal strains were calculated at these locations and revealed internal compression and extension patterns from which inferences could be drawn about the activities of the Verticalis, Hyoglossus, and Superior Longitudinal muscles, among others.
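As an illustration of how principal strains follow from planar stretch and shear, the Python sketch below computes them from a 2x2 deformation gradient. The abstract does not state which strain measure was used, so the Green-Lagrange strain and the function name `principal_strains` are assumptions made for this example.

```python
import math


def principal_strains(F):
    """Principal Green-Lagrange strains of a planar deformation gradient.

    F = [[F11, F12], [F21, F22]] maps the reference configuration (for
    example, the consonant) to the deformed one (for example, the vowel).
    Returns (max_strain, min_strain); positive means extension and
    negative means compression.
    """
    (a, b), (c, d) = F
    # Right Cauchy-Green tensor C = F^T F (symmetric 2x2)
    c11 = a * a + c * c
    c12 = a * b + c * d
    c22 = b * b + d * d
    # Green-Lagrange strain E = (C - I) / 2
    e11, e12, e22 = (c11 - 1.0) / 2.0, c12 / 2.0, (c22 - 1.0) / 2.0
    # Closed-form eigenvalues of a symmetric 2x2 matrix
    mean = (e11 + e22) / 2.0
    radius = math.hypot((e11 - e22) / 2.0, e12)
    return mean + radius, mean - radius
```

A rigid rotation gives zero strain in both directions under this measure, which is why it is a common choice when tissue both moves and deforms.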

18.
The American English phoneme /r/ has long been associated with large amounts of articulatory variability during production. This paper investigates the hypothesis that the articulatory variations used by a speaker to produce /r/ in different contexts exhibit systematic tradeoffs, or articulatory trading relations, that act to maintain a relatively stable acoustic signal despite the large variations in vocal tract shape. Acoustic and articulatory recordings were collected from seven speakers producing /r/ in five phonetic contexts. For every speaker, the different articulator configurations used to produce /r/ in the different phonetic contexts showed systematic tradeoffs, as evidenced by significant correlations between the positions of transducers mounted on the tongue. Analysis of acoustic and articulatory variabilities revealed that these tradeoffs act to reduce acoustic variability, thus allowing relatively large contextual variations in vocal tract shape for /r/ without seriously degrading the primary acoustic cue. Furthermore, some subjects appeared to use completely different articulatory gestures to produce /r/ in different phonetic contexts. When viewed in light of current models of speech movement control, these results appear to favor models that utilize an acoustic or auditory target for each phoneme over models that utilize a vocal tract shape target for each phoneme.

19.
A hybrid PARAFAC and principal-component model of tongue configuration in vowel production is presented, using a corpus of German vowels in multiple consonant contexts (fleshpoint data for seven speakers at two speech rates from electromagnetic articulography). The PARAFAC approach is attractive for explicitly separating speaker-independent and speaker-dependent effects within a parsimonious linear model. However, it proved impossible to derive a PARAFAC solution of the complete dataset (estimated to require three factors) due to complexities introduced by the consonant contexts. Accordingly, the final model was derived in two stages. First, a two-factor PARAFAC model was extracted. This succeeded; the result was treated as the basic vowel model. Second, the PARAFAC model error was subjected to a separate principal-component analysis for each subject. This revealed a further articulatory component mainly involving tongue-blade activity associated with the flanking consonants. However, the subject-specific details of the mapping from raw fleshpoint coordinates to this component were too complex to be consistent with the PARAFAC framework. The final model explained over 90% of the variance and gave a succinct and physiologically plausible articulatory representation of the German vowel space.

20.
This study explores the hypothesis that clear speech is produced with greater "articulatory effort" than normal speech. Kinematic and acoustic data were gathered from seven subjects as they pronounced multiple repetitions of utterances in different speaking conditions, including normal, fast, clear, and slow. Data were analyzed within a framework based on a dynamical model of single-axis frictionless movements, in which peak movement speed is used as a relative measure of articulatory effort (Nelson, 1983). There were differences in peak movement speed, distance and duration among the conditions and among the speakers. Three speakers produced the "clear" condition utterances with movements that had larger distances and durations than those for "normal" utterances. Analyses of the data within a peak speed, distance, duration "performance space" indicated increased effort (reflected in greater peak speed) in the clear condition for the three speakers, in support of the hypothesis. The remaining four speakers used other combinations of parameters to produce the clear condition. The validity of the simple dynamical model for analyzing these complex movements was considered by examining several additional parameters. Some movement characteristics differed from those required for the model-based analysis, presumably because the articulators are complicated structurally and interact with one another mechanically. More refined tests of control strategies for different speaking styles will depend on future analyses of more complicated movements with more realistic models.
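The three coordinates of the "performance space" named in this abstract (peak speed, distance, duration) can be read off a sampled single-axis trajectory. The Python sketch below is a minimal illustration under stated assumptions: uniformly sampled positions, finite-difference speeds, and a hypothetical function name `movement_measures`. The study's own segmentation and smoothing procedures are not described in the abstract and are not reproduced here.

```python
def movement_measures(positions, dt):
    """Peak speed, distance, and duration of one single-axis movement.

    positions: uniformly sampled positions covering one movement from
    onset to offset (at least two samples); dt: sampling interval.
    """
    # Finite-difference speed between consecutive samples
    speeds = [abs(b - a) / dt for a, b in zip(positions, positions[1:])]
    peak_speed = max(speeds)
    # Net distance between movement onset and offset
    distance = abs(positions[-1] - positions[0])
    duration = dt * (len(positions) - 1)
    return peak_speed, distance, duration
```

With these three values per movement, conditions such as "normal" and "clear" can be compared directly, for example by checking whether a larger distance is accompanied by a higher peak speed or by a longer duration.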
