Similar Literature
20 similar documents found (search time: 359 ms)
1.
The American English phoneme /r/ has long been associated with large amounts of articulatory variability during production. This paper investigates the hypothesis that the articulatory variations used by a speaker to produce /r/ in different contexts exhibit systematic tradeoffs, or articulatory trading relations, that act to maintain a relatively stable acoustic signal despite the large variations in vocal tract shape. Acoustic and articulatory recordings were collected from seven speakers producing /r/ in five phonetic contexts. For every speaker, the different articulator configurations used to produce /r/ in the different phonetic contexts showed systematic tradeoffs, as evidenced by significant correlations between the positions of transducers mounted on the tongue. Analysis of acoustic and articulatory variabilities revealed that these tradeoffs act to reduce acoustic variability, thus allowing relatively large contextual variations in vocal tract shape for /r/ without seriously degrading the primary acoustic cue. Furthermore, some subjects appeared to use completely different articulatory gestures to produce /r/ in different phonetic contexts. When viewed in light of current models of speech movement control, these results appear to favor models that utilize an acoustic or auditory target for each phoneme over models that utilize a vocal tract shape target for each phoneme.
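The trading-relation argument can be made concrete with a linearized map. This minimal sketch assumes, hypothetically, that F3 depends linearly on two tongue-transducer positions x and y with sensitivities a and b; all numbers are invented for illustration. Because Var(ax + by) = a²Var(x) + b²Var(y) + 2ab·Cov(x, y), a covariance whose sign is opposite to that of ab lowers the variance of the acoustic cue even when each articulator varies a lot.

```python
def cue_variance(a: float, b: float, var_x: float, var_y: float, cov_xy: float) -> float:
    """Variance of a linearized cue a*x + b*y given articulator (co)variances.

    Var(a*x + b*y) = a^2 Var(x) + b^2 Var(y) + 2ab Cov(x, y)
    """
    return a * a * var_x + b * b * var_y + 2.0 * a * b * cov_xy


# Hypothetical numbers: equal sensitivities and equal positional variances.
independent = cue_variance(1.0, 1.0, 4.0, 4.0, 0.0)  # no trading relation
trading = cue_variance(1.0, 1.0, 4.0, 4.0, -3.0)     # negatively correlated positions
print(independent, trading)  # 8.0 2.0
```

With opposite-signed sensitivities (ab < 0), it is a positive covariance that stabilizes the cue, so the sign of a helpful trading relation depends on how each articulator affects F3.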

2.
This study investigates the use of constraints on articulatory parameters in acoustic-to-articulatory inversion. These speaker-independent constraints, referred to as phonetic constraints, were derived from standard phonetic knowledge of French vowels and express authorized domains for one or several articulatory parameters. They were tested within an existing inversion framework that uses Maeda's articulatory model and a hypercubic articulatory-acoustic table. The phonetic constraints yield a phonetic score that quantifies the phonetic consistency of the vocal tract shapes recovered by inversion. Inversion was applied to vowels articulated by a speaker for whom corresponding x-ray images are also available. The constraints were evaluated in three ways: by measuring the distance between the vocal tract shapes recovered through inversion and the real vocal tract shapes obtained from the x-ray images, by examining the spread of inverse solutions in terms of place of articulation and constriction degree, and by studying articulatory variability. The results show that these constraints capture interdependencies and synergies between speech articulators and favor vocal tract shapes close to those realized by the human speaker. In addition, this study shows how acoustic-to-articulatory inversion can be used to explore the acoustic and compensatory articulatory properties of an articulatory model.
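One way to picture a phonetic score is as the fraction of authorized-domain constraints that a recovered vocal tract shape satisfies. The sketch below assumes that scoring rule and uses invented parameter names and bounds; the study's actual constraints and score are not reproduced here. Writing each constraint as a predicate allows a domain on a single parameter or on several parameters jointly.

```python
from typing import Callable, Dict, List

Params = Dict[str, float]
Constraint = Callable[[Params], bool]


def phonetic_score(params: Params, constraints: List[Constraint]) -> float:
    """Fraction of phonetic constraints satisfied by one recovered vocal tract shape."""
    if not constraints:
        return 1.0
    return sum(1 for satisfied in constraints if satisfied(params)) / len(constraints)


def rank_solutions(solutions: List[Params], constraints: List[Constraint]) -> List[Params]:
    """Order candidate inverse solutions from most to least phonetically consistent."""
    return sorted(solutions, key=lambda s: phonetic_score(s, constraints), reverse=True)


# Hypothetical constraints for one vowel; parameter names and bounds are illustrative.
VOWEL_CONSTRAINTS: List[Constraint] = [
    lambda p: p["tongue_body"] <= -1.0,            # domain on a single parameter
    lambda p: p["jaw"] + p["tongue_body"] <= 0.0,  # joint domain on two parameters
    lambda p: p["lip_protrusion"] <= 0.5,
]

consistent = {"jaw": 0.5, "tongue_body": -2.0, "lip_protrusion": 0.0}
inconsistent = {"jaw": 1.0, "tongue_body": 0.5, "lip_protrusion": 1.0}
print(phonetic_score(consistent, VOWEL_CONSTRAINTS))    # 1.0
print(phonetic_score(inconsistent, VOWEL_CONSTRAINTS))  # 0.0
```

Ranking by such a score is one way constraints can "favor" some inverse solutions over others without discarding any outright.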

3.
The many-to-one mapping from representations in the speech articulatory space to the acoustic space renders the associated acoustic-to-articulatory inverse mapping non-unique. Among various techniques, imposing smoothness constraints on the articulator trajectories is one of the common approaches to handling the non-uniqueness in the acoustic-to-articulatory inversion problem, because articulators typically move smoothly during speech production. A standard smoothness constraint is to minimize the energy of the difference of the articulatory position sequence, so that the articulator trajectory is smooth and low-pass in nature. Such a fixed definition of smoothness is not always realistic or adequate for all articulators, because different articulators have different degrees of smoothness. In this paper, an optimization formulation is proposed for the inversion problem that includes a generalized smoothness criterion. Under such generalized smoothness settings, the smoothness parameter can be chosen for each specific articulator in a data-driven fashion. In addition, this formulation allows articulatory positions to be estimated recursively over time without any loss in performance. Experiments with the MOCHA TIMIT database show that the articulator trajectories estimated using such a generalized smoothness criterion have lower RMS error and higher correlation with the actual measured trajectories than those obtained using a fixed smoothness constraint.
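The standard first-difference smoothness constraint can be sketched as a penalized least-squares problem. This is a minimal sketch, assuming a noisy frame-wise estimate y of one articulator's position is already available (for example from a hypothetical codebook lookup); it shows the fixed criterion with a weight `lam` that could be chosen per articulator, not the paper's generalized criterion or its recursive estimator. Minimizing Σ(x_t − y_t)² + λΣ(x_t − x_{t−1})² leads to the tridiagonal system (I + λDᵀD)x = y, with D the first-difference operator.

```python
from typing import List, Sequence


def smooth_trajectory(y: Sequence[float], lam: float) -> List[float]:
    """Minimize sum_t (x_t - y_t)^2 + lam * sum_t (x_t - x_{t-1})^2 over x.

    Setting the gradient to zero gives (I + lam * D^T D) x = y, where D is the
    first-difference operator. That matrix is tridiagonal (diagonal 1 + lam at
    the ends and 1 + 2 * lam inside, off-diagonals -lam), so the Thomas
    algorithm solves it in O(n).
    """
    n = len(y)
    if n <= 1 or lam == 0.0:
        return [float(v) for v in y]
    diag = [1.0 + 2.0 * lam] * n
    diag[0] = diag[-1] = 1.0 + lam
    off = -lam
    # Forward sweep: eliminate the sub-diagonal.
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = off / diag[0]
    dp[0] = y[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off * cp[i - 1]
        cp[i] = off / denom
        dp[i] = (y[i] - off * dp[i - 1]) / denom
    # Back substitution.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


# Hypothetical per-articulator weights: larger lam -> smoother, more low-pass trajectory.
weights = {"jaw": 20.0, "tongue_tip": 2.0}
noisy = [0.0, 0.4, -0.1, 0.9, 1.2, 0.8, 1.1]
smoothed = {name: smooth_trajectory(noisy, lam) for name, lam in weights.items()}
spike = smooth_trajectory([0.0, 0.0, 1.0, 0.0, 0.0], 1.0)
```

A single global `lam` corresponds to the fixed constraint the abstract criticizes; letting it vary by articulator is the simplest step toward articulator-specific smoothness.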

4.
Peri- and intraoral devices are often used to obtain measurements of articulator motions and placements. Surprisingly, there are few formal evaluations of the potential influence of these devices on speech production behavior. In particular, the potential effects of lingual pellets or coils used in x-ray or electromagnetic studies of tongue motion have never been evaluated formally, even though a large x-ray database exists and electromagnetic systems are commercially available. The x-ray microbeam database [Westbury, J. "X-ray Microbeam Speech Production Database User's Handbook, version 1" (1994)] includes several utterances produced both with pellets off and with pellets on, which allowed us to evaluate the effects of pellets on the utterance "She had your dark suit in greasy wash water all year," using acoustic and perceptual measures. Overall, no acoustic or perceptual measure showed consistent effects of pellets across speakers, but certain effects were consistent either within a given speaker or in direction across a subgroup of speakers. The results are discussed in terms of how well the assumption holds that point parameterization of lingual motion does not interfere with normal articulatory behavior. A brief screening procedure is suggested to protect articulatory kinematic experiments from individuals who may show consistent effects of having devices placed on perioral structures.

5.
The method described here predicts the trajectories of articulatory movements in continuous speech by using a kinematic triphone model and the minimum-acceleration model. The kinematic triphone model, which is constructed from articulatory data obtained in experiments using an electromagnetic articulography system, is characterized by three kinematic features of a triphone and by the intervals between two successive phonemes in the triphone. After the kinematic feature of each phoneme in a given sentence is extracted, the minimum-acceleration trajectory, i.e., the trajectory that minimizes the time integral of the squared magnitude of the articulator acceleration, is formulated. Computing the minimum-acceleration trajectory requires only linear computation. The method predicts both the qualitative features and the quantitative details of experimentally observed articulation.
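For fixed via-points, the trajectory that minimizes ∫ẍ² dt is piecewise cubic, and finding it needs only a tridiagonal linear solve, which fits the remark about linear computation. A minimal sketch, assuming the via-point times and positions for one articulator coordinate are already given (for example by a kinematic triphone model) and that acceleration is left free at both ends, in which case the minimizer is the natural cubic spline; the study's own boundary conditions and via-point definitions may differ.

```python
from bisect import bisect_right
from typing import Callable, List, Sequence


def min_accel_trajectory(times: Sequence[float], points: Sequence[float]) -> Callable[[float], float]:
    """Natural cubic spline through (times[i], points[i]).

    Among all smooth curves through the via-points, this one minimizes the
    integral of squared acceleration. Its knot accelerations m solve a
    tridiagonal linear system, with m = 0 at both ends.
    """
    n = len(times) - 1
    if n < 1 or len(points) != len(times):
        raise ValueError("need matching times/points with at least two via-points")
    h = [times[i + 1] - times[i] for i in range(n)]
    m: List[float] = [0.0] * (n + 1)
    size = n - 1  # number of interior knots; interior knot i corresponds to row k = i - 1
    if size > 0:
        sub = [h[k] for k in range(size)]                       # coefficient of m[i-1]
        diag = [2.0 * (h[k] + h[k + 1]) for k in range(size)]   # coefficient of m[i]
        sup = [h[k + 1] for k in range(size)]                   # coefficient of m[i+1]
        rhs = [6.0 * ((points[k + 2] - points[k + 1]) / h[k + 1]
                      - (points[k + 1] - points[k]) / h[k]) for k in range(size)]
        # Thomas algorithm: forward sweep, then back substitution.
        cp = [0.0] * size
        dp = [0.0] * size
        cp[0] = sup[0] / diag[0]
        dp[0] = rhs[0] / diag[0]
        for k in range(1, size):
            denom = diag[k] - sub[k] * cp[k - 1]
            cp[k] = sup[k] / denom
            dp[k] = (rhs[k] - sub[k] * dp[k - 1]) / denom
        m[size] = dp[size - 1]
        for k in range(size - 2, -1, -1):
            m[k + 1] = dp[k] - cp[k] * m[k + 2]

    def trajectory(t: float) -> float:
        i = min(max(bisect_right(times, t) - 1, 0), n - 1)  # segment containing t
        hi = h[i]
        a = times[i + 1] - t
        b = t - times[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * hi)
                + (points[i] / hi - m[i] * hi / 6.0) * a
                + (points[i + 1] / hi - m[i + 1] * hi / 6.0) * b)

    return trajectory


# Hypothetical via-points (s, mm) for one EMA coordinate.
path = min_accel_trajectory([0.0, 0.1, 0.25, 0.4], [0.0, 4.0, -2.0, 1.0])
samples = [path(0.01 * k) for k in range(41)]
```

Fixing end velocities instead (for example, rest-to-rest movement) would change only the first and last rows of the linear system.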

6.
This paper presents a quantitative study of acoustic-to-articulatory inversion for vowel speech sounds by analysis-by-synthesis using the Maeda articulatory model. For chain matrix calculation of vocal tract (VT) acoustics, the derivatives of the chain matrix with respect to the area function are calculated and used in a quasi-Newton method for optimizing articulatory trajectories. The cost function includes a distance measure between the first three formants of natural and synthesized speech, together with parameter regularization and continuity terms. Calibration of the Maeda model, using a cost function, to two speakers (one male and one female) from the University of Wisconsin x-ray microbeam (XRMB) database is discussed. Model adaptation includes scaling the overall VT and the pharyngeal region and modifying the outer VT outline using measured palate and pharyngeal traces. The inversion optimization is initialized by a fast search of an articulatory codebook, which was pruned using XRMB data to improve inversion results. For the male speaker, good agreement between estimated midsagittal VT outlines and measured XRMB tongue pellet positions was achieved for several vowels and diphthongs, with average pellet-to-VT-outline distances of around 0.15 cm, smooth articulatory trajectories, and less than 1% average error in the first three formants.
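The three-part cost can be sketched per analysis frame. In this minimal sketch the formant distance is taken as a squared relative error, the regularization as the squared norm of the (assumed normalized) model parameters, and the continuity term as the squared change from the previous frame; the weights `alpha` and `beta` and the `synthesize` callable, which stands in for chain-matrix synthesis, are assumptions rather than values from the paper.

```python
from typing import Callable, Optional, Sequence

Vector = Sequence[float]


def inversion_cost(
    params: Vector,
    prev_params: Optional[Vector],
    measured: Vector,
    synthesize: Callable[[Vector], Vector],
    alpha: float = 0.1,
    beta: float = 1.0,
) -> float:
    """Frame cost = formant mismatch + alpha * regularization + beta * continuity."""
    synth = synthesize(params)
    # Squared relative error over the first three formants (F1-F3).
    formant_term = sum(((m - s) / m) ** 2 for m, s in zip(measured[:3], synth[:3]))
    # Keep parameters near the model's neutral position (assumed to be 0).
    regularization = alpha * sum(p * p for p in params)
    # Penalize jumps from the previous frame; the first frame has no predecessor.
    continuity = 0.0
    if prev_params is not None:
        continuity = beta * sum((p - q) ** 2 for p, q in zip(params, prev_params))
    return formant_term + regularization + continuity


def toy_synthesizer(p: Vector) -> Vector:
    """Hypothetical linear stand-in for chain-matrix synthesis (Hz)."""
    return [500.0 + 100.0 * p[0], 1500.0 + 200.0 * p[1], 2500.0]
```

Summing this frame cost over a whole trajectory gives an objective that a quasi-Newton optimizer could minimize, starting from a codebook-search initialization.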

7.
This paper investigates the functional relationship between articulatory variability and the stability of acoustic cues during American English /r/ production. The analysis of articulatory movement data from seven subjects shows that the extent of intrasubject articulatory variability along any given articulatory direction is strongly and inversely related to a measure of acoustic stability (the extent of acoustic variation that displacing the articulators in this direction would produce). The presence and direction of this relationship are consistent with a speech motor control mechanism that uses a third formant frequency (F3) target; i.e., the final articulatory variability is lower for those articulatory directions most relevant to determining the F3 value. In contrast, no consistent relationship across speakers and phonetic contexts was found between hypothesized vocal-tract target variables and articulatory variability. Furthermore, simulations of two speakers' productions using the DIVA model of speech production, in conjunction with a novel speaker-specific vocal-tract model derived from magnetic resonance imaging data, mimic the observed range of articulatory gestures for each subject while exhibiting the same articulatory/acoustic relations as those observed experimentally. Overall, these results provide evidence for a common control scheme that uses an acoustic, rather than articulatory, target specification for American English /r/.
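The inverse relation between directional variability and acoustic sensitivity can be illustrated with a linearized F3. This minimal sketch assumes a hypothetical 2-D articulatory covariance and a hypothetical F3 gradient (neither taken from the study) and uses |∇F3 · u| as the sensitivity along a unit direction u, so that low sensitivity means high acoustic stability; the paper's own stability measure may be defined differently. A strongly negative correlation across directions is the pattern expected under an F3 target.

```python
import math
from typing import List, Sequence, Tuple

Matrix2 = Tuple[Tuple[float, float], Tuple[float, float]]


def directional_variance(cov: Matrix2, u: Tuple[float, float]) -> float:
    """Articulatory variance along unit direction u: u^T C u."""
    return (cov[0][0] * u[0] * u[0] + 2.0 * cov[0][1] * u[0] * u[1]
            + cov[1][1] * u[1] * u[1])


def directional_sensitivity(grad_f3: Tuple[float, float], u: Tuple[float, float]) -> float:
    """|dF3/ds| for a unit displacement s along u, under a linearized F3."""
    return abs(grad_f3[0] * u[0] + grad_f3[1] * u[1])


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def variability_vs_sensitivity(cov: Matrix2, grad_f3: Tuple[float, float], n_dirs: int = 8) -> float:
    """Correlate variance and F3 sensitivity over evenly spaced directions in [0, pi)."""
    dirs: List[Tuple[float, float]] = [
        (math.cos(math.pi * k / n_dirs), math.sin(math.pi * k / n_dirs)) for k in range(n_dirs)
    ]
    variances = [directional_variance(cov, u) for u in dirs]
    sensitivities = [directional_sensitivity(grad_f3, u) for u in dirs]
    return pearson(variances, sensitivities)


# Hypothetical data: positions vary least along the F3 gradient direction (1, 1).
r = variability_vs_sensitivity(((4.0, -3.0), (-3.0, 4.0)), (1.0, 1.0))
```

If the covariance were isotropic, every direction would have the same variance and no such relationship could appear; the anisotropy aligned with the gradient is what an acoustic target predicts.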

8.
An automatic speech recognition approach is presented that uses articulatory features estimated by a subject-independent acoustic-to-articulatory inversion. The inversion allows articulatory features to be estimated from any talker's speech acoustics using only an exemplary subject's articulatory-to-acoustic map. Results are reported for a broad-class phonetic classification experiment on speech from English talkers, using data from three distinct English talkers as exemplars for inversion. The results indicate that including the articulatory information improves classification accuracy, and that the improvement is more pronounced when the speaking styles of the exemplar and the talker are matched than when they are mismatched.

9.
A hypothesis on the nature of articulatory targets for the vowels /i/ and /a/ is proposed, based on acoustic considerations and vowel articulations. The conjecture is that, in a repetition experiment, the positioning of points on the tongue surface should be most accurate in the direction perpendicular to the vocal-tract midline, at the acoustically critical point of maximal constriction for each vowel. The hypothesis was tested by examining x-ray microbeam data for three speakers, conducting a partial acoustical analysis, and performing a modeling study. Distributions of the midsagittal locations of three tongue points at the time of maximal excursion toward the vowel target were plotted for multiple tokens of the vowels, embedded in a variety of phonetic contexts. More variation was found along the direction parallel to the vocal-tract midline than perpendicular to it, supporting the hypothesis. Statistics on formant values were calculated for one subject, and pairwise regressions of displacement and formant data were run. An articulatory synthesizer [Rubin et al., J. Acoust. Soc. Am. 70, 321-328 (1981)] was manipulated through displacements similar to the subject's articulatory variation. Although articulatory synthesis showed systematic relationships between articulatory displacements and formant frequencies, there were no significant correlations between the subject's measured articulatory displacements and his formant data. These additional results raise questions about the methodology and point to the need for further work to test the hypothesis adequately.
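The parallel-versus-perpendicular comparison amounts to projecting each token's pellet position onto the midline tangent and normal at the constriction. A minimal sketch, assuming the local midline direction is already known as an angle (how it would be estimated is not described here) and using population variances; the token coordinates are invented.

```python
import math
from typing import Sequence, Tuple


def spread_along_midline(points: Sequence[Tuple[float, float]], tangent_angle: float) -> Tuple[float, float]:
    """Return (variance parallel, variance perpendicular) to the vocal-tract midline.

    points: midsagittal (x, y) positions of one tongue pellet across tokens.
    tangent_angle: angle (radians) of the midline at the constriction.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    tx, ty = math.cos(tangent_angle), math.sin(tangent_angle)  # along the midline
    nx, ny = -ty, tx                                            # normal to the midline
    par = [(x - mx) * tx + (y - my) * ty for x, y in points]
    perp = [(x - mx) * nx + (y - my) * ny for x, y in points]
    return sum(v * v for v in par) / n, sum(v * v for v in perp) / n


# Hypothetical pellet positions (cm) for repeated tokens; midline roughly horizontal here.
tokens = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0), (2.0, 0.5), (2.0, -0.5)]
var_par, var_perp = spread_along_midline(tokens, 0.0)
print(var_perp < var_par)  # True
```

The hypothesis predicts var_perp < var_par at each vowel's constriction; the test is only as good as the estimate of the local midline direction.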

10.
This study investigates the effects of accent and prosodic boundaries on the production of English vowels (/a, i/) by concurrently examining acoustic vowel formants and articulatory maxima of the tongue, jaw, and lips obtained with EMA (electromagnetic articulography). The results demonstrate that prosodic strengthening (due to accent and/or prosodic boundaries) has differential effects depending on the source of prominence (in accented syllables versus at edges of prosodic domains; domain-initially versus domain-finally). The results are interpreted in terms of how prosodic strengthening is related to the phonetic realization of vowel features. For example, when accented, /i/ was fronter in both acoustic and articulatory vowel spaces (enhancing [-back]), accompanied by an increase in both lip and jaw openings (enhancing sonority). By contrast, at edges of prosodic domains (especially domain-finally), /i/ was not necessarily fronter but higher (enhancing [+high]), accompanied by an increase only in the lip (not jaw) opening. This suggests that the two aspects of prosodic structure (accent versus boundary) are differentiated by distinct phonetic patterns. Further, it implies that prosodic strengthening, though manifested in fine-grained phonetic details, is not simply a low-level phonetic event but a complex linguistic phenomenon, closely linked to the enhancement of phonological features and to positional strength that may license phonological contrasts.

11.
The acoustical consequences of articulatory maneuvers of [y] are studied in model experiments in order to obtain insights into articulator programming and speech motor control by elucidating the role of each component maneuver of a speech segment in setting up vocal tract resonance conditions for the spectral features of the speech wave. The maneuvers of [y] are found to provide a maximum and stable plain-flat spectral contrast with [i]. The results can be generalized to different vocal tract sizes. Tongue retraction and larynx depression are rejected as compensations to counteract labial undershoot. Larynx depression is complementary to lip rounding and restores spectral sensitivity to palatal and pharyngeal tongue movements otherwise disturbed by the labial activity. Spectral sensitivity then remains the same for [i] and [y], and there is no need for separate compensation programs for each of these phones.

12.
If more than one articulator is involved in the execution of a phonetic task, then the individual articulators have to be temporally coordinated with each other in a lawful manner. The present study analyzes tongue-jaw cohesion in the temporal domain for the German coronal consonants [s, ʃ, t, d, n, l], i.e., consonants produced with the same set of articulators (the tongue blade and the jaw) but differing in manner of articulation. The stability of the observed interaction patterns is evaluated by varying the degree of vocal effort: comfortable and loud. Tongue and jaw movements of five speakers of German were recorded by means of electromagnetic midsagittal articulography (EMMA) during [aCa] sequences. The results indicate that (1) tongue-jaw coordination varies with manner of articulation, i.e., the jaw target has a later onset and offset for the stops than for the fricatives, the nasal, and the lateral; (2) the observed patterns are stable across vocal effort conditions; (3) the sibilants are produced with smaller standard deviations for latencies and target positions; and (4) in loud speech, adjustments to the lower jaw positions during the surrounding vowels occur during the closing and opening movement intervals, not during the consonantal target phases.

13.
This paper considers the problems of evaluating the phonetic quality of speech and the characteristic features of a speaker's articulatory base from acoustic-phonetic measurement data. It recommends performing the evaluation according to the GOST 50840-95 standard "Speech Transmission over Varied Communication Channels: Techniques for Measurement of Speech Quality, Intelligibility, and Voice Identification," which was put into effect in Russia in 1997. Examples of experimental evaluation measurements in speech communication and in forensic expert examination are presented.

14.
Coordination between intrinsic and jaw-related components of tongue blade movement during the articulation of the alveolar consonant /t/ was examined across changes in phonetic context. Tongue-jaw interactions included compensatory responses of one articulatory component to a contextual effect on the position of the other. A similar reciprocity has been observed in studies that introduced artificial perturbations of jaw position and in studies of token-to-token variability patterns. Thus, the lingual-mandibular complex seems to respond in a similar manner to at least some natural and artificial perturbations.
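The intrinsic/jaw-related split and the compensation it can reveal are sketched below. This assumes, hypothetically, that the jaw-related part of blade displacement is the jaw displacement times a fixed factor `k` (a simplification of the jaw's rotation); the abstract does not specify how the components were separated, and the positions are invented. When the blade reaches the same position in every context, the intrinsic component mirrors the jaw component and their correlation across contexts is -1.

```python
import math
from typing import List, Sequence, Tuple


def decompose_blade(blade: Sequence[float], jaw: Sequence[float], k: float = 1.0) -> Tuple[List[float], List[float]]:
    """Split blade height into a jaw-related part (k * jaw) and an intrinsic tongue part."""
    jaw_part = [k * j for j in jaw]
    intrinsic = [b - jp for b, jp in zip(blade, jaw_part)]
    return jaw_part, intrinsic


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical /t/ tokens in four vowel contexts (mm): jaw height varies, blade target does not.
jaw_heights = [0.0, 1.0, 2.0, 3.0]
blade_heights = [5.0, 5.0, 5.0, 5.0]
jaw_part, intrinsic = decompose_blade(blade_heights, jaw_heights)
r = pearson(jaw_part, intrinsic)
```

Partial compensation would give a correlation between 0 and -1, which is why the sign and size of this correlation are informative about tongue-jaw reciprocity.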

15.
We describe an arrangement for simultaneously recording speech and vocal tract geometry in patients undergoing surgery involving this area. The experimental design is considered from an articulatory-phonetic point of view. The speech signals are recorded with an acoustic-electrical arrangement, while the vocal tract is simultaneously imaged with MRI. A MATLAB-based system controls the timing of speech recording and MR image acquisition. The speech signals are cleaned of acoustic MRI noise by an adaptive signal processing algorithm. Finally, a vowel data set from pilot experiments is qualitatively compared both with validation data from the anechoic chamber and with Helmholtz resonances of the vocal tract volume, obtained using FEM.
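The abstract does not name the adaptive algorithm. As an illustration only, the sketch below swaps in a textbook normalized-LMS (NLMS) adaptive noise canceller, assuming a separate noise-only reference channel is available; the tap count, step size, and simulated signals are hypothetical. The filter learns to predict the noise in the primary channel from the reference, and its prediction error is the speech estimate.

```python
import math
import random
from typing import List, Sequence


def nlms_cancel(primary: Sequence[float], reference: Sequence[float],
                taps: int = 8, mu: float = 0.1, eps: float = 1e-8) -> List[float]:
    """Adaptive noise canceller using the normalized LMS update.

    primary: microphone signal = speech + noise passed through an unknown path.
    reference: noise-only signal correlated with that noise.
    Returns e = primary - filtered reference, i.e. the speech estimate.
    """
    w = [0.0] * taps
    buf = [0.0] * taps  # most recent reference samples, newest first
    out: List[float] = []
    for d, r in zip(primary, reference):
        buf = [r] + buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, buf))  # current noise estimate
        e = d - y
        norm = eps + sum(xi * xi for xi in buf)
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, buf)]
        out.append(e)
    return out


rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
speech = [0.5 * math.sin(2 * math.pi * 0.05 * t) for t in range(4000)]
# Hypothetical acoustic path: a scaled, one-sample-delayed copy of the reference noise.
mic = [s + 0.8 * (noise[t - 1] if t > 0 else 0.0) for t, s in enumerate(speech)]
cleaned = nlms_cancel(mic, noise)
```

A smaller `mu` leaves less residual noise after convergence but adapts more slowly; real scanner noise also changes with the imaging sequence, which a practical system would have to track.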

16.
The purpose of this study is to test a methodology for describing the articulation of vowels. High front vowels are a test case because some theories suggest that high front vowels have little cross-linguistic variation. Acoustic studies appear to show counterexamples to these predictions, but purely acoustic studies are difficult to interpret because of the many-to-one relation between articulation and acoustics. In this study, vocal tract dimensions, including constriction degree and position, are measured from cinéradiographic and x-ray data on high front vowels from three languages (North American English, French, and Mandarin Chinese). Statistical comparisons find several significant articulatory differences between North American English /i/ and Mandarin Chinese and French /i/. In particular, differences were found in constriction degree but not in constriction position. Articulatory synthesis is used to model the acoustic consequences of some of the significant articulatory differences, showing that these differences may make the /i/ of the latter two languages perceptually sharper by shifting F2 and F3 upward. In addition, the vowel /y/ has specific articulations that differ from those for /i/, including a wider tongue constriction, and substantially different acoustic sensitivity functions for F2 and F3.

17.
18.
A method is presented that accounts for differences in the acoustics of vowel production caused by human talkers' vocal-tract anatomies and postural settings. Such a method is needed by an analysis-by-synthesis procedure designed to recover midsagittal articulatory movement from speech acoustics, because that procedure employs an articulatory model as an internal model. The normalization procedure involves adjusting those parameters of the articulatory model that are not of interest for midsagittal movement recovery. These parameters are adjusted so that the acoustic signals produced by the human and by the articulatory model match as closely as possible over an initial set of pairs of corresponding human and model midsagittal shapes. These initial midsagittal shape correspondences then need to be generalized so that all midsagittal shapes of the human can be obtained from midsagittal shapes of the model. Once these procedures are complete, the midsagittal articulatory movement recovery algorithm can be used to derive model articulatory trajectories that can subsequently be transformed into human articulatory trajectories. This paper outlines the proposed normalization procedure and presents the results of experiments with data from two talkers in the X-ray Microbeam Speech Production Database. The proposed procedure made it possible to characterize these vocal tracts during vowel production and to generalize the initial midsagittal correspondences over a set of vowels to other vowels. The procedure was also found to aid midsagittal articulatory movement recovery from speech acoustics in vowel-to-vowel productions for the two subjects.
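As an illustration of adjusting parameters "not of interest" so that human and model acoustics match, the sketch below fits a single hypothetical uniform formant-scaling factor by least squares over matched shape pairs. This is a deliberate simplification (the actual procedure adjusts parameters of the articulatory model itself), and the formant values are invented.

```python
from typing import Sequence


def fit_formant_scale(human: Sequence[Sequence[float]], model: Sequence[Sequence[float]]) -> float:
    """Least-squares factor alpha minimizing sum (F_human - alpha * F_model)^2.

    Each element is one matched shape pair's formant list (e.g. F1-F3, Hz).
    Closed form: alpha = sum(F_h * F_m) / sum(F_m ** 2).
    """
    num = sum(h * m for hs, ms in zip(human, model) for h, m in zip(hs, ms))
    den = sum(m * m for ms in model for m in ms)
    return num / den


# Hypothetical matched pairs (F1-F3 in Hz) for two vowel shapes.
human_formants = [[550.0, 1650.0, 2750.0], [330.0, 2420.0, 3300.0]]
model_formants = [[500.0, 1500.0, 2500.0], [300.0, 2200.0, 3000.0]]
alpha = fit_formant_scale(human_formants, model_formants)
# Uniform-tube approximation: formants scale as 1 / length, so L_human ~ L_model / alpha.
```

A single factor cannot capture differences in pharynx-to-oral-cavity proportions, which is one reason a real normalization adjusts several model parameters (such as separate overall and pharyngeal scalings).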

19.
A dynamic model of articulatory movements is introduced. The research presented here focuses on how to represent the phonemic tasks, i.e., phoneme-specific articulatory targets. Phonemic tasks in the model are formally defined using invariant features of articulatory posture. These invariant features are defined as linear transformations of the articulatory variables and are found through a statistical analysis of measured articulatory movements, in which the articulatory features with minimum variability are taken to be the invariant features. Articulatory movements that form vocal-tract constrictions, and relative movements among articulators reflecting task-sharing structures, are typical examples of features found to have low variability. In the trajectory formation of articulatory movements, the dimensionality of each phonemic task is set smaller than that of the articulatory variables. Consequently, the kinematic states of the articulators are only partly constrained at given time instants by a sequence of phonemic tasks, leaving unconstrained degrees of freedom in the articulatory variables. Articulatory movements are determined so that they simultaneously satisfy the given phonemic tasks and dynamic smoothness constraints. The dynamic smoothness constraints, coupled with the underspecified phonemic targets, allow the model to explain contextual articulatory variability using context-independent phonemic tasks. Finally, the model's capability to predict actual articulatory movements is quantitatively investigated using empirical articulatory data.
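Finding a minimum-variability linear feature is an eigenvalue problem: the unit weight vector w that minimizes Var(w · x) is the eigenvector of the articulatory covariance matrix with the smallest eigenvalue. A minimal two-variable sketch, assuming hypothetical tongue-tip and jaw height samples; an analysis over more articulatory variables would need a general symmetric eigensolver instead of this closed form.

```python
import math
from typing import Sequence, Tuple


def covariance_2d(samples: Sequence[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Population covariance entries (var_x, cov_xy, var_y) of paired samples."""
    n = len(samples)
    mx = sum(s[0] for s in samples) / n
    my = sum(s[1] for s in samples) / n
    var_x = sum((s[0] - mx) ** 2 for s in samples) / n
    var_y = sum((s[1] - my) ** 2 for s in samples) / n
    cov_xy = sum((s[0] - mx) * (s[1] - my) for s in samples) / n
    return var_x, cov_xy, var_y


def least_variable_feature(samples: Sequence[Tuple[float, float]]) -> Tuple[Tuple[float, float], float]:
    """Unit weight vector w minimizing Var(w . x), and that minimum variance.

    For a 2x2 covariance [[a, b], [b, c]] this is the eigenvector of the
    smallest eigenvalue, (a + c) / 2 - sqrt(((a - c) / 2) ** 2 + b ** 2).
    """
    a, b, c = covariance_2d(samples)
    lam_min = (a + c) / 2.0 - math.hypot((a - c) / 2.0, b)
    if abs(b) > 1e-12:
        vx, vy = b, lam_min - a    # solves (a - lam) * vx + b * vy = 0
    elif a <= c:
        vx, vy = 1.0, 0.0          # uncorrelated: the first variable varies less
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    return (vx / norm, vy / norm), lam_min


# Hypothetical (tongue_tip_height, jaw_height) pairs in which the tip stays 1 mm above the jaw.
samples = [(j + 1.0, float(j)) for j in range(5)]
direction, variance = least_variable_feature(samples)
```

Here the recovered feature is proportional to tip height minus jaw height, i.e. a relative movement between articulators, which is the kind of low-variability feature the abstract describes.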

20.
Vowel durations typically vary according to both intrinsic (segment-specific) and extrinsic (contextual) specifications. It can be argued that such variations are due to both predisposition and cognitive learning. The present report utilizes acoustic phonetic measurements from Swedish and American children aged 24 and 30 months to investigate the hypothesis that default behaviors may precede language-specific learning effects. The predicted pattern is the presence of final consonant voicing effects in both languages as a default, and subsequent learning of intrinsic effects most notably in the Swedish children. The data, from 443 monosyllabic tokens containing high-front vowels and final stop consonants, are analyzed in statistical frameworks at group and individual levels. The results confirm that Swedish children show an early tendency to vary vowel durations according to final consonant voicing, followed only six months later by a stage at which the intrinsic influence of vowel identity grows relatively more robust. Measures of vowel formant structure from selected 30-month-old children also revealed a tendency for children of this age to focus on particular acoustic contrasts. In conclusion, the results indicate that early acquisition of vowel specifications involves an interaction between language-specific features and articulatory predispositions associated with phonetic context.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Filing No. 09084417