Similar Documents
20 similar documents found (search time: 31 ms)
1.
OBJECTIVE: To assess whether magnetic resonance imaging (MRI) allows the vocal tract (VT) area function to be determined for a normal male speaker. METHOD: VT shapes were acquired using MRI during sustained production of the French point vowels /i/, /a/, and /u/. Cross-sectional areas were measured from a series of planes spaced at 1 cm intervals along the length of the VT and were used as input to a previously described VT model to simulate the vowels. The first three formant frequencies (F1, F2, and F3) computed with the VT model from the MRI measurements were compared with the subject's natural formant frequencies. RESULTS: Including the piriform sinuses, calculated formants differed from measured F1, F2, and F3, respectively, for /i/ by -3.5%, +7.7%, and +27.5%; for /a/ by +11%, +19.5%, and -4.3%; and for /u/ by +0.9%, +23.4%, and +9.6%. Excluding the piriform sinuses, calculated formants differed from measured F1, F2, and F3, respectively, for /i/ by -3.5%, +12%, and +28%, and for /u/ by +10.1%, +26.8%, and +13.7%. The piriform sinuses were not visualized on MRI for /a/. CONCLUSIONS: MRI is a noninvasive technique that allows VT imaging and determination of the VT area function for a normal male speaker. Possible sources of discrepancy include variability of the articulation, difficulty in assessing VT wall boundaries, the role of the piriform sinuses, and VT length.
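The abstract feeds 1 cm area sections into a "previously described VT model" that it does not specify. As a stand-in, here is a minimal sketch that assumes a lossless concatenated-tube (chain-matrix) model with a closed glottis, an ideal open end at the lips, no wall losses, no radiation load, and no piriform side branches; the speed of sound c = 35000 cm/s is an assumed value. Formants are taken as the frequencies where the lower-right chain-matrix entry vanishes, and `percent_diff` reports signed differences in the same form as the results above.

```python
import math


def tube_formants(areas_cm2, section_len_cm=1.0, c=35000.0, fmax=5000.0, df=1.0, n_formants=3):
    """Resonances of a lossless concatenated-tube model (closed glottis, ideal open lips)."""

    def k22(f):
        # Lower-right chain-matrix entry; with zero lip pressure, U_glottis = k22 * U_lips,
        # so the formants are the zeros of k22.
        kl = 2.0 * math.pi * f / c * section_len_cm  # phase across one section
        cs, sn = math.cos(kl), math.sin(kl)
        # (a, b, cc, d) stands for [[a, j*b], [j*cc, d]]; rho*c is set to 1 since it cancels in d.
        a, b, cc, d = 1.0, 0.0, 0.0, 1.0
        for area in areas_cm2:  # multiply section matrices from glottis to lips
            a2, b2, c2, d2 = cs, sn / area, area * sn, cs
            a, b, cc, d = a * a2 - b * c2, a * b2 + b * d2, cc * a2 + d * c2, d * d2 - cc * b2
        return d

    formants = []
    f_lo, v_lo = df, k22(df)
    while f_lo < fmax and len(formants) < n_formants:
        f_hi = f_lo + df
        v_hi = k22(f_hi)
        if v_lo * v_hi < 0:  # sign change brackets a zero; refine it by bisection
            lo, hi = f_lo, f_hi
            for _ in range(50):
                mid = 0.5 * (lo + hi)
                if k22(lo) * k22(mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            formants.append(0.5 * (lo + hi))
        f_lo, v_lo = f_hi, v_hi
    return formants


def percent_diff(calculated, measured):
    """Signed percent difference of a calculated formant relative to the measured one."""
    return 100.0 * (calculated - measured) / measured
```

For a uniform tube of length L this reduces to the quarter-wavelength resonances (2n - 1)c/(4L), so tube length scales every resonance directly, which is one reason VT length appears among the listed sources of discrepancy.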

2.
The many-to-one mapping from the speech articulatory space to the acoustic space makes the associated acoustic-to-articulatory inverse mapping non-unique. Among various techniques, imposing smoothness constraints on the articulator trajectories is a common approach to handling this non-uniqueness in the acoustic-to-articulatory inversion problem, because articulators typically move smoothly during speech production. A standard smoothness constraint minimizes the energy of the difference of the articulatory position sequence, so that the articulator trajectory is smooth and low-pass in nature. Such a fixed definition of smoothness is not always realistic or adequate for all articulators, because different articulators have different degrees of smoothness. In this paper, an optimization formulation with a generalized smoothness criterion is proposed for the inversion problem. Under this generalized criterion, the smoothness parameter can be chosen for each articulator in a data-driven fashion. In addition, the formulation allows articulatory positions to be estimated recursively over time without any loss in performance. Experiments with the MOCHA TIMIT database show that articulator trajectories estimated with the generalized smoothness criterion have lower RMS error and higher correlation with the measured trajectories than those obtained with a fixed smoothness constraint.
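A minimal sketch of the fixed smoothness constraint the abstract starts from, with one weight per articulator standing in for articulator-specific smoothness. It assumes noisy per-frame position estimates `y` are already available (for example from a codebook lookup) and solves min ||x - y||² + λ||Dx||², with D the first-difference operator. The paper's generalized criterion and its recursive estimator are not reproduced, and any λ values are hypothetical.

```python
import numpy as np


def smooth_trajectory(y, lam):
    """Minimize ||x - y||^2 + lam * ||D x||^2, where D takes first differences."""
    y = np.asarray(y, dtype=float)
    n = y.size
    d = np.diff(np.eye(n), axis=0)  # (n-1) x n first-difference matrix
    return np.linalg.solve(np.eye(n) + lam * d.T @ d, y)


def smooth_articulators(frames, lams):
    """frames: (T, K) per-frame estimates; lams: one smoothness weight per articulator."""
    frames = np.asarray(frames, dtype=float)
    return np.column_stack(
        [smooth_trajectory(frames[:, k], lams[k]) for k in range(frames.shape[1])]
    )
```

In the spirit of the data-driven choice described above, each λ could be tuned per articulator against measured trajectories, for instance by minimizing RMS error on held-out data.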

3.
Finding the control parameters of an articulatory model that produce given acoustics is an important problem in speech research. However, one should also be able to derive the same parameters from measured articulatory data. This paper presents a method for estimating the control parameters of Maeda's model from electromagnetic articulography (EMA) data, which allows full sagittal vocal tract slices to be derived from sparse flesh-point information. First, the articulatory grid system used in the model's definition is adapted to the speaker in the experiment, and the EMA data are registered to it automatically. Then, articulatory variables corresponding to the measurements Maeda defined on the grid are extracted. An initial solution for the articulatory control parameters is found by a least-squares method, under constraints that ensure natural vocal tract shapes. Dynamic smoothness of the parameter trajectories is then imposed by a variational regularization method. Generated vocal tract slices for vowels are compared with slices from magnetic resonance images of the same speaker or from the literature. Formants synthesized from these generated slices are adequately close to those tracked in real speech recorded concurrently with EMA.
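The constrained least-squares step can be sketched as a box-constrained problem solved by projected gradient descent. This assumes a linear relation between the control parameters and the grid measurements (matrix `a`, offset already subtracted from `b`) and takes the naturalness constraints to be simple bounds of ±3, an assumed range in the standard-deviation units Maeda's parameters are usually expressed in. The paper's actual constraints and solver are not specified in the abstract.

```python
import numpy as np


def bounded_lsq(a, b, lo=-3.0, hi=3.0, iters=20000):
    """Solve min ||a p - b||^2 subject to lo <= p <= hi by projected gradient descent."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Step 1/L, where L = sigma_max(a)^2 is the Lipschitz constant of grad 0.5*||a p - b||^2.
    step = 1.0 / np.linalg.norm(a, 2) ** 2
    p = np.zeros(a.shape[1])
    for _ in range(iters):
        grad = a.T @ (a @ p - b)
        p = np.clip(p - step * grad, lo, hi)  # gradient step, then project onto the box
    return p
```

Per-frame solutions from this step would then be passed to a temporal smoother, which the paper performs by variational regularization.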

4.
Vocal tract shaping patterns based on articulatory fleshpoint data from four speakers in the University of Wisconsin x-ray microbeam (XRMB) database [J. Westbury, UW-Madison, (1994)] were determined with a principal component analysis (PCA). Midsagittal cross-distance functions representative of approximately the front 6 cm of the oral cavity for each of 11 vowels and vowel-vowel (VV) sequences were obtained from the pellet positions and the hard palate profile for the four speakers. A PCA was independently performed on each speaker's set of cross-distance functions representing static vowels only, and again with time-dependent cross-distance functions representing vowels and VV sequences. In all cases, results indicated that the first two orthogonal components (referred to as modes) accounted for more than 97% of the variance in each speaker's set of cross-distance functions. In addition, the shape of each mode was shown to be similar across the speakers, suggesting that the modes represent common patterns of vocal tract deformation. Plots of the resulting time-dependent coefficient records showed that the four speakers activated each mode similarly during production of the vowel sequences. Finally, a procedure is described for using the time-dependent mode coefficients obtained from the XRMB data as input for an area function model of the vocal tract.
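The mode analysis can be sketched with a singular value decomposition of the mean-removed cross-distance functions, assuming each row is one cross-distance function sampled at the same points along the oral cavity. The returned variance fractions correspond to the "variance accounted for" figures above, and the coefficients are the per-sample mode weights.

```python
import numpy as np


def cross_distance_pca(functions, n_modes=2):
    """PCA of cross-distance functions, one function per row.

    Returns the mean function, the first n_modes modes, the per-row mode
    coefficients, and the fraction of variance explained by every component.
    """
    x = np.asarray(functions, dtype=float)
    mean = x.mean(axis=0)
    _, s, vt = np.linalg.svd(x - mean, full_matrices=False)
    var_frac = s ** 2 / np.sum(s ** 2)
    modes = vt[:n_modes]              # orthonormal mode shapes
    coeffs = (x - mean) @ modes.T     # projection of each function onto the modes
    return mean, modes, coeffs, var_frac
```

Projecting each time frame of a VV sequence onto the same modes yields time-dependent coefficient records, and `mean + coeffs @ modes` rebuilds the cross-distance functions that would be handed to an area function model.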

5.
6.
A method is presented for synthesizing vocal-tract spectra from phoneme sequences by mimicking the human speech production process. The model consists of four main processes and is characterized in particular by an adaptive formation of articulatory movements. First, the model determines when each phoneme is articulated. Next, it generates the articulatory constraints that must be met to produce each phoneme, and then it generates articulatory trajectories that satisfy those constraints. Finally, the time sequence of spectra is estimated from the generated articulatory trajectories. The articulatory constraint of each phoneme does not change with phonemic context, but the contextual variability of speech is reproduced by the dynamic articulatory model. The accuracy of the synthesis model was evaluated using data from simultaneous measurements of speech and articulatory movements: the accuracy of the phonemic timing estimates was measured, and the synthesized results were compared with the measured results. Experimental results showed that the model captured the contextual variability of both the articulatory movements and the speech acoustics.

7.
The effects of vowels on voice perturbation measures   Total citations: 1, self-citations: 0, citations by others: 1
This study examines voice perturbation parameters of sustained [a] in English and of the eight Turkish vowels, to determine whether any difference exists between the two languages and whether voice perturbation parameters correlate with the articulatory and acoustic properties of the Turkish vowels. The eight Turkish vowels, uttered by 26 healthy, nonsmoking male volunteers who are native speakers of Turkish, were compared with a voice database containing samples of normal and disordered voices of American English speakers. Fundamental frequencies, the first and second formants, and perturbation parameters such as jitter percent, pitch perturbation quotient, shimmer percent, and amplitude perturbation quotient of the sustained vowels were measured. The first and second formants of sustained English [a] were also measured, and the other parameters were obtained from the database. No statistically significant differences were found when the voice perturbation parameters of Turkish and English were compared. However, when the Turkish vowels were compared with each other, statistically significant differences in perturbation values were found. Categorical comparisons of the Turkish vowels, such as high-low, rounded-unrounded, and front-back, revealed significant differences in perturbation values. Correlation analysis showed a weak inverse linear relation between jitter percent and the first formant (r = -0.260, p < 0.05).
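The perturbation measures named above can be sketched with their common local definitions: jitter or shimmer percent as the mean absolute difference between consecutive periods or peak amplitudes divided by the mean, and a perturbation quotient as the mean deviation from a centred moving average (5 points for PPQ, 11 for APQ in MDVP-style analyses). The abstract does not state the software or its exact settings, so these formulas are an assumption; `pearson_r` is the plain correlation coefficient used for the jitter-F1 relation.

```python
import math


def local_perturbation_percent(values):
    """Mean absolute consecutive difference divided by the mean value, times 100."""
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    return 100.0 * (sum(diffs) / len(diffs)) / (sum(values) / len(values))


def jitter_percent(periods):
    return local_perturbation_percent(periods)


def shimmer_percent(amplitudes):
    return local_perturbation_percent(amplitudes)


def perturbation_quotient(values, width=5):
    """Mean deviation from a centred width-point average, relative to the mean, times 100."""
    half = width // 2
    devs = [
        abs(values[i] - sum(values[i - half:i + half + 1]) / width)
        for i in range(half, len(values) - half)
    ]
    return 100.0 * (sum(devs) / len(devs)) / (sum(values) / len(values))


def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Calling `perturbation_quotient(periods, 5)` gives a PPQ-style value and `perturbation_quotient(amplitudes, 11)` an APQ-style value under these assumed definitions.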

8.
An automatic speech recognition approach is presented that uses articulatory features estimated by subject-independent acoustic-to-articulatory inversion. The inversion allows articulatory features to be estimated from any talker's speech acoustics using only an exemplar subject's articulatory-to-acoustic map. Results are reported for a broad-class phonetic classification experiment on speech from English talkers, using data from three distinct English talkers as exemplars for inversion. The results indicate that including articulatory information improves classification accuracy, but the improvement is more significant when the speaking styles of the exemplar and the talker are matched than when they are mismatched.
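The abstract does not describe how the exemplar's articulatory-to-acoustic map is inverted, so the sketch below substitutes the simplest option: a nearest-neighbour lookup in a hypothetical exemplar codebook of paired acoustic and articulatory vectors, followed by concatenating the estimates onto the acoustic features for a classifier. Talker normalization, which a subject-independent method would need, is not modelled.

```python
import numpy as np


def invert_frames(acoustic_frames, codebook_acoustic, codebook_articulatory):
    """Map each acoustic frame to the articulatory vector of its nearest codebook entry."""
    a = np.asarray(acoustic_frames, dtype=float)
    ca = np.asarray(codebook_acoustic, dtype=float)
    d2 = ((a[:, None, :] - ca[None, :, :]) ** 2).sum(axis=2)  # squared Euclidean distances
    return np.asarray(codebook_articulatory, dtype=float)[d2.argmin(axis=1)]


def augmented_features(acoustic_frames, codebook_acoustic, codebook_articulatory):
    """Acoustic features with the estimated articulatory features appended."""
    art = invert_frames(acoustic_frames, codebook_acoustic, codebook_articulatory)
    return np.hstack([np.asarray(acoustic_frames, dtype=float), art])
```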

9.
This study investigates the use of constraints on articulatory parameters in the context of acoustic-to-articulatory inversion. These speaker-independent constraints, referred to as phonetic constraints, were derived from standard phonetic knowledge of French vowels and express authorized domains for one or several articulatory parameters. They were tested in an existing inversion framework that uses Maeda's articulatory model and a hypercubic articulatory-acoustic table. The phonetic constraints yield a phonetic score that reflects the phonetic consistency of the vocal tract shapes recovered by inversion. Inversion was applied to vowels articulated by a speaker for whom corresponding x-ray images are also available. The constraints were evaluated by measuring the distance between the vocal tract shapes recovered through inversion and the real vocal tract shapes obtained from the x-ray images, by investigating the spread of inverse solutions in terms of place of articulation and constriction degree, and finally by studying the articulatory variability. Results show that these constraints capture interdependencies and synergies between speech articulators and favor vocal tract shapes close to those realized by the human speaker. In addition, this study shows how acoustic-to-articulatory inversion can be used to explore the acoustic and compensatory articulatory properties of an articulatory model.
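One way to turn "authorized domains" into a score is sketched below, assuming a candidate solution is a dict of named articulatory parameters and the score is simply the fraction of constrained parameters that fall inside their domain. The parameter names and ranges are hypothetical placeholders, and the paper's actual scoring formula is not given in the abstract.

```python
def phonetic_score(params, domains):
    """Fraction of constrained parameters lying inside their authorized [low, high] domain."""
    inside = [lo <= params[name] <= hi for name, (lo, hi) in domains.items()]
    return sum(inside) / len(inside)


def rank_solutions(candidates, domains):
    """Order candidate inverse solutions from most to least phonetically consistent."""
    return sorted(candidates, key=lambda p: phonetic_score(p, domains), reverse=True)


# Hypothetical domains for one vowel; names and ranges are placeholders, not Maeda's values.
EXAMPLE_DOMAINS = {"jaw": (-1.0, 1.0), "lip_protrusion": (0.5, 3.0)}
```

Because Python's sort is stable, candidates with equal scores keep their original order (for example, their acoustic-distance order from the table lookup).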

10.
Three-dimensional vocal tract shapes and the resulting area functions for the vowels [i, ae, a, u] were obtained from one male and one female speaker using magnetic resonance imaging (MRI). Both speakers were trained vocal performers adept at manipulating vocal tract shape to alter voice quality. Each vowel was produced three times, once with each of three voice qualities: normal, yawny, and twangy. The purpose of the study was to identify some of the ways in which vocal tract shape can be manipulated to alter voice quality while retaining a desired phonetic quality. To summarize overall tract-shaping tendencies, mean area functions were computed across the four vowels produced with each voice quality. Relative to normal speech, both the individual vowel area functions and the mean area functions generally showed that the oral cavity is widened and the tract lengthened for the yawny productions. The twangy vowels were characterized by a shortened tract, a widened lip opening, and a slightly constricted oral cavity. Acoustically, these articulatory alterations brought the first two formants (F1 and F2) close together for all yawny vowels and far apart for all twangy vowels.

11.
To investigate the group characteristics of Putonghua monophthong formants, tokens from 90 female students were surveyed. The formants were measured using the LPC method. The mean values and spread of the formant frequencies are given as statistical summaries. The results differ from earlier measurements made by other researchers decades ago. For all monophthongs, F4/F3 and F5/F4 are generally around 1.4. For discriminating monophthongs, F2/F1 and F3/F2 may serve as two new parameters in addition to the first three formants.
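LPC formant measurement can be sketched as autocorrelation, Levinson-Durbin recursion, and root-solving of the predictor polynomial. The abstract gives no analysis settings, so this assumes the caller has already applied any pre-emphasis and windowing, and the order, bandwidth cut-off, and minimum frequency are illustrative defaults. Pole angles give formant frequencies and pole radii give bandwidths.

```python
import numpy as np


def levinson(r, order):
    """Levinson-Durbin recursion: predictor polynomial a (a[0] = 1) from r[0..order]."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                            # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]       # update earlier coefficients
        a[i] = k
        err *= 1.0 - k * k
    return a


def lpc_formants(x, fs, order, max_bw=400.0, min_freq=90.0):
    """Formant frequencies (Hz) from the roots of an autocorrelation-method LPC polynomial."""
    x = np.asarray(x, dtype=float)
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    roots = np.roots(levinson(r, order))
    roots = roots[np.imag(roots) > 0]             # keep one root per conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    bws = -np.log(np.abs(roots)) * fs / np.pi
    keep = (freqs > min_freq) & (bws < max_bw)
    return np.sort(freqs[keep])
```

Given estimated F1 to F5, the ratios discussed above follow directly, for example `f[1] / f[0]` for F2/F1.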

12.
Speech production parameters of three postlingually deafened adults who use cochlear implants were measured: after 24 h of auditory deprivation (which was achieved by turning the subject's speech processor off); after turning the speech processor back on; and after turning the speech processor off again. The measured parameters included vowel acoustics [F1, F2, F0, sound-pressure level (SPL), duration and H1-H2, the amplitude difference between the first two spectral harmonics, a correlate of breathiness] while reading word lists, and average airflow during the reading of passages. Changes in speech processor state (on-to-off or vice versa) were accompanied by numerous changes in speech production parameters. Many changes were in the direction of normalcy, and most were consistent with long-term speech production changes in the same subjects following activation of the processors of their cochlear implants [Perkell et al., J. Acoust. Soc. Am. 91, 2961-2978 (1992)]. Changes in mean airflow were always accompanied by H1-H2 (breathiness) changes in the same direction, probably due to underlying changes in laryngeal posture. Some parameters (different combinations of SPL, F0, H1-H2 and formants for different subjects) showed very rapid changes when turning the speech processor on or off. Parameter changes were faster and more pronounced, however, when the speech processor was turned on than when it was turned off. The picture that emerges from the present study is consistent with a dual role for auditory feedback in speech production: long-term calibration of articulatory parameters as well as feedback mechanisms with relatively short time constants.
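H1-H2 as defined above (the amplitude difference between the first two harmonics) can be sketched from a Hann-windowed magnitude spectrum, assuming F0 is already known and taking each harmonic's amplitude as the largest spectral peak within ±10% of its expected frequency. No formant correction is applied, and the search width is an illustrative choice.

```python
import numpy as np


def h1_minus_h2(x, fs, f0, search=0.1):
    """H1-H2 in dB: spectral peak near f0 minus spectral peak near 2*f0."""
    x = np.asarray(x, dtype=float)
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)

    def peak(f):
        band = (freqs >= f * (1 - search)) & (freqs <= f * (1 + search))
        return spec[band].max()

    return 20.0 * np.log10(peak(f0) / peak(2 * f0))
```

A larger H1-H2 indicates a relatively stronger first harmonic, which is why it is used as a correlate of breathiness.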

13.
We introduce two NMR inversion methods within the framework of 1D NMR that extract fluid saturations by varying echo spacing and wait time. The first method connects the T2 distribution of each fluid with the overall apparent T2 distribution through a shift matrix. Each fluid's saturation and T2 distribution are extracted by minimizing the difference between the model T2 distributions and the measured apparent T2 distributions. The second method relates a model T2 distribution of each fluid to the CPMG echo trains through a global evolution matrix that governs the evolution of magnetization under T1 and T2 relaxation and diffusion. These methods will be useful whenever the data are not sufficient for 2D NMR inversion, and they are much faster than 2D methods for fluid typing. We also point out an inherent limitation of NMR inversion methods for fluid typing. Whenever the inversion matrix is singular because the model functions of different fluids behave similarly, most inversion algorithms remove the solution space associated with the singularity and choose the minimum-length solution vector. This results in equal proportions of the different fluids in the final answer. If prior knowledge, such as the saturation or T2 shape of the oil, is available, there are several ways to tailor the solution to the desired outcome. Without prior knowledge, however, this ambiguity always exists regardless of the inversion scheme.
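The minimum-length limitation can be reproduced with a small evolution-matrix sketch. It assumes each fluid is a single T1/T2/diffusion component (the paper uses full T2 distributions), uses the standard unrestricted-diffusion CPMG attenuation term (γ·G·TE)²·D/12, and takes a hypothetical gradient of 0.2 T/m; all fluid values are illustrative. When two fluids have identical model functions, their columns coincide and a minimum-norm least-squares solver splits the signal equally.

```python
import numpy as np

GAMMA = 2.675e8  # proton gyromagnetic ratio, rad s^-1 T^-1


def echo_kernel(te, tw, n_echoes, fluid, grad=0.2):
    """Echo amplitudes for a unit amount of one fluid (simplified, hypothetical CPMG model)."""
    t = te * np.arange(1, n_echoes + 1)                    # echo times, s
    polarization = 1.0 - np.exp(-tw / fluid["t1"])         # incomplete T1 recovery after wait tw
    rate = 1.0 / fluid["t2"] + (GAMMA * grad * te) ** 2 * fluid["diff"] / 12.0
    return polarization * np.exp(-t * rate)


def evolution_matrix(acquisitions, fluids, grad=0.2):
    """One column per fluid, stacking echo trains from every (te, tw, n_echoes) acquisition."""
    return np.column_stack([
        np.concatenate([echo_kernel(te, tw, n, f, grad) for te, tw, n in acquisitions])
        for f in fluids
    ])
```

With identical columns, `np.linalg.lstsq(k, data, rcond=None)[0]` returns the minimum-norm solution, which assigns equal amounts to both fluids whatever the true saturations were; distinct T1, T2, or diffusion behaviour is what lets the two columns be separated.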

14.
A method is presented that accounts for differences in the acoustics of vowel production caused by human talkers' vocal-tract anatomies and postural settings. Such a method is needed by an analysis-by-synthesis procedure designed to recover midsagittal articulatory movement from speech acoustics, because the procedure employs an articulatory model as an internal model. The normalization procedure involves adjusting those parameters of the articulatory model that are not of interest for the midsagittal movement recovery procedure. These parameters are adjusted so that the acoustic signals produced by the human and the articulatory model match as closely as possible over an initial set of pairs of corresponding human and model midsagittal shapes. Further, these initial midsagittal shape correspondences need to be generalized so that all midsagittal shapes of the human can be obtained from midsagittal shapes of the model. Once these procedures are complete, the midsagittal articulatory movement recovery algorithm can be used to derive model articulatory trajectories that can subsequently be transformed into human articulatory trajectories. In this paper the proposed normalization procedure is outlined, and the results of experiments with data from two talkers in the X-ray Microbeam Speech Production Database are presented. It was found to be possible to characterize these vocal tracts during vowel production with the proposed procedure and to generalize the initial midsagittal correspondences over a set of vowels to other vowels. The procedure was also found to aid midsagittal articulatory movement recovery from speech acoustics in a vowel-to-vowel production for the two subjects.
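The abstract does not list which model parameters are adjusted during normalization. As a deliberately reduced illustration, the sketch below adjusts only one: a uniform scale factor s that best maps model formants onto a talker's formants in the least-squares sense on a log-frequency scale. Because a uniform tube's resonances are inversely proportional to its length, s also approximates the ratio of model tract length to human tract length.

```python
import math


def length_scale_factor(model_formants, human_formants):
    """Least-squares s (in log frequency) such that human_formants ≈ s * model_formants."""
    logs = [math.log(h / m) for m, h in zip(model_formants, human_formants)]
    return math.exp(sum(logs) / len(logs))


def implied_tract_length(model_length_cm, scale):
    """Uniform-tube approximation: formants scale as 1/length."""
    return model_length_cm / scale
```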

15.
A simple method is described for obtaining useful information on laser-induced dissociation processes of polyatomic molecules by studying VT relaxation in a regime of strong laser excitation. Our studies were carried out using an interferometric technique. We also briefly outline a theoretical analysis of VT relaxation times in the presence of dissociation.

16.
The method described here predicts the trajectories of articulatory movements for continuous speech by using a kinematic triphone model and the minimum-acceleration model. The kinematic triphone model, which is constructed from articulatory data obtained from experiments using an electro-magnetic articulographic system, is characterized by three kinematic features of a triphone and by the intervals between two successive phonemes in the triphone. After a kinematic feature of a phoneme in a given sentence is extracted, the minimum-acceleration trajectory that coincides with the extremum of the time integral of the squared magnitude of the articulator acceleration is formulated. The calculation of the minimum acceleration requires only linear computation. The method predicts both the qualitative features and the quantitative details of experimentally observed articulation.
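Assuming each phoneme contributes a target position at a known time (a via point) and the end accelerations are left free, the trajectory minimizing the time integral of squared acceleration through those points is the natural cubic spline, which is why a single linear (tridiagonal) solve suffices. The sketch below builds that spline for one articulator coordinate; how the triphone model supplies the targets and timings is not reproduced.

```python
import numpy as np


def min_accel_trajectory(t_knots, y_knots, t_eval):
    """Natural cubic spline through (t_knots, y_knots), evaluated at t_eval."""
    t = np.asarray(t_knots, dtype=float)
    y = np.asarray(y_knots, dtype=float)
    n = len(t) - 1
    h = np.diff(t)
    # Linear system for the accelerations m at the knots; zero end accelerations (natural spline).
    a = np.zeros((n + 1, n + 1))
    rhs = np.zeros(n + 1)
    a[0, 0] = a[n, n] = 1.0
    for i in range(1, n):
        a[i, i - 1] = h[i - 1]
        a[i, i] = 2.0 * (h[i - 1] + h[i])
        a[i, i + 1] = h[i]
        rhs[i] = 6.0 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1])
    m = np.linalg.solve(a, rhs)
    te = np.asarray(t_eval, dtype=float)
    idx = np.clip(np.searchsorted(t, te, side="right") - 1, 0, n - 1)  # interval of each point
    t0, t1, hi = t[idx], t[idx + 1], h[idx]
    return (m[idx] * (t1 - te) ** 3 / (6 * hi) + m[idx + 1] * (te - t0) ** 3 / (6 * hi)
            + (y[idx] / hi - m[idx] * hi / 6) * (t1 - te)
            + (y[idx + 1] / hi - m[idx + 1] * hi / 6) * (te - t0))
```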

17.
An application of functional data analysis (FDA) (Ramsay and Silverman, 2005, Functional Data Analysis, 2nd ed. (Springer-Verlag, New York)) to linguistic experimentation is explored. The functional time-registration method provided by FDA is shown to offer novel advantages in the investigation of articulatory timing. Traditionally, articulatory studies examining the effects of linguistic variables such as prosody on articulatory timing have relied on comparing the durations of speech intervals of interest defined by kinematic landmarks. Such measurements, however, do not preserve information on the detailed, continuous pattern of articulatory timing that unfolds during these intervals. We present an approach that allows the analysis of entire, continuous kinematic trajectories obtained in a movement tracking experiment examining the influence of a phrasal boundary on articulatory patterning. FDA time deformation functions, after alignment of test and reference (control) signals, reveal a delay of articulator movement (i.e., a slowing of the internal clock rate) in the presence of a phrase boundary as the speech stream recedes from the boundary. This is a theoretically predicted pattern (Byrd and Saltzman, 2003, The elastic phrase: Modeling the dynamics of boundary-adjacent lengthening, Journal of Phonetics 31, 149-180), which would be more difficult to validate with a traditional interval-based approach. It is concluded that the FDA time alignment method provides a useful tool for characterizing timing patterns in linguistic experimentation based on continuous kinematic trajectories.
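The study estimates smooth FDA time deformation functions; the sketch below swaps in the simplest registration scheme, landmark registration with a piecewise-linear warping function h that maps reference time to test time through matched kinematic landmarks (landmark times are hypothetical inputs). The slope of h plays the role of the clock-rate reading: dh/dt > 1 means the test token spends more time than the reference over that stretch.

```python
import numpy as np


def landmark_warp(t_ref, ref_landmarks, test_landmarks):
    """Piecewise-linear warping function h: reference time -> test time."""
    return np.interp(t_ref, ref_landmarks, test_landmarks)


def register(t_ref, t_test, x_test, ref_landmarks, test_landmarks):
    """Resample the test trajectory on the reference clock, x_test(h(t))."""
    h = landmark_warp(t_ref, ref_landmarks, test_landmarks)
    return h, np.interp(h, t_test, x_test)
```

After registration, `np.gradient(h, t_ref)` gives the local rate; values above 1 near a boundary would correspond to the slowing described above.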

18.
An original three-dimensional (3D) linear articulatory model of the velum and nasopharyngeal wall has been developed from magnetic resonance imaging (MRI) and computed tomography images of a French subject sustaining a set of 46 articulations, covering his articulatory repertoire. The velum and nasopharyngeal wall are represented by generic surface triangular meshes fitted to the 3D contours extracted from MRI for each articulation. Two degrees of freedom were uncovered by principal component analysis: first, VL accounts for 83% of the velum variance, corresponding to an oblique vertical movement seemingly related to the levator veli palatini muscle; second, VS explains another 6% of the velum variance, controlling a mostly horizontal movement possibly related to the sphincter action of the superior pharyngeal constrictor. The nasopharyngeal wall is also controlled by VL for 47% of its variance. Electromagnetic articulographic data recorded on the velum fitted these parameters exactly, and may serve to recover dynamic velum 3D shapes. The main oral and nasopharyngeal area functions controlled by the articulatory model, complemented by the area functions derived from the complex geometry of each nasal passage extracted from coronal MRIs, were fed to an acoustic model and gave promising results about the influence of velum movements on the spectral characteristics of nasals.
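A linear articulatory model of this kind expresses every mesh vertex as a mean position plus a weighted sum of component displacements, here two components standing for VL and VS. The sketch below assumes that form and recovers the two parameters from a few tracked vertices (stand-ins for EMA coils) by ordinary least squares, then rebuilds the full 3D mesh. The mesh, components, and coil vertex indices are hypothetical, and the paper's actual fitting procedure is not detailed in the abstract.

```python
import numpy as np


def synthesize(mean, components, params):
    """Vertex coordinates (n_vertices, 3) = mean + sum_k params[k] * components[k]."""
    return mean + np.tensordot(params, components, axes=1)


def fit_from_coils(mean, components, coil_vertices, coil_xyz):
    """Least-squares model parameters from the 3D positions of a few tracked vertices."""
    n_params = len(components)
    a = components[:, coil_vertices, :].reshape(n_params, -1).T   # (3 * n_coils, n_params)
    b = (np.asarray(coil_xyz, dtype=float) - mean[coil_vertices]).ravel()
    return np.linalg.lstsq(a, b, rcond=None)[0]
```

With two parameters and three coordinates per coil, a single well-placed coil already gives an overdetermined system; more coils make the estimate less sensitive to measurement noise.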

19.
This paper presents an efficient three-dimensional nonlinear electromagnetic inversion method in a multilayered medium for radar applications in which the object size is comparable to the wavelength. In the first step of this two-step inversion algorithm, the diagonal tensor approximation is used in the Born iterative method. The solution of this approximate inversion is used as the initial guess for the second step, in which further inversion is carried out with a distorted Born iterative method. Since the aim of the second step is to improve the accuracy of the inversion, a full-wave solver, the stabilized biconjugate-gradient fast Fourier transform algorithm, is used for forward modelling. The conjugate-gradient method is applied at each inversion iteration to minimize the cost functional. An iterative solver based on the FFT algorithm, together with the developed recursive matrix method combined with an interpolation technique for rapidly evaluating the layered-medium Green's functions, makes this method highly efficient. An inversion problem with 32 768 complex unknowns can be solved with 1% relative error on a simple personal computer. Several numerical experiments with arbitrarily located source and receiver arrays are presented to show the high efficiency and accuracy of the proposed method.
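The conjugate-gradient step can be sketched for one linearized (Born-type) update, under the assumption that the update reduces to a Tikhonov-regularized least-squares problem ||A x - b||² + λ||x||² with complex unknowns. CG is then run on the Hermitian positive definite normal equations (AᴴA + λI)x = Aᴴb. The dense matrix `a` is a stand-in; the paper's cost functional, FFT-accelerated operators, and layered-medium Green's functions are not reproduced.

```python
import numpy as np


def cg_normal_equations(a, b, lam, tol=1e-10, max_iter=500):
    """Conjugate gradients for (A^H A + lam I) x = A^H b with complex A and b."""

    def apply(v):
        return a.conj().T @ (a @ v) + lam * v

    rhs = a.conj().T @ b
    x = np.zeros(a.shape[1], dtype=complex)
    r = rhs.copy()                  # residual for the initial guess x = 0
    p = r.copy()
    rs = np.vdot(r, r).real
    stop = tol * np.sqrt(rs)        # relative stopping threshold
    for _ in range(max_iter):
        if np.sqrt(rs) <= stop:
            break
        mp = apply(p)
        alpha = rs / np.vdot(p, mp).real   # p^H M p is real for Hermitian M
        x += alpha * p
        r -= alpha * mp
        rs_new = np.vdot(r, r).real
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x
```

In a matrix-free setting only `apply` would change, which is where FFT-based operators would plug in.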

20.
A 3D cine-MRI technique was developed based on a synchronized sampling method [Masaki et al., J. Acoust. Soc. Jpn. E 20, 375-379 (1999)] to measure the temporal changes in the vocal tract area function during a short utterance /aiueo/ in Japanese. A time series of head-neck volumes was obtained after 640 repetitions of the utterance produced by a male speaker, from which area functions were extracted frame-by-frame. A region-based analysis showed that the volumes of the front and back cavities tend to change reciprocally and that the areas near the larynx and posterior edge of the hard palate were almost constant throughout the utterance. The lower four formants were calculated from all the area functions and compared with those of natural speech sounds. The mean absolute percent error between calculated and measured formants among all the frames was 4.5%. The comparison of vocal tract shapes for the five vowels with those from the static MRI method suggested a problem of MRI observation of the vocal tract: data from static MRI tend to result in a deviation from natural vocal tract geometry because of the gravity effect.
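The two summary statistics above can be sketched as follows, assuming formants are arranged as a (frames × formants) array and the percent error is taken relative to the measured value (the abstract does not state the denominator). A negative correlation between front- and back-cavity volume series is one simple way to quantify the "reciprocal" change.

```python
import numpy as np


def mean_abs_percent_error(calculated, measured):
    """Mean of |calculated - measured| / measured over all frames and formants, in percent."""
    c = np.asarray(calculated, dtype=float)
    m = np.asarray(measured, dtype=float)
    return 100.0 * np.mean(np.abs(c - m) / m)


def reciprocity(front_volumes, back_volumes):
    """Pearson correlation of two cavity-volume series; values near -1 mean reciprocal change."""
    return np.corrcoef(front_volumes, back_volumes)[0, 1]
```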


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417