Similar literature
 20 similar documents found (search time: 0 ms)
1.
Finding the control parameters of an articulatory model that result in given acoustics is an important problem in speech research. However, one should also be able to derive the same parameters from measured articulatory data. In this paper, a method to estimate the control parameters of the Maeda model from electromagnetic articulography (EMA) data, which allows the derivation of full sagittal vocal tract slices from sparse flesh-point information, is presented. First, the articulatory grid system involved in the model's definition is adapted to the speaker involved in the experiment, and EMA data are registered to it automatically. Then, articulatory variables that correspond to measurements defined by Maeda on the grid are extracted. An initial solution for the articulatory control parameters is found by a least-squares method, under constraints ensuring vocal tract shape naturalness. Dynamic smoothness of the parameter trajectories is then imposed by a variational regularization method. Generated vocal tract slices for vowels are compared with slices appearing in magnetic resonance images of the same speaker or found in the literature. Formants synthesized on the basis of these generated slices are adequately close to those tracked in real speech recorded concurrently with EMA.

2.
The method described here predicts the trajectories of articulatory movements for continuous speech by using a kinematic triphone model and the minimum-acceleration model. The kinematic triphone model, which is constructed from articulatory data obtained from experiments using an electro-magnetic articulographic system, is characterized by three kinematic features of a triphone and by the intervals between two successive phonemes in the triphone. After a kinematic feature of a phoneme in a given sentence is extracted, the minimum-acceleration trajectory that coincides with the extremum of the time integral of the squared magnitude of the articulator acceleration is formulated. The calculation of the minimum acceleration requires only linear computation. The method predicts both the qualitative features and the quantitative details of experimentally observed articulation.
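
The remark that the minimum-acceleration trajectory needs only linear computation can be illustrated with a small discrete sketch. This is not the authors' kinematic triphone model: it assumes a single articulator coordinate, uniform sampling, and position-only via-points, and the function name `min_acceleration_path` is hypothetical. Minimizing the sum of squared second differences subject to passing through the via-points reduces to solving one linear (KKT) system.

```python
import numpy as np

def min_acceleration_path(n_samples, via_idx, via_val):
    """Discrete minimum-acceleration path through given via-points (illustrative)."""
    # Second-difference operator D: (D x)[i] = x[i] - 2*x[i+1] + x[i+2]
    D = np.zeros((n_samples - 2, n_samples))
    for i in range(n_samples - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    # Constraint matrix C selects the samples pinned to via-points
    m = len(via_idx)
    C = np.zeros((m, n_samples))
    C[np.arange(m), via_idx] = 1.0
    # KKT system for: minimize ||D x||^2 subject to C x = via_val
    K = np.block([[2.0 * D.T @ D, C.T], [C, np.zeros((m, m))]])
    rhs = np.concatenate([np.zeros(n_samples), np.asarray(via_val, dtype=float)])
    solution = np.linalg.solve(K, rhs)
    return solution[:n_samples]

# Hypothetical targets: rest -> peak -> rest over 21 samples
path = min_acceleration_path(21, [0, 10, 20], [0.0, 1.0, 0.0])
```

With only two via-points the same solver returns a straight line, since a linear path has zero acceleration everywhere.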

3.
A dynamic model of articulatory movements is introduced. The research presented herein focuses on the method of representing the phonemic tasks, i.e., phoneme-specific articulatory targets. Phonemic tasks in our model are formally defined using invariant features of articulatory posture. The invariant features used in the model are characterized by the linear transformation of articulatory variables and found using a statistical analysis of measured articulatory movements, in which the articulatory features with minimum variability are taken to be the invariant features. Articulatory movements making vocal-tract constrictions or relative movements among articulators reflecting task-sharing structures are typical examples of the features found to have low variability. In the trajectory formation of articulatory movements, the dimension number of the phonemic task is set at a smaller value than that of articulatory variables. Consequently, the kinematic states of the articulators are partly constrained at given time instants by a sequence of phonemic tasks, and there remain unconstrained degrees of freedom of articulatory variables. Articulatory movements are determined so that they simultaneously satisfy given phonemic tasks and dynamic smoothness constraints. The dynamic smoothness constraints coupled with the underspecified phonemic targets allow our model to explain contextual articulatory variability using context-independent phonemic tasks. Finally, the capability of the model for predicting actual articulatory movements is quantitatively investigated using empirical articulatory data.

4.
Virtual pitch in a computational physiological model   Cited by: 2 (self-citations: 0, citations by others: 2)
A computational model of nervous activity in the auditory nerve, cochlear nucleus, and inferior colliculus is presented and evaluated in terms of its ability to simulate psychophysically-measured pitch perception. The model has a similar architecture to previous autocorrelation models except that the mathematical operations of autocorrelation are replaced by the combined action of thousands of physiologically plausible neuronal components. The evaluation employs pitch stimuli including complex tones with a missing fundamental frequency, tones with alternating phase, inharmonic tones with equally spaced frequencies and iterated rippled noise. Particular attention is paid to differences in response to resolved and unresolved component harmonics. The results indicate that the model is able to simulate qualitatively the related pitch-perceptions. This physiological model is similar in many respects to autocorrelation models of pitch and the success of the evaluations suggests that autocorrelation models may, after all, be physiologically plausible.
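
For orientation, the classic autocorrelation idea that this abstract contrasts with its neuronal model can be sketched in a few lines. The sketch below is not the physiological model described above; it is a plain signal-level autocorrelation, and the sampling rate, harmonic numbers, search range, and the function name `autocorr_pitch` are assumptions chosen for illustration. It shows that a complex tone containing only harmonics 3, 4, and 5 still yields a pitch estimate at the missing fundamental.

```python
import numpy as np

def autocorr_pitch(x, fs, fmin=50.0, fmax=500.0):
    """Estimate pitch (Hz) from the largest autocorrelation value in [fmin, fmax]."""
    n = len(x)
    # One-sided autocorrelation: r[k] = sum over n of x[n] * x[n + k]
    r = np.correlate(x, x, mode="full")[n - 1:]
    lag_min = int(fs / fmax)  # shortest period considered, in samples
    lag_max = int(fs / fmin)  # longest period considered, in samples
    best_lag = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))
    return fs / best_lag

fs = 16000                    # assumed sampling rate (Hz)
t = np.arange(1600) / fs      # 100 ms of signal
f0 = 200.0                    # fundamental that is absent from the stimulus
tone = sum(np.sin(2 * np.pi * k * f0 * t) for k in (3, 4, 5))
estimate = autocorr_pitch(tone, fs)
```

Under these assumptions the estimate lands on 200 Hz, because all three harmonics share a common period of 80 samples even though no energy is present at 200 Hz itself.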

5.
Complex rhythms are observed in the physiological systems that control and carry out vital bodily functions. Theoretical approaches to analyze the physiological systems include control theory and computation theory. Complementary to these approaches is nonlinear dynamics, which offers ways to classify both normal and abnormal dynamics, and to analyze bifurcations occurring in physiological dynamics.

6.
An original three-dimensional (3D) linear articulatory model of the velum and nasopharyngeal wall has been developed from magnetic resonance imaging (MRI) and computed tomography images of a French subject sustaining a set of 46 articulations, covering his articulatory repertoire. The velum and nasopharyngeal wall are represented by generic surface triangular meshes fitted to the 3D contours extracted from MRI for each articulation. Two degrees of freedom were uncovered by principal component analysis: first, VL accounts for 83% of the velum variance, corresponding to an oblique vertical movement seemingly related to the levator veli palatini muscle; second, VS explains another 6% of the velum variance, controlling a mostly horizontal movement possibly related to the sphincter action of the superior pharyngeal constrictor. The nasopharyngeal wall is also controlled by VL for 47% of its variance. Electromagnetic articulographic data recorded on the velum fitted these parameters exactly, and may serve to recover dynamic velum 3D shapes. The main oral and nasopharyngeal area functions controlled by the articulatory model, complemented by the area functions derived from the complex geometry of each nasal passage extracted from coronal MRIs, were fed to an acoustic model and gave promising results about the influence of velum movements on the spectral characteristics of nasals.
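
The "percentage of variance accounted for" figures quoted for VL and VS are the kind of quantity a principal component analysis reports. As a rough sketch only, the snippet below computes explained-variance ratios on synthetic data; the data, the two latent parameters, the 30 coordinates, and the function name `explained_variance_ratio` are all assumptions for illustration and have no connection to the MRI meshes of the study.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance carried by each principal component of X."""
    Xc = X - X.mean(axis=0)                  # center each coordinate
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, descending
    variances = s ** 2
    return variances / variances.sum()

rng = np.random.default_rng(0)
# Synthetic stand-in: 46 "articulations" x 30 coordinates, driven by
# two latent parameters of unequal strength plus small noise
latent = rng.normal(size=(46, 2)) * np.array([3.0, 1.0])
loadings = rng.normal(size=(2, 30))
X = latent @ loadings + 0.1 * rng.normal(size=(46, 30))
ratios = explained_variance_ratio(X)
```

Here `ratios[0]` and `ratios[1]` play the roles that the 83% and 6% figures play in the abstract, although the synthetic numbers will of course differ.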

7.
Physics Letters A, 2005, 335(4): 282-288
In this Letter, several new theorems on the stability of impulsive control systems are presented. These theorems are then used to find the conditions under which an advertising strategy can be asymptotically controlled to the equilibrium point by using impulsive control. Given the parameters of the financial model and the impulsive control law, an estimate of the upper bound of the impulse interval is given, i.e., the number of advertisements can be decreased (and cost thereby reduced) while obtaining an equivalent advertising effect. The result is shown to be effective through a numerical example.

8.
In this paper, a quantitative study of acoustic-to-articulatory inversion for vowel speech sounds by analysis-by-synthesis using the Maeda articulatory model is performed. For chain matrix calculation of vocal tract (VT) acoustics, the chain matrix derivatives with respect to area function are calculated and used in a quasi-Newton method for optimizing articulatory trajectories. The cost function includes a distance measure between natural and synthesized first three formants, and parameter regularization and continuity terms. Calibration of the Maeda model to two speakers, one male and one female, from the University of Wisconsin x-ray microbeam (XRMB) database, using a cost function, is discussed. Model adaptation includes scaling the overall VT and the pharyngeal region and modifying the outer VT outline using measured palate and pharyngeal traces. The inversion optimization is initialized by a fast search of an articulatory codebook, which was pruned using XRMB data to improve inversion results. Good agreement between estimated midsagittal VT outlines and measured XRMB tongue pellet positions was achieved for several vowels and diphthongs for the male speaker, with average pellet-VT outline distances around 0.15 cm, smooth articulatory trajectories, and less than 1% average error in the first three formants.
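
A cost function with the three ingredients named in this abstract (formant distance, parameter regularization, continuity) might take a form like the one below. This is only one plausible shape, not the authors' formula: the relative-error distance, the squared norms, and the weights $\lambda_r$ and $\lambda_c$ are assumptions. Here $p_t$ is the articulatory parameter vector at frame $t$, $F_i(p_t)$ the $i$-th synthesized formant, and $\hat{F}_i(t)$ the $i$-th measured formant.

```latex
J(p_1,\dots,p_T) =
  \sum_{t=1}^{T} \sum_{i=1}^{3}
    \left( \frac{F_i(p_t) - \hat{F}_i(t)}{\hat{F}_i(t)} \right)^{2}
  + \lambda_r \sum_{t=1}^{T} \lVert p_t \rVert^{2}
  + \lambda_c \sum_{t=2}^{T} \lVert p_t - p_{t-1} \rVert^{2}
```

The first term pulls the synthesized formants toward the measured ones, the second discourages extreme articulatory parameter values, and the third penalizes frame-to-frame jumps so that the optimized trajectories stay smooth.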

9.
The articulatory kinematics of final lengthening   Cited by: 4 (self-citations: 0, citations by others: 4)
In order to understand better the phonetic control of final lengthening, the articulation of phrase-final syllables was compared with that of two other contexts known to increase syllable duration: accent and slow tempo. The kinematics of jaw movements in [pap] sequences and of lower lip movements in [pe] sequences for four subjects were interpreted in terms of a task-dynamic model. There was evidence of two different control strategies: decreasing intragestural stiffness to slow down some part of the syllable, and changing intergestural phasing to decrease overlap of the vowel gesture by the consonant. The first was used in slowing down tempo, whereas the second was used to increase the duration of accented syllables over unaccented syllables. Both strategies were implicated in phrase-final lengthening. In accented syllables, final closing gestures generally were longer and slower, but not more displaced. The two slowest subjects, however, used the other strategy in their slow-tempo final syllables. Final lengthening in reduced syllables was more difficult to interpret. The relationship between peak velocity and displacement suggested that a lesser stiffness is obscured by an increased gestural amplitude. Thus, by comparison to lengthening for accent, final lengthening is like a localized change in speaking tempo, although it cannot be equated directly with the specification of stiffness.

10.
The contributions of static and dynamic articulatory information to speech recognition were evaluated, and recognition approaches that combine articulatory information with acoustic features were discussed. Articulatory movements were observed by the Electromagnetic Articulographic System for read speech, and the speech signals were recorded simultaneously. First, we conducted several speech recognition experiments using articulatory features alone, consisting of a number of specific articulatory channels, to evaluate the contribution of each observation point on the articulators. Then, the displacement information of the articulatory data was combined directly with acoustic features and used in speech recognition. The results show that articulatory information provides additional information for speech recognition that is not encoded in acoustic features. Furthermore, the contribution of the dynamic information of the articulatory data was evaluated by including it in speech recognition. It is found that the second derivative of articulatory information made a considerably larger contribution to speech recognition than the second derivative of acoustic information. Finally, methods of combining articulatory and acoustic features were investigated for speech recognition. The basic approach is to add a Bayesian Network (BN) to each state of the HMM, where the articulatory information is represented by the BN as a factor of the observed signals during model training and is marginalized as a hidden variable in the recognition stage. Results based on this HMM/BN framework show better performance than the traditional method.

11.
This paper investigates the coordination between the jaw, the tongue tip, and the lower lip during repetition with rate increase of labial-to-coronal (L(a)C(o)) consonant-vowel-consonant-vowel disyllables (e.g., /pata/) and coronal-to-labial (C(o)L(a)) ones (e.g., /tapa/) by French speakers. For the two types of disyllables: (1) the speeding process induces a shift from two jaw cycles per disyllable to a single cycle; (2) this shift modifies the coordination between the jaw and the constrictors, and (3) comes with a progression toward either a L(a)C(o) attractor [e.g., (/pata/ or /tapa/) --> /patá/ --> /ptá/] or a C(o)L(a) one [e.g., (/pata/ or /tapa/) --> /tapá/ --> /tpá/]. Yet, (4) the L(a)C(o) attractor is clearly favored regardless of the initial sequencing. These results are interpreted as evidence that a L(a)C(o) CVCV disyllable could be a more stable coordinative pattern for the lip-tongue-jaw motor system than a C(o)L(a) one. They are discussed in relation to the so-called LC effect, that is, the preference for L(a)C(o) associations rather than C(o)L(a) ones in CV.CV disyllables in both world languages and infants' first words.

12.
13.
The present article aims at exploring the invariant parameters involved in the perceptual normalization of French vowels. A set of 490 stimuli, including the ten French vowels /i y u e ø o ɛ œ ɔ a/ produced by an articulatory model, simulating seven growth stages and seven fundamental frequency values, was presented in a perceptual identification test to 43 subjects. The results confirm the important effect of the tonality distance between F1 and f0 in perceived height. It does not seem, however, that height perception involves a binary organization determined by the 3-3.5-Bark critical distance. Regarding place of articulation, the tonotopic distance between F1 and F2 appears to be the best predictor of the perceived front-back dimension. Nevertheless, the role of the difference between F2 and F3 remains important. Roundedness is also examined and correlated to the effective second formant, involving spectral integration of higher formants within the 3.5-Bark critical distance. The results shed light on the issue of perceptual invariance, and can be interpreted as perceptual constraints imposed on speech production.

14.
With the use of an endoscopic, high-speed camera, vocal fold dynamics may be observed clinically during phonation. However, observation and subjective judgment alone may be insufficient for clinical diagnosis and documentation of improved vocal function, especially when the laryngeal disease lacks any clear morphological presentation. In this study, biomechanical parameters of the vocal folds are computed by adjusting the corresponding parameters of a three-dimensional model until the dynamics of both systems are similar. First, a mathematical optimization method is presented. Next, model parameters (such as pressure, tension and masses) are adjusted to reproduce vocal fold dynamics, and the deduced parameters are physiologically interpreted. Various combinations of global and local optimization techniques are attempted. Evaluation of the optimization procedure is performed using 50 synthetically generated data sets. The results show sufficient reliability, including 0.07 normalized error, 96% correlation, and 91% accuracy. The technique is also demonstrated on data from human hemilarynx experiments, in which a low normalized error (0.16) and a high correlation (84%) were achieved. In the future, this technique may be applied to clinical high-speed images, yielding objective measures with which to document improved vocal function of patients with voice disorders.

15.
An unconstrained optimization technique is used to find the parameter values of a combined articulatory and vocal tract model that minimize the difference between model spectra and natural speech spectra. The articulatory model is anatomically realistic and the vocal tract model is a "lossy" Webster equation for which a method of solution is given. For English vowels in the steady state, anatomically reasonable articulatory configurations whose corresponding spectra match those of human speech to within 2 dB have been computed in fewer than ten iterations. Results are also given which demonstrate a limited ability of the system to track the articulatory dynamics of voiced speech.

16.
Masking-period patterns (MPP) have been regarded by Zwicker [J. Acoust. Soc. Am. 59, 166-175 (1976)] as psychoacoustic equivalents of period histograms (PH) measured in auditory neurons. Various models have been proposed in the literature to account for his results. We present here a set of results on the MPP produced by a 40-Hz triangular masker that cannot be reproduced by any of these models. This leads to the elaboration of a new model for predicting MPP assuming the existence of nonlinearities in the basilar membrane, and based on neural fast adaptation and rectification, and a simple detection device. This model is shown to be able to account for the whole set of available results, and thus to provide a good basis for the use of MPP as a psychoacoustic tool for the study of PH.

17.
An application of functional data analysis (FDA) (Ramsay and Silverman, 2005, Functional Data Analysis, 2nd ed. (Springer-Verlag, New York)) for linguistic experimentation is explored. The functional time-registration method provided by FDA is shown to offer novel advantages in the investigation of articulatory timing. Traditionally, articulatory studies examining the effects of linguistic variables such as prosody on articulatory timing have relied on comparing the durations of speech intervals of interest defined by kinematic landmarks. Such measurements, however, do not preserve information on the detailed, continuous pattern of articulatory timing that unfolds during these intervals. We present an approach that allows the analysis of entire, continuous kinematic trajectories obtained in a movement tracking experiment examining the influence of a phrasal boundary on articulatory patterning. FDA time deformation functions, after alignment of test and reference (control) signals, reveal delaying of articulator movement (i.e., slowing of the internal clock rate) in the presence of a phrase boundary as the speech stream recedes from the boundary. This is a theoretically predicted pattern (Byrd and Saltzman, 2003, The elastic phrase: Modeling the dynamics of boundary-adjacent lengthening, Journal of Phonetics 31, 149-180.), which would be more difficult to validate with a traditional interval-based approach. It is concluded that the FDA time alignment method provides a useful tool for characterizing timing patterns in linguistic experimentation based on continuous kinematic trajectories.

18.
19.
S Rajasekar, Pramana, 1997, 48(1): 249-258
In this paper we consider the Bonhoeffer-van der Pol (BVP) equation which describes propagation of nerve pulses in a neural membrane, and characterize the chaotic attractor at various bifurcations, and the probability distribution associated with weak and strong chaos. We illustrate control of chaos in the BVP equation by the Ott-Grebogi-Yorke method as well as through a periodic instantaneous burst.
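
To make the system concrete, the sketch below integrates a periodically forced BVP oscillator with a fixed-step fourth-order Runge-Kutta scheme. The abstract does not state the equations or parameter values, so everything specific here is an assumption: the form dx/dt = x - x^3/3 - y + A0 + A1 cos(wt), dy/dt = c(x + a - by) is one commonly used version, and the parameter values, step size, and function names are illustrative only. No chaos control (OGY or otherwise) is implemented.

```python
import numpy as np

def bvp_rhs(t, state, a=0.7, b=0.8, c=0.1, A0=0.0, A1=0.74, omega=1.0):
    """Right-hand side of an assumed forced BVP form (parameters illustrative)."""
    x, y = state
    dx = x - x ** 3 / 3.0 - y + A0 + A1 * np.cos(omega * t)
    dy = c * (x + a - b * y)
    return np.array([dx, dy])

def rk4(f, state0, t0, dt, n_steps):
    """Classic fixed-step fourth-order Runge-Kutta integrator."""
    traj = np.empty((n_steps + 1, len(state0)))
    traj[0] = state0
    t = t0
    for i in range(n_steps):
        s = traj[i]
        k1 = f(t, s)
        k2 = f(t + dt / 2, s + dt / 2 * k1)
        k3 = f(t + dt / 2, s + dt / 2 * k2)
        k4 = f(t + dt, s + dt * k3)
        traj[i + 1] = s + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += dt
    return traj

# 200 time units from an arbitrary initial condition
traj = rk4(bvp_rhs, np.array([0.1, 0.1]), 0.0, 0.01, 20000)
```

Plotting `traj[:, 0]` against `traj[:, 1]` after discarding the initial transient gives a phase portrait; sweeping the assumed forcing amplitude `A1` is one way to look for the bifurcations the abstract refers to.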

20.
A mathematical model for the crystallization of the protein lysozyme from an aqueous solution, caused by the spatial distribution of a precipitant (NaCl) and the point control action of temperature, was constructed, and numerical studies were performed. The mathematical model describes the formation of crystal nuclei and their growth as a function of the local supersaturation, as well as heat-and-mass transfer in the entire solution region, including protein crystals. The heat-and-mass transfer is described by Navier-Stokes equations in the Boussinesq approximation, taking into account thermogravitational and concentration convection.
