期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Dynamic articulatory model based on multidimensional invariant-feature task representation

Kaburagi T Honda M 《The Journal of the Acoustical Society of America》2001,110(1):441-452

A dynamic model of articulatory movements is introduced. The research presented herein focuses on the method of representing the phonemic tasks, i.e., phoneme-specific articulatory targets. Phonemic tasks in our model are formally defined using invariant features of articulatory posture. The invariant features used in the model are characterized by the linear transformation of articulatory variables and found using a statistical analysis of measured articulatory movements, in which the articulatory features with minimum variability are taken to be the invariant features. Articulatory movements making vocal-tract constrictions or relative movements among articulators reflecting task-sharing structures are typical examples of the features found to have low variability. In the trajectory formation of articulatory movements, the dimension number of the phonemic task is set at a smaller value than that of articulatory variables. Consequently, the kinematic states of the articulators are partly constrained at given time instants by a sequence of phonemic tasks, and there remain unconstrained degrees of freedom of articulatory variables. Articulatory movements are determined so that they simultaneously satisfy given phonemic tasks and dynamic smoothness constraints. The dynamic smoothness constraints coupled with the underspecified phonemic targets allow our model to explain contextual articulatory variability using context-independent phonemic tasks. Finally, the capability of the model for predicting actual articulatory movements is quantitatively investigated using empirical articulatory data. 相似文献

2.

Articulatory tradeoffs reduce acoustic variability during American English /r/ production.

F H Guenther C Y Espy-Wilson S E Boyce M L Matthies M Zandipour J S Perkell 《The Journal of the Acoustical Society of America》1999,105(5):2854-2865

The American English phoneme /r/ has long been associated with large amounts of articulatory variability during production. This paper investigates the hypothesis that the articulatory variations used by a speaker to produce /r/ in different contexts exhibit systematic tradeoffs, or articulatory trading relations, that act to maintain a relatively stable acoustic signal despite the large variations in vocal tract shape. Acoustic and articulatory recordings were collected from seven speakers producing /r/ in five phonetic contexts. For every speaker, the different articulator configurations used to produce /r/ in the different phonetic contexts showed systematic tradeoffs, as evidenced by significant correlations between the positions of transducers mounted on the tongue. Analysis of acoustic and articulatory variabilities revealed that these tradeoffs act to reduce acoustic variability, thus allowing relatively large contextual variations in vocal tract shape for /r/ without seriously degrading the primary acoustic cue. Furthermore, some subjects appeared to use completely different articulatory gestures to produce /r/ in different phonetic contexts. When viewed in light of current models of speech movement control, these results appear to favor models that utilize an acoustic or auditory target for each phoneme over models that utilize a vocal tract shape target for each phoneme. 相似文献

3.

Generation of articulatory movements by using a kinematic triphone model

Okadome T Honda M 《The Journal of the Acoustical Society of America》2001,110(1):453-463

The method described here predicts the trajectories of articulatory movements for continuous speech by using a kinematic triphone model and the minimum-acceleration model. The kinematic triphone model, which is constructed from articulatory data obtained from experiments using an electro-magnetic articulographic system, is characterized by three kinematic features of a triphone and by the intervals between two successive phonemes in the triphone. After a kinematic feature of a phoneme in a given sentence is extracted, the minimum-acceleration trajectory that coincides with the extremum of the time integral of the squared magnitude of the articulator acceleration is formulated. The calculation of the minimum acceleration requires only linear computation. The method predicts both the qualitative features and the quantitative details of experimentally observed articulation. 相似文献

4.

Laminar cortical dynamics of conscious speech perception: neural model of phonemic restoration using subsequent context in noise

Grossberg S Kazerounian S 《The Journal of the Acoustical Society of America》2011,130(1):440-460

How are laminar circuits of neocortex organized to generate conscious speech and language percepts? How does the brain restore information that is occluded by noise, or absent from an acoustic signal, by integrating contextual information over many milliseconds to disambiguate noise-occluded acoustical signals? How are speech and language heard in the correct temporal order, despite the influence of contexts that may occur many milliseconds before or after each perceived word? A neural model describes key mechanisms in forming conscious speech percepts, and quantitatively simulates a critical example of contextual disambiguation of speech and language; namely, phonemic restoration. Here, a phoneme deleted from a speech stream is perceptually restored when it is replaced by broadband noise, even when the disambiguating context occurs after the phoneme was presented. The model describes how the laminar circuits within a hierarchy of cortical processing stages may interact to generate a conscious speech percept that is embodied by a resonant wave of activation that occurs between acoustic features, acoustic item chunks, and list chunks. Chunk-mediated gating allows speech to be heard in the correct temporal order, even when what is heard depends upon future context. 相似文献

5.

A self-learning predictive model of articulator movements during speech production

Blackburn CS Young S 《The Journal of the Acoustical Society of America》2000,107(3):1659-1670

A model is presented which predicts the movements of flesh points on the tongue, lips, and jaw during speech production, from time-aligned phonetic strings. Starting from a database of x-ray articulator trajectories, means and variances of articulator positions and curvatures at the midpoints of phonemes are extracted from the data set. During prediction, the amount of articulatory effort required in a particular phonetic context is estimated from the relative local curvature of the articulator trajectory concerned. Correlations between position and curvature are used to directly predict variations from mean articulator positions due to coarticulatory effects. Use of the explicit coarticulation model yields a significant increase in articulatory modeling accuracy with respect to x-ray traces, as compared with the use of mean articulator positions alone. 相似文献

6.

A modeling investigation of articulatory variability and acoustic stability during American English /r/ production

Nieto-Castanon A Guenther FH Perkell JS Curtin HD 《The Journal of the Acoustical Society of America》2005,117(5):3196-3212

This paper investigates the functional relationship between articulatory variability and stability of acoustic cues during American English /r/ production. The analysis of articulatory movement data on seven subjects shows that the extent of intrasubject articulatory variability along any given articulatory direction is strongly and inversely related to a measure of acoustic stability (the extent of acoustic variation that displacing the articulators in this direction would produce). The presence and direction of this relationship is consistent with a speech motor control mechanism that uses a third formant frequency (F3) target; i.e., the final articulatory variability is lower for those articulatory directions most relevant to determining the F3 value. In contrast, no consistent relationship across speakers and phonetic contexts was found between hypothesized vocal-tract target variables and articulatory variability. Furthermore, simulations of two speakers' productions using the DIVA model of speech production, in conjunction with a novel speaker-specific vocal-tract model derived from magnetic resonance imaging data, mimic the observed range of articulatory gestures for each subject, while exhibiting the same articulatory/acoustic relations as those observed experimentally. Overall these results provide evidence for a common control scheme that utilizes an acoustic, rather than articulatory, target specification for American English /r/. 相似文献

7.

A generalized smoothness criterion for acoustic-to-articulatory inversion

Ghosh PK Narayanan S 《The Journal of the Acoustical Society of America》2010,128(4):2162-2172

The many-to-one mapping from representations in the speech articulatory space to acoustic space renders the associated acoustic-to-articulatory inverse mapping non-unique. Among various techniques, imposing smoothness constraints on the articulator trajectories is one of the common approaches to handle the non-uniqueness in the acoustic-to-articulatory inversion problem. This is because, articulators typically move smoothly during speech production. A standard smoothness constraint is to minimize the energy of the difference of the articulatory position sequence so that the articulator trajectory is smooth and low-pass in nature. Such a fixed definition of smoothness is not always realistic or adequate for all articulators because different articulators have different degrees of smoothness. In this paper, an optimization formulation is proposed for the inversion problem, which includes a generalized smoothness criterion. Under such generalized smoothness settings, the smoothness parameter can be chosen depending on the specific articulator in a data-driven fashion. In addition, this formulation allows estimation of articulatory positions recursively over time without any loss in performance. Experiments with the MOCHA TIMIT database show that the estimated articulator trajectories obtained using such a generalized smoothness criterion have lower RMS error and higher correlation with the actual measured trajectories compared to those obtained using a fixed smoothness constraint. 相似文献

8.

Vocal tract normalization for midsagittal articulatory recovery with analysis-by-synthesis.

R S McGowan S Cushing 《The Journal of the Acoustical Society of America》1999,106(2):1090-1105

A method is presented that accounts for differences in the acoustics of vowel production caused by human talkers' vocal-tract anatomies and postural settings. Such a method is needed by an analysis-by-synthesis procedure designed to recover midsagittal articulatory movement from speech acoustics because the procedure employs an articulatory model as an internal model. The normalization procedure involves the adjustment of parameters of the articulatory model that are not of interest for the midsagittal movement recovery procedure. These parameters are adjusted so that acoustic signals produced by the human and the articulatory model match as closely as possible over an initial set of pairs of corresponding human and model midsagittal shapes. Further, these initial midsagittal shape correspondence need to be generalized so that all midsagittal shapes of the human can be obtained from midsagittal shapes of the model. Once these procedures are complete, the midsagittal articulatory movement recovery algorithm can be used to derive model articulatory trajectories that, subsequently, can be transformed into human articulatory trajectories. In this paper the proposed normalization procedure is outlined and the results of experiments with data from two talkers contained in the X-ray Microbeam Speech Production Database are presented. It was found to be possible to characterize these vocal tracts during vowel production with the proposed procedure and to generalize the initial midsagittal correspondences over a set of vowels to other vowels. The procedure was also found to aid in midsagittal articulatory movement recovery from speech acoustics in a vowel-to-vowel production for the two subjects. 相似文献

9.

Estimating the control parameters of an articulatory model from electromagnetic articulograph data

Toutios A Ouni S Laprie Y 《The Journal of the Acoustical Society of America》2011,129(5):3245-3257

Finding the control parameters of an articulatory model that result in given acoustics is an important problem in speech research. However, one should also be able to derive the same parameters from measured articulatory data. In this paper, a method to estimate the control parameters of the the model by Maeda from electromagnetic articulography (EMA) data, which allows the derivation of full sagittal vocal tract slices from sparse flesh-point information, is presented. First, the articulatory grid system involved in the model's definition is adapted to the speaker involved in the experiment, and EMA data are registered to it automatically. Then, articulatory variables that correspond to measurements defined by Maeda on the grid are extracted. An initial solution for the articulatory control parameters is found by a least-squares method, under constraints ensuring vocal tract shape naturalness. Dynamic smoothness of the parameter trajectories is then imposed by a variational regularization method. Generated vocal tract slices for vowels are compared with slices appearing in magnetic resonance images of the same speaker or found in the literature. Formants synthesized on the basis of these generated slices are adequately close to those tracked in real speech recorded concurrently with EMA. 相似文献

10.

An EMA study of VCV coarticulatory direction

Recasens D 《The Journal of the Acoustical Society of America》2002,111(6):2828-2841

This study addresses three issues that are relevant to coarticulation theory in speech production: whether the degree of articulatory constraint model (DAC model) accounts for patterns of the directionality of tongue dorsum coarticulatory influences; the extent to which those patterns in tongue dorsum coarticulatory direction are similar to those for the tongue tip; and whether speech motor control and phonemic planning use a fixed or a context-dependent temporal window. Tongue dorsum and tongue tip movement data on vowel-to-vowel coarticulation are reported for Catalan VCV sequences with vowels /i/, /a/, and /u/, and consonants /p/, /n/, dark /l/, /s/, /S/, alveolopalatal /n/ and /k/. Electromidsagittal articulometry recordings were carried out for three speakers using the Carstens articulograph. Trajectory data are presented for the vertical dimension for the tongue dorsum, and for the horizontal dimension for tongue dorsum and tip. In agreement with predictions of the DAC model, results show that directionality patterns of tongue dorsum coarticulation can be accounted for to a large extent based on the articulatory requirements on consonantal production. While dorsals exhibit analogous trends in coarticulatory direction for all articulators and articulatory dimensions, this is mostly so for the tongue dorsum and tip along the horizontal dimension in the case of lingual fricatives and apicolaminal consonants. This finding results from different articulatory strategies: while dorsal consonants are implemented through homogeneous tongue body activation, the tongue tip and tongue dorsum act more independently for more anterior consonantal productions. Discontinuous coarticulatory effects reported in the present investigation suggest that phonemic planning is adaptative rather than context independent. 相似文献

11.

Adaptive computation of articulatory parameters from the speech signal

S E Levinson C E Schmidt 《The Journal of the Acoustical Society of America》1983,74(4):1145-1154

An unconstrained optimization technique is used to find the values of parameters, of a combination of an articulatory and a vocal tract model, that minimize the difference between model spectra and natural speech spectra. The articulatory model is anatomically realistic and the vocal tract model is a "lossy" Webster equation for which a method of solution is given. For English vowels in the steady state, anatomically reasonable articulatory configurations whose corresponding spectra match those of human speech to within 2 dB have been computed in fewer than ten iterations. Results are also given which demonstrate a limited ability of the system to track the articulatory dynamics of voiced speech. 相似文献

12.

A study of acoustic-to-articulatory inversion of speech by analysis-by-synthesis using chain matrices and the Maeda articulatory model

Panchapagesan S Alwan A 《The Journal of the Acoustical Society of America》2011,129(4):2144-2162

In this paper, a quantitative study of acoustic-to-articulatory inversion for vowel speech sounds by analysis-by-synthesis using the Maeda articulatory model is performed. For chain matrix calculation of vocal tract (VT) acoustics, the chain matrix derivatives with respect to area function are calculated and used in a quasi-Newton method for optimizing articulatory trajectories. The cost function includes a distance measure between natural and synthesized first three formants, and parameter regularization and continuity terms. Calibration of the Maeda model to two speakers, one male and one female, from the University of Wisconsin x-ray microbeam (XRMB) database, using a cost function, is discussed. Model adaptation includes scaling the overall VT and the pharyngeal region and modifying the outer VT outline using measured palate and pharyngeal traces. The inversion optimization is initialized by a fast search of an articulatory codebook, which was pruned using XRMB data to improve inversion results. Good agreement between estimated midsagittal VT outlines and measured XRMB tongue pellet positions was achieved for several vowels and diphthongs for the male speaker, with average pellet-VT outline distances around 0.15 cm, smooth articulatory trajectories, and less than 1% average error in the first three formants. 相似文献

13.

Statistical properties of Chinese phonemic networks

Shuiyuan Yu Chunshan Xu 《Physica A》2011,390(7):1370-1380

The study of properties of speech sound systems is of great significance in understanding the human cognitive mechanism and the working principles of speech sound systems. Some properties of speech sound systems, such as the listener-oriented feature and the talker-oriented feature, have been unveiled with the statistical study of phonemes in human languages and the research of the interrelations between human articulatory gestures and the corresponding acoustic parameters. With all the phonemes of speech sound systems treated as a coherent whole, our research, which focuses on the dynamic properties of speech sound systems in operation, investigates some statistical parameters of Chinese phoneme networks based on real text and dictionaries. The findings are as follows: phonemic networks have high connectivity degrees and short average distances; the degrees obey normal distribution and the weighted degrees obey power law distribution; vowels enjoy higher priority than consonants in the actual operation of speech sound systems; the phonemic networks have high robustness against targeted attacks and random errors. In addition, for investigating the structural properties of a speech sound system, a statistical study of dictionaries is conducted, which shows the higher frequency of shorter words and syllables and the tendency that the longer a word is, the shorter the syllables composing it are. From these structural properties and dynamic properties one can derive the following conclusion: the static structure of a speech sound system tends to promote communication efficiency and save articulation effort while the dynamic operation of this system gives preference to reliable transmission and easy recognition. In short, a speech sound system is an effective, efficient and reliable communication system optimized in many aspects. 相似文献

14.

Construction and control of a physiological articulatory model

Dang J Honda K 《The Journal of the Acoustical Society of America》2004,115(2):853-870

A physiological articulatory model has been constructed using a fast computation method, which replicates midsagittal regions of the speech organs to simulate articulatory movements during speech. This study aims to improve the accuracy of modeling by using the displacement-based finite-element method and to develop a new approach for controlling the model. A "semicontinuum" tongue tissue model was realized by a discrete truss structure with continuum viscoelastic cylinders. Contractile effects of the muscles were systemically examined based on model simulations. The results indicated that each muscle drives the tongue toward an equilibrium position (EP) corresponding to the magnitude of the activation forces. The EPs shifted monotonically as the activation force increased. The monotonic shift revealed a unique and invariant mapping, referred to as an EP map, between a spatial position of the articulators and the muscle forces. This study proposes a control method for the articulatory model based on the EP maps, in which co-contractions of agonist and antagonist muscles are taken into account. By utilizing the co-contraction, the tongue tip and tongue dorsum can be controlled to reach their targets independently. Model simulation showed that the co-contraction of agonist and antagonist muscles could increase the stability of a system in dynamic control. 相似文献

15.

Incorporation of phonetic constraints in acoustic-to-articulatory inversion

Potard B Laprie Y Ouni S 《The Journal of the Acoustical Society of America》2008,123(4):2310-2323

This study investigates the use of constraints upon articulatory parameters in the context of acoustic-to-articulatory inversion. These speaker independent constraints, referred to as phonetic constraints, were derived from standard phonetic knowledge for French vowels and express authorized domains for one or several articulatory parameters. They were experimented on in an existing inversion framework that utilizes Maeda's articulatory model and a hypercubic articulatory-acoustic table. Phonetic constraints give rise to a phonetic score rendering the phonetic consistency of vocal tract shapes recovered by inversion. Inversion has been applied to vowels articulated by a speaker whose corresponding x-ray images are also available. Constraints were evaluated by measuring the distance between vocal tract shapes recovered through inversion to real vocal tract shapes obtained from x-ray images, by investigating the spreading of inverse solutions in terms of place of articulation and constriction degree, and finally by studying the articulatory variability. Results show that these constraints capture interdependencies and synergies between speech articulators and favor vocal tract shapes close to those realized by the human speaker. In addition, this study also provides how acoustic-to-articulatory inversion can be used to explore acoustical and compensatory articulatory properties of an articulatory model. 相似文献

16.

The redundancy of phonemes in sentential context

Stilp CE 《The Journal of the Acoustical Society of America》2011,130(5):EL323-EL328

Printed English is highly redundant as demonstrated by readers' facility at guessing which letter comes next in text. However, such findings have been generalized to perception of connected speech without any direct assessment of phonemic redundancy. Here, participants guessed which phoneme or printed character came next throughout each of four unrelated sentences. Phonemes displayed significantly lower redundancy than letters, and possible contributing factors (task difficulty, experience, context) are discussed. Of three models tested, phonemic guessing was best approximated by word-initial and transitional probabilities between phonemes. Implications for information-theoretic accounts of speech perception are considered. 相似文献

17.

Effects of stress and final-consonant voicing on vowel production: articulatory and acoustic analyses

W V Summers 《The Journal of the Acoustical Society of America》1987,82(3):847-863

Durations of the vocalic portions of speech are influenced by a large number of linguistic and nonlinguistic factors (e.g., stress and speaking rate). However, each factor affecting vowel duration may influence articulation in a unique manner. The present study examined the effects of stress and final-consonant voicing on the detailed structure of articulatory and acoustic patterns in consonant-vowel-consonant (CVC) utterances. Jaw movement trajectories and F 1 trajectories were examined for a corpus of utterances differing in stress and final-consonant voicing. Jaw lowering and raising gestures were more rapid, longer in duration, and spatially more extensive for stressed versus unstressed utterances. At the acoustic level, stressed utterances showed more rapid initial F 1 transitions and more extreme F 1 steady-state frequencies than unstressed utterances. In contrast to the results obtained in the analysis of stress, decreases in vowel duration due to devoicing did not result in a reduction in the velocity or spatial extent of the articulatory gestures. Similarly, at the acoustic level, the reductions in formant transition slopes and steady-state frequencies demonstrated by the shorter, unstressed utterances did not occur for the shorter, voiceless utterances. The results demonstrate that stress-related and voicing-related changes in vowel duration are accomplished by separate and distinct changes in speech production with observable consequences at both the articulatory and acoustic levels. 相似文献

18.

Automatic speech recognition using articulatory features from subject-independent acoustic-to-articulatory inversion

Ghosh PK Narayanan S 《The Journal of the Acoustical Society of America》2011,130(4):EL251-EL257

An automatic speech recognition approach is presented which uses articulatory features estimated by a subject-independent acoustic-to-articulatory inversion. The inversion allows estimation of articulatory features from any talker's speech acoustics using only an exemplary subject's articulatory-to-acoustic map. Results are reported on a broad class phonetic classification experiment on speech from English talkers using data from three distinct English talkers as exemplars for inversion. Results indicate that the inclusion of the articulatory information improves classification accuracy but the improvement is more significant when the speaking style of the exemplar and the talker are matched compared to when they are mismatched. 相似文献

19.

一种基于音素模型感知度的发音质量评价方法 总被引：1，自引：1，他引：0

下载免费PDF全文

张茹韩纪庆《声学学报》2013,38(2):201-207

为了提高发音质量判别精度,提出了一种基于音素模型感知度的发音质量评价方法。它采用不同语音样本集合下样本声学特征的对数后验概率期望差作为音素模型对变异发音的感知度,并以此为基础,生成各音素对应的识别模型候选集。实验表明,所提出的方法使语音识别网络候选音素模型集合尺寸减少约95%;在非母语语音数据库上,该方法评分与人工专家打分相关性为0.828,基于该方法得到的声韵母错误检出率为70.8%,声调错误检出率为42.5%,均优于其它方法。相似文献

20.

Processing speech signal using auditory-like filterbank provides least uncertainty about articulatory gestures

Ghosh PK Goldstein LM Narayanan SS 《The Journal of the Acoustical Society of America》2011,129(6):4014-4022

Understanding how the human speech production system is related to the human auditory system has been a perennial subject of inquiry. To investigate the production-perception link, in this paper, a computational analysis has been performed using the articulatory movement data obtained during speech production with concurrently recorded acoustic speech signals from multiple subjects in three different languages: English, Cantonese, and Georgian. The form of articulatory gestures during speech production varies across languages, and this variation is considered to be reflected in the articulatory position and kinematics. The auditory processing of the acoustic speech signal is modeled by a parametric representation of the cochlear filterbank which allows for realizing various candidate filterbank structures by changing the parameter value. Using mathematical communication theory, it is found that the uncertainty about the articulatory gestures in each language is maximally reduced when the acoustic speech signal is represented using the output of a filterbank similar to the empirically established cochlear filterbank in the human auditory system. Possible interpretations of this finding are discussed. 相似文献