Similar literature
 Found 20 similar documents; search time: 31 ms
1.
We describe an arrangement for simultaneous recording of speech and vocal tract geometry in patients undergoing surgery involving this area. Experimental design is considered from an articulatory phonetic point of view. The speech signals are recorded with an acoustic-electrical arrangement. The vocal tract is simultaneously imaged with MRI. A MATLAB-based system controls the timing of speech recording and MR image acquisition. The speech signals are cleaned of acoustic MRI noise by an adaptive signal processing algorithm. Finally, a vowel data set from pilot experiments is qualitatively compared both with validation data from the anechoic chamber and with Helmholtz resonances of the vocal tract volume, obtained using FEM.

2.
3.
This paper announces the availability of the magnetic resonance imaging (MRI) subset of the mngu0 corpus, a collection of articulatory speech data from one speaker containing different modalities. This subset comprises volumetric MRI scans of the speaker's vocal tract during sustained production of vowels and consonants, as well as dynamic mid-sagittal scans of repetitive consonant-vowel (CV) syllable production. For reference, high-quality acoustic recordings of the speech material are also available. The raw data are made freely available for research purposes.

4.
SUMMARY: Acoustic pharyngometry evaluates the geometry of the vocal tract with acoustic reflections and provides information about vocal tract cross-sectional area and volume from the lips to the glottis. Data on variations in vocal tract diameters are needed by speech scientists to validate various acoustic models, and by medical professionals since the advent of endoscopic surgical techniques. Race is known to be one of the most important factors affecting the oral and nasal structures. This study compared vocal tract dimensions of White American, African American, and Chinese male and female speakers. One hundred and twenty healthy adult subjects with equal numbers of men and women were divided among three races. Subjects were controlled for age, gender, height, and weight. Six dimensional parameters of the speakers' vocal tract cavities were measured with acoustic reflection technology (AR). Significant gender and race main effects were found in certain vocal tract dimensions. The findings of this study now provide speech scientists, speech-language pathologists, and other health professionals with a new anatomical database of vocal tract variations for adult speakers from three different races.

5.
A 3D cine-MRI technique was developed based on a synchronized sampling method [Masaki et al., J. Acoust. Soc. Jpn. E 20, 375-379 (1999)] to measure the temporal changes in the vocal tract area function during a short utterance /aiueo/ in Japanese. A time series of head-neck volumes was obtained after 640 repetitions of the utterance produced by a male speaker, from which area functions were extracted frame-by-frame. A region-based analysis showed that the volumes of the front and back cavities tend to change reciprocally and that the areas near the larynx and posterior edge of the hard palate were almost constant throughout the utterance. The lower four formants were calculated from all the area functions and compared with those of natural speech sounds. The mean absolute percent error between calculated and measured formants among all the frames was 4.5%. The comparison of vocal tract shapes for the five vowels with those from the static MRI method suggested a problem of MRI observation of the vocal tract: data from static MRI tend to result in a deviation from natural vocal tract geometry because of the gravity effect.
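The error figure quoted in the abstract above is a mean absolute percent error. As a minimal sketch of how such a figure can be computed, assuming the calculated and measured formants are flattened into two equal-length lists: the function name and the sample values are hypothetical and are not data from the study.

```python
def mean_absolute_percent_error(calculated, measured):
    """Return the mean of |calculated - measured| / measured, in percent."""
    if len(calculated) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(c - m) / m for c, m in zip(calculated, measured))
    return 100.0 * total / len(measured)


# Hypothetical formant values in Hz for one frame (illustration only).
calculated = [510.0, 1470.0, 2600.0, 3400.0]
measured = [500.0, 1500.0, 2500.0, 3500.0]
error_percent = mean_absolute_percent_error(calculated, measured)
print(f"{error_percent:.2f} %")
```

To average over all frames as the abstract describes, the per-frame formant lists would simply be concatenated before the call.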

6.
Magnetic resonance imaging (MRI) enables non-invasive analysis of the human vocal tract during phonation. Creation of MR images of the vocal tract is accompanied by simultaneous recording of the produced speech. The paper analyzes and compares spectral properties of the acoustic noise produced by mechanical vibration of the gradient coils during scanning in open-air MRI equipment working in a weak magnetic field with B0 up to 0.2 T. This noise exhibits harmonic character, so it is suitable to analyze its properties in the spectral domain. The results of the spectral analysis will be used to devise a new cepstral-based filtering method for noise suppression of recorded speech.

7.
An alternative and complete derivation of the vocal tract length sensitivity function, which is an equation for finding a change in formant frequency due to perturbation of the vocal tract length [Fant, Quarterly Progress and Status Rep. No. 4, Speech Transmission Laboratory, Kungliga Tekniska Högskolan, Stockholm, 1975, pp. 1-14], is presented. It is based on the adiabatic invariance of the vocal tract as an acoustic resonator and on the radiation pressure on the wall and at the exit of the vocal tract. An algorithm for tuning the vocal tract shape to match the formant frequencies to target values, such as those of a recorded speech signal, which was proposed in Story [J. Acoust. Soc. Am. 119, 715-718 (2006)], is extended so that the vocal tract length can also be changed. Numerical simulation of this extended algorithm shows that it can successfully convert between the vocal tract shapes of a male and a female for each of five Japanese vowels.
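The sensitivity function in the abstract above applies to arbitrary tract shapes and is not reproduced here. As a rough illustration of why length perturbation shifts formants, the sketch below uses only the textbook limiting case of a uniform tube closed at the glottis and open at the lips, where F_n = (2n - 1)c / (4L). The speed of sound, the tube lengths and the function names are assumptions made for this example.

```python
SPEED_OF_SOUND = 350.0  # m/s, assumed value for warm, humid air


def uniform_tube_formants(length_m, count=4, c=SPEED_OF_SOUND):
    """Quarter-wavelength resonances of a uniform closed-open tube, in Hz."""
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, count + 1)]


def relative_formant_shift(length_m, delta_m):
    """Exact relative change of every formant when the length changes by delta_m.

    For a uniform tube each F_n scales as 1/L, so the shift is the same
    for all n; to first order it equals -delta_m / length_m.
    """
    return length_m / (length_m + delta_m) - 1.0


# Hypothetical lengths: 17.5 cm shortened by 2.5 cm.
print(uniform_tube_formants(0.175))
print(relative_formant_shift(0.175, -0.025))
```

Shortening the tube raises all formants by the same proportion in this idealized case. A real tract shape does not behave this uniformly, which is the situation the sensitivity function addresses.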

8.
The skilled use of nonperiodic phonation techniques in combination with spectrum analysis has been proposed here as a practical method for locating formant frequencies in the singing voice. The study addresses the question of the degree of similarity between sung phonations and their nonperiodic imitations, with respect to both the frequencies of the first two formants and the posture of the vocal tract. Using magnetic resonance imaging (MRI), linear predictive coding (LPC), and spectrum analysis, two types of nonperiodic phonation (ingressive and vocal fry) are compared with singing phonations to determine the degree of similarity/difference in acoustic and spatial dimensions of the vocal tract when these phonation types are used to approximate the postures of singing. In comparing phonation types, the close similarity in acoustic data in combination with the relative dissimilarity in spatial data indicates that the accurate imitations are not primarily the result of imitating the singing postures, but have instead an aural basis.

9.
The acoustical consequences of articulatory maneuvers of [y] are studied in model experiments in order to obtain insights into articulator programming and speech motor control by elucidating the role of each component maneuver of a speech segment in setting up vocal tract resonance conditions for the spectral features of the speech wave. The maneuvers of [y] are found to provide a maximum and stable plain-flat spectral contrast with [i]. The results can be generalized to different vocal tract sizes. Tongue retraction and larynx depression are rejected as compensations to counteract labial undershoot. Larynx depression is complementary to lip rounding and restores spectral sensitivity to palatal and pharyngeal tongue movements otherwise disturbed by the labial activity. Spectral sensitivity then remains the same for [i] and [y], and there is no need for separate compensation programs for each of these phones.

10.
In this paper, the acoustic-phonetic characteristics of steady apical trills (trill sounds produced by the periodic vibration of the apex of the tongue) are studied. Signal processing methods, namely, zero-frequency filtering and zero-time liftering of speech signals, are used to analyze the excitation source and the resonance characteristics of the vocal tract system, respectively. Although it is natural to expect the effect of trilling on the resonances of the vocal tract system, it is interesting to note that trilling influences the glottal source of excitation as well. The excitation characteristics derived using zero-frequency filtering of speech signals are glottal epochs, strength of impulses at the glottal epochs, and instantaneous fundamental frequency of the glottal vibration. Analysis based on zero-time liftering of speech signals is used to study the dynamic resonance characteristics of the vocal tract system during the production of trill sounds. Qualitative analysis of trill sounds in different vowel contexts and the acoustic cues that may help in spotting trills in continuous speech are discussed.

11.
A method for the analysis of vocal tract parameters is developed, aimed at quantitative analysis of rigidity from speech signals of Parkinsonian patients. The cross-sectional area function of the vocal tract is calculated using pitch synchronous autoregressive moving average (ARMA) analysis. The changes in the cross-sectional area in Parkinsonian subjects during the utterance of sustained sounds are attributed to both Parkinsonian tremor and rigidity. In order to isolate the effects of the rigidity on the vocal tract from those of the tremor, an adaptive tremor cancellation (ATC) algorithm is developed, based on the correlation of tremor signals extracted from different locations of the speech production system.

12.
Three-dimensional vocal tract shapes and consequent area functions representing the vowels [i, ae, a, u] have been obtained from one male and one female speaker using magnetic resonance imaging (MRI). The two speakers were trained vocal performers and both were adept at manipulation of vocal tract shape to alter voice quality. Each vowel was performed three times, each with one of the three voice qualities: normal, yawny, and twangy. The purpose of the study was to determine some ways in which the vocal tract shape can be manipulated to alter voice quality while retaining a desired phonetic quality. To summarize any overall tract shaping tendencies, mean area functions were subsequently computed across the four vowels produced within each specific voice quality. Relative to normal speech, both the vowel area functions and mean area functions showed, in general, that the oral cavity is widened and tract length increased for the yawny productions. The twangy vowels were characterized by shortened tract length, widened lip opening, and a slightly constricted oral cavity. The resulting acoustic characteristics of these articulatory alterations consisted of the first two formants (F1 and F2) being close together for all yawny vowels and far apart for all the twangy vowels.

13.
The transmission-line method is studied systematically as applied to the vocal tract approximated by a sequence of conical horns. The constructed scheme describes the propagation of plane waves in conical horns, taking into account all factors of interest to the acoustic theory of speech production, viz., losses, nonrigid vocal tract walls, and potential side branches. The derived equations are tested on cross-sectional areas of the vocal tract measured by magnetic resonance tomography on a real speaker.
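The idea behind the transmission-line method can be sketched by cascading chain (ABCD) matrices of short tube sections. The example below is a deliberately simplified version, not the scheme of the paper: it assumes lossless cylindrical sections rather than conical horns, with rigid walls, no side branches and an ideal open end at the lips. All names and constants are hypothetical.

```python
import math

RHO = 1.2   # air density in kg/m^3 (assumed)
C = 350.0   # speed of sound in m/s (assumed)


def section_matrix(area_m2, length_m, freq_hz):
    """Chain matrix of one lossless cylindrical section."""
    kl = 2.0 * math.pi * freq_hz / C * length_m
    z0 = RHO * C / area_m2  # characteristic acoustic impedance
    return ((math.cos(kl), 1j * z0 * math.sin(kl)),
            (1j * math.sin(kl) / z0, math.cos(kl)))


def multiply(m1, m2):
    """Product of two 2x2 matrices stored as nested tuples."""
    (a1, b1), (c1, d1) = m1
    (a2, b2), (c2, d2) = m2
    return ((a1 * a2 + b1 * c2, a1 * b2 + b1 * d2),
            (c1 * a2 + d1 * c2, c1 * b2 + d1 * d2))


def d_element(sections, freq_hz):
    """D element of the cascade; sections are (area, length), glottis to lips.

    With zero pressure at the lips, U_lips / U_glottis = 1 / D, so the
    resonances of the lossless tract are the zeros of D.
    """
    total = ((1.0, 0.0), (0.0, 1.0))
    for area, length in sections:
        total = multiply(total, section_matrix(area, length, freq_hz))
    return total[1][1].real


def resonances(sections, f_max=4000.0, step=5.0):
    """Locate zeros of D by a sign-change scan followed by bisection."""
    found = []
    f_lo = 0.5 * step  # half-step offset keeps grid points off round values
    d_lo = d_element(sections, f_lo)
    while f_lo + step <= f_max:
        f_hi = f_lo + step
        d_hi = d_element(sections, f_hi)
        if d_lo * d_hi < 0.0:
            a, b, da = f_lo, f_hi, d_lo
            for _ in range(40):
                mid = 0.5 * (a + b)
                dm = d_element(sections, mid)
                if da * dm <= 0.0:
                    b = mid
                else:
                    a, da = mid, dm
            found.append(0.5 * (a + b))
        f_lo, d_lo = f_hi, d_hi
    return found


# Hypothetical tract: two sections of equal area, 17.5 cm in total.
uniform_tract = [(3.0e-4, 0.10), (3.0e-4, 0.075)]
print(resonances(uniform_tract))
```

Because both sections have the same area, the cascade reduces to a single uniform tube, so the values found should sit near the quarter-wavelength resonances of 500, 1500, 2500 and 3500 Hz. Giving the sections different areas moves the resonances away from that pattern.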

14.
The relation between the spatial configuration of the vocal tract as determined by magnetic resonance imaging (MRI) and the acoustical signal produced was investigated. A male subject carried out a set of phonatory tasks, comprising the utterance of the sustained vowels /i/ and /a/, each in a single articulation, and the vowel /ɛ/ with his larynx positioned variously on a vertical axis. Two- and three-dimensional measurements of the vocal tract were performed. The results of these measurements were used to calculate resonance frequencies, according to predictions from acoustical theory. Finally, calculated frequencies were compared with actually measured resonance frequencies in the audio signal. We found a strong relation between the acoustical signal produced and the spatial configuration for the first resonance frequencies of the articulations of the vowel /ɛ/, and the first two resonance frequencies of the vowels /a/ and /i/. The capability to accurately determine vocal tract dimensions is a major advantage of this imaging technique.

15.
Although advances in techniques for image acquisition and analysis have facilitated the direct measurement of three-dimensional vocal tract air space shapes associated with specific speech phonemes, little information is available with regard to changes in three-dimensional (3-D) vocal tract shape as a function of vocal register, pitch, and loudness. In this study, 3-D images of the vocal tract during falsetto and chest register phonations at various pitch and loudness conditions were obtained using electron beam computed tomography (EBCT). Detailed measurements and differences in vocal tract configuration and formant characteristics derived from the eight measured vocal tract shapes are reported.

16.
Over the last few decades, researchers have been investigating the mechanisms involved in speech production. Image analysis can be a valuable aid in the understanding of the morphology of the vocal tract. The application of magnetic resonance imaging to study these mechanisms has been proven to be reliable and safe. We have applied deformable models in magnetic resonance images to conduct an automatic study of the vocal tract; mainly, to evaluate the shape of the vocal tract in the articulation of some European Portuguese sounds, and then to successfully automatically segment the vocal tract's shape in new images. Thus, a point distribution model has been built from a set of magnetic resonance images acquired during artificially sustained articulations of 21 sounds, which successfully extracts the main characteristics of the movements of the vocal tract. The combination of that statistical shape model with the gray levels of its points is subsequently used to build active shape models and active appearance models. Those models have then been used to segment the modeled vocal tract in new images in a successful and automatic manner. The computational models have thus been revealed to be useful for the specific area of speech simulation and rehabilitation, namely to simulate and recognize the compensatory movements of the articulators during speech production.

17.
Finding the control parameters of an articulatory model that result in given acoustics is an important problem in speech research. However, one should also be able to derive the same parameters from measured articulatory data. In this paper, a method to estimate the control parameters of the model by Maeda from electromagnetic articulography (EMA) data, which allows the derivation of full sagittal vocal tract slices from sparse flesh-point information, is presented. First, the articulatory grid system involved in the model's definition is adapted to the speaker involved in the experiment, and EMA data are registered to it automatically. Then, articulatory variables that correspond to measurements defined by Maeda on the grid are extracted. An initial solution for the articulatory control parameters is found by a least-squares method, under constraints ensuring vocal tract shape naturalness. Dynamic smoothness of the parameter trajectories is then imposed by a variational regularization method. Generated vocal tract slices for vowels are compared with slices appearing in magnetic resonance images of the same speaker or found in the literature. Formants synthesized on the basis of these generated slices are adequately close to those tracked in real speech recorded concurrently with EMA.

18.
The length of the vocal tract is correlated with speaker size, so speech sounds carry information about the size of the speaker in a form that is interpretable by the listener. A wide range of different vocal tract lengths exists in the population, and humans are able to distinguish speaker size from the speech. Smith et al. [J. Acoust. Soc. Am. 117, 305-318 (2005)] presented vowel sounds to listeners and showed that the ability to discriminate speaker size extends beyond the normal range of speaker sizes, which suggests that information about the size and shape of the vocal tract is segregated automatically at an early stage in the processing. This paper reports an extension of the size discrimination research using a much larger set of speech sounds, namely, 180 consonant-vowel and vowel-consonant syllables. Despite the pronounced increase in stimulus variability, there was actually an improvement in discrimination performance over that supported by vowel sounds alone. Performance with vowel-consonant syllables was slightly better than with consonant-vowel syllables. These results support the hypothesis that information about the length of the vocal tract is segregated at an early stage in auditory processing.

19.
Acoustic radiation impedance of the mouth is an important parameter when the vocal tract is modelled by an equivalent electrical circuit. If the vocal tract is closed by a cavity, as when the speaker wears some kind of mask, the total impedance acoustically loading the vocal tract becomes a serial connection of the mouth radiation impedance and the mask impedance. In that case the mouth radiation impedance has to be changed compared to free field conditions. This paper introduces a simplified approach to the modelling of that change by an appropriate reduction coefficient. The analysis, based on an experiment performed by measurement in a physical model of the vocal tract and accompanied by analytical estimation, has shown that the value of this reduction coefficient is 0.5. The results reveal that for a vocal tract closed with a mask cavity, the change in mouth radiation impedance introduced in an equivalent electrical circuit can be approximated by the value for free field radiation decreased by about 50%.
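The reduction coefficient reported in the abstract above can be illustrated with a short sketch. The free-field model here is an assumption made for the example, not taken from the paper: it is the common low-frequency approximation for a circular piston in an infinite baffle, Z = (ρc/A)[(ka)²/2 + j·8ka/(3π)], valid for ka much smaller than 1. The constants, the mouth radius and the function names are hypothetical, and the mask impedance that would be connected in series is not modelled.

```python
import math

RHO = 1.2   # air density in kg/m^3 (assumed)
C = 350.0   # speed of sound in m/s (assumed)


def free_field_radiation_impedance(freq_hz, mouth_radius_m):
    """Low-frequency piston-in-baffle approximation of the acoustic impedance."""
    ka = 2.0 * math.pi * freq_hz / C * mouth_radius_m
    area = math.pi * mouth_radius_m ** 2
    return (RHO * C / area) * complex(ka ** 2 / 2.0, 8.0 * ka / (3.0 * math.pi))


def masked_radiation_impedance(freq_hz, mouth_radius_m, reduction=0.5):
    """Free-field value scaled by the reduction coefficient from the abstract."""
    return reduction * free_field_radiation_impedance(freq_hz, mouth_radius_m)


# Hypothetical mouth opening of radius 1 cm at 500 Hz.
z_free = free_field_radiation_impedance(500.0, 0.01)
z_mask = masked_radiation_impedance(500.0, 0.01)
print(z_free, z_mask)
```

In a complete circuit model the mask cavity impedance would be added in series with `z_mask`; its form depends on the mask geometry and is left out here.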

20.