Similar literature
1.
The ability of subjects to identify vowels in vibrotactile transformations of consonant-vowel syllables was measured for two types of displays: a spectral display (frequency by intensity), and a vocal tract area function display (vocal tract location by cross-sectional area). Both displays were presented to the fingertip via the tactile display of the Optacon transducer. In the first experiments the spectral display was effective for identifying vowels in /b/V/ context when as many as 24 or as few as eight spectral channels were presented to the skin. However, performance fell when the 12- and 8-channel displays were reduced in size to occupy 1/2 or 1/3 of the 24-row tactile matrix. The effect of reducing the size of the display was greater when the spectrum was represented as a solid histogram ("filled" patterns) than when it was represented as a simple spectral contour ("unfilled" patterns). Spatial masking within the filled pattern was postulated as the cause for this decline in performance. Another experiment measured the utility of the spectral display when the syllables were produced by multiple speakers. The resulting increase in response confusions was primarily attributable to variations in the tactile patterns caused by differences in vocal tract resonances among the speakers. The final experiment found an area function display to be inferior to the spectral display for identification of vowels. The results demonstrate that a two-dimensional spectral display is worthy of further development as a basic vibrotactile display for speech.

2.
The purpose of this study was to use vocal tract simulation and synthesis as a means to determine the acoustic and perceptual effects of changing both the cross-sectional area and location of vocal tract constrictions for six different vowels. Area functions at and near vocal tract constrictions are considered critical to the acoustic output and are also the central point of hypotheses concerning speech targets. Area functions for the six vowels, [symbol: see text] were perturbed by changing the cross-sectional area of the constriction (Ac) and the location of the constriction (Xc). Perturbations for Ac were performed for different values of Xc, producing several series of acoustic continua for the different vowels. Acoustic simulations for the different area functions were made using a frequency domain model of the vocal tract. Each simulated vowel was then synthesized as a 1-s duration steady-state segment. The phoneme boundaries of the perturbed synthesized vowels were determined by formal perception tests. Results of the perturbation analyses showed that formants for each of the vowels were more sensitive to changes in constriction cross-sectional area than changes in constriction location. Vowel perception, however, was highly resistant to both types of changes. Results are discussed in terms of articulatory precision and constriction-related speech production strategies.

3.
Three-dimensional vocal tract shapes and consequent area functions representing the vowels [i, ae, a, u] have been obtained from one male and one female speaker using magnetic resonance imaging (MRI). The two speakers were trained vocal performers and both were adept at manipulation of vocal tract shape to alter voice quality. Each vowel was performed three times, each with one of the three voice qualities: normal, yawny, and twangy. The purpose of the study was to determine some ways in which the vocal tract shape can be manipulated to alter voice quality while retaining a desired phonetic quality. To summarize any overall tract shaping tendencies, mean area functions were subsequently computed across the four vowels produced within each specific voice quality. Relative to normal speech, both the vowel area functions and mean area functions showed, in general, that the oral cavity is widened and tract length increased for the yawny productions. The twangy vowels were characterized by shortened tract length, widened lip opening, and a slightly constricted oral cavity. The resulting acoustic characteristics of these articulatory alterations consisted of the first two formants (F1 and F2) being close together for all yawny vowels and far apart for all the twangy vowels.

4.
The purpose of this study was to investigate the spatial similarity of vocal tract shaping patterns across speakers and the similarity of their acoustic effects. Vocal tract area functions for 11 American English vowels were obtained from six speakers, three female and three male, using magnetic resonance imaging (MRI). Each speaker's set of area functions was then decomposed into mean area vectors and representative modes (eigenvectors) using principal components analysis (PCA). Three modes accounted for more than 90% of the variance in the original data sets for each speaker. The general shapes of the first two modes were found to be highly correlated across all six speakers. To demonstrate the acoustic effects of each mode, both in isolation and combined, a mapping between the mode scaling coefficients and [F1, F2] pairs was generated for each speaker. The mappings were unique for all six speakers in terms of the exact shape of the [F1, F2] vowel space, but the general effect of the modes was the same in each case. The results support the idea that the modes provide a common system for perturbing a unique underlying neutral vocal tract shape.
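The decomposition of a set of area functions into a mean vector plus modes can be sketched as follows. This is only an illustration, not the authors' procedure: it extracts just the first mode, uses power iteration in place of a full PCA routine, and the function names and toy data are hypothetical.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of equal-length area functions."""
    count = len(rows)
    return [sum(column) / count for column in zip(*rows)]


def first_mode(rows, iterations=100):
    """Return (mean, first mode, fraction of variance explained by that mode).

    Assumption: power iteration on the sample covariance matrix stands in
    for a full PCA; it only recovers the dominant eigenvector.
    """
    mean = mean_vector(rows)
    centered = [[v - m for v, m in zip(row, mean)] for row in rows]
    dim = len(mean)
    # Sample covariance matrix of the centered area functions.
    cov = [
        [sum(r[i] * r[j] for r in centered) / (len(rows) - 1) for j in range(dim)]
        for i in range(dim)
    ]
    # Start from the centered row with the largest norm so the start vector
    # is not orthogonal to the data.
    start = max(centered, key=lambda r: sum(v * v for v in r))
    norm = math.sqrt(sum(v * v for v in start))
    vec = [v / norm for v in start]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    eigenvalue = sum(
        vec[i] * sum(cov[i][j] * vec[j] for j in range(dim)) for i in range(dim)
    )
    total_variance = sum(cov[i][i] for i in range(dim))
    return mean, vec, eigenvalue / total_variance


if __name__ == "__main__":
    # Hypothetical toy data: a neutral shape perturbed along a single mode.
    neutral = [2.0, 1.5, 1.0, 2.5]
    mode = [0.5, -0.5, 0.5, -0.5]
    shapes = [[n + c * m for n, m in zip(neutral, mode)] for c in (-1.0, -0.5, 0.0, 0.5, 1.0)]
    print(first_mode(shapes))
```

Each area function is then approximated as the mean vector plus a scaling coefficient times the mode, which is the sense in which the modes perturb a neutral shape.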

5.
Vocal tract shaping patterns based on articulatory fleshpoint data from four speakers in the University of Wisconsin x-ray microbeam (XRMB) database [J. Westbury, UW-Madison, (1994)] were determined with a principal component analysis (PCA). Midsagittal cross-distance functions representative of approximately the front 6 cm of the oral cavity for each of 11 vowels and vowel-vowel (VV) sequences were obtained from the pellet positions and the hard palate profile for the four speakers. A PCA was independently performed on each speaker's set of cross-distance functions representing static vowels only, and again with time-dependent cross-distance functions representing vowels and VV sequences. In all cases, results indicated that the first two orthogonal components (referred to as modes) accounted for more than 97% of the variance in each speaker's set of cross-distance functions. In addition, the shape of each mode was shown to be similar across the speakers, suggesting that the modes represent common patterns of vocal tract deformation. Plots of the resulting time-dependent coefficient records showed that the four speakers activated each mode similarly during production of the vowel sequences. Finally, a procedure was described for using the time-dependent mode coefficients obtained from the XRMB data as input for an area function model of the vocal tract.

6.
Acoustic effects of the time-varying glottal area due to vocal fold vibration on the laryngeal cavity resonance were investigated based on vocal tract area functions and acoustic analysis. The laryngeal cavity consists of the vestibular and ventricular parts of the larynx, and gives rise to a regional acoustic resonance within the vocal tract, with this resonance imparting an extra formant to the vocal tract resonance pattern. Vocal tract transfer functions of the five Japanese vowels uttered by three male subjects were calculated under open- and closed-glottis conditions. The results revealed that the resonance appears at the frequency region from 3.0 to 3.7 kHz when the glottis is closed and disappears when it is open. Real spectra estimated from open- and closed-glottis periods of vowel sounds also showed the on-off pattern of the resonance within a pitch period. Furthermore, a time-domain acoustic analysis of vowels indicated that the resonance component could be observed as a pitch-synchronized rise-and-fall pattern of the bandpass amplitude. The cyclic nature of the resonance can be explained as the laryngeal cavity acting as a closed tube that generates the resonance during a closed-glottis period, but damps the resonance off during an open-glottis period.

7.
The acoustic effects of the laryngeal cavity on the vocal tract resonance were investigated by using vocal tract area functions for the five Japanese vowels obtained from an adult male speaker. Transfer functions were examined with the laryngeal cavity eliminated from the whole vocal tract, volume velocity distribution patterns were calculated, and susceptance matching analysis was performed between the laryngeal cavity and the vocal tract excluding the laryngeal cavity (vocal tract proper). It was revealed that the laryngeal cavity generates one of the formants of the vocal tract, which is the fourth in the present study. At this formant, the resonance of the laryngeal cavity (the 1/4 wavelength resonance) induces the open-tube resonance of the vocal tract proper (the 3/2 wavelength resonance). At the other formants, on the other hand, the vocal tract proper acts as a closed tube, because the laryngeal cavity has only a small contribution to generating these formants and the effective closed end of the whole vocal tract is the junction between the laryngeal cavity and the vocal tract proper.
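The 1/4 wavelength and 3/2 wavelength resonances mentioned above can be illustrated with idealized uniform, lossless tubes. The sketch below is not taken from the study: the speed of sound and both tube lengths are assumed round numbers chosen only to show how the two resonances can coincide.

```python
SPEED_OF_SOUND = 350.0  # m/s, assumed value for warm, humid air


def closed_open_resonance(length_m, mode=1, c=SPEED_OF_SOUND):
    """Resonance of a tube closed at one end: L = (2*mode - 1) * wavelength / 4."""
    return (2 * mode - 1) * c / (4.0 * length_m)


def open_open_resonance(length_m, mode=1, c=SPEED_OF_SOUND):
    """Resonance of a tube open at both ends: L = mode * wavelength / 2."""
    return mode * c / (2.0 * length_m)


if __name__ == "__main__":
    # Assumed lengths: 2.5 cm laryngeal cavity, 15 cm vocal tract proper.
    laryngeal = closed_open_resonance(0.025)         # 1/4 wavelength resonance
    tract_proper = open_open_resonance(0.15, mode=3)  # 3/2 wavelength resonance
    print(round(laryngeal), round(tract_proper))
```

With these assumed lengths both resonances fall near 3.5 kHz. Real vocal tracts are neither uniform nor lossless, so the numbers are indicative only.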

8.
The inverse filter is a serial cascade of filter elements with a transfer function that cancels the effect of the poles of the vocal tract transfer function on the acoustic waveform to reveal the underlying glottal volume velocity waveform. Inaccuracies in the glottal wave reconstruction derived from an all-zero inverse filter can be attributed to deviations of the vocal tract transfer function from an all-pole model. Presented is an analysis of the error stemming from the effect of the yielding vocal tract sidewalls on the vocal tract transfer function. Predictions about the resulting artifacts in the estimated glottal volume velocity are derived from an acoustic model. These predictions are confirmed by applying a linear predictive coding (LPC) inverse filter analysis method to vowels synthesized using a transmission line model of the vocal tract containing yielding sidewall parameters as well as natural productions of nonnasalized vowels.
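As a rough illustration of LPC inverse filtering, the sketch below estimates all-pole coefficients with the autocorrelation method (Levinson-Durbin recursion) and then applies the corresponding all-zero filter. It is a generic textbook formulation, not the analysis method of the study; the function names, the second-order test filter, and the single-impulse excitation are assumptions.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite sequence."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Predictor coefficients a[0..order-1] with x[n] ~ sum(a[k] * x[n-1-k])."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for this model order.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def lpc_coefficients(x, order):
    return levinson_durbin(autocorrelation(x, order), order)


def inverse_filter(x, a):
    """All-zero filter: subtract the linear prediction from each sample."""
    out = []
    for n in range(len(x)):
        pred = sum(a[k] * x[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        out.append(x[n] - pred)
    return out


def all_pole_impulse_response(a, n_samples):
    """Response of the all-pole filter to a single unit impulse."""
    h = []
    for n in range(n_samples):
        s = 1.0 if n == 0 else 0.0
        s += sum(a[k] * h[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        h.append(s)
    return h


if __name__ == "__main__":
    true_a = [1.2, -0.81]  # assumed stable resonator, pole radius 0.9
    signal = all_pole_impulse_response(true_a, 400)
    estimate = lpc_coefficients(signal, 2)
    residual = inverse_filter(signal, estimate)
    print(estimate, residual[:3])
```

When the signal really is all-pole, the residual collapses back to the excitation. The abstract's point is that departures from the all-pole assumption, such as yielding sidewalls, leave artifacts in that residual.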

9.
A technique for modifying vocal tract area functions is developed by using sum and difference combinations of acoustic sensitivity functions to perturb an initial vocal tract configuration. First, sensitivity functions [e.g., Fant and Pauli, Proc. Speech Comm. Sem. 74, 1975] are calculated for a given area function, at its specific formant frequencies. The sensitivity functions are then multiplied by scaling coefficients that are determined from the difference between a desired set of formant frequencies and those supported by the current area function. The scaled sensitivity functions are then summed together to generate a perturbation of the area function. This produces a new area function whose associated formant frequencies are closer to the desired values than the previous one. This process is repeated iteratively until the coefficients are equal to zero or are below a threshold value.

10.
This article uses real-time magnetic resonance imaging to investigate the vocal tract shaping of five soprano singers during the production of two-octave scales of sung vowels. A systematic shift of the first vocal tract resonance frequency with respect to the fundamental is shown to exist for high vowels across all subjects. No consistent systematic effect on the vocal tract resonance could be shown across all of the subjects for other vowels or for the second vocal tract resonance.

11.
An area function model of the vocal tract is tested for its ability to produce typical vowel formant frequencies with a perturbation at the lips. The model, which consists of a neutral shape and two weighted orthogonal shaping patterns (modes), has previously been shown to produce a nearly one-to-one mapping between formant frequencies and the weighting coefficients of the modes [Story and Titze, J. Phonetics, 26, 223-260 (1998)]. In this study, a perturbation experiment was simulated by imposing a constant area "lip tube" on the model. The mapping between the mode coefficients and formant frequencies was then recomputed with the lip tube in place and showed that formant frequencies (F1 and F2) representative of the vowels [u,o,u] could no longer be produced with the model. However, when the mode coefficients were allowed to exceed their typical bounding values, the mapping between them and the formant frequencies was expanded such that the vowels [u,o,u] were compensated. The area functions generated by these exaggerated coefficients were shown to be similar to vocal-tract shapes reported for real speakers under similar perturbed conditions [Savariaux, Perrier, and Orliaguet, J. Acoust. Soc. Am., 98, 2428-2442 (1995)]. This suggests that the structure of this particular model captures some of the human ability to configure the vocal-tract shape under both ordinary and extraordinary conditions.

12.
There is information in speech sounds about the length of the vocal tract; specifically, as a child grows, the resonators in the vocal tract grow and the formant frequencies of the vowels decrease. It has been hypothesized that the auditory system applies a scale transform to all sounds to segregate size information from resonator shape information, and thereby enhance both size perception and speech recognition [Irino and Patterson, Speech Commun. 36, 181-203 (2002)]. This paper describes size discrimination experiments and vowel recognition experiments designed to provide evidence for an auditory scaling mechanism. Vowels were scaled to represent people with vocal tracts much longer and shorter than normal, and with pitches much higher and lower than normal. The results of the discrimination experiments show that listeners can make fine judgments about the relative size of speakers, and they can do so for vowels scaled well beyond the normal range. Similarly, the recognition experiments show good performance for vowels in the normal range, and for vowels scaled well beyond the normal range of experience. Together, the experiments support the hypothesis that the auditory system automatically normalizes for the size information in communication sounds.
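The inverse relation between vocal tract length and formant frequency can be sketched with a simple proportional rescaling. This assumes that every dimension of the tract scales uniformly, which is only an approximation, and it is not the scaling procedure used in the experiments; the function name and example values are hypothetical.

```python
def scale_formants(formants_hz, reference_length_cm, new_length_cm):
    """Rescale formants for a tract of a different length.

    Assumption: uniform scaling, so every formant is multiplied by
    reference_length / new_length (a longer tract gives lower formants).
    """
    if reference_length_cm <= 0 or new_length_cm <= 0:
        raise ValueError("lengths must be positive")
    ratio = reference_length_cm / new_length_cm
    return [f * ratio for f in formants_hz]


if __name__ == "__main__":
    # Assumed neutral-vowel formants for a 17.5 cm tract, rescaled to 14 cm.
    print(scale_formants([500.0, 1500.0, 2500.0], 17.5, 14.0))
```

Because all formants move by the same factor, their ratios are preserved, which is the kind of size-versus-shape separation the scale-transform hypothesis relies on.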

13.
The vocal tract shape is three-dimensionally complex. For accurate acoustic analysis, a finite-difference time-domain method was introduced in the present study. By this method, transfer functions of the vocal tract for the five Japanese vowels were calculated from three-dimensionally reconstructed magnetic resonance imaging (MRI) data. The calculated transfer functions were compared with those obtained from acoustic measurements of vocal tract physical models precisely constructed from the same MRI data. Calculated transfer functions agreed well with measured ones up to 10 kHz. Acoustic effects of the piriform fossae, epiglottic valleculae, and inter-dental spaces were also examined. They caused spectral changes by generating dips. The amount of change was significant for the piriform fossae, while it was almost negligible for the other two. The piriform fossae and valleculae generated spectral dips for all the vowels. The dip frequencies of the piriform fossae were almost stable, while those of the valleculae varied among vowels. The inter-dental spaces generated very small spectral dips below 2.5 kHz for the high and middle vowels. In addition, transverse resonances within the oral cavity generated small spectral dips above 4 kHz for the low vowels.

14.
This study investigates the use of constraints upon articulatory parameters in the context of acoustic-to-articulatory inversion. These speaker independent constraints, referred to as phonetic constraints, were derived from standard phonetic knowledge for French vowels and express authorized domains for one or several articulatory parameters. They were tested in an existing inversion framework that utilizes Maeda's articulatory model and a hypercubic articulatory-acoustic table. Phonetic constraints give rise to a phonetic score rendering the phonetic consistency of vocal tract shapes recovered by inversion. Inversion has been applied to vowels articulated by a speaker whose corresponding x-ray images are also available. Constraints were evaluated by measuring the distance between vocal tract shapes recovered through inversion and real vocal tract shapes obtained from x-ray images, by investigating the spreading of inverse solutions in terms of place of articulation and constriction degree, and finally by studying the articulatory variability. Results show that these constraints capture interdependencies and synergies between speech articulators and favor vocal tract shapes close to those realized by the human speaker. In addition, this study also shows how acoustic-to-articulatory inversion can be used to explore acoustical and compensatory articulatory properties of an articulatory model.

15.
A 3D cine-MRI technique was developed based on a synchronized sampling method [Masaki et al., J. Acoust. Soc. Jpn. E 20, 375-379 (1999)] to measure the temporal changes in the vocal tract area function during a short utterance /aiueo/ in Japanese. A time series of head-neck volumes was obtained after 640 repetitions of the utterance produced by a male speaker, from which area functions were extracted frame-by-frame. A region-based analysis showed that the volumes of the front and back cavities tend to change reciprocally and that the areas near the larynx and posterior edge of the hard palate were almost constant throughout the utterance. The lower four formants were calculated from all the area functions and compared with those of natural speech sounds. The mean absolute percent error between calculated and measured formants among all the frames was 4.5%. The comparison of vocal tract shapes for the five vowels with those from the static MRI method suggested a problem of MRI observation of the vocal tract: data from static MRI tend to result in a deviation from natural vocal tract geometry because of the gravity effect.
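The mean absolute percent error quoted above follows the usual definition, sketched here. The formant values in the example are made up for illustration and are not data from the study.

```python
def mean_absolute_percent_error(calculated, measured):
    """Mean of |calculated - measured| / |measured|, expressed in percent."""
    if len(calculated) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(c - m) / abs(m) for c, m in zip(calculated, measured))
    return 100.0 * total / len(measured)


if __name__ == "__main__":
    # Hypothetical calculated vs. measured formants (Hz) for one frame.
    calculated = [720.0, 1210.0, 2500.0, 3400.0]
    measured = [700.0, 1250.0, 2550.0, 3500.0]
    print(mean_absolute_percent_error(calculated, measured))
```

In the study the average is taken over all frames and the lower four formants; the same function applies once those values are flattened into two equal-length lists.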

16.
An unconstrained optimization technique is used to find the values of parameters, of a combination of an articulatory and a vocal tract model, that minimize the difference between model spectra and natural speech spectra. The articulatory model is anatomically realistic and the vocal tract model is a "lossy" Webster equation for which a method of solution is given. For English vowels in the steady state, anatomically reasonable articulatory configurations whose corresponding spectra match those of human speech to within 2 dB have been computed in fewer than ten iterations. Results are also given which demonstrate a limited ability of the system to track the articulatory dynamics of voiced speech.

17.
An alternative and complete derivation of the vocal tract length sensitivity function, which is an equation for finding a change in formant frequency due to perturbation of the vocal tract length [Fant, Quarterly Progress and Status Rep. No. 4, Speech Transmission Laboratory, Kungliga Tekniska Högskolan, Stockholm, 1975, pp. 1-14] is presented. It is based on the adiabatic invariance of the vocal tract as an acoustic resonator and on the radiation pressure on the wall and at the exit of the vocal tract. An algorithm for tuning the vocal tract shape to match the formant frequencies to target values, such as those of a recorded speech signal, which was proposed in Story [J. Acoust. Soc. Am. 119, 715-718 (2006)], is extended so that the vocal tract length can also be changed. Numerical simulation of this extended algorithm shows that it can successfully convert between the vocal tract shapes of a male and a female for each of five Japanese vowels.

18.
Finding the control parameters of an articulatory model that result in given acoustics is an important problem in speech research. However, one should also be able to derive the same parameters from measured articulatory data. In this paper, a method to estimate the control parameters of the model by Maeda from electromagnetic articulography (EMA) data, which allows the derivation of full sagittal vocal tract slices from sparse flesh-point information, is presented. First, the articulatory grid system involved in the model's definition is adapted to the speaker involved in the experiment, and EMA data are registered to it automatically. Then, articulatory variables that correspond to measurements defined by Maeda on the grid are extracted. An initial solution for the articulatory control parameters is found by a least-squares method, under constraints ensuring vocal tract shape naturalness. Dynamic smoothness of the parameter trajectories is then imposed by a variational regularization method. Generated vocal tract slices for vowels are compared with slices appearing in magnetic resonance images of the same speaker or found in the literature. Formants synthesized on the basis of these generated slices are adequately close to those tracked in real speech recorded concurrently with EMA.

19.
The relation between the spatial configuration of the vocal tract as determined by magnetic resonance imaging (MRI) and the acoustical signal produced was investigated. A male subject carried out a set of phonatory tasks, comprising the utterance of the sustained vowels /i/ and /a/, each in a single articulation, and the vowel /ɛ/ with his larynx positioned variously on a vertical axis. Two- and three-dimensional measurements of the vocal tract were performed. The results of these measurements were used to calculate resonance frequencies, according to predictions from acoustical theory. Finally, calculated frequencies were compared with actually measured resonance frequencies in the audio signal. We found a strong relation between the acoustical signal produced and the spatial configuration for the first resonance frequencies of the articulations of the vowel /ɛ/, and first two resonance frequencies of the vowels /a/ and /i/. The capability to determine vocal tract dimensions accurately is a major advantage of this imaging technique.

20.
Speech intelligibility is known to be relatively unaffected by certain deformations of the acoustic spectrum. These include translations, stretching or contracting dilations, and shearing of the spectrum (represented along the logarithmic frequency axis). It is argued here that such robustness reflects a synergy between vocal production and auditory perception. Thus, on the one hand, it is shown that these spectral distortions are produced by common and unavoidable variations among different speakers pertaining to the length, cross-sectional profile, and losses of their vocal tracts. On the other hand, it is argued that these spectral changes leave the auditory cortical representation of the spectrum largely unchanged except for translations along one of its representational axes. These assertions are supported by analyses of production and perception models. On the production side, a simplified sinusoidal model of the vocal tract is developed which analytically relates a few "articulatory" parameters, such as the extent and location of the vocal tract constriction, to the spectral peaks of the acoustic spectra synthesized from it. The model is evaluated by comparing the identification of synthesized sustained vowels to labeled natural vowels extracted from the TIMIT corpus. On the perception side a "multiscale" model of sound processing is utilized to elucidate the effects of the deformations on the representation of the acoustic spectrum in the primary auditory cortex. Finally, the implications of these results for the perception of generally identifiable classes of sound sources beyond the specific case of speech and the vocal tract are discussed.
