首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 422 毫秒
1.
The purpose of this study was to investigate the relation between vocal tract deformation patterns obtained from statistical analyses of a set of area functions representative of a vowel repertoire, and the acoustic properties of a neutral vocal tract shape. Acoustic sensitivity functions were calculated for a mean area function based on seven different speakers. Specific linear combinations of the sensitivity functions corresponding to the first two formant frequencies were shown to possess essentially the same amplitude variation along the vocal tract length as the statistically derived deformation patterns reported in previous studies.  相似文献   

2.
An area function model of the vocal tract is tested for its ability to produce typical vowel formant frequencies with a perturbation at the lips. The model, which consists of a neutral shape and two weighted orthogonal shaping patterns (modes), has previously been shown to produce a nearly one-to-one mapping between formant frequencies and the weighting coefficients of the modes [Story and Titze, J. Phonetics, 26, 223-260 (1998)]. In this study, a perturbation experiment was simulated by imposing a constant area "lip tube" on the model. The mapping between the mode coefficients and formant frequencies was then recomputed with the lip tube in place and showed that formant frequencies (F1 and F2) representative of the vowels [u,o,u] could no longer be produced with the model. However, when the mode coefficients were allowed to exceed their typical bounding values, the mapping between them and the formant frequencies was expanded such that the vowels [u,o,u] were compensated. The area functions generated by these exaggerated coefficients were shown to be similar to vocal-tract shapes reported for real speakers under similar perturbed conditions [Savariaux, Perrier, and Orliaguet, J. Acoust. Soc. Am., 98, 2428-2442 (1995)]. This suggests that the structure of this particular model captures some of the human ability to configure the vocal-tract shape under both ordinary and extraordinary conditions.  相似文献   

3.
The purpose of this study was to use vocal tract simulation and synthesis as means to determine the acoustic and perceptual effects of changing both the cross-sectional area and location of vocal tract constrictions for six different vowels: Area functions at and near vocal tract constrictions are considered critical to the acoustic output and are also the central point of hypotheses concerning speech targets. Area functions for the six vowels, [symbol: see text] were perturbed by changing the cross-sectional area of the constriction (Ac) and the location of the constriction (Xc). Perturbations for Ac were performed for different values of Xc, producing several series of acoustic continua for the different vowels. Acoustic simulations for the different area functions were made using a frequency domain model of the vocal tract. Each simulated vowel was then synthesized as a 1-s duration steady-state segment. The phoneme boundaries of the perturbed synthesized vowels were determined by formal perception tests. Results of the perturbation analyses showed that formants for each of the vowels were more sensitive to changes in constriction cross-sectional area than changes in constriction location. Vowel perception, however, was highly resistant to both types of changes. Results are discussed in terms of articulatory precision and constriction-related speech production strategies.  相似文献   

4.
Although listeners routinely perceive both the sex and individual identity of talkers from their speech, explanations of these abilities are incomplete. Here, variation in vocal production-related anatomy was assumed to affect vowel acoustics thought to be critical for indexical cueing. Integrating this approach with source-filter theory, patterns of acoustic parameters that should represent sex and identity were identified. Due to sexual dimorphism, the combination of fundamental frequency (F0, reflecting larynx size) and vocal tract length cues (VTL, reflecting body size) was predicted to provide the strongest acoustic correlates of talker sex. Acoustic measures associated with presumed variations in supralaryngeal vocal tract-related anatomy occurring within sex were expected to be prominent in individual talker identity. These predictions were supported by results of analyses of 2500 tokens of the /epsilon/ phoneme, extracted from the naturally produced speech of 125 subjects. Classification by talker sex was virtually perfect when F0 and VTL were used together, whereas talker classification depended primarily on the various acoustic parameters associated with vocal-tract filtering.  相似文献   

5.
A methodological study is presented to examine the acoustic role of the vocal tract in playing the trumpet. Preliminary results obtained for one professional player are also shown to demonstrate the effectiveness of the method. Images of the vocal tract with a resolution of 0.5 mm (2 mm in thickness) were recorded with magnetic resonance imaging to observe the tongue posture and estimate the vocal-tract area function during actual performance. The input impedance was then calculated for the player's air column including both the supra- and subglottal tracts using an acoustic tube model including the effect of wall losses. Finally, a time-domain blowing simulation by Adachi and Sato [J. Acoust. Soc. Am. 99, 1200-1209 (1996)] was performed with a model of the lips. In this simulation, the oscillating frequency of the lips was slightly affected by using different shapes of the vocal tract measured for the player. In particular, when the natural frequency of the lips was gradually increased, the transition to the higher mode occurred at different frequencies for different vocal-tract shapes. Furthermore, simulation results showed that the minimum blowing pressure required to attain the lip oscillation can be reduced by adjusting the vocal-tract shape properly.  相似文献   

6.
Three-dimensional vocal tract shapes and consequent area functions representing the vowels [i, ae, a, u] have been obtained from one male and one female speaker using magnetic resonance imaging (MRI). The two speakers were trained vocal performers and both were adept at manipulation of vocal tract shape to alter voice quality. Each vowel was performed three times, each with one of the three voice qualities: normal, yawny, and twangy. The purpose of the study was to determine some ways in which the vocal tract shape can be manipulated to alter voice quality while retaining a desired phonetic quality. To summarize any overall tract shaping tendencies mean area functions were subsequently computed across the four vowels produced within each specific voice quality. Relative to normal speech, both the vowel area functions and mean area functions showed, in general, that the oral cavity is widened and tract length increased for the yawny productions. The twangy vowels were characterized by shortened tract length, widened lip opening, and a slightly constricted oral cavity. The resulting acoustic characteristics of these articulatory alterations consisted of the first two formants (F1 and F2) being close together for all yawny vowels and far apart for all the twangy vowels.  相似文献   

7.
The length of the vocal tract is correlated with speaker size and, so, speech sounds have information about the size of the speaker in a form that is interpretable by the listener. A wide range of different vocal tract lengths exist in the population and humans are able to distinguish speaker size from the speech. Smith et al. [J. Acoust. Soc. Am. 117, 305-318 (2005)] presented vowel sounds to listeners and showed that the ability to discriminate speaker size extends beyond the normal range of speaker sizes which suggests that information about the size and shape of the vocal tract is segregated automatically at an early stage in the processing. This paper reports an extension of the size discrimination research using a much larger set of speech sounds, namely, 180 consonant-vowel and vowel-consonant syllables. Despite the pronounced increase in stimulus variability, there was actually an improvement in discrimination performance over that supported by vowel sounds alone. Performance with vowel-consonant syllables was slightly better than with consonant-vowel syllables. These results support the hypothesis that information about the length of the vocal tract is segregated at an early stage in auditory processing.  相似文献   

8.
A hypothesis on the nature of articulatory targets for the vowels /i/ and /a/ is proposed, based on acoustic considerations and vowel articulations. The conjecture is that positioning of points on the tongue surface in a repetition experiment should be most accurate in the direction perpendicular to the vocal-tract midline, at the acoustically critical point of maximal constriction for each vowel. The hypothesis was tested by: examining x-ray microbeam data for three speakers, conducting a partial acoustical analysis, and performing a modeling study. Distributions were plotted of the midsagittal locations of three tongue points at the time of maximal excursion toward the vowel target for numbers of examples of the vowels, embedded in a variety of phonetic contexts. More variation was found along a direction parallel to the vocal tract midline than perpendicular to the midline, supporting the hypothesis. Statistics on formant values for one subject have been calculated, and pairwise regressions of displacement and formant data have been run. An articulatory synthesizer [Rubin et al., J. Acoust. Soc. Am. 70, 321-328 (1981)] has been manipulated through displacements similar to the subject's articulatory variation. Although articulatory synthesis showed systematic relationships between articulatory relationships and formant frequencies, there were no significant correlations between the subject's measured articulatory displacements and his formant data. These additional results raise questions about the methodology and point to the need for additional work for an adequate test of the hypothesis.  相似文献   

9.
In obstruent consonants, a major constriction in the upper vocal tract yields an increase in intraoral pressure (P(io)). Phonation requires that subglottal pressure (P(sub)) exceed P(io) by a threshold value, so as the transglottal pressure reaches the threshold, phonation will cease. This work investigates how P(io) levels at phonation offset and onset vary before and after different German voiceless obstruents (stop, fricative, affricates, clusters), and with following high vs low vowels. Articulatory contacts, measured using electropalatography, were recorded simultaneously with P(io) to clarify how supraglottal constrictions affect P(io). Effects of consonant type on phonation thresholds could be explained mainly in terms of the magnitude and timing of vocal-fold abduction. Phonation offset occurred at lower values of P(io) before fricative-initial sequences than stop-initial sequences, and onset occurred at higher levels of P(io) following the unaspirated stops of clusters compared to fricatives, affricates, and aspirated stops. The vowel effects were somewhat surprising: High vowels had an inhibitory effect at voicing offset (phonation ceasing at lower values of P(io)) in short-duration consonant sequences, but a facilitating effect on phonation onset that was consistent across consonantal contexts. The vowel influences appear to reflect a combination of vocal-fold characteristics and vocal-tract impedance.  相似文献   

10.
Three experiments used the Coordinated Response Measure task to examine the roles that differences in F0 and differences in vocal-tract length have on the ability to attend to one of two simultaneous speech signals. The first experiment asked how increases in the natural F0 difference between two sentences (originally spoken by the same talker) affected listeners' ability to attend to one of the sentences. The second experiment used differences in vocal-tract length, and the third used both F0 and vocal-tract length differences. Differences in F0 greater than 2 semitones produced systematic improvements in performance. Differences in vocal-tract length produced systematic improvements in performance when the ratio of lengths was 1.08 or greater, particularly when the shorter vocal tract belonged to the target talker. Neither of these manipulations produced improvements in performance as great as those produced by a different-sex talker. Systematic changes in both F0 and vocal-tract length that simulated an incremental shift in gender produced substantially larger improvements in performance than did differences in F0 or vocal-tract length alone. In general, shifting one of two utterances spoken by a female voice towards a male voice produces a greater improvement in performance than shifting male towards female. The increase in performance varied with the intonation patterns of individual talkers, being smallest for those talkers who showed most variability in their intonation patterns between different utterances.  相似文献   

11.
Acoustic effects of the time-varying glottal area due to vocal fold vibration on the laryngeal cavity resonance were investigated based on vocal tract area functions and acoustic analysis. The laryngeal cavity consists of the vestibular and ventricular parts of the larynx, and gives rise to a regional acoustic resonance within the vocal tract, with this resonance imparting an extra formant to the vocal tract resonance pattern. Vocal tract transfer functions of the five Japanese vowels uttered by three male subjects were calculated under open- and closed-glottis conditions. The results revealed that the resonance appears at the frequency region from 3.0 to 3.7 kHz when the glottis is closed and disappears when it is open. Real spectra estimated from open- and closed-glottis periods of vowel sounds also showed the on-off pattern of the resonance within a pitch period. Furthermore, a time-domain acoustic analysis of vowels indicated that the resonance component could be observed as a pitch-synchronized rise-and-fall pattern of the bandpass amplitude. The cyclic nature of the resonance can be explained as the laryngeal cavity acting as a closed tube that generates the resonance during a closed-glottis period, but damps the resonance off during an open-glottis period.  相似文献   

12.
A theory of interaction between the source of sound in phonation and the vocal tract filter is developed. The degree of interaction is controlled by the cross-sectional area of the laryngeal vestibule (epilarynx tube), which raises the inertive reactance of the supraglottal vocal tract. Both subglottal and supraglottal reactances can enhance the driving pressures of the vocal folds and the glottal flow, thereby increasing the energy level at the source. The theory predicts that instabilities in vibration modes may occur when harmonics pass through formants during pitch or vowel changes. Unlike in most musical instruments (e.g., woodwinds and brasses), a stable harmonic source spectrum is not obtained by tuning harmonics to vocal tract resonances, but rather by placing harmonics into favorable reactance regions. This allows for positive reinforcement of the harmonics by supraglottal inertive reactance (and to a lesser degree by subglottal compliant reactance) without the risk of instability. The traditional linear source-filter theory is encumbered with possible inconsistencies in the glottal flow spectrum, which is shown to be influenced by interaction. In addition, the linear theory does not predict bifurcations in the dynamical behavior of vocal fold vibration due to acoustic loading by the vocal tract.  相似文献   

13.
14.
Key voice features--fundamental frequency (F0) and formant frequencies--can vary extensively between individuals. Much of the variation can be traced to differences in the size of the larynx and vocal-tract cavities, but whether these differences in turn simply reflect differences in speaker body size (i.e., neutral vocal allometry) remains unclear. Quantitative analyses were therefore undertaken to test the relationship between speaker body size and voice F0 and formant frequencies for human vowels. To test the taxonomic generality of the relationships, the same analyses were conducted on the vowel-like grunts of baboons, whose phylogenetic proximity to humans and similar vocal production biology and voice acoustic patterns recommend them for such comparative research. For adults of both species, males were larger than females and had lower mean voice F0 and formant frequencies. However, beyond this, F0 variation did not track body-size variation between the sexes in either species, nor within sexes in humans. In humans, formant variation correlated significantly with speaker height but only in males and not in females. Implications for general vocal allometry are discussed as are implications for speech origins theories, and challenges to them, related to laryngeal position and vocal tract length.  相似文献   

15.
Although there has been continuing interest in voice quality, much of this research has focused on the vocal folds rather than the supraglottal structures. This paper reports the use of videoendoscopy for studying supraglottal participation in various singing tasks. In a preliminary study presented last year by the present authors, CT scanning was used to corroborate videoendoscopic observation. Vocal tract activities observed included variation of laryngeal height with pitch, variation of pharyngeal wall dimension with pitch and vowel, and marked supraglottic constriction with certain vocal imitations. In order to gain a better understanding of vocal training, and its effect upon vocal tract physiology, a study was designed using videoendoscopy to observe singers with significant experience and training while performing various vocal tasks. The tasks focused on the following: (1) vocal tract activity associated with pitch changes; (2) the physiology involved in the production of “cover”; (3) the structures involved in the production of vibrato; and (4) the physiology of the singer's “ring.” It would appear that videoendoscopy will become increasingly more valuable to the voice community as our understanding of vocal tract physiology improves.  相似文献   

16.
The purpose of this study was to investigate the spatial similarity of vocal tract shaping patterns across speakers and the similarity of their acoustic effects. Vocal tract area functions for 11 American English vowels were obtained from six speakers, three female and three male, using magnetic resonance imaging (MRI). Each speaker's set of area functions was then decomposed into mean area vectors and representative modes (eigenvectors) using principal components analysis (PCA). Three modes accounted for more than 90% of the variance in the original data sets for each speaker. The general shapes of the first two modes were found to be highly correlated across all six speakers. To demonstrate the acoustic effects of each mode, both in isolation and combined, a mapping between the mode scaling coefficients and [F1, F2] pairs was generated for each speaker. The mappings were unique for all six speakers in terms of the exact shape of the [F1, F2] vowel space, but the general effect of the modes was the same in each case. The results support the idea that the modes provide a common system for perturbing a unique underlying neutral vocal tract shape.  相似文献   

17.
This letter analyzes the oscillation onset-offset conditions of the vocal folds as a function of laryngeal size. A version of the two-mass model of the vocal folds is used, coupled to a two-tube approximation of the vocal tract in configuration for the vowel /a/. The standard male configurations of the laryngeal and vocal tract models are used as reference, and their dimensions are scaled using a single factor. Simulations of the vocal fold oscillation and oral output are produced for varying values of the scaling factor. The results show that the oscillation threshold conditions become more restricted for smaller laryngeal sizes, such as those appropriate for females and children.  相似文献   

18.
There is information in speech sounds about the length of the vocal tract; specifically, as a child grows, the resonators in the vocal tract grow and the formant frequencies of the vowels decrease. It has been hypothesized that the auditory system applies a scale transform to all sounds to segregate size information from resonator shape information, and thereby enhance both size perception and speech recognition [Irino and Patterson, Speech Commun. 36, 181-203 (2002)]. This paper describes size discrimination experiments and vowel recognition experiments designed to provide evidence for an auditory scaling mechanism. Vowels were scaled to represent people with vocal tracts much longer and shorter than normal, and with pitches much higher and lower than normal. The results of the discrimination experiments show that listeners can make fine judgments about the relative size of speakers, and they can do so for vowels scaled well beyond the normal range. Similarly, the recognition experiments show good performance for vowels in the normal range, and for vowels scaled well beyond the normal range of experience. Together, the experiments support the hypothesis that the auditory system automatically normalizes for the size information in communication sounds.  相似文献   

19.
This paper presents experimental results that quantify the range of influence of vocal tract manipulations used in saxophone performance. The experiments utilized a measurement system that provides a relative comparison of the upstream windway and downstream air column impedances under normal playing conditions, allowing researchers and players to investigate the effect of vocal-tract manipulations in real time. Playing experiments explored vocal-tract influence over the full range of the saxophone, as well as when performing special effects such as pitch bending, multiphonics, and "bugling." The results show that, under certain conditions, players can create an upstream windway resonance that is strong enough to override the downstream system in controlling reed vibrations. This can occur when the downstream air column provides only weak support of a given note or effect, especially for notes with fundamental frequencies an octave below the air column cutoff frequency and higher. Vocal-tract influence is clearly demonstrated when pitch bending notes high in the traditional range of the alto saxophone and when playing in the saxophone's extended register. Subtle timbre variations via tongue position changes are possible for most notes in the saxophone's traditional range and can affect spectral content from at least 800-2000 Hz.  相似文献   

20.
This article describes the results of two experiments. Experiment 1 was a cross-sectional study designed to explore developmental and cross-linguistic variation in the vowel space of 10- to 18-month-old infants, exposed to either Canadian English or Canadian French. Acoustic parameters of the infant vowel space were described (specifically the mean and standard deviation of the first and second formant frequencies) and then used to derive the grave, acute, compact, and diffuse features of the vowel space across age. A decline in mean F1 with age for French-learning infants and a decline in mean F2 with age for English-learning infants was observed. A developmental expansion of the vowel space into the high-front and high-back regions was also evident. In experiment 2, the Variable Linear Articulatory Model was used to model the infant vowel space taking into consideration vocal tract size and morphology. Two simulations were performed, one with full range of movement for all articulatory paramenters, and the other for movement of jaw and lip parameters only. These simulated vowel spaces were used to aid in the interpretation of the developmental changes and cross-linguistic influences on vowel production in experiment 1.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号