Similar Documents
1.
Two auditory feedback perturbation experiments were conducted to examine the nature of control of the first two formants in vowels. In the first experiment, talkers heard their auditory feedback with either F1 or F2 shifted in frequency. Talkers altered production of the perturbed formant by changing its frequency in the direction opposite to the perturbation but did not produce a correlated alteration of the unperturbed formant. Thus, the motor control system is capable of fine-grained independent control of F1 and F2. In the second experiment, a large meta-analysis was conducted on data from talkers who received feedback in which both F1 and F2 had been perturbed. A moderate correlation was found between individual compensations in F1 and F2, suggesting that the control of F1 and F2 is processed in a common manner at some level. While a wide range of individual compensation magnitudes was observed, no significant correlations were found between individuals' compensations and vowel space differences. Similarly, no significant correlations were found between individuals' compensations and variability in normal vowel production. Further, when receiving normal auditory feedback, most of the population exhibited no significant correlation between the natural variation in production of F1 and F2.

2.
Research on the perception of vowels in the last several years has given rise to new conceptions of vowels as articulatory, acoustic, and perceptual events. Starting from a "simple" target model in which vowels were characterized articulatorily as static vocal tract shapes and acoustically as points in a first and second formant (F1/F2) vowel space, this paper briefly traces the evolution of vowel theory in the 1970s and 1980s in two directions. (1) Elaborated target models represent vowels as target zones in perceptual spaces whose dimensions are specified as formant ratios. These models have been developed primarily to account for perceivers' solution of the "speaker normalization" problem. (2) Dynamic specification models emphasize the importance of formant trajectory patterns in specifying vowel identity. These models deal primarily with the problem of "target undershoot" associated with the coarticulation of vowels with consonants in natural speech and with the issue of "vowel-inherent spectral change" or diphthongization of English vowels. The perceptual studies that motivated these theoretical developments are summarized.

3.
The purpose of this study was to examine the acoustic characteristics of children's speech and voices that account for listeners' ability to identify gender. In Experiment I, vocal recordings and gross physical measurements of 4-, 8-, 12-, and 16-year-olds were taken (10 girls and 10 boys per age group). The speech sample consisted of seven nondiphthongal vowels of American English (/æ/ "had," /ɛ/ "head," /i/ "heed," /ɪ/ "hid," /a/ "hod," /ʌ/ "hud," and /u/ "who'd") produced in the carrier phrase "Say /hVd/ again." Fundamental frequency (f0) and formant frequencies (F1, F2, F3) were measured from these syllables. In Experiment II, 20 adults rated the syllables produced by the children in Experiment I on a six-point gender rating scale. The results from these experiments indicate that (1) vowel formant frequencies differentiate gender for children as young as four years of age, while both formant frequencies and f0 differentiate gender after 12 years of age; (2) the relationship between gross measures of physical size and vocal characteristics is apparent for at least the 12- and 16-year-olds; and (3) listeners can identify gender from the speech and voice of children as young as four years of age and, for young children, appear to base their gender ratings on vowel formant frequencies. The findings are discussed in relation to the development of gender identity and its perceptual representation in speech and voice.

4.
The acoustic effects of the adjustment in vocal effort that is required when the distance between speaker and addressee is varied over a large range (0.3-187.5 m) were investigated in phonated and, at shorter distances, also in whispered speech. Several characteristics were studied in the same sentence produced by men, women, and 7-year-old boys and girls: duration of vowels and consonants, pausing and occurrence of creaky voice, mean and range of F0, certain formant frequencies (F1 in [a] and F3), sound-pressure level (SPL) of voiced segments and [s], and spectral emphasis. In addition to levels and emphasis, vowel duration, F0, and F1 were substantially affected. "Vocal effort" was defined as the communication distance estimated by a group of listeners for each utterance. Most of the observed effects correlated better with this measure than with the actual distance, since some additional factors affected the speakers' choice. Differences between speaker groups emerged in segment durations, pausing behavior, and in the extent to which the SPL of [s] was affected. The whispered versions are compared with the phonated versions produced by the same speakers at the same distance. Several effects of whispering are found to be similar to those of increasing vocal effort.

5.
An evaluation of vowel normalization procedures for the purpose of studying language variation is presented. The procedures were compared on how effectively they (a) preserve phonemic information, (b) preserve information about the talker's regional background (or sociolinguistic information), and (c) minimize anatomical/physiological variation in acoustic representations of vowels. Recordings were made of 80 female talkers and 80 male talkers of Dutch. These talkers were stratified according to their gender and regional background. The normalization procedures were applied to measurements of the fundamental frequency and the first three formant frequencies for a large set of vowel tokens. The normalization procedures were evaluated through statistical pattern analysis. The results show that normalization procedures that use information across multiple vowels ("vowel-extrinsic" information) to normalize a single vowel token performed better than those that include only information contained in the vowel token itself ("vowel-intrinsic" information). Furthermore, the results show that normalization procedures that operate on individual formants performed better than those that use information across multiple formants (e.g., "formant-extrinsic" F2-F1).
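
To make the "vowel-extrinsic, formant-intrinsic" category concrete, here is a minimal sketch of one widely used procedure of that type, Lobanov's z-score transform: each formant is standardized with the mean and standard deviation of that same formant across all of one talker's tokens. The abstract does not list which procedures were compared, so this is only an illustration of the class; the function name `lobanov` and the input layout (one list of formant values per token, for a single talker) are assumptions.

```python
from statistics import mean, pstdev


def lobanov(tokens):
    """Z-score each formant across all of one talker's vowel tokens.

    tokens: list of per-token formant lists, e.g. [[F1, F2], ...] in Hz,
    all from the same talker. Vowel-extrinsic (uses every token of the
    talker) but formant-intrinsic (each formant is scaled on its own).
    """
    n_formants = len(tokens[0])
    stats = []
    for i in range(n_formants):
        values = [token[i] for token in tokens]
        # Population SD; a formant with zero variance would divide by zero.
        stats.append((mean(values), pstdev(values)))
    return [[(token[i] - m) / s for i, (m, s) in enumerate(stats)] for token in tokens]
```

Because every token of the talker contributes to the mean and SD, differences in overall vocal tract size between talkers are largely removed, while the relative positions of that talker's vowels are kept.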

6.
Vowel identity correlates well with the shape of the transfer function of the vocal tract, in particular the positions of the first two or three formant peaks. However, in voiced speech the transfer function is sampled at multiples of the fundamental frequency (F0), and the short-term spectrum contains peaks at those frequencies rather than at the formants. It is not clear how the auditory system estimates the original spectral envelope from the vowel waveform. Cochlear excitation patterns, for example, resolve harmonics in the low-frequency region, and their shape varies strongly with F0. The problem cannot be cured by smoothing: lag-domain components of the spectral envelope are aliased and cause F0-dependent distortion. The problem is particularly severe at high F0s, where the spectral envelope is heavily undersampled. This paper treats vowel identification as a process of pattern recognition with missing data. Matching is restricted to the available data, and missing data are ignored by means of an F0-dependent weighting function that emphasizes regions near harmonics. The model is presented in two versions: a frequency-domain version based on short-term spectra or tonotopic excitation patterns, and a time-domain version based on autocorrelation functions. It accounts for the relative F0 independence observed in vowel identification.
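
The frequency-domain idea of matching "only where data exist" can be sketched as a weighted spectral distance whose weights peak at the harmonics of F0 and fall to nearly zero between them. This is a simplified illustration, not the paper's model: the Gaussian weight shape, its width (10% of F0), the dB spectra sampled on a shared frequency grid, and the names `harmonic_weights`, `weighted_distance`, and `identify_vowel` are all hypothetical choices.

```python
import math


def harmonic_weights(freqs_hz, f0_hz, width_fraction=0.1):
    """Gaussian weight peaking at each harmonic k*f0 (width is a hypothetical choice)."""
    width = width_fraction * f0_hz
    weights = []
    for f in freqs_hz:
        k = max(1, round(f / f0_hz))  # index of the nearest harmonic
        offset = f - k * f0_hz
        weights.append(math.exp(-0.5 * (offset / width) ** 2))
    return weights


def weighted_distance(observed_db, template_db, weights):
    """Weighted mean squared difference; near-zero weights effectively ignore missing data."""
    total = sum(w * (o - t) ** 2 for w, o, t in zip(weights, observed_db, template_db))
    return total / sum(weights)


def identify_vowel(observed_db, templates, freqs_hz, f0_hz):
    """Return the name of the template envelope closest to the observed spectrum."""
    weights = harmonic_weights(freqs_hz, f0_hz)
    return min(templates, key=lambda name: weighted_distance(observed_db, templates[name], weights))
```

With uniform weights, spectral values between harmonics (which carry no information about the envelope) count as much as values at the harmonics; the F0-dependent weights discount them, so the decision rests on the envelope samples that the harmonics actually provide.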

7.
The effects of variations in vocal effort corresponding to common conversational situations on the spectral properties of vowels were investigated. A database was recorded in which three degrees of vocal effort were induced in the speakers by varying the distance to their interlocutor in three steps (close: 0.4 m, normal: 1.5 m, and far: 6 m). The speech materials consisted of isolated French vowels uttered by ten naive speakers in a quiet furnished room. Manual measurements were made of the fundamental frequency F0, the frequencies and amplitudes of the first three formants (F1, F2, F3, A1, A2, and A3), and the total amplitude. The speech materials were perceptually validated in three respects: identity of the vowel, gender of the speaker, and vocal effort. Results indicated that the speech materials were appropriate for the study. Acoustic analysis showed that F0 and F1 were highly correlated with vocal effort and varied at rates close to 5 Hz/dB for F0 and 3.5 Hz/dB for F1. F2 and F3 did not vary significantly with vocal effort. Formant amplitudes A1, A2, and A3 increased significantly; the amplitudes in the high-frequency range increased more than those in the lower part of the spectrum, revealing a change in spectral tilt. On average, when the overall amplitude is increased by 10 dB, A1, A2, and A3 increase by 11, 12.4, and 13 dB, respectively. Using "auditory" dimensions, such as the F1-F0 difference and a "spectral center of gravity" between adjacent formants, to represent vowel features did not reveal better constancy of these parameters across variations in vocal effort and speaker. Thus a global view is proposed, in which all aspects of the signal should be processed simultaneously.
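
The reported rates lend themselves to simple arithmetic. The sketch below turns them into predicted changes for a given increase in overall level. It assumes the relations stay linear for level changes other than the 10 dB reference, which the abstract does not claim; the constant and function names are illustrative.

```python
# Approximate rates reported in the abstract.
F0_HZ_PER_DB = 5.0
F1_HZ_PER_DB = 3.5
# Formant-amplitude increases (dB) reported for a 10 dB overall increase.
AMPLITUDE_GAIN_AT_10_DB = {"A1": 11.0, "A2": 12.4, "A3": 13.0}


def predict_effort_changes(level_change_db):
    """Scale the reported rates linearly to another overall level change (an assumption)."""
    changes = {
        "F0_hz": F0_HZ_PER_DB * level_change_db,
        "F1_hz": F1_HZ_PER_DB * level_change_db,
    }
    for name, gain in AMPLITUDE_GAIN_AT_10_DB.items():
        changes[name + "_db"] = gain * level_change_db / 10.0
    return changes
```

For a 10 dB increase this gives roughly +50 Hz in F0 and +35 Hz in F1, and A3 rises 2 dB more than A1 (13 vs. 11 dB), which is the flattening of spectral tilt described above.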

8.
Although listeners routinely perceive both the sex and individual identity of talkers from their speech, explanations of these abilities are incomplete. Here, variation in vocal production-related anatomy was assumed to affect vowel acoustics thought to be critical for indexical cueing. Integrating this approach with source-filter theory, patterns of acoustic parameters that should represent sex and identity were identified. Due to sexual dimorphism, the combination of fundamental frequency (F0, reflecting larynx size) and vocal tract length cues (VTL, reflecting body size) was predicted to provide the strongest acoustic correlates of talker sex. Acoustic measures associated with presumed variations in supralaryngeal vocal tract-related anatomy occurring within sex were expected to be prominent in individual talker identity. These predictions were supported by results of analyses of 2500 tokens of the /ɛ/ phoneme, extracted from the naturally produced speech of 125 subjects. Classification by talker sex was virtually perfect when F0 and VTL were used together, whereas talker classification depended primarily on the various acoustic parameters associated with vocal-tract filtering.
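
A common way to obtain a VTL cue from formants (not necessarily the one used in this study) treats the vocal tract as a uniform tube closed at the glottis and open at the lips, so that F_n = (2n - 1)c / (4L). The sketch below fits the formant spacing c/(2L) by least squares through the origin and converts it to a length. The speed of sound (35,000 cm/s) and the function name are assumptions.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # assumed value for warm, humid air in the vocal tract


def estimate_vtl_cm(formants_hz, c=SPEED_OF_SOUND_CM_S):
    """Apparent vocal tract length from F1..Fn under a uniform quarter-wave tube model.

    Model: F_n = (2n - 1) * spacing / 2, with spacing = c / (2 * L).
    The spacing is fitted by least squares through the origin.
    """
    odd = [2 * n - 1 for n in range(1, len(formants_hz) + 1)]
    spacing = 2 * sum(k * f for k, f in zip(odd, formants_hz)) / sum(k * k for k in odd)
    return c / (2 * spacing)
```

For example, formants at 500, 1500, 2500, and 3500 Hz correspond exactly to a 17.5 cm tube. Real vowels deviate from the uniform tube, so the estimate is an "apparent" length that is most informative when averaged over many tokens.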

9.

Background  

The cortical activity underlying the perception of vowel identity has typically been addressed by manipulating the first and second formant frequencies (F1 and F2) of the speech stimuli. These two values, originating from articulation, are already sufficient for the phonetic characterization of vowel category. In the present study, we investigated how the spectral cues caused by articulation are reflected in cortical speech processing when combined with phonation, the other major part of speech production, manifested as the fundamental frequency (F0) and its integer multiples (harmonics). To study the combined effects of articulation and phonation, we presented vowels with either high (/a/) or low (/u/) formant frequencies, driven by three different types of excitation: a natural periodic pulseform reflecting the vibration of the vocal folds, an aperiodic noise excitation, or a tonal waveform. The auditory N1m response was recorded with whole-head magnetoencephalography (MEG) from ten human subjects to determine whether brain events reflecting articulation and phonation are specific to the left or right hemisphere of the human brain.

10.
Thresholds of vowel formant discrimination for F1 and F2 of isolated vowels with full and partial vowel spectra were measured for normal-hearing listeners at fixed and roving speech levels. Formant discrimination was significantly better at fixed levels than at roving levels, with both full and partial spectra. The effect of vowel spectral range was present only for roving levels, not for fixed levels. These results, consistent with studies of profile analysis, indicate that listeners use different perceptual mechanisms to discriminate vowel formant frequency at fixed and roving levels.

11.
Static, dynamic, and relational properties in vowel perception
The present work reviews theories and empirical findings, including results from two new experiments, that bear on the perception of English vowels, with an emphasis on the comparison of data analytic "machine recognition" approaches with results from speech perception experiments. Two major sources of variability (viz., speaker differences and consonantal context effects) are addressed from the classical perspective of overlap between vowel categories in F1 x F2 space. Various approaches to the reduction of this overlap are evaluated. Two types of speaker normalization are considered. "Intrinsic" methods based on relationships among the steady-state properties (F0, F1, F2, and F3) within individual vowel tokens are contrasted with "extrinsic" methods, involving the relationships among the formant frequencies of the entire vowel system of a single speaker. Evidence from a new experiment supports Ainsworth's (1975) conclusion [W. Ainsworth, Auditory Analysis and Perception of Speech (Academic, London, 1975)] that both types of information have a role to play in perception. The effects of consonantal context on formant overlap are also considered. A new experiment is presented that extends Lindblom and Studdert-Kennedy's finding [B. Lindblom and M. Studdert-Kennedy, J. Acoust. Soc. Am. 43, 840-843 (1967)] of perceptual effects of consonantal context on vowel perception to /dVd/ and /bVb/ contexts. Finally, the role of vowel-inherent dynamic properties, including duration and diphthongization, is briefly reviewed. All of the above factors are shown to have reliable influences on vowel perception, although the relative weight of such effects and the circumstances that alter these weights remain far from clear. It is suggested that the design of more complex perceptual experiments, together with the development of quantitative pattern recognition models of human vowel perception, will be necessary to resolve these issues.

12.
The present study aimed to examine the size of the acoustic vowel space in talkers who had previously been identified as having slow and fast habitual speaking rates [Tsao, Y.-C. and Weismer, G. (1997) J. Speech Lang. Hear. Res. 40, 858-866]. Within talkers, it is fairly well known that faster speaking rates result in a compression of the vowel space relative to that measured at slower rates, so the current study was conducted to determine whether the same differences in vowel space size occur across talkers who differ significantly in their habitual speaking rates. Results indicated no difference in the average size of the vowel space for slow vs. fast talkers, and no relationship across talkers between vowel duration and formant frequencies. One difference between the slow and fast talkers was in the intertalker variability of the vowel spaces, which was clearly greater for the slow talkers, for both speaker sexes. Results are discussed relative to theories of speech production and vowel normalization in speech perception.
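
"Size of the acoustic vowel space" is often operationalized as the area of the polygon whose corners are a talker's mean (F1, F2) values for the corner vowels, e.g. /i/, /æ/, /ɑ/, /u/. The abstract does not state which vowels or which area measure were used, so the following shoelace-formula sketch is an assumption-labeled illustration; the function name is hypothetical.

```python
def polygon_area(points):
    """Area of a simple polygon via the shoelace formula.

    points: (F1, F2) pairs in Hz, listed in order around the polygon
    (clockwise or counterclockwise). The result is in Hz^2.
    """
    n = len(points)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0
```

Comparing slow and fast talkers would then amount to comparing the distributions of these per-talker areas; the between-talker spread of the areas corresponds to the intertalker variability mentioned above.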

13.
This study presents various acoustic measures used to examine the sequence /a # C/, where "#" represents different prosodic boundaries in French. The six consonants studied are /b d g f s ʃ/ (three stops and three fricatives). The prosodic units investigated are the utterance, the intonational phrase, the accentual phrase, and the word. It is found that vowel target values, formant transitions into the stop consonant, and the rate of change in spectral tilt into the fricative are affected by the strength of the prosodic boundary. F1 for /a/ becomes higher the stronger the prosodic boundary, with the exception of one speaker's utterance-level data, which show the effects of articulatory declension at the utterance level. Various effects of the stop consonant context are observed, the most notable being a tendency for the vowel /a/ to be displaced in the direction of the F2 consonant "locus" for /d/ (whose F2 consonant values remain relatively stable across prosodic boundaries) and for /g/ (whose F2 consonant values are displaced, together with those of the vowel, in the direction of the velar locus at weaker prosodic boundaries). The velocity of the formant transition may be affected by the prosodic boundary (with greater velocity at weaker boundaries), though results are not consistent across speakers. There is also a tendency for the rate of change in spectral tilt from the vowel to the fricative to be affected by the presence of a prosodic boundary, with a greater rate of change at weaker prosodic boundaries. It is suggested that spectral cues, in addition to duration, amplitude, and F0 cues, may alert listeners to the presence of a prosodic boundary.

14.
The effect of speaking rate variations on second formant (F2) trajectories was investigated for a continuum of rates. F2 trajectories for the schwa preceding a voiced bilabial stop, and for one of three target vocalic nuclei following the stop, were generated for utterances of the form "Put a bV here," where V was /i/, /æ/, or /ɔɪ/. Discrete spectral measures at the vowel-consonant and consonant-vowel interfaces, as well as vowel target values, were examined as potential parameters of rate variation; several different whole-trajectory analyses were also explored. Results suggested that a discrete measure at the vowel-consonant (schwa-consonant) interface, the F2 offset value (F2off), was in many cases a good index of rate variation, provided the rates were not unusually slow (vowel durations less than 200 ms). The spectral measure at the consonant-vowel interface (F2 onset), like the "target" value for this vowel, was less clearly related to rate variation. Whole-trajectory analyses indicated that the rate effect cannot be captured by linear compressions and expansions of some prototype trajectory. Moreover, the effect of rate manipulation on formant trajectories interacts with speaker and vocalic nucleus type, making it difficult to specify general rules for these effects. However, there is evidence that a small number of speaker strategies may emerge from a careful qualitative and quantitative analysis of whole formant trajectories. Results are discussed in terms of models of speech production and of a group of speech disorders usually associated with anomalies of speaking rate, and hence of formant frequency trajectories.
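
The "linear compressions and expansions of some prototype trajectory" that the results argue against can be made concrete as linear time normalization: resample a trajectory to a fixed number of points so that trajectories of different durations can be compared sample by sample. This is a minimal sketch, assuming equally spaced F2 samples; the function name and sampling scheme are illustrative, not the study's procedure.

```python
def time_normalize(trajectory, n_points):
    """Resample equally spaced formant samples to n_points by linear interpolation.

    This is a purely linear stretch or compression of the time axis, which is
    the kind of rate model the whole-trajectory analyses found inadequate.
    """
    if len(trajectory) < 2 or n_points < 2:
        raise ValueError("need at least two input samples and two output points")
    last = len(trajectory) - 1
    resampled = []
    for j in range(n_points):
        pos = j * last / (n_points - 1)   # position on the original sample axis
        i = min(int(pos), last - 1)       # left neighbor (clamped at the end)
        frac = pos - i
        resampled.append(trajectory[i] + frac * (trajectory[i + 1] - trajectory[i]))
    return resampled
```

If rate effects were purely linear in time, the normalized trajectories of slow and fast tokens would nearly coincide; systematic residual differences after this normalization are what indicate nonlinear rate effects.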

15.
Peta White. Journal of Voice, 1999, 13(4): 570-582
High-pitched productions present difficulties in formant frequency analysis due to wide harmonic spacing and poorly defined formants. As a consequence, there are few reliable data on children's spoken or sung vowel formants. Twenty-nine 11-year-old Swedish children were asked to produce four sustained spoken and sung vowels. To circumvent the problem of wide harmonic spacing, F1 and F2 measurements were taken from vowels produced with a sweeping F0. Experienced choir singers were selected as subjects in order to minimize the larynx height adjustments associated with pitch variation in less skilled singers. Results showed significantly higher formant frequencies for speech than for singing. Formants were consistently higher in girls than in boys, suggesting longer vocal tracts in these preadolescent boys. Furthermore, formant scaling demonstrated vowel-dependent differences between boys and girls, suggesting non-uniform differences in male and female vocal tract dimensions. These vowel-dependent sex differences were not consistent with adult data.

16.
A quantitative perceptual model of human vowel recognition based upon psychoacoustic and speech perception data is described. At an intermediate auditory stage of processing, the specific bark difference level of the model represents the pattern of peripheral auditory excitation as the distance in critical bands (barks) between neighboring formants and between the fundamental frequency (F0) and first formant (F1). At a higher, phonetic stage of processing, represented by the critical bark difference level of the model, the transformed vowels may be dichotomously classified based on whether the difference between formants in each dimension falls within or exceeds the critical distance of 3 bark for the spectral center of gravity effect [Chistovich et al., Hear. Res. 1, 185-195 (1979)]. Vowel transformations and classifications correspond well to several major phonetic dimensions and features by which vowels are perceived and traditionally classified. The F1-F0 dimension represents vowel height, and high vowels have F1-F0 differences within 3 bark. The F3-F2 dimension corresponds to vowel place of articulation, and front vowels have F3-F2 differences of less than 3 bark. As an inherent, speaker-independent normalization procedure, the model provides excellent vowel clustering while it greatly reduces between-speaker variability. It offers robust normalization through feature classification because gross binary categorization allows for considerable acoustic variability. There was generally less formant and bark difference variability for closely spaced formants than for widely spaced formants. These findings agree with independently observed perceptual results and support Stevens' quantal theory of vowel production and perceptual constraints on production predicted from the critical bark difference level of the model.
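
The two binary decisions of the critical bark difference level can be sketched directly: convert F0 through F3 to bark, then test whether F1-F0 and F3-F2 fall within the 3-bark critical distance. The abstract does not say which Hz-to-bark conversion was used; this sketch assumes Traunmüller's (1990) approximation without its low- and high-frequency corrections. The function names and the example formant values are illustrative.

```python
def hz_to_bark(f_hz):
    """Traunmueller (1990) approximation of critical-band rate (assumed conversion)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


CRITICAL_DISTANCE_BARK = 3.0


def classify_vowel(f0, f1, f2, f3):
    """Binary height and place features from bark differences (frequencies in Hz)."""
    height_dim = hz_to_bark(f1) - hz_to_bark(f0)  # F1-F0: small difference -> high vowel
    place_dim = hz_to_bark(f3) - hz_to_bark(f2)   # F3-F2: small difference -> front vowel
    return {
        "high": height_dim < CRITICAL_DISTANCE_BARK,
        "front": place_dim < CRITICAL_DISTANCE_BARK,
        "F1-F0_bark": height_dim,
        "F3-F2_bark": place_dim,
    }
```

With illustrative adult-male-like values, an /i/-like token (F0 130, F1 270, F2 2290, F3 3010 Hz) has F1-F0 of about 1.6 bark and F3-F2 of about 1.8 bark, so it is classified as high and front, whereas an /ɑ/-like token (F0 124, F1 730, F2 1090, F3 2440 Hz) exceeds 3 bark on both dimensions.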

17.
Previous studies suggest that speakers are systematically inaccurate, or biased, when imitating self-produced vowels. The direction of these biases in formant space and their variation may offer clues about the organization of the vowel perceptual space. To examine these patterns, three male speakers were asked to imitate 45 self-produced vowels that were systematically distributed in F1/F2 space. All three speakers showed imitation bias, and the bias magnitudes were significantly larger than those predicted by a model of articulatory noise. Each speaker showed a different pattern of bias directions, but the pattern was unrelated to the locations of prototypical vowels produced by that speaker. However, there were substantial quantitative regularities: (1) the distributions of imitation variability and bias magnitudes were similar for all speakers, (2) the imitation variability was independent of the bias magnitudes, and (3) the imitation variability (a production measure) was commensurate with the formant discrimination limen (a perceptual measure). These results indicate that there is additive Gaussian noise in the imitation process that independently affects each formant and that there are speaker-dependent and potentially nonlinguistic biases in vowel perception and production.

18.
The formant hypothesis of vowel perception, in which the lowest two or three formant frequencies are essential cues for vowel quality perception, is widely accepted. There has, however, been some controversy, with suggestions that formant frequencies are not sufficient and that the whole spectral shape is necessary for perception. Three psychophysical experiments were performed to study this question. In the first experiment, the first or second formant peak of the stimuli was suppressed as much as possible while still maintaining the original spectral shape. The responses to these stimuli were not radically different from those for the unsuppressed control. In the second experiment, F2-suppressed stimuli whose amplitude ratios of high- to low-frequency components were systematically changed were used. The results indicate that changes in this ratio can affect perceived vowel quality, especially its place of articulation. In the third experiment, full-formant stimuli whose amplitude ratios were changed from the original and whose F2s were kept constant were used. The results suggest that the amplitude ratio is as effective as or more effective than F2 as a cue for place of articulation. We conclude that formant frequencies are not exclusive cues and that the whole spectral shape can be crucial for vowel perception.

19.
The first three formant frequencies for 778 steady-state tokens of 30 nonretroflex vowel types uttered by a female speaker are found to lie close to a piecewise-planar surface, expressed numerically as $0.634F_1 + 0.603F_2 - 0.485F_3 - 366 = 0$ for $F_2 > 0.027F_1 + 1692$, and $0.686F_1 - 0.528F_2 - 0.501F_3 + 1569 = 0$ otherwise. The rms distance of the vowels from this surface is only 86 Hz. The intersection between the two planes is a line of nearly constant F2, corresponding closely to the F2 of a uniform vocal tract of the same length as our speaker's. The piecewise-planar representation also suggests a way to test the hypotheses of uniform and nonuniform formant-frequency scaling between speakers.
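
The two plane equations can be used to measure how far a token lies from the surface. A minimal sketch, assuming the distance is taken to whichever plane the F2 condition selects (the abstract does not say exactly how the rms distance was computed, and near the seam the truly nearest point could lie on the other plane). The coefficient tuples are copied from the equations above; the function names are illustrative.

```python
import math

# (a, b, c, d) for a*F1 + b*F2 + c*F3 + d = 0, coefficients from the abstract.
PLANE_HIGH_F2 = (0.634, 0.603, -0.485, -366.0)  # used when F2 > 0.027*F1 + 1692
PLANE_LOW_F2 = (0.686, -0.528, -0.501, 1569.0)  # used otherwise


def distance_to_surface(f1, f2, f3):
    """Euclidean distance (Hz) from (F1, F2, F3) to the plane selected by the F2 rule."""
    a, b, c, d = PLANE_HIGH_F2 if f2 > 0.027 * f1 + 1692 else PLANE_LOW_F2
    return abs(a * f1 + b * f2 + c * f3 + d) / math.sqrt(a * a + b * b + c * c)


def rms_distance(tokens):
    """Root-mean-square distance of (F1, F2, F3) tokens from the surface."""
    return math.sqrt(sum(distance_to_surface(*t) ** 2 for t in tokens) / len(tokens))
```

Both coefficient vectors have a norm very close to 1, so the raw residual $aF_1 + bF_2 + cF_3 + d$ is already almost exactly a distance in Hz; dividing by the norm simply makes that explicit.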

20.
Selective adaptation and anchoring effects in speech perception have generated several different hypotheses regarding the nature of contextual contrast, including auditory/phonetic feature detector fatigue, response bias, and auditory contrast. In the present study, three different seven-step [hɪd]-[hɛd] continua were constructed to represent a low F0 (long vocal tract source), a high F0 (long vocal tract source), and a high F0 (short vocal tract source), respectively. Subjects identified the tokens from each of the stimulus continua under two conditions: an equiprobable control and an anchoring condition in which an endpoint stimulus from one of the three continua occurred at least three times more often than any other single stimulus. Differential contrast effects were found depending on whether the anchor differed from the test stimuli in F0, in absolute formant frequencies, or in both. Results were inconsistent with both the feature detector fatigue and response bias hypotheses. Rather, the data suggest that vowel contrast occurs on the basis of normalized formant values, thus supporting a version of the auditory-contrast theory.
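
A seven-step continuum is typically built by interpolating synthesis parameters between two endpoint vowels. The abstract does not say how the steps were spaced, so the sketch below simply assumes equal steps in Hz for each formant (bark or log spacing would be equally plausible choices). The function name and the endpoint values used in the usage line are hypothetical.

```python
def continuum(start, end, steps=7):
    """Equal-Hz interpolation of each formant from the start to the end vowel.

    start, end: tuples of formant frequencies in Hz, e.g. (F1, F2).
    Returns `steps` tuples, the first equal to start and the last equal to end.
    """
    if steps < 2:
        raise ValueError("a continuum needs at least two steps")
    return [
        tuple(s + (e - s) * i / (steps - 1) for s, e in zip(start, end))
        for i in range(steps)
    ]
```

For example, `continuum((400, 2000), (580, 1800))` gives seven (F1, F2) pairs running from a hypothetical [hɪd]-like to a hypothetical [hɛd]-like endpoint; in an anchoring condition, one endpoint would then be presented more often than the other stimuli.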

