Similar literature
 20 similar documents found (search time: 593 ms)
1.
Field studies indicate that Japanese macaque (Macaca fuscata) communication signals vary with the social situation in which they occur [S. Green, "Variation of vocal pattern with social situation in the Japanese monkey (Macaca fuscata): A field study," in Primate Behavior, edited by L. A. Rosenblum (Academic, New York, 1975), Vol. 4]. A significant acoustic property of the contact calls produced by these primates is the temporal position of a frequency peak within the vocalization, that is, an inflection from rising to falling frequency [May et al., "Significant features of Japanese macaque communication sounds: A psychophysical study," Anim. Behav. 36, 1432-1444 (1988)]. The experiments reported here are based on the hypothesis that Japanese macaques derive meaning from this temporally graded feature by parceling the acoustic variation inherent in natural contact calls into two functional categories, and thus exhibit behavior that is analogous to the categorical perception of speech sounds by humans. To test this hypothesis, Japanese macaques were trained to classify natural contact calls by performing operant responses that signified either an early or late frequency peak position. Then, the subjects were tested in a series of experiments that required them to generalize this behavior to synthetic calls representing a continuum of peak positions. Demonstration of the classical perceptual effects noted for human listeners suggests that categorical perception reflects a principle of auditory information processing that influences the perception of sounds in the communication systems not only of humans, but of animals as well.

2.
Although the mammalian larynx exhibits little structural variation compared to sound-producing organs in other taxa (birds or insects), there are some morphological features which could lead to significant differences in acoustic functioning, such as air sacs and vocal membranes. The vocal membrane (or "vocal lip") is a thin upward extension of the vocal fold that is present in many bat and primate species. The vocal membrane was modeled as an additional geometrical element in a two-mass model of the larynx. It was found that vocal membranes of an optimal angle and length can substantially lower the subglottal pressure at which phonation is supported, thus increasing vocal efficiency, and that this effect is most pronounced at high frequencies. The implications of this finding are discussed for animals such as bats and primates which are able to produce loud, high-pitched calls. Modeling efforts such as this provide guidance for future empirical investigations of vocal membrane structure and function, can provide insight into the mechanisms of animal communication, and could potentially lead to better understanding of human clinical disorders such as sulcus vocalis.

3.
4.
Speech perception studies were conducted on three cochlear implant patients to investigate the relative merits of six speech processing schemes for presenting speech information to these patients. Electrical stimuli, described in this article as synthetic vowels, were constructed using tabulated data of formant frequencies of natural vowels. The six schemes differed in the number of formant frequencies encoded on the electrical signal dimension of electrode position, and/or in the range of electrode position used for encoding each formant frequency. Eleven synthetic vowels (i, I, E, ae, a, c, U, u, v, E, D) were used and were presented in a single-interval procedure for absolute identification. Single-formant vowels were used in two of the six schemes, two-formant vowels in three schemes, and three-formant vowels in the remaining scheme. The confusion matrices were subjected to conditional information transmission analysis on the basis of previous psychophysiological findings. Comparisons among the schemes in terms of the analyzed results showed that training, experience, and adaptability to new speech processing schemes were major factors influencing the identification of synthetic vowels. For vowels containing more than one formant, the information about each formant affected the perception of the other formants. In addition, there appeared to be differences between the perceptual processes for vowels containing more than one formant and the processes for single-formant vowels. Taking into consideration the effects of training, experience, and adaptability, the three-formant speech processing scheme appeared, on the basis of perceptual performance comparisons among the six schemes, to be the logical choice for implementation in speech processors for cochlear implant patients.
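
The abstract above analyzes confusion matrices by information transmission. As a rough illustration only, the sketch below computes the plain (unconditional) transmitted information between stimulus and response from a matrix of counts. It is not the conditional variant the study describes, and the function names and the toy matrices are assumptions introduced here, not data from the paper.

```python
import math


def transmitted_information(confusions):
    """Mutual information (in bits) between stimulus (rows) and response (columns).

    `confusions` is a list of rows of non-negative response counts.
    """
    total = sum(sum(row) for row in confusions)
    row_sums = [sum(row) for row in confusions]
    col_sums = [sum(col) for col in zip(*confusions)]
    info = 0.0
    for i, row in enumerate(confusions):
        for j, count in enumerate(row):
            if count > 0:
                # p(i,j) / (p(i) * p(j)) simplifies to count * total / (row * col).
                ratio = count * total / (row_sums[i] * col_sums[j])
                info += (count / total) * math.log2(ratio)
    return info


def stimulus_entropy(confusions):
    """Entropy (in bits) of the stimulus distribution implied by the row totals."""
    total = sum(sum(row) for row in confusions)
    entropy = 0.0
    for row in confusions:
        p = sum(row) / total
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical two-vowel matrices: perfect identification and pure guessing.
perfect = [[10, 0], [0, 10]]
guessing = [[5, 5], [5, 5]]
relative = transmitted_information(perfect) / stimulus_entropy(perfect)
```

Dividing the transmitted information by the stimulus entropy gives a relative score between 0 and 1, which makes schemes with different numbers of stimuli easier to compare.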

5.
Studying female response to variation in single acoustic components has provided important insights into how sexual selection operates on male acoustic signals. However, since vocal signals are typically composed of independent components, it is important to account for possible interactions between the studied parameter and other relevant acoustic features of vocal signals. Here, two key components of the male red deer roar, the fundamental frequency and the formant frequencies (an acoustic cue to body size), are independently manipulated in order to examine female response to calls characterized by different combinations of these acoustic components. The results revealed that red deer hinds showed greater overall attention and had lower response latencies to playbacks of roars where lower formants simulated larger males. Furthermore, female response to male roars simulating different size callers was unaffected by the fundamental frequency of the male roar when it was varied within the natural range. Finally, the fundamental frequency of the male roar had no significant separate effect on any of the female behavioral response categories. Taken together these findings indicate that directional intersexual selection pressures have contributed to the evolution of the highly mobile and descended larynx of red deer stags and suggest that the fundamental frequency of the male roar does not affect female perception of size-related formant information.

6.
Adult male Diana monkeys (Cercopithecus diana) produce predator-specific alarm calls in response to two of their predators, crowned eagles and leopards. The acoustic structure of these alarm calls is remarkable for a number of theoretical and empirical reasons. First, although pulsed phonation has been described in a variety of mammalian vocalizations, very little is known about the underlying production mechanism. Second, Diana monkey alarm calls are based almost exclusively on this vocal production mechanism to an extent that has never been documented in mammalian vocal behavior. Finally, the Diana monkeys' pulsed phonation strongly resembles the pulse register in human speech, where fundamental frequency is mainly controlled by subglottal pressure. Here, we report the results of a detailed acoustic analysis to investigate the production mechanism of Diana monkey alarm calls. Within calls, we found a positive correlation between the fundamental frequency and the pulse amplitude, suggesting that both humans and monkeys control fundamental frequency by subglottal pressure. While in humans pulsed phonation is usually considered pathological or artificial, male Diana monkeys rely exclusively on pulsed phonation, suggesting a functional adaptation. Moreover, we were unable to document any nonlinear phenomena, despite the fact that they occur frequently in the vocal repertoire of humans and nonhumans, further suggesting that the very robust Diana monkey pulse production mechanism has evolved for a particular functional purpose. We discuss the implications of these findings for the structural evolution of Diana monkey alarm calls and suggest that the restricted variability in fundamental frequency and robustness of the source signal gave rise to the formant patterns observed in Diana monkey alarm calls, used to convey predator information.

7.
There is size information in natural sounds. For example, as humans grow in height, their vocal tracts increase in length, producing a predictable decrease in the formant frequencies of speech sounds. Recent studies have shown that listeners can make fine discriminations about which of two speakers has the longer vocal tract, supporting the view that the auditory system discriminates changes on the acoustic-scale dimension. Listeners can also recognize vowels scaled well beyond the range of vocal tracts normally experienced, indicating that perception is robust to changes in acoustic scale. This paper reports two perceptual experiments designed to extend research on acoustic scale and size perception to the domain of musical sounds: The first study shows that listeners can discriminate the scale of musical instrument sounds reliably, although not quite as well as for voices. The second experiment shows that listeners can recognize the family of an instrument sound which has been modified in pitch and scale beyond the range of normal experience. We conclude that processing of acoustic scale in music perception is very similar to processing of acoustic scale in speech perception.
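
The inverse relation between vocal tract length and formant frequencies mentioned above can be illustrated with the textbook idealization of the vocal tract as a uniform tube closed at the glottis and open at the lips, whose resonances are F_n = (2n - 1)c / (4L). This model, the speed of sound of 350 m/s, and the tube lengths below are assumptions chosen for illustration; none of them comes from the abstract.

```python
def tube_formants(length_m, n_formants=3, speed_of_sound=350.0):
    """Resonances (Hz) of an idealized uniform tube, closed at one end, open at the other.

    Quarter-wavelength resonator: F_n = (2n - 1) * c / (4 * L).
    """
    return [
        (2 * n - 1) * speed_of_sound / (4.0 * length_m)
        for n in range(1, n_formants + 1)
    ]


# Hypothetical tract lengths: 17.5 cm versus a tube twice as long.
shorter = tube_formants(0.175)
longer = tube_formants(0.35)
```

Because every resonance scales by the same factor when the length changes, a longer tube shifts the whole formant pattern down uniformly, which is the kind of acoustic-scale change the experiments manipulate.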

8.
Key voice features, namely fundamental frequency (F0) and formant frequencies, can vary extensively between individuals. Much of the variation can be traced to differences in the size of the larynx and vocal-tract cavities, but whether these differences in turn simply reflect differences in speaker body size (i.e., neutral vocal allometry) remains unclear. Quantitative analyses were therefore undertaken to test the relationship between speaker body size and voice F0 and formant frequencies for human vowels. To test the taxonomic generality of the relationships, the same analyses were conducted on the vowel-like grunts of baboons, whose phylogenetic proximity to humans and similar vocal production biology and voice acoustic patterns recommend them for such comparative research. For adults of both species, males were larger than females and had lower mean voice F0 and formant frequencies. However, beyond this, F0 variation did not track body-size variation between the sexes in either species, nor within sexes in humans. In humans, formant variation correlated significantly with speaker height but only in males and not in females. Implications for general vocal allometry are discussed as are implications for speech origins theories, and challenges to them, related to laryngeal position and vocal tract length.

9.
An area function model of the vocal tract is tested for its ability to produce typical vowel formant frequencies with a perturbation at the lips. The model, which consists of a neutral shape and two weighted orthogonal shaping patterns (modes), has previously been shown to produce a nearly one-to-one mapping between formant frequencies and the weighting coefficients of the modes [Story and Titze, J. Phonetics, 26, 223-260 (1998)]. In this study, a perturbation experiment was simulated by imposing a constant area "lip tube" on the model. The mapping between the mode coefficients and formant frequencies was then recomputed with the lip tube in place and showed that formant frequencies (F1 and F2) representative of the vowels [u,o,u] could no longer be produced with the model. However, when the mode coefficients were allowed to exceed their typical bounding values, the mapping between them and the formant frequencies was expanded such that the vowels [u,o,u] were compensated. The area functions generated by these exaggerated coefficients were shown to be similar to vocal-tract shapes reported for real speakers under similar perturbed conditions [Savariaux, Perrier, and Orliaguet, J. Acoust. Soc. Am., 98, 2428-2442 (1995)]. This suggests that the structure of this particular model captures some of the human ability to configure the vocal-tract shape under both ordinary and extraordinary conditions.

10.
An important problem in speech perception is to determine how humans extract the perceptually invariant place of articulation information in the speech wave across variable acoustic contexts. Although analyses have been developed that attempted to classify the voiced stops /b/ versus /d/ from stimulus onset information, most of the human perceptual research to date suggests that formant transition information is more important than onset information. The purpose of the present study was to determine if animal subjects, specifically Japanese macaque monkeys, are capable of categorizing /b/ versus /d/ in synthesized consonant-vowel (CV) syllables using only formant transition information. Three monkeys were trained to differentiate CV syllables with a "go-left" versus a "go-right" label. All monkeys first learned to differentiate a /za/ versus /da/ manner contrast and easily transferred to three new vowel contexts /[symbol: see text], epsilon, I/. Next, two of the three monkeys learned to differentiate a /ba/ versus /da/ stop place contrast, but were unable to transfer it to the different vowel contexts. These results suggest that animals may not use the same mechanisms as humans do for classifying place contrasts, and call for further investigation of animal perception of formant transition information versus stimulus onset information in place contrasts.

11.
The purpose of this study was to examine the acoustic characteristics of children's speech and voices that account for listeners' ability to identify gender. In Experiment I, vocal recordings and gross physical measurements of 4-, 8-, 12-, and 16-year-olds were taken (10 girls and 10 boys per age group). The speech sample consisted of seven nondiphthongal vowels of American English (/æ/ "had," /ɛ/ "head," /i/ "heed," /ɪ/ "hid," /ɑ/ "hod," /ʌ/ "hud," and /u/ "who'd") produced in the carrier phrase, "Say /hVd/ again." Fundamental frequency (f0) and formant frequencies (F1, F2, F3) were measured from these syllables. In Experiment II, 20 adults rated the syllables produced by the children in Experiment I based on a six-point gender rating scale. The results from these experiments indicate (1) vowel formant frequencies differentiate gender for children as young as four years of age, while formant frequencies and f0 differentiate gender after 12 years of age, (2) the relationship between gross measures of physical size and vocal characteristics is apparent for at least 12- and 16-year-olds, and (3) listeners can identify gender from the speech and voice of children as young as four years of age, and with respect to young children, listeners appear to base their gender ratings on vowel formant frequencies. The findings are discussed in relation to the development of gender identity and its perceptual representation in speech and voice.

12.
According to recent model investigations, vocal tract resonance is relevant to vocal registers. However, no experimental corroboration of this claim has been published so far. In the present investigation, ten professional tenors' vocal tract configurations were analyzed using MRI volumetry. All subjects produced a sustained tone on the pitch F4 (349 Hz) on the vowel /a/ (1) in modal and (2) in falsetto register. The area functions were estimated from the MRI data and their associated formant frequencies were calculated. In a second condition the same subjects repeated the same tasks in a sound-treated room and their formant frequencies were estimated by means of inverse filtering. In both recordings similar formant frequencies were observed. Vocal tract shapes differed between modal and falsetto register. In modal as compared to falsetto the lip opening and the oral cavity were wider and the first formant frequency was higher. In this sense the presented results are in agreement with the claim that the formant frequencies differ between registers.
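
The figure of 349 Hz quoted for the pitch F4 is consistent with twelve-tone equal temperament referenced to A4 = 440 Hz, where F4 lies four semitones below A4. The abstract does not state the tuning reference, so both the reference pitch and the helper name below are assumptions made for this quick check.

```python
def note_frequency(semitones_from_a4, a4_hz=440.0):
    """Equal-tempered frequency (Hz) of a note a given number of semitones from A4."""
    return a4_hz * 2.0 ** (semitones_from_a4 / 12.0)


# F4 is four semitones below A4 (A4 -> G#4 -> G4 -> F#4 -> F4).
f4_hz = note_frequency(-4)
```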

13.
A single female professional vocal artist and pedagogue sang examples of “twang” and neutral voice quality, which a panel of experts classified, in almost complete agreement with the singer's intentions. Subglottal pressure was measured as the oral pressure during the occlusion during the syllable /pae/. This pressure tended to be higher in “twang,” whereas the sound pressure level (SPL) was invariably higher. Voice source properties and formant frequencies were analyzed by inverse filtering. In “twang,” as compared with neutral, the closed quotient was greater, the pulse amplitude and the fundamental were weaker, and the normalized amplitude tended to be lower, whereas formants 1 and 2 were higher and 3 and 5 were lower. The formant differences, which appeared to be the main cause of the SPL differences, were more important than the source differences for the perception of “twanginess.” As resonatory effects occur independently of the voice source, the formant frequencies in “twang” may reflect a vocal strategy that is advantageous from the point of view of vocal hygiene.

14.
An alternative and complete derivation of the vocal tract length sensitivity function, which is an equation for finding a change in formant frequency due to perturbation of the vocal tract length [Fant, Quarterly Progress and Status Rep. No. 4, Speech Transmission Laboratory, Kungliga Tekniska Högskolan, Stockholm, 1975, pp. 1-14], is presented. It is based on the adiabatic invariance of the vocal tract as an acoustic resonator and on the radiation pressure on the wall and at the exit of the vocal tract. An algorithm for tuning the vocal tract shape to match the formant frequencies to target values, such as those of a recorded speech signal, which was proposed in Story [J. Acoust. Soc. Am. 119, 715-718 (2006)], is extended so that the vocal tract length can also be changed. Numerical simulation of this extended algorithm shows that it can successfully convert between the vocal tract shapes of a male and a female for each of five Japanese vowels.

15.
A part of becoming a mature perceiver involves learning what signal properties provide relevant information about objects and events in the environment. Regarding speech perception, evidence supports the position that allocation of attention to various signal properties changes as children gain experience with their native language, and so learn what information is relevant to recognizing phonetic structure in that language. However, one weakness in that work has been that data have largely come from experiments that all use similarly designed stimuli and show similar age-related differences in labeling. In this study, two perception experiments were conducted that used stimuli designed differently from past experiments, with different predictions. In experiment 1, adults and children (4, 6, and 8 years of age) labeled stimuli with natural /f/ and /[see text]/ noises and synthetic vocalic portions that had initial formant transitions varying in appropriateness for /f/ or /[see text]/. The prediction was that similar labeling patterns would be found for all listeners. In experiment 2, adults and children labeled stimuli with initial /s/-like and /[see text]/-like noises and synthetic vocalic portions that had initial formant transitions varying in appropriateness for /s/ or /[see text]/. The prediction was that, as found before, children would weight formant transitions more and fricative noises less than adults, but that this age-related difference would elicit different patterns of labeling from those found previously. Results largely matched predictions, and so further evidence was garnered for the position that children learn which properties of the speech signal provide relevant information about phonetic structure in their native language.

16.
Common chimpanzee (Pan troglodytes) "pant hoots" are multi-call events that build from quiet, consistently harmonic introductory sounds to loud, screamlike "climax" calls with acoustic irregularities known as "nonlinear phenomena" (NLP). Two possible functions of NLP in climax calls are to increase direct auditory impact on listeners and to signal physical condition. These possibilities were addressed by comparing climax calls from 12 wild chimpanzee males with "screams" and pant hoot "introduction" calls from the same individuals. Climax calls that included NLP were found to have higher fundamental frequencies (F0s) than introduction or climax calls that were purely harmonic. NLP onsets within climax calls were also specifically associated with local F0 maxima, suggesting vocalizers are vibrating their vocal folds at the upper limits of stability. Furthermore, climax calls showed far fewer NLP than did screams recorded from the same individuals, while showing equivalent or higher F0 values. Overall, the results are consistent with the hypothesis that the relative prevalence of NLP is a signal of physical condition, with callers "vocalizing at the edge" of regular, stable production while producing few NLP. The results are discussed in light of the initial hypotheses.

17.
Three competing accounts of vowel inherent spectral change in English all agree on the importance of initial formant frequencies; however, they disagree about the nature of the perceptually relevant aspects of formant change. The onset+offset hypothesis claims that the final formant values themselves matter. The onset+slope hypothesis claims that only the rate of change counts. The onset+direction hypothesis claims that only the general direction of change in formant frequencies is important. A synthetic-vowel perception experiment was designed to differentiate among the three. Results provide support for the superiority of the onset+offset hypothesis.
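
To make the contrast among the three hypotheses concrete, the sketch below encodes one vowel token under each account. The function names, the two-formant tokens, and the use of a per-formant sign as a stand-in for "general direction" are all simplifying assumptions introduced here; the study's own parameterization may differ.

```python
def onset_offset(onset_hz, offset_hz, duration_s):
    """Cues under the onset+offset account: initial and final formant values."""
    return tuple(onset_hz) + tuple(offset_hz)


def onset_slope(onset_hz, offset_hz, duration_s):
    """Cues under the onset+slope account: initial values plus rate of change (Hz/s)."""
    slopes = tuple((off - on) / duration_s for on, off in zip(onset_hz, offset_hz))
    return tuple(onset_hz) + slopes


def onset_direction(onset_hz, offset_hz, duration_s):
    """Cues under the onset+direction account: initial values plus sign of change."""
    signs = tuple((off > on) - (off < on) for on, off in zip(onset_hz, offset_hz))
    return tuple(onset_hz) + signs


# Two hypothetical tokens with identical (F1, F2) trajectories but different durations.
short_token = ((500.0, 1500.0), (400.0, 1900.0), 0.1)
long_token = ((500.0, 1500.0), (400.0, 1900.0), 0.2)
```

The two tokens are identical under the onset+offset and onset+direction encodings but differ under onset+slope, which is the kind of dissociation a synthetic-vowel experiment can exploit to separate the accounts.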

18.
The formant hypothesis of vowel perception, where the lowest two or three formant frequencies are essential cues for vowel quality perception, is widely accepted. There has, however, been some controversy suggesting that formant frequencies are not sufficient and that the whole spectral shape is necessary for perception. Three psychophysical experiments were performed to study this question. In the first experiment, the first or second formant peak of stimuli was suppressed as much as possible while still maintaining the original spectral shape. The responses to these stimuli were not radically different from the ones for the unsuppressed control. In the second experiment, F2-suppressed stimuli, whose amplitude ratios of high- to low-frequency components were systematically changed, were used. The results indicate that the ratio changes can affect perceived vowel quality, especially its place of articulation. In the third experiment, the full-formant stimuli, whose amplitude ratios were changed from the original and whose F2's were kept constant, were used. The results suggest that the amplitude ratio is equal to or more effective than F2 as a cue for place of articulation. We conclude that formant frequencies are not exclusive cues and that the whole spectral shape can be crucial for vowel perception.

19.
Selective adaptation and anchoring effects in speech perception have generated several different hypotheses regarding the nature of contextual contrast, including auditory/phonetic feature detector fatigue, response bias, and auditory contrast. In the present study three different seven-step [hɪd]-[hɛd] continua were constructed to represent a low F0 (long vocal tract source), a high F0 (long vocal tract source), and a high F0 (short vocal tract source), respectively. Subjects identified the tokens from each of the stimulus continua under two conditions: an equiprobable control and an anchoring condition which included an endpoint stimulus from one of the three continua occurring at least three times more often than any other single stimulus. Differential contrast effects were found depending on whether the anchor differed from the test stimuli in terms of F0, absolute formant frequencies, or both. Results were inconsistent with both the feature detector fatigue and response bias hypotheses. Rather, the obtained data suggest that vowel contrast occurs on the basis of normalized formant values, thus supporting a version of the auditory-contrast theory.

20.
Level and Center Frequency of the Singer's Formant   Total citations: 2 (self-citations: 0; citations by others: 2)
Johan Sundberg, Journal of Voice, 2001, 15(2): 176-186
The "singer's formant" is a prominent spectrum envelope peak near 3 kHz, typically found in voiced sounds produced by classical operatic singers. According to previous research, it is mainly a resonatory phenomenon produced by a clustering of formants 3, 4, and 5. Its level relative to the first formant peak varies depending on vowel, vocal loudness, and other factors. Its dependence on vowel formant frequencies is examined. Applying the acoustic theory of voice production, the level difference between the first and third formant is calculated for some standard vowels. The difference between observed and calculated levels is determined for various voices. It is found to vary considerably more between vowels sung by professional singers than by untrained voices. The center frequency of the singer's formant as determined from long-term spectrum analysis of commercial recordings is found to increase slightly with the pitch range of the voice classification.
