Similar Documents
20 similar documents retrieved.
1.
Budgerigars were trained to produce specific vocalizations (calls) using operant conditioning and food reinforcement. The bird's call was compared to a digital representation of the call stored in a computer to determine a match. Once birds were responding at a high level of precision, we measured the effect of several manipulations on the accuracy and the intensity of call production. Also, by differentially reinforcing other aspects of vocal behavior, budgerigars were trained to produce a call that matched another bird's contact call and to alter the latency of their vocal response. Both the accuracy of vocal matching and the intensity level of vocal production increased significantly when the bird could hear the template immediately before each trial. Moreover, manipulating the delay between the presentation of an acoustic reference and the onset of vocal production did not significantly affect either vocal intensity or matching accuracy. Interestingly, the vocalizations learned and reinforced in these operant experiments were only occasionally used in more natural communicative situations, such as when birds called back and forth to one another in their home cages.
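The call-to-template comparison described above can be sketched as a normalized cross-correlation score. This is an illustrative stand-in only: the abstract does not specify the actual matching algorithm, and `match_score` and any reinforcement criterion built on it are assumptions.

```python
import numpy as np

def match_score(call, template):
    """Peak normalized cross-correlation between a produced call and the
    stored digital template; 1.0 indicates a perfect (amplitude- and
    delay-invariant) match."""
    # z-score both waveforms so the score ignores overall amplitude
    c = (call - call.mean()) / (call.std() + 1e-12)
    t = (template - template.mean()) / (template.std() + 1e-12)
    # peak of the cross-correlation over all alignments
    return float(np.correlate(c, t, mode="full").max() / len(t))
```

A produced call could then be reinforced whenever `match_score(call, template)` exceeds a criterion, with the criterion raised as the bird's precision improves.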

2.
Bouts of vocalizations given by seven red deer stags were recorded over the rutting period, and homomorphic analysis and hidden Markov models (two techniques typically used for the automatic recognition of human speech utterances) were used to investigate whether the spectral envelope of the calls was individually distinctive. Bouts of common roars (the most common call type) were highly individually distinctive, with an average recognition percentage of 93.5%. A "temporal" split-sample approach indicated that although in most individuals these identity cues held over the rutting period, the ability of the models trained with the bouts of roars recorded early in the rut to correctly classify later vocalizations decreased as the recording date increased. When Markov models trained using the bouts of common roars were used to classify other call types according to their individual membership, the classification results indicated that the cues to identity contained in the common roars were also present in the other call types. This is the first demonstration in mammals other than primates that individuals have vocal cues to identity that are common to the different call types that compose their vocal repertoire.
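A greatly simplified sketch of this kind of individual classification: instead of the paper's homomorphic analysis and hidden Markov models, the toy version below (all function names are hypothetical) averages log-magnitude spectra per stag and assigns a new call to the nearest centroid.

```python
import numpy as np

def log_spectrum(signal, n_fft=256):
    # crude spectral envelope: log-magnitude of the FFT
    return np.log(np.abs(np.fft.rfft(signal, n_fft)) + 1e-12)

def train_centroids(calls_by_stag):
    # one average envelope per individual (a stand-in for per-stag HMMs)
    return {stag: np.mean([log_spectrum(c) for c in calls], axis=0)
            for stag, calls in calls_by_stag.items()}

def classify(call, centroids):
    # nearest centroid in Euclidean distance over the log spectrum
    env = log_spectrum(call)
    return min(centroids, key=lambda s: np.linalg.norm(env - centroids[s]))
```

A "temporal" split-sample test as in the abstract would train `train_centroids` on early-rut recordings only and measure how `classify` accuracy decays on later recordings.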

3.
The ability of subjects to identify vowels in vibrotactile transformations of consonant-vowel syllables was measured for two types of displays: a spectral display (frequency by intensity), and a vocal tract area function display (vocal tract location by cross-sectional area). Both displays were presented to the fingertip via the tactile display of the Optacon transducer. In the first experiments the spectral display was effective for identifying vowels in /b/V/ context when as many as 24 or as few as eight spectral channels were presented to the skin. However, performance fell when the 12- and 8-channel displays were reduced in size to occupy 1/2 or 1/3 of the 24-row tactile matrix. The effect of reducing the size of the display was greater when the spectrum was represented as a solid histogram ("filled" patterns) than when it was represented as a simple spectral contour ("unfilled" patterns). Spatial masking within the filled pattern was postulated as the cause for this decline in performance. Another experiment measured the utility of the spectral display when the syllables were produced by multiple speakers. The resulting increase in response confusions was primarily attributable to variations in the tactile patterns caused by differences in vocal tract resonances among the speakers. The final experiment found an area function display to be inferior to the spectral display for identification of vowels. The results demonstrate that a two-dimensional spectral display is worthy of further development as a basic vibrotactile display for speech.

4.
Many vocalizations produced by Weddell seals (Leptonychotes weddellii) are made up of repeated individual distinct sounds (elements). Patterning of multiple-element calls was examined during the breeding season at Casey and Davis, Antarctica. Element and interval durations were measured from 405 calls, all > 3 elements in length. The duration of the calls (22 ± 16.6 s) did not seem to vary with an increasing number of elements (F(4,404) = 1.83, p = 0.122) because element and interval durations decreased as the number of elements within a call increased. Underwater vocalizations showed seven distinct timing patterns of increasing, decreasing, or constant element and interval durations throughout the calls. One call type occurred with six rhythm patterns, although the majority exhibited only two rhythms. Some call types also displayed steady frequency changes as they progressed. Weddell seal multiple-element calls are rhythmically repeated, and thus the durations of the elements and intervals within a call occur in a regular manner. Rhythmical repetition during vocal communication likely enhances the probability of a call being detected and has important implications for the extent to which the seals can successfully transmit information over long distances and during periods of high-level background noise.
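The element and interval measurements described above reduce to simple bookkeeping over onset/offset times. A minimal sketch follows; the `(onset, offset)` pair representation is an assumption for illustration, not the paper's data format.

```python
def call_timing(elements):
    """elements: ordered list of (onset, offset) times in seconds.
    Returns per-element durations, inter-element interval durations,
    and the total call duration."""
    element_durations = [off - on for on, off in elements]
    interval_durations = [elements[i + 1][0] - elements[i][1]
                          for i in range(len(elements) - 1)]
    call_duration = elements[-1][1] - elements[0][0]
    return element_durations, interval_durations, call_duration
```

The abstract's finding would correspond to `call_duration` staying roughly constant while `element_durations` and `interval_durations` shrink as the element count grows.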

5.
Feline isolation calls were analyzed, and a model was developed to relate the acoustical features of these calls to the physical processes used in their production. Fifty isolation calls were recorded from each of five cats, for a total sample of 250 vocalizations. By combinations of Fourier transform, autocorrelation, and linear prediction methods, the fundamental frequency (glottal-pulse period) F0, the energy of F0, the frequency having maximum energy Fmax (not always F0), and the energy at this frequency were computed. Mean F0 ranged from 400 to 600 Hz for individual cats. For some cats F0 was consistent within calls, but for other cats sudden shifts in F0 occurred within calls. Fmax was almost a harmonic of F0 and generally ranged from 1 to 2 kHz. For individual cats, the energy ratio E = (energy of Fmax/energy of F0) varied from 1 to 60, and the grand average E over the time course of the call varied from about 12 to 38. The mean rms call intensity was an inverted-U function of time. Measured jaw opening was strongly correlated with acoustical features of the call. A Bessel-horn model with time-varying flare gave a good account of acoustical parameters such as Fmax. The presence of formant-like resonances in cat vocalizations and the important role of jaw movements (vocal gestures) in the production of these calls suggest that cats may provide a useful model for some aspects of human vocal behavior.
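Of the three analysis methods mentioned, autocorrelation is the easiest to sketch. A minimal F0 estimator for a cat-like pitch range follows; the 200-1000 Hz search band and window length are assumptions, not parameters from the study.

```python
import numpy as np

def estimate_f0(signal, fs, fmin=200.0, fmax=1000.0):
    """F0 via the autocorrelation peak: find the lag (glottal-pulse
    period) with maximum self-similarity inside the plausible band."""
    sig = signal - np.mean(signal)
    # one-sided autocorrelation (lag 0 at index 0)
    ac = np.correlate(sig, sig, mode="full")[len(sig) - 1:]
    # convert the F0 band into a lag range and take the strongest lag
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    return fs / lag
```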

6.
Although both perceived vocal effort and intensity are known to influence the perceived distance of speech, little is known about the processes listeners use to integrate these two parameters into a single estimate of talker distance. In this series of experiments, listeners judged the distances of prerecorded speech samples presented over headphones in a large open field. In the first experiment, virtual synthesis techniques were used to simulate speech signals produced by a live talker at distances ranging from 0.25 to 64 m. In the second experiment, listeners judged the apparent distances of speech stimuli produced over a 60-dB range of different vocal effort levels (production levels) and presented over a 34-dB range of different intensities (presentation levels). In the third experiment, the listeners judged the distances of time-reversed speech samples. The results indicate that production level and presentation level influence distance perception differently for each of three distinct categories of speech. When the stimulus was high-level voiced speech (produced above 66 dB SPL 1 m from the talker's mouth), the distance judgments doubled with each 8-dB increase in production level and each 12-dB decrease in presentation level. When the stimulus was low-level voiced speech (produced at or below 66 dB SPL at 1 m), the distance judgments doubled with each 15-dB increase in production level but were relatively insensitive to changes in presentation level at all but the highest intensity levels tested. When the stimulus was whispered speech, the distance judgments were unaffected by changes in production level and only decreased with increasing presentation level when the intensity of the stimulus exceeded 66 dB SPL. 
The distance judgments obtained in these experiments were consistent across a range of different talkers, listeners, and utterances, suggesting that voice-based distance cueing could provide a robust way to control the apparent distances of speech sounds in virtual audio displays.
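The doubling rules reported for high-level voiced speech can be written as a small scaling formula. The sketch below is a toy model of that one result only; the 66 dB / 1 m reference point used as the anchor is an illustrative assumption.

```python
def apparent_distance(production_db, presentation_db):
    """Toy model of the high-level voiced-speech result: judged distance
    doubles per +8 dB of production level and per -12 dB of presentation
    level, anchored at an assumed 1 m / 66 dB SPL reference."""
    ref = 66.0
    return (2.0 ** ((production_db - ref) / 8.0)
            * 2.0 ** ((ref - presentation_db) / 12.0))
```

For low-level voiced speech the abstract implies a different exponent (doubling per 15 dB of production level) and near-insensitivity to presentation level, so a full model would branch on speech category.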

7.
At a cocktail party, listeners must attend selectively to a target speaker and segregate that speaker's speech from distracting speech sounds uttered by other speakers. To solve this task, listeners can draw on a variety of vocal, spatial, and temporal cues. Recently, Vestergaard et al. [J. Acoust. Soc. Am. 125, 1114-1124 (2009)] developed a concurrent-syllable task to control temporal glimpsing within segments of concurrent speech; this allowed them to measure the interaction of glottal pulse rate and vocal tract length and to reveal how the auditory system integrates information from independent acoustic modalities to enhance recognition. The current paper shows how the interaction of these acoustic cues evolves as the temporal overlap of syllables is varied. Temporal glimpses as short as 25 ms are observed to improve syllable recognition substantially when the target and distracter have similar vocal characteristics, but not when they are dissimilar. The effect of temporal glimpsing on recognition performance is strongly affected by the form of the syllable (consonant-vowel versus vowel-consonant), but it is independent of other phonetic features such as place and manner of articulation.

8.
We provide a direct demonstration that nonhuman primates spontaneously perceive changes in formant frequencies in their own species-typical vocalizations, without training or reinforcement. Formants are vocal tract resonances leading to distinctive spectral prominences in the vocal signal, and provide the acoustic determinant of many key phonetic distinctions in human languages. We developed algorithms for manipulating formants in rhesus macaque calls. Using the resulting computer-manipulated calls in a habituation/dishabituation paradigm, with blind video scoring, we show that rhesus macaques spontaneously respond to a change in formant frequencies within the normal macaque vocal range. Lack of dishabituation to a "synthetic replica" signal demonstrates that dishabituation was not due to an artificial quality of synthetic calls, but to the formant shift itself. These results indicate that formant perception, a significant component of human voice and speech perception, is a perceptual ability shared with other primates.

9.
Vocal communication within and between groups of individuals has been described extensively in birds and terrestrial mammals; however, little is known about how cetaceans use their sounds in their natural environment. Resident killer whales, Orcinus orca, live in highly stable matrilines and exhibit group-specific vocal dialects. Single call types cannot exclusively be associated with particular behaviors, and calls are thought to function in group identification and intragroup communication. In the present study, call usage of three closely related matrilines of the Northern resident community was compared in various intra- and intergroup contexts. In two of the three matrilines, significant changes in vocal behavior depending both on the presence and the identity of accompanying whales were found. Most evidently, family-specific call subtypes, as well as aberrant and variable calls, were emitted at higher rates, whereas "low arousal" call types were used less in the presence of matrilines from different pods, subclans, or clans. Ways in which the observed changes may function in both intra- and intergroup communication are discussed.

10.
Although listeners routinely perceive both the sex and the individual identity of talkers from their speech, explanations of these abilities are incomplete. Here, variation in vocal production-related anatomy was assumed to affect vowel acoustics thought to be critical for indexical cueing. Integrating this approach with source-filter theory, patterns of acoustic parameters that should represent sex and identity were identified. Due to sexual dimorphism, the combination of fundamental frequency (F0, reflecting larynx size) and vocal tract length (VTL, reflecting body size) cues was predicted to provide the strongest acoustic correlates of talker sex. Acoustic measures associated with presumed within-sex variation in supralaryngeal vocal tract-related anatomy were expected to be prominent in individual talker identity. These predictions were supported by results of analyses of 2500 tokens of the /ɛ/ phoneme, extracted from the naturally produced speech of 125 subjects. Classification by talker sex was virtually perfect when F0 and VTL were used together, whereas classification by talker identity depended primarily on the various acoustic parameters associated with vocal-tract filtering.
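The VTL cue above is commonly estimated from formant frequencies via source-filter theory. A sketch under the textbook idealization of a uniform tube closed at the glottis, where the n-th formant is Fn = (2n-1)c/4L; the least-squares-style averaging is an illustrative choice, not necessarily the paper's estimator.

```python
import numpy as np

def vtl_from_formants(formants_hz, c_cm_per_s=35000.0):
    """Apparent vocal tract length (cm) from measured formants, assuming
    a uniform quarter-wave resonator: Fn = (2n - 1) * c / (4 * L)."""
    f = np.asarray(formants_hz, dtype=float)
    n = np.arange(1, len(f) + 1)
    # each formant gives its own estimate L = (2n-1)*c / (4*Fn); average them
    return float(np.mean((2 * n - 1) * c_cm_per_s / (4.0 * f)))
```

For an ideal 17.5 cm tract this yields formants at 500, 1500, 2500 Hz, so the inverse mapping recovers 17.5 cm.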

11.
In the king penguin, Aptenodytes patagonicus, incubation and brooding duties are undertaken alternately by both partners of a pair. Birds returning from foraging at sea find their mate in the crowded colony using acoustic signals. Acoustic recognition of the mate maintains and strengthens pair fidelity and favors synchronization in the different stages of reproduction. In this study it was found that the king penguin vocalizes in response to the mate's playback calls, but not to those of neighbors or unfamiliar conspecific individuals. To study the features used by the birds for individual recognition of mates, various experimental signals consisting of synthesized modifications of the mate's call were played back to the incubating bird. Results indicated that birds attend to the FM profile of the call, in particular its initial inflexion. The frequency modulation shape of the syllable can be regarded as a vocal signature repeated through the different syllables of the call. King penguins pay little attention to the call's AM envelope or its absolute frequency.

12.
Analysis of pain-related vocalization in young pigs
The assessment of pain constitutes a major issue for animal welfare research. The objective of this study was to classify vocalizations during castration pain and to assess alterations in vocalizations under local anaesthesia. The alterations in vocalization were measured by multiparametric call analysis. A total of 4537 calls of 70 young pigs were evaluated. Three call types were distinguishable in the data of this study (grunt, squeal, scream). At the 95% confidence level, a high percentage (94.64%) of the calls recorded during castration could be classified into one of the three call types. The comparison of the occurrence of the call types across treatments gives evidence for pain-related use of screams. The piglets castrated without local anaesthesia produced almost twice as many screams as piglets castrated with anaesthesia. The comparison of the recorded sound parameters reveals the particular position of screams in the call repertoire of young pigs. Screams are significantly different in their sound parameters from grunts or squeals. Castration, in comparison to mere restraint, produced a comprehensive change in sound parameters, with castration calls becoming more extended and more powerful. The findings of this study also show differences in the effectiveness of the parameters which indicate pain. Parameters that describe a single event in a call, such as peak level or peak frequency, give better results than parameters that describe an average, such as weighted frequency and main frequency. The research indicated that pain-related changes of calls in piglets can be identified. On the basis of these results, automatic classification of call types during management operations may be developed. This could contribute to objective animal welfare assessment.
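A toy version of the automatic call-type classification the authors envisage, driven by per-call sound parameters. The thresholds below are illustrative assumptions, not the discriminant values found in the study.

```python
def classify_call(peak_freq_hz, peak_level_db):
    """Rule-based grunt/squeal/scream labeling from two single-event
    parameters (peak frequency, peak level); cutoffs are hypothetical."""
    # screams: high peak frequency combined with high peak level
    if peak_freq_hz > 3000 and peak_level_db > 90:
        return "scream"
    # squeals: tonal, mid-to-high frequency
    if peak_freq_hz > 1000:
        return "squeal"
    # grunts: low-frequency, low-level
    return "grunt"
```

In a welfare-monitoring pipeline, a rising scream rate during a procedure would then serve as the pain indicator the abstract describes.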

13.
Vervet monkeys routinely produce semantic alarm calls upon detection of various predators encountered in their natural environment. Two of these calls, snake and eagle alarms, were analyzed using digital signal processing techniques in order to identify potentially distinctive acoustic cues. Distinctive cues were sought in the periodicity of the source waveform associated with each call type, the probable vocal tract filtering functions, and in temporal patterning. Results were equivocal with respect to source periodicity, but a variety of distinguishing features were found in both supralaryngeal filtering and timing. These data provide a basis for psychoacoustic perceptual testing with vervets as subjects.

14.
Acoustic signals provide a basis for social recognition in a wide range of animals. Few studies, however, have attempted to relate the patterns of individual variation in signals to behavioral discrimination thresholds used by receivers to discriminate among individuals. North American bullfrogs (Rana catesbeiana) discriminate among familiar and unfamiliar individuals based on individual variation in advertisement calls. The sources, patterns, and magnitudes of variation in eight acoustic properties of multiple-note advertisement calls were examined to understand how patterns of within-individual variation might either constrain, or provide additional cues for, vocal recognition. Six of eight acoustic properties exhibited significant note-to-note variation within multiple-note calls. Despite this source of within-individual variation, all call properties varied significantly among individuals, and multivariate analyses indicated that call notes were individually distinct. Fine-temporal and spectral call properties exhibited less within-individual variation compared to gross-temporal properties and contributed most toward statistically distinguishing among individuals. Among-individual differences in the patterns of within-individual variation in some properties suggest that within-individual variation could also function as a recognition cue. The distributions of among-individual and within-individual differences were used to generate hypotheses about the expected behavioral discrimination thresholds of receivers.
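The among- versus within-individual comparison above can be sketched as a simple variance partition for one acoustic property (e.g., dominant frequency). The dict-of-lists layout and names are illustrative assumptions.

```python
import numpy as np

def variation_ratio(calls_by_frog):
    """ANOVA-style partition: ratio of among-individual variance of the
    per-frog means to the mean within-individual variance. Higher values
    mean the property is more individually distinctive."""
    values = [np.asarray(v, dtype=float) for v in calls_by_frog.values()]
    within = np.mean([v.var(ddof=1) for v in values])   # within-frog variance
    means = np.array([v.mean() for v in values])
    among = means.var(ddof=1)                           # among-frog variance
    return among / within
```

A property with a large ratio (small note-to-note jitter, large differences between frogs) is the kind predicted to support vocal recognition.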

15.
In this study we simultaneously measured subglottic air pressure, airflow, and vocal intensity during speech in nine healthy subjects. Subglottic air pressure was measured directly by puncture of the cricothyroid membrane. The results show that the interaction between these aerodynamic properties is much more complex than previously believed. Certain trends were seen in most individuals, such as an increase in vocal intensity with increased subglottic air pressure. However, there was considerable variability in the overall aerodynamic properties between subjects and at different frequency and intensity ranges. At certain frequencies several subjects were able to generate significantly louder voices without a comparable increase in subglottic air pressure. We hypothesize that these increases in vocal efficiency are due to changes in vocal fold vibration properties. The relationship between fundamental frequency and subglottic pressure was also noted to vary depending on vocal intensity. Possible mechanisms for these behaviors are discussed.

16.
17.
To date, very little is known about the acoustic behavior of Norwegian killer whales, in particular that of individual whales. In this study a unique opportunity was presented to document the sounds produced by five captured killer whales in the Vestfjord area, northern Norway. Individuals produced 14 discrete and 7 compound calls. Two call types were used by both individuals 16178 and 23365, suggesting that they may belong to the same pod. Comparisons with calls documented in Strager (1993) showed that none of the call types used by the captured individuals were present in that catalog. The absence of these calls from the available literature suggests that call variability within individuals is likely to be large. This short note adds to our knowledge of the vocal repertoire of this population and demonstrates the need for further studies to provide behavioural context for these sounds.

18.

Background

Statistical learning is a candidate for one of the basic prerequisites underlying the expeditious acquisition of spoken language. Infants from 8 months of age exhibit this form of learning to segment fluent speech into distinct words. To test the statistical learning skills at birth, we recorded event-related brain responses of sleeping neonates while they were listening to a stream of syllables containing statistical cues to word boundaries.

Results

We found evidence that sleeping neonates are able to automatically extract statistical properties of the speech input and thus detect the word boundaries in a continuous stream of syllables containing no morphological cues. Syllable-specific event-related brain responses found in two separate studies demonstrated that the neonatal brain treated the syllables differently according to their position within pseudowords.

Conclusion

These results demonstrate that neonates can efficiently learn transitional probabilities or frequencies of co-occurrence between different syllables, enabling them to detect word boundaries and in this way isolate single words out of fluent natural speech. The ability to adopt statistical structures from speech may play a fundamental role as one of the earliest prerequisites of language acquisition.  
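The transitional probabilities the neonates are credited with tracking are straightforward to compute from a syllable stream. A sketch using the classic pseudoword-stream design (the syllables below are illustrative, not the study's stimuli): word-internal pairs get high transitional probabilities, pairs spanning a word boundary get low ones.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """P(next | current) for each adjacent syllable pair in the stream:
    count(pair) / count(first syllable in a non-final position)."""
    pairs = Counter(zip(syllables, syllables[1:]))
    firsts = Counter(syllables[:-1])
    return {(a, b): n / firsts[a] for (a, b), n in pairs.items()}
```

Segmentation then amounts to positing a word boundary wherever the transitional probability dips, which is exactly the cue available in a stream with no morphological or prosodic markers.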

19.
A technique to synthesize laughter based on the time-domain behavior of real instances of human laughter is presented. In the speech synthesis community, interest in improving the expressive quality of synthetic speech has grown considerably. While the focus has been on linguistic aspects, such as precise control of speech intonation to achieve desired expressiveness, the inclusion of nonlinguistic cues could further enhance the expressive quality of synthetic speech. Laughter is one such cue, used for communicating, say, a happy or amusing context. It can be generated in many varieties and qualities: from a short exhalation to a long full-blown episode. Laughter is modeled at two levels: the overall episode level and the local call level. At the episode level, the overall temporal behavior is captured in a parametric model based on the equations that govern the simple harmonic motion of a mass-spring system. By changing a set of easily available parameters, the authors are able to synthesize a variety of laughter. At the call level, the authors relied on a standard linear prediction based analysis-synthesis model. Results of subjective tests to assess the acceptability and naturalness of the synthetic laughter relative to real human laughter samples are presented.
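The episode-level model can be sketched as a damped mass-spring amplitude envelope. The damping ratio and call rate below are illustrative parameter choices, not the authors' values, and the exact form of their parametric model may differ.

```python
import numpy as np

def episode_envelope(t, amp=1.0, rate_hz=4.5, zeta=0.15):
    """Laughter-episode amplitude envelope from damped harmonic motion:
    x(t) = A * exp(-zeta*w*t) * |sin(w_d * t)|, with w = 2*pi*rate_hz
    and w_d the damped natural frequency. Each |sin| lobe stands in for
    one laugh call; the decaying exponential gives the dying-out episode."""
    w = 2 * np.pi * rate_hz
    w_d = w * np.sqrt(1 - zeta ** 2)   # damped natural frequency
    return amp * np.exp(-zeta * w * t) * np.abs(np.sin(w_d * t))
```

Varying `rate_hz` (call rate), `zeta` (how quickly the episode dies out), and `amp` then yields the "many varieties" of laughter, from a short exhalation (high `zeta`) to a long full-blown episode (low `zeta`).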

20.
Accurate parameter estimates relevant to the vocal behavior of marine mammals are needed to assess the potential effects of anthropogenic sound exposure, including how masking noise reduces the active space of sounds used for communication. Information about how these animals modify their vocal behavior in response to noise exposure is also needed for such assessment. Prior studies have reported variations in the source levels of killer whale sounds, and a more recent study reported that killer whales compensate for vessel masking noise by increasing their call amplitude. The objectives of the current study were to investigate the source levels of a variety of call types in southern resident killer whales while also considering background noise level as a likely factor related to call source level variability. The source levels of 763 discrete calls, along with corresponding background noise, were measured over three summer field seasons in the waters surrounding the San Juan Islands, WA. Both noise level and call type had significant effects on call source levels (1-40 kHz band; range of 135.0-175.7 dB rms re 1 μPa at 1 m). These factors should be considered in models that predict how anthropogenic masking noise reduces vocal communication space in marine mammals.
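Back-calculating a source level (dB re 1 μPa at 1 m) from a received level is a one-liner under a spherical-spreading assumption, TL = 20 log10(r). This is the standard textbook simplification; the study's actual propagation model may differ.

```python
import math

def source_level(received_db, range_m):
    """SL = RL + 20*log10(r): received level plus spherical spreading
    loss back to the 1 m reference range."""
    return received_db + 20.0 * math.log10(range_m)
```

With per-call localized ranges, this converts each received call into a source level that can then be regressed against background noise and call type, as in the abstract.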


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号