Similar Articles
20 similar articles found.
1.
Several experiments are described in which synthetic monophthongs from series varying between /i/ and /u/ are presented following filtered precursors. In addition to F(2), target stimuli vary in spectral tilt by applying a filter that either raises or lowers the amplitudes of higher formants. Previous studies have shown that both of these spectral properties contribute to identification of these stimuli in isolation. However, in the present experiments we show that when a precursor sentence is processed by the same filter used to adjust spectral tilt in the target stimulus, listeners identify synthetic vowels on the basis of F(2) alone. Conversely, when the precursor sentence is processed by a single-pole filter with center frequency and bandwidth identical to that of the F(2) peak of the following vowel, listeners identify synthetic vowels on the basis of spectral tilt alone. These results show that listeners ignore spectral details that are unchanged in the acoustic context. Instead of identifying vowels on the basis of incorrect acoustic information, however (e.g., all vowels being heard as /i/ when the second formant is perceptually ignored), listeners discriminate the vowel stimuli on the basis of the more informative spectral property.

2.
Three studies demonstrate listeners' ability to use the rate of a sound's frequency change (velocity) to predict how the spectral path of the sound is likely to evolve, even in the event of an occlusion. Experiments 1 and 2 use a modified probe-signal method to measure attentional filters and demonstrate increased detection of sounds falling along implied paths of constant linear velocity. Experiment 3 shows that listeners perceive a suprathreshold tone as falling along a trajectory of constant velocity when its frequency is near the region of greatest detection as measured in Experiments 1 and 2. Further, results show greater accuracy and decreased bias in the use of velocity information with increased exposure to a constant-velocity sound. As the duration of occlusion lengthens, results also show a downward shift (relative to a trajectory of constant velocity) in the frequency at which listeners' detection and experience of a continuous trajectory are greatest. A preliminary model of velocity processing is proposed to account for this downward shift. Results show listeners' use of velocity in extrapolating sounds with dynamically changing spectral and temporal properties and provide evidence for its role in perceptual auditory continuity within a noisy acoustic environment.

3.
Two experiments establish constraints on the ability of a common fundamental frequency (F0) to perceptually fuse low-pass filtered and complementary high-pass filtered speech presented to different ears. In experiment 1 the filter cut-off is set at 1 kHz. When the filters are sharp, giving little overlap in frequency between the two sounds, listeners report hearing two sounds even when the sounds at the two ears are on the same F0. Shallower filters give more fusion. In experiment 2, the filters' cut-off frequency is varied together with their slope. Fusion becomes more frequent when the signals at the two ears share low-frequency components. This constraint mirrors the natural filtering by head-shadow of sound sources presented to one side. The mechanisms underlying perceptual fusion may thus be similar to those underlying auditory localization.

4.
The perceptual fusion of harmonics is often assumed to result from the operation of a template mechanism that is also responsible for computing global pitch. This dual-role hypothesis was tested using frequency-shifted complexes. These sounds are inharmonic, but preserve a regular pattern of equal component spacing. The stimuli had a nominal fundamental frequency (F0) of 200 Hz (±20%), and were frequency shifted either by 25.0% or 37.5% of F0. Three consecutive components (6-8) were removed and replaced with a sinusoidal probe, located at one of a set of positions spanning the gap. On any trial, subjects heard a complex tone followed by an adjustable pure tone in a continuous loop. Subjects were well able to match the pitch of the probe unless it corresponded with a position predicted by the spectral pattern of the complex. Peripheral factors could not account for this finding. In contrast, hit rates were not depressed for probes positioned at integer multiples of the F0(s) corresponding to the global pitch(es) of the complex, predicted from previous data [Patterson, J. Acoust. Soc. Am. 53, 1565-1572 (1973)]. These findings suggest that separate central mechanisms are responsible for computing global pitch and for the perceptual grouping of partials.
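The frequency-shifted stimuli above are easy to picture in code. Below is a minimal numpy sketch (sampling rate, duration, and number of harmonics are illustrative choices, not those of the study): every component sits at an integer multiple of F0 plus a fixed fraction of F0, and components 6-8 are removed to leave a gap for the probe.

```python
import numpy as np

def shifted_complex(f0=200.0, shift_frac=0.25, n_harmonics=12,
                    skip=(6, 7, 8), fs=16000, dur=0.5):
    """Equal-spaced complex with every component shifted by a fixed
    fraction of F0; components in `skip` are removed, leaving the gap
    where the sinusoidal probe would be placed."""
    t = np.arange(int(fs * dur)) / fs
    shift = shift_frac * f0
    x = np.zeros_like(t)
    for n in range(1, n_harmonics + 1):
        if n in skip:
            continue  # gap for the probe
        x += np.sin(2 * np.pi * (n * f0 + shift) * t)
    return x / np.max(np.abs(x))

sig = shifted_complex()
```

With a 25% shift, the components fall at 250, 450, 650 Hz and so on: still equally spaced by 200 Hz, but no longer integer multiples of any audible fundamental.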

5.
The spectral envelope is a major determinant of the perceptual identity of many classes of sound including speech. When sounds are transmitted from the source to the listener, the spectral envelope is invariably and diversely distorted, by factors such as room reverberation. Perceptual compensation for spectral-envelope distortion was investigated here. Carrier sounds were distorted by spectral envelope difference filters whose frequency response is the spectral envelope of one vowel minus the spectral envelope of another. The filter /I/ minus /e/ and its inverse were used. Subjects identified a test sound that followed the carrier. The test sound was drawn from an /Itch/ to /etch/ continuum. Perceptual compensation produces a phoneme boundary difference between /I/ minus /e/ and its inverse. Carriers were the phrase "the next word is" spoken by the same (male) speaker as the test sounds, signal-correlated noise derived from this phrase, the same phrase spoken by a female speaker, male and female versions played backwards, and a repeated end-point vowel. The carrier and test were presented to the same ear, to different ears, and from different apparent directions (by varying interaural time delay). The results show that compensation is unlike peripheral phenomena, such as adaptation, and unlike phonetic perceptual phenomena. The evidence favors a central, auditory mechanism.
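A spectral envelope difference filter of the kind described above can be sketched as follows. The vowel envelopes here are hypothetical Gaussian formant humps (the formant centers are rough textbook figures, not the study's measured envelopes); the filter's dB gain at each frequency is envelope A minus envelope B, applied multiplicatively in the frequency domain.

```python
import numpy as np

def gaussian_envelope(centers, bandwidths, gains_db):
    """Hypothetical smooth spectral envelope built from Gaussian humps
    at the given formant centers (a stand-in for measured envelopes)."""
    def env(f):
        f = np.asarray(f, dtype=float)
        out = np.zeros_like(f)
        for c, bw, g in zip(centers, bandwidths, gains_db):
            out += g * np.exp(-0.5 * ((f - c) / bw) ** 2)
        return out
    return env

def apply_difference_filter(signal, env_a, env_b, fs):
    """Filter whose gain in dB is envelope A minus envelope B."""
    spec = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), 1 / fs)
    gain_db = env_a(freqs) - env_b(freqs)
    spec *= 10.0 ** (gain_db / 20.0)
    return np.fft.irfft(spec, len(signal))

# /I/-like and /e/-like envelopes (illustrative formant values)
env_i = gaussian_envelope([390, 1990], [80, 150], [20, 15])
env_e = gaussian_envelope([530, 1840], [80, 150], [20, 15])

fs = 16000
noise = np.random.default_rng(0).standard_normal(fs)
filtered = apply_difference_filter(noise, env_i, env_e, fs)
```

The inverse filter is obtained simply by swapping the two envelopes.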

6.
The spectral properties of a complex stimulus (rippled noise) were varied over time, and listeners were asked to discriminate between this stimulus and a flat-spectrum, stationary noise. The spacing between the spectral peaks of rippled noise was changed sinusoidally as a function of time, or the location of the spectral peaks of rippled noise was moved up and down the spectrum as a sinusoidal function of time. In most conditions, listeners were able to make the discriminations up to rates of temporal modulation of 5-10 cycles per second. Beyond 5-10 cps the rippled noise with the temporally varying peaks was indiscriminable from a flat (nonrippled) noise. The results suggest that for temporal changes in the spectral peaks of rippled noise, listeners cannot monitor the output of a single (or small number of) auditory channel(s) (critical bands), or that the mechanism used to extract the perceptual information from these stimuli is slow. Temporal variations in the spectral properties of rippled noise may relate to temporal changes in the repetition pitch of complex sounds, the temporal properties of the coloration added to sound in a reverberant environment, and the nature of spectral peak changes such as those that occur in speech-formant transitions. The results are relevant to the general issue of the auditory system's ability to extract information from a complex spectral profile.
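Rippled noise is commonly generated by adding a delayed copy of a noise to itself, which produces spectral peaks spaced at 1/delay Hz; modulating the delay sinusoidally moves the peak spacing over time, as in the first condition above. A rough sketch with illustrative parameter values:

```python
import numpy as np

def dynamic_rippled_noise(fs=16000, dur=1.0, d0=0.004,
                          depth=0.3, rate=5.0, seed=0):
    """Noise plus a delayed copy of itself (spectral ripples spaced
    1/delay Hz), with the delay modulated sinusoidally at `rate`
    cycles per second so the ripple spacing changes over time.
    Parameter values are illustrative, not those of the study."""
    n = int(fs * dur)
    x = np.random.default_rng(seed).standard_normal(n)
    t = np.arange(n) / fs
    delay = d0 * (1 + depth * np.sin(2 * np.pi * rate * t))  # seconds
    idx = np.arange(n) - delay * fs        # fractional sample index
    delayed = np.interp(idx, np.arange(n), x, left=0.0)
    return x + delayed
```

With `depth=0` this reduces to ordinary (static) rippled noise with peaks every 1/d0 = 250 Hz; raising `rate` past 5-10 cps is where, per the results above, listeners could no longer tell the stimulus from flat noise.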

7.
Echolocating bats transmit ultrasonic vocalizations and use information contained in the reflected sounds to analyze the auditory scene. Auditory scene analysis, a phenomenon that applies broadly to all hearing vertebrates, involves the grouping and segregation of sounds to perceptually organize information about auditory objects. The perceptual organization of sound is influenced by the spectral and temporal characteristics of acoustic signals. In the case of the echolocating bat, its active control over the timing, duration, intensity, and bandwidth of sonar transmissions directly impacts its perception of the auditory objects that comprise the scene. Here, data are presented from perceptual experiments, laboratory insect capture studies, and field recordings of sonar behavior of different bat species, to illustrate principles of importance to auditory scene analysis by echolocation in bats. In the perceptual experiments, FM bats (Eptesicus fuscus) learned to discriminate between systematic and random delay sequences in echo playback sets. The results of these experiments demonstrate that the FM bat can assemble information about echo delay changes over time, a requirement for the analysis of a dynamic auditory scene. Laboratory insect capture experiments examined the vocal production patterns of flying E. fuscus taking tethered insects in a large room. In each trial, the bats consistently produced echolocation signal groups with a relatively stable repetition rate (within 5%). Similar temporal patterning of sonar vocalizations was also observed in the field recordings from E. fuscus, thus suggesting the importance of temporal control of vocal production for perceptually guided behavior. It is hypothesized that a stable sonar signal production rate facilitates the perceptual organization of echoes arriving from objects at different directions and distances as the bat flies through a dynamic auditory scene. Field recordings of E. fuscus, Noctilio albiventris, N. leporinus, Pipistrellus pipistrellus, and Cormura brevirostris revealed that spectral adjustments in sonar signals may also be important to permit tracking of echoes in a complex auditory scene.

8.
The effects of variations in vocal effort, corresponding to common conversational situations, on the spectral properties of vowels were investigated. A database was recorded in which three degrees of vocal effort were elicited from the speakers by varying the distance to their interlocutor in three steps (close, 0.4 m; normal, 1.5 m; far, 6 m). The speech materials consisted of isolated French vowels, uttered by ten naive speakers in a quiet furnished room. Manual measurements were made of the fundamental frequency (F0), the frequencies and amplitudes of the first three formants (F1, F2, F3 and A1, A2, A3), and the total amplitude. The speech materials were perceptually validated in three respects: identity of the vowel, gender of the speaker, and vocal effort. Results indicated that the speech materials were appropriate for the study. Acoustic analysis showed that F0 and F1 were highly correlated with vocal effort and varied at rates close to 5 Hz/dB for F0 and 3.5 Hz/dB for F1. Statistically, F2 and F3 did not vary significantly with vocal effort. Formant amplitudes A1, A2, and A3 increased significantly; the amplitudes in the high-frequency range increased more than those in the lower part of the spectrum, revealing a change in spectral tilt. On average, when the overall amplitude is increased by 10 dB, A1, A2, and A3 are increased by 11, 12.4, and 13 dB, respectively. Using "auditory" dimensions, such as the F1-F0 difference, and a "spectral center of gravity" between adjacent formants for representing vowel features did not reveal a better constancy of these parameters with respect to the variations of vocal effort and speaker. Thus a global view is evoked, in which all of the aspects of the signal should be processed simultaneously.
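The reported rates invite a quick back-of-the-envelope calculation. The sketch below linearly extrapolates F0 and F1 using the ~5 Hz/dB and ~3.5 Hz/dB figures above; the baseline frequencies are hypothetical, and real growth need not stay linear over large level changes.

```python
def predict_vocal_effort_shift(delta_db, f0=120.0, f1=500.0):
    """Rough linear extrapolation from the reported rates:
    F0 rises ~5 Hz per dB of vocal effort, F1 ~3.5 Hz per dB.
    Baseline f0/f1 values are illustrative, not from the study."""
    return {"F0": f0 + 5.0 * delta_db, "F1": f1 + 3.5 * delta_db}

# a 10 dB increase in overall level
shift = predict_vocal_effort_shift(10.0)
```

So a 10 dB increase in effort would move a 120 Hz F0 to roughly 170 Hz and a 500 Hz F1 to roughly 535 Hz under this linear approximation.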

9.
10.

Background

Recent studies have shown that the human right-hemispheric auditory cortex is particularly sensitive to reductions in sound quality, with increasing distortion amplifying the auditory N1m response measured with magnetoencephalography (MEG). Here, we examined whether this sensitivity is specific to the processing of the acoustic properties of speech or whether it can also be observed in the processing of sounds with a simple spectral structure. We degraded speech stimuli (vowel /a/), complex non-speech stimuli (a composite of five sinusoidals), and sinusoidal tones by decreasing the amplitude resolution of the signal waveform. The amplitude resolution was impoverished by reducing the number of bits used to represent the signal samples. Auditory evoked magnetic fields (AEFs) were measured in the left and right hemisphere of sixteen healthy subjects.
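The degradation described, reducing the number of bits per sample, amounts to requantizing the waveform on a coarser amplitude grid. A minimal sketch (the 4-bit depth and 440-Hz test tone are illustrative choices, not the study's stimuli):

```python
import numpy as np

def reduce_amplitude_resolution(x, n_bits):
    """Requantize a [-1, 1] signal to n_bits per sample: fewer bits
    give coarser amplitude steps and signal-dependent quantization
    distortion of the kind used to degrade the stimuli above."""
    levels = 2 ** n_bits
    step = 2.0 / levels
    return np.clip(np.round(x / step) * step, -1.0, 1.0 - step)

fs = 16000
t = np.arange(fs) / fs
tone = 0.9 * np.sin(2 * np.pi * 440 * t)   # a sinusoidal stimulus
degraded = reduce_amplitude_resolution(tone, 4)
```

The quantization error is bounded by half an amplitude step, but because it is correlated with the signal it shows up as harmonically related spectral components rather than flat noise, which is the point taken up in the Conclusions below.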

Results

We found that the AEF amplitudes increased significantly with stimulus distortion for all stimulus types, which indicates that the right-hemispheric N1m sensitivity is not related exclusively to degradation of acoustic properties of speech. In addition, the P1m and P2m responses were amplified with increasing distortion similarly in both hemispheres. The AEF latencies were not systematically affected by the distortion.

Conclusions

We propose that the increased activity of AEFs reflects cortical processing of acoustic properties common to both speech and non-speech stimuli. More specifically, the enhancement is most likely caused by spectral changes brought about by the decrease of amplitude resolution, in particular the introduction of periodic, signal-dependent distortion to the original sound. Converging evidence suggests that the observed AEF amplification could reflect cortical sensitivity to periodic sounds.

11.
The four experiments reported here measure listeners' accuracy and consistency in adjusting a formant frequency of one- or two-formant complex sounds to match the timbre of a target sound. By presenting the target and the adjustable sound on different fundamental frequencies, listeners are prevented from performing the task by comparing the absolute or relative levels of resolved spectral components. Experiment 1 uses two-formant vowellike sounds. When the two sounds have the same F0, the variability of matches (within-subject standard deviation) for either the first or the second formant is around 1%-3%, which is comparable to existing data on formant frequency discrimination thresholds. With a difference in F0, variability increases to around 8% for first-formant matches, but to only about 4% for second-formant matches. Experiment 2 uses sounds with a single formant at 1100 or 1200 Hz with both sounds on either low or high fundamental frequencies. The increase in variability produced by a difference in F0 is greater for high F0's (where the harmonics close to the formant peak are resolved) than it is for low F0's (where they are unresolved). Listeners also showed systematic errors in their mean matches to sounds with different high F0's. The direction of the systematic errors was towards the most intense harmonic. Experiments 3 and 4 showed that introduction of a vibratolike frequency modulation (FM) on F0 reduces the variability of matches, but does not reduce the systematic error. The experiments demonstrate, for the specific frequencies and FM used, that there is a perceptual cost to interpolating a spectral envelope across resolved harmonics.

12.
Perceptual linear predictive (PLP) analysis of speech
A new technique for the analysis of speech, the perceptual linear predictive (PLP) technique, is presented and examined. This technique uses three concepts from the psychophysics of hearing to derive an estimate of the auditory spectrum: (1) the critical-band spectral resolution, (2) the equal-loudness curve, and (3) the intensity-loudness power law. The auditory spectrum is then approximated by an autoregressive all-pole model. A 5th-order all-pole model is effective in suppressing speaker-dependent details of the auditory spectrum. In comparison with conventional linear predictive (LP) analysis, PLP analysis is more consistent with human hearing. The effective second formant F2' and the 3.5-Bark spectral-peak integration theories of vowel perception are well accounted for. PLP analysis is computationally efficient and yields a low-dimensional representation of speech. These properties are found to be useful in speaker-independent automatic-speech recognition.
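The pipeline can be caricatured in a few lines. The sketch below keeps two of the three ingredients in recognizable form, the cube-root intensity-loudness power law and the low-order all-pole fit via Levinson-Durbin, while replacing critical-band integration and equal-loudness weighting with a crude smoothing step; it is an illustration of the idea, not the full PLP algorithm.

```python
import numpy as np

def levinson_durbin(r, order):
    """All-pole (autoregressive) coefficients from autocorrelation r."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a, err

def plp_sketch(frame, order=5):
    """Much-simplified PLP-style analysis: power spectrum, cube-root
    intensity-to-loudness compression, then a low-order all-pole fit.
    Critical-band (Bark) integration and equal-loudness weighting are
    replaced here by a crude smoothing step."""
    spec = np.abs(np.fft.rfft(frame * np.hamming(len(frame)))) ** 2
    # crude smoothing standing in for critical-band integration
    smoothed = np.convolve(spec, np.ones(9) / 9.0, mode="same")
    loudness = smoothed ** (1.0 / 3.0)   # intensity-loudness power law
    # autocorrelation of the auditory spectrum via inverse DFT
    r = np.fft.irfft(loudness)[: order + 1]
    return levinson_durbin(r, order)

fs = 8000
t = np.arange(256) / fs
frame = np.sin(2 * np.pi * 500 * t) + 0.5 * np.sin(2 * np.pi * 1500 * t)
coeffs, residual = plp_sketch(frame)
```

The low model order (5 here, as in the paper) is what discards speaker-dependent spectral detail while retaining the gross formant shape.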

13.
The contribution of extraneous sounds to the perceptual estimation of the first-formant (F1) frequency of voiced vowels was investigated using a continuum of vowels perceived as changing from /I/ to /ε/ as F1 was increased. Any phonetic effects of adding extraneous sounds were measured as a change in the position of the phoneme boundary on the continuum. Experiments 1-5 demonstrated that a pair of extraneous tones, mistuned from harmonic values of the fundamental frequency of the vowel, could influence perceived vowel quality when added in the F1 region. Perceived F1 frequency was lowered when the tones were added on the lower skirt of F1, and raised when they were added on the upper skirt. Experiments 6 and 7 demonstrated that adding a narrow-band noise in the F1 region could produce a similar pattern of boundary shifts, despite the differences in temporal properties and timbre between a noise band and a voiced vowel. The data are interpreted using the concept of the harmonic sieve [Duifhuis et al., J. Acoust. Soc. Am. 71, 1568-1580 (1982)]. The results imply a partial failure of the harmonic sieve to exclude extraneous sounds from the perceptual estimation of F1 frequency. Implications for the nature of the hypothetical harmonic sieve are discussed.
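The harmonic-sieve idea is simple to state in code: a component contributes to the estimate only if it falls close enough to an integer multiple of the fundamental. A minimal sketch (the 4% tolerance is an illustrative value, not the one fitted by Duifhuis et al.):

```python
import numpy as np

def harmonic_sieve(components, f0, tolerance=0.04):
    """A component passes the sieve if it lies within `tolerance`
    (as a proportion of f0) of an integer multiple of f0."""
    components = np.asarray(components, dtype=float)
    nearest = np.round(components / f0)
    ok = nearest >= 1                     # reject sub-fundamental junk
    rel_err = np.abs(components - nearest * f0) / f0
    return ok & (rel_err <= tolerance)

# harmonics of 125 Hz plus a pair of mistuned extraneous tones
freqs = [125, 250, 375, 430, 500, 590]
passes = harmonic_sieve(freqs, 125.0)
```

An idealized sieve like this would reject the mistuned tones entirely; the boundary shifts reported above imply that the perceptual sieve is leakier than this all-or-none version.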

14.
This paper addresses the problem of automatic identification of vowels uttered in isolation by female and child speakers. In this case, the magnitude spectrum of voiced vowels is sparsely sampled, since only frequencies at integer multiples of F0 are significant. This impacts negatively on the performance of vowel identification techniques that either ignore pitch or rely on global shape models. A new pitch-dependent approach to vowel identification is proposed that emerges from the concept of timbre and that defines perceptual spectral clusters (PSC) of harmonic partials. A representative set of static PSC-related features are estimated and their performance is evaluated in automatic classification tests using the Mahalanobis distance. Linear prediction features and Mel-frequency cepstral coefficients (MFCCs) are used as a reference, and a database of five (Portuguese) natural vowel sounds uttered by 44 speakers (including 27 child speakers) is used for training and testing the Gaussian models. Results indicate that PSC features perform better than plain linear prediction features, but slightly worse than MFCC features. However, PSC features have the potential to take full advantage of the pitch structure of voiced vowels, namely in the analysis of concurrent voices, or by using pitch as a normalization parameter.

15.
There exists no clear understanding of the importance of spectral tilt for perception of stop consonants. It is hypothesized that spectral tilt may be particularly salient when formant patterns are ambiguous or degraded. Here, it is demonstrated that relative change in spectral tilt over time, not absolute tilt, significantly influences perception of /b/ vs /d/. Experiments consisted of burstless synthesized stimuli that varied in spectral tilt and onset frequency of the second formant. In Experiment 1, tilt of the consonant at voice onset was varied. In Experiment 2, tilt of the vowel steady state was varied. Results of these experiments were complementary and revealed a significant contribution of relative spectral tilt change only when formant information was ambiguous. Experiments 3 and 4 replicated Experiments 1 and 2 in an /aba/-/ada/ context. The additional tilt contrast provided by the initial vowel modestly enhanced effects. In Experiment 5, there was no effect for absolute tilt when consonant and vowel tilts were identical. Consistent with earlier studies demonstrating contrast between successive local spectral features, perceptual effects of gross spectral characteristics are likewise relative. These findings have implications for perception in nonlaboratory environments and for listeners with hearing impairment.

16.
Reflex modification was used in a psychophysical technique to measure absolute auditory sensitivity of two species of anurans. Behavioral audiograms for these animals reveal that the bullfrog can detect sounds from 100 Hz to 3.2 kHz and the green tree frog from 100 Hz to 5 kHz. The shape and the sensitivity of these behavioral audiograms are similar to those of neural evoked-response audiograms of these animals. Absolute auditory sensitivity of anurans is only partially related to the spectral composition of their species-specific vocalizations.

17.
18.
To calibrate the optical-axis parallelism of multi-spectral electro-optical equipment mounted on large platforms or complete vehicles indoors and without disassembly, a large-scale multi-spectral multi-optical-axis parallelism calibration system was designed. The system uses a single multi-spectral collimator to provide infinity targets in multiple spectral bands, and a two-dimensional translation platform to move the collimator over large indoor spans. Tilt sensors, a dual linear-array CCD measurement system, and an attitude-adjustment mechanism are used to restore and maintain the parallelism of the collimator's optical axis before and after each move, enabling unified indoor calibration of the optical-axis parallelism of electro-optical equipment distributed across the vehicle at different axial separations and in different spectral bands. The system design and error analysis show that the total optical-axis parallelism error of the collimator before and after movement is less than 0.142 mrad; the system improves calibration accuracy while greatly reducing the workload of optical-axis parallelism calibration. Among the subsystems, the tilt sensors and the attitude-adjustment mechanism contribute most to the total system error, so adopting higher-precision subsystems could further improve overall accuracy and meet the parallelism-calibration requirements of higher-precision equipment.

19.
Performance on 19 auditory discrimination and identification tasks was measured for 340 listeners with normal hearing. Test stimuli included single tones, sequences of tones, amplitude-modulated and rippled noise, temporal gaps, speech, and environmental sounds. Principal components analysis and structural equation modeling of the data support the existence of a general auditory ability and four specific auditory abilities. The specific abilities are (1) loudness and duration (overall energy) discrimination; (2) sensitivity to temporal envelope variation; (3) identification of highly familiar sounds (speech and nonspeech); and (4) discrimination of unfamiliar simple and complex spectral and temporal patterns. Examination of Scholastic Aptitude Test (SAT) scores for a large subset of the population revealed little or no association between general or specific auditory abilities and general intellectual ability. The findings provide a basis for research to further specify the nature of the auditory abilities. Of particular interest are results suggestive of a familiar sound recognition (FSR) ability, apparently specialized for sound recognition on the basis of limited or distorted information. This FSR ability is independent of normal variation in both spectral-temporal acuity and general intellectual ability.

20.
This study assessed the acoustic and perceptual effect of noise on vowel and stop-consonant spectra. Multi-talker babble and speech-shaped noise were added to vowel and stop stimuli at -5 to +10 dB S/N, and the effect of noise was quantified in terms of (a) spectral envelope differences between the noisy and clean spectra in three frequency bands, (b) presence of reliable F1 and F2 information in noise, and (c) changes in burst frequency and slope. Acoustic analysis indicated that F1 was detected more reliably than F2 and the largest spectral envelope differences between the noisy and clean vowel spectra occurred in the mid-frequency band. This finding suggests that in extremely noisy conditions listeners must be relying on relatively accurate F1 frequency information along with partial F2 information to identify vowels. Stop consonant recognition remained high even at -5 dB despite the disruption of burst cues due to additive noise, suggesting that listeners must be relying on other cues, perhaps formant transitions, to identify stops.
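Mixing noise at a prescribed S/N, as in the -5 to +10 dB conditions above, means scaling the noise so the long-term power ratio comes out right. A minimal sketch with a hypothetical tone standing in for the speech material and white noise standing in for the maskers:

```python
import numpy as np

def add_noise_at_snr(clean, noise, snr_db):
    """Scale `noise` so that the mixture clean + noise has the
    requested long-term signal-to-noise ratio in dB."""
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + gain * noise

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 300 * np.arange(8000) / 8000)
noise = rng.standard_normal(8000)
noisy = add_noise_at_snr(clean, noise, -5.0)
```

At -5 dB S/N the noise carries roughly three times the power of the signal, which is the most severe condition used in the study.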


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号