Similar Documents
20 similar documents retrieved.
1.
Moore and Sęk [J. Acoust. Soc. Am. 125, 3186-3193 (2009)] measured discrimination of a harmonic complex tone and a tone in which all harmonics were shifted upwards by the same amount in Hertz. Both tones were passed through a fixed bandpass filter, and a background noise was used to mask combination tones. Performance was well above chance when the fundamental frequency was 800 Hz and all audible components were above 8000 Hz. Moore and Sęk argued that this suggested the use of temporal fine structure information at high frequencies. However, the task may have been performed using excitation-pattern cues. To test this idea, performance on a similar task was measured as a function of level. The auditory filters broaden with increasing level, so performance based on excitation-pattern cues would be expected to worsen as level increases. The results did not show such an effect, suggesting that the task was not performed using excitation-pattern cues.
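The stimulus contrast at the heart of this paradigm can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the F0, shift size, component count, and duration are assumptions, and the bandpass filtering and masking noise of the actual experiment are omitted.

```python
import numpy as np

def complex_tone(f0, shift_hz, n_harmonics, fs=48000, dur=0.2):
    """Equal-amplitude cosine components at f0*k + shift_hz."""
    t = np.arange(int(fs * dur)) / fs
    freqs = f0 * np.arange(1, n_harmonics + 1) + shift_hz
    return np.sum(np.cos(2 * np.pi * freqs[:, None] * t), axis=0)

harmonic = complex_tone(800.0, 0.0, 12)    # harmonic complex, F0 = 800 Hz
shifted = complex_tone(800.0, 200.0, 12)   # every component moved up 200 Hz

# The shift preserves the 800-Hz component spacing (and hence the envelope
# repetition rate) but destroys periodicity at F0, so only the temporal
# fine structure changes -- the cue at issue in the study.
```

A frequency shift applied equally to all components is the standard way to alter fine structure while leaving the excitation pattern nearly unchanged.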

2.
Temporal resolution is often measured using the detection of temporal gaps, or of signals in temporal gaps, embedded in long-duration stimuli. In this study, psychoacoustical paradigms are developed for measuring the temporal encoding of transient stimuli. The stimuli consisted of very short pips which, in two experiments, contained a steady-state portion. The carrier was high-pass filtered, dynamically compressed noise, refreshed for every stimulus presentation. The first experiment shows that, with these very short stimuli, gap-detection thresholds are about the same as obtained in previous investigations. Experiments II and III show that, using the same stimuli, temporal-separation thresholds and duration-discrimination thresholds are better than gap-detection thresholds. Experiment IV investigates the significance of residual spectral cues for the listeners' performance. In experiment V, temporal-separation thresholds were measured as a function of the signal-pip sensation level (SL) in both forward- and backward-masking conditions. The separation thresholds show a strong temporal asymmetry: separation thresholds are good and independent of signal-pip SL in backward-masking conditions, but increase with decreasing signal-pip SL in forward-masking conditions. A model of the auditory periphery is used to simulate the gap-detection and temporal-separation thresholds quantitatively. By varying parameters such as auditory-filter width and transduction time constants, the model provides some insight into how the peripheral auditory system may cope with temporal processing tasks and thus represents a more physiology-related complement to current models of temporal processing.

3.
Perceptual equivalence of acoustic cues that differentiate /r/ and /l/
The perceptual effects of orthogonal variations in two acoustic parameters which differentiate American English prevocalic /r/ and /l/ were examined. A spectral cue (frequency onset and transition of F2 and F3) and a temporal cue (relative duration of initial steady state and transition of F1) were varied in synthetic versions of "rock" and "lock." Four temporal variations in each of ten stimuli of a spectral-cue continuum were generated. Phonetic identification and oddity discrimination tasks with the four series showed systematic displacement of perceptual boundaries and discrimination peaks, thus reflecting a trading relation between the two cues. The perceptual equivalence of spectral and temporal cues was investigated by comparing the accuracy of discrimination of three types of stimulus comparisons: phonetically facilitating two-cue pairs, one-cue pairs, and phonetically conflicting two-cue pairs. As predicted, discrimination accuracy was ordered facilitating two-cue pairs > one-cue pairs > conflicting two-cue pairs, indicating that perceivers discriminated on the basis of an integrated phonetic percept.

4.
While a large portion of the variance among listeners in speech recognition is associated with the audibility of components of the speech waveform, it is not possible to predict individual differences in the accuracy of speech processing strictly from the audiogram. This has suggested that some of the variance may be associated with individual differences in spectral or temporal resolving power, or acuity. Psychoacoustic measures of spectral-temporal acuity with nonspeech stimuli have been shown, however, to correlate only weakly (or not at all) with speech processing. In a replication and extension of an earlier study [Watson et al., J. Acoust. Soc. Am. Suppl. 1 71, S73 (1982)], 93 normal-hearing college students were tested on speech perception tasks (nonsense syllables, words, and sentences in a noise background) and on six spectral-temporal discrimination tasks using simple and complex nonspeech sounds. Factor analysis showed that the abilities that explain performance on the nonspeech tasks are quite distinct from those that account for performance on the speech tasks. Performance was significantly correlated among speech tasks and among nonspeech tasks. Either (a) auditory spectral-temporal acuity for nonspeech sounds is orthogonal to speech processing abilities, or (b) the appropriate tasks or types of nonspeech stimuli that challenge the abilities required for speech recognition have yet to be identified.

5.
Although in a number of experiments noise-band vocoders have been shown to provide acoustic models for speech perception in cochlear implants (CI), the present study assesses in four experiments whether and under what limitations noise-band vocoders can be used as an acoustic model for pitch perception in CI. The first two experiments examine the effect of spectral smearing on simulated electrode discrimination and fundamental frequency (F0) discrimination. The third experiment assesses the effect of spectral mismatch in an F0-discrimination task with two different vocoders. The fourth experiment investigates the effect of amplitude compression on modulation rate discrimination. For each experiment, the results obtained from normal-hearing subjects presented with vocoded stimuli are compared to results obtained directly from CI recipients. The results show that place pitch sensitivity drops with increased spectral smearing and that place pitch cues for multi-channel stimuli can adequately be mimicked when the discriminability of adjacent channels is adjusted by varying the spectral slopes to match that of CI subjects. The results also indicate that temporal pitch sensitivity is limited for noise-band carriers with low center frequencies and that the absence of a compression function in the vocoder might alter the saliency of the temporal pitch cues.

6.
Performance on 19 auditory discrimination and identification tasks was measured for 340 listeners with normal hearing. Test stimuli included single tones, sequences of tones, amplitude-modulated and rippled noise, temporal gaps, speech, and environmental sounds. Principal components analysis and structural equation modeling of the data support the existence of a general auditory ability and four specific auditory abilities. The specific abilities are (1) loudness and duration (overall energy) discrimination; (2) sensitivity to temporal envelope variation; (3) identification of highly familiar sounds (speech and nonspeech); and (4) discrimination of unfamiliar simple and complex spectral and temporal patterns. Examination of Scholastic Aptitude Test (SAT) scores for a large subset of the population revealed little or no association between general or specific auditory abilities and general intellectual ability. The findings provide a basis for research to further specify the nature of the auditory abilities. Of particular interest are results suggestive of a familiar sound recognition (FSR) ability, apparently specialized for sound recognition on the basis of limited or distorted information. This FSR ability is independent of normal variation both in spectral-temporal acuity and in general intellectual ability.

7.
This study presents a psychoacoustic analysis of the integration of spectral and temporal cues in the discrimination of simple nonspeech sounds. The experimental task was a same-different discrimination between a standard and a comparison pair of tones. Each pair consisted of two 80-ms, 1500-Hz tone bursts separated by a 60-ms interval. The just-discriminable (d' = 2.0) increment in duration, Δt, of one of the bursts was measured as a function of increments in the frequency, Δf, of the other burst. A tradeoff between the values of Δt and Δf required to perform at d' = 2.0 was observed, which suggests that listeners integrate the evidence from the two dimensions. Integration occurred with both sub- and supra-threshold values of Δt or Δf, regardless of the order in which the cues were presented. The performance associated with the integration of cues was found to be determined by the discriminability of Δt plus that of Δf, and thus it is within the psychophysical limits of auditory processing. To a first approximation the results agreed with the prediction of orthogonal vector summation of evidence stemming from signal detection theory. It is proposed that the ability to integrate spectral and temporal cues is in the repertoire of auditory processing capabilities. This integration does not appear to depend on perceiving sounds as members of phonetic classes.
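The orthogonal-vector-summation prediction mentioned above has a simple closed form: if the two cues contribute independent evidence, the integrated sensitivity is the Euclidean sum of the single-cue sensitivities, d'_joint = sqrt(d'_t^2 + d'_f^2). A minimal sketch (the specific d' values below are illustrative, not taken from the study):

```python
import math

def combined_dprime(d_temporal, d_spectral):
    """Predicted sensitivity when two orthogonal cues are integrated."""
    return math.hypot(d_temporal, d_spectral)

# Two individually sub-criterion cues can jointly reach the d' = 2.0
# criterion used in the experiment:
d_joint = combined_dprime(1.4, 1.4)   # close to 2.0
```

This is why sub-threshold values of one cue can still trade against the other: each contributes quadratically to the combined decision variable.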

8.
A spectral discrimination task was used to estimate the frequency range over which information about the temporal envelope is consolidated. The standard consisted of n equal intensity, random phase sinusoids, symmetrically placed around a signal component. The signal was an intensity increment of the central sinusoid, which on average was 1000 Hz. Pitch cues were degraded by randomly selecting the center frequency of the complex and single channel energy cues were degraded with a roving-level procedure. Stimulus bandwidth was controlled by varying the number of tones and the frequency separation between tones. For a fixed frequency separation, thresholds increased as n increased until a certain bandwidth was reached, beyond which thresholds decreased. This discontinuity in threshold functions suggests that different auditory processes predominate at different bandwidths, presumably an envelope analysis at bandwidths less than the breakpoint and across channel level comparisons for wider stimulus bandwidths. Estimates of the "transition bandwidth" for 46 listeners ranged from 100 to 1250 Hz. The results are consistent with a peripheral filtering system having multiple filterbanks.

9.
A variety of dolphin sonar discrimination experiments have been conducted, yet little is known about the cues utilized by dolphins in making fine target discriminations. In order to gain insights on cues available to echolocating dolphins, sonar discrimination experiments were conducted with human subjects using the same targets employed in dolphin experiments. When digital recordings of echoes from targets ensonified with a dolphinlike signal were played back at a slower rate to human subjects, they could also make fine target discriminations under controlled laboratory conditions about as well as dolphins under less controlled conditions. Subjects reported that time-separation-pitch and duration cues were important. They also reported that low-amplitude echo components 32 dB below the maximum echo component were usable. The signal-to-noise ratio had to be greater than 10 dB above the detection threshold for simple discrimination and 30 dB for difficult discrimination. Except for two cases in which spectral cues in the form of "click pitch" were important, subjects indicated that time-domain rather than frequency-domain processing seemed to be more relevant in analyzing the echoes.

10.
Twenty-eight audiometrically normal adult listeners were given a variety of auditory tests, ranging from quiet and masked thresholds through the discrimination of simple and moderately complex temporal patterns. Test-retest reliability was good. Individual differences persisted on a variety of psychoacoustic tasks following a period of training using adaptive threshold-tracking methods, and with trial-by-trial feedback. Large individual differences in performance on temporal-sequence-discrimination tasks suggest that this form of temporal processing may be of clinical significance. In addition, high correlations were obtained within given classes of tests (as, between all tests of frequency discrimination) and between certain classes of tests (as, between tests of frequency discrimination and those of sequence discrimination). Patterns of individual differences were found which support the conclusion that individual differences in auditory performance are, in part, a function of patterns of independent abilities.

11.
Cochlear implants provide users with limited spectral and temporal information. In this study, the amount of spectral and temporal information was systematically varied through simulations of cochlear implant processors using a noise-excited vocoder. Spectral information was controlled by varying the number of channels between 1 and 16, and temporal information was controlled by varying the lowpass cutoff frequencies of the envelope extractors from 1 to 512 Hz. Consonants and vowels processed using those conditions were presented to seven normal-hearing native-English-speaking listeners for identification. The results demonstrated that both spectral and temporal cues were important for consonant and vowel recognition with the spectral cues having a greater effect than the temporal cues for the ranges of numbers of channels and lowpass cutoff frequencies tested. The lowpass cutoff for asymptotic performance in consonant and vowel recognition was 16 and 4 Hz, respectively. The number of channels at which performance plateaued for consonants and vowels was 8 and 12, respectively. Within the above-mentioned ranges of lowpass cutoff frequency and number of channels, the temporal and spectral cues showed a tradeoff for phoneme recognition. Information transfer analyses showed different relative contributions of spectral and temporal cues in the perception of various phonetic/acoustic features.
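The processing chain described above can be sketched as a noise-excited vocoder. This is a rough illustration under stated assumptions, not the study's implementation: brick-wall FFT filters stand in for real analysis filters, and the channel spacing, band edges, and cutoffs are placeholder values for the parameters the study varied.

```python
import numpy as np

def fft_bandpass(x, fs, lo, hi):
    """Brick-wall bandpass via FFT masking (a simplification of the
    analysis filters a real vocoder would use)."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1.0 / fs)
    X[(f < lo) | (f > hi)] = 0.0
    return np.fft.irfft(X, len(x))

def noise_vocoder(x, fs, n_channels=8, env_cutoff=16.0, f_lo=100.0, f_hi=7000.0):
    """Replace each channel's fine structure with noise, keeping only the
    lowpass-filtered temporal envelope -- the two quantities varied above."""
    rng = np.random.default_rng(0)
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)  # log-spaced channel edges
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = fft_bandpass(x, fs, lo, hi)                     # analysis band
        env = fft_bandpass(np.abs(band), fs, 0.0, env_cutoff)  # lowpass envelope
        env = np.clip(env, 0.0, None)
        carrier = fft_bandpass(rng.standard_normal(len(x)), fs, lo, hi)
        out += env * carrier                                   # excite noise
    return out
```

Raising `n_channels` restores spectral detail; raising `env_cutoff` restores temporal detail, mirroring the two axes of the experiment.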

12.
A previous comparative analysis of normalized click amplitude spectra from a Tursiops truncatus has shown that those frequencies with the lowest click-to-click variability in spectral content were the frequencies the animal paid attention to during target discrimination tasks. In that case, the dolphin only paid attention to the frequency range between 29 and 42 kHz, which had a significantly higher degree of consistency in spectral content than frequencies above 42 kHz. Here it is shown that despite their morphological and behavioral differences, this same pattern of consistency was used by a Pseudorca crassidens performing a similar discrimination task. This comparison between species provides a foundation for using spectral level variability to determine the frequencies most important for echolocation in rare species and non-captive animals. Such results provide key information for successful management.

13.
Limited consonant phonemic information can be conveyed by the temporal characteristics of speech. In the two experiments reported here, the effects of practice and of multiple talkers on identification of temporal consonant information were evaluated. Naturally produced /aCa/ disyllables were used to create "temporal-only" stimuli having instantaneous amplitudes identical to the natural speech stimuli, but flat spectra. Practice improved normal-hearing subjects' identification of temporal-only stimuli from a single talker over that reported earlier for a different group of unpracticed subjects [J. Acoust. Soc. Am. 82, 1152-1161 (1987)]. When the number of talkers was increased to six, however, performance was poorer than that observed for one talker, demonstrating that subjects had been able to learn the individual stimulus items derived from the speech of the single talker. Even after practice, subjects varied greatly in their abilities to extract temporal information related to consonant voicing and manner. Identification of consonant place was uniformly poor in the multiple-talker situation, indicating that for these stimuli consonant place is cued via spectral information. Comparison of consonant identification by users of multi-channel cochlear implants showed that the implant users' identification of temporal consonant information was largely within the range predicted from the normal data. In the instances where the implant users were performing especially well, they were identifying consonant place information at levels well beyond those predicted by the normal-subject data. Comparison of implant-user performance with the temporal-only data reported here can help determine whether the speech information available to the implant user consists of entirely temporal cues, or is augmented by spectral cues.

14.
A melodic pitch experiment was performed to demonstrate the importance of time-interval resolution for pitch strength. The experiments show that notes with a low fundamental (75 Hz) and relatively few resolved harmonics support better performance than comparable notes with a higher fundamental (300 Hz) and more resolved harmonics. Two four-note melodies were presented to listeners, and one note in the second melody was changed by one or two semitones. Listeners were required to identify the note that changed. There were three orthogonal stimulus dimensions: F0 (75 and 300 Hz); lowest frequency component (3, 7, 11, or 15); and number of harmonics (4 or 8). Performance decreased as the frequency of the lowest component increased for both F0s, but performance was better for the lower F0. The spectral and temporal information in the stimuli were compared using a time-domain model of auditory perception. It is argued that the distribution of time intervals in the auditory nerve can explain the decrease in performance as F0 and spectral resolution increase. Excitation patterns based on the same time-interval information do not contain sufficient resolution to explain listeners' performance on the melody task.

15.
The acoustic environment of the bottlenose dolphin often consists of noise where energy across frequency regions is coherently modulated in time (e.g., ambient noise from snapping shrimp). However, most masking studies with dolphins have employed random Gaussian noise for estimating patterns of masked thresholds. The current study demonstrates a pattern of masking where temporally fluctuating comodulated noise produces lower masked thresholds (up to a 17 dB difference) compared to Gaussian noise of the same spectral density level. Noise possessing wide bandwidths, low temporal modulation rates, and across-frequency temporal envelope coherency resulted in lower masked thresholds, a phenomenon known as comodulation masking release. The results are consistent with a model where dolphins compare temporal envelope information across auditory filters to aid in signal detection. Furthermore, results suggest conventional models of masking derived from experiments using random Gaussian noise may not generalize well to environmental noise that dolphins actually encounter.
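One common recipe for a comodulated masker (an assumption for illustration; the study's exact synthesis may differ) multiplies a wideband Gaussian carrier by a single slowly varying lowpass envelope, so that every frequency region inherits the same temporal fluctuations:

```python
import numpy as np

def lowpass_fft(x, fs, cutoff):
    """Brick-wall lowpass via FFT masking."""
    X = np.fft.rfft(x)
    X[np.fft.rfftfreq(len(x), 1.0 / fs) > cutoff] = 0.0
    return np.fft.irfft(X, len(x))

fs, n = 48000, 48000
rng = np.random.default_rng(1)

gaussian = rng.standard_normal(n)  # baseline: random Gaussian masker

# Comodulated masker: one shared slow envelope (here lowpass at 50 Hz)
# multiplies a fresh Gaussian carrier, making envelope fluctuations
# coherent across the whole band.
envelope = np.abs(lowpass_fft(rng.standard_normal(n), fs, 50.0))
comodulated = envelope * rng.standard_normal(n)
```

An across-filter envelope comparison of the kind the model posits benefits from exactly this coherence: dips in the shared envelope expose the signal in every channel at once.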

16.
A computational model of auditory analysis is described that is inspired by psychoacoustical and neurophysiological findings in early and central stages of the auditory system. The model provides a unified multiresolution representation of the spectral and temporal features likely critical in the perception of sound. Simplified, more specifically tailored versions of this model have already been validated by successful application in the assessment of speech intelligibility [Elhilali et al., Speech Commun. 41(2-3), 331-348 (2003); Chi et al., J. Acoust. Soc. Am. 106, 2719-2732 (1999)] and in explaining the perception of monaural phase sensitivity [R. Carlyon and S. Shamma, J. Acoust. Soc. Am. 114, 333-348 (2003)]. Here we provide a more complete mathematical formulation of the model, illustrating how complex signals are transformed through various stages of the model, and relating it to comparable existing models of auditory processing. Furthermore, we outline several reconstruction algorithms to resynthesize the sound from the model output so as to evaluate the fidelity of the representation and contribution of different features and cues to the sound percept.

17.
A model of computational auditory signal-processing and perception that accounts for various aspects of simultaneous and nonsimultaneous masking in human listeners is presented. The model is based on the modulation filterbank model described by Dau et al. [J. Acoust. Soc. Am. 102, 2892 (1997)] but includes major changes at the peripheral and more central stages of processing. The model contains outer- and middle-ear transformations, a nonlinear basilar-membrane processing stage, a hair-cell transduction stage, a squaring expansion, an adaptation stage, a 150-Hz lowpass modulation filter, a bandpass modulation filterbank, a constant-variance internal noise, and an optimal detector stage. The model was evaluated in experimental conditions that reflect, to a different degree, effects of compression as well as spectral and temporal resolution in auditory processing. The experiments include intensity discrimination with pure tones and broadband noise, tone-in-noise detection, spectral masking with narrow-band signals and maskers, forward masking with tone signals and tone or noise maskers, and amplitude-modulation detection with narrow- and wideband noise carriers. The model can account for most of the key properties of the data and is more powerful than the original model. The model might be useful as a front end in technical applications.

18.
A computational model of auditory localization resulting in performance similar to humans is reported. The model incorporates both the monaural and binaural cues available to a human for sound localization. Essential elements used in the simulation of the processes of auditory cue generation and encoding by the nervous system include measured head-related transfer functions (HRTFs), minimum audible field (MAF), and the Patterson-Holdsworth cochlear model. A two-layer feed-forward back-propagation artificial neural network (ANN) was trained to transform the localization cues to a two-dimensional map that gives the direction of the sound source. The model results were compared with (i) the localization performance of the human listener who provided the HRTFs for the model and (ii) the localization performance of a group of 19 other human listeners. The localization accuracy and front-back confusion error rates exhibited by the model were similar to both the single listener and the group results. This suggests that the simulation of the cue generation and extraction processes as well as the model parameters were reasonable approximations to the overall biological processes. The amplitude resolution of the monaural spectral cues was varied and the influence on the model's performance was determined. The model with 128 cochlear channels required an amplitude resolution of approximately 20 discrete levels for encoding the spectral cue to deliver similar localization performance to the group of human listeners.

19.
Spectral integration refers to the summation of activity beyond the bandwidth of the peripheral auditory filter. Several experimental lines have sought to determine the bandwidth of this "supracritical" band phenomenon. This paper reports on two experiments which tested the limit on spectral integration in the same listeners. Experiment I verified the critical separation of 3.5 bark in two-formant synthetic vowels as advocated by the center-of-gravity (COG) hypothesis. According to the COG effect, two formants are integrated into a single perceived peak if their separation does not exceed approximately 3.5 bark. With several modifications to the methods of a classic COG matching task, the present listeners responded to changes in pitch in two-formant synthetic vowels, not estimating their phonetic quality. By changing the amplitude ratio of the formants, the frequency of the perceived peak was closer to that of the stronger formant. This COG effect disappeared with larger formant separation. In a second experiment, auditory spectral resolution bandwidths were measured for the same listeners using common-envelope, two-tone complex signals. Results showed that the limits of spectral averaging in two-formant vowels and two-tone spectral resolution bandwidth were related for two of the three listeners. The third failed to perform the discrimination task. For the two subjects who completed both tasks, the results suggest that the critical region in the vowel task and the complex-tone discriminability estimates are linked to a common mechanism, i.e., to an auditory spectral resolving power. A signal-processing model is proposed to predict the COG effect in two-formant synthetic vowels. The model introduces two modifications to Hermansky's [J. Acoust. Soc. Am. 87, 1738-1752 (1990)] perceptual linear predictive (PLP) model. The model predictions are generally compatible with the present experimental results and with the predictions of several earlier models accounting for the COG effect.

20.
At a cocktail party, listeners must attend selectively to a target speaker and segregate their speech from distracting speech sounds uttered by other speakers. To solve this task, listeners can draw on a variety of vocal, spatial, and temporal cues. Recently, Vestergaard et al. [J. Acoust. Soc. Am. 125, 1114-1124 (2009)] developed a concurrent-syllable task to control temporal glimpsing within segments of concurrent speech, and this allowed them to measure the interaction of glottal pulse rate and vocal tract length and reveal how the auditory system integrates information from independent acoustic modalities to enhance recognition. The current paper shows how the interaction of these acoustic cues evolves as the temporal overlap of syllables is varied. Temporal glimpses as short as 25 ms are observed to improve syllable recognition substantially when the target and distracter have similar vocal characteristics, but not when they are dissimilar. The effect of temporal glimpsing on recognition performance is strongly affected by the form of the syllable (consonant-vowel versus vowel-consonant), but it is independent of other phonetic features such as place and manner of articulation.


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.) | 京ICP备09084417号