首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 171 毫秒
1.
The relative importance of temporal information in broad spectral regions for consonant identification was assessed in normal-hearing listeners. For the purpose of forcing listeners to use primarily temporal-envelope cues, speech sounds were spectrally degraded using four-noise-band vocoder processing Frequency-weighting functions were determined using two methods. The first method consisted of measuring the intelligibility of speech with a hole in the spectrum either in quiet or in noise. The second method consisted of correlating performance with the randomly and independently varied signal-to-noise ratio within each band. Results demonstrated that all bands contributed equally to consonant identification when presented in quiet. In noise, however, both methods indicated that listeners consistently placed relatively more weight upon the highest frequency band. It is proposed that the explanation for the difference in results between quiet and noise relates to the shape of the modulation spectra in adjacent frequency bands. Overall, the results suggest that normal-hearing listeners use a common listening strategy in a given condition. However, this strategy may be influenced by the competing sounds, and thus may vary according to the context. Some implications of the results for cochlear implantees and hearing-impaired listeners are discussed.  相似文献   

2.
The "cocktail party problem" was studied using virtual stimuli whose spatial locations were generated using anechoic head-related impulse responses from the AUDIS database [Blauert et al., J. Acoust. Soc. Am. 103, 3082 (1998)]. Speech reception thresholds (SRTs) were measured for Harvard IEEE sentences presented from the front in the presence of one, two, or three interfering sources. Four types of interferer were used: (1) other sentences spoken by the same talker, (2) time-reversed sentences of the same talker, (3) speech-spectrum shaped noise, and (4) speech-spectrum shaped noise, modulated by the temporal envelope of the sentences. Each interferer was matched to the spectrum of the target talker. Interferers were placed in several spatial configurations, either coincident with or separated from the target. Binaural advantage was derived by subtracting SRTs from listening with the "better monaural ear" from those for binaural listening. For a single interferer, there was a binaural advantage of 2-4 dB for all interferer types. For two or three interferers, the advantage was 2-4 dB for noise and speech-modulated noise, and 6-7 dB for speech and time-reversed speech. These data suggest that the benefit of binaural hearing for speech intelligibility is especially pronounced when there are multiple voiced interferers at different locations from the target, regardless of spatial configuration; measurements with fewer or with other types of interferers can underestimate this benefit.  相似文献   

3.
The word recognition ability of 4 normal-hearing and 13 cochlearly hearing-impaired listeners was evaluated. Filtered and unfiltered speech in quiet and in noise were presented monaurally through headphones. The noise varied over listening situations with regard to spectrum, level, and temporal envelope. Articulation index theory was applied to predict the results. Two calculation methods were used, both based on the ANSI S3.5-1969 20-band method [S3.5-1969 (American National Standards Institute, New York)]. Method I was almost identical to the ANSI method. Method II included a level- and hearing-loss-dependent calculation of masking of stationary and on-off gated noise signals and of self-masking of speech. Method II provided the best prediction capability, and it is concluded that speech intelligibility of cochlearly hearing-impaired listeners may also, to a first approximation, be predicted from articulation index theory.  相似文献   

4.
This study investigated the effects of age and hearing loss on perception of accented speech presented in quiet and noise. The relative importance of alterations in phonetic segments vs. temporal patterns in a carrier phrase with accented speech also was examined. English sentences recorded by a native English speaker and a native Spanish speaker, together with hybrid sentences that varied the native language of the speaker of the carrier phrase and the final target word of the sentence were presented to younger and older listeners with normal hearing and older listeners with hearing loss in quiet and noise. Effects of age and hearing loss were observed in both listening environments, but varied with speaker accent. All groups exhibited lower recognition performance for the final target word spoken by the accented speaker compared to that spoken by the native speaker, indicating that alterations in segmental cues due to accent play a prominent role in intelligibility. Effects of the carrier phrase were minimal. The findings indicate that recognition of accented speech, especially in noise, is a particularly challenging communication task for older people.  相似文献   

5.
When a target-speech/masker mixture is processed with the signal-separation technique, ideal binary mask (IBM), intelligibility of target speech is remarkably improved in both normal-hearing listeners and hearing-impaired listeners. Intelligibility of speech can also be improved by filling in speech gaps with un-modulated broadband noise. This study investigated whether intelligibility of target speech in the IBM-treated target-speech/masker mixture can be further improved by adding a broadband-noise background. The results of this study show that following the IBM manipulation, which remarkably released target speech from speech-spectrum noise, foreign-speech, or native-speech masking (experiment 1), adding a broadband-noise background with the signal-to-noise ratio no less than 4 dB significantly improved intelligibility of target speech when the masker was either noise (experiment 2) or speech (experiment 3). The results suggest that since adding the noise background shallows the areas of silence in the time-frequency domain of the IBM-treated target-speech/masker mixture, the abruption of transient changes in the mixture is smoothed and the perceived continuity of target-speech components becomes enhanced, leading to improved target-speech intelligibility. The findings are useful for advancing computational auditory scene analysis, hearing-aid/cochlear-implant designs, and understanding of speech perception under "cocktail-party" conditions.  相似文献   

6.
Relations between perception of suprathreshold speech and auditory functions were examined in 24 hearing-impaired listeners and 12 normal-hearing listeners. The speech intelligibility index (SII) was used to account for audibility. The auditory functions included detection efficiency, temporal and spectral resolution, temporal and spectral integration, and discrimination of intensity, frequency, rhythm, and spectro-temporal shape. All auditory functions were measured at 1 kHz. Speech intelligibility was assessed with the speech-reception threshold (SRT) in quiet and in noise, and with the speech-reception bandwidth threshold (SRBT), previously developed for investigating speech perception in a limited frequency region around 1 kHz. The results showed that the elevated SRT in quiet could be explained on the basis of audibility. Audibility could only partly account for the elevated SRT values in noise and the deviant SRBT values, suggesting that suprathreshold deficits affected intelligibility in these conditions. SII predictions for the SRBT improved significantly by including the individually measured upward spread of masking in the SII model. Reduced spectral resolution, reduced temporal resolution, and reduced frequency discrimination appeared to be related to speech perception deficits. Loss of peripheral compression appeared to have the smallest effect on the intelligibility of suprathreshold speech.  相似文献   

7.
Speech intelligibility and localization in a multi-source environment.   总被引:1,自引:0,他引:1  
Natural environments typically contain sound sources other than the source of interest that may interfere with the ability of listeners to extract information about the primary source. Studies of speech intelligibility and localization by normal-hearing listeners in the presence of competing speech are reported on in this work. One, two or three competing sentences [IEEE Trans. Audio Electroacoust. 17(3), 225-246 (1969)] were presented from various locations in the horizontal plane in several spatial configurations relative to a target sentence. Target and competing sentences were spoken by the same male talker and at the same level. All experiments were conducted both in an actual sound field and in a virtual sound field. In the virtual sound field, both binaural and monaural conditions were tested. In the speech intelligibility experiment, there were significant improvements in performance when the target and competing sentences were spatially separated. Performance was similar in the actual sound-field and virtual sound-field binaural listening conditions for speech intelligibility. Although most of these improvements are evident monaurally when using the better ear, binaural listening was necessary for large improvements in some situations. In the localization experiment, target source identification was measured in a seven-alternative absolute identification paradigm with the same competing sentence configurations as for the speech study. Performance in the localization experiment was significantly better in the actual sound-field than in the virtual sound-field binaural listening conditions. Under binaural conditions, localization performance was very good, even in the presence of three competing sentences. Under monaural conditions, performance was much worse. For the localization experiment, there was no significant effect of the number or configuration of the competing sentences tested. For these experiments, the performance in the speech intelligibility experiment was not limited by localization ability.  相似文献   

8.
The role of perceived spatial separation in the unmasking of speech   总被引:12,自引:0,他引:12  
Spatial separation of speech and noise in an anechoic space creates a release from masking that often improves speech intelligibility. However, the masking release is severely reduced in reverberant spaces. This study investigated whether the distinct and separate localization of speech and interference provides any perceptual advantage that, due to the precedence effect, is not degraded by reflections. Listeners' identification of nonsense sentences spoken by a female talker was measured in the presence of either speech-spectrum noise or other sentences spoken by a second female talker. Target and interference stimuli were presented in an anechoic chamber from loudspeakers directly in front and 60 degrees to the right in single-source and precedence-effect (lead-lag) conditions. For speech-spectrum noise, the spatial separation advantage for speech recognition (8 dB) was predictable from articulation index computations based on measured release from masking for narrow-band stimuli. The spatial separation advantage was only 1 dB in the lead-lag condition, despite the fact that a large perceptual separation was produced by the precedence effect. For the female talker interference, a much larger advantage occurred, apparently because informational masking was reduced by differences in perceived locations of target and interference.  相似文献   

9.
Frequency resolution was evaluated for two normal-hearing and seven hearing-impaired subjects with moderate, flat sensorineural hearing loss by measuring percent correct detection of a 2000-Hz tone as the width of a notch in band-reject noise increased. The level of the tone was fixed for each subject at a criterion performance level in broadband noise. Discrimination of synthetic speech syllables that differed in spectral content in the 2000-Hz region was evaluated as a function of the notch width in the same band-reject noise. Recognition of natural speech consonant/vowel syllables in quiet was also tested; results were analyzed for percent correct performance and relative information transmitted for voicing and place features. In the hearing-impaired subjects, frequency resolution at 2000 Hz was significantly correlated with the discrimination of synthetic speech information in the 2000-Hz region and was not related to the recognition of natural speech nonsense syllables unless (a) the speech stimuli contained the vowel /i/ rather than /a/, and (b) the score reflected information transmitted for place of articulation rather than percent correct.  相似文献   

10.
Temporal integration for a 1000-Hz signal was determined for normal-hearing and cochlear hearing-impaired listeners in quiet and in masking noise of variable bandwidth. Critical ratio and 3-dB critical band measures of frequency resolution were derived from the masking data. Temporal integration for the normal-hearing listeners was markedly reduced in narrow-band noise, when contrasted with temporal integration in quiet or in wideband noise. The effect of noise bandwidth on temporal integration was smaller for the hearing-impaired group. Hearing-impaired subjects showed both reduced temporal integration and reduced frequency resolution for the 200-ms signal. However, a direct relation between temporal integration and frequency resolution was not indicated. Frequency resolution for the normal-hearing listeners did not differ from that of the hearing-impaired listeners for the 20-ms signal. It was suggested that some of the frequency resolution and temporal integration differences between normal-hearing and hearing-impaired listeners could be accounted for by off-frequency listening.  相似文献   

11.
Speech recognition performance was measured in normal-hearing and cochlear-implant listeners with maskers consisting of either steady-state speech-spectrum-shaped noise or a competing sentence. Target sentences from a male talker were presented in the presence of one of three competing talkers (same male, different male, or female) or speech-spectrum-shaped noise generated from this talker at several target-to-masker ratios. For the normal-hearing listeners, target-masker combinations were processed through a noise-excited vocoder designed to simulate a cochlear implant. With unprocessed stimuli, a normal-hearing control group maintained high levels of intelligibility down to target-to-masker ratios as low as 0 dB and showed a release from masking, producing better performance with single-talker maskers than with steady-state noise. In contrast, no masking release was observed in either implant or normal-hearing subjects listening through an implant simulation. The performance of the simulation and implant groups did not improve when the single-talker masker was a different talker compared to the same talker as the target speech, as was found in the normal-hearing control. These results are interpreted as evidence for a significant role of informational masking and modulation interference in cochlear implant speech recognition with fluctuating maskers. This informational masking may originate from increased target-masker similarity when spectral resolution is reduced.  相似文献   

12.
In the present study, the effects of interference from combined noises on speech transmission were investigated in a simulated open public space. Sound fields for dominant noises were predicted using a typical urban square model surrounded by buildings. Then road traffic noise and two types of construction noises, corresponding to stationary and impulsive noises, were selected as background noises. Listening tests were performed on a group of adults, and the quality of speech transmission was evaluated using listening difficulty as well as intelligibility scores. During the listening tests, two factors that affect speech transmission performance were considered: (1) temporal characteristics of construction noise (stationary or impulsive) and (2) the levels of the construction and road traffic noises. The results indicated that word intelligibility scores and listening difficulty ratings were affected by the temporal characteristics of construction noise due to fluctuations in the background noise level. It was also observed that listening difficulty is unable to describe the speech transmission in noisy open public spaces showing larger variation than did word intelligibility scores.  相似文献   

13.
Vowel identification was tested in quiet, noise, and reverberation with 20 normal-hearing subjects and 20 hearing-impaired subjects. Stimuli were 15 English vowels spoken in a /b-t/context by six male talkers. Each talker produced five tokens of each vowel. In quiet, all stimuli were identified by two judges as the intended targets. The stimuli were degraded by reverberation or speech-spectrum noise. Vowel identification scores depended upon talker, listening condition, and subject type. The relationship between identification errors and spectral details of the vowels is discussed.  相似文献   

14.
The intelligibility of sentences processed to remove temporal envelope information, as far as possible, was assessed. Sentences were filtered into N analysis channels, and each channel signal was divided by its Hilbert envelope to remove envelope information but leave temporal fine structure (TFS) intact. Channel signals were combined to give TFS speech. The effect of adding low-level low-noise noise (LNN) to each channel signal before processing was assessed. The addition of LNN reduced the amplification of low-level signal portions that contained large excursions in instantaneous frequency, and improved the intelligibility of simple TFS speech sentences, but not more complex sentences. It also reduced the time needed to reach a stable level of performance. The recovery of envelope cues by peripheral auditory filtering was investigated by measuring the intelligibility of 'recovered-envelope speech', formed by filtering TFS speech with an array of simulated auditory filters, and using the envelopes at the output of these filters to modulate sinusoids with frequencies equal to the filter center frequencies (i.e., tone vocoding). The intelligibility of TFS speech and recovered-envelope speech fell as N increased, although TFS speech was still highly intelligible for values of N for which the intelligibility of recovered-envelope speech was low.  相似文献   

15.
Recent simulations of continuous interleaved sampling (CIS) cochlear implant speech processors have used acoustic stimulation that provides only weak cues to pitch, periodicity, and aperiodicity, although these are regarded as important perceptual factors of speech. Four-channel vocoders simulating CIS processors have been constructed, in which the salience of speech-derived periodicity and pitch information was manipulated. The highest salience of pitch and periodicity was provided by an explicit encoding, using a pulse carrier following fundamental frequency for voiced speech, and a noise carrier during voiceless speech. Other processors included noise-excited vocoders with envelope cutoff frequencies of 32 and 400 Hz. The use of a pulse carrier following fundamental frequency gave substantially higher performance in identification of frequency glides than did vocoders using envelope-modulated noise carriers. The perception of consonant voicing information was improved by processors that preserved periodicity, and connected discourse tracking rates were slightly faster with noise carriers modulated by envelopes with a cutoff frequency of 400 Hz compared to 32 Hz. However, consonant and vowel identification, sentence intelligibility, and connected discourse tracking rates were generally similar through all of the processors. For these speech tasks, pitch and periodicity beyond the weak information available from 400 Hz envelope-modulated noise did not contribute substantially to performance.  相似文献   

16.
The effects of intensity on monosyllabic word recognition were studied in adults with normal hearing and mild-to-moderate sensorineural hearing loss. The stimuli were bandlimited NU#6 word lists presented in quiet and talker-spectrum-matched noise. Speech levels ranged from 64 to 99 dB SPL and S/N ratios from 28 to -4 dB. In quiet, the performance of normal-hearing subjects remained essentially constant in noise, at a fixed S/N ratio, it decreased as a linear function of speech level. Hearing-impaired subjects performed like normal-hearing subjects tested in noise when the data were corrected for the effects of audibility loss. From these and other results, it was concluded that: (1) speech intelligibility in noise decreases when speech levels exceed 69 dB SPL and the S/N ratio remains constant; (2) the effects of speech and noise level are synergistic; (3) the deterioration in intelligibility can be modeled as a relative increase in the effective masking level; (4) normal-hearing and hearing-impaired subjects are affected similarly by increased signal level when differences in speech audibility are considered; (5) the negative effects of increasing speech and noise levels on speech recognition are similar for all adult subjects, at least up to 80 years; and (6) the effective dynamic range of speech may be larger than the commonly assumed value of 30 dB.  相似文献   

17.
Internal noise generated by hearing-aid circuits can be audible and objectionable to aid users, and may lead to the rejection of hearing aids. Two expansion algorithms were developed to suppress internal noise below a threshold level. The multiple-channel algorithm's expansion thresholds followed the 55-dB SPL long-term average speech spectrum, while the single-channel algorithm suppressed sounds below 45 dBA. With the recommended settings in static conditions, the single-channel algorithm provided lower noise levels, which were perceived as quieter by most normal-hearing participants. However, in dynamic conditions "pumping" noises were more noticeable with the single-channel algorithm. For impaired-hearing listeners fitted with the ADRO amplification strategy, both algorithms maintained speech understanding for words in sentences presented at 55 dB SPL in quiet (99.3% correct). Mean sentence reception thresholds in quiet were 39.4, 40.7, and 41.8 dB SPL without noise suppression, and with the single- and multiple-channel algorithms, respectively. The increase in the sentence reception threshold was statistically significant for the multiple-channel algorithm, but not the single-channel algorithm. Thus, both algorithms suppressed noise without affecting the intelligibility of speech presented at 55 dB SPL, with the single-channel algorithm providing marginally greater noise suppression in static conditions, and the multiple-channel algorithm avoiding pumping noises.  相似文献   

18.
Two experiments investigated the impact of reverberation and masking on speech understanding using cochlear implant (CI) simulations. Experiment 1 tested sentence recognition in quiet. Stimuli were processed with reverberation simulation (T=0.425, 0.266, 0.152, and 0.0 s) and then either processed with vocoding (6, 12, or 24 channels) or were subjected to no further processing. Reverberation alone had only a small impact on perception when as few as 12 channels of information were available. However, when the processing was limited to 6 channels, perception was extremely vulnerable to the effects of reverberation. In experiment 2, subjects listened to reverberated sentences, through 6- and 12-channel processors, in the presence of either speech-spectrum noise (SSN) or two-talker babble (TTB) at various target-to-masker ratios. The combined impact of reverberation and masking was profound, although there was no interaction between the two effects. This differs from results obtained in subjects listening to unprocessed speech where interactions between reverberation and masking have been shown to exist. A speech transmission index (STI) analysis indicated a reasonably good prediction of speech recognition performance. Unlike previous investigations, the SSN and TTB maskers produced equivalent results, raising questions about the role of informational masking in CI processed speech.  相似文献   

19.
Although in a number of experiments noise-band vocoders have been shown to provide acoustic models for speech perception in cochlear implants (CI), the present study assesses in four experiments whether and under what limitations noise-band vocoders can be used as an acoustic model for pitch perception in CI. The first two experiments examine the effect of spectral smearing on simulated electrode discrimination and fundamental frequency (FO) discrimination. The third experiment assesses the effect of spectral mismatch in an FO-discrimination task with two different vocoders. The fourth experiment investigates the effect of amplitude compression on modulation rate discrimination. For each experiment, the results obtained from normal-hearing subjects presented with vocoded stimuli are compared to results obtained directly from CI recipients. The results show that place pitch sensitivity drops with increased spectral smearing and that place pitch cues for multi-channel stimuli can adequately be mimicked when the discriminability of adjacent channels is adjusted by varying the spectral slopes to match that of CI subjects. The results also indicate that temporal pitch sensitivity is limited for noise-band carriers with low center frequencies and that the absence of a compression function in the vocoder might alter the saliency of the temporal pitch cues.  相似文献   

20.
This study was designed to characterize the effect of background noise on the identification of syllables using behavioral and electrophysiological measures. Twenty normal-hearing adults (18-30 years) performed an identification task in a two-alternative forced-choice paradigm. Stimuli consisted of naturally produced syllables [da] and [ga] embedded in white noise. The noise was initiated 1000 ms before the onset of the speech stimuli in order to separate the auditory event related potentials (AERP) response to noise onset from that to the speech. Syllables were presented in quiet and in five SNRs: +15, +3, 0, -3, and -6 dB. Results show that (1) performance accuracy, d', and reaction time were affected by the noise, more so for reaction time; (2) both N1 and P3 latency were prolonged as noise levels increased, more so for P3; (3) [ga] was better identified than [da], in all noise conditions; and (4) P3 latency was longer for [da] than for [ga] for SNR 0 through -6 dB, while N1 latency was longer for [ga] than for [da] in most listening conditions. In conclusion, the unique stimuli structure utilized in this study demonstrated the effects of noise on speech recognition at both the physical and the perceptual processing levels.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号