Similar literature
20 similar articles found.
1.
Speech recognition performance was measured in normal-hearing and cochlear-implant listeners with maskers consisting of either steady-state speech-spectrum-shaped noise or a competing sentence. Target sentences from a male talker were presented in the presence of one of three competing talkers (same male, different male, or female) or speech-spectrum-shaped noise generated from this talker at several target-to-masker ratios. For the normal-hearing listeners, target-masker combinations were processed through a noise-excited vocoder designed to simulate a cochlear implant. With unprocessed stimuli, a normal-hearing control group maintained high levels of intelligibility down to target-to-masker ratios as low as 0 dB and showed a release from masking, producing better performance with single-talker maskers than with steady-state noise. In contrast, no masking release was observed in either implant or normal-hearing subjects listening through an implant simulation. Unlike the normal-hearing control group, the simulation and implant groups did not perform better when the single-talker masker was a different talker rather than the same talker as the target speech. These results are interpreted as evidence for a significant role of informational masking and modulation interference in cochlear implant speech recognition with fluctuating maskers. This informational masking may originate from increased target-masker similarity when spectral resolution is reduced.
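The target-to-masker ratio used in this abstract can be illustrated with a short sketch. It assumes the ratio is defined on RMS levels in dB, which is the usual convention but is not stated in the abstract; the function names `rms` and `mix_at_tmr` are hypothetical and not taken from the study.

```python
import math

def rms(x):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def mix_at_tmr(target, masker, tmr_db):
    """Scale the masker so that 20*log10(rms(target)/rms(masker)) equals tmr_db,
    then add it to the target. Returns (mixture, scaled_masker).
    Assumption: the ratio is defined on RMS levels of the whole signals."""
    gain = rms(target) / (rms(masker) * 10 ** (tmr_db / 20))
    scaled = [gain * m for m in masker]
    mixture = [t + m for t, m in zip(target, scaled)]
    return mixture, scaled
```

With `tmr_db = 0` the target and the scaled masker have equal RMS level, the lowest ratio at which the abstract reports high intelligibility for unprocessed stimuli.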

2.
This study investigated comodulation detection differences (CDD) for fixed- and roved-frequency maskers. The objective was to determine whether CDD could be accounted for better in terms of energetic masking or in terms of perceptual fusion/segregation related to comodulation. Roved-frequency maskers were used in order to minimize the role of energetic masking, allowing possible effects related to perceptual fusion/segregation to be revealed. The signals and maskers were composed of 30-Hz-wide noise bands. The signal was either comodulated with the masker (A/A condition) or had a temporal envelope that was independent (A/B condition). The masker was either gated synchronously with the signal or had a leading temporal fringe of 200 ms. In the fixed-frequency masker conditions, listeners with low A/A thresholds showed little masking release due to masker temporal fringe and had CDDs that could be accounted for by energetic masking. Listeners with higher A/A thresholds in the fixed-frequency masker conditions showed relatively large CDDs and large masking release due to a masker temporal fringe. The CDDs of these listeners may have arisen, at least in part, from processes related to perceptual segregation. Some listeners in the roved masker conditions also had large CDDs that appeared to be related to perceptual segregation.

3.
This study demonstrates a new possibility of estimating intelligibility of speech in informational maskers. The temporal and spectral properties of sound maskers are investigated to achieve acoustic privacy in public spaces. Speech intelligibility (SI) tests were conducted using Japanese sentences in daily use with energetic (white noise) or informational (reversed speech) maskers. We found that the masking effects on SI, including informational masking, might not be estimated by analyzing the narrow-band temporal envelopes, which is a common way of predicting SI under noisy conditions. The masking effects might instead be visualized by spectral auto-correlation analysis on a frame-by-frame basis, for the series of dominant spectral peaks of the masked target in the frequency domain. Consequently, we found that dissimilarity in frame-based spectral auto-correlation sequences between the original and masked targets was the key to evaluating maskers, including their informational masking effects on SI.

4.
This study investigated the effects of simulated cochlear-implant processing on speech reception in a variety of complex masking situations. Speech recognition was measured as a function of target-to-masker ratio, processing condition (4, 8, 24 channels, and unprocessed) and masker type (speech-shaped noise, amplitude-modulated speech-shaped noise, single male talker, and single female talker). The results showed that simulated implant processing was more detrimental to speech reception in fluctuating interference than in steady-state noise. Performance in the 24-channel processing condition was substantially poorer than in the unprocessed condition, despite the comparable representation of the spectral envelope. The detrimental effects of simulated implant processing in fluctuating maskers, even with large numbers of channels, may be due to the reduction in the pitch cues used in sound source segregation, which are normally carried by the peripherally resolved low-frequency harmonics and the temporal fine structure. The results suggest that using steady-state noise to test speech intelligibility may underestimate the difficulties experienced by cochlear-implant users in fluctuating acoustic backgrounds.

5.
The first part of this paper presents several experiments on signal detection in temporally modulated noise, yielding a general approach toward the concept of comodulation masking release (CMR). Measurements were made on masked thresholds of both long- and short-duration, narrow-band signals presented in a 100% sinusoidally amplitude-modulated (SAM) noise masker (modulation frequency 32 Hz), as a function of masker bandwidth from 1/3 oct up to 13/3 octs, while the masker band was geometrically centered at signal frequency. With the short-duration signals placed in the valley of the masker, a substantial CMR (i.e., a decrease of masked threshold with increasing masker bandwidth) was found, whereas for the long-duration signals CMR was smaller. Furthermore, investigations were carried out to determine whether CMR changes when the bandwidth of the signals, consisting of bandpass impulse responses, is increased. The data indicate that substantial CMR remains even when all masker bands contain a signal component, thus minimizing across-channel differences. This finding is not in line with current models accounting for the CMR phenomenon. The second part of this paper concerns signal detection in spectrally shaped noise. Also investigated was whether release from masking occurs for the detection of a pure-tone signal at a valley or a peak of a simultaneously presented masking noise with a sinusoidally rippled power spectrum, when this masker was preceded and followed by a second noise (temporal flanking burst) with an identical spectral shape as the on-signal noise. Similar to CMR effects for temporal modulations, the data indicate that coshaping masking release (CSMR) occurs when the signal is placed in a valley of the spectral envelope of the masker, whereas no release from masking is found when the signal is placed at a peak of the spectral envelope of the masker. The implications of these experiments for measures of spectral and temporal resolution are discussed.  
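The 100% sinusoidally amplitude-modulated masker described in the first part of this abstract can be sketched as noise multiplied by the envelope 1 + m·sin(2π·f_m·t), with m = 1 and f_m = 32 Hz. The sketch below is illustrative only: it uses broadband Gaussian noise rather than the band-limited maskers of the study, and the names `sam_noise` and `valley_times` are hypothetical.

```python
import math
import random

def sam_noise(duration_s, fs, fm=32.0, depth=1.0, seed=0):
    """Gaussian noise multiplied by the envelope 1 + depth*sin(2*pi*fm*t).
    depth=1.0 corresponds to 100% modulation.
    Assumption: broadband noise; the study varied the masker bandwidth."""
    rng = random.Random(seed)
    n = int(round(duration_s * fs))
    out = []
    for i in range(n):
        envelope = 1.0 + depth * math.sin(2 * math.pi * fm * i / fs)
        out.append(envelope * rng.gauss(0.0, 1.0))
    return out

def valley_times(duration_s, fm=32.0):
    """Times (in seconds) at which the envelope is at its minimum,
    i.e. where sin(2*pi*fm*t) == -1, three quarters of the way through each cycle."""
    times = []
    k = 0
    while (k + 0.75) / fm < duration_s:
        times.append((k + 0.75) / fm)
        k += 1
    return times
```

A short-duration signal "placed in the valley of the masker" would be centered on one of the times returned by `valley_times`, where a fully modulated envelope falls to zero.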

6.
Talkers change the way they speak in noisy conditions. For energetic maskers, speech production changes are relatively well-understood, but less is known about how informational maskers such as competing speech affect speech production. The current study examines the effect of energetic and informational maskers on speech production by talkers speaking alone or in pairs. Talkers produced speech in quiet and in backgrounds of speech-shaped noise, speech-modulated noise, and competing speech. Relative to quiet, speech output level and fundamental frequency increased and spectral tilt flattened in proportion to the energetic masking capacity of the background. In response to modulated backgrounds, talkers were able to reduce substantially the degree of temporal overlap with the noise, with greater reduction for the competing speech background. Reduction in foreground-background overlap can be expected to lead to a release from both energetic and informational masking for listeners. Passive changes in speech rate, mean pause length or pause distribution cannot explain the overlap reduction, which appears instead to result from a purposeful process of listening while speaking. Talkers appear to monitor the background and exploit upcoming pauses, a strategy which is particularly effective for backgrounds containing intelligible speech.

7.
Speech-reception thresholds (SRT) were measured for 17 normal-hearing and 17 hearing-impaired listeners in conditions simulating free-field situations with between one and six interfering talkers. The stimuli, speech and noise with identical long-term average spectra, were recorded with a KEMAR manikin in an anechoic room and presented to the subjects through headphones. The noise was modulated using the envelope fluctuations of the speech. Several conditions were simulated with the speaker always in front of the listener and the maskers either also in front, or positioned in a symmetrical or asymmetrical configuration around the listener. Results show that the hearing-impaired listeners perform significantly more poorly than the normal-hearing listeners in all conditions. The mean SRT differences between the groups range from 4.2 to 10 dB. It appears that the modulations in the masker act as an important cue for the normal-hearing listeners, who experience up to 5 dB of release from masking, while being hardly beneficial for the hearing-impaired listeners. The gain occurring when maskers are moved from the frontal position to positions around the listener varies from 1.5 to 8 dB for the normal-hearing listeners, and from 1 to 6.5 dB for the hearing-impaired listeners. It depends strongly on the number of maskers and their positions, but less on hearing impairment. The difference between the SRTs for binaural and best-ear listening (the "cocktail party effect") is approximately 3 dB in all conditions for both the normal-hearing and the hearing-impaired listeners.

8.
It is often assumed that listeners detect an increment in the intensity of a pure tone by detecting an increase in the energy falling within the critical band centered on the signal frequency. A noise masker can be used to limit the use of signal energy falling outside of the critical band, but facets of the noise may impact increment detection beyond this intended purpose. The current study evaluated the impact of envelope fluctuation in a noise masker on thresholds for detection of an increment. Thresholds were obtained for detection of an increment in the intensity of a 0.25- or 4-kHz pedestal in quiet and in the presence of noise of varying bandwidth. Results indicate that thresholds for detection of an increment in the intensity of a pure tone increase with increasing bandwidth for an on-frequency noise masker, but are unchanged by an off-frequency noise masker. Neither a model that includes a modulation-filter-bank analysis of envelope modulation nor a model based on discrimination of spectral patterns can account for all aspects of the observed data.

9.
To assess age-related differences in benefit from masker modulation, younger and older adults with normal hearing but not identical audiograms listened to nonsense syllables in each of two maskers: (1) a steady-state noise shaped to match the long-term spectrum of the speech, and (2) this same noise modulated by a 10-Hz square wave, resulting in an interrupted noise. An additional low-level broadband noise was always present which was shaped to produce equivalent masked thresholds for all subjects. This minimized differences in speech audibility due to differences in quiet thresholds among subjects. An additional goal was to determine if age-related differences in benefit from modulation could be explained by differences in thresholds measured in simultaneous and forward maskers. Accordingly, thresholds for 350-ms pure tones were measured in quiet and in each masker; thresholds for 20-ms signals in forward and simultaneous masking were also measured at selected signal frequencies. To determine if benefit from modulated maskers varied with masker spectrum and to provide a comparison with previous studies, a subgroup of younger subjects also listened in steady-state and interrupted noise that was not spectrally shaped. Articulation index (AI) values were computed and speech-recognition scores were predicted for steady-state and interrupted noise; predicted benefit from modulation was also determined. Masked thresholds of older subjects were slightly higher than those of younger subjects; larger age-related threshold differences were observed for short-duration than for long-duration signals. In steady-state noise, speech recognition for older subjects was poorer than for younger subjects, which was partially attributable to older subjects' slightly higher thresholds in these maskers. In interrupted noise, although predicted benefit was larger for older than younger subjects, scores improved more for younger than for older subjects, particularly at the higher noise level. This may be related to age-related increases in thresholds in steady-state noise and in forward masking, especially at higher frequencies. Benefit of interrupted maskers was larger for unshaped than for speech-shaped noise, consistent with AI predictions.
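The interrupted noise described above, a noise gated by a 10-Hz square wave, can be sketched as follows. The 50% duty cycle, the full on/off depth, and the use of unshaped Gaussian noise are assumptions made for illustration; the abstract does not give the duty cycle, and the main conditions used speech-shaped noise. The function name `interrupted_noise` is hypothetical.

```python
import random

def interrupted_noise(duration_s, fs, rate_hz=10.0, duty=0.5, seed=0):
    """Gaussian noise gated on and off by a square wave.
    The noise is on for the first `duty` fraction of each cycle and silent otherwise.
    Assumptions: 50% duty cycle by default and unshaped (white) noise."""
    rng = random.Random(seed)
    n = int(round(duration_s * fs))
    out = []
    for i in range(n):
        phase = (i * rate_hz / fs) % 1.0  # position within the current cycle, 0 to 1
        gate = 1.0 if phase < duty else 0.0
        out.append(gate * rng.gauss(0.0, 1.0))
    return out
```

At 10 Hz with a 50% duty cycle, each silent gap lasts 50 ms, which is the kind of window in which listeners can "glimpse" the speech.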

10.
An extended version of the equalization-cancellation (EC) model of binaural processing is described and applied to speech intelligibility tasks in the presence of multiple maskers. The model incorporates time-varying jitters, both in time and amplitude, and implements the equalization and cancellation operations in each frequency band independently. The model is consistent with the original EC model in predicting tone-detection performance for a large set of configurations. When the model is applied to speech, the speech intelligibility index is used to predict speech intelligibility performance in a variety of conditions. Specific conditions addressed include different types of maskers, different numbers of maskers, and different spatial locations of maskers. Model predictions are compared with empirical measurements reported by Hawley et al. [J. Acoust. Soc. Am. 115, 833-843 (2004)] and by Marrone et al. [J. Acoust. Soc. Am. 124, 1146-1158 (2008)]. The model succeeds in predicting speech intelligibility performance when maskers are speech-shaped noise or broadband-modulated speech-shaped noise but fails when the maskers are speech or reversed speech.

11.
12.
Preschoolers and adults were asked to detect a 1000-Hz signal, which was masked by a multitone complex. The frequencies and amplitudes of the components in the complex varied randomly and independently on each presentation. A staircase, cued two-interval, forced-choice procedure disguised as a "listening game" was used to obtain signal thresholds in quiet and in the presence of the multitone maskers. The number of components in the masker was fixed within an experimental condition and varied from 2 to 906 across experimental conditions. Thresholds were also measured with a broadband noise masker. Eight preschool children and eight adults were tested. Although individual differences were large, among both adults and children, there was little difference between the groups in the mean amount of masking produced by the maskers with large numbers of components (400 and 906). There was also a small but significant difference between adults and children in the mean amount of masking produced by the broadband noise. The difference between the groups was much larger with smaller numbers of components. Data obtained from the adults were basically similar to those previously reported [cf. Neff and Green, Percept. Psychophys. 41, 409-415 (1987); Oh and Lutfi, J. Acoust. Soc. Am. 104, 3489-3499 (1998)]: maskers composed of 10-40 components produced as much as 30 to 60 dB of masking in some, but not all, listeners. Those same maskers produced larger amounts of masking (70-83 dB) in many of the preschool children, although, as in the adult group, individual differences were large. The component-relative-entropy (CoRE) model [Lutfi, J. Acoust. Soc. Am. 94, 748-758 (1993)] was used to describe the differences in performance between the children and adults. According to this model the average child appears to integrate information over a larger number of auditory filters than the average adult.

13.
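蒋斌, 匡正, 吴鸣, 杨军. 《声学学报》 (Acta Acustica), 2012, 37(6): 659-666
The effect of frame length on the intelligibility of segment-reversed (locally time-reversed) Chinese speech was studied experimentally. The results show that segment-reversed Chinese speech remains highly intelligible for frame lengths below 64 ms; intelligibility decreases gradually as the frame length increases from 64 to 203 ms; and intelligibility is 0 for frame lengths above 203 ms. At a frame length of 8 ms, distortion of the Chinese tones reduces intelligibility. Analysis of the modulation spectra of the original speech signal and the segment-reversed speech shows that the amount of modulation-spectrum distortion is closely related to intelligibility. The normalized correlation between the narrow-band envelopes of the original speech signal and the segment-reversed speech can therefore be used to measure the amount of modulation-spectrum distortion, and the objective values computed with the speech-based speech transmission index method correlate significantly with the experimental results (r=0.876, p<0.01). The study shows that speech intelligibility is related to the narrow-band envelopes, and that the intelligibility of segment-reversed speech depends closely on how well the narrow-band envelopes of the original speech signal are preserved.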
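The segment-reversal manipulation and the envelope comparison in this abstract can be sketched in a few lines. The sketch assumes that segment reversal means time-reversing each consecutive fixed-length frame, and it uses a Pearson-style normalized correlation; the exact normalization and the narrow-band envelope extraction used in the paper are not given in the abstract, and the function names are hypothetical.

```python
import math

def reverse_segments(samples, frame_len):
    """Time-reverse each consecutive frame of `frame_len` samples.
    The last frame may be shorter and is reversed as well."""
    out = []
    for start in range(0, len(samples), frame_len):
        out.extend(reversed(samples[start:start + frame_len]))
    return out

def normalized_correlation(a, b):
    """Pearson-style normalized correlation between two equal-length envelopes.
    Assumption: the paper's exact normalization may differ."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a)
                    * sum((y - mean_b) ** 2 for y in b))
    return num / den
```

A frame length in milliseconds converts to samples as `int(fs * ms / 1000)`; for example, 64 ms at a hypothetical 16-kHz sampling rate is 1024 samples. Applying `reverse_segments` twice with the same frame length restores the original signal.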

14.
When normal-hearing adults and children are required to detect a 1000-Hz tone in a random-frequency multitone masker, masking is often observed in excess of that predicted by traditional auditory filter models. The excess masking is called informational masking. Though individual differences in the effect are large, the amount of informational masking is typically much greater in young children than in adults [Oh et al., J. Acoust. Soc. Am. 109, 2888-2895 (2001)]. One factor that reduces informational masking in adults is spatial separation of the target tone and masker. The present study was undertaken to determine whether or not a similar effect of spatial separation is observed in children. An extreme case of spatial separation was used in which the target tone was presented to one ear and the random multitone masker to the other ear. This condition resulted in nearly complete elimination of masking in adults. In young children, however, presenting the masker to the nontarget ear typically produced only a slight decrease in overall masking and no change in informational masking. The results for children are interpreted in terms of a model that gives equal weight to the auditory filter outputs from each ear.

15.
A single-interval, yes-no, tone-in-noise detection experiment was conducted to measure the proportion of "tone present" responses to each of 25 reproducible noise-alone and tone-plus-noise waveforms under narrowband (100 Hz), wideband (2900 Hz), monotic, and diotic stimulus conditions. Proportions of "tone present" responses (estimates of the probabilities of hits and false alarms) were correlated across masker bandwidths and across monotic and diotic conditions. Two categories of models were considered: one based on stimulus energy or neural counts, and another based on temporal structure of the stimulus envelope or neural patterns. Both categories gave significant correlation between decision variables and data. A model based on a weighted combination of energy in multiple critical bands performed best, predicting up to 90% of the variance in the reproducible-noise data. However, since energy-based models are unable to explain detection under a roving-level paradigm without substantial modification, it is argued that other variations of detection models must be considered for future study. Temporal models are resistant to changes in threshold under roving-level conditions, but explained at most 67% of the variance in the reproducible-noise data.

16.
Previous research has identified a "synchrony window" of several hundred milliseconds over which auditory-visual (AV) asynchronies are not reliably perceived. Individual variability in the size of this AV synchrony window has been linked with variability in AV speech perception measures, but it was not clear whether AV speech perception measures are related to synchrony detection for speech only or for both speech and nonspeech signals. An experiment was conducted to investigate the relationship between measures of AV speech perception and AV synchrony detection for speech and nonspeech signals. Variability in AV synchrony detection for both speech and nonspeech signals was found to be related to variability in measures of auditory-only (A-only) and AV speech perception, suggesting that temporal processing for both speech and nonspeech signals must be taken into account in explaining variability in A-only and multisensory speech perception.

17.
The extent to which context influences speech categorization can inform theories of pre-lexical speech perception. Across three conditions, listeners categorized speech targets preceded by speech context syllables. These syllables were presented as the sole context or paired with nonspeech tone contexts previously shown to affect speech categorization. Listeners' context-dependent categorization across these conditions provides evidence that speech and nonspeech context stimuli jointly influence speech processing. Specifically, when the spectral characteristics of speech and nonspeech context stimuli are mismatched such that they are expected to produce opposing effects on speech categorization, the influence of nonspeech contexts may undermine, or even reverse, the expected effect of adjacent speech context. Likewise, when spectrally matched, the cross-class contexts may collaborate to increase effects of context. Similar effects are observed even when natural speech syllables, matched in source to the speech categorization targets, serve as the speech contexts. Results are well-predicted by spectral characteristics of the context stimuli.

18.
Children between the ages of 4 and 7 and adults were tested in free field on speech intelligibility using a four-alternative forced-choice paradigm with spondees. Target speech was presented from the front (0 degrees); speech or modulated speech-shaped-noise competitors were either in front or on the right (90 degrees). Speech reception thresholds were measured adaptively using a three-down/one-up algorithm. The primary difference between children and adults was seen in elevated thresholds in children in quiet and in all masked conditions. For both age groups, masking was greater with the speech-noise versus speech competitor and with two versus one competitor(s). Masking was also greater when the competitors were located in front compared with the right. The amount of masking did not differ across the two age groups. Spatial release from masking was similar in the two age groups, except in the one-speech condition, where it was greater in children than in adults. These findings suggest that, similar to adults, young children are able to utilize spatial and/or head shadow cues to segregate sounds in noisy environments. The potential utility of the measures used here for studying hearing-impaired children is also discussed.
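The three-down/one-up adaptive rule mentioned in this abstract can be sketched as follows: the level is lowered after three consecutive correct responses and raised after any incorrect response. The fixed step size, the starting level, and the `respond` callback are assumptions made for illustration; the abstract does not give the step sizes or stopping rule used in the study.

```python
def three_down_one_up(respond, start_level, step, n_trials):
    """Run a three-down/one-up adaptive track.
    `respond(level)` is a hypothetical callback returning True for a correct response.
    Returns the list of levels presented on each trial."""
    level = start_level
    correct_run = 0
    history = []
    for _ in range(n_trials):
        history.append(level)
        if respond(level):
            correct_run += 1
            if correct_run == 3:   # three consecutive correct: make the task harder
                level -= step
                correct_run = 0
        else:                      # any error: make the task easier
            level += step
            correct_run = 0
    return history
```

With a probabilistic listener, a track of this kind converges on the level giving about 79.4% correct, since the level is equally likely to move up or down when the probability of three correct responses in a row, p³, equals 0.5.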

19.
We compare psychophysical tuning curves obtained with sinusoidal and narrow-band (50-Hz wide) noise maskers in both simultaneous and forward masking. In one experiment, we examine the effects of different combinations of duration and intensity of the 1-kHz sinusoidal signal. In a second experiment, we compare tuning curves obtained with a sinusoidal signal to those obtained with a noise signal. In both experiments, a narrow-band noise is a more effective simultaneous masker than a sinusoid for masker frequencies near the signal frequency. We argue that this is probably due to the use of different detection cues in the presence of sinusoidal and noise maskers, and that the greater masking produced by the noise is not simply due to its greater variability. As observed in other studies, tuning curves are narrower in forward masking than in simultaneous masking.

20.