Similar literature
 Found 20 similar documents.
1.
Studies of speech perception in various types of background noise have shown that noise with linguistic content affects listeners differently than nonlinguistic noise [e.g., Simpson, S. A., and Cooke, M. (2005). "Consonant identification in N-talker babble is a nonmonotonic function of N," J. Acoust. Soc. Am. 118, 2775-2778; Sperry, J. L., Wiley, T. L., and Chial, M. R. (1997). "Word recognition performance in various background competitors," J. Am. Acad. Audiol. 8, 71-80], but few studies of multi-talker babble have employed background babble in languages other than the target speech language. To determine whether the adverse effect of background speech is due to the linguistic content or to the acoustic characteristics of the speech masker, this study assessed speech-in-noise recognition when the language of the background noise was either the same as or different from the language of the target speech. Replicating previous findings, results showed poorer English sentence recognition by native English listeners in six-talker babble than in two-talker babble, regardless of the language of the babble. In addition, our results showed that in two-talker babble, native English listeners were more adversely affected by English babble than by Mandarin Chinese babble. These findings demonstrate informational masking on sentence-in-noise recognition in the form of "linguistic interference." Whether this interference is at the lexical, sublexical, and/or prosodic levels of linguistic structure and whether it is modulated by the phonetic similarity between the target and noise languages remain to be determined.

2.
The present study investigated the effect of envelope modulations in a background masker on consonant recognition by normal-hearing listeners. It is well known that listeners understand speech better under a temporally modulated masker than under a steady masker at the same level, due to masking release. The possibility of an opposite phenomenon, modulation interference, whereby speech recognition could be degraded by a modulated masker due to interference with auditory processing of the speech envelope, was hypothesized and tested under various speech and masker conditions. It was of interest whether modulation interference for speech perception, if it were observed, could be predicted by modulation masking, as found in psychoacoustic studies using nonspeech stimuli. Results revealed that masking release measurably occurred under a variety of conditions, especially when the speech signal maintained a high degree of redundancy across several frequency bands. Modulation interference was also clearly observed under several circumstances when the speech signal did not contain a high redundancy. However, the effect of modulation interference did not follow the expected pattern from psychoacoustic modulation masking results. In conclusion, (1) both factors, modulation interference and masking release, should be accounted for whenever a background masker contains temporal fluctuations, and (2) caution is needed when psychoacoustic theory on modulation masking is applied to speech recognition.

3.
Spoken communication in a non-native language is especially difficult in the presence of noise. This study compared English and Spanish listeners' perceptions of English intervocalic consonants as a function of masker type. Three maskers (stationary noise, multitalker babble, and competing speech) provided varying amounts of energetic and informational masking. Competing English and Spanish speech maskers were used to examine the effect of masker language. Non-native performance fell short of that of native listeners in quiet, but a larger performance differential was found for all masking conditions. Both groups performed better in competing speech than in stationary noise, and both suffered most in babble. Since babble is a less effective energetic masker than stationary noise, these results suggest that non-native listeners are more adversely affected by both energetic and informational masking. A strong correlation was found between non-native performance in quiet and degree of deterioration in noise, suggesting that non-native phonetic category learning can be fragile. A small effect of language background was evident: English listeners performed better when the competing speech was Spanish.

4.
This study investigated the role of uncertainty in masking of speech by interfering speech. Target stimuli were nonsense sentences recorded by a female talker. Masking sentences were recorded from ten female talkers and combined into pairs. Listeners' recognition performance was measured with both target and masker presented from a front loudspeaker (nonspatial condition) or with a masker presented from two loudspeakers, with the right leading the front by 4 ms (spatial condition). In Experiment 1, the sentences were presented in blocks in which the masking talkers, spatial configuration, and signal-to-noise (S-N) ratio were fixed. Listeners' recognition performance varied widely among the masking talkers in the nonspatial condition, much less so in the spatial condition. This result was attributed to variation in effectiveness of informational masking in the nonspatial condition. The second experiment increased uncertainty by randomizing masking talkers and S-N ratios across trials in some conditions, and reduced uncertainty by presenting the same token of masker across trials in other conditions. These variations in masker uncertainty had relatively small effects on speech recognition.

5.
Quantifying the intelligibility of speech in noise for non-native listeners
When listening to languages learned at a later age, speech intelligibility is generally lower than when listening to one's native language. The main purpose of this study is to quantify speech intelligibility in noise for specific populations of non-native listeners, only broadly addressing the underlying perceptual and linguistic processing. An easy method is sought to extend these quantitative findings to other listener populations. Dutch subjects listening to German and English speech, ranging from reasonable to excellent proficiency in these languages, were found to require a 1-7 dB better speech-to-noise ratio to obtain 50% sentence intelligibility than native listeners. Also, the psychometric function for sentence recognition in noise was found to be shallower for non-native than for native listeners (worst-case slope around the 50% point of 7.5%/dB, compared to 12.6%/dB for native listeners). Differences between native and non-native speech intelligibility are largely predicted by linguistic entropy estimates as derived from a letter guessing task. Less effective use of context effects (especially semantic redundancy) explains the reduced speech intelligibility for non-native listeners. While measuring speech intelligibility for many different populations of listeners (languages, linguistic experience) may be prohibitively time consuming, obtaining predictions of non-native intelligibility from linguistic entropy may help to extend the results of this study to other listener populations.
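The slope figures quoted in this abstract can be made concrete with a small sketch. It assumes a logistic shape for the psychometric function, which the abstract does not state, and the two speech reception thresholds are hypothetical values chosen only so that the non-native threshold sits inside the reported 1-7 dB range. Only the two midpoint slopes (12.6%/dB and 7.5%/dB) come from the abstract.

```python
import math


def intelligibility(snr_db, srt_db, slope_per_db):
    """Logistic psychometric function (an assumed shape).

    srt_db is the speech-to-noise ratio giving 50% intelligibility.
    slope_per_db is the slope at that 50% point, as a proportion per dB.
    A logistic with steepness k has midpoint slope k / 4, hence k = 4 * slope.
    """
    return 1.0 / (1.0 + math.exp(-4.0 * slope_per_db * (snr_db - srt_db)))


# Slopes reported in the abstract, converted from %/dB to proportion/dB.
NATIVE_SLOPE = 0.126
NON_NATIVE_SLOPE = 0.075

# Hypothetical thresholds: the abstract gives only the 1-7 dB difference.
NATIVE_SRT_DB = -5.0
NON_NATIVE_SRT_DB = NATIVE_SRT_DB + 4.0

for snr in (-9.0, -5.0, -1.0, 3.0):
    native = intelligibility(snr, NATIVE_SRT_DB, NATIVE_SLOPE)
    non_native = intelligibility(snr, NON_NATIVE_SRT_DB, NON_NATIVE_SLOPE)
    print(f"SNR {snr:+5.1f} dB  native {native:.2f}  non-native {non_native:.2f}")
```

Under these assumptions the shallower non-native curve means that each decibel of improvement in speech-to-noise ratio buys less intelligibility near threshold, on top of the threshold shift itself.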

6.
Many studies of informational masking have shown that interfering speech reduces speech intelligibility. This effect is often used to secure privacy in public spaces. These applications require estimates of how much masking is required. In general, masking effects are estimated by using spectrum information as excitation patterns. However, estimates of informational masking can hardly be obtained by only using spectrum information. Therefore, we estimated the effects of informational masking using time-domain information. Specifically, we calculated the cepstra of the envelopes’ magnitude histograms. If these cepstra differ between the target and the masker, the signals are not similar in the time domain, and the effect of informational masking should be low. Therefore, we considered the histograms’ cepstra distances (HCD) to estimate signal similarities. In our first experiment, signal similarities were estimated for five maskers using the HCD. These maskers were random noise, music, female speech, male speech, and the target speaker’s speech. Male and female speech were more similar to the target speech than music and noise. Also, the same speaker’s speech was the most similar in the set of maskers. A listening test was carried out in the second experiment to verify the HCD. A double masker was used in this experiment as an effective informational masker. It has similar characteristics to reversed speech. The listening test results suggest that the double masker’s masking effect follows the same relation with the HCD. This suggests informational masking can be estimated by signal similarity using the HCD.

7.
This study investigated whether speech-like maskers without linguistic content produce informational masking of speech. The target stimuli were nonsense Mandarin Chinese sentences. In experiment I, the masker consisted of harmonics whose fundamental frequency (F0) was sinusoidally modulated and whose mean F0 was varied. The magnitude of informational masking was evaluated by measuring the change in intelligibility (releasing effect) produced by inducing a perceived spatial separation of the target speech and masker via the precedence effect. The releasing effect was small and was only clear when the target and masker had the same mean F0, suggesting that informational masking was small. Performance with the harmonic maskers was better than with a steady speech-shaped noise (SSN) masker. In experiments II and III, the maskers were speech-like synthesized signals, alternating between segments with harmonic structure and segments composed of SSN. Performance was much worse than for experiment I, and worse than when an SSN masker was used, suggesting that substantial informational masking occurred. The similarity of the F0 contours of the target and masker had little effect. The informational masking effect was not influenced by whether or not the noise-like segments of the masker were synchronous with the unvoiced segments of the target speech.

8.
The present study examined the relative influence of the off- and on-frequency spectral components of modulated and unmodulated maskers on consonant recognition. Stimuli were divided into 30 contiguous equivalent rectangular bandwidths. The temporal fine structure (TFS) in each "target" band was either left intact or replaced with tones using vocoder processing. Recognition scores for 10, 15 and 20 target bands randomly located in frequency were obtained in quiet and in the presence of all 30 masker bands, only the off-frequency masker bands, or only the on-frequency masker bands. The amount of masking produced by the on-frequency bands was generally comparable to that produced by the broadband masker. However, the difference between these two conditions was often significant, indicating an influence of the off-frequency masker bands, likely through modulation interference or spectral restoration. Although vocoder processing systematically led to poorer consonant recognition scores, the deficit observed in noise could often be attributed to that observed in quiet. These data indicate that (i) speech recognition is affected by the off-frequency components of the background and (ii) the nature of the target TFS does not systematically affect speech recognition in noise, especially when energetic masking and/or the number of target bands is limited.

9.
Previous work has indicated that target-masker similarity, as well as stimulus uncertainty, influences the amount of informational masking that occurs in detection, discrimination, and recognition tasks. In each of five experiments reported in this paper, the detection threshold for a tonal target in random multitone maskers presented simultaneously with the target tone was measured for two conditions using the same set of five listeners. In one condition, the target was constructed to be "similar" (S) to the masker; in the other condition, it was constructed to be "dissimilar" (D) to the masker. The specific masker varied across experiments, but was constant for the two conditions. Target-masker similarity varied in dimensions such as duration, perceived location, direction of frequency glide, and spectro-temporal coherence. Group-mean results show large decreases in the amount of masking for the D condition relative to the S condition. In addition, individual differences (a hallmark of informational masking) are found to be much greater in the S condition than in the D condition. Furthermore, listener vulnerability to informational masking is found to be consistent to at least a moderate degree across experiments.

10.
Speech recognition performance was measured in normal-hearing and cochlear-implant listeners with maskers consisting of either steady-state speech-spectrum-shaped noise or a competing sentence. Target sentences from a male talker were presented in the presence of one of three competing talkers (same male, different male, or female) or speech-spectrum-shaped noise generated from this talker at several target-to-masker ratios. For the normal-hearing listeners, target-masker combinations were processed through a noise-excited vocoder designed to simulate a cochlear implant. With unprocessed stimuli, a normal-hearing control group maintained high levels of intelligibility down to target-to-masker ratios as low as 0 dB and showed a release from masking, producing better performance with single-talker maskers than with steady-state noise. In contrast, no masking release was observed in either implant or normal-hearing subjects listening through an implant simulation. The performance of the simulation and implant groups did not improve when the single-talker masker was a different talker rather than the same talker as the target speech, an improvement that was found in the normal-hearing control group. These results are interpreted as evidence for a significant role of informational masking and modulation interference in cochlear implant speech recognition with fluctuating maskers. This informational masking may originate from increased target-masker similarity when spectral resolution is reduced.

11.
Normal-hearing (NH) listeners maintain robust speech understanding in modulated noise by "glimpsing" portions of speech from a partially masked waveform, a phenomenon known as masking release (MR). Cochlear implant (CI) users, however, generally lack such resiliency. In previous studies, temporal masking of speech by noise occurred randomly, obscuring to what degree MR is attributable to the temporal overlap of speech and masker. In the present study, masker conditions were constructed to either promote (+MR) or suppress (-MR) masking release by controlling the degree of temporal overlap. Sentence recognition was measured in 14 CI subjects and 22 young-adult NH subjects. Normal-hearing subjects showed large amounts of masking release in the +MR condition and a marked difference between +MR and -MR conditions. In contrast, CI subjects demonstrated less effect of MR overall, and some displayed modulation interference as reflected by poorer performance in modulated maskers. These results suggest that the poor performance of typical CI users in noise might be accounted for by factors that extend beyond peripheral masking, such as reduced segmental boundaries between syllables or words. Encouragingly, the best CI users tested here could take advantage of masker fluctuations to better segregate the speech from the background.

12.
Although most recent multitalker research has emphasized the importance of binaural cues, monaural cues can play an equally important role in the perception of multiple simultaneous speech signals. In this experiment, the intelligibility of a target phrase masked by a single competing masker phrase was measured as a function of signal-to-noise ratio (SNR) with same-talker, same-sex, and different-sex target and masker voices. The results indicate that informational masking, rather than energetic masking, dominated performance in this experiment. The amount of masking was highly dependent on the similarity of the target and masker voices: performance was best when different-sex talkers were used and worst when the same talker was used for target and masker. Performance did not, however, improve monotonically with increasing SNR. Intelligibility generally plateaued at SNRs below 0 dB and, in some cases, intensity differences between the target and masking voices produced substantial improvements in performance with decreasing SNR. The results indicate that informational and energetic masking play substantially different roles in the perception of competing speech messages.
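Since this abstract reports intelligibility as a function of SNR, it may help to see how a target and masker are typically brought to a chosen SNR. The sketch below is an illustration only, not the procedure used in the study: it assumes SNR is defined on overall RMS level, and the function names and toy signals are hypothetical.

```python
import math


def rms(signal):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def scale_masker_for_snr(target, masker, snr_db):
    """Return a copy of masker scaled so that
    20 * log10(rms(target) / rms(scaled masker)) equals snr_db."""
    gain = rms(target) / (rms(masker) * 10.0 ** (snr_db / 20.0))
    return [gain * x for x in masker]


# Toy signals standing in for recorded target and masker phrases.
target = [1.0, -1.0, 1.0, -1.0]
masker = [0.5, 0.5, -0.5, -0.5]

scaled = scale_masker_for_snr(target, masker, snr_db=-6.0)
mixture = [t + m for t, m in zip(target, scaled)]
achieved = 20.0 * math.log10(rms(target) / rms(scaled))
print(f"achieved SNR: {achieved:.2f} dB")
```

At negative SNRs the masker is the more intense voice, which is the level difference the abstract credits with improving performance in some conditions.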

13.
Spatial unmasking describes the improvement in the detection or identification of a target sound afforded by separating it spatially from simultaneous masking sounds. This effect has been studied extensively for speech intelligibility in the presence of interfering sounds. In the current study, listeners identified zebra finch song, which shares many acoustic properties with speech but lacks semantic and linguistic content. Three maskers with the same long-term spectral content but different short-term statistics were used: (1) chorus (combinations of unfamiliar zebra finch songs), (2) song-shaped noise (broadband noise with the average spectrum of chorus), and (3) chorus-modulated noise (song-shaped noise multiplied by the broadband envelope from a chorus masker). The amount of masking and spatial unmasking depended on the masker and there was evidence of release from both energetic and informational masking. Spatial unmasking was greatest for the statistically similar chorus masker. For the two noise maskers, there was less spatial unmasking and it was wholly accounted for by the relative target and masker levels at the acoustically better ear. The results share many features with analogous results using speech targets, suggesting that spatial separation aids in the segregation of complex natural sounds through mechanisms that are not specific to speech.

14.
This study examined combinations of energetic and informational maskers in speech identification. Speech targets and maskers (speech or noise) were processed and filtered into sets of 15 narrow frequency bands. The target was the sum of eight randomly selected bands. More masking occurred for speech maskers than for spectrally matched noise maskers regardless of whether the masker bands overlapped the target bands. The greater effect of the speech maskers was interpreted as due to informational masking. When the masker consisted of nonoverlapping bands of speech, the addition of bands of noise overlapping the speech masker, but not the speech target, reduced the overall amount of masking. Surprisingly, presenting the noise to the ear contralateral to the target and masker produced an even greater release from masking. The contralateral noise was apparently sufficient to cause a slight change in the image of the ipsilateral speech masker, possibly pulling it away from the target enough to allow the focus of attention on the target. This finding is consistent with the interpretation that in some conditions small binaural differences may be sufficient to cause, or significantly strengthen, the perceptual segregation of sounds.

15.
A large number of single-channel noise-reduction algorithms have been proposed based largely on mathematical principles. Most of these algorithms, however, have been evaluated with English speech. Given the different perceptual cues used by native listeners of different languages including tonal languages, it is of interest to examine whether there are any language effects when the same noise-reduction algorithm is used to process noisy speech in different languages. A comparative evaluation and investigation is undertaken in this study of various single-channel noise-reduction algorithms applied to noisy speech taken from three languages: Chinese, Japanese, and English. Clean speech signals (Chinese words and Japanese words) were first corrupted by three types of noise at two signal-to-noise ratios and then processed by five single-channel noise-reduction algorithms. The processed signals were finally presented to normal-hearing listeners for recognition. Intelligibility evaluation showed that the majority of noise-reduction algorithms did not improve speech intelligibility. Consistent with a previous study with the English language, the Wiener filtering algorithm produced small, but statistically significant, improvements in intelligibility for car and white noise conditions. Significant differences between the performances of noise-reduction algorithms across the three languages were observed.

16.
When a masking sound is spatially separated from a target speech signal, substantial releases from masking typically occur both for speech and noise maskers. However, when a delayed copy of the masker is also presented at the location of the target speech (a condition that has been referred to as the front target, right-front masker or F-RF configuration), the advantages of spatial separation vanish for noise maskers but remain substantial for speech maskers. This effect has been attributed to precedence, which introduces an apparent spatial separation between the target and masker in the F-RF configuration that helps the listener to segregate the target from a masking voice but not from a masking noise. In this study, virtual synthesis techniques were used to examine variations of the F-RF configuration in an attempt to more fully understand the stimulus parameters that influence the release from masking obtained in that condition. The results show that the release from speech-on-speech masking caused by the addition of the delayed copy of the masker is robust across a wide variety of source locations, masker locations, and masker delay values. This suggests that the speech unmasking that occurs in the F-RF configuration is not dependent on any single perceptual cue and may indicate that F-RF speech segregation is only partially based on the apparent left-right location of the RF masker.

17.
Similarity between the target and masking voices is known to have a strong influence on performance in monaural and binaural selective attention tasks, but little is known about the role it might play in dichotic listening tasks with a target signal and one masking voice in one ear and a second independent masking voice in the opposite ear. This experiment examined performance in a dichotic listening task with a target talker in one ear and same-talker, same-sex, or different-sex maskers in both the target and the unattended ears. The results indicate that listeners were most susceptible to across-ear interference with a different-sex within-ear masker and least susceptible with a same-talker within-ear masker, suggesting that the amount of across-ear interference cannot be predicted from the difficulty of selectively attending to the within-ear masking voice. The results also show that the amount of across-ear interference consistently increases when the across-ear masking voice is more similar to the target speech than the within-ear masking voice is, but that no corresponding decline in across-ear interference occurs when the across-ear voice is less similar to the target than the within-ear voice. These results are consistent with an "integrated strategy" model of speech perception where the listener chooses a segregation strategy based on the characteristics of the masker present in the target ear and the amount of across-ear interference is determined by the extent to which this strategy can also effectively be used to suppress the masker in the unattended ear.

18.
Increases in masker variability have been shown to increase the effects of informational masking in non-speech listening tasks, but relatively little is known about the influence that masker uncertainty has on the informational components of speech-on-speech masking. In this experiment, listeners were asked to extract information from a target phrase that was presented in their right ear while ignoring masking phrases that were presented in the same ear as the target phrase and in the ear opposite the target phrase. The level of masker uncertainty was varied by holding constant or "freezing" the talkers speaking the masking phrases, the semantic content used in the masking phrases, or both the talkers and the semantic content in the masking phrases within each block of 120 trials. The results showed that freezing the semantic content of the masking phrase in the target ear was the only reduction in masker uncertainty that ever resulted in a significant improvement in performance. Providing feedback after each trial improved performance overall, but did not prevent the listeners from making incorrect responses that matched the content of the frozen target-ear masking phrase. However, removing the target-ear contents corresponding to the masking phrase from the response set resulted in a dramatic improvement in performance. This suggests that the listeners were generally able to understand both of the phrases presented to the target ear, and that their incorrect responses in the task were almost entirely a result of their inability to determine which words were spoken by the target talker.

19.
When a target speech signal is obscured by an interfering speech waveform, comprehension of the target message depends both on the successful detection of the energy from the target speech waveform and on the successful extraction and recognition of the spectro-temporal energy pattern of the target out of a background of acoustically similar masker sounds. This study attempted to isolate the effects that energetic masking, defined as the loss of detectable target information due to the spectral overlap of the target and masking signals, has on multitalker speech perception. This was achieved through the use of ideal time-frequency binary masks that retained those spectro-temporal regions of the acoustic mixture that were dominated by the target speech but eliminated those regions that were dominated by the interfering speech. The results suggest that energetic masking plays a relatively small role in the overall masking that occurs when speech is masked by interfering speech but a much more significant role when speech is masked by interfering noise.
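The ideal time-frequency binary mask described in this abstract can be illustrated on a toy grid. This is a minimal sketch under stated assumptions: each cell holds an energy value for one time-frequency unit, the mixture energy is approximated as target plus masker, and the 0 dB local criterion and all names are hypothetical rather than taken from the study.

```python
def ideal_binary_mask(target_energy, masker_energy, criterion_db=0.0):
    """Return 1 for cells where the target exceeds the masker by more than
    criterion_db, and 0 elsewhere. Rows are frequency bands, columns are
    time frames."""
    threshold = 10.0 ** (criterion_db / 10.0)
    return [
        [1 if t > threshold * m else 0 for t, m in zip(t_row, m_row)]
        for t_row, m_row in zip(target_energy, masker_energy)
    ]


def apply_mask(mixture_energy, mask):
    """Keep mixture cells where the mask is 1 and zero out the rest."""
    return [
        [e * keep for e, keep in zip(e_row, k_row)]
        for e_row, k_row in zip(mixture_energy, mask)
    ]


# Toy 2 x 2 time-frequency grids of energy values.
target = [[4.0, 0.5], [1.0, 9.0]]
masker = [[1.0, 2.0], [3.0, 1.0]]
# Assumption: energies add in the mixture.
mixture = [[t + m for t, m in zip(t_row, m_row)]
           for t_row, m_row in zip(target, masker)]

mask = ideal_binary_mask(target, masker)
print(mask)
print(apply_mask(mixture, mask))
```

The mask is "ideal" because it is computed from the separate target and masker signals, which are available only to the experimenter and never to the listener.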

20.
When a target-speech/masker mixture is processed with the signal-separation technique, ideal binary mask (IBM), intelligibility of target speech is remarkably improved in both normal-hearing listeners and hearing-impaired listeners. Intelligibility of speech can also be improved by filling in speech gaps with un-modulated broadband noise. This study investigated whether intelligibility of target speech in the IBM-treated target-speech/masker mixture can be further improved by adding a broadband-noise background. The results of this study show that following the IBM manipulation, which remarkably released target speech from speech-spectrum noise, foreign-speech, or native-speech masking (experiment 1), adding a broadband-noise background at a signal-to-noise ratio of no less than 4 dB significantly improved intelligibility of target speech when the masker was either noise (experiment 2) or speech (experiment 3). The results suggest that since adding the noise background makes the areas of silence in the time-frequency domain of the IBM-treated target-speech/masker mixture shallower, abrupt transient changes in the mixture are smoothed and the perceived continuity of target-speech components becomes enhanced, leading to improved target-speech intelligibility. The findings are useful for advancing computational auditory scene analysis, hearing-aid/cochlear-implant designs, and understanding of speech perception under "cocktail-party" conditions.
