首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Detection thresholds were measured for different spatial configurations of 500- and 1000-Hz pure-tone targets and broadband maskers. Sources were simulated using individually measured head-related transfer functions (HRTFs) for source positions varying in both azimuth and distance. For the spatial configurations tested, thresholds ranged over 50 dB, primarily as a result of large changes in the target-to-masker ratio (TMR) with changes in target and masker locations. Intersubject differences in both HRTFs and in binaural sensitivity were large; however, the overall pattern of results was similar across subjects. As expected, detection thresholds were generally smaller when the target and masker were separated in azimuth than when they were at the same location. However, in some cases, azimuthal separation of target and masker yielded little change or even a small increase in detection threshold. Significant intersubject differences occurred as a result both of differences in monaural and binaural acoustic cues in the individualized HRTFs and of different binaural contributions to performance. Model predictions captured general trends in the pattern of spatial unmasking. However, subject-specific model predictions did not account for the observed individual differences in performance, even after taking into account individual differences in HRTF measurements and overall binaural sensitivity. These results suggest that individuals differ not only in their overall sensitivity to binaural cues, but also in how their binaural sensitivity varies with the spatial position of (and interaural differences in) the masker.  相似文献   

2.
Spatial unmasking describes the improvement in the detection or identification of a target sound afforded by separating it spatially from simultaneous masking sounds. This effect has been studied extensively for speech intelligibility in the presence of interfering sounds. In the current study, listeners identified zebra finch song, which shares many acoustic properties with speech but lacks semantic and linguistic content. Three maskers with the same long-term spectral content but different short-term statistics were used: (1) chorus (combinations of unfamiliar zebra finch songs), (2) song-shaped noise (broadband noise with the average spectrum of chorus), and (3) chorus-modulated noise (song-shaped noise multiplied by the broadband envelope from a chorus masker). The amount of masking and spatial unmasking depended on the masker and there was evidence of release from both energetic and informational masking. Spatial unmasking was greatest for the statistically similar chorus masker. For the two noise maskers, there was less spatial unmasking and it was wholly accounted for by the relative target and masker levels at the acoustically better ear. The results share many features with analogous results using speech targets, suggesting that spatial separation aids in the segregation of complex natural sounds through mechanisms that are not specific to speech.  相似文献   

3.
To examine whether auditory streaming contributes to unmasking, intelligibility of target sentences against two competing talkers was measured using the coordinate response measure (CRM) [Bolia et al., J. Acoust. Soc. Am. 107, 1065-1066 (2007)] corpus. In the control condition, the speech reception threshold (50% correct) was measured when the target and two maskers were collocated straight ahead. Separating maskers from the target by +/-30 degrees resulted in spatial release from masking of 12 dB. CRM sentences involve an identifier in the first part and two target words in the second part. In experimental conditions, masking talkers started spatially separated at +/-30 degrees but became collocated with the target before the scoring words. In one experiment, one target and two different maskers were randomly selected from a mixed-sex corpus. Significant unmasking of 4 dB remained despite the absence of persistent location cues. When same-sex talkers were used as maskers and target, unmasking was reduced. These data suggest that initial separation may permit confident identification and streaming of the target and masker speech where significant differences between target and masker voice characteristics exist, but where target and masker characteristics are similar, listeners must rely more heavily on continuing spatial cues.  相似文献   

4.
Although many studies have shown that intelligibility improves when a speech signal and an interfering sound source are spatially separated in azimuth, little is known about the effect that spatial separation in distance has on the perception of competing sound sources near the head. In this experiment, head-related transfer functions (HRTFs) were used to process stimuli in order to simulate a target talker and a masking sound located at different distances along the listener's interaural axis. One of the signals was always presented at a distance of 1 m, and the other signal was presented 1 m, 25 cm, or 12 cm from the center of the listener's head. The results show that distance separation has very different effects on speech segregation for different types of maskers. When speech-shaped noise was used as the masker, most of the intelligibility advantages of spatial separation could be accounted for by spectral differences in the target and masking signals at the ear with the higher signal-to-noise ratio (SNR). When a same-sex talker was used as the masker, the intelligibility advantages of spatial separation in distance were dominated by binaural effects that produced the same performance improvements as a 4-5-dB increase in the SNR of a diotic stimulus. These results suggest that distance-dependent changes in the interaural difference cues of nearby sources play a much larger role in the reduction of the informational masking produced by an interfering speech signal than in the reduction of the energetic masking produced by an interfering noise source.  相似文献   

5.
When speech is in competition with interfering sources in rooms, monaural indicators of intelligibility fail to take account of the listener's abilities to separate target speech from interfering sounds using the binaural system. In order to incorporate these segregation abilities and their susceptibility to reverberation, Lavandier and Culling [J. Acoust. Soc. Am. 127, 387-399 (2010)] proposed a model which combines effects of better-ear listening and binaural unmasking. A computationally efficient version of this model is evaluated here under more realistic conditions that include head shadow, multiple stationary noise sources, and real-room acoustics. Three experiments are presented in which speech reception thresholds were measured in the presence of one to three interferers using real-room listening over headphones, simulated by convolving anechoic stimuli with binaural room impulse-responses measured with dummy-head transducers in five rooms. Without fitting any parameter of the model, there was close correspondence between measured and predicted differences in threshold across all tested conditions. The model's components of better-ear listening and binaural unmasking were validated both in isolation and in combination. The computational efficiency of this prediction method allows the generation of complex "intelligibility maps" from room designs.  相似文献   

6.
Across-frequency processing by common interaural time delay (ITD) in spatial unmasking was investigated by measuring speech reception thresholds (SRTs) for high- and low-frequency bands of target speech presented against concurrent speech or a noise masker. Experiment 1 indicated that presenting one of these target bands with an ITD of +500 micros and the other with zero ITD (like the masker) provided some release from masking, but full binaural advantage was only measured when both target bands were given an ITD of + 500 micros. Experiment 2 showed that full binaural advantage could also be achieved when the high- and low-frequency bands were presented with ITDs of equal but opposite magnitude (+/- 500 micros). In experiment 3, the masker was also split into high- and low-frequency bands with ITDs of equal but opposite magnitude (+/-500 micros). The ITD of the low-frequency target band matched that of the high-frequency masking band and vice versa. SRTs indicated that, as long as the target and masker differed in ITD within each frequency band, full binaural advantage could be achieved. These results suggest that the mechanism underlying spatial unmasking exploits differences in ITD independently within each frequency channel.  相似文献   

7.
Two experiments compared the effect of supplying visual speech information (e.g., lipreading cues) on the ability to hear one female talker's voice in the presence of steady-state noise or a masking complex consisting of two other female voices. In the first experiment intelligibility of sentences was measured in the presence of the two types of maskers with and without perceived spatial separation of target and masker. The second study tested detection of sentences in the same experimental conditions. Results showed that visual cues provided more benefit for both recognition and detection of speech when the masker consisted of other voices (versus steady-state noise). Moreover, visual cues provided greater benefit when the target speech and masker were spatially coincident versus when they appeared to arise from different spatial locations. The data obtained here are consistent with the hypothesis that lipreading cues help to segregate a target voice from competing voices, in addition to the established benefit of supplementing masked phonetic information.  相似文献   

8.
In many natural settings, spatial release from masking aids speech intelligibility, especially when there are competing talkers. This paper describes a series of three experiments that investigate the role of prior knowledge of masker location on phoneme identification and spatial release from masking. In contrast to previous work, these experiments use initial stop-consonant identification as a test of target intelligibility to ensure that listeners had little time to switch the focus of spatial attention during the task. The first experiment shows that target phoneme identification was worse when a masker played from an unexpected location (increasing the consonant identification threshold by 2.6 dB) compared to when an energetically very similar and symmetrically located masker came from an expected location. In the second and third experiments, target phoneme identification was worse (increasing target threshold levels by 2.0 and 2.6 dB, respectively) when the target was played unexpectedly on the side from which the masker was expected compared to when the target came from an unexpected, symmetrical location in the hemifield opposite the expected location of the masker. These results support the idea that listeners modulate spatial attention by both focusing resources on the expected target location and withdrawing attentional resources from expected locations of interfering sources.  相似文献   

9.
借助声学头模考察了水平面不同语声源和噪声源位置对语言清晰度测量的影响,比较了有声学头模的双耳STIPA与无声学头模常规STIPA测量结果的差异,分别采用录听和现场测听方式进行了同等条件下的汉语听感清晰度主观评价实验,并分析了清晰度主客观结果的相关性。结果表明:声源位置对有声学头模的STIPA以及头模录制信号和真人现场实测的听感清晰度影响显著。无声学头模的STIPA更接近有声学头模时左右耳中较差的劣势耳的STIPA结果。单侧耳与语声源同侧或与噪声源异侧对应的单侧耳听感清晰度更高,语声源和噪声源重叠对应的双耳听感清晰度最低,声源分离可以显著提高双耳听感清晰度。头模录制信号和真人现场实测的听感清晰度与无声学头模STIPA不相关,与有声学头模的STIPA高度相关,其中单侧耳听感清晰度与该单侧耳STIPA高度相关,双耳听感清晰度与左右耳STIPA的较高值相关性最高。   相似文献   

10.
In order to investigate the influence of dummy head on measuring speech intelligibility,the objective and subjective speech intelligibility evaluation experiments were respectively carried out for different spatial configurations of a target source and a noise source in the horizontal plane.The differences between standard STIPA measured without a dummy head and binaural STIPA measured with a dummy head were compared and the correlation of subjective speech intelligibility and objective STIPA was analyzed.It is showed that the position of sound source affects significantly on binaural STIPA and subjective intelligibility measured by a dummy head or measured in a real-life scenario.The standard STIPA is closer to the lower value of the two binaural STIPA values.The speech intelligibility is higher for a single ear which is on the same side with the target source or on the other side of the noise source.Binaural speech intelligibility is always the lowest when both target and noise sources are at the same place but once apart the speech intelligibility will increase sharply.It is also found that the subjective intelligibility measured by a dummy head or measured in a real-life scenario is uncorrelated with standard STIPA,but correlated highly with STIPA measured with a dummy head.The subjective intelligibility of one single ear is correlated highly with STIPA measured at the same ear,and the binaural speech intelligibility is in well agreement with the higher value of the two binaural STIPA values.  相似文献   

11.
When a masking sound is spatially separated from a target speech signal, substantial releases from masking typically occur both for speech and noise maskers. However, when a delayed copy of the masker is also presented at the location of the target speech (a condition that has been referred to as the front target, right-front masker or F-RF configuration), the advantages of spatial separation vanish for noise maskers but remain substantial for speech maskers. This effect has been attributed to precedence, which introduces an apparent spatial separation between the target and masker in the F-RF configuration that helps the listener to segregate the target from a masking voice but not from a masking noise. In this study, virtual synthesis techniques were used to examine variations of the F-RF configuration in an attempt to more fully understand the stimulus parameters that influence the release from masking obtained in that condition. The results show that the release from speech-on-speech masking caused by the addition of the delayed copy of the masker is robust across a wide variety of source locations, masker locations, and masker delay values. This suggests that the speech unmasking that occurs in the F-RF configuration is not dependent on any single perceptual cue and may indicate that F-RF speech segregation is only partially based on the apparent left-right location of the RF masker.  相似文献   

12.
A masker can reduce target intelligibility both by interfering with the target's peripheral representation ("energetic masking") and/or by causing more central interference ("informational masking"). Intelligibility generally improves with increasing spatial separation between two sources, an effect known as spatial release from masking (SRM). Here, SRM was measured using two concurrent sine-vocoded talkers. Target and masker were each composed of eight different narrowbands of speech (with little spectral overlap). The broadband target-to-masker energy ratio (TMR) was varied, and response errors were used to assess the relative importance of energetic and informational masking. Performance improved with increasing TMR. SRM occurred at all TMRs; however, the pattern of errors suggests that spatial separation affected performance differently, depending on the dominant type of masking. Detailed error analysis suggests that informational masking occurred due to failures in either across-time linkage of target segments (streaming) or top-down selection of the target. Specifically, differences in the spatial cues in target and masker improved streaming and target selection. In contrast, level differences helped listeners select the target, but had little influence on streaming. These results demonstrate that at least two mechanisms (differentially affected by spatial and level cues) influence informational masking.  相似文献   

13.
When a target-speech/masker mixture is processed with the signal-separation technique, ideal binary mask (IBM), intelligibility of target speech is remarkably improved in both normal-hearing listeners and hearing-impaired listeners. Intelligibility of speech can also be improved by filling in speech gaps with un-modulated broadband noise. This study investigated whether intelligibility of target speech in the IBM-treated target-speech/masker mixture can be further improved by adding a broadband-noise background. The results of this study show that following the IBM manipulation, which remarkably released target speech from speech-spectrum noise, foreign-speech, or native-speech masking (experiment 1), adding a broadband-noise background with the signal-to-noise ratio no less than 4 dB significantly improved intelligibility of target speech when the masker was either noise (experiment 2) or speech (experiment 3). The results suggest that since adding the noise background shallows the areas of silence in the time-frequency domain of the IBM-treated target-speech/masker mixture, the abruption of transient changes in the mixture is smoothed and the perceived continuity of target-speech components becomes enhanced, leading to improved target-speech intelligibility. The findings are useful for advancing computational auditory scene analysis, hearing-aid/cochlear-implant designs, and understanding of speech perception under "cocktail-party" conditions.  相似文献   

14.
Speech intelligibility and localization in a multi-source environment.   总被引:1,自引:0,他引:1  
Natural environments typically contain sound sources other than the source of interest that may interfere with the ability of listeners to extract information about the primary source. Studies of speech intelligibility and localization by normal-hearing listeners in the presence of competing speech are reported on in this work. One, two or three competing sentences [IEEE Trans. Audio Electroacoust. 17(3), 225-246 (1969)] were presented from various locations in the horizontal plane in several spatial configurations relative to a target sentence. Target and competing sentences were spoken by the same male talker and at the same level. All experiments were conducted both in an actual sound field and in a virtual sound field. In the virtual sound field, both binaural and monaural conditions were tested. In the speech intelligibility experiment, there were significant improvements in performance when the target and competing sentences were spatially separated. Performance was similar in the actual sound-field and virtual sound-field binaural listening conditions for speech intelligibility. Although most of these improvements are evident monaurally when using the better ear, binaural listening was necessary for large improvements in some situations. In the localization experiment, target source identification was measured in a seven-alternative absolute identification paradigm with the same competing sentence configurations as for the speech study. Performance in the localization experiment was significantly better in the actual sound-field than in the virtual sound-field binaural listening conditions. Under binaural conditions, localization performance was very good, even in the presence of three competing sentences. Under monaural conditions, performance was much worse. For the localization experiment, there was no significant effect of the number or configuration of the competing sentences tested. For these experiments, the performance in the speech intelligibility experiment was not limited by localization ability.  相似文献   

15.
Speech reception thresholds were measured to investigate the influence of a room on speech segregation between a spatially separated target and interferer. The listening tests were realized under headphones. A room simulation allowed selected positioning of the interferer and target, as well as varying the absorption coefficient of the room internal surfaces. The measurements involved target sentences and speech-shaped noise or 2-voice interferers. Four experiments revealed that speech segregation in rooms was not only dependent on the azimuth separation of sound sources, but also on their direct-to-reverberant energy ratio at the listening position. This parameter was varied for interferer and target independently. Speech intelligibility decreased as the direct-to-reverberant ratio of sources was degraded by sound reflections in the room. The influence of the direct-to-reverberant ratio of the interferer was in agreement with binaural unmasking theories, through its effect on interaural coherence. The effect on the target occurred at higher levels of reverberation and was explained by the intrinsic degradation of speech intelligibility in reverberation.  相似文献   

16.
The reliability of algorithms for room acoustic simulations has often been confirmed on the basis of the verification of predicted room acoustical parameters. This paper presents a complementary perceptual validation procedure consisting of two experiments, respectively dealing with speech intelligibility, and with sound source front–back localisation.The evaluated simulation algorithm, implemented in software ODEON®, is a hybrid method that is based on an image source algorithm for the prediction of early sound reflection and on ray-tracing for the later part, using a stochastic scattering process with secondary sources. The binaural room impulse response (BRIR) is calculated from a simulated room impulse response where information about the arriving time, intensity and spatial direction of each sound reflection is collected and convolved with a measured Head Related Transfer Function (HRTF). The listening stimuli for the speech intelligibility and localisation tests are auralised convolutions of anechoic sound samples with measured and simulated BRIRs.Perception tests were performed with human subjects in two acoustical environments, i.e. an anechoic and reverberant room, by presenting the stimuli to subjects in a natural way, and via headphones by using two non-individualized HRTFs (artificial head and hearing aids placed on the ears of the artificial head) of both a simulated and a real room.Very good correspondence is found between the results obtained with simulated and measured BRIRs, both for speech intelligibility in the presence of noise and for sound source localisation tests. In the anechoic room an increase in speech intelligibility is observed when noise and signal are presented from sources located at different angles. This improvement is not so evident in the reverberant room, with the sound sources at 1-m distance from the listener. Interestingly, the performance of people for front–back localisation is better in the reverberant room than in the anechoic room.The correlation between people’s ability for sound source localisation on one hand, and their ability for recognition of binaurally received speech in reverberation on the other hand, is found to be weak.  相似文献   

17.
This study investigated whether speech-like maskers without linguistic content produce informational masking of speech. The target stimuli were nonsense Chinese Mandarin sentences. In experiment I, the masker contained harmonics the fundamental frequency (F0) of which was sinusoidally modulated and the mean F0 of which was varied. The magnitude of informational masking was evaluated by measuring the change in intelligibility (releasing effect) produced by inducing a perceived spatial separation of the target speech and masker via the precedence effect. The releasing effect was small and was only clear when the target and masker had the same mean F0, suggesting that informational masking was small. Performance with the harmonic maskers was better than with a steady speech-shaped noise (SSN) masker. In experiments II and III, the maskers were speech-like synthesized signals, alternating between segments with harmonic structure and segments composed of SSN. Performance was much worse than for experiment I, and worse than when an SSN masker was used, suggesting that substantial informational masking occurred. The similarity of the F0 contours of the target and masker had little effect. The informational masking effect was not influenced by whether or not the noise-like segments of the masker were synchronous with the unvoiced segments of the target speech.  相似文献   

18.
Three experiments investigated the roles of interaural time differences (ITDs) and level differences (ILDs) in spatial unmasking in multi-source environments. In experiment 1, speech reception thresholds (SRTs) were measured in virtual-acoustic simulations of an anechoic environment with three interfering sound sources of either speech or noise. The target source lay directly ahead, while three interfering sources were (1) all at the target's location (0 degrees,0 degrees,0 degrees), (2) at locations distributed across both hemifields (-30 degrees,60 degrees,90 degrees), (3) at locations in the same hemifield (30 degrees,60 degrees,90 degrees), or (4) co-located in one hemifield (90 degrees,90 degrees,90 degrees). Sounds were convolved with head-related impulse responses (HRIRs) that were manipulated to remove individual binaural cues. Three conditions used HRIRs with (1) both ILDs and ITDs, (2) only ILDs, and (3) only ITDs. The ITD-only condition produced the same pattern of results across spatial configurations as the combined cues, but with smaller differences between spatial configurations. The ILD-only condition yielded similar SRTs for the (-30 degrees,60 degrees,90 degrees) and (0 degrees,0 degrees,0 degrees) configurations, as expected for best-ear listening. In experiment 2, pure-tone BMLDs were measured at third-octave frequencies against the ITD-only, speech-shaped noise interferers of experiment 1. These BMLDs were 4-8 dB at low frequencies for all spatial configurations. In experiment 3, SRTs were measured for speech in diotic, speech-shaped noise. Noises were filtered to reduce the spectrum level at each frequency according to the BMLDs measured in experiment 2. SRTs were as low or lower than those of the corresponding ITD-only conditions from experiment 1. Thus, an explanation of speech understanding in complex listening environments based on the combination of best-ear listening and binaural unmasking (without involving sound-localization) cannot be excluded.  相似文献   

19.
The application of the ideal binary mask to an auditory mixture has been shown to yield substantial improvements in intelligibility. This mask is commonly applied to the time-frequency (T-F) representation of a mixture signal and eliminates portions of a signal below a signal-to-noise-ratio (SNR) threshold while allowing others to pass through intact. The factors influencing intelligibility of ideal binary-masked speech are not well understood and are examined in the present study. Specifically, the effects of the local SNR threshold, input SNR level, masker type, and errors introduced in estimating the ideal mask are examined. Consistent with previous studies, intelligibility of binary-masked stimuli is quite high even at -10 dB SNR for all maskers tested. Performance was affected the most when the masker dominated T-F units were wrongly labeled as target-dominated T-F units. Performance plateaued near 100% correct for SNR thresholds ranging from -20 to 5 dB. The existence of the plateau region suggests that it is the pattern of the ideal binary mask that matters the most rather than the local SNR of each T-F unit. This pattern directs the listener's attention to where the target is and enables them to segregate speech effectively in multitalker environments.  相似文献   

20.
Several studies have described a release from speech-on-speech masking associated with separation of target and masker sources in the median sagittal plane. Some have excluded the possibility that small differences between target and masker interaural time disparities can fully account for this release. This study explored the mechanisms underlying the spatial release from speech-on-speech masking that can be obtained in the absence of such differences. In one condition, interaural time disparities were removed from the nominal median-sagittal-plane, head-related impulse responses used to generate the virtual auditory space within which competing sentences were presented. In other conditions, interaural level and spectral disparities also were manipulated by presenting competing sentences monaurally or diotically after convolution with one ear's head-related impulse responses. It was found that substantial spatial release from masking can be obtained in the absence of any interaural disparities and that such disparities probably make a relatively minor contribution to spatial release from speech-on-speech masking in the median sagittal plane. It is argued that this release from masking is driven primarily by a reduction in informational masking that occurs when monaural information at one, or both, of the listener's ears facilitates differentiation of competing sentences that emanate from spatially separated sources.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号