Similar Articles
20 similar articles found.
1.
Listeners are relatively good at estimating the true content of each physical source in a sound mixture in most everyday situations. However, if there is a spectrotemporal element that logically could belong to more than one object, the correct way to group that element can be ambiguous. Many psychoacoustic experiments have implicitly assumed that when a sound mixture contains ambiguous sound elements, the ambiguous elements "trade" between competing sources, such that the elements contribute more to one object in conditions where they contribute less to others. However, few studies have directly tested whether such trading occurs. While some studies have found trading, it failed in recent studies in which spatial cues were manipulated to alter the perceptual organization. The current study extended this work by exploring whether trading occurs for similar sound mixtures when frequency content, rather than spatial cues, is manipulated to alter grouping. Unlike when spatial cues were manipulated, the results are roughly consistent with trading. Together, these results suggest that the degree to which trading is obeyed depends on how stimuli are manipulated to affect object formation.

2.
A frequency bin-wise nonlinear masking algorithm is proposed in the spectrogram domain for speech segregation in convolutive mixtures. The contributive weight from each speech source to a time-frequency unit of the mixture spectrogram is estimated by a nonlinear function based on location cues. For each sound source, a non-binary mask is formed from the estimated weights and is multiplied with the mixture spectrogram to extract the sound. Head-related transfer functions (HRTFs) are used to simulate convolutive sound mixtures perceived by listeners. Simulation results show the proposed method outperforms convolutive independent component analysis and degenerate unmixing estimation technique (DUET) methods in almost all test conditions.
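To make the masking step concrete, the following is a minimal Python sketch of frequency bin-wise nonlinear masking, assuming the location cue is an interaural phase difference (IPD) and the nonlinear weighting is a logistic function of the cue mismatch. The abstract does not specify the paper's actual weighting function or cue model, so the function name, the STFT length, and the steepness parameter beta below are illustrative assumptions.

```python
import numpy as np
from scipy.signal import stft, istft

def extract_by_soft_mask(left, right, fs, target_itd, beta=8.0):
    """Sketch of frequency bin-wise nonlinear (non-binary) masking.

    left, right: the two channels of a binaural mixture;
    target_itd: assumed interaural time difference of the desired
    source, in seconds (a hypothetical input, known in advance here).
    """
    f, _, L = stft(left, fs=fs, nperseg=1024)
    _, _, R = stft(right, fs=fs, nperseg=1024)

    # Observed interaural phase difference per time-frequency unit.
    ipd = np.angle(L * np.conj(R))
    # Phase difference predicted by the target's ITD at each frequency.
    expected = 2.0 * np.pi * f[:, None] * target_itd
    # Wrapped mismatch in (-pi, pi]; phase wrapping makes this cue
    # ambiguous at high frequencies, a known limitation of the sketch.
    err = np.angle(np.exp(1j * (ipd - expected)))

    # Nonlinear weight: near 1 where the cue matches the target and
    # decaying smoothly, rather than binarily, as the mismatch grows.
    mask = 1.0 / (1.0 + np.exp(beta * (np.abs(err) - np.pi / 4)))

    # Multiply the non-binary mask with the mixture spectrogram.
    _, y = istft(mask * L, fs=fs)
    return y
```

A binary mask would assign each time-frequency unit wholly to one source; the logistic weight instead lets a unit contribute partially to several sources, which is what makes the mask non-binary.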

3.
The effect of spatial separation on the ability of human listeners to resolve a pair of concurrent broadband sounds was examined. Stimuli were presented in a virtual auditory environment using individualized outer ear filter functions. Subjects were presented with two simultaneous noise bursts that were either spatially coincident or separated (horizontally or vertically), and responded as to whether they perceived one or two source locations. Testing was carried out at five reference locations on the audiovisual horizon (0 degrees, 22.5 degrees, 45 degrees, 67.5 degrees, and 90 degrees azimuth). Results from experiment 1 showed that at more lateral locations, a larger horizontal separation was required for the perception of two sounds. The reverse was true for vertical separation. Furthermore, it was observed that subjects were unable to separate stimulus pairs if they delivered the same interaural differences in time (ITD) and level (ILD). These findings suggested that the auditory system exploited differences in one or both of the binaural cues to resolve the sources, and could not use monaural spectral cues effectively for the task. In experiments 2 and 3, separation of concurrent noise sources was examined upon removal of low-frequency content (and ITDs), onset/offset ITDs, both of these in conjunction, and all ITD information. While onset and offset ITDs did not appear to play a major role, differences in ongoing ITDs were robust cues for separation under these conditions, including those in the envelopes of high-frequency channels.

4.
Whenever an acoustic scene contains a mixture of sources, listeners must segregate the mixture in order to compute source content and/or location. Some past studies have explored whether perceived location depends on which sound elements are perceived within a source. However, no direct comparisons have been made of "what" and "where" judgments for the same sound mixtures using the same listeners. The current study tested whether the sound elements making up an auditory object predict that object's perceived location. Listeners were presented with an auditory scene containing competing "target" and "captor" sources, each of which could logically contain a "promiscuous" tone complex. In separate blocks, the same listeners matched the perceived spectro-temporal content ("what") and location ("where") of the target. Generally, as the captor intensity decreased, the promiscuous complex contributed more to both what and where judgments of the target. However, the judgments did not agree either quantitatively or qualitatively. For some subjects, the promiscuous complex consistently contributed more to the spectro-temporal content of the target than to its location, while for others it consistently contributed more to target location. These results show a dissociation between the perceived spectro-temporal content of an auditory object and where that object is perceived.

5.
Although many studies have shown that intelligibility improves when a speech signal and an interfering sound source are spatially separated in azimuth, little is known about the effect that spatial separation in distance has on the perception of competing sound sources near the head. In this experiment, head-related transfer functions (HRTFs) were used to process stimuli in order to simulate a target talker and a masking sound located at different distances along the listener's interaural axis. One of the signals was always presented at a distance of 1 m, and the other signal was presented 1 m, 25 cm, or 12 cm from the center of the listener's head. The results show that distance separation has very different effects on speech segregation for different types of maskers. When speech-shaped noise was used as the masker, most of the intelligibility advantages of spatial separation could be accounted for by spectral differences in the target and masking signals at the ear with the higher signal-to-noise ratio (SNR). When a same-sex talker was used as the masker, the intelligibility advantages of spatial separation in distance were dominated by binaural effects that produced the same performance improvements as a 4-5-dB increase in the SNR of a diotic stimulus. These results suggest that distance-dependent changes in the interaural difference cues of nearby sources play a much larger role in the reduction of the informational masking produced by an interfering speech signal than in the reduction of the energetic masking produced by an interfering noise source.

6.
This study examined spatial release from masking (SRM) when a target talker was masked by competing talkers or by other types of sounds. The focus was on the role of interaural time differences (ITDs) and time-varying interaural level differences (ILDs) under conditions varying in the strength of informational masking (IM). In the first experiment, a target talker was masked by two other talkers that were either colocated with the target or were symmetrically spatially separated from the target, with the stimuli presented through loudspeakers. The sounds were filtered into different frequency regions to restrict the available interaural cues. The largest SRM occurred for the broadband condition, followed by a low-pass condition. However, even the highest-frequency bandpass-filtered condition (3-6 kHz) yielded a significant SRM. In the second experiment, the stimuli were presented via earphones. The listeners identified the speech of a target talker masked by one or two other talkers or noises when the maskers were colocated with the target or were perceptually separated by ITDs. The results revealed a complex pattern of masking in which the factors affecting performance in colocated and spatially separated conditions are to a large degree independent.

7.
Although many researchers have shown that listeners are able to selectively attend to a target speech signal when a masking talker is present in the same ear as the target speech or when a masking talker is present in a different ear than the target speech, little is known about selective auditory attention in tasks with a target talker in one ear and independent masking talkers in both ears at the same time. In this series of experiments, listeners were asked to respond to a target speech signal spoken by one of two competing talkers in their right (target) ear while ignoring a simultaneous masking sound in their left (unattended) ear. When the masking sound in the unattended ear was noise, listeners were able to segregate the competing talkers in the target ear nearly as well as they could with no sound in the unattended ear. When the masking sound in the unattended ear was speech, however, speech segregation in the target ear was substantially worse than with no sound in the unattended ear. When the masking sound in the unattended ear was time-reversed speech, speech segregation was degraded only when the target speech was presented at a lower level than the masking speech in the target ear. These results show that within-ear and across-ear speech segregation are closely related processes that cannot be performed simultaneously when the interfering sound in the unattended ear is qualitatively similar to speech.

8.
Extraction of a target sound source amidst multiple interfering sound sources is difficult when there are fewer sensors than sources, as is the case for human listeners in the classic cocktail-party situation. This study compares the signal extraction performance of five algorithms using recordings of speech sources made with three different two-microphone arrays in three rooms of varying reverberation time. Test signals, consisting of two to five speech sources, were constructed for each room and array. The signals were processed with each algorithm, and the signal extraction performance was quantified by calculating the signal-to-noise ratio of the output. A frequency-domain minimum-variance distortionless-response beamformer outperformed the time-domain based Frost beamformer and generalized sidelobe canceler for all tests with two or more interfering sound sources, and performed comparably or better than the time-domain algorithms for tests with one interfering sound source. The frequency-domain minimum-variance algorithm offered performance comparable to that of the Peissig-Kollmeier binaural frequency-domain algorithm, but with much less distortion of the target signal. Comparisons were also made to a simple beamformer. In addition, computer simulations illustrate that, when processing speech signals, the chosen implementation of the frequency-domain minimum-variance technique adapts more quickly and accurately than time-domain techniques.
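The frequency-domain minimum-variance distortionless-response (MVDR) beamformer has a standard closed form at each frequency bin, w = R^{-1} d / (d^H R^{-1} d), where R is the spatial covariance of the microphone signals and d is the steering vector toward the target. The Python sketch below implements that textbook formula rather than the specific adaptive implementation benchmarked in the study; the diagonal-loading level and the frame-averaged covariance estimate are illustrative assumptions.

```python
import numpy as np

def mvdr_weights(R, d, diag_load=1e-3):
    """Per-bin MVDR weights: w = R^{-1} d / (d^H R^{-1} d).

    R: (M, M) spatial covariance at one frequency bin;
    d: (M,) steering vector toward the target at that bin.
    """
    M = R.shape[0]
    # Diagonal loading keeps the matrix inverse well conditioned.
    Rl = R + diag_load * (np.trace(R).real / M) * np.eye(M)
    Ri_d = np.linalg.solve(Rl, d)
    return Ri_d / (d.conj() @ Ri_d)

def beamform(X, d):
    """Apply MVDR bin by bin.

    X: STFT tensor of shape (num_bins, num_frames, num_mics);
    d: steering vectors of shape (num_bins, num_mics).
    """
    Y = np.zeros(X.shape[:2], dtype=complex)
    for k in range(X.shape[0]):
        Xk = X[k]                              # (frames, mics)
        # Spatial covariance R = E[x x^H], averaged over frames.
        Rk = (Xk.T @ Xk.conj()) / Xk.shape[0]
        w = mvdr_weights(Rk, d[k])
        Y[k] = Xk @ w.conj()                   # y = w^H x per frame
    return Y
```

Because w^H d = 1 by construction, the target direction is passed without distortion while output power, and hence interference, is minimized; this is consistent with the low target distortion reported for the frequency-domain minimum-variance algorithm.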

9.
Perturbation analysis was used to determine the relative contribution of target enhancement and noise cancellation in the identification of a rudimentary sound source in noise. In a two-interval, forced-choice procedure, listeners identified the impact sound produced by the larger of two stretched membranes as the target. The noise on each presentation was the impact sound of a variable-sized plate. For four of five listeners, the relative weights on the noise were positive, indicating enhancement; for the remaining listener, they were negative, indicating cancellation. The results underscore the difficulty of evaluating models of masking solely in terms of measures of performance accuracy.

10.
Users of a cochlear implant together with a contralateral hearing aid (so-called bimodal listeners) have difficulties with localizing sound sources. This is mainly due to the distortion of interaural time and level difference cues (ITDs and ILDs) and to limited ITD sensitivity. An algorithm is presented that enhances ILD cues. Horizontal-plane sound-source localization performance of six bimodal listeners was evaluated (1) in a real sound field with their clinical devices, (2) in a virtual sound field under direct computer control, and (3) in a virtual sound field with ILD enhancement. The results in the real sound field did not differ significantly from those in the virtual field, and ILD enhancement improved localization performance by 4° to 10° of absolute error, relative to a mean absolute error of 28° in the condition without ILD enhancement.
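One way to picture ILD enhancement is as an expansion of the naturally occurring level difference. The Python sketch below is a hypothetical illustration rather than the algorithm evaluated in the study (whose azimuth-to-gain mapping the abstract does not give): it measures the per-bin ILD in decibels and scales it by a factor k, splitting the added gain symmetrically between boosting the louder ear and attenuating the other.

```python
import numpy as np
from scipy.signal import stft, istft

def enhance_ild(left, right, fs, k=2.0, eps=1e-12):
    """Expand interaural level differences by a factor k (k=1: unchanged)."""
    _, _, L = stft(left, fs=fs, nperseg=512)
    _, _, R = stft(right, fs=fs, nperseg=512)

    # Natural ILD per time-frequency unit, in dB.
    ild_db = 20.0 * np.log10((np.abs(L) + eps) / (np.abs(R) + eps))

    # Extra level difference so the output ILD becomes k * ild_db,
    # applied half to each ear with opposite signs.
    extra_db = (k - 1.0) * ild_db
    gain = 10.0 ** (extra_db / 40.0)

    _, y_left = istft(L * gain, fs=fs)
    _, y_right = istft(R / gain, fs=fs)
    return y_left, y_right
```

The output ILD is 20·log10(|L|·g / (|R|/g)) = ild_db + 40·log10(g) = k·ild_db, so k > 1 exaggerates the level cue that bimodal listeners can still exploit.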

11.
Pitch, timbre, and/or timing cues may be used to stream and segregate competing musical melodies and instruments. In this study, melodic contour identification was measured in cochlear implant (CI) and normal-hearing (NH) listeners, with and without a competing masker; timing, pitch, and timbre cues were varied between the masker and target contour. NH performance was near-perfect across different conditions. CI performance was significantly poorer than that of NH listeners. While some CI subjects were able to use or combine timing, pitch, and/or timbre cues, most were not, reflecting poor segregation due to poor spectral resolution.

12.
A masker can reduce target intelligibility both by interfering with the target's peripheral representation ("energetic masking") and/or by causing more central interference ("informational masking"). Intelligibility generally improves with increasing spatial separation between two sources, an effect known as spatial release from masking (SRM). Here, SRM was measured using two concurrent sine-vocoded talkers. Target and masker were each composed of eight different narrowbands of speech (with little spectral overlap). The broadband target-to-masker energy ratio (TMR) was varied, and response errors were used to assess the relative importance of energetic and informational masking. Performance improved with increasing TMR. SRM occurred at all TMRs; however, the pattern of errors suggests that spatial separation affected performance differently, depending on the dominant type of masking. Detailed error analysis suggests that informational masking occurred due to failures in either across-time linkage of target segments (streaming) or top-down selection of the target. Specifically, differences in the spatial cues in target and masker improved streaming and target selection. In contrast, level differences helped listeners select the target, but had little influence on streaming. These results demonstrate that at least two mechanisms (differentially affected by spatial and level cues) influence informational masking.

13.
The ability of listeners to detect a temporal gap in a 1600-Hz-wide noiseband (target) was studied as a function of the absence and presence of concurrent stimulation by a second 1600-Hz-wide noiseband (distractor) with a nonoverlapping spectrum. Gap detection thresholds for single noisebands centered on 1.0, 2.0, 4.0, and 5.0 kHz were in the range from 4 to 6 ms, and were comparable to those described in previous studies. Gap thresholds for the same target noisebands were only modestly improved by the presence of a synchronously gated gap in a second frequency band. Gap thresholds were unaffected by the presence of a continuous distractor that was either proximate or remote from the target frequency band. Gap thresholds for the target noiseband were elevated if the distractor noiseband also contained a gap which "roved" in time in temporal proximity to the target gap. This effect was most marked in inexperienced listeners. Between-channel gap thresholds, obtained using leading and trailing markers that differed in frequency, were high in all listeners, again consistent with previous findings. The data are discussed in terms of the levels of the auditory perceptual processing stream at which the listener can voluntarily access auditory events in distinct frequency channels.

14.
The auditory system continuously parses the acoustic environment into auditory objects, usually representing separate sound sources. Sound sources typically show characteristic emission patterns. These regular temporal sound patterns are possible cues for distinguishing sound sources. The present study was designed to test whether regular patterns are used as cues for source distinction and to specify the role that detecting these regularities may play in the process of auditory stream segregation. Participants were presented with tone sequences, and they were asked to continuously indicate whether they perceived the tones in terms of a single coherent sequence of sounds (integrated) or as two concurrent sound streams (segregated). Unknown to the participant, in some stimulus conditions, regular patterns were present in one or both putative streams. In all stimulus conditions, participants' perception switched back and forth between the two sound organizations. Importantly, regular patterns occurring in either one or both streams prolonged the mean duration of two-stream percepts, whereas the duration of one-stream percepts was unaffected. These results suggest that temporal regularities are utilized in auditory scene analysis. It appears that the role of this cue lies in stabilizing streams once they have been formed on the basis of simpler acoustic cues.

15.
If spatial attention acts like a "spotlight," focusing on one location and excluding others, it may be advantageous to have all targets of interest within the same spatial region. This hypothesis was explored using a task where listeners reported keywords from two simultaneous talkers. In Experiment 1, the two talkers were placed symmetrically about the frontal midline with various angular separations. While there was a small performance improvement for moderate separations, the improvement decreased for larger separations. However, the dependency of the relative talker intensities on spatial configuration accounted for these effects. Experiment 2 tested whether spatial separation improved the intelligibility of each source, an effect that could counteract any degradation in performance as sources fell outside the spatial spotlight of attention. In this experiment, intelligibility of individual sources was equalized across configurations by adding masking noise. Under these conditions, the cost of divided listening (the drop in performance when reporting both messages compared to reporting just one) was smaller when the spatial separation was small. These results suggest that spatial separation enhances the intelligibility of individual sources in a competing pair but increases the cost associated with having to process both sources simultaneously, consistent with the attentional spotlight hypothesis.

16.
Speech intelligibility and localization in a multi-source environment.
Natural environments typically contain sound sources other than the source of interest that may interfere with the ability of listeners to extract information about the primary source. Studies of speech intelligibility and localization by normal-hearing listeners in the presence of competing speech are reported in this work. One, two, or three competing sentences [IEEE Trans. Audio Electroacoust. 17(3), 225-246 (1969)] were presented from various locations in the horizontal plane in several spatial configurations relative to a target sentence. Target and competing sentences were spoken by the same male talker and at the same level. All experiments were conducted both in an actual sound field and in a virtual sound field. In the virtual sound field, both binaural and monaural conditions were tested. In the speech intelligibility experiment, there were significant improvements in performance when the target and competing sentences were spatially separated. Performance was similar in the actual sound-field and virtual sound-field binaural listening conditions for speech intelligibility. Although most of these improvements are evident monaurally when using the better ear, binaural listening was necessary for large improvements in some situations. In the localization experiment, target source identification was measured in a seven-alternative absolute identification paradigm with the same competing sentence configurations as for the speech study. Performance in the localization experiment was significantly better in the actual sound-field than in the virtual sound-field binaural listening conditions. Under binaural conditions, localization performance was very good, even in the presence of three competing sentences. Under monaural conditions, performance was much worse. For the localization experiment, there was no significant effect of the number or configuration of the competing sentences tested. For these experiments, performance in the speech intelligibility experiment was not limited by localization ability.

17.
Sound source segregation refers to the ability to hear as separate entities two or more sound sources comprising a mixture. Masking refers to the ability of one sound to make another sound difficult to hear. In studies, masking is often assumed to result from a failure of segregation, but this assumption may not always be correct. Here, a method is offered for identifying the relation between masking and sound source segregation in such studies, and an example of its application is given.

18.
夏秀渝, 何培宇. 《声学学报》, 2013, 38(2): 224-230
For the underdetermined convolutive mixing model of speech signals, a blind speech extraction algorithm based on source direction information and nonlinear time-frequency masking is proposed. First, time-frequency analysis of the low-frequency portion of the mixed speech is used to estimate instantaneous relative time delays (ITDs), and a potential-function clustering method estimates the number of sources and their ITDs. Next, the target is locked onto in order to extract accurate directional information for the target speech. Finally, exploiting the approximate W-disjoint orthogonality of independent speech signals in the time-frequency domain, the target speech is extracted by nonlinear time-frequency masking. Simulation experiments show that the method can lock onto any target direction of interest and effectively extract the target speech; under the experimental conditions of the paper, the average SNR gain reached 9.5 dB.
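The potential-function clustering step can be read as a kernel-density scan over per-bin ITD estimates: each low-frequency time-frequency unit votes for an ITD, and peaks of the smoothed vote profile give the number of sources and their ITDs. The Python sketch below follows that reading and is not the authors' implementation; the kernel width, frequency range, STFT length, and peak threshold are illustrative assumptions.

```python
import numpy as np
from scipy.signal import stft, find_peaks

def source_itds(left, right, fs, fmax=800.0, sigma=20e-6):
    """Estimate source count and ITDs from low-frequency phase delays."""
    f, _, L = stft(left, fs=fs, nperseg=2048)
    _, _, R = stft(right, fs=fs, nperseg=2048)

    # Per-bin ITD from the interaural phase difference; restricted to
    # low frequencies where the phase is unambiguous.
    band = (f > 50.0) & (f < fmax)
    ipd = np.angle(L[band] * np.conj(R[band]))
    itd = (ipd / (2.0 * np.pi * f[band][:, None])).ravel()

    # "Potential function": a vote histogram over candidate ITDs,
    # smoothed with a Gaussian kernel of width sigma.
    edges = np.linspace(-1e-3, 1e-3, 402)
    hist, _ = np.histogram(itd, bins=edges)
    centers = 0.5 * (edges[:-1] + edges[1:])
    step = centers[1] - centers[0]
    kernel = np.exp(-0.5 * (np.arange(-20, 21) * step / sigma) ** 2)
    pot = np.convolve(hist, kernel, mode="same")

    # Each sufficiently prominent peak is taken as one source.
    peaks, _ = find_peaks(pot, height=0.1 * pot.max())
    return centers[peaks]
```

Once a target ITD has been selected from this list, a nonlinear time-frequency mask of the kind sketched under item 2 above can be applied to extract the corresponding speech.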

19.
The acoustical cues for sound location are generated by spatial- and frequency-dependent filtering of propagating sound waves by the head and external ears. Although rats have been a common model system for anatomy, physiology, and psychophysics of localization, there have been few studies of the acoustical cues available to rats. Here, directional transfer functions (DTFs), the directional components of the head-related transfer functions, were measured in six adult rats. The cues to location were computed from the DTFs. In the frontal hemisphere, spectral notches were present for frequencies from approximately 16 to 30 kHz; in general, the frequency corresponding to the notch increased with increases in source elevation and in azimuth toward the ipsilateral ear. The maximum high-frequency envelope-based interaural time differences (ITDs) were 130 μs, whereas low-frequency (<3.5 kHz) fine-structure ITDs were 160 μs; both types of ITDs were larger than predicted from spherical head models. Interaural level differences (ILDs) strongly depended on location and frequency. Maximum ILDs were <10 dB for frequencies <8 kHz and were as large as 20-40 dB for frequencies >20 kHz. Removal of the pinna eliminated the spectral notches, reduced the acoustic gain and ILDs, altered the acoustical axis, and reduced the ITDs.
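Both interaural cues reported here can be computed from measured impulse responses in a conventional way. The Python sketch below illustrates one such computation, broadband ITD as the lag of the maximum interaural cross-correlation and band ILD as a filtered energy ratio; it is a generic, assumed analysis, not necessarily the authors' exact procedure.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def itd_from_hrirs(h_left, h_right, fs):
    """Broadband ITD in seconds: lag maximizing the interaural
    cross-correlation of the two head-related impulse responses."""
    xcorr = np.correlate(h_left, h_right, mode="full")
    lag = np.argmax(np.abs(xcorr)) - (len(h_right) - 1)
    return lag / fs

def ild_in_band(h_left, h_right, fs, lo, hi):
    """ILD in dB within a frequency band: energy ratio of the
    band-pass filtered left and right impulse responses."""
    sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    e_left = np.sum(sosfiltfilt(sos, h_left) ** 2)
    e_right = np.sum(sosfiltfilt(sos, h_right) ** 2)
    return 10.0 * np.log10(e_left / e_right)
```

Evaluating ild_in_band in bands below and above roughly 8 kHz would show the kind of frequency dependence reported here: small ILDs at low frequencies and much larger ones at high frequencies.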

20.
Two experiments explored how frequency content impacts sound localization for sounds containing reverberant energy. Virtual sound sources from thirteen lateral angles and four distances were simulated in the frontal horizontal plane using binaural room impulse responses measured in an everyday office. Experiment 1 compared localization judgments for one-octave-wide noise centered at either 750 Hz (low) or 6000 Hz (high). For both band-limited noises, perceived lateral angle varied monotonically with source angle. For frontal sources, perceived locations were similar for low- and high-frequency noise; however, for lateral sources, localization was less accurate for low-frequency noise than for high-frequency noise. With increasing source distance, judgments of both noises became more biased toward the median plane, an effect that was greater for low-frequency noise than for high-frequency noise. In Experiment 2, simultaneous presentation of low- and high-frequency noises yielded performance that was less accurate than that for high-frequency noise alone, but equal to or better than that for low-frequency noise alone. Results suggest that listeners perceptually weight low-frequency information heavily, even in reverberant conditions where high-frequency stimuli are localized more accurately. These findings show that listeners do not always optimally adjust how localization cues are integrated over frequency in reverberant settings.
