Similar literature
1.
Speech intelligibility and localization in a multi-source environment.   Total citations: 1 (self-citations: 0, citations by others: 1)
Natural environments typically contain sound sources other than the source of interest that may interfere with the ability of listeners to extract information about the primary source. Studies of speech intelligibility and localization by normal-hearing listeners in the presence of competing speech are reported on in this work. One, two or three competing sentences [IEEE Trans. Audio Electroacoust. 17(3), 225-246 (1969)] were presented from various locations in the horizontal plane in several spatial configurations relative to a target sentence. Target and competing sentences were spoken by the same male talker and at the same level. All experiments were conducted both in an actual sound field and in a virtual sound field. In the virtual sound field, both binaural and monaural conditions were tested. In the speech intelligibility experiment, there were significant improvements in performance when the target and competing sentences were spatially separated. Performance was similar in the actual sound-field and virtual sound-field binaural listening conditions for speech intelligibility. Although most of these improvements are evident monaurally when using the better ear, binaural listening was necessary for large improvements in some situations. In the localization experiment, target source identification was measured in a seven-alternative absolute identification paradigm with the same competing sentence configurations as for the speech study. Performance in the localization experiment was significantly better in the actual sound-field than in the virtual sound-field binaural listening conditions. Under binaural conditions, localization performance was very good, even in the presence of three competing sentences. Under monaural conditions, performance was much worse. For the localization experiment, there was no significant effect of the number or configuration of the competing sentences tested. For these experiments, the performance in the speech intelligibility experiment was not limited by localization ability.

2.
Auditory filter bandwidths and time constants were obtained with five normal-hearing subjects for different masker configurations both in the frequency and time domain for monaural and binaural listening conditions. Specifically, the masking level in the monaural condition and the interaural correlation in the binaural conditions, respectively, were changed in a sinusoidal, stepwise, and rectangular way in the frequency domain. In the corresponding experiments in the time domain, a sinusoidal and stepwise change of the masker was performed. From these results, a comparison was made across conditions to evaluate the influence of the factors "shape of transition," "monaural versus binaural," "frequency domain versus time domain," and "subject." Also, the respective data from the literature were considered using the same model assumptions and fitting strategy as used for the current data. The results indicate that the monaural auditory filter bandwidths and time constants fitted to the data are consistent across conditions both for the data included in this study and the data from the literature. No consistent relation between individual auditory filter bandwidths and time constants was found across subjects. For the binaural conditions, however, considerable differences were found in estimates of the bandwidths and time constants, respectively, across conditions. The reason for this mismatch seems to be the different detection strategies employed for the various tasks that are affected by the consistency of binaural information across frequency and time. While monaural detection performance appears to be modeled quite well with a linear filter or temporal integration window, this does not hold for the binaural conditions where both larger bandwidth and time constant estimates are found.

3.
The effect of onset interaural time differences (ITDs) on lateralization and detection was investigated for broadband pulse trains 250 ms long with a binaural fundamental frequency of 250 Hz. Within each train, ITDs of successive binaural pulse pairs alternated between two of three values (0 μs, 500 μs left-leading, and 500 μs right-leading) or were invariant. For the alternating conditions, the experimental manipulation was the choice of which of two ITDs was presented first (i.e., at stimulus onset). Lateralization, which was estimated using a broadband noise pointer with a listener-adjustable interaural delay, was determined largely by the onset ITD. However, detection thresholds for the signals in left-leading or diotic continuous broadband noise were not affected by where the signals were lateralized. A quantitative analysis suggested that binaural masked thresholds for the pulse trains were well accounted for by the level and phase of harmonic components at 500 and 750 Hz. Detection thresholds obtained for brief stimuli (two binaural pulse or noise burst pairs) were also independent of which of two ITDs was presented first. The control of lateralization by onset cues appears to be based on mechanisms not essential for binaural detection.

4.
A computational model of auditory localization that performs similarly to human listeners is reported. The model incorporates both the monaural and binaural cues available to a human for sound localization. Essential elements used in the simulation of the processes of auditory cue generation and encoding by the nervous system include measured head-related transfer functions (HRTFs), minimum audible field (MAF), and the Patterson-Holdsworth cochlear model. A two-layer feed-forward back-propagation artificial neural network (ANN) was trained to transform the localization cues to a two-dimensional map that gives the direction of the sound source. The model results were compared with (i) the localization performance of the human listener who provided the HRTFs for the model and (ii) the localization performance of a group of 19 other human listeners. The localization accuracy and front-back confusion error rates exhibited by the model were similar to both the single listener and the group results. This suggests that the simulation of the cue generation and extraction processes as well as the model parameters were reasonable approximations to the overall biological processes. The amplitude resolution of the monaural spectral cues was varied and the influence on the model's performance was determined. The model with 128 cochlear channels required an amplitude resolution of approximately 20 discrete levels for encoding the spectral cue to deliver similar localization performance to the group of human listeners.

5.
Temporal processing in the aging auditory system.   Total citations: 2 (self-citations: 0, citations by others: 2)
Measures of monaural temporal processing and binaural sensitivity were obtained from 12 young (mean age = 26.1 years) and 12 elderly (mean age = 70.9 years) adults with clinically normal hearing (pure-tone thresholds ≤ 20 dB HL from 250 to 6000 Hz). Monaural temporal processing was measured by gap detection thresholds. Binaural sensitivity was measured by interaural time difference (ITD) thresholds. Gap and ITD thresholds were obtained at three sound levels (4, 8, or 16 dB above individual threshold). Subjects were also tested on two measures of speech perception, a masking level difference (MLD) task, and a syllable identification/discrimination task that included phonemes varying in voice onset time (VOT). Elderly listeners displayed poorer monaural temporal analysis (higher gap detection thresholds) and poorer binaural processing (higher ITD thresholds) at all sound levels. There were significant interactions between age and sound level, indicating that the age difference was larger at lower stimulus levels. Gap detection performance was found to correlate significantly with performance on the ITD task for young, but not elderly adult listeners. Elderly listeners also performed more poorly than younger listeners on both speech measures; however, there was no significant correlation between psychoacoustic and speech measures of temporal processing. Findings suggest that age-related factors other than peripheral hearing loss contribute to temporal processing deficits of elderly listeners.

6.
Yang Luhui, Yang Rui, Zhang Liujun, Zhuang Qiao. Acta Acustica (《声学学报》), 2023, 48(2): 406-414
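To study the relationship between the ears of constant-frequency (CF) bats and spatial localization, a deep learning algorithm and a bat-inspired static binaural receiver are used to analyze how bat ears affect direction finding for CF sound sources. First, bionic binaural receivers with different inter-ear angles and spacings are designed from an ear model of Pratt's roundleaf bat (Hipposideros pratti), and CF sonar signals of different frequencies emitted by the source are recorded from multiple spatial directions. Next, time-frequency maps of the synchronously recorded binaural signals are extracted and normalized to serve as input features. Finally, a residual network is used to estimate the source direction. The experimental results show that the mean direction-finding error of the static binaural receiver for CF sources stays largely below 3.5°, although it is higher than that of a dynamic monaural receiver. Direction-finding accuracy depends on the source frequency and on the spatial direction of the source, and is relatively high when the source lies within ±30° of the receiver's horizontal direction. The inter-ear angle and spacing also affect accuracy, with the angle having the more pronounced effect.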

7.
This article presents a quantitative binaural signal detection model which extends the monaural model described by Dau et al. [J. Acoust. Soc. Am. 99, 3615-3622 (1996)]. The model is divided into three stages. The first stage comprises peripheral preprocessing in the right and left monaural channels. The second stage is a binaural processor which produces a time-dependent internal representation of the binaurally presented stimuli. This stage is based on the Jeffress delay line extended with tapped attenuator lines. Through this extension, the internal representation codes both interaural time and intensity differences. In contrast to most present-day models, which are based on excitatory-excitatory interaction, the binaural interaction in the present model is based on contralateral inhibition of ipsilateral signals. The last stage, a central processor, extracts a decision variable that can be used to detect the presence of a signal in a detection task, but could also derive information about the position and the compactness of a sound source. In two accompanying articles, the model predictions are compared with data obtained with human observers in a great variety of experimental conditions.
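
The delay-and-attenuate interaction described in this abstract can be illustrated with a minimal Python sketch. It is not the published model: the function names `ei_activity` and `best_element`, the integer-sample delays, and the plain gain factors standing in for the tapped attenuator lines are all assumptions made for illustration. Each (delay, gain) element subtracts the shifted, scaled right-ear signal from the left-ear signal and reports the residual energy; the element with the smallest residual indicates the interaural time and level difference of the stimulus.

```python
import math

def ei_activity(left, right, delay, gain):
    """Residual energy of one hypothetical inhibitory element.

    The right-ear signal is shifted by `delay` samples and scaled by `gain`
    before it is subtracted from the left-ear signal.
    """
    total = 0.0
    for n in range(len(left)):
        m = n + delay
        if 0 <= m < len(right):
            diff = left[n] - gain * right[m]
            total += diff * diff
    return total

def best_element(left, right, delays, gains):
    """Return the (delay, gain) pair with the minimum residual energy."""
    return min(
        ((d, g) for d in delays for g in gains),
        key=lambda dg: ei_activity(left, right, dg[0], dg[1]),
    )

# Toy stimulus (assumed values): a 500 Hz tone sampled at 16 kHz.
# The right-ear copy lags the left by 4 samples and has half the amplitude.
fs = 16000
tone = [math.sin(2 * math.pi * 500 * n / fs) for n in range(400)]
left = tone[4:]
right = [0.5 * x for x in tone[:-4]]

delay, gain = best_element(left, right, range(-8, 9), [0.5, 1.0, 2.0])
print(delay, gain)
```

The search grid is kept shorter than one period of the tone (32 samples) so that only one delay cancels the signal exactly.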

8.
In the first experiment, subjects were asked to discriminate whether a sound was emanating from a moving or stationary source. The minimum audible movement angle (MAMA) thus defined was observed to increase as the source velocity increased. MAMA ranged from a low of 8.3 degrees with the slowest velocity employed (90 degrees/s) to a high of 21.2 degrees with the fastest velocity (360 degrees/s). In the second experiment, subjects were asked to localize where the moving source was at signal onset and offset. The results indicate that the apparent onset is displaced in the direction of motion and the amount of this displacement is directly related to source velocity. Less consistent results were observed with signal offset. The present results suggest that the binaural system is relatively insensitive to motion.
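
The two reported endpoints imply how long the source must travel before its motion is detected. A minimal sketch of that arithmetic, assuming the source moves at constant velocity across the MAMA (the function name `traverse_time_ms` is hypothetical):

```python
def traverse_time_ms(mama_deg, velocity_deg_per_s):
    """Time in milliseconds a source needs to sweep through the MAMA."""
    return 1000.0 * mama_deg / velocity_deg_per_s

# Endpoints quoted in the abstract: (MAMA in degrees, velocity in degrees/s)
for mama, velocity in ((8.3, 90.0), (21.2, 360.0)):
    print(f"{velocity:.0f} deg/s: {traverse_time_ms(mama, velocity):.1f} ms")
```

Under this assumption the threshold angle is crossed in roughly 92 ms at 90 degrees/s and 59 ms at 360 degrees/s, so the larger angle at high velocity corresponds to a shorter, not longer, travel time.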

9.
Ambisonics is a family of spatial sound reproduction systems based on spatial harmonic decomposition and successive-order approximation of the sound field. Ambisonics signals were originally intended for loudspeaker reproduction. By using head-related transfer function (HRTF) filters, binaural Ambisonics converts the Ambisonics signals for static or dynamic headphone reproduction. In the present work, the performance of static and dynamic binaural Ambisonics reproduction is evaluated and compared. The mean binaural pressure errors across target source directions are first analyzed. A virtual source localization experiment is then conducted, and localization performance is evaluated by analyzing the percentages of front-back and up-down confusion, the mean angle error, and the dispersion of the localization results. The results indicate that binaural Ambisonics reproduction of insufficiently high order (for example, orders 5 to 10) is unable to recreate correct high-frequency magnitude spectra in the binaural pressures, which degrades localization in static reproduction. Because dynamic localization cues are included, dynamic binaural Ambisonics reproduction yields clearly better localization performance than static reproduction of the same order. Even third-order dynamic binaural Ambisonics reproduction exhibits adequate localization performance.

10.
Echolocating dolphins emit trains of clicks and receive echoes from ocean targets. They often emit each successive ranging click about 20 ms after arrival of the target echo. In echolocation, decisions must be made about the target--fish or fowl, predator or food. In the first test of dolphin auditory decision speed, three bottlenose dolphins (Tursiops truncatus) chose whistle or pulse burst responses to different auditory stimuli randomly presented without warning in rapid succession under computer control. The animals were trained to hold pressure catheters in the nasal cavity so that pressure increases required for sound production could be used to split response time (RT) into neural time and movement time. Mean RT in the youngest and fastest dolphin ranged from 175 to 213 ms when responding to tones and from 213 to 275 ms when responding to pulse trains. The fastest neural times and movement times were around 60 ms. The results suggest that echolocating dolphins tune to a rhythm so that succeeding pulses in a train are produced about 20 ms over target round-trip travel time. The dolphin nervous system has evolved for rapid processing of acoustic stimuli to accommodate the more rapid sound speed in water compared to air.
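
The click rhythm described here (round-trip travel time plus about 20 ms) can be written as a one-line calculation. This is a minimal sketch, not taken from the paper: the nominal seawater sound speed of 1500 m/s and the function name `inter_click_interval` are assumptions.

```python
SOUND_SPEED_WATER = 1500.0  # m/s, nominal seawater value (assumption)
LAG_AFTER_ECHO = 0.020      # s, the ~20 ms lag reported in the abstract

def inter_click_interval(target_range_m):
    """Predicted interval between successive clicks, in seconds."""
    round_trip = 2.0 * target_range_m / SOUND_SPEED_WATER
    return round_trip + LAG_AFTER_ECHO

# Hypothetical target ranges in metres
for target_range in (15.0, 75.0):
    print(target_range, inter_click_interval(target_range))
```

For a target 15 m away the round trip takes 20 ms, so the predicted interval between clicks is about 40 ms.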

11.
12.
The ability of eight normal-hearing listeners and fourteen listeners with sensorineural hearing loss to detect and identify pitch contours was measured for binaural-pitch stimuli and salience-matched monaurally detectable pitches. In an effort to determine whether impaired binaural pitch perception was linked to a specific deficit, the auditory profiles of the individual listeners were characterized using measures of loudness perception, cognitive ability, binaural processing, temporal fine structure processing, and frequency selectivity, in addition to common audiometric measures. Two of the listeners were found not to perceive binaural pitch at all, despite a clear detection of monaural pitch. While both binaural and monaural pitches were detectable by all other listeners, identification scores were significantly lower for binaural than for monaural pitch. A total absence of binaural pitch sensation coexisted with a loss of a binaural signal-detection advantage in noise, without implying reduced cognitive function. Auditory filter bandwidths did not correlate with the difference in pitch identification scores between binaural and monaural pitches. However, subjects with impaired binaural pitch perception showed deficits in temporal fine structure processing. Whether the observed deficits stemmed from peripheral or central mechanisms could not be resolved here, but the present findings may be useful for hearing loss characterization.

13.
Two simple models are examined in order to explain the observation that a portion of the binaural-evoked response is less than the sum of monaural-evoked responses in human and animal subjects. The sum of monaural responses minus the binaural response is called the binaural difference (BD). Each model acts on binaural input signals and applies a single memoryless nonlinearity. One model (IE) applies a rectifying nonlinearity to the difference of input signals, while the other (EE) applies a compressive nonlinearity to the sum of input signals. These models are suggested by properties of inhibitory-excitatory (IE) and excitatory-excitatory (EE) neurons of the auditory brainstem. Parameters can be found that enable each model to produce a ratio of BD to summed monaural response which is invariant with input stimulus level. The IE model, but not the EE model, has a BD whose level is linearly related to input stimulus level.
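
The two model classes can be sketched in a few lines of Python to show how the binaural difference behaves. The specific forms below are assumptions chosen for illustration, not the parameterization used in the paper: the IE model is taken as a mirrored pair of half-wave rectifiers with inhibition weight `k`, and the EE model as power-law compression with exponent `p`.

```python
def ie_response(left, right, k=0.4):
    """Mirrored IE units: half-wave rectified (excitation - k * inhibition)."""
    return max(0.0, left - k * right) + max(0.0, right - k * left)

def ee_response(left, right, p=0.5):
    """EE unit: power-law compression applied to the summed inputs."""
    return (left + right) ** p

def binaural_difference(model, level):
    """Return (BD, summed monaural response) for equal-level inputs."""
    monaural_sum = model(level, 0.0) + model(0.0, level)
    return monaural_sum - model(level, level), monaural_sum

for name, model in (("IE", ie_response), ("EE", ee_response)):
    for level in (1.0, 4.0):
        bd, monaural_sum = binaural_difference(model, level)
        print(name, level, round(bd, 4), round(bd / monaural_sum, 4))
```

With these assumed forms the ratio of BD to summed monaural response is the same at both levels for each model, while quadrupling the input quadruples the BD only in the IE case; in the EE case the BD grows as the level raised to the power `p`.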

14.
A scheme for analyzing timbre in spatial sound with a binaural auditory model is proposed, and Ambisonics is taken as an example for analysis. Ambisonics is a spatial sound system based on physical sound field reconstruction. The errors and timbre coloration in the final reconstructed sound field depend on the spatial aliasing errors in both the recording and reproduction stages of Ambisonics. The binaural loudness level spectra in Ambisonics reconstruction are calculated using Moore's revised loudness model and then compared with those of a real sound source, so as to evaluate the timbre coloration in Ambisonics quantitatively. The results indicate that, in the case of ideal independent signals, the high-frequency limit and the radius of the region without perceived timbre coloration increase with the order of Ambisonics. On the other hand, in the case of recording by microphone array, once the high-frequency limit of the microphone array exceeds that of sound field reconstruction, array recording has little influence on the binaural loudness level spectra, and thus on timbre, in the final reconstruction up to the high-frequency limit of reproduction. Based on the binaural auditory model analysis, a scheme for optimizing the design of Ambisonics recording and reproduction is also suggested. The subjective experiment yields results consistent with those of the binaural model, thus verifying the effectiveness of the model analysis.
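
The trend that both the frequency limit and the usable region grow with order can be illustrated with the widely used rule of thumb that an order-N system reconstructs the physical sound field accurately while kr ≤ N, where k is the wavenumber and r the region radius. This rule is an assumption brought in for illustration; the abstract's own limits come from the binaural loudness model, so the numbers below are not the paper's results. The function names and the speed of sound are also assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C (assumption)

def upper_frequency_hz(order, radius_m):
    """Frequency at which k * r equals the Ambisonics order."""
    return order * SPEED_OF_SOUND / (2.0 * math.pi * radius_m)

def region_radius_m(order, frequency_hz):
    """Radius at which k * r equals the Ambisonics order."""
    return order * SPEED_OF_SOUND / (2.0 * math.pi * frequency_hz)

# Hypothetical region radius of 0.0875 m, a typical head-radius value
for order in (1, 3, 5):
    print(order, round(upper_frequency_hz(order, 0.0875)))
```

Under this rule both quantities scale linearly with the order, which matches the direction of the trend reported in the abstract.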

15.
Liu Yang, Xie Bosun. Acta Acustica (《声学学报》), 2015, 40(5): 717-729
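A general method for analyzing timbre in spatial sound using a binaural auditory model is proposed, and Ambisonics is analyzed as an example. Ambisonics is a spatial sound system based on physical sound field reconstruction; the error in its final reconstructed sound field, and the resulting change in timbre, are caused jointly by spatial aliasing errors in microphone pickup and in reproduction. The binaural loudness level spectra of the Ambisonics-reconstructed sound field are computed with Moore's revised binaural loudness model and compared with those of the target sound field, so as to evaluate the timbre change of the reconstructed field quantitatively. The results show that, with ideal pickup signals, the upper frequency limit and the size of the region for reproduction without timbre change increase with the Ambisonics order. For pickup with a microphone array, as long as the upper frequency limit of the array exceeds that of Ambisonics reproduction, the effect of the array's spatial aliasing error on the final reconstructed sound field and its perceived timbre is negligible below the upper frequency limit of reproduction. On this basis, an optimized design method for Ambisonics systems that jointly considers pickup and reproduction performance is proposed. A psychoacoustic experiment yields results consistent with the binaural auditory model, which also validates the model analysis.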

16.
Restarting the adapted binaural system   Total citations: 1 (self-citations: 0, citations by others: 1)
Previous experiments using trains of high-frequency filtered clicks have shown that for lateralization based on interaural difference of time or level, there is a decline in the usefulness of interaural information after the signal's onset when the clicks are presented at a high rate. This process has been referred to as "binaural adaptation." Of interest here are the conditions that produce a recovery from adaptation and allow for a resampling of the interaural information. A train of clicks with short interclick intervals is used to produce adaptation. Then, during its course, a treatment such as the insertion of a temporal gap or the addition of another "triggering" sound is tested for its ability to restart the binaural process. All of the brief triggers tested are shown to be capable of promoting recovery from adaptation. This suggests that, while the binaural system deals with the demands of high-frequency stimulation with rapid adaptation, it quickly cancels the adaptation in response to stimulus change.

17.
Five bilateral cochlear implant users were tested for their localization abilities and speech understanding in noise, for both monaural and binaural listening conditions. They also participated in lateralization tasks to assess the impact of variations in interaural time delays (ITDs) and interaural level differences (ILDs) for electrical pulse trains under direct computer control. The localization task used pink noise bursts presented from an eight-loudspeaker array spanning an arc of approximately 108 degrees in front of the listeners at ear level (0-degree elevation). Subjects showed large benefits from bilateral device use compared to either side alone. Typical root-mean-square (rms) averaged errors across all eight loudspeakers in the array were about 10 degrees for bilateral device use and ranged from 20 degrees to 60 degrees using either ear alone. Speech reception thresholds (SRTs) were measured for sentences presented from directly in front of the listeners (0 degrees) in spectrally matching speech-weighted noise at either 0 degrees, +90 degrees or -90 degrees for the four of the five subjects tested who could perform the task. For noise to either side, bilateral device use showed a substantial benefit over unilateral device use when noise was ipsilateral to the unilateral device. This was primarily because of monaural head-shadow effects, which resulted in robust SRT improvements (P<0.001) of about 4 to 5 dB when ipsilateral and contralateral noise positions were compared. The additional benefit of using both ears compared to the shadowed ear (i.e., binaural unmasking) was only 1 or 2 dB and less robust (P = 0.04). Results from the lateralization studies showed consistently good sensitivity to ILDs; better than the smallest level adjustment available in the implants (0.17 dB) for some subjects. Sensitivity to ITDs was moderate on the other hand, typically of the order of 100 μs. ITD sensitivity deteriorated rapidly when stimulation rates for unmodulated pulse-trains increased above a few hundred Hz but at 800 pps showed sensitivity comparable to 50-pps pulse-trains when a 50-Hz modulation was applied. In our opinion, these results clearly demonstrate that important benefits are available from bilateral implantation, both for localizing sounds (in quiet) and for listening in noise when signal and noise sources are spatially separated. The data do indicate, however, that effects of interaural timing cues are weaker than those from interaural level cues and according to our psychophysical findings rely on the availability of low-rate information below a few hundred Hz.
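
The rms localization error quoted in this abstract can be computed as sketched below. The even, symmetric loudspeaker spacing, the function names, and the response values are assumptions made for illustration; the abstract gives only the number of loudspeakers and the approximate arc.

```python
import math

def speaker_angles(count=8, span_deg=108.0):
    """Azimuths of loudspeakers spaced evenly and symmetrically about 0 degrees."""
    step = span_deg / (count - 1)
    return [-span_deg / 2 + i * step for i in range(count)]

def rms_error(target_deg, response_deg):
    """Root-mean-square difference between response and target azimuths."""
    squared = [(r - t) ** 2 for t, r in zip(target_deg, response_deg)]
    return math.sqrt(sum(squared) / len(squared))

targets = speaker_angles()
# Hypothetical listener responses, each off by one loudspeaker position
responses = targets[1:] + targets[-1:]
print(round(rms_error(targets, responses), 2))
```

With this assumed geometry adjacent loudspeakers are about 15.4 degrees apart, so an rms error near 10 degrees corresponds to responses that are typically within one loudspeaker position of the target.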

18.
Spatial release from masking (SRM) was measured in groups of children with bilateral cochlear implants (BiCIs, average ages 6.0 and 7.9 yr) and with normal hearing (NH, average ages 5.0 and 7.8 yr). Speech reception thresholds (SRTs) were measured for target speech in front (0°), and interferers in front, distributed asymmetrically toward the right (+90°/+90°) or distributed symmetrically toward the right and left (+90°/-90°). In the asymmetrical condition both monaural "better ear" and binaural cues are available. In the symmetrical condition, listeners rely heavily on binaural cues to segregate sources. SRM was computed as the difference between SRTs in the front condition and SRTs in either the asymmetrical or symmetrical conditions. Results showed that asymmetrical SRM was smaller in BiCI users than in NH children. Furthermore, NH children showed symmetrical SRM, suggesting they are able to use binaural cues for source segregation, whereas children with BiCIs had minimal or absent symmetrical SRM. These findings suggest that children who receive BiCIs can segregate speech from noise under conditions that maximize monaural better ear cues. Limitations in the CI devices likely play an important role in limiting SRM. Thus, improvement in spatial hearing abilities in children with BiCIs may require binaural processing strategies.
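
The SRM definition given in this abstract is a simple difference of thresholds. A minimal sketch follows; the function name and the SRT values are hypothetical and are not data from the study.

```python
def spatial_release(srt_front_db, srt_separated_db):
    """SRM in dB: positive when spatial separation lowers (improves) the SRT."""
    return srt_front_db - srt_separated_db

# Hypothetical SRTs in dB for one listener
srt = {"front": -2.0, "asymmetrical": -7.0, "symmetrical": -4.5}
for condition in ("asymmetrical", "symmetrical"):
    print(condition, spatial_release(srt["front"], srt[condition]))
```

Because a lower SRT means better performance, subtracting the separated-condition SRT from the front-condition SRT gives a positive SRM when separation helps.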

19.
In everyday listening, both background noise and reverberation degrade the speech signal. Psychoacoustic evidence suggests that human speech perception under reverberant conditions relies mostly on monaural processing. While speech segregation based on periodicity has achieved considerable progress in handling additive noise, little research in monaural segregation has been devoted to reverberant scenarios. Reverberation smears the harmonic structure of speech signals, and our evaluations using a pitch-based segregation algorithm show that an increase in the room reverberation time causes degraded performance due to weakened periodicity in the target signal. We propose a two-stage monaural separation system that combines the inverse filtering of the room impulse response corresponding to target location and a pitch-based speech segregation method. As a result of the first stage, the harmonicity of a signal arriving from target direction is partially restored while signals arriving from other directions are further smeared, and this leads to improved segregation. A systematic evaluation of the system shows that the proposed system results in considerable signal-to-noise ratio gains across different conditions. Potential applications of this system include robust automatic speech recognition and hearing aid design.

20.
Extraction of a target sound source amidst multiple interfering sound sources is difficult when there are fewer sensors than sources, as is the case for human listeners in the classic cocktail-party situation. This study compares the signal extraction performance of five algorithms using recordings of speech sources made with three different two-microphone arrays in three rooms of varying reverberation time. Test signals, consisting of two to five speech sources, were constructed for each room and array. The signals were processed with each algorithm, and the signal extraction performance was quantified by calculating the signal-to-noise ratio of the output. A frequency-domain minimum-variance distortionless-response beamformer outperformed the time-domain based Frost beamformer and generalized sidelobe canceler for all tests with two or more interfering sound sources, and performed comparably to or better than the time-domain algorithms for tests with one interfering sound source. The frequency-domain minimum-variance algorithm offered performance comparable to that of the Peissig-Kollmeier binaural frequency-domain algorithm, but with much less distortion of the target signal. Comparisons were also made to a simple beamformer. In addition, computer simulations illustrate that, when processing speech signals, the chosen implementation of the frequency-domain minimum-variance technique adapts more quickly and accurately than time-domain techniques.
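
For a single frequency bin and two microphones, the textbook minimum-variance distortionless-response weights are w = R⁻¹d / (dᴴR⁻¹d), where R is the noise covariance matrix and d the steering vector toward the target. The sketch below applies that standard formula in plain Python; it is not the implementation evaluated in the study, and the function names, covariance values and steering vectors are assumptions chosen for illustration.

```python
def mvdr_weights_2mic(R, d):
    """MVDR weights for one frequency bin of a two-microphone array.

    R is a 2x2 Hermitian covariance matrix (nested lists of complex numbers);
    d is the length-2 steering vector toward the target.
    """
    (a, b), (c, e) = R
    det = a * e - b * c
    # Closed-form 2x2 inverse applied to d
    r_inv_d = [(e * d[0] - b * d[1]) / det, (a * d[1] - c * d[0]) / det]
    denom = d[0].conjugate() * r_inv_d[0] + d[1].conjugate() * r_inv_d[1]
    return [r_inv_d[0] / denom, r_inv_d[1] / denom]

def apply_weights(w, x):
    """Beamformer output y = w^H x for one frequency bin."""
    return w[0].conjugate() * x[0] + w[1].conjugate() * x[1]

# Assumed scene: target at broadside, one interferer with a 90-degree
# inter-microphone phase lag, plus weak uncorrelated sensor noise.
target = [1 + 0j, 1 + 0j]
interferer = [1 + 0j, -1j]
R = [[1.01 + 0j, 1j], [-1j, 1.01 + 0j]]  # interferer outer product + 0.01 * I

w = mvdr_weights_2mic(R, target)
print(abs(apply_weights(w, target)), abs(apply_weights(w, interferer)))
```

The first printed value stays at unity gain toward the target (the distortionless constraint), while the second shows the interferer strongly attenuated.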
