Similar literature
1.
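Informational masking of Chinese speech under subjective spatial separation. Total citations: 1 (self-citations: 0; citations by others: 1)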
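Based on the fusion phenomenon of the auditory precedence effect, this study investigated informational masking of Chinese speech under subjective (perceived) spatial separation. Target speech and masking sounds were played through two loudspeakers, one on the left and one on the right, and the perceived spatial location of the masker was manipulated by varying the delay between the two loudspeakers. The results show that although the speech signal and the masker were both played through the same loudspeakers, with no actual spatial separation, the subjective spatial separation produced by the precedence effect still improved speech recognition accuracy. The improvement in speech recognition produced by subjective spatial separation was significantly larger under informational masking than under energetic masking. These results provide a psychoacoustic reference for separating the energetic and informational masking of speech signals, and for research and applications in related architectural acoustics and communication technologies.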
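The inter-loudspeaker delay manipulation described above can be sketched in a few lines. This is a minimal illustration, not the authors' stimulus code: the helper name `precedence_pair`, the sampling rate `fs`, and the delay value are assumptions, since the abstract reports none of them.

```python
def precedence_pair(masker, delay_ms, fs):
    """Return (leading, lagging) loudspeaker feeds for one masker.

    Hypothetical helper: both loudspeakers play the same masker, and the
    lagging feed is delayed by delay_ms. With short delays the two copies
    fuse into one image heard near the leading loudspeaker.
    """
    n = round(delay_ms * fs / 1000.0)  # delay in whole samples
    leading = list(masker) + [0.0] * n  # pad so both feeds have equal length
    lagging = [0.0] * n + list(masker)
    return leading, lagging
```

Assigning the leading feed to the left or right loudspeaker shifts where the masker is perceived, while both loudspeakers still emit physically identical sound, which is what removes any real spatial separation.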

2.
Speech reception thresholds (SRTs) were measured for target speech presented concurrently with interfering speech (spoken by a different speaker). In experiment 1, the target and interferer were divided spectrally into high- and low-frequency bands and presented over headphones in three conditions: monaural, dichotic (target and interferer to different ears), and swapped (the low-frequency target band and the high-frequency interferer band were presented to one ear, while the high-frequency target band and the low-frequency interferer band were presented to the other ear). SRTs were highest in the monaural condition and lowest in the dichotic condition; SRTs in the swapped condition were intermediate. In experiment 2, two new conditions were devised such that one target band was presented in isolation to one ear while the other band was presented at the other ear with the interferer. The pattern of SRTs observed in experiment 2 suggests that performance in the swapped condition reflects the intelligibility of the target frequency bands at just one ear; the auditory system appears unable to exploit advantageous target-to-interferer ratios at different ears when segregating target speech from a competing speech interferer.

3.
Across-frequency processing by common interaural time delay (ITD) in spatial unmasking was investigated by measuring speech reception thresholds (SRTs) for high- and low-frequency bands of target speech presented against concurrent speech or a noise masker. Experiment 1 indicated that presenting one of these target bands with an ITD of +500 μs and the other with zero ITD (like the masker) provided some release from masking, but full binaural advantage was only measured when both target bands were given an ITD of +500 μs. Experiment 2 showed that full binaural advantage could also be achieved when the high- and low-frequency bands were presented with ITDs of equal but opposite magnitude (±500 μs). In experiment 3, the masker was also split into high- and low-frequency bands with ITDs of equal but opposite magnitude (±500 μs). The ITD of the low-frequency target band matched that of the high-frequency masking band and vice versa. SRTs indicated that, as long as the target and masker differed in ITD within each frequency band, full binaural advantage could be achieved. These results suggest that the mechanism underlying spatial unmasking exploits differences in ITD independently within each frequency channel.
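Imposing a band-specific ITD, as in experiments 2 and 3, amounts to delaying one ear's copy of each band by a whole number of samples. The sketch below is illustrative only: the helper names, the sign convention (positive ITD delays the right ear), and integer-sample rounding are assumptions; the study may have used fractional delays or a different convention.

```python
def itd_to_samples(itd_us, fs):
    """Convert an interaural time delay in microseconds to whole samples."""
    return round(abs(itd_us) * 1e-6 * fs)


def apply_itd(band, itd_us, fs):
    """Return (left, right) ear signals for one band with the given ITD.

    Assumed sign convention: a positive ITD delays the right ear (left ear
    leads); a negative ITD delays the left ear. Outputs have equal length.
    """
    n = itd_to_samples(itd_us, fs)
    leading = list(band) + [0.0] * n
    lagging = [0.0] * n + list(band)
    return (leading, lagging) if itd_us >= 0 else (lagging, leading)


def mix(*pairs):
    """Sum several equal-length (left, right) pairs sample by sample."""
    left = [sum(s) for s in zip(*(p[0] for p in pairs))]
    right = [sum(s) for s in zip(*(p[1] for p in pairs))]
    return left, right
```

For an experiment-2-style stimulus, one would call `mix(apply_itd(low_band, 500, fs), apply_itd(high_band, -500, fs))`, where `low_band` and `high_band` are hypothetical band-pass-filtered copies of the target of equal length.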

4.
Although many studies have shown that intelligibility improves when a speech signal and an interfering sound source are spatially separated in azimuth, little is known about the effect that spatial separation in distance has on the perception of competing sound sources near the head. In this experiment, head-related transfer functions (HRTFs) were used to process stimuli in order to simulate a target talker and a masking sound located at different distances along the listener's interaural axis. One of the signals was always presented at a distance of 1 m, and the other signal was presented 1 m, 25 cm, or 12 cm from the center of the listener's head. The results show that distance separation has very different effects on speech segregation for different types of maskers. When speech-shaped noise was used as the masker, most of the intelligibility advantages of spatial separation could be accounted for by spectral differences in the target and masking signals at the ear with the higher signal-to-noise ratio (SNR). When a same-sex talker was used as the masker, the intelligibility advantages of spatial separation in distance were dominated by binaural effects that produced the same performance improvements as a 4-5-dB increase in the SNR of a diotic stimulus. These results suggest that distance-dependent changes in the interaural difference cues of nearby sources play a much larger role in the reduction of the informational masking produced by an interfering speech signal than in the reduction of the energetic masking produced by an interfering noise source.

5.
Spatial unmasking of speech has traditionally been studied with target and masker at the same, relatively large distance. The present study investigated spatial unmasking for configurations in which the simulated sources varied in azimuth and could be either near or far from the head. Target sentences and speech-shaped noise maskers were simulated over headphones using head-related transfer functions derived from a spherical-head model. Speech reception thresholds were measured adaptively, varying target level while keeping the masker level constant at the "better" ear. Results demonstrate that small positional changes can result in very large changes in speech intelligibility when sources are near the listener as a result of large changes in the overall level of the stimuli reaching the ears. In addition, the difference in the target-to-masker ratios at the two ears can be substantially larger for nearby sources than for relatively distant sources. Predictions from an existing model of binaural speech intelligibility are in good agreement with results from all conditions comparable to those that have been tested previously. However, small but important deviations between the measured and predicted results are observed for other spatial configurations, suggesting that current theories do not accurately account for speech intelligibility for some of the novel spatial configurations tested.

6.
This study introduces a new test (CRISP-Jr.) for measuring speech intelligibility and spatial release from masking (SRM) in young children ages 2.5-4 years. Study 1 examined whether thresholds, masking, and SRM obtained with a test designed for older children (CRISP) and with CRISP-Jr. are comparable in 4- to 5-year-old children. Thresholds were measured for target speech in front, in quiet, and with a different-sex masker either in front or on the right. CRISP-Jr. yielded higher speech reception thresholds (SRTs) than CRISP, but the amount of masking and SRM did not differ across the tests. In study 2, CRISP-Jr. was extended to a group of 3-year-old children. Results showed that while SRTs were higher in the younger group, there were no age differences in masking and SRM. These findings indicate that children as young as 3 years old are able to use spatial cues in sound source segregation, which suggests that some of the auditory mechanisms that mediate this ability develop early in life. In addition, the findings suggest that measures of SRM in young children are not limited to a particular set of stimuli. These tests have potentially useful applications in clinical settings, where bilateral fittings of amplification devices are evaluated.

7.
An experimental method based on virtual reproduction over headphones is proposed, and a series of experiments is conducted to study forward masking when the masker and the masked signal are spatially separated in azimuth. The masking thresholds are then compared with those obtained when the masker and the masked signal are at the same location. The results show that, although the thresholds for both the 0° and the ±30° sound images increase with the sound pressure level (SPL) of the masker, spatial unmasking is clearly observed, reaching a maximum of 15 dB. This spatial unmasking is mainly attributable to the better-ear contribution. Supported by the National Natural Science Foundation of China (Grant No. 10374031)

8.
Four experiments investigated the effect of the fundamental frequency (F0) contour on speech intelligibility against interfering sounds. Speech reception thresholds (SRTs) were measured for sentences with different manipulations of their F0 contours. These manipulations involved either reductions in F0 variation, or complete inversion of the F0 contour. Against speech-shaped noise, a flattened F0 contour had no significant impact on SRTs compared to a normal F0 contour; the mean SRT for the flattened contour was only 0.4 dB higher. The mean SRT for the inverted contour, however, was 1.3 dB higher than for the normal F0 contour. When the sentences were played against a single-talker interferer, the overall effect was greater, with a 2.0 dB difference between normal and flattened conditions, and 3.8 dB between normal and inverted. There was no effect of altering the F0 contour of the interferer, indicating that any abnormality of the F0 contour serves to reduce intelligibility of the target speech, but does not alter the masking produced by interfering speech. Low-pass filtering the F0 contour increased SRTs; elimination of frequencies between 2 and 4 Hz had the greatest effect. Filtering sentences with inverted contours did not have a significant effect on SRTs.
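One common way to express "reduced F0 variation" and "inversion" with a single parameter is to scale each voiced frame's deviation from the contour's mean. The sketch below assumes that formulation, working in log-frequency around the geometric mean of voiced frames; the abstract does not say how the authors computed their manipulations, and `manipulate_f0` and `alpha` are hypothetical names.

```python
import math


def manipulate_f0(f0_hz, alpha):
    """Scale F0 excursions around the contour's geometric mean.

    alpha = 1 keeps the contour, 0 flattens it to a monotone, and -1
    inverts it (rises become falls). Log-frequency scaling is an
    assumption. Unvoiced frames (F0 <= 0) are passed through unchanged.
    """
    voiced = [f for f in f0_hz if f > 0]
    mean_log = sum(math.log(f) for f in voiced) / len(voiced)
    out = []
    for f in f0_hz:
        if f <= 0:
            out.append(f)  # keep unvoiced marker as-is
        else:
            out.append(math.exp(mean_log + alpha * (math.log(f) - mean_log)))
    return out
```

Values of `alpha` between 0 and 1 give intermediate reductions in F0 variation; the resulting contour would then be imposed on the sentence by a resynthesis method the abstract does not specify.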

9.
This study measured the accuracy with which human listeners can localize spoken words. A broadband (300 Hz-16 kHz) corpus of monosyllabic words was created and presented to listeners using a virtual auditory environment. Localization was examined for 76 locations on a sphere surrounding the listener. Experiment 1 showed that low-pass filtering the speech sounds at 8 kHz degraded performance, causing an increase in polar angle errors associated with the cone of confusion. In experiment 2 it was found that performance in fact varied systematically with the level of the signal above 8 kHz. Although the lower frequencies (below 8 kHz) are known to be sufficient for accurate speech recognition in most situations, these results demonstrate that natural speech contains information between 8 and 16 kHz that is essential for accurate localization.

10.
This study investigates whether the mora is used in controlling timing in Japanese speech, or is instead a structural unit in the language not involved in timing. Unlike most previous studies of mora-timing in Japanese, this article investigates timing in spontaneous speech. Predictability of word duration from number of moras is found to be much weaker than in careful speech. Furthermore, the number of moras predicts word duration only slightly better than number of segments. Syllable structure also has a significant effect on word duration. Finally, comparison of the predictability of whole words and arbitrarily truncated words shows better predictability for truncated words, which would not be possible if the truncated portion were compensating for remaining moras. The results support an accumulative model of variance with a final lengthening effect, and do not indicate the presence of any compensation related to mora-timing. It is suggested that the rhythm of Japanese derives from several factors about the structure of the language, not from durational compensation.

11.
If spatial attention acts like a "spotlight," focusing on one location and excluding others, it may be advantageous to have all targets of interest within the same spatial region. This hypothesis was explored using a task where listeners reported keywords from two simultaneous talkers. In Experiment 1, the two talkers were placed symmetrically about the frontal midline with various angular separations. While there was a small performance improvement for moderate separations, the improvement decreased for larger separations. However, the dependency of the relative talker intensities on spatial configuration accounted for these effects. Experiment 2 tested whether spatial separation improved the intelligibility of each source, an effect that could counteract any degradation in performance as sources fell outside the spatial spotlight of attention. In this experiment, intelligibility of individual sources was equalized across configurations by adding masking noise. Under these conditions, the cost of divided listening (the drop in performance when reporting both messages compared to reporting just one) was smaller when the spatial separation was small. These results suggest that spatial separation enhances the intelligibility of individual sources in a competing pair but increases the cost associated with having to process both sources simultaneously, consistent with the attentional spotlight hypothesis.

12.
Chen Kai, Lu Jing, Xu Boling. Acta Acustica (声学学报), 2006, 31(3): 211-216
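A speech separation algorithm based on speaker-state detection is proposed. The algorithm automatically detects the state of the talkers and controls the adaptive filtering process according to the detected state, using it to estimate the acoustic transfer function of each path and thereby separate the mixed speech signals. Simulation results show that, compared with the conventional separation algorithm in which the output signals serve as each other's reference signals, the proposed algorithm avoids the degradation of adaptive speech separation caused by impure reference signals. Because it does not require the convergence rate of the adaptive filter to be artificially reduced, it also achieves faster convergence and tracking. In addition, the algorithm has a low computational cost and good real-time performance.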
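The core idea of gating adaptation by a detected talker state can be illustrated with a standard normalized LMS (NLMS) filter that estimates one acoustic path and updates its weights only when a state flag allows it. This is a sketch under stated assumptions, not the authors' algorithm: the abstract does not name the adaptive filter type, the detector is replaced here by a precomputed boolean sequence `adapt`, and `nlms_gated`, `mu`, and `eps` are hypothetical names.

```python
def nlms_gated(x, d, taps, mu=0.5, eps=1e-8, adapt=None):
    """Estimate an FIR path from reference x to microphone signal d (NLMS).

    adapt[n] stands in for a speaker-state detector: weights are updated
    only when it is True (e.g., when only the reference talker is active).
    If adapt is None, the filter adapts on every sample.
    Returns the final weights and the residual (error) signal.
    """
    w = [0.0] * taps
    buf = [0.0] * taps  # most recent input samples, newest first
    err = []
    for n in range(len(x)):
        buf = [x[n]] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))  # estimated path output
        e = d[n] - y
        err.append(e)
        if adapt is None or adapt[n]:
            g = mu * e / (eps + sum(b * b for b in buf))  # normalized step
            w = [wi + g * bi for wi, bi in zip(w, buf)]
    return w, err
```

Freezing the weights while the other talker is active is what keeps an impure reference from corrupting the estimate, so the step size `mu` does not have to be kept artificially small.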

13.
Speech transmission index (STI) is an objective measure of the acoustic properties of office environments and is used to specify norms for acceptable acoustic work conditions. Yet the tasks used to evaluate the effects of varying STIs on work performance have often focused on memory (such as memory for visually presented words) and reading, and may not give a complete picture of how severe the effects of even low STI values (i.e., low speech intelligibility) can be. Against this background, we used a more typical office-work task in the present study. The participants were asked to write short essays (5 min per essay) in 5 different STI conditions (0.08, 0.23, 0.34, 0.50, and 0.71). Writing fluency dropped drastically, and the number of pauses longer than 5 s increased, at STI values above 0.23. This study shows that realistic work-related performance drops even at low STI values and has implications for how to evaluate acoustic conditions in school and office environments.
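To make STI values such as 0.23 or 0.71 concrete: STI is built from modulation transfer values m, each mapped to an apparent SNR of 10·log10(m/(1−m)) dB, clipped to ±15 dB, and rescaled to a transmission index between 0 and 1. The sketch below shows only that core mapping and, as a loudly stated simplification, averages the indices without the octave-band weights and redundancy corrections used in the standardized method; the function names are hypothetical.

```python
import math


def transmission_index(m):
    """Map a modulation transfer value m (0 < m < 1) to a transmission index."""
    snr = 10.0 * math.log10(m / (1.0 - m))  # apparent SNR in dB
    snr = max(-15.0, min(15.0, snr))        # clip to the +/-15 dB range
    return (snr + 15.0) / 30.0              # rescale to 0..1


def simple_sti(mtf_values):
    """Unweighted mean transmission index (simplified, not standard STI)."""
    tis = [transmission_index(m) for m in mtf_values]
    return sum(tis) / len(tis)
```

Under this mapping, an STI near 0.23 corresponds to an average apparent SNR of roughly −8 dB, which illustrates why speech is largely unintelligible at the lowest conditions tested.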

14.
Despite a lack of traditional speech features, novel sentences restricted to a narrow spectral slit can retain nearly perfect intelligibility [R. M. Warren et al., Percept. Psychophys. 57, 175-182 (1995)]. The current study employed 514 listeners to elucidate the cues allowing this high intelligibility, and to examine generally the use of narrow-band temporal speech patterns. When 1/3-octave sentences were processed to preserve the overall temporal pattern of amplitude fluctuation, but eliminate contrasting amplitude patterns within the band, sentence intelligibility dropped from values near 100% to values near zero (experiment 1). However, when a 1/3-octave speech band was partitioned to create a contrasting pair of independently amplitude-modulated 1/6-octave patterns, some intelligibility was restored (experiment 2). An additional experiment (3) showed that temporal patterns can also be integrated across wide frequency separations, or across the two ears. Despite the linguistic content of single temporal patterns, open-set intelligibility does not occur. Instead, a contrast between at least two temporal patterns is required for the comprehension of novel sentences and their component words. These contrasting patterns can reside together within a narrow range of frequencies, or they can be integrated across frequencies or ears. This view of speech perception, in which across-frequency changes in energy are seen as systematic changes in the temporal fluctuation patterns at two or more fixed loci, is more in line with the physiological encoding of complex signals.
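The band partitioning in experiment 2 can be made concrete with band-edge arithmetic: a 1/3-octave band centred at fc spans fc·2^(−1/6) to fc·2^(1/6), and splitting it at fc yields two adjacent 1/6-octave bands. This sketch assumes base-2 octaves and a geometric split at the centre frequency; the 1500 Hz value in the usage note is an arbitrary example, not a parameter taken from the abstract.

```python
def third_octave_edges(fc):
    """Lower and upper edges (Hz) of a base-2 1/3-octave band centred at fc."""
    return fc * 2 ** (-1 / 6), fc * 2 ** (1 / 6)


def split_into_sixth_octaves(fc):
    """Split a 1/3-octave band at fc into two adjacent 1/6-octave bands."""
    lo, hi = third_octave_edges(fc)
    return (lo, fc), (fc, hi)
```

For example, `split_into_sixth_octaves(1500.0)` gives bands of about 1336-1500 Hz and 1500-1684 Hz; each half could then be given its own independent amplitude envelope to form the contrasting pair of temporal patterns.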

15.
Shi Qian, Chen Hangting, Zhang Pengyuan. Acta Acustica (声学学报), 2022, 47(1): 139-150
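A speech enhancement algorithm is proposed in which a spatial mixture model is initialized from the direction of arrival (DOA). The DOA of the source is estimated by sound source localization and used to compute the relative transfer function (RTF), from which a spatial covariance matrix is constructed to initialize the spatial mixture model. It is shown that when the RTF is the principal eigenvector of the speech covariance matrix among the model parameters, the probability distribution of the spatial mixture model reaches its maximum, which makes the expectation-maximization (EM) algorithm converge more readily during iteration and yield the desired masks. Experiments were carried out first on a self-built simulated dataset and then on the two-channel data of CHiME-4. The results show that incorporating DOA information into speech enhancement reduces the word error rate of the downstream speech recognition system by up to 3.79% relative to the system without DOA information, and raises the signal-to-distortion ratio by up to 2.00 dB, confirming that enhancement with the DOA-informed spatial mixture model improves performance.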
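For a two-microphone array, the step from DOA to an initial spatial covariance can be sketched under a far-field, free-field assumption: the RTF relative to microphone 1 is [1, exp(−j2πfτ)] with τ = d·cos θ / c, and a regularized rank-one matrix h·hᴴ + εI has h as its principal eigenvector, matching the property described above. The helper names, the speed of sound, the angle convention (θ measured from the array axis, pointing from microphone 2 toward microphone 1), and the regularizer ε are all assumptions, not details from the paper.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def rtf_from_doa(theta_deg, freq_hz, mic_spacing_m, c=SPEED_OF_SOUND):
    """Far-field RTF of a two-mic array at one frequency (mic 1 = reference).

    The wavefront reaches mic 2 later than mic 1 by tau = d*cos(theta)/c.
    """
    tau = mic_spacing_m * math.cos(math.radians(theta_deg)) / c
    return [1.0 + 0j, cmath.exp(-2j * math.pi * freq_hz * tau)]


def init_covariance(h, eps=1e-3):
    """Regularized rank-one spatial covariance h h^H + eps*I."""
    n = len(h)
    return [[h[i] * h[j].conjugate() + (eps if i == j else 0.0)
             for j in range(n)] for i in range(n)]
```

In a full system this would be evaluated at every frequency bin to seed the speech component of the mixture model before EM iterations; the regularizer keeps the initial matrix invertible.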

17.
The contributions of auditory and cognitive factors to age-dependent differences in auditory spatial attention were investigated. In conditions of real spatial separation, the target sentence was presented from a central location and competing sentences were presented from left and right locations. In conditions of simulated spatial separation, different apparent spatial locations of the target and competitors were induced using the precedence effect. The identity of the target was cued by a callsign presented either prior to or following each target sentence, and the probability that the target would be presented at the three locations was specified at the beginning of each block. Younger and older adults with normal hearing sensitivity below 4 kHz completed all 16 conditions (2 spatial separation methods × 2 callsign conditions × 4 probability conditions). Overall, younger adults performed better than older adults. For both age groups, performance improved with target location certainty, with a priori target cueing, and when location differences were real rather than simulated. For both age groups, the contributions of natural spatial cues were most pronounced when the target occurred at "unlikely" spatial listening locations. This suggests that both age groups benefit similarly from richer acoustical cues and a priori information in difficult listening environments.

18.
The effect of perceived spatial differences on masking release was examined using a 4AFC speech detection paradigm. Targets were 20 words produced by a female talker. Maskers were recordings of continuous streams of nonsense sentences spoken by two female talkers and mixed into each of two channels (a two-talker masker, and the same masker time-reversed). Two masker spatial conditions were employed: "RF", with a 4 ms time lead to the loudspeaker 60° to the right in the horizontal plane, and "FR", with the time lead to the front (0°) loudspeaker. The reference nonspatial "F" masker was presented from the front loudspeaker only. Target presentation was always from the front loudspeaker. In Experiment 1, the target detection threshold for both natural and time-reversed spatial maskers was 17-20 dB lower than that for the nonspatial masker, suggesting that significant release from informational masking occurs with spatial speech maskers regardless of masker understandability. In Experiment 2, the effectiveness of the FR and RF maskers was evaluated as the right loudspeaker output was attenuated until the two-source maskers were indistinguishable from the F masker, as measured independently in a discrimination task. Results indicated that spatial release from masking can be observed with barely noticeable target-masker spatial differences.

19.
Four adult bilateral cochlear implant users, with good open-set sentence recognition, were tested with three different sound coding strategies for binaural speech unmasking and their ability to localize 100 and 500 Hz click trains in noise. Two of the strategies tested were envelope-based strategies that are clinically widely used. The third was a research strategy that additionally preserved fine-timing cues at low frequencies. Speech reception thresholds were determined in diotic noise for diotic and interaurally time-delayed speech using direct audio input to a bilateral research processor. Localization in noise was assessed in the free field. Overall results, for both speech and localization tests, were similar with all three strategies. None provided a binaural speech unmasking advantage due to the application of 700 μs interaural time delay to the speech signal, and localization results showed similar response patterns across strategies that were well accounted for by the use of broadband interaural level cues. The data from both experiments combined indicate that, in contrast to normal hearing, timing cues available from natural head-width delays do not offer binaural advantages with present methods of electrical stimulation, even when fine-timing cues are explicitly coded.
