
This paper examines how short-term energy fluctuations in a masker affect the thresholds for tones at frequencies above those of the masker. Two equally intense tones at 1060 and 1075 Hz produce up to 25 dB less masking than does a 1075-Hz tone set to the overall level of the two-tone complex. At wider frequency separations, two-tone complexes also produce less masking than the pure tone. These results indicate that envelope fluctuations in a masker whose spectrum is confined to a single critical band may result in release from masking. The release from masking is probably related to the comodulation masking release reported by Hall et al. [J. Acoust. Soc. Am. 76, 50-56 (1984b)] for modulated-noise maskers with bandwidths greater than one critical band. Further measurements with maskers whose intensity level in the critical band around 1 kHz was 90 dB SPL show similar masking by a pure tone and a 625- to 1075-Hz bandpass noise, but less masking by narrow-band noises. These results are inconsistent with a simple frequency-selective energy-detector model and indicate that the auditory system can use periods of low masker energy as brief as a few ms to enhance detection of a tone. The results also imply that the upward spread of excitation is best represented by masking patterns for noises with bandwidths of several critical bands.

6.

Comodulation masking release for speech stimuli.

J H Grose J W Hall 《The Journal of the Acoustical Society of America》1992,91(2):1042-1050

This study sought to determine whether speech recognition in a modulating noise background can be facilitated by a process attributable to comodulation masking release (CMR). Experiment 1 examined the masked identification of six filtered vowels as a function of the number of comodulated noisebands present. A benefit of increasing the number of bands was observed, consistent with an interpretation in terms of CMR, although it could not be established that the discrimination was based on word recognition in the semantic sense. Experiment 2 used a forced-choice rhyming test in which the response foils differed only in a single filtered consonant; again, the measure of interest was performance as a function of the number of comodulated noisebands present. No evidence for a suprathreshold CMR was observed. Experiment 3 used open-set sentence material and a different paradigm, which allowed CMR to be measured as the difference between thresholds in correlated and uncorrelated noise. While a CMR for speech detection was observed, no CMR for speech recognition was found. It was concluded that CMR is most evident in masked detection tasks and that diminishing returns are encountered as the signal-to-masker ratio is raised.

7.

A study of spatial release from masking for native Chinese listeners listening to English

CHEN Yan QIU Xiaojun 《声学学报》(Acta Acustica) 2011,36(2):231-238

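Psychoacoustic experiments were used to study spatial release from masking of English speech for native Chinese listeners, with two types of interferer presented from different directions at different signal-to-noise ratios (SNRs). Loudspeakers placed at specified positions in an anechoic chamber presented the target and interfering sounds, and listeners' identification of the target yielded the percentage of correct recognition. The results showed that with only the target speech played from directly in front, accuracy exceeded 99%; with both the target and interfering speech directly in front of the listener, accuracy was 57%; and with the target and interfering speech randomly located at ±60°, accuracy was 96%. In particular, with the target speech and the interferer both directly in front, accuracy fell from 96% to 34% as the SNR decreased from 0 dB to -12 dB when the interferer was noise; when the interferer was speech, accuracy over the same SNR range first declined, then rose markedly by 27% on average, and then declined again. With the noise and speech interferers located at 60°, accuracy fell from 99% to 80% and from 98% to 91%, respectively, as the SNR decreased from -4 dB to -16 dB. The study shows that spatial separation gives native Chinese listeners a clear gain in English speech intelligibility and that, in most cases, English recognition accuracy decreases as the SNR decreases, consistent with related studies of listeners with other native languages.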

8.

Predicting binaural gain in intelligibility and release from masking for speech

H Levitt L R Rabiner 《The Journal of the Acoustical Society of America》1967,42(4):820-829


9.

Combining energetic and informational masking for speech identification

Kidd G Mason CR Gallun FJ 《The Journal of the Acoustical Society of America》2005,118(2):982-992

This study examined combinations of energetic and informational maskers in speech identification. Speech targets and maskers (speech or noise) were processed and filtered into sets of 15 narrow frequency bands. The target was the sum of eight randomly selected bands. More masking occurred for speech maskers than for spectrally matched noise maskers regardless of whether the masker bands overlapped the target bands. The greater effect of the speech maskers was interpreted as due to informational masking. When the masker was composed of nonoverlapping bands of speech, the addition of bands of noise overlapping the speech masker, but not the speech target, reduced the overall amount of masking. Surprisingly, presenting the noise to the ear contralateral to the target and masker produced an even greater release from masking. The contralateral noise was apparently sufficient to cause a slight change in the image of the ipsilateral speech masker, possibly pulling it away from the target enough to allow the focus of attention on the target. This finding is consistent with the interpretation that in some conditions small binaural differences may be sufficient to cause, or significantly strengthen, the perceptual segregation of sounds.

10.

Selection of meaningless steady noise for masking of speech

Tetsuro Saeki Takahiro Tamesue Shizuma Yamaguchi Kazuya Sunada 《Applied Acoustics》2004,65(2):203-210

This paper focuses on masking speech with meaningless steady noise as a way of realizing a comfortable sound environment. As a basis for research, meaningless steady noise at minimum sound pressure levels for masking of male or female meaningful speech is considered, based on psychological experiments using a method of adjustment. From the results, band-limited pink noise can be selected as the most effective noise for masking of speech. In the case of speech with a lower sound pressure level, the sound pressure level of the meaningless steady noise needs to be a little higher.

11.

Variability and uncertainty in masking by competing speech

Freyman RL Helfer KS Balakrishnan U 《The Journal of the Acoustical Society of America》2007,121(2):1040-1046

This study investigated the role of uncertainty in masking of speech by interfering speech. Target stimuli were nonsense sentences recorded by a female talker. Masking sentences were recorded from ten female talkers and combined into pairs. Listeners' recognition performance was measured with both target and masker presented from a front loudspeaker (nonspatial condition) or with a masker presented from two loudspeakers, with the right leading the front by 4 ms (spatial condition). In Experiment 1, the sentences were presented in blocks in which the masking talkers, spatial configuration, and signal-to-noise (S-N) ratio were fixed. Listeners' recognition performance varied widely among the masking talkers in the nonspatial condition, much less so in the spatial condition. This result was attributed to variation in effectiveness of informational masking in the nonspatial condition. The second experiment increased uncertainty by randomizing masking talkers and S-N ratios across trials in some conditions, and reduced uncertainty by presenting the same token of masker across trials in other conditions. These variations in masker uncertainty had relatively small effects on speech recognition.

12.

A blind speech separation method based on nonlinear time-frequency masking

XU Shun CHEN Shaorong LIU Yulin 《声学学报》(Acta Acustica) 2007,32(4):375-381

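For the underdetermined convolutive mixture model of speech signals, a blind separation method based on nonlinear time-frequency masking is proposed that exploits the approximate W-disjoint orthogonality (W-DO) of independent speech signals in the time-frequency domain. First, the multi-microphone observations are normalized in the time-frequency domain so that the representation of the mixture in each time-frequency slot is independent of frequency; a dynamic clustering algorithm then obtains the active-source information for each time-frequency slot, and a nonlinear function of the deflection angle from the cluster center is used for time-frequency masking, achieving blind separation of the speech signals. The method resolves the frequency permutation problem of classic frequency-domain blind separation algorithms and effectively suppresses the spatial direction diffusion of the separation matrix. Simulations show better separation performance than the BLUES method, with the SNR gain increased by 1.58 dB on average.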

13.

Blind speech source separation via nonlinear time-frequency masking

XU Shun CHEN Shaorong LIU Yulin 《声学学报：英文版》(Chinese Journal of Acoustics) 2008,27(3):203-214

For the underdetermined convolutive mixture model, a blind speech source separation method based on nonlinear time-frequency masking was proposed that exploits the approximate W-disjoint orthogonality (W-DO) of independent speech signals in the time-frequency domain. In this method, the mixture signals observed by multiple microphones are first normalized in the time-frequency domain so that they are independent of frequency; a dynamic clustering algorithm is then used to obtain the active-source information in each time-frequency slot; a nonlinear function of the deflection angle from the cluster center is selected for time-frequency masking; and finally, the mixed speech signals are separated by the inverse STFT (short-time Fourier transform). The method not only solves the frequency permutation problem encountered in most classic frequency-domain blind separation techniques but also suppresses the spatial direction diffusion of the separation matrix. Simulation results show that the proposed method outperforms the typical BLUES method, increasing the signal-to-noise ratio gain (SNRG) by 1.58 dB on average.
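The masking pipeline summarized above (frequency-normalized features, clustering, a nonlinear mask per cluster) can be illustrated with a minimal Python sketch. It is not the authors' implementation and makes several assumptions: two microphones only; the inter-microphone phase difference divided by frequency (a relative delay) as the frequency-independent feature; plain 1-D k-means substituted for the dynamic clustering step; and a Gaussian of the distance to each cluster center as a hypothetical nonlinearity in place of the paper's deflection-angle function. The names `delay_feature`, `kmeans_1d`, and `soft_masks` are illustrative, and the STFT analysis and inverse-STFT resynthesis are omitted.

```python
import cmath
import math

def delay_feature(x1, x2, freq_hz):
    """Frequency-normalized feature for one TF bin of a two-microphone mixture.

    If mic 2 receives the dominant source tau seconds later, x2 ~ x1*exp(-j*2*pi*f*tau),
    so the phase difference divided by 2*pi*f recovers tau (valid while |2*pi*f*tau| < pi).
    """
    return -cmath.phase(x2 / x1) / (2 * math.pi * freq_hz)

def kmeans_1d(values, centers, iters=20):
    """Plain 1-D k-means; a stand-in for the paper's dynamic clustering step."""
    centers = list(centers)
    for _ in range(iters):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers

def soft_masks(features, centers, width):
    """Hypothetical nonlinearity: Gaussian of the distance to each cluster center,
    normalized so that the masks for one bin sum to 1."""
    masks = []
    for tau in features:
        exponents = [-((tau - c) / width) ** 2 for c in centers]
        top = max(exponents)  # subtract the max so the largest weight is exp(0) = 1
        w = [math.exp(e - top) for e in exponents]
        total = sum(w)
        masks.append([wi / total for wi in w])
    return masks
```

Under the W-DO assumption each bin is dominated by one source, so its mask row is close to one-hot; multiplying a bin's mixture value by the k-th entry gives the estimate of source k in that bin. The delay feature aliases once 2πfτ exceeds π, which limits the usable microphone spacing at high frequencies.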

14.

Spatial release from energetic and informational masking in a divided speech identification task

Ihlefeld A Shinn-Cunningham B 《The Journal of the Acoustical Society of America》2008,123(6):4380-4392

When listening selectively to one talker in a two-talker environment, performance generally improves with spatial separation of the sources. The current study explores the role of spatial separation in divided listening, when listeners reported both of two simultaneous messages processed to have little spectral overlap (limiting "energetic masking" between the messages). One message was presented at a fixed level, while the other message level varied from equal to 40 dB less than that of the fixed-level message. Results demonstrate that spatial separation of the competing messages improved divided-listening performance. Most errors occurred because listeners failed to report the content of the less-intense talker. Moreover, performance generally improved as the broadband energy ratio of the variable-level to the fixed-level talker increased. The error patterns suggest that spatial separation improves the intelligibility of the less-intense talker by improving the ability to (1) hear portions of the signal that would otherwise be masked, (2) segregate the two talkers properly into separate perceptual streams, and (3) selectively focus attention on the less-intense talker. Spatial configuration did not noticeably affect the ability to report the more-intense talker, suggesting that it was processed differently than the less-intense talker, which was actively attended.

15.

Spatial release from energetic and informational masking in a selective speech identification task

Ihlefeld A Shinn-Cunningham B 《The Journal of the Acoustical Society of America》2008,123(6):4369-4379

A masker can reduce target intelligibility either by interfering with the target's peripheral representation ("energetic masking") or by causing more central interference ("informational masking"). Intelligibility generally improves with increasing spatial separation between two sources, an effect known as spatial release from masking (SRM). Here, SRM was measured using two concurrent sine-vocoded talkers. Target and masker were each composed of eight different narrow bands of speech (with little spectral overlap). The broadband target-to-masker energy ratio (TMR) was varied, and response errors were used to assess the relative importance of energetic and informational masking. Performance improved with increasing TMR. SRM occurred at all TMRs; however, the pattern of errors suggests that spatial separation affected performance differently, depending on the dominant type of masking. Detailed error analysis suggests that informational masking occurred due to failures in either across-time linkage of target segments (streaming) or top-down selection of the target. Specifically, differences in the spatial cues in target and masker improved streaming and target selection. In contrast, level differences helped listeners select the target, but had little influence on streaming. These results demonstrate that at least two mechanisms (differentially affected by spatial and level cues) influence informational masking.

16.

A frequency bin-wise nonlinear masking algorithm in convolutive mixtures for speech segregation

Chi TS Huang CW Chou WS 《The Journal of the Acoustical Society of America》2012,131(5):EL361-EL367

A frequency bin-wise nonlinear masking algorithm is proposed in the spectrogram domain for speech segregation in convolutive mixtures. The contributive weight of each speech source to a time-frequency unit of the mixture spectrogram is estimated by a nonlinear function based on location cues. For each sound source, a non-binary mask is formed from the estimated weights and multiplied with the mixture spectrogram to extract that source. Head-related transfer functions (HRTFs) are used to simulate convolutive sound mixtures as perceived by listeners. Simulation results show that the proposed method outperforms convolutive independent component analysis and the degenerate unmixing and estimation technique (DUET) in almost all test conditions.
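The bin-wise weighting and masking steps can likewise be sketched in Python, under loudly stated assumptions that go beyond the abstract: the location cue is taken to be the interaural level difference (ILD) of each time-frequency unit; each source's expected ILD in every frequency bin is supplied as a hypothetical template (for example, precomputed from HRTFs); and a softmax over the negative cue distances stands in for the paper's unspecified nonlinear function. `source_weights` and `extract_source` are illustrative names, not an existing API.

```python
import math

def source_weights(ild_db, templates_db, sharpness=1.0):
    """Soft weights for every time-frequency unit.

    ild_db[t][f]       observed interaural level difference (dB) in unit (t, f)
    templates_db[k][f] hypothetical expected ILD (dB) of source k in bin f
    Returns weights[t][f][k], which sum to 1 over the sources k.
    """
    weights = []
    for frame in ild_db:
        row = []
        for f, observed in enumerate(frame):
            # Closer cue -> higher score; the template is looked up per frequency bin.
            scores = [-sharpness * abs(observed - tmpl[f]) for tmpl in templates_db]
            top = max(scores)  # subtract the max for numerical stability
            exps = [math.exp(s - top) for s in scores]
            total = sum(exps)
            row.append([e / total for e in exps])
        weights.append(row)
    return weights

def extract_source(mixture, weights, k):
    """Apply source k's non-binary mask to a complex mixture spectrogram mixture[t][f]."""
    return [
        [value * weights[t][f][k] for f, value in enumerate(frame)]
        for t, frame in enumerate(mixture)
    ]
```

Because the templates are indexed by frequency bin, the same observed ILD can favor different sources in different bins, which is one plausible reading of "bin-wise". Raising `sharpness` pushes the soft mask toward a binary one.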

17.

Release from speech-on-speech masking by adding a delayed masker at a different location

Rakerd B Aaronson NL Hartmann WM 《The Journal of the Acoustical Society of America》2006,119(3):1597-1605

The amount of masking exerted by one speech sound on another can be reduced by presenting the masker twice, from two different locations in the horizontal plane, with one of the presentations delayed to simulate an acoustical reflection. Three experiments were conducted on various aspects of this phenomenon. Experiment 1 varied the number of masking talkers from one to three and the signal-to-noise (S/N) ratio from -12 to +4 dB. Evidence of masking release was found for every combination of these variables tested. For the most difficult conditions (multiple maskers and negative S/N) the amount of release was approximately 10 dB. Experiment 2 varied the timing of leading and lagging masker presentations over a broad range, to include shorter delay times where room reflections of speech are rarely noticed by listeners and longer delays where reflections can become disruptive. Substantial masking release was found for all of the shorter delay times tested, and negligible release was found at the longer delays. Finally, Experiment 3 used speech-spectrum noise as a masker and searched for possible energetic masking release as a function of the lead-lag time delay. Release of up to 4 dB was found whenever delays were 2 ms or less. No energetic masking release was found at longer delays.

18.

Informational masking of Chinese speech under perceived spatial separation

WU Yanhong LI Wenrui CHEN Jing WANG Chun QU Hongwei WU Xihong LI Liang 《声学学报》(Acta Acustica) 2005,30(5):462-467

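Based on the fusion phenomenon of the auditory precedence effect, informational masking of Chinese speech under perceived spatial separation was investigated. Two loudspeakers, on the left and right, presented the target speech and the masker, and the perceived spatial location of the masker was manipulated by varying the delay between the two loudspeakers. The results showed that although the speech and the masker were played from the same loudspeakers, with no physical spatial separation, the perceived spatial separation produced by the precedence effect improved speech recognition accuracy. The improvement due to perceived spatial separation was significantly greater under informational masking than under energetic masking. These results provide a psychoacoustic reference for separating energetic from informational masking of speech, and for related research and applications in architectural acoustics and communication technology.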

19.

Effect of forward and backward masking on speech intelligibility

D D Dirks D Bower 《The Journal of the Acoustical Society of America》1970,47(4):1003-1008


20.

Informational masking of speech in children: auditory-visual integration

Wightman F Kistler D Brungart D 《The Journal of the Acoustical Society of America》2006,119(6):3940-3949

The focus of this study was the release from informational masking that could be obtained in a speech task by viewing a video of the target talker. A closed-set speech recognition paradigm was used to measure informational masking in 23 children (ages 6-16 years) and 10 adults. An audio-only condition required attention to a monaural target speech message that was presented to the same ear with a time-synchronized distracter message. In an audiovisual condition, a synchronized video of the target talker was also presented to assess the release from informational masking that could be achieved by speechreading. Children required higher target/distracter ratios than adults to reach comparable performance levels in the audio-only condition, reflecting a greater extent of informational masking in these listeners. There was a monotonic age effect, such that even the children in the oldest age group (12-16.9 years) demonstrated performance somewhat poorer than adults. Older children and adults improved significantly in the audiovisual condition, producing a release from informational masking of 15 dB or more in some adult listeners. Audiovisual presentation produced no informational masking release for the youngest children. Across all ages, the benefit of a synchronized video was strongly associated with speechreading ability.