1.

Intelligibility enhancement for noisy whispered speech using asymmetric cost function

ZHOU Jian; ZHENG Wenming; WANG Qingyun; ZHAO Li 《Chinese Journal of Acoustics》2014,(3):312-322

We propose two whispered speech enhancement methods based on asymmetric cost functions to handle the amplification and attenuation distortions of whispered speech separately. The modified Itakura-Saito (MIS) distance function penalizes speech amplification distortion more heavily, whereas the Kullback-Leibler (KL) divergence function penalizes speech attenuation distortion more heavily. The experimental results show that the MIS-based method significantly improves intelligibility over conventional speech enhancement algorithms when the signal-to-noise ratio (SNR) falls below -6 dB, whereas the KL-based method achieves results similar to those of the minimum mean square error (MMSE) speech enhancement method. The results show that amplification and attenuation distortions affect the intelligibility of the enhanced whisper differently: larger attenuation distortion may result in better intelligibility of speech at low SNR, whereas attenuation distortion has little effect on the intelligibility of speech at high SNR.
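To make the idea of an asymmetric cost concrete, here is a minimal sketch in Python. The function exp(v) - v - 1 is only an illustrative stand-in chosen for this listing. The abstract does not give the MIS or KL definitions, so nothing below should be read as the authors' formulas. The names `asymmetric_cost` and `squared_error` are hypothetical.

```python
import math


def asymmetric_cost(estimate: float, truth: float) -> float:
    """Illustrative asymmetric cost exp(v) - v - 1, with v = estimate - truth.

    Assumption: this is a generic example of an asymmetric penalty,
    not the MIS or KL cost function used in the paper.
    """
    v = estimate - truth
    return math.exp(v) - v - 1.0


def squared_error(estimate: float, truth: float) -> float:
    """Symmetric reference cost: over- and underestimation cost the same."""
    return (estimate - truth) ** 2


truth = 1.0
# Overestimating the spectrum corresponds to amplification distortion.
cost_amplified = asymmetric_cost(truth + 1.0, truth)
# Underestimating the spectrum corresponds to attenuation distortion.
cost_attenuated = asymmetric_cost(truth - 1.0, truth)

print(f"amplification cost: {cost_amplified:.3f}")
print(f"attenuation cost:   {cost_attenuated:.3f}")
```

With this particular choice, an overestimate is penalized more than an underestimate of the same size, while the squared error treats both equally. Swapping the sign of `v` would give a cost that instead favors amplification over attenuation.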

2.

Improvement of intelligibility of ideal binary-masked noisy speech by adding background noise

Cao S; Li L; Wu X 《The Journal of the Acoustical Society of America》2011,129(4):2227-2236

When a target-speech/masker mixture is processed with the signal-separation technique, ideal binary mask (IBM), intelligibility of target speech is remarkably improved in both normal-hearing listeners and hearing-impaired listeners. Intelligibility of speech can also be improved by filling in speech gaps with un-modulated broadband noise. This study investigated whether intelligibility of target speech in the IBM-treated target-speech/masker mixture can be further improved by adding a broadband-noise background. The results of this study show that following the IBM manipulation, which remarkably released target speech from speech-spectrum noise, foreign-speech, or native-speech masking (experiment 1), adding a broadband-noise background with the signal-to-noise ratio no less than 4 dB significantly improved intelligibility of target speech when the masker was either noise (experiment 2) or speech (experiment 3). The results suggest that since adding the noise background shallows the areas of silence in the time-frequency domain of the IBM-treated target-speech/masker mixture, the abruptness of transient changes in the mixture is smoothed and the perceived continuity of target-speech components becomes enhanced, leading to improved target-speech intelligibility. The findings are useful for advancing computational auditory scene analysis, hearing-aid/cochlear-implant designs, and understanding of speech perception under "cocktail-party" conditions.
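Adding a noise background at a stated signal-to-noise ratio amounts to scaling the noise so that 10·log10(P_signal / P_noise) equals the desired value. The sketch below shows that arithmetic with NumPy. The function name `add_noise_at_snr`, the sine-wave stand-in for speech, and the 16 kHz sample count are assumptions for illustration, not details taken from the study.

```python
import numpy as np


def add_noise_at_snr(signal, noise, snr_db):
    """Scale `noise` so the mixture has the requested SNR in dB.

    Hypothetical helper: the study's stimuli and processing chain
    are not described at this level in the abstract.
    """
    signal = np.asarray(signal, dtype=float)
    noise = np.asarray(noise, dtype=float)
    signal_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    # Choose gain so that signal_power / (gain^2 * noise_power) = 10^(snr_db / 10).
    gain = np.sqrt(signal_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    scaled_noise = gain * noise
    return signal + scaled_noise, scaled_noise


rng = np.random.default_rng(0)
num_samples = 16000
# A 220 Hz tone stands in for processed speech (assumption).
speech_like = np.sin(2 * np.pi * 220 * np.arange(num_samples) / 16000)
background = rng.standard_normal(num_samples)

mixture, scaled_noise = add_noise_at_snr(speech_like, background, snr_db=4.0)
achieved_snr_db = 10 * np.log10(np.mean(speech_like ** 2) / np.mean(scaled_noise ** 2))
print(f"achieved SNR: {achieved_snr_db:.2f} dB")
```

The 4 dB value mirrors the lower bound quoted in the abstract. Any other target SNR can be passed in the same way.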

3.

Improvements in intelligibility of noisy reverberant speech using a binaural subband adaptive noise-cancellation processing scheme.

P W Shields; D R Campbell 《The Journal of the Acoustical Society of America》2001,110(6):3232-3242

This article reports on the performance of an adaptive subband noise cancellation scheme, which performs binaural preprocessing of speech signals for a hearing-aid application. The multi-microphone subband adaptive (MMSBA) signal processing scheme uses the least mean squares (LMS) algorithm in frequency-limited subbands. The use of subbands enables a diverse processing mechanism to be employed, splitting the two-channel wide-band signal into smaller frequency-limited subbands, which can be processed according to their individual signal characteristics. The frequency delimiting used a linear- or cochlear-spaced subband distribution. The effect of the processing scheme on speech intelligibility was assessed in a trial involving 15 hearing-impaired volunteers with moderate sensorineural hearing loss. The acoustic material consisted of speech and speech-shaped noise signals, generated using simulated and real-room acoustic environments, at signal-to-noise ratios (SNRs) in the range -6 to +3 dB. The results show that the MMSBA scheme delivered average speech intelligibility improvements of 11.5%, with a maximum of 37.25%, in noisy reverberant conditions. There was no significant reduction in mean speech intelligibility due to processing, in any of the test conditions.
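The LMS update at the core of such a scheme can be sketched in a few lines. The example below is a plain full-band, single-reference noise canceller, not the subband multi-microphone MMSBA scheme itself. The tap count, step size, synthetic noise path, and the name `lms_noise_canceller` are all assumptions made for illustration.

```python
import numpy as np


def lms_noise_canceller(primary, reference, num_taps=8, step_size=0.005):
    """Full-band LMS noise canceller (illustrative, not the MMSBA scheme).

    `primary` holds target plus filtered noise; `reference` holds the noise alone.
    """
    primary = np.asarray(primary, dtype=float)
    reference = np.asarray(reference, dtype=float)
    weights = np.zeros(num_taps)
    output = np.zeros(len(primary))
    for n in range(num_taps - 1, len(primary)):
        # Most recent reference samples, newest first.
        taps = reference[n - num_taps + 1 : n + 1][::-1]
        noise_estimate = weights @ taps
        # The error is the cleaned output and also drives the adaptation.
        error = primary[n] - noise_estimate
        weights = weights + 2.0 * step_size * error * taps
        output[n] = error
    return output, weights


rng = np.random.default_rng(0)
num_samples = 5000
# Slow sine as a stand-in for the target signal (assumption).
target = 0.5 * np.sin(2 * np.pi * 0.01 * np.arange(num_samples))
reference_noise = rng.standard_normal(num_samples)
noise_path = np.array([0.6, -0.3, 0.1])  # hypothetical acoustic path
noise_at_primary = np.convolve(reference_noise, noise_path)[:num_samples]
primary = target + noise_at_primary

cleaned, weights = lms_noise_canceller(primary, reference_noise)

# Compare residual noise power before and after, once the filter has adapted.
tail = slice(num_samples // 2, None)
noise_power_before = np.mean(noise_at_primary[tail] ** 2)
noise_power_after = np.mean((cleaned[tail] - target[tail]) ** 2)
print(f"noise power before: {noise_power_before:.3f}, after: {noise_power_after:.3f}")
```

A subband variant would run one such filter per frequency band, which is what lets each band use settings matched to its own signal characteristics.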

4.

Intelligibility of temporally interrupted speech

G L Powers; C Speaks 《The Journal of the Acoustical Society of America》1973,54(3):661-667

5.

Pitch-based monaural segregation of reverberant speech

Roman N; Wang D 《The Journal of the Acoustical Society of America》2006,120(1):458-469

In everyday listening, both background noise and reverberation degrade the speech signal. Psychoacoustic evidence suggests that human speech perception under reverberant conditions relies mostly on monaural processing. While speech segregation based on periodicity has achieved considerable progress in handling additive noise, little research in monaural segregation has been devoted to reverberant scenarios. Reverberation smears the harmonic structure of speech signals, and our evaluations using a pitch-based segregation algorithm show that an increase in the room reverberation time causes degraded performance due to weakened periodicity in the target signal. We propose a two-stage monaural separation system that combines the inverse filtering of the room impulse response corresponding to target location and a pitch-based speech segregation method. As a result of the first stage, the harmonicity of a signal arriving from target direction is partially restored while signals arriving from other directions are further smeared, and this leads to improved segregation. A systematic evaluation of the system shows that the proposed system results in considerable signal-to-noise ratio gains across different conditions. Potential applications of this system include robust automatic speech recognition and hearing aid design.

6.

Effect of spectral resolution on the intelligibility of ideal binary masked speech

Li N; Loizou PC 《The Journal of the Acoustical Society of America》2008,123(4):EL59-EL64

Most binary-mask studies assume a fine time-frequency representation of the signal that may not be available in some applications (e.g., cochlear implants). This study assesses the effect of spectral resolution on intelligibility of ideal binary-masked speech. In Experiment 1, speech corrupted in noise at -5 to 5 dB signal-to-noise ratio (SNR) was filtered into 6-32 channels and synthesized using the ideal binary mask. Results with normal-hearing listeners indicated substantial improvements in intelligibility with 24-32 channels, particularly at -5 dB SNR. Results from Experiment 2 indicated that having access to the ideal binary mask in the F1/F2 region is sufficient for good performance.

7.

Time-compressed speech intelligibility in different reverberant conditions

Jędrzej Kociński; Dawid Niemiec 《Applied Acoustics》2016

8.

Elevated measurement of traffic noise above an ideal reverberant city

S.E. Froseth; R.F. Lambert 《Journal of Sound and Vibration》1977,50(3):353-368

Aerial noise measurement methods may be well suited to the determination of spatially-averaged traffic noise exposure levels, and could possibly be used as a means of assessing the long-term effectiveness of motor vehicle noise regulations. In this study two theoretical models are developed for some specific aerial measurement situations. Several characteristics of the models are examined. Limited experimental measurements agree well with theoretically predicted results; elevated measured noise levels are nearly proportional to the density of the traffic (in vehicles per unit area) on the city streets.

9.

Statistical analysis of the autoregressive modeling of reverberant speech

Gaubitch ND; Ward DB; Naylor PA 《The Journal of the Acoustical Society of America》2006,120(6):4031-4039

Hands-free speech input is required in many modern telecommunication applications that employ autoregressive (AR) techniques such as linear predictive coding. When the hands-free input is obtained in enclosed reverberant spaces such as typical office rooms, the speech signal is distorted by the room transfer function. This paper utilizes theoretical results from statistical room acoustics to analyze the AR modeling of speech under these reverberant conditions. Three cases are considered: (i) AR coefficients calculated from a single observation; (ii) AR coefficients calculated jointly from an M-channel observation (M > 1); and (iii) AR coefficients calculated from the output of a delay-and-sum beamformer. The statistical analysis, with supporting simulations, shows that the spatial expectation of the AR coefficients for cases (i) and (ii) are approximately equal to those from the original speech, while for case (iii) there is a discrepancy due to spatial correlation between the microphones which can be significant. It is subsequently demonstrated that at each individual source-microphone position (without spatial expectation), the M-channel AR coefficients from case (ii) provide the best approximation to the clean speech coefficients when microphones are closely spaced (<0.3 m).
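For readers unfamiliar with AR modeling, the single-observation case (i) can be illustrated by estimating AR coefficients from one signal with the autocorrelation (Yule-Walker) method. This is a minimal sketch on a synthetic AR(2) process. The abstract does not state which estimator the paper uses, so the method, the name `ar_coefficients`, and the test coefficients are assumptions.

```python
import numpy as np


def ar_coefficients(signal, order):
    """Estimate AR coefficients a[1..order] with the autocorrelation method.

    Model assumed here: x[n] = sum_k a[k] * x[n-k] + e[n].
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Biased autocorrelation estimates r[0..order].
    r = np.array([np.dot(x[: n - k], x[k:]) / n for k in range(order + 1)])
    # Solve the Toeplitz normal equations R a = r[1:].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])


rng = np.random.default_rng(1)
true_coefficients = np.array([1.3, -0.6])  # stable AR(2), chosen for illustration
num_samples = 20000
excitation = rng.standard_normal(num_samples)
samples = np.zeros(num_samples)
for n in range(2, num_samples):
    samples[n] = (
        true_coefficients[0] * samples[n - 1]
        + true_coefficients[1] * samples[n - 2]
        + excitation[n]
    )

estimated = ar_coefficients(samples, order=2)
print("estimated AR coefficients:", np.round(estimated, 3))
```

In a reverberant setting the input would be the microphone signal rather than the clean process, which is where the distortion analyzed in the paper enters.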

10.

Binaural prediction of speech intelligibility in reverberant rooms with multiple noise sources

Lavandier M; Jelfs S; Culling JF; Watkins AJ; Raimond AP; Makin SJ 《The Journal of the Acoustical Society of America》2012,131(1):218-231

When speech is in competition with interfering sources in rooms, monaural indicators of intelligibility fail to take account of the listener's abilities to separate target speech from interfering sounds using the binaural system. In order to incorporate these segregation abilities and their susceptibility to reverberation, Lavandier and Culling [J. Acoust. Soc. Am. 127, 387-399 (2010)] proposed a model which combines effects of better-ear listening and binaural unmasking. A computationally efficient version of this model is evaluated here under more realistic conditions that include head shadow, multiple stationary noise sources, and real-room acoustics. Three experiments are presented in which speech reception thresholds were measured in the presence of one to three interferers using real-room listening over headphones, simulated by convolving anechoic stimuli with binaural room impulse-responses measured with dummy-head transducers in five rooms. Without fitting any parameter of the model, there was close correspondence between measured and predicted differences in threshold across all tested conditions. The model's components of better-ear listening and binaural unmasking were validated both in isolation and in combination. The computational efficiency of this prediction method allows the generation of complex "intelligibility maps" from room designs.

11.

Intelligibility of speech under nonexponential decay conditions.

B Yegnanarayana; B S Ramakrishna 《The Journal of the Acoustical Society of America》1975,58(4):853-857

12.

Release from masking for speech

P B Weston; J D Miller; I J Hirsh 《The Journal of the Acoustical Society of America》1965,38(6):1053-1054

13.

Isolating the energetic component of speech-on-speech masking with ideal time-frequency segregation

Brungart DS; Chang PS; Simpson BD; Wang D 《The Journal of the Acoustical Society of America》2006,120(6):4007-4018

When a target speech signal is obscured by an interfering speech waveform, comprehension of the target message depends both on the successful detection of the energy from the target speech waveform and on the successful extraction and recognition of the spectro-temporal energy pattern of the target out of a background of acoustically similar masker sounds. This study attempted to isolate the effects that energetic masking, defined as the loss of detectable target information due to the spectral overlap of the target and masking signals, has on multitalker speech perception. This was achieved through the use of ideal time-frequency binary masks that retained those spectro-temporal regions of the acoustic mixture that were dominated by the target speech but eliminated those regions that were dominated by the interfering speech. The results suggest that energetic masking plays a relatively small role in the overall masking that occurs when speech is masked by interfering speech but a much more significant role when speech is masked by interfering noise.
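The ideal binary mask described here can be sketched as a comparison of target and masker power in each time-frequency unit. The example below works on small hand-written power arrays rather than real spectrograms. The 0 dB local criterion, the name `ideal_binary_mask`, and the assumption that target and masker powers simply add in the mixture are illustrative choices, not details from the study.

```python
import numpy as np


def ideal_binary_mask(target_power, masker_power, local_criterion_db=0.0):
    """Return 1 where the target dominates a time-frequency unit, else 0.

    Inputs are per-unit powers of the premixed target and masker,
    which is what makes the mask "ideal".
    """
    target_power = np.asarray(target_power, dtype=float)
    masker_power = np.asarray(masker_power, dtype=float)
    tiny = np.finfo(float).tiny  # avoids division by zero and log of zero
    local_snr_db = 10.0 * np.log10((target_power + tiny) / (masker_power + tiny))
    return (local_snr_db > local_criterion_db).astype(float)


# Rows are frequency channels, columns are time frames (toy values).
target = np.array([[4.0, 0.1], [0.2, 9.0]])
masker = np.array([[1.0, 1.0], [1.0, 1.0]])

mask = ideal_binary_mask(target, masker)
# Assumption: powers of uncorrelated target and masker add in the mixture.
masked_mixture = mask * (target + masker)
print(mask)
print(masked_mixture)
```

Units dominated by the masker are zeroed, so what remains of the mixture carries little energetic interference from the competing talker.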

14.

Nanophotonic reservoir computing for noisy speech recognition

M. R. Salehi; L. Dehyadegari 《Optical and Quantum Electronics》2016,48(5):281

15.

Comodulation masking release for speech stimuli.

J H Grose; J W Hall 《The Journal of the Acoustical Society of America》1992,91(2):1042-1050

This study sought to determine whether speech recognition in a modulating noise background can be facilitated by a process attributable to comodulation masking release (CMR). Experiment 1 examined the masked identification of six filtered vowels as a function of the number of comodulated noisebands present. A benefit of increased number was observed, consistent with an interpretation in terms of CMR, although it could not be certain that the basis of the discrimination was word recognition in the semantic sense. Experiment 2 made use of a forced-choice rhyming test in which the response foils differed only in a single filtered consonant; again, the measure of interest was performance as a function of the number of comodulated noisebands present. No evidence for a suprathreshold CMR was observed. Experiment 3 made use of open-set sentence material and employed a different paradigm, which allowed a measure of CMR in terms of the difference between thresholds in correlated and uncorrelated noise to be determined. While a CMR for speech detection was observed, no CMR for speech recognition was found. It was concluded that CMR is most evident in masked detection tasks and that diminishing returns are encountered as the signal-to-masker ratio is raised.

16.

A study on the synchronization of multichannel noisy speech signals

FANG Yuan 《Acta Acustica》2001,26(4):324-328

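A method for synchronizing two signal channels is proposed. Compared with simply balancing the signal delays, it cancels the influence of the room impulse response more effectively in rooms with a reverberation time of appreciable length. Experimental results show that the method still performs well at low signal-to-noise ratios; summing the synchronized signals comes close to the maximum attainable improvement in signal-to-noise ratio.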

17.

A generalization algorithm from binary time-frequency masking to ratio masking based on noise tracking

LIANG Shan; LIU Wenju; JIANG Wei 《Acta Acustica》2013,38(5):632-637

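Although ratio masking separates speech better than binary masking, the ideal ratio mask is difficult to estimate directly, so existing speech separation systems usually take estimation of the ideal binary mask as their computational goal. We propose an algorithm that generalizes a binary mask to a ratio mask. Because the key to ratio mask estimation is tracking the noise energy, we first use an exponential distribution to describe the conditional distribution of the noise energy, with the mixture energy and the binary mask as observations. Second, a Gaussian Markov conditional random field is used to describe the correlation of the noise estimates across several consecutive frames. Finally, Markov chain Monte Carlo is used to compute the minimum mean square error estimate of the noise energy, from which the ratio mask is then computed. Experiments show that, compared with conventional algorithms based on binary mask estimation, the proposed algorithm yields significant improvements in both signal-to-noise ratio gain and objective perceptual quality.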
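The last step, going from a noise energy estimate to a ratio mask, can be sketched as follows. The abstract does not give the paper's exact mapping, so the subtraction-style formula 1 - noise/mixture, the clipping to [0, 1], and the name `ratio_mask_from_noise` are assumptions. The noise values are written by hand instead of coming from the paper's probabilistic tracker.

```python
import numpy as np


def ratio_mask_from_noise(mixture_energy, noise_energy):
    """Soft mask in [0, 1] from mixture and estimated noise energy.

    Assumption: mask = 1 - noise / mixture, a common simple mapping,
    not necessarily the one used in the paper.
    """
    mixture_energy = np.asarray(mixture_energy, dtype=float)
    noise_energy = np.asarray(noise_energy, dtype=float)
    tiny = np.finfo(float).tiny  # guards against division by zero
    return np.clip(1.0 - noise_energy / (mixture_energy + tiny), 0.0, 1.0)


# Toy energies for three time-frequency units.
mixture = np.array([10.0, 4.0, 1.0])
noise_estimate = np.array([1.0, 2.0, 3.0])

soft_mask = ratio_mask_from_noise(mixture, noise_estimate)
# Thresholding the soft mask recovers a binary decision per unit.
binary_mask = (soft_mask > 0.5).astype(float)
print(soft_mask)
print(binary_mask)
```

Unlike the binary decision, the soft mask keeps a graded weight for units where speech and noise have comparable energy.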

18.

Letter: Dichotic release from masking for speech

T C Rand 《The Journal of the Acoustical Society of America》1974,55(3):678-680

19.

Variability and uncertainty in masking by competing speech

Freyman RL; Helfer KS; Balakrishnan U 《The Journal of the Acoustical Society of America》2007,121(2):1040-1046

This study investigated the role of uncertainty in masking of speech by interfering speech. Target stimuli were nonsense sentences recorded by a female talker. Masking sentences were recorded from ten female talkers and combined into pairs. Listeners' recognition performance was measured with both target and masker presented from a front loudspeaker (nonspatial condition) or with a masker presented from two loudspeakers, with the right leading the front by 4 ms (spatial condition). In Experiment 1, the sentences were presented in blocks in which the masking talkers, spatial configuration, and signal-to-noise (S-N) ratio were fixed. Listeners' recognition performance varied widely among the masking talkers in the nonspatial condition, much less so in the spatial condition. This result was attributed to variation in effectiveness of informational masking in the nonspatial condition. The second experiment increased uncertainty by randomizing masking talkers and S-N ratios across trials in some conditions, and reduced uncertainty by presenting the same token of masker across trials in other conditions. These variations in masker uncertainty had relatively small effects on speech recognition.

20.

Combining energetic and informational masking for speech identification

Kidd G; Mason CR; Gallun FJ 《The Journal of the Acoustical Society of America》2005,118(2):982-992

This study examined combinations of energetic and informational maskers in speech identification. Speech targets and maskers (speech or noise) were processed and filtered into sets of 15 narrow frequency bands. The target was the sum of eight randomly selected bands. More masking occurred for speech maskers than for spectrally matched noise maskers regardless of whether the masker bands overlapped the target bands. The greater effect of the speech maskers was interpreted as due to informational masking. When the masker was comprised of nonoverlapping bands of speech, the addition of bands of noise overlapping the speech masker, but not the speech target, reduced the overall amount of masking. Surprisingly, presenting the noise to the ear contralateral to the target and masker produced an even greater release from masking. The contralateral noise was apparently sufficient to cause a slight change in the image of the ipsilateral speech masker, possibly pulling it away from the target enough to allow the focus of attention on the target. This finding is consistent with the interpretation that in some conditions small binaural differences may be sufficient to cause, or significantly strengthen, the perceptual segregation of sounds.