1.

Intelligibility enhancement for noisy whispered speech using asymmetric cost function

ZHOU Jian; ZHENG Wenming; WANG Qingyun; ZHAO Li 《声学学报：英文版》 (Chinese Journal of Acoustics) 2014,(3):312-322

We propose two whispered speech enhancement methods based on asymmetric cost functions that treat the amplification and attenuation distortions of whispered speech differently. The modified Itakura-Saito (MIS) distance function penalizes speech amplification distortion more heavily, whereas the Kullback-Leibler (KL) divergence function penalizes speech attenuation distortion more heavily. The experimental results show that the MIS-based method significantly improves intelligibility compared with conventional speech enhancement algorithms when the signal-to-noise ratio (SNR) falls below -6 dB, whereas the KL-based method achieves results similar to those of the minimum mean square error (MMSE) speech enhancement method. The results show that amplification and attenuation distortions affect the intelligibility of the enhanced whisper differently: larger attenuation distortion may result in better intelligibility of speech at low SNR, whereas attenuation distortion has little effect on the intelligibility of speech at high SNR.
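
The idea of penalizing amplification and attenuation errors unequally can be illustrated with a toy example. The sketch below does not reproduce the MIS or KL functions of the paper; it uses a hypothetical sign-weighted squared error (`asymmetric_cost`, `amp_weight`, `att_weight` and the grid search are all assumptions made for illustration) to show how the weighting shifts the cost-minimizing estimate.

```python
def asymmetric_cost(true_mag, est_mag, amp_weight=1.0, att_weight=1.0):
    """Squared error weighted by the sign of the error (hypothetical cost)."""
    err = est_mag - true_mag
    # err > 0: estimate is larger than the clean magnitude (amplification distortion)
    # err <= 0: estimate is smaller than the clean magnitude (attenuation distortion)
    weight = amp_weight if err > 0 else att_weight
    return weight * err * err


def best_estimate(clean_candidates, amp_weight, att_weight):
    """Grid-search the estimate that minimizes the total cost over a set of
    equally likely clean magnitudes (grid from 0.00 to 4.00 in steps of 0.01)."""
    grid = [i / 100 for i in range(401)]
    return min(
        grid,
        key=lambda est: sum(
            asymmetric_cost(c, est, amp_weight, att_weight)
            for c in clean_candidates
        ),
    )


clean_magnitudes = [1.0, 2.0, 3.0]  # made-up values for illustration
symmetric = best_estimate(clean_magnitudes, 1.0, 1.0)
amp_penalized = best_estimate(clean_magnitudes, 4.0, 1.0)
att_penalized = best_estimate(clean_magnitudes, 1.0, 4.0)
print(amp_penalized, symmetric, att_penalized)
```

With a heavier amplification penalty the minimizer moves below the symmetric (mean) estimate, so the estimator attenuates more; a heavier attenuation penalty moves it above the mean.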

2.

Asymmetric compression speech enhancement method for improving the intelligibility of whispered speech

ZHOU Jian; ZHENG Wenming; WANG Qingyun; ZHAO Li 《声学学报》 (Acta Acustica) 2014,39(4):501-508

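Two whispered speech enhancement algorithms based on asymmetric cost functions are proposed, which treat the amplification distortion and the compression (attenuation) distortion introduced during speech enhancement differently. The Modified Itakura-Saito (MIS) algorithm penalizes amplification distortion more heavily, whereas the Kullback-Leibler (KL) algorithm penalizes compression distortion more heavily. Experimental results show that at low signal-to-noise ratios below -6 dB, the intelligibility of whispered speech enhanced by the MIS algorithm is significantly higher than that obtained with conventional algorithms, while the KL algorithm yields an intelligibility improvement similar to that of the minimum mean square error speech enhancement algorithm. This confirms that amplification and compression distortions affect the intelligibility of whispered speech differently: at low signal-to-noise ratios, larger compression distortion helps improve intelligibility, whereas at high signal-to-noise ratios compression distortion has little effect on intelligibility.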

3.

Channel selection in the modulation domain for improved speech intelligibility in noise

Wójcicki KK Loizou PC 《The Journal of the Acoustical Society of America》2012,131(4):2904-2913

Background noise reduces the depth of the low-frequency envelope modulations known to be important for speech intelligibility. The relative strength of the target and masker envelope modulations can be quantified using a modulation signal-to-noise ratio, (S/N)_mod, measure. Such a measure can be used in noise-suppression algorithms to extract target-relevant modulations from the corrupted (target + masker) envelopes for potential improvement in speech intelligibility. In the present study, envelopes are decomposed in the modulation spectral domain into a number of channels spanning the range of 0-30 Hz. Target-dominant modulations are identified and retained in each channel based on the (S/N)_mod selection criterion, while modulations which potentially interfere with perception of the target (i.e., those dominated by the masker) are discarded. The impact of modulation-selective processing on the speech-reception threshold for sentences in noise is assessed with normal-hearing listeners. Results indicate that the intelligibility of noise-masked speech can be improved by as much as 13 dB when preserving target-dominant modulations, present up to a modulation frequency of 18 Hz, while discarding masker-dominant modulations from the mixture envelopes.

4.

Improvement of intelligibility of ideal binary-masked noisy speech by adding background noise

Cao S Li L Wu X 《The Journal of the Acoustical Society of America》2011,129(4):2227-2236

When a target-speech/masker mixture is processed with the signal-separation technique, ideal binary mask (IBM), intelligibility of target speech is remarkably improved in both normal-hearing listeners and hearing-impaired listeners. Intelligibility of speech can also be improved by filling in speech gaps with un-modulated broadband noise. This study investigated whether intelligibility of target speech in the IBM-treated target-speech/masker mixture can be further improved by adding a broadband-noise background. The results of this study show that following the IBM manipulation, which remarkably released target speech from speech-spectrum noise, foreign-speech, or native-speech masking (experiment 1), adding a broadband-noise background with the signal-to-noise ratio no less than 4 dB significantly improved intelligibility of target speech when the masker was either noise (experiment 2) or speech (experiment 3). The results suggest that since adding the noise background shallows the areas of silence in the time-frequency domain of the IBM-treated target-speech/masker mixture, the abruption of transient changes in the mixture is smoothed and the perceived continuity of target-speech components becomes enhanced, leading to improved target-speech intelligibility. The findings are useful for advancing computational auditory scene analysis, hearing-aid/cochlear-implant designs, and understanding of speech perception under "cocktail-party" conditions.
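
As a rough illustration of the processing described above, the sketch below builds an ideal binary mask from per-unit target and masker energies and then adds a constant noise background to the masked mixture. The function names, the 0 dB local criterion `lc_db`, the energy values, and the approximation that energies of uncorrelated signals simply add are assumptions for illustration, not details taken from the study.

```python
import math


def ideal_binary_mask(target_energy, masker_energy, lc_db=0.0):
    """Return a 0/1 mask per time-frequency unit: 1 where the local
    target-to-masker ratio exceeds lc_db, else 0."""
    mask = []
    for t_row, m_row in zip(target_energy, masker_energy):
        row = []
        for t, m in zip(t_row, m_row):
            if t <= 0.0:
                keep = False            # no target energy in this unit
            elif m <= 0.0:
                keep = True             # target present, no masker
            else:
                keep = 10.0 * math.log10(t / m) > lc_db
            row.append(1 if keep else 0)
        mask.append(row)
    return mask


def add_noise_background(mixture_energy, mask, noise_energy):
    """Zero the masker-dominated units, then add a constant noise energy to
    every unit (assumes the noise is uncorrelated with the mixture)."""
    return [
        [mix * keep + noise_energy for mix, keep in zip(mix_row, mask_row)]
        for mix_row, mask_row in zip(mixture_energy, mask)
    ]


# Rows are frequency channels, columns are time frames (made-up energies).
target = [[4.0, 0.1], [1.0, 2.0]]
masker = [[1.0, 1.0], [1.0, 0.5]]
mixture = [[t + m for t, m in zip(tr, mr)] for tr, mr in zip(target, masker)]

mask = ideal_binary_mask(target, masker)
filled = add_noise_background(mixture, mask, noise_energy=0.05)
print(mask)
print(filled)
```

Units that the mask sets to zero end up holding only the low-level noise energy instead of silence, which is the "shallowing" of silent areas that the abstract refers to.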

5.

A comparative intelligibility study of single-microphone noise reduction algorithms (total citations: 1; self-citations: 0; citations by others: 1)

Hu Y Loizou PC 《The Journal of the Acoustical Society of America》2007,122(3):1777

The evaluation of intelligibility of noise reduction algorithms is reported. IEEE sentences and consonants were corrupted by four types of noise including babble, car, street and train at two signal-to-noise ratio levels (0 and 5 dB), and then processed by eight speech enhancement methods encompassing four classes of algorithms: spectral subtractive, sub-space, statistical model based and Wiener-type algorithms. The enhanced speech was presented to normal-hearing listeners for identification. With the exception of a single noise condition, no algorithm produced significant improvements in speech intelligibility. Information transmission analysis of the consonant confusion matrices indicated that no algorithm significantly improved the place feature score, which is critically important for speech recognition. The algorithms which were found in previous studies to perform the best in terms of overall quality were not the same algorithms that performed the best in terms of speech intelligibility. The subspace algorithm, for instance, was previously found to perform the worst in terms of overall quality, but performed well in the present study in terms of preserving speech intelligibility. Overall, the analysis of consonant confusion matrices suggests that in order for noise reduction algorithms to improve speech intelligibility, they need to improve the place and manner feature scores.

6.

A study on the intelligibility of segment-reversed Chinese speech

JIANG Bin; KUANG Zheng; WU Ming; YANG Jun 《声学学报》 (Acta Acustica) 2012,37(6):659-666

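The effect of frame length on the intelligibility of segment-reversed Chinese speech was studied experimentally. The results show that for frame lengths below 64 ms, segment-reversed Chinese speech remains highly intelligible; for frame lengths between 64 and 203 ms, intelligibility decreases gradually as the frame length increases; and for frame lengths above 203 ms, intelligibility is 0. At a frame length of 8 ms, distortion of the Chinese tones reduces intelligibility. Analysis of the modulation spectra of the original speech signal and the segment-reversed speech shows that the amount of modulation spectrum distortion is closely related to intelligibility. The normalized correlation between the narrowband envelopes of the original speech signal and the segment-reversed speech can therefore be used to measure the modulation spectrum distortion, and the objective values computed with the speech-based speech transmission index method correlate significantly with the experimental results (r=0.876, p<0.01). The study shows that speech intelligibility is related to the narrowband envelopes, and that the intelligibility of segment-reversed speech is closely related to how well the narrowband envelopes of the original speech signal are preserved.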
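
Segment reversal itself is simple to state in code. The sketch below reverses a sampled signal frame by frame and computes a zero-lag normalized correlation between two sequences; the function names, the handling of a final partial frame, the exact correlation formula (no mean removal), and the 16 kHz sampling rate in the usage line are assumptions, since the abstract does not specify them.

```python
import math


def reverse_segments(samples, frame_len):
    """Split the signal into consecutive frames of frame_len samples and
    reverse each frame in time (a final partial frame is reversed as well)."""
    out = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        out.extend(reversed(frame))
    return out


def normalized_correlation(a, b):
    """Zero-lag correlation of two equal-length sequences, normalized by
    their energies (assumed definition; returns 0.0 for all-zero input)."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den else 0.0


# A 64 ms frame at an assumed 16 kHz sampling rate is 1024 samples long.
frame_len_64ms = int(0.064 * 16000)

demo = [1, 2, 3, 4, 5, 6, 7]
print(reverse_segments(demo, 3))
print(normalized_correlation([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```

In the study the correlation is taken between narrowband envelopes of the original and reversed speech; the envelope extraction step is omitted here.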

7.

The acoustic and perceptual effects of two noise-suppression algorithms

Zakis JA Wise C 《The Journal of the Acoustical Society of America》2007,121(1):433-441

Internal noise generated by hearing-aid circuits can be audible and objectionable to aid users, and may lead to the rejection of hearing aids. Two expansion algorithms were developed to suppress internal noise below a threshold level. The multiple-channel algorithm's expansion thresholds followed the 55-dB SPL long-term average speech spectrum, while the single-channel algorithm suppressed sounds below 45 dBA. With the recommended settings in static conditions, the single-channel algorithm provided lower noise levels, which were perceived as quieter by most normal-hearing participants. However, in dynamic conditions "pumping" noises were more noticeable with the single-channel algorithm. For impaired-hearing listeners fitted with the ADRO amplification strategy, both algorithms maintained speech understanding for words in sentences presented at 55 dB SPL in quiet (99.3% correct). Mean sentence reception thresholds in quiet were 39.4, 40.7, and 41.8 dB SPL without noise suppression, and with the single- and multiple-channel algorithms, respectively. The increase in the sentence reception threshold was statistically significant for the multiple-channel algorithm, but not the single-channel algorithm. Thus, both algorithms suppressed noise without affecting the intelligibility of speech presented at 55 dB SPL, with the single-channel algorithm providing marginally greater noise suppression in static conditions, and the multiple-channel algorithm avoiding pumping noises.

8.

Effects of hearing protector devices on speech intelligibility

João Candido Fernandes 《Applied Acoustics》2003,64(6):581-590

The purpose of this study was to determine the influence of hearing protection devices (HPDs) on the understanding of speech in young adults with normal hearing, both in a silent situation and in the presence of ambient noise. The experimental research was carried out with the following variables: five different conditions of HPD use (without protectors, with two earplugs and with two earmuffs); a type of noise (pink noise); 4 test levels (60, 70, 80 and 90 dB[A]); 6 signal/noise ratios (without noise, +5, +10, zero, −5 and −10 dB); 5 repetitions for each case, totalling 600 tests with 10 monosyllables in each one. The variable measure was the percentage of correctly heard words (monosyllabic) in the test. The results revealed that, at the lowest levels (60 and 70 dB), the protectors reduced the intelligibility of speech (compared to the tests without protectors) while, in the presence of ambient noise levels of 80 and 90 dB and unfavourable signal/noise ratios (0, −5 and −10 dB), the HPDs improved the intelligibility. A comparison of the effectiveness of earplugs versus earmuffs showed that the former offer greater efficiency in respect to the recognition of speech, providing a 30% improvement over situations in which no protection is used. As might be expected, this study confirmed that the protectors' influence on speech intelligibility is related directly to the spectral curve of the protector's attenuation.

9.

Effects of noise and distortion on speech quality judgments in normal-hearing and hearing-impaired listeners

Arehart KH Kates JM Anderson MC Harvey LO 《The Journal of the Acoustical Society of America》2007,122(2):1150-1164

Noise and distortion reduce speech intelligibility and quality in audio devices such as hearing aids. This study investigates the perception and prediction of sound quality by both normal-hearing and hearing-impaired subjects for conditions of noise and distortion related to those found in hearing aids. Stimuli were sentences subjected to three kinds of distortion (additive noise, peak clipping, and center clipping), with eight levels of degradation for each distortion type. The subjects performed paired comparisons for all possible pairs of 24 conditions. A one-dimensional coherence-based metric was used to analyze the quality judgments. This metric was an extension of a speech intelligibility metric presented in Kates and Arehart (2005) [J. Acoust. Soc. Am. 117, 2224-2237] and is based on dividing the speech signal into three amplitude regions, computing the coherence for each region, and then combining the three coherence values across frequency in a calculation based on the speech intelligibility index. The one-dimensional metric accurately predicted the quality judgments of normal-hearing listeners and listeners with mild-to-moderate hearing loss, although some systematic errors were present. A multidimensional analysis indicates that several dimensions are needed to describe the factors used by subjects to judge the effects of the three distortion types.

10.

Effects of noise on speech production: acoustic and perceptual analyses (total citations: 4; self-citations: 0; citations by others: 4)

W V Summers D B Pisoni R H Bernacki R I Pedlow M A Stokes 《The Journal of the Acoustical Society of America》1988,84(3):917-928

Acoustical analyses were carried out on a set of utterances produced by two male speakers talking in quiet and in 80, 90, and 100 dB SPL of masking noise. In addition to replicating previous studies demonstrating increases in amplitude, duration, and vocal pitch while talking in noise, these analyses also found reliable differences in the formant frequencies and short-term spectra of vowels. Perceptual experiments were also conducted to assess the intelligibility of utterances produced in quiet and in noise when they were presented at equal S/N ratios for identification. In each experiment, utterances originally produced in noise were found to be more intelligible than utterances produced in quiet. The results of the acoustic analyses showed clear and consistent differences in the acoustic-phonetic characteristics of speech produced in quiet versus noisy environments. Moreover, these acoustic differences produced reliable effects on intelligibility. The findings are discussed in terms of: (1) the nature of the acoustic changes that take place when speakers produce speech under adverse conditions such as noise, psychological stress, or high cognitive load; (2) the role of training and feedback in controlling and modifying a talker's speech to improve performance of current speech recognizers; and (3) the development of robust algorithms for recognition of speech in noise.

11.

A simulation study of harmonics regeneration in noise reduction for electric and acoustic stimulation

Hu Y 《The Journal of the Acoustical Society of America》2010,127(5):3145-3153

Recent research results show that combined electric and acoustic stimulation (EAS) significantly improves speech recognition in noise, and it is generally established that access to the improved F0 representation of target speech, along with the glimpse cues, provides the EAS benefits. Under noisy listening conditions, noise signals degrade these important cues by introducing undesired temporal-frequency components and corrupting harmonics structure. In this study, the potential of combining noise reduction and harmonics regeneration techniques was investigated to further improve speech intelligibility in noise by providing improved beneficial cues for EAS. Three hypotheses were tested: (1) noise reduction methods can improve speech intelligibility in noise for EAS; (2) harmonics regeneration after noise reduction can further improve speech intelligibility in noise for EAS; and (3) harmonics sideband constraints in frequency domain (or equivalently, amplitude modulation in temporal domain), even deterministic ones, can provide additional benefits. Test results demonstrate that combining noise reduction and harmonics regeneration can significantly improve speech recognition in noise for EAS, and it is also beneficial to preserve the harmonics sidebands under adverse listening conditions. This finding warrants further work into the development of algorithms that regenerate harmonics and the related sidebands for EAS processing under noisy conditions.

12.

The effect of infrasonic and very low frequency noise on speech intelligibility

N.S. Yeowart P.E. Connor 《Applied Acoustics》1974,7(3):229-231

An experiment was performed in which a noise containing frequencies from 10 Hz to 47 Hz was used to mask speech. The behaviour of speech intelligibility with speech presentation level and masking noise level was examined briefly. The infrasonic and low frequency masking noise did reduce the intelligibility of speech. The effect only became significant when the masking noise was present at levels of 115 dB OASPL or above.

13.

Improvements in intelligibility of noisy reverberant speech using a binaural subband adaptive noise-cancellation processing scheme.

P W Shields D R Campbell 《The Journal of the Acoustical Society of America》2001,110(6):3232-3242

This article reports on the performance of an adaptive subband noise cancellation scheme, which performs binaural preprocessing of speech signals for a hearing-aid application. The multi-microphone subband adaptive (MMSBA) signal processing scheme uses the least mean squares (LMS) algorithm in frequency-limited subbands. The use of subbands enables a diverse processing mechanism to be employed, splitting the two-channel wide-band signal into smaller frequency-limited subbands, which can be processed according to their individual signal characteristics. The frequency delimiting used a linear- or cochlear-spaced subband distribution. The effect of the processing scheme on speech intelligibility was assessed in a trial involving 15 hearing-impaired volunteers with moderate sensorineural hearing loss. The acoustic material consisted of speech and speech-shaped noise signals, generated using simulated and real-room acoustic environments, at signal-to-noise ratios (SNRs) in the range -6 to +3 dB. The results show that the MMSBA scheme delivered average speech intelligibility improvements of 11.5%, with a maximum of 37.25%, in noisy reverberant conditions. There was no significant reduction in mean speech intelligibility due to processing, in any of the test conditions.
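
The LMS update at the core of such a scheme can be sketched in a few lines. The example below is a plain full-band, two-input LMS noise canceller on synthetic data, not the MMSBA scheme itself: the subband splitting is omitted, and the signal model, the two-tap noise path (0.8, -0.3), the step size `mu` and all function names are assumptions chosen for illustration.

```python
import math
import random


def lms_noise_canceller(primary, reference, num_taps=2, mu=0.01):
    """Adaptively filter the reference noise and subtract it from the primary
    input. Returns the error signal (the enhanced output) and final weights."""
    weights = [0.0] * num_taps
    history = [0.0] * num_taps          # recent reference samples, newest first
    output = []
    for d, r in zip(primary, reference):
        history = [r] + history[:-1]
        noise_estimate = sum(w * x for w, x in zip(weights, history))
        e = d - noise_estimate          # error = primary minus estimated noise
        # LMS update: move each weight along the instantaneous gradient estimate
        weights = [w + mu * e * x for w, x in zip(weights, history)]
        output.append(e)
    return output, weights


def mean_square(xs):
    return sum(x * x for x in xs) / len(xs)


random.seed(0)
n = 5000
speech = [0.5 * math.sin(2 * math.pi * 0.01 * i) for i in range(n)]  # stand-in for speech
reference = [random.uniform(-1.0, 1.0) for _ in range(n)]            # noise-only input
noise_in_primary = [
    0.8 * reference[i] - 0.3 * (reference[i - 1] if i > 0 else 0.0)
    for i in range(n)
]
primary = [s + v for s, v in zip(speech, noise_in_primary)]

enhanced, weights = lms_noise_canceller(primary, reference)

# Compare the residual noise power before and after, over the last 1000 samples.
tail = slice(n - 1000, n)
noise_before = mean_square([p - s for p, s in zip(primary[tail], speech[tail])])
noise_after = mean_square([e - s for e, s in zip(enhanced[tail], speech[tail])])
print(weights, noise_before, noise_after)
```

In a subband version, the same update would run independently in each frequency-limited band, which lets the step size and filter length be chosen per band.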

14.

The effects of the addition of low-level, low-noise noise on the intelligibility of sentences processed to remove temporal envelope information

Hopkins K Moore BC Stone MA 《The Journal of the Acoustical Society of America》2010,128(4):2150-2161

The intelligibility of sentences processed to remove temporal envelope information, as far as possible, was assessed. Sentences were filtered into N analysis channels, and each channel signal was divided by its Hilbert envelope to remove envelope information but leave temporal fine structure (TFS) intact. Channel signals were combined to give TFS speech. The effect of adding low-level low-noise noise (LNN) to each channel signal before processing was assessed. The addition of LNN reduced the amplification of low-level signal portions that contained large excursions in instantaneous frequency, and improved the intelligibility of simple TFS speech sentences, but not more complex sentences. It also reduced the time needed to reach a stable level of performance. The recovery of envelope cues by peripheral auditory filtering was investigated by measuring the intelligibility of 'recovered-envelope speech', formed by filtering TFS speech with an array of simulated auditory filters, and using the envelopes at the output of these filters to modulate sinusoids with frequencies equal to the filter center frequencies (i.e., tone vocoding). The intelligibility of TFS speech and recovered-envelope speech fell as N increased, although TFS speech was still highly intelligible for values of N for which the intelligibility of recovered-envelope speech was low.
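
The step of dividing a channel signal by its Hilbert envelope can be sketched as follows. This is a minimal single-channel illustration: the analytic signal is computed with a naive O(n^2) DFT so that only the standard library is needed, the band-pass filterbank and the low-noise noise are omitted, and the small `floor` that guards against division by zero is an assumption, not part of the study.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal via the DFT: keep DC (and Nyquist for even n), double
    the positive-frequency bins, and zero the negative-frequency bins."""
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(1, n):
        if n % 2 == 0 and k == n // 2:
            continue                    # Nyquist bin is left unchanged
        if k < (n + 1) // 2:
            spectrum[k] *= 2            # positive frequencies
        else:
            spectrum[k] = 0             # negative frequencies
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def remove_envelope(channel, floor=1e-12):
    """Divide a channel signal by its Hilbert envelope, leaving an
    approximately unit-envelope signal that carries the fine structure."""
    envelope = [abs(z) for z in analytic_signal(channel)]
    tfs = [s / max(e, floor) for s, e in zip(channel, envelope)]
    return tfs, envelope


# An amplitude-modulated tone: 8-cycle carrier with a 1-cycle modulator.
n = 64
channel = [
    (1.0 + 0.5 * math.cos(2 * math.pi * t / n)) * math.cos(2 * math.pi * 8 * t / n)
    for t in range(n)
]
tfs, envelope = remove_envelope(channel)
print(max(envelope), min(envelope))
```

For this test signal the recovered envelope is the modulator and the output is the unmodulated carrier. In practice a library routine such as `scipy.signal.hilbert` would replace the naive DFT.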

15.

Comparative intelligibility investigation of single-channel noise-reduction algorithms for Chinese, Japanese, and English

Li J Yang L Zhang J Yan Y Hu Y Akagi M Loizou PC 《The Journal of the Acoustical Society of America》2011,129(5):3291-3301

A large number of single-channel noise-reduction algorithms have been proposed based largely on mathematical principles. Most of these algorithms, however, have been evaluated with English speech. Given the different perceptual cues used by native listeners of different languages including tonal languages, it is of interest to examine whether there are any language effects when the same noise-reduction algorithm is used to process noisy speech in different languages. A comparative evaluation and investigation is taken in this study of various single-channel noise-reduction algorithms applied to noisy speech taken from three languages: Chinese, Japanese, and English. Clean speech signals (Chinese words and Japanese words) were first corrupted by three types of noise at two signal-to-noise ratios and then processed by five single-channel noise-reduction algorithms. The processed signals were finally presented to normal-hearing listeners for recognition. Intelligibility evaluation showed that the majority of noise-reduction algorithms did not improve speech intelligibility. Consistent with a previous study with the English language, the Wiener filtering algorithm produced small, but statistically significant, improvements in intelligibility for car and white noise conditions. Significant differences between the performances of noise-reduction algorithms across the three languages were observed.

16.

Traffic noise annoyance and speech intelligibility in persons with normal and persons with impaired hearing

G. Aniansson M. Björkman 《Journal of sound and vibration》1983,88(1):99-106

Annoyance ratings in speech intelligibility tests at 45 dB(A) and 55 dB(A) traffic noise were investigated in a laboratory study. Subjects were chosen according to their hearing acuity to be representative of 70-year-old men and women, and of noise-induced hearing losses typical for a great number of industrial workers. These groups were compared with normal hearing subjects of the same sex and, when possible, the same age. The subjects rated their annoyance on an open 100 mm scale. Significant correlations were found between annoyance expressed in millimetres and speech intelligibility in percent when all subjects were taken as one sample. Speech intelligibility was also calculated from physical measurements of speech and noise by using the articulation index method. Observed and calculated speech intelligibility scores are compared and discussed. Also treated is the estimation of annoyance by traffic noise at moderate noise levels via speech intelligibility scores.

17.

A new sound coding strategy for suppressing noise in cochlear implants

Hu Y Loizou PC 《The Journal of the Acoustical Society of America》2008,124(1):498-509

In the n-of-m strategy, the signal is processed through m bandpass filters from which only the n maximum envelope amplitudes are selected for stimulation. While this maximum selection criterion, adopted in the advanced combination encoder strategy, works well in quiet, it can be problematic in noise as it is sensitive to the spectral composition of the input signal and does not account for situations in which the masker completely dominates the target. A new selection criterion is proposed based on the signal-to-noise ratio (SNR) of individual channels. The new criterion selects target-dominated (SNR ≥ 0 dB) channels and discards masker-dominated (SNR < 0 dB) channels. Experiment 1 assessed cochlear implant users' performance with the proposed strategy assuming that the channel SNRs are known. Results indicated that the proposed strategy can restore speech intelligibility to the level attained in quiet independent of the type of masker (babble or continuous noise) and SNR level (0-10 dB) used. Results from experiment 2 showed that a 25% error rate can be tolerated in channel selection without compromising speech intelligibility. Overall, the findings from the present study suggest that the SNR criterion is an effective selection criterion for n-of-m strategies with the potential of restoring speech intelligibility.
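
The two selection criteria described above can be contrasted in a short sketch. The function names and the example envelope and SNR values are hypothetical; as in experiment 1 of the study, the per-channel SNRs are assumed to be known.

```python
def select_n_of_m(envelopes, n):
    """Maximum-selection criterion: indices of the n largest envelope
    amplitudes among the m channels, returned in channel order."""
    ranked = sorted(range(len(envelopes)), key=lambda i: envelopes[i], reverse=True)
    return sorted(ranked[:n])


def select_by_snr(channel_snr_db):
    """SNR criterion: keep target-dominated channels (SNR >= 0 dB) and
    discard masker-dominated channels (SNR < 0 dB)."""
    return [i for i, snr in enumerate(channel_snr_db) if snr >= 0.0]


# One stimulation frame with m = 4 channels (made-up values).
mixture_envelopes = [0.9, 0.2, 0.7, 0.4]
channel_snr_db = [-5.0, 3.0, -1.0, 6.0]

print(select_n_of_m(mixture_envelopes, n=2))
print(select_by_snr(channel_snr_db))
```

In this frame the two largest envelopes belong to masker-dominated channels, so maximum selection stimulates exactly the channels that the SNR criterion would discard.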

18.

Monosyllabic word recognition at higher-than-normal speech and noise levels

Studebaker GA Sherbecoe RL McDaniel DM Gwaltney CA 《The Journal of the Acoustical Society of America》1999,105(4):2431-2444

The effects of intensity on monosyllabic word recognition were studied in adults with normal hearing and mild-to-moderate sensorineural hearing loss. The stimuli were bandlimited NU#6 word lists presented in quiet and talker-spectrum-matched noise. Speech levels ranged from 64 to 99 dB SPL and S/N ratios from 28 to -4 dB. In quiet, the performance of normal-hearing subjects remained essentially constant. In noise, at a fixed S/N ratio, it decreased as a linear function of speech level. Hearing-impaired subjects performed like normal-hearing subjects tested in noise when the data were corrected for the effects of audibility loss. From these and other results, it was concluded that: (1) speech intelligibility in noise decreases when speech levels exceed 69 dB SPL and the S/N ratio remains constant; (2) the effects of speech and noise level are synergistic; (3) the deterioration in intelligibility can be modeled as a relative increase in the effective masking level; (4) normal-hearing and hearing-impaired subjects are affected similarly by increased signal level when differences in speech audibility are considered; (5) the negative effects of increasing speech and noise levels on speech recognition are similar for all adult subjects, at least up to 80 years; and (6) the effective dynamic range of speech may be larger than the commonly assumed value of 30 dB.

19.

Comparison of different forms of compression using wearable digital hearing aids (total citations: 1; self-citations: 0; citations by others: 1)

Stone MA Moore BC Alcántara JI Glasberg BR 《The Journal of the Acoustical Society of America》1999,106(6):3603-3619

Four different compression algorithms were implemented in wearable digital hearing aids: (1) The slow-acting dual-front-end automatic gain control (AGC) system [B. C. J. Moore, B. R. Glasberg, and M. A. Stone, Br. J. Audiol. 25, 171-182 (1991)], combined with appropriate frequency response equalization, with a compression threshold of 63 dB sound pressure level (SPL) and with a compression ratio of 30 (DUAL-HI); (2) The dual-front-end AGC system combined with appropriate frequency response equalization, with a compression threshold of 55 dB SPL and with a compression ratio of 3 (DUAL-LO). This was intended to give some impression of the levels of sounds in the environment; (3) Fast-acting full dynamic range compression in four channels (FULL-4). The compression was designed to minimize envelope distortion due to overshoots and undershoots; (4) A combination of (2) and (3) above, where each applied less compression than when used alone (DUAL-4). Initial fitting was partly based on the concept of giving a flat specific-loudness pattern for a 65-dB SPL speech-shaped noise input, and this was followed by fine tuning using an adaptive procedure with speech stimuli. Eight subjects with moderate to severe cochlear hearing loss were tested in a counter-balanced design. Subjects had at least 2 weeks experience with each system in everyday life before evaluation using the Abbreviated Profile of Hearing Aid Benefit (APHAB) test and measures of speech intelligibility in quiet (AB word lists at 50 and 80 dB SPL) and noise (adaptive sentence lists in speech-shaped noise, or that same noise amplitude modulated with the envelope of speech from a single talker). The APHAB scores did not indicate clear differences between the four systems. Scores for the AB words in quiet were high for all four systems at both 50 and 80 dB SPL. The speech-to-noise ratios required for 50% intelligibility were low (indicating good performance) and similar for all the systems, but there was a slight trend for better performance in modulated noise with the DUAL-4 system than with the other systems. A subsequent trial where three subjects directly compared each of the four systems in their everyday lives indicated a slight preference for the DUAL-LO system. Overall, the results suggest that it is not necessary to compress fast modulations of the input signal.

20.

Enhancing intelligibility of narrowband speech with out-of-band noise: evidence for lateral suppression at high-normal intensity

Bashford JA Warren RM Lenz PW 《The Journal of the Acoustical Society of America》2005,117(1):365-369

Previous studies have shown that the intelligibility of filtered speech can be enhanced by filling stopbands with noise. The present study found that this enhancement occurred only when speech intensity was sufficiently high to degrade performance. Intelligibility decreased by about 15% when narrowband speech was increased from 45 to 65 dBA (corresponding to broadband speech levels of about 60 and 80 dBA), and decreased by 20% at a level of 75 dBA. However, when flanking bands of low-pass and high-pass filtered white noise were added at spectrum levels of -40 to -20 dB relative to the speech, intelligibility of the 75-dBA speech band increased by about 13%. Additional findings confirm that this enhancement of intelligibility depends upon out-of-band stimulation, in agreement with theories proposing that lateral suppressive interactions extend the dynamic range of intensity coding by counteracting effects of auditory-nerve firing-rate saturation at high signal levels.