Similar Articles
1.
Classic demonstrations of the phonemic restoration effect show increased intelligibility of interrupted speech when the interruptions are caused by a plausible masking sound rather than by silent periods. Previous studies of this effect have been conducted exclusively under anechoic or nearly anechoic listening conditions. This study demonstrates that the effect is reversed when sounds are presented in a realistically simulated reverberant room (broadband T60 = 1.1 s): intelligibility is greater for silent interruptions than for interruptions by unmodulated noise. Additional results suggest that the reversal is primarily due to filling silent intervals with reverberant energy from the speech signal.
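
The T60 value can be related to a decay rate. A minimal sketch, assuming the common simplification of a reverberant tail as exponentially decaying Gaussian noise (not the room simulation used in the study): the amplitude envelope falls by 60 dB, a factor of 1000, after T60 seconds, so its decay constant is 3 ln(10) / T60, roughly 6.28 per second for T60 = 1.1 s. The function names are illustrative.

```python
import math
import random


def decay_constant(t60: float) -> float:
    """Amplitude decay rate (1/s) such that exp(-rate * t60) equals 10**(-60/20)."""
    return 3.0 * math.log(10.0) / t60


def synthetic_rir(t60: float, fs: int, duration: float, seed: int = 0) -> list[float]:
    """Crude diffuse-tail model (an assumption): Gaussian noise under an exponential envelope."""
    rng = random.Random(seed)
    rate = decay_constant(t60)
    return [rng.gauss(0.0, 1.0) * math.exp(-rate * i / fs) for i in range(int(duration * fs))]


def envelope_drop_db(t60: float, t: float) -> float:
    """Level of the decay envelope at time t relative to t = 0, in dB."""
    return 20.0 * math.log10(math.exp(-decay_constant(t60) * t))


t60 = 1.1  # broadband T60 reported in the abstract
print(f"decay constant: {decay_constant(t60):.3f} 1/s")
print(f"envelope level at t = T60: {envelope_drop_db(t60, t60):.1f} dB")
```

Convolving speech with such a tail is what fills the silent gaps with the speech's own energy, which is the mechanism the study proposes for the reversal.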

2.
When a target-speech/masker mixture is processed with the ideal binary mask (IBM), a signal-separation technique, the intelligibility of the target speech improves markedly for both normal-hearing and hearing-impaired listeners. Intelligibility can also be improved by filling speech gaps with unmodulated broadband noise. This study investigated whether the intelligibility of target speech in an IBM-processed target-speech/masker mixture can be further improved by adding a broadband-noise background. Following the IBM manipulation, which markedly released target speech from speech-spectrum noise, foreign-speech, or native-speech masking (Experiment 1), adding a broadband-noise background at a signal-to-noise ratio of no less than 4 dB significantly improved target-speech intelligibility when the masker was either noise (Experiment 2) or speech (Experiment 3). The results suggest that, because the added noise background partially fills the silent regions in the time-frequency representation of the IBM-processed mixture, abrupt transient changes in the mixture are smoothed and the perceived continuity of target-speech components is enhanced, leading to improved target-speech intelligibility. The findings are useful for advancing computational auditory scene analysis and hearing-aid/cochlear-implant design, and for understanding speech perception under "cocktail-party" conditions.
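
The IBM retains a time-frequency (T-F) unit of the mixture when the local target-to-masker ratio exceeds a local criterion (LC) and discards it otherwise; the discarded units are the areas of silence that an added noise background would later fill. A minimal sketch, assuming per-unit target and masker powers are already available from some filterbank front end (not shown) and a hypothetical LC of 0 dB:

```python
import math

Grid = list[list[float]]  # power per time-frequency unit, indexed [channel][frame]


def local_snr_db(target_power: float, masker_power: float) -> float:
    """Local target-to-masker ratio in dB, with zero-power cases mapped to +/-infinity."""
    if target_power <= 0.0:
        return -math.inf
    if masker_power <= 0.0:
        return math.inf
    return 10.0 * math.log10(target_power / masker_power)


def ideal_binary_mask(target: Grid, masker: Grid, lc_db: float = 0.0) -> list[list[int]]:
    """1 where the local SNR exceeds the local criterion lc_db, otherwise 0."""
    return [
        [1 if local_snr_db(t, m) > lc_db else 0 for t, m in zip(t_row, m_row)]
        for t_row, m_row in zip(target, masker)
    ]


def apply_mask(mixture: Grid, mask: list[list[int]]) -> Grid:
    """Zero the mixture in rejected units; these zeroed units are the 'silent' gaps."""
    return [[x * g for x, g in zip(x_row, g_row)] for x_row, g_row in zip(mixture, mask)]


# Toy 2 x 2 example with hypothetical powers.
target = [[4.0, 1.0], [1.0, 1.0]]
masker = [[1.0, 4.0], [1.0, 0.5]]
mask = ideal_binary_mask(target, masker)
mixture = [[t + m for t, m in zip(tr, mr)] for tr, mr in zip(target, masker)]
masked = apply_mask(mixture, mask)
```

Note that a unit at exactly 0 dB local SNR is rejected here because the comparison is strict; conventions differ on this boundary.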

3.
For normal-hearing (NH) listeners, comodulated masker energy outside the spectral region of a target signal can improve target detection and identification, a phenomenon referred to as comodulation masking release (CMR). This study examined whether, for cochlear implant (CI) listeners and for NH listeners presented with a "noise-vocoded" CI simulation, speech identification in modulated noise is improved by a comodulated flanking band. In Experiment 1, NH listeners identified noise-vocoded speech in a background of on-target noise with or without a flanking narrow band of noise outside the spectral region of the target. The on-target noise and flanker were either both 16-Hz square-wave modulated with the same phase or both unmodulated; the speech was taken from a closed-set corpus. Performance was better in modulated than in unmodulated noise, and this difference was slightly greater when the comodulated flanker was present, consistent with a small CMR of about 1.7 dB for noise-vocoded speech. Experiment 2, which tested CI listeners using the same speech materials, found no advantage for modulated over unmodulated maskers and no CMR. Thus, although NH listeners can benefit from CMR even for speech signals with reduced spectro-temporal detail, no CMR was observed for CI users.
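
Comodulation means that the on-target band and the flanking band share one modulator. A minimal sketch, assuming a 100%-depth on/off square wave with a 50% duty cycle (the abstract specifies only 16-Hz square-wave modulation with the same phase); the Gaussian carriers are placeholders for the study's noise bands, which would also be bandpass filtered:

```python
import random


def square_gate(t: float, rate_hz: float = 16.0, phase_cycles: float = 0.0) -> float:
    """On/off square-wave modulator: 1.0 in the first half of each cycle, 0.0 in the second."""
    return 1.0 if (t * rate_hz + phase_cycles) % 1.0 < 0.5 else 0.0


def modulate(carrier: list[float], fs: int, rate_hz: float = 16.0,
             phase_cycles: float = 0.0) -> list[float]:
    """Multiply a sampled carrier by the square-wave modulator."""
    return [x * square_gate(i / fs, rate_hz, phase_cycles) for i, x in enumerate(carrier)]


fs, n = 16000, 16000
rng = random.Random(0)
on_target = [rng.gauss(0.0, 1.0) for _ in range(n)]  # placeholder on-target noise
flanker = [rng.gauss(0.0, 1.0) for _ in range(n)]    # placeholder flanking band
# Comodulated: same rate and same phase, so both bands switch on and off together.
on_target_mod = modulate(on_target, fs)
flanker_mod = modulate(flanker, fs)
```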

4.
Speech intelligibility was investigated by varying the number of interfering talkers, the level and mean-pitch differences between target and interfering speech, and the presence of tactile support. In a first experiment, the speech-reception threshold (SRT) for sentences was measured for a male talker against a background of one to eight interfering male talkers or speech noise. Speech was presented diotically, and vibrotactile support was given by presenting the low-pass-filtered signal (0-200 Hz) to the index finger. The SRT benefit from tactile support ranged from 0 to 2.4 dB and was largest for one or two interfering talkers. A second experiment focused on the masking effects of one interfering talker. The interference was the target talker's own voice with its mean pitch raised by 2, 4, 8, or 12 semitones. Level differences between target and interfering speech ranged from -16 to +4 dB. Scores for correctly perceived words in sentences show an intelligibility increase of up to 27% due to tactile support. Performance improves gradually with increasing pitch difference. Louder target speech generally helps perception, but the effect of level differences depends considerably on the pitch difference. Differences in performance between noise and speech maskers, and between speech maskers with various mean pitches, are explained by informational masking.
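
Raising the mean pitch by n semitones multiplies F0 by 2^(n/12) in equal temperament, so the 2-, 4-, 8-, and 12-semitone conditions correspond to F0 ratios of about 1.12, 1.26, 1.59, and 2. A small sketch; the 120-Hz starting F0 is a hypothetical value for a male talker, not one reported in the study:

```python
def semitone_ratio(semitones: float) -> float:
    """Equal-tempered frequency ratio for a shift of `semitones`."""
    return 2.0 ** (semitones / 12.0)


def shifted_f0(f0_hz: float, semitones: float) -> float:
    """F0 after raising the pitch by `semitones`."""
    return f0_hz * semitone_ratio(semitones)


base_f0 = 120.0  # hypothetical mean F0 of the male target talker
for n in (2, 4, 8, 12):
    print(f"+{n:2d} semitones: ratio {semitone_ratio(n):.3f}, F0 {shifted_f0(base_f0, n):.1f} Hz")
```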

5.
Siren noise usually severely disturbs the intelligibility of voice communication inside the cabs of police, paramedic, and fire vehicles, and it is often desirable to remove it from the speech signal. This paper proposes a new method to adaptively cancel siren noise and enhance speech signals. Based on the characteristics of siren noise, an anti-speech filter and a time delayer are employed in single- and dual-channel noise cancellation systems to reduce the siren noise. Experimental results demonstrate the effectiveness of the proposed method in canceling siren noise, and the quality of the enhanced speech signal is satisfactory.
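
The dual-channel arrangement is a form of adaptive noise cancellation. A reference microphone picks up the siren, an adaptive filter predicts the siren component in the primary (speech) channel, and the prediction is subtracted. The paper's anti-speech filter and time delayer are not described in enough detail to reproduce, so the sketch below substitutes the textbook least-mean-squares (LMS) adaptive noise canceller. The filter length, step size, acoustic path, and test signals are all hypothetical:

```python
import math
import random


def lms_cancel(primary: list[float], reference: list[float],
               taps: int = 4, mu: float = 0.01) -> list[float]:
    """Return e[k] = primary[k] - w . x[k], adapting the weights w with the LMS rule.

    When the reference is correlated only with the noise, e[k] converges toward the speech.
    """
    w = [0.0] * taps
    out = []
    for k in range(len(primary)):
        # Most recent `taps` reference samples, newest first (zeros before the start).
        x = [reference[k - i] if k - i >= 0 else 0.0 for i in range(taps)]
        y = sum(wi * xi for wi, xi in zip(w, x))  # estimate of the noise in the primary channel
        e = primary[k] - y
        w = [wi + mu * e * xi for wi, xi in zip(w, x)]
        out.append(e)
    return out


# Toy demo: a sinusoid stands in for speech, Gaussian noise for the siren.
rng = random.Random(1)
n = 5000
speech = [math.sin(2 * math.pi * 0.01 * k) for k in range(n)]
siren = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Hypothetical two-tap acoustic path from the siren to the primary microphone.
primary = [speech[k] + 0.8 * siren[k] + (0.3 * siren[k - 1] if k > 0 else 0.0)
           for k in range(n)]
cleaned = lms_cancel(primary, siren)
```

A real siren is a swept tone rather than white noise, so in practice the step size and filter length would need tuning to track it.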

6.
The author proposes adopting wide dynamic range compression and adaptive multichannel modulation-based noise reduction algorithms to enhance hearing protector performance. Three experiments were conducted, using existing digital hearing aids, to investigate the effects of compression and noise reduction configurations on the amount of noise reduction, speech intelligibility, and overall preference. In Experiment 1, sentence materials were recorded in speech-spectrum noise and white noise after being processed by eight digital hearing aids. When the hearing aids were set to 3:1 compression, the amount of noise reduction was enhanced or maintained for hearing aids with parallel configurations, but reduced for hearing aids with serial configurations. In Experiments 2 and 3, the speech intelligibility and perceived sound quality of 16 normal-hearing listeners were tested while they listened to speech recorded through hearing aids with parallel and serial configurations. Regardless of the configuration, the noise reduction algorithms reduced the noise level and maintained speech intelligibility in white noise. Additionally, with the noise reduction algorithms activated, the listeners preferred the parallel over the serial configuration under 3:1 compression, and, for the serial configuration, 1:1 over 3:1 compression. Implications for hearing protector and hearing aid design are discussed.
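
The 3:1 setting is the compression ratio: above the compression threshold, each 3-dB increase in input level raises the output by only 1 dB, while 1:1 is linear amplification. A minimal sketch of a static compressor input/output curve; the 50-dB SPL threshold and 0-dB gain are hypothetical, and real hearing aids add attack/release time constants and per-channel settings not modeled here:

```python
def wdrc_output_level(input_db: float, threshold_db: float = 50.0,
                      ratio: float = 3.0, gain_db: float = 0.0) -> float:
    """Static compressor curve: linear below threshold_db, 1/ratio dB per dB above it."""
    if input_db <= threshold_db:
        return input_db + gain_db
    return threshold_db + (input_db - threshold_db) / ratio + gain_db


for level in (40.0, 60.0, 80.0):
    print(f"{level:.0f} dB in -> 3:1 {wdrc_output_level(level):.1f} dB, "
          f"1:1 {wdrc_output_level(level, ratio=1.0):.1f} dB")
```

With these settings an 80-dB SPL input yields 60 dB SPL at 3:1 but 80 dB SPL at 1:1.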

7.
Internal noise generated by hearing-aid circuits can be audible and objectionable to aid users, and may lead to the rejection of hearing aids. Two expansion algorithms were developed to suppress internal noise below a threshold level. The multiple-channel algorithm's expansion thresholds followed the 55-dB SPL long-term average speech spectrum, while the single-channel algorithm suppressed sounds below 45 dBA. With the recommended settings in static conditions, the single-channel algorithm provided lower noise levels, which were perceived as quieter by most normal-hearing participants. However, in dynamic conditions "pumping" noises were more noticeable with the single-channel algorithm. For impaired-hearing listeners fitted with the ADRO amplification strategy, both algorithms maintained speech understanding for words in sentences presented at 55 dB SPL in quiet (99.3% correct). Mean sentence reception thresholds in quiet were 39.4, 40.7, and 41.8 dB SPL without noise suppression, and with the single- and multiple-channel algorithms, respectively. The increase in the sentence reception threshold was statistically significant for the multiple-channel algorithm, but not the single-channel algorithm. Thus, both algorithms suppressed noise without affecting the intelligibility of speech presented at 55 dB SPL, with the single-channel algorithm providing marginally greater noise suppression in static conditions, and the multiple-channel algorithm avoiding pumping noises.
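
Downward expansion works in the opposite direction to compression: below the expansion threshold, output level falls faster than input level, so low-level circuit noise is pushed further down while sounds above threshold pass unchanged. A minimal static sketch using the 45-dB threshold of the single-channel algorithm; the 2:1 expansion ratio is hypothetical, and the multiple-channel algorithm would instead apply a separate threshold in each channel, derived from the 55-dB SPL long-term average speech spectrum:

```python
def expander_output_level(input_db: float, threshold_db: float = 45.0,
                          ratio: float = 2.0) -> float:
    """Static downward expander: unity gain at or above threshold_db; below it, each 1-dB
    drop in input lowers the output by `ratio` dB (ratio = 2.0 is a hypothetical setting)."""
    if input_db >= threshold_db:
        return input_db
    return threshold_db - (threshold_db - input_db) * ratio


for level in (30.0, 40.0, 45.0, 60.0):
    print(f"{level:.0f} dB in -> {expander_output_level(level):.0f} dB out")
```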

8.
In the present study, the effects of interference from combined noises on speech transmission were investigated in a simulated open public space. Sound fields for the dominant noises were predicted using a model of a typical urban square surrounded by buildings. Road traffic noise and two types of construction noise, corresponding to stationary and impulsive noise, were then selected as background noises. Listening tests were performed with a group of adults, and the quality of speech transmission was evaluated using listening-difficulty ratings as well as word intelligibility scores. Two factors affecting speech transmission performance were considered in the listening tests: (1) the temporal characteristics of the construction noise (stationary or impulsive) and (2) the levels of the construction and road traffic noises. The results indicated that word intelligibility scores and listening-difficulty ratings were affected by the temporal characteristics of the construction noise, owing to fluctuations in the background noise level. It was also observed that listening difficulty is unable to describe speech transmission in noisy open public spaces, showing larger variation than word intelligibility scores.
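
Presenting road traffic and construction noise together raises the overall level on an energy basis, not arithmetically. Assuming the sources are incoherent, levels add as 10 log10(sum of 10^(L_i/10)); two equal 60-dB sources therefore give about 63 dB, and a source 20 dB below another adds only about 0.04 dB. A minimal sketch with hypothetical example levels:

```python
import math


def combine_levels_db(*levels_db: float) -> float:
    """Energetic sum of incoherent sources: 10 * log10(sum(10 ** (L / 10)))."""
    return 10.0 * math.log10(sum(10.0 ** (level / 10.0) for level in levels_db))


print(f"{combine_levels_db(60.0, 60.0):.2f} dB")  # two equal sources
print(f"{combine_levels_db(70.0, 50.0):.2f} dB")  # one source 20 dB below the other
```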

9.
The brain can restore missing speech segments using linguistic knowledge and context. The phonemic restoration effect is commonly quantified as the increase in intelligibility of interrupted speech when the silent gaps are filled with noise bursts. In normal hearing, the restoration effect is negatively correlated with baseline scores for interrupted speech: listeners with poorer baseline scores benefit more from restoration. A reanalysis of data from Başkent et al. [(2010). Hear. Res. 260, 54-62] showed that these correlations differ for listeners with mild and moderate hearing impairment from those observed with normal hearing. This analysis further shows that hearing impairment may affect top-down restoration of speech.
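
The paradigm behind these scores can be sketched as periodic gating of the speech, with the gaps either left silent or filled with noise bursts; the restoration benefit is then the score difference between the two conditions. The 2-Hz rate, 50% duty cycle, noise level, and the constant placeholder "speech" below are hypothetical, not parameters of the reanalyzed study:

```python
import random


def interrupt(signal: list[float], fs: int, rate_hz: float, duty: float = 0.5,
              fill: str = "silence", noise_rms: float = 1.0, seed: int = 0) -> list[float]:
    """Keep the first `duty` fraction of each cycle; fill the rest with silence or noise."""
    if fill not in ("silence", "noise"):
        raise ValueError("fill must be 'silence' or 'noise'")
    rng = random.Random(seed)
    out = []
    for i, x in enumerate(signal):
        on = (i / fs * rate_hz) % 1.0 < duty
        if on:
            out.append(x)
        elif fill == "noise":
            out.append(rng.gauss(0.0, noise_rms))  # noise burst in the gap
        else:
            out.append(0.0)  # silent gap
    return out


def restoration_benefit(score_noise_filled: float, score_silent_gaps: float) -> float:
    """Restoration effect as the score gain (percentage points) from noise-filled gaps."""
    return score_noise_filled - score_silent_gaps


fs = 16000
speech = [1.0] * fs  # placeholder "speech": a constant, so the gating is easy to see
silent_gaps = interrupt(speech, fs, rate_hz=2.0)
noise_gaps = interrupt(speech, fs, rate_hz=2.0, fill="noise")
```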

10.
When sinusoidal amplitude modulation (SAM) is applied to noise or tone carriers, the stimuli can generate audible distortion products in the region of the modulation frequency. As a result, when bandpass-filtered SAM noise is used to investigate temporal processing, a band of unmodulated noise is typically positioned at the modulation frequency to mask any distortion products. This study was designed to investigate the distortion products for bandpass noise carriers, and thereby reduce ambiguity about the form of this distortion and its role in perception. The distortion consists of two distortion-noise bands and a distortion tone at the modulation frequency. In the first two experiments, the level and phase of the distortion tone are measured using two different experimental paradigms. In the third experiment, modulation-frequency difference limens are measured for filtered SAM noise, and it is shown that performance deteriorates markedly when the distortion tone is canceled. In a fourth experiment, masked thresholds are measured at low frequencies for bands of high-frequency, unmodulated noise with the same levels and spectra as the SAM noises in the earlier experiments. The results confirm that unmodulated noise also produces quadratic distortion, which may explain some aspects of earlier reports on remote masking.
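
Why a distortion tone appears at the modulation frequency can be shown with a tone carrier. A SAM tone (1 + m sin 2πfm t) sin 2πfc t has components only at fc and fc ± fm, but squaring it yields a low-frequency term (1 + m sin 2πfm t)²/2 containing m sin 2πfm t, a tone at fm. The sketch models "quadratic distortion" as a hypothetical square-law term a·x² and uses hypothetical stimulus values; the study itself used bandpass noise carriers:

```python
import cmath
import math


def dft_amplitude(x: list[float], freq_hz: float, fs: int) -> float:
    """Amplitude of the sinusoidal component at freq_hz (exact when it falls on a DFT bin)."""
    acc = sum(v * cmath.exp(-2j * math.pi * freq_hz * i / fs) for i, v in enumerate(x))
    return 2.0 * abs(acc) / len(x)


fs, n = 8000, 8000             # 1 s, so every integer frequency falls on a bin
fc, fm, m = 1000.0, 50.0, 1.0  # carrier, modulation rate, modulation depth (hypothetical)
sam = [(1 + m * math.sin(2 * math.pi * fm * i / fs)) * math.sin(2 * math.pi * fc * i / fs)
       for i in range(n)]
a = 0.1                        # strength of the hypothetical square-law nonlinearity
distorted = [v + a * v * v for v in sam]

print(f"before: {dft_amplitude(sam, fm, fs):.6f}")       # should be close to 0
print(f"after:  {dft_amplitude(distorted, fm, fs):.6f}")  # should be close to a * m = 0.1
```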

11.
Reverberation interferes with the ability to understand speech in rooms. Overlap-masking explains this degradation by assuming that reverberant phonemes endure in time and mask subsequent reverberant phonemes. Most listeners benefit from binaural listening when reverberation is present, indicating that the listener's binaural system processes the two channels to reduce the reverberation. This paper investigates the hypothesis that the binaural word intelligibility advantage found in reverberation is a result of binaural overlap-masking release, with the reverberation acting as masking noise. The tests use phonetically balanced word lists (ANSI S3.2-1989), which are presented diotically and binaurally with recorded reverberation and with reverberation-like noise. The words are reverberated in a small room (62 m³) and recorded with two microphones without additional noise sources. The reverberation-like noise is a modified form of these recordings with similar spectral content, but it contains no binaural localization cues because of a phase randomization procedure. Listening to the reverberant words binaurally improves intelligibility by 6.0% over diotic listening, whereas the binaural intelligibility advantage for reverberation-like noise is only 2.6%. This indicates that binaural overlap-masking release is insufficient to explain the entire binaural word intelligibility advantage in reverberation.
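
The abstract does not detail its phase randomization procedure. One standard way to keep a recording's magnitude spectrum while scrambling its temporal structure, sketched here as an assumption, is to replace every DFT phase with a random value, keeping the spectrum Hermitian-symmetric so the result stays real. Randomizing the left and right channels independently would also remove the interaural phase relations that carry localization cues. A direct O(N²) DFT keeps the example dependency-free; a real implementation would use an FFT:

```python
import cmath
import math
import random


def dft(x: list[float]) -> list[complex]:
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft_real(spec: list[complex]) -> list[float]:
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def randomize_phase(x: list[float], seed: int = 0) -> list[float]:
    """Same magnitude spectrum as x, random phases. DC (and Nyquist, for even lengths) are
    kept; bins k and n - k get conjugate values so the inverse transform is real."""
    rng = random.Random(seed)
    n = len(x)
    spec = dft(x)
    out = list(spec)
    for k in range(1, (n + 1) // 2):
        out[k] = cmath.rect(abs(spec[k]), rng.uniform(0.0, 2.0 * math.pi))
        out[n - k] = out[k].conjugate()
    return idft_real(out)


# Independent randomization per ear (assumed procedure) removes interaural phase cues.
rng = random.Random(3)
left = [rng.gauss(0.0, 1.0) for _ in range(32)]
right = [rng.gauss(0.0, 1.0) for _ in range(32)]
left_noise = randomize_phase(left, seed=1)
right_noise = randomize_phase(right, seed=2)
```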

12.
A model for predicting the intelligibility of processed noisy speech is proposed. The speech-based envelope power spectrum model has a structure similar to that of the model of Ewert and Dau [(2000). J. Acoust. Soc. Am. 108, 1181-1196], which was developed to account for modulation detection and masking data. The model estimates the speech-to-noise envelope power ratio, SNRenv, at the output of a modulation filterbank and relates this metric to speech intelligibility using the concept of an ideal observer. Predictions were compared with data on the intelligibility of speech presented in stationary speech-shaped noise. The model was further tested in conditions with noisy speech subjected to reverberation and to spectral subtraction. Good agreement between predictions and data was found in all cases. For spectral subtraction, an analysis of the model's internal representation of the stimuli revealed that the predicted decrease in intelligibility was caused by the estimated noise envelope power exceeding that of the speech. The classical concept of the speech transmission index fails in this condition. The results strongly suggest that the signal-to-noise ratio at the output of a modulation-frequency-selective process provides a key measure of speech intelligibility.
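
The decision metric can be sketched as follows, under our reading of the model: in each modulation channel, the envelope power of the noise alone is subtracted from that of the noisy speech and divided by the noise envelope power, and the per-channel values are combined as the square root of their summed squares. The front end (auditory filtering, envelope extraction, modulation filterbank) and the ideal-observer mapping to percent correct are omitted, and the small positive floor for negative estimates is an assumption of this sketch:

```python
import math


def snr_env(p_env_sn: float, p_env_n: float, floor: float = 1e-3) -> float:
    """Envelope-domain SNR for one channel: (P_S+N - P_N) / P_N, floored at a small value.

    The floor keeps the value defined when processing such as spectral subtraction makes
    the noise envelope power exceed that of the noisy speech (floor value is assumed).
    """
    if p_env_n <= 0.0:
        raise ValueError("noise envelope power must be positive")
    return max((p_env_sn - p_env_n) / p_env_n, floor)


def combined_snr_env(channel_values: list[float]) -> float:
    """Combine per-channel SNRenv values as the square root of their summed squares."""
    return math.sqrt(sum(v * v for v in channel_values))


print(combined_snr_env([snr_env(0.3, 0.1), snr_env(0.1, 0.2)]))
```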

13.
A wavelet representation of speech was used to display the instantaneous amplitude and phase within 14 octave frequency bands, representing the envelope and the carrier within each band. Adding stationary noise alters the wavelet pattern, which can be understood as a combination of three simultaneously occurring subeffects: two effects on the wavelet levels (one systematic and one stochastic) and one effect on the wavelet phases. Specific types of signal processing were applied to speech, allowing each effect to be either included or excluded. The impact of each effect (and of combinations of effects) on speech intelligibility was measured with CVC words. The systematic level effect (i.e., the increase of each speech wavelet intensity by the mean noise intensity) appeared to have the most degrading effect on speech intelligibility, in accordance with measures such as the modulation transfer function and the speech transmission index. However, the introduction of stochastic level fluctuations and the disturbance of the carrier phase also contribute seriously to reduced intelligibility in noise. It is argued that these stochastic effects are responsible for the limited success of spectral subtraction as a means of improving speech intelligibility. The results can provide clues for noise suppression that is effective with respect to intelligibility.
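
The split into a systematic level effect, stochastic level fluctuations, and a phase effect follows from adding complex wavelet coefficients. For one coefficient with speech part s and noise part n, |s + n|² = |s|² + |n|² + 2|s||n|cos(Δφ), where Δφ is their phase difference. Averaged over a uniformly random Δφ, the cross term vanishes and leaves the systematic increase |n|²; the cross term itself is the stochastic fluctuation, and arg(s + n) differs from arg(s), which is the phase effect. The sketch checks this numerically with hypothetical coefficient values:

```python
import cmath
import math


def decompose(s: complex, n: complex) -> dict[str, float]:
    """Split |s + n|^2 for one wavelet coefficient into systematic and stochastic parts."""
    return {
        "total": abs(s + n) ** 2,
        "systematic": abs(s) ** 2 + abs(n) ** 2,  # speech intensity plus noise intensity
        "stochastic": 2.0 * abs(s) * abs(n) * math.cos(cmath.phase(n) - cmath.phase(s)),
        "phase_shift": cmath.phase((s + n) / s),  # carrier-phase disturbance (radians)
    }


s = cmath.rect(1.0, 0.3)  # hypothetical speech coefficient: amplitude 1, phase 0.3 rad
k = 360
# Hypothetical noise coefficients of amplitude 0.5, phases spread evenly around the circle.
parts = [decompose(s, cmath.rect(0.5, 2.0 * math.pi * j / k)) for j in range(k)]
mean_total = sum(p["total"] for p in parts) / k
mean_stochastic = sum(p["stochastic"] for p in parts) / k
```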

14.
The evaluation of the intelligibility of noise reduction algorithms is reported. IEEE sentences and consonants were corrupted by four types of noise (babble, car, street, and train) at two signal-to-noise ratios (0 and 5 dB), and then processed by eight speech enhancement methods encompassing four classes of algorithms: spectral subtractive, subspace, statistical-model-based, and Wiener-type algorithms. The enhanced speech was presented to normal-hearing listeners for identification. With the exception of a single noise condition, no algorithm produced significant improvements in speech intelligibility. Information transmission analysis of the consonant confusion matrices indicated that no algorithm significantly improved the place feature score, which is critically important for speech recognition. The algorithms previously found to perform best in terms of overall quality were not the same algorithms that performed best in terms of speech intelligibility. The subspace algorithm, for instance, was previously found to perform the worst in terms of overall quality, but performed well in the present study in terms of preserving speech intelligibility. Overall, the analysis of consonant confusion matrices suggests that, in order for noise reduction algorithms to improve speech intelligibility, they need to improve the place and manner feature scores.
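
Corrupting speech at 0 or 5 dB SNR amounts to scaling the noise so that the speech-to-noise power ratio reaches the target value before adding it. A minimal sketch, assuming SNR is computed from whole-signal average power (studies differ; some use active speech level instead); the sinusoid and Gaussian noise are placeholders:

```python
import math
import random


def power(x: list[float]) -> float:
    """Mean power of a signal."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech: list[float], noise: list[float], snr_db: float) -> list[float]:
    """Scale `noise` so that 10*log10(P_speech / P_noise) == snr_db, then add it."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]
    gain = math.sqrt(power(speech) / (power(noise) * 10.0 ** (snr_db / 10.0)))
    return [s + gain * n for s, n in zip(speech, noise)]


rng = random.Random(0)
speech = [math.sin(2 * math.pi * 0.01 * k) for k in range(4000)]  # placeholder speech
noise = [rng.gauss(0.0, 1.0) for _ in range(4000)]                # placeholder noise
mixed_0db = mix_at_snr(speech, noise, 0.0)
mixed_5db = mix_at_snr(speech, noise, 5.0)
```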

15.
The present study sought to establish whether speech recognition can be disrupted by the presence of amplitude modulation (AM) in a remote spectral region, and whether that disruption depends on the rate of AM. The goal was to determine whether this paradigm could be used to examine which modulation frequencies in the speech envelope are most important for speech recognition. Consonant identification for a band of speech located in either the low- or high-frequency region was measured in the presence of a band of noise located in the opposite frequency region. The noise was either unmodulated or amplitude modulated by a sinusoid, a band of noise with a fixed absolute bandwidth, or a band of noise with a fixed relative bandwidth. The modulator frequency was 4, 16, 32, or 64 Hz. Small amounts of modulation interference were observed for all modulator types, irrespective of the location of the speech band. More importantly, the interference depended on modulation frequency, clearly supporting the existence of selectivity in modulation interference with speech stimuli. Overall, the results suggest a primary role for envelope fluctuations around 4 and 16 Hz, without excluding a possible contribution from faster rates.
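
The two noise modulators differ in how their bandwidth scales with the modulator frequency: a fixed absolute bandwidth is the same number of hertz at 4 Hz as at 64 Hz, whereas a fixed relative bandwidth is the same fraction of an octave, so it widens in hertz as the modulator frequency rises. A sketch of the band edges, where the 4-Hz and one-octave widths are hypothetical values, not the study's:

```python
def band_edges_absolute(center_hz: float, bandwidth_hz: float) -> tuple[float, float]:
    """Band of fixed absolute width, arithmetically centered on center_hz."""
    return center_hz - bandwidth_hz / 2.0, center_hz + bandwidth_hz / 2.0


def band_edges_relative(center_hz: float, octaves: float) -> tuple[float, float]:
    """Band of fixed relative width: edges a factor 2**(octaves/2) below and above center_hz."""
    factor = 2.0 ** (octaves / 2.0)
    return center_hz / factor, center_hz * factor


for fm in (4.0, 16.0, 32.0, 64.0):  # modulator frequencies used in the study
    lo_a, hi_a = band_edges_absolute(fm, 4.0)  # hypothetical 4-Hz width
    lo_r, hi_r = band_edges_relative(fm, 1.0)  # hypothetical 1-octave width
    print(f"{fm:4.0f} Hz: absolute {lo_a:.1f}-{hi_a:.1f} Hz, relative {lo_r:.1f}-{hi_r:.1f} Hz")
```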

16.
Although cochlear implant (CI) users achieve good speech recognition in quiet, they still have difficulty understanding speech in noise. We conducted three experiments to determine whether a directional microphone and an adaptive multichannel noise reduction algorithm could enhance CI performance in noise, and whether the Speech Transmission Index (STI) can be used to predict CI performance under various acoustic and signal processing conditions. In Experiment I, CI users listened to speech in noise processed by four hearing aid settings: omnidirectional microphone, omnidirectional microphone plus noise reduction, directional microphone, and directional microphone plus noise reduction. The directional microphone significantly improved speech recognition in noise. Both the directional microphone and the noise reduction algorithm improved overall preference. In Experiment II, normal-hearing individuals listened to the recorded speech processed by 4- or 8-channel CI simulations. The 8-channel simulation yielded speech recognition results similar to those in Experiment I, whereas the 4-channel simulation produced no significant difference among the four settings. In Experiment III, we examined the relationship between STIs and speech recognition. The results suggested that the STI could predict actual and simulated CI speech intelligibility with acoustic degradation and with the directional microphone, but not with the noise reduction algorithm. Implications for intelligibility enhancement are discussed.
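
The core of the classic STI computation maps each modulation transfer value m to an apparent SNR of 10 log10(m / (1 - m)) dB, clips it to ±15 dB, and rescales it to a transmission index between 0 and 1. A deliberately simplified sketch: it averages the indices with equal weight, omitting the octave-band weighting, redundancy, and masking corrections of the standardized procedure:

```python
import math


def transmission_index(m: float) -> float:
    """Map one modulation transfer value m to a transmission index in [0, 1]."""
    m = min(max(m, 1e-6), 1.0 - 1e-6)          # keep the logarithm finite
    snr_app = 10.0 * math.log10(m / (1.0 - m))  # apparent SNR in dB
    snr_app = min(max(snr_app, -15.0), 15.0)    # clip to +/-15 dB
    return (snr_app + 15.0) / 30.0


def simplified_sti(mtf: list[float]) -> float:
    """Unweighted mean transmission index over all (octave band, modulation frequency) cells."""
    return sum(transmission_index(m) for m in mtf) / len(mtf)


print(simplified_sti([0.9, 0.7, 0.5, 0.3]))
```

This mapping only sees how much modulation depth survives, which is one reason an envelope-domain index can misjudge nonlinear processing such as noise reduction.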

17.
Synthesis (carrier) signals in acoustic models embody assumptions about the perception of auditory electric stimulation. This study compared the intelligibility of consonants and vowels processed through a set of nine acoustic models that used Spectral Peak (SPEAK) and Advanced Combination Encoder (ACE)-like speech processing, with synthesis signals representative of those used previously in acoustic models as well as two new ones. The performance of the synthesis signals was assessed by their correspondence with cochlear implant (CI) listener results for 12 attributes of phoneme perception (consonant and vowel recognition; F1, F2, and duration information transmission for vowels; voicing, manner, place of articulation, affrication, burst, nasality, and amplitude envelope information transmission for consonants), using four measures of performance. Modulated synthesis signals produced the best correspondence with CI consonant intelligibility, while sinusoids, narrow noise bands, and varying noise bands produced the best correspondence with CI vowel intelligibility. The signals that performed best overall (in terms of correspondence with both vowel and consonant attributes) were modulated and unmodulated noise bands of varying bandwidth, corresponding to an excitation width that varied linearly from 0.4 mm at the apical channel to 8 mm at the basal channel.
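
An excitation width in millimeters can be turned into a noise-band width in hertz through a cochlear place-frequency map. A sketch assuming the Greenwood (1990) human map, F = 165.4 (10^(0.06 x) - 0.88) with x in mm from the apex over a 35-mm cochlea; the channel count and channel positions are hypothetical, since the abstract gives only the 0.4-mm and 8-mm endpoints:

```python
COCHLEA_LENGTH_MM = 35.0  # typical human value assumed by the Greenwood fit below


def greenwood_hz(x_mm: float) -> float:
    """Greenwood (1990) human place-frequency map; x_mm is the distance from the apex."""
    return 165.4 * (10.0 ** (0.06 * x_mm) - 0.88)


def excitation_widths(n_channels: int, apical_mm: float = 0.4,
                      basal_mm: float = 8.0) -> list[float]:
    """Excitation width per channel, varying linearly from the apical to the basal channel."""
    if n_channels < 1:
        raise ValueError("need at least one channel")
    if n_channels == 1:
        return [apical_mm]
    step = (basal_mm - apical_mm) / (n_channels - 1)
    return [apical_mm + i * step for i in range(n_channels)]


def bandwidth_hz(center_mm: float, width_mm: float) -> float:
    """Width in Hz of a cochlear region width_mm wide, centered at center_mm."""
    lo = max(center_mm - width_mm / 2.0, 0.0)
    hi = min(center_mm + width_mm / 2.0, COCHLEA_LENGTH_MM)
    return greenwood_hz(hi) - greenwood_hz(lo)


widths = excitation_widths(22)  # hypothetical 22-channel model, apical channel first
```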

18.
Recent research shows that combined electric and acoustic stimulation (EAS) significantly improves speech recognition in noise, and it is generally established that access to the improved F0 representation of target speech, along with glimpsing cues, provides the EAS benefit. Under noisy listening conditions, noise degrades these important cues by introducing undesired time-frequency components and corrupting the harmonic structure. In this study, the potential of combining noise reduction and harmonics regeneration techniques to further improve speech intelligibility in noise, by providing improved cues for EAS, was investigated. Three hypotheses were tested: (1) noise reduction methods can improve speech intelligibility in noise for EAS; (2) harmonics regeneration after noise reduction can further improve speech intelligibility in noise for EAS; and (3) harmonic sideband constraints in the frequency domain (or, equivalently, amplitude modulation in the time domain), even deterministic ones, can provide additional benefits. Test results demonstrate that combining noise reduction and harmonics regeneration can significantly improve speech recognition in noise for EAS, and that it is also beneficial to preserve the harmonic sidebands under adverse listening conditions. This finding warrants further work on algorithms that regenerate harmonics and their sidebands for EAS processing under noisy conditions.
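
The abstract does not say how harmonics are regenerated. A common principle in the speech-enhancement literature is to pass the signal through a nonlinearity such as a half-wave rectifier, which re-creates energy at multiples of F0. The sketch demonstrates only that principle on a pure tone: by the Fourier series of a half-wave rectified sine, rectifying a unit tone at F0 leaves 0.5 at F0 and creates a component at 2·F0 with amplitude 2/(3π), about 0.212. The 100-Hz F0 is hypothetical:

```python
import cmath
import math


def dft_amplitude(x: list[float], freq_hz: float, fs: int) -> float:
    """Amplitude of the sinusoidal component at freq_hz (exact when it falls on a DFT bin)."""
    acc = sum(v * cmath.exp(-2j * math.pi * freq_hz * i / fs) for i, v in enumerate(x))
    return 2.0 * abs(acc) / len(x)


def half_wave_rectify(x: list[float]) -> list[float]:
    """Keep positive samples and zero the negative ones."""
    return [max(v, 0.0) for v in x]


fs, f0 = 8000, 100.0  # 1 s of a tone at a hypothetical 100-Hz F0
tone = [math.sin(2 * math.pi * f0 * i / fs) for i in range(fs)]
rectified = half_wave_rectify(tone)
print(f"2*F0 before: {dft_amplitude(tone, 2 * f0, fs):.4f}, after: {dft_amplitude(rectified, 2 * f0, fs):.4f}")
```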

19.
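Auditory-perception robust principal component analysis for unsupervised speech denoising
Min Gang, Zou Xia, Han Wei, Zhang Xiongwei, Tan Wei. Acta Acustica (声学学报), 2017, 42(2): 246-256
To address the problems that existing sparse and low-rank decomposition methods for speech denoising make insufficient use of the perceptual characteristics of human hearing and that the resulting speech distortion is easily perceived, an auditory-perception robust principal component analysis (RPCA) method for speech denoising is proposed. Because the cochlear basilar membrane perceives frequency nonlinearly, the method uses the cochleagram as the basis for separating speech from noise. In addition, the Itakura-Saito distance, which is consistent with the perceptual characteristics of human hearing, is chosen as the optimization objective; nonnegativity constraints are introduced into the sparse and low-rank modeling so that the decomposed components have a more realistic physical meaning; and an iterative optimization algorithm with closed-form solutions is derived within the alternating direction method of multipliers (ADMM) framework. The method is fully unsupervised and requires no pre-trained speech or noise models. Simulation experiments with various types of noise and different signal-to-noise ratios verify its effectiveness: its noise suppression is more pronounced than that of current comparable algorithms, and the intelligibility and overall quality of the denoised speech are improved or at least comparable.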
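
The Itakura-Saito (IS) distance compares two positive power values x and y as d_IS(x | y) = x/y - ln(x/y) - 1. It is zero only when x = y, it is asymmetric, and it is scale-invariant (d_IS(ax | ay) = d_IS(x | y)), so errors in quiet time-frequency regions weigh as much as errors in loud ones, one reason it is often described as perceptually motivated. The sketch only evaluates the divergence between two small cochleagram-like matrices with hypothetical values; the paper's nonnegative RPCA model and ADMM updates are not reproduced:

```python
import math


def is_divergence(x: float, y: float) -> float:
    """Itakura-Saito divergence d(x | y) = x/y - ln(x/y) - 1 for positive powers x, y."""
    r = x / y
    return r - math.log(r) - 1.0


def is_divergence_matrix(X: list[list[float]], Y: list[list[float]]) -> float:
    """Sum of element-wise IS divergences between two positive matrices (e.g. cochleagrams)."""
    return sum(is_divergence(x, y) for row_x, row_y in zip(X, Y) for x, y in zip(row_x, row_y))


X = [[1.0, 2.0], [0.5, 4.0]]  # hypothetical observed cochleagram powers
Y = [[1.0, 1.0], [1.0, 4.0]]  # hypothetical model estimate
print(is_divergence_matrix(X, Y))
```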

20.
When listeners hear a target signal in the presence of competing sounds, they are quite good at extracting information at instants when the local signal-to-noise ratio of the target is most favorable. Previous research suggests that listeners can easily understand a periodically interrupted target when it is interleaved with noise. It is not clear whether this ability extends to the case where an interrupted target is alternated with a speech masker rather than with noise. This study examined speech intelligibility in the presence of noise or speech maskers that were either continuous or interrupted at one of six rates between 4 and 128 Hz. With noise maskers, listeners performed significantly better with interrupted than with continuous maskers. With speech maskers, however, performance was better with continuous than with interrupted maskers. Presumably, the listeners used continuity as a cue to distinguish the continuous masker from the interrupted target. Intelligibility in the interrupted-masker condition was improved by introducing a pitch difference between the target and the speech masker. These results highlight the role that target-masker differences in continuity and pitch play in the segregation of competing speech signals.
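
In the interrupted-masker condition, the target and masker alternate, so their on/off gates are complementary. A minimal sketch, assuming a 50% duty cycle and, as a further assumption, octave-spaced rates (4, 8, 16, 32, 64, and 128 Hz form one set of six rates between 4 and 128 Hz, but the abstract does not list them):

```python
def alternating_gates(n_samples: int, fs: int, rate_hz: float) -> tuple[list[int], list[int]]:
    """Return (target_gate, masker_gate): the target is on in the first half of each cycle
    and the masker in the second half, so the two never overlap (50% duty cycle assumed)."""
    target = [1 if (i / fs * rate_hz) % 1.0 < 0.5 else 0 for i in range(n_samples)]
    masker = [1 - g for g in target]
    return target, masker


for rate in (4, 8, 16, 32, 64, 128):  # assumed octave-spaced rates
    print(f"{rate:3d} Hz: each on/off segment lasts {1000.0 / (2 * rate):.2f} ms")

target_gate, masker_gate = alternating_gates(16000, 16000, 4.0)
```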
