Similar Documents
20 similar documents retrieved.
1.
Reverberation interferes with the ability to understand speech in rooms. Overlap-masking explains this degradation by assuming that reverberant phonemes endure in time and mask subsequent reverberant phonemes. Most listeners benefit from binaural listening when reverberation is present, indicating that the listener's binaural system processes the two channels to reduce the reverberation. This paper investigates the hypothesis that the binaural word intelligibility advantage found in reverberation is a result of binaural overlap-masking release, with the reverberation acting as masking noise. The tests use phonetically balanced word lists (ANSI S3.2-1989) that are presented diotically and binaurally with recorded reverberation and reverberation-like noise. A small room, 62 m³, reverberates the words. These are recorded using two microphones without additional noise sources. The reverberation-like noise is a modified form of these recordings and has a similar spectral content; a phase-randomization procedure removes its binaural localization cues. Listening to the reverberant words binaurally improves intelligibility by 6.0% over diotic listening. The binaural intelligibility advantage for reverberation-like noise is only 2.6%. This indicates that binaural overlap-masking release is insufficient to explain the entire binaural word intelligibility advantage in reverberation.

2.
Recent research results show that combined electric and acoustic stimulation (EAS) significantly improves speech recognition in noise, and it is generally established that access to the improved F0 representation of target speech, along with glimpse cues, provides the EAS benefit. Under noisy listening conditions, noise signals degrade these important cues by introducing undesired temporal-frequency components and corrupting the harmonic structure. In this study, the potential of combining noise reduction and harmonics regeneration techniques was investigated to further improve speech intelligibility in noise by providing improved beneficial cues for EAS. Three hypotheses were tested: (1) noise-reduction methods can improve speech intelligibility in noise for EAS; (2) harmonics regeneration after noise reduction can further improve speech intelligibility in noise for EAS; and (3) harmonic sideband constraints in the frequency domain (or, equivalently, amplitude modulation in the temporal domain), even deterministic ones, can provide additional benefits. Test results demonstrate that combining noise reduction and harmonics regeneration can significantly improve speech recognition in noise for EAS, and that it is also beneficial to preserve the harmonic sidebands under adverse listening conditions. This finding warrants further work on algorithms that regenerate harmonics and the related sidebands for EAS processing under noisy conditions.

3.
Two experiments investigated the impact of reverberation and masking on speech understanding using cochlear implant (CI) simulations. Experiment 1 tested sentence recognition in quiet. Stimuli were processed with reverberation simulation (T=0.425, 0.266, 0.152, and 0.0 s) and then either processed with vocoding (6, 12, or 24 channels) or were subjected to no further processing. Reverberation alone had only a small impact on perception when as few as 12 channels of information were available. However, when the processing was limited to 6 channels, perception was extremely vulnerable to the effects of reverberation. In experiment 2, subjects listened to reverberated sentences, through 6- and 12-channel processors, in the presence of either speech-spectrum noise (SSN) or two-talker babble (TTB) at various target-to-masker ratios. The combined impact of reverberation and masking was profound, although there was no interaction between the two effects. This differs from results obtained in subjects listening to unprocessed speech where interactions between reverberation and masking have been shown to exist. A speech transmission index (STI) analysis indicated a reasonably good prediction of speech recognition performance. Unlike previous investigations, the SSN and TTB maskers produced equivalent results, raising questions about the role of informational masking in CI processed speech.
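The CI simulations described above rely on channel vocoding. A minimal noise-excited envelope vocoder is sketched below; the band edges, brick-wall FFT filters, and 2 ms envelope smoother are illustrative choices, not the parameters used in the study.

```python
import numpy as np

def noise_vocode(signal, fs, n_channels=6, lo_hz=200.0, hi_hz=7000.0):
    """Minimal noise-excited envelope vocoder (CI-simulation sketch).

    Splits the signal into log-spaced bands, extracts a smoothed
    amplitude envelope per band, and uses it to modulate band-limited
    noise. All parameter choices here are illustrative.
    """
    n = len(signal)
    spec = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    # log-spaced analysis band edges between lo_hz and hi_hz
    edges = np.logspace(np.log2(lo_hz), np.log2(hi_hz), n_channels + 1, base=2.0)
    rng = np.random.default_rng(0)
    out = np.zeros(n)
    w = max(1, int(0.002 * fs))  # ~2 ms moving-average envelope smoother
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_band = (freqs >= lo) & (freqs < hi)
        band = np.fft.irfft(np.where(in_band, spec, 0.0), n)
        env = np.convolve(np.abs(band), np.ones(w) / w, mode="same")
        noise_spec = np.fft.rfft(rng.standard_normal(n))
        carrier = np.fft.irfft(np.where(in_band, noise_spec, 0.0), n)
        out += env * carrier
    return out
```

With 6 channels the output preserves only coarse spectral envelopes, which is why reverberant smearing is so damaging in that condition.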

4.
In everyday listening, both background noise and reverberation degrade the speech signal. Psychoacoustic evidence suggests that human speech perception under reverberant conditions relies mostly on monaural processing. While speech segregation based on periodicity has achieved considerable progress in handling additive noise, little research in monaural segregation has been devoted to reverberant scenarios. Reverberation smears the harmonic structure of speech signals, and our evaluations using a pitch-based segregation algorithm show that an increase in the room reverberation time causes degraded performance due to weakened periodicity in the target signal. We propose a two-stage monaural separation system that combines the inverse filtering of the room impulse response corresponding to target location and a pitch-based speech segregation method. As a result of the first stage, the harmonicity of a signal arriving from target direction is partially restored while signals arriving from other directions are further smeared, and this leads to improved segregation. A systematic evaluation of the system shows that the proposed system results in considerable signal-to-noise ratio gains across different conditions. Potential applications of this system include robust automatic speech recognition and hearing aid design.

5.
Listening conditions in everyday life typically include a combination of reverberation and nonstationary background noise. It is well known that sentence intelligibility is adversely affected by these factors. To assess their combined effects, an approach is introduced which combines two methods of predicting speech intelligibility, the extended speech intelligibility index (ESII) and the speech transmission index. First, the effects of reverberation on nonstationary noise (i.e., reduction of masker modulations) and on speech modulations are evaluated separately. Subsequently, the ESII is applied to predict the speech reception threshold (SRT) in the masker with reduced modulations. To validate this approach, SRTs were measured for ten normal-hearing listeners, in various combinations of nonstationary noise and artificially created reverberation. After taking the characteristics of the speech corpus into account, results show that the approach accurately predicts SRTs in nonstationary noise and reverberation for normal-hearing listeners. Furthermore, it is shown that, when reverberation is present, the benefit from masker fluctuations may be substantially reduced.

6.
The benefits of combined electric and acoustic stimulation (EAS) in terms of speech recognition in noise are well established; however, the underlying factors responsible for this benefit are not clear. The present study tests the hypothesis that having access to acoustic information in the low frequencies makes it easier for listeners to glimpse the target. Normal-hearing listeners were presented with vocoded speech alone (V), low-pass (LP) filtered speech alone, combined vocoded and LP speech (LP+V), and with vocoded stimuli constructed so that the low-frequency envelopes were easier to glimpse. Target speech was mixed with two types of maskers (steady-state noise and competing talker) at -5 to 5 dB signal-to-noise ratios. Results indicated no advantage of LP+V in steady noise, but a significant advantage over V in the competing talker background, an outcome consistent with the notion that it is easier for listeners to glimpse the target in fluctuating maskers. A significant improvement in performance was noted with the modified glimpsed stimuli over the original vocoded stimuli. Taken together, these findings suggest that a significant factor contributing to the EAS advantage is the enhanced ability to glimpse the target.

7.
Little is known about the extent to which reverberation affects speech intelligibility by cochlear implant (CI) listeners. Experiment 1 assessed CI users' performance using Institute of Electrical and Electronics Engineers (IEEE) sentences corrupted with varying degrees of reverberation. Reverberation times of 0.30, 0.60, 0.80, and 1.0 s were used. Results indicated that for all subjects tested, speech intelligibility decreased exponentially with an increase in reverberation time. A decaying-exponential model provided an excellent fit to the data. Experiment 2 evaluated (offline) a speech coding strategy for reverberation suppression using a channel-selection criterion based on the signal-to-reverberant ratio (SRR) of individual frequency channels. The SRR reflects implicitly the ratio of the energies of the signal originating from the early (and direct) reflections and the signal originating from the late reflections. Channels with SRR larger than a preset threshold were selected, while channels with SRR smaller than the threshold were zeroed out. Results in a highly reverberant scenario indicated that the proposed strategy led to substantial gains (over 60 percentage points) in speech intelligibility over the subjects' daily strategy. Further analysis indicated that the proposed channel-selection criterion reduces the temporal envelope smearing effects introduced by reverberation and also diminishes the self-masking effects responsible for flattened formants.
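The channel-selection rule above reduces to a binary keep-or-zero decision per frequency channel. A sketch, assuming precomputed per-channel SRR values and a hypothetical -5 dB threshold (the study's threshold is not stated here):

```python
import numpy as np

def srr_select(envelopes, srr_db, threshold_db=-5.0):
    """SRR-based channel selection sketch.

    Keeps time-frequency channel envelopes whose signal-to-reverberant
    ratio exceeds a preset threshold and zeroes out the rest.
    `envelopes` and `srr_db` have shape (n_channels, n_frames); the
    -5 dB default is an illustrative choice.
    """
    mask = srr_db > threshold_db
    return envelopes * mask
```

In the study this criterion was applied offline, where the SRR of each channel can be computed exactly from the separated early and late reverberant components.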

8.
Reverberation usually degrades speech intelligibility for spatially separated speech and noise sources since spatial unmasking is reduced and late reflections decrease the fidelity of the received speech signal. The latter effect could not satisfactorily be predicted by a recently presented binaural speech intelligibility model [Beutelmann et al. (2010). J. Acoust. Soc. Am. 127, 2479-2497]. This study therefore evaluated three extensions of the model to improve its predictions: (1) an extension of the speech intelligibility index based on modulation transfer functions, (2) a correction factor based on the room acoustical quantity "definition," and (3) a separation of the speech signal into useful and detrimental parts. The predictions were compared to results of two experiments in which speech reception thresholds were measured in a reverberant room in quiet and in the presence of a noise source for listeners with normal hearing. All extensions yielded better predictions than the original model when the influence of reverberation was strong, while predictions were similar for conditions with less reverberation. Although model (3) differed substantially in the assumed interaction of binaural processing and early reflections, its predictions were very similar to those of model (2), which achieved the best fit to the data.

9.
Wu Lifu, Wang Hua, Cheng Yi, Guo Yecai. 《应用声学》 (Applied Acoustics), 2016, 35(4): 288-293
Reverberation is an important phenomenon in room acoustics, and the reverberation time must be measured or estimated in both room design and audio signal processing. This paper improves a maximum-likelihood-based method for blind reverberation-time estimation, i.e., a method that estimates the reverberation time from the reverberant speech produced by a talker speaking naturally in a room. The method first determines the optimal boundaries of speech decay segments, then computes two additional parameters for each decay segment to screen out the qualifying segments, and finally applies maximum-likelihood estimation to the retained segments to obtain the reverberation-time estimate. Simulations under five different reverberation-time conditions show that, compared with the existing method, the improved method yields estimates with smaller bias and lower variance relative to the true reverberation time, i.e., higher estimation accuracy.

10.
For a mixture of target speech and noise in anechoic conditions, the ideal binary mask is defined as follows: It selects the time-frequency units where target energy exceeds noise energy by a certain local threshold and cancels the other units. In this study, the definition of the ideal binary mask is extended to reverberant conditions. Given the division between early and late reflections in terms of speech intelligibility, three ideal binary masks can be defined: an ideal binary mask that uses the direct path of the target as the desired signal, an ideal binary mask that uses the direct path and early reflections of the target as the desired signal, and an ideal binary mask that uses the reverberant target as the desired signal. The effects of these ideal binary mask definitions on speech intelligibility are compared across two types of interference: speech shaped noise and concurrent female speech. As suggested by psychoacoustical studies, the ideal binary mask based on the direct path and early reflections of target speech outperforms the other masks as reverberation time increases and produces substantial reductions in terms of speech reception threshold for normal hearing listeners.
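The definition above maps directly to code: a unit is kept when the target-to-noise energy ratio exceeds the local criterion (LC). A sketch on magnitude time-frequency representations; which signal serves as "target" (direct path, direct path plus early reflections, or full reverberant target) selects among the three mask definitions compared in the study.

```python
import numpy as np

def ideal_binary_mask(target_tf, noise_tf, lc_db=0.0):
    """Ideal binary mask: 1 where target energy exceeds noise energy
    by the local criterion LC (in dB), 0 elsewhere.

    `target_tf` and `noise_tf` are same-shaped magnitude T-F
    representations (e.g., gammatone filterbank or STFT outputs).
    The 0 dB default LC is illustrative.
    """
    target_db = 20.0 * np.log10(np.abs(target_tf) + 1e-12)
    noise_db = 20.0 * np.log10(np.abs(noise_tf) + 1e-12)
    return (target_db - noise_db > lc_db).astype(float)
```

Applying the mask to the mixture and resynthesizing yields the processed stimuli whose intelligibility the study measures.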

11.
Speech reception thresholds were measured in virtual rooms to investigate the influence of reverberation on speech intelligibility for spatially separated targets and interferers. The measurements were realized under headphones, using target sentences and noise or two-voice interferers. The room simulation allowed variation of the absorption coefficient of the room surfaces independently for target and interferer. The direct-to-reverberant ratio and interaural coherence of sources were also varied independently by considering binaural and diotic listening. The main effect of reverberation on the interferer was binaural and mediated by the coherence, in agreement with binaural unmasking theories. It appeared at lower reverberation levels than the effect of reverberation on the target, which was mainly monaural and associated with the direct-to-reverberant ratio, and could be explained by the loss of amplitude modulation in the reverberant speech signals. This effect was slightly smaller when listening binaurally. Reverberation might also be responsible for a disruption of the mechanism by which the auditory system exploits fundamental frequency differences to segregate competing voices, and a disruption of the "listening in the gaps" associated with speech interferers. These disruptions may explain an interaction observed between the effects of reverberation on the targets and two-voice interferers.
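The direct-to-reverberant ratio central to the monaural effect above can be computed from a room impulse response. A sketch, using the common convention of a short window around the direct-path peak (the 2.5 ms window is a conventional choice, not a parameter from this study):

```python
import numpy as np

def direct_to_reverberant_ratio(rir, fs, direct_window_ms=2.5):
    """Direct-to-reverberant ratio (DRR) of a room impulse response, in dB.

    Energy inside a +/- `direct_window_ms` window around the
    direct-path peak is treated as direct; everything after the
    window is reverberant.
    """
    rir = np.asarray(rir, float)
    peak = int(np.argmax(np.abs(rir)))
    half = int(direct_window_ms * 1e-3 * fs)
    lo, hi = max(0, peak - half), peak + half + 1
    direct = np.sum(rir[lo:hi] ** 2)
    late = np.sum(rir[hi:] ** 2)
    return 10.0 * np.log10(direct / max(late, 1e-300))
```

Raising surface absorption in the simulated room raises the DRR, which is how the study varies this quantity independently for target and interferer.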

12.
Listeners were given the task to identify the stop-consonant [t] in the test-word "stir" when the word was embedded in a carrier sentence. Reverberation was added to the test-word, but not to the carrier, and the ability to identify the [t] decreased because the amplitude modulations associated with the [t] were smeared. When a similar amount of reverberation was also added to the carrier sentence, the listeners' ability to identify the stop-consonant was restored. This phenomenon has been considered in previous research as evidence for an extrinsic compensation mechanism for reverberation in the human auditory system [Watkins (2005). J. Acoust. Soc. Am. 118, 249-262]. In the present study, the reverberant test-word was embedded in additional non-reverberant carriers, such as white noise, speech-shaped noise, and amplitude-modulated noise. In addition, a reference condition was included where the test-word was presented in isolation, i.e., without any carrier stimulus. In all of these conditions, the ability to identify the stop-consonant [t] was enhanced relative to the condition using the non-reverberant speech carrier. The results suggest that the non-reverberant speech carrier produces an interference effect that impedes the identification of the stop-consonant. These findings raise doubts about the existence of the compensation mechanism.

13.
Eight normal listeners and eight listeners with sensorineural hearing losses were compared on a gap-detection task and on a speech perception task. The minimum detectable gap (71% correct) was determined as a function of noise level, and a time constant was computed from these data for each listener. The time constants of the hearing-impaired listeners were significantly longer than those of the normal listeners. The speech consisted of sentences that were mixed with two levels of noise and subjected to two kinds of reverberation (real or simulated). The speech thresholds (minimum signal-to-noise ratio for 50% correct) were significantly higher for the hearing-impaired listeners than for the normal listeners for both kinds of reverberation. The longer reverberation times produced significantly higher thresholds than the shorter times. The time constant was significantly correlated with all the speech threshold measures (r = -0.58 to -0.74) and a measure of hearing threshold loss also correlated significantly with all the speech thresholds (r = 0.53 to 0.95). A principal components analysis yielded two factors that accounted for the intercorrelations. The factor loadings for the time constant were similar to those on the speech thresholds for real reverberation and the loadings for hearing loss were similar to those of the thresholds for simulated reverberation.

14.
The reliability of algorithms for room acoustic simulations has often been confirmed on the basis of the verification of predicted room acoustical parameters. This paper presents a complementary perceptual validation procedure consisting of two experiments, dealing respectively with speech intelligibility and with sound source front–back localisation. The evaluated simulation algorithm, implemented in the software ODEON®, is a hybrid method that is based on an image source algorithm for the prediction of early sound reflections and on ray-tracing for the later part, using a stochastic scattering process with secondary sources. The binaural room impulse response (BRIR) is calculated from a simulated room impulse response in which the arrival time, intensity, and spatial direction of each sound reflection are collected and convolved with a measured Head Related Transfer Function (HRTF). The listening stimuli for the speech intelligibility and localisation tests are auralised convolutions of anechoic sound samples with measured and simulated BRIRs. Perception tests were performed with human subjects in two acoustical environments, i.e., an anechoic and a reverberant room, by presenting the stimuli to subjects in a natural way, and via headphones by using two non-individualized HRTFs (artificial head and hearing aids placed on the ears of the artificial head) of both a simulated and a real room. Very good correspondence is found between the results obtained with simulated and measured BRIRs, both for speech intelligibility in the presence of noise and for sound source localisation tests. In the anechoic room an increase in speech intelligibility is observed when noise and signal are presented from sources located at different angles. This improvement is not so evident in the reverberant room, with the sound sources at 1-m distance from the listener.
Interestingly, the performance of people for front–back localisation is better in the reverberant room than in the anechoic room. The correlation between people's ability for sound source localisation on one hand, and their ability for recognition of binaurally received speech in reverberation on the other hand, is found to be weak.

15.
This paper presents a comparison between measured and calculated acoustical parameters in eight high school classrooms. The mid-frequency unoccupied and occupied reverberation times and the 1 kHz sound propagation (SP) of the reverberant and total speech levels in occupied classrooms were compared with analytical and numerical predictions. The ODEON 6.5 code and the Sabine formula gave the most accurate results for reverberation time in the empty classrooms, with overall relative differences of 8.1% and 9.7%, respectively. With students present, the Eyring and Sabine formulas and Hodgson's empirical model proved the most accurate, with relative differences of 11.1%, 13.2%, and 13.6%, respectively. The reverberant speech levels decrease with increasing distance from the source at rates varying from −1.21 to −2.62 dB per distance doubling, and the Hodgson model fits the slope values quite well. The best predictions of the SP of the reverberant and total speech levels are shown, in order of accuracy, by the ODEON code, the Barron and Lee theory, and the classical diffuse field theory. Lower rms errors were found when the measured total acoustic absorptions were used. The lowest rms error, 1.4 dB for the SP of the total speech level, was found for both the ODEON code and the Barron and Lee theory.
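The Sabine and Eyring predictions compared above are simple closed-form expressions. A sketch in metric units (0.161 s/m is the standard metric constant); `absorption_m2` is the total absorption A = Σ Sᵢαᵢ in sabins:

```python
import math

def rt_sabine(volume_m3, absorption_m2):
    """Sabine reverberation time: T = 0.161 V / A (metric units)."""
    return 0.161 * volume_m3 / absorption_m2

def rt_eyring(volume_m3, surface_m2, mean_alpha):
    """Eyring reverberation time: T = 0.161 V / (-S ln(1 - mean_alpha)),
    where mean_alpha is the area-weighted mean absorption coefficient."""
    return 0.161 * volume_m3 / (-surface_m2 * math.log(1.0 - mean_alpha))
```

For the same total absorption, Eyring predicts a shorter reverberation time than Sabine; the two converge as the mean absorption coefficient becomes small.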

16.
Speech signals recorded with a distant microphone are usually corrupted by room reverberation, which severely degrades the clarity and intelligibility of the speech. A speech dereverberation method based on spectral subtraction and spectral-line enhancement is proposed in this paper. Following the generalized statistical reverberation model, the power spectrum of the late reverberation is estimated and subtracted from the reverberant speech. Then, according to a human auditory model, a spectral-line enhancement technique based on adaptive post-filtering is adopted to further eliminate the reverberant components between adjacent speech formants. The proposed method effectively suppresses the reverberation and improves the auditory perception of speech. Subjective and objective evaluation results show that the perceptual quality of speech is greatly improved by the proposed method.
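The spectral-subtraction stage can be sketched as follows, assuming a Lebart-style generalized statistical model: the late-reverberant power at frame n is the reverberant power N_l frames earlier, scaled by the room's exponential energy decay (60 dB over T60). The 50 ms early/late boundary and the gain floor are illustrative values, not those of the paper.

```python
import numpy as np

def dereverb_gain(power_spec, frame_rate, t60, late_start=0.05, floor=0.1):
    """Spectral-subtraction gain sketch for late-reverberation suppression.

    `power_spec` is (n_frames, n_bins); `frame_rate` is frames per
    second. Late-reverberant power is modeled as the power `late_start`
    seconds earlier, attenuated by the room decay over that interval
    (power falls 60 dB over T60, i.e., by exp(-13.8 t / T60)).
    """
    n_l = max(1, int(round(late_start * frame_rate)))
    decay = np.exp(-13.8 * late_start / t60)
    late = np.zeros_like(power_spec)
    late[n_l:] = decay * power_spec[:-n_l]
    # Wiener-like subtraction gain, floored to limit musical noise
    return np.maximum(1.0 - late / np.maximum(power_spec, 1e-12), floor)
```

Multiplying the short-time spectrum by this gain and resynthesizing gives the dereverberated signal; the paper then applies its adaptive post-filter on top of this stage.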

17.
Speech perception by subjects with sensorineural hearing impairment was studied using various types of short-term (syllabic) amplitude compression. Average speech level was approximately constant. In quiet, a single-channel wideband compression (WBC) with a compression ratio of 10, attack time 10 ms, and release time 90 ms produced significantly higher scores than a three-channel multiband compression (MBC) or no compression when a nonsense syllable test (City University of New York) was used. The scores under MBC, WBC, or no compression were not significantly different when the modified rhyme test (MRT) was used. But when overshoots caused by compression were clipped, the MRT scores improved significantly. The influence of MBC on reverberant speech and of WBC on noisy speech was tested with the MRT. Reverberation reduced the scores, and this reduction was the same with compression as without. Noise added to speech before compression also reduced the scores, but the reduction was larger with compression than without. When noise was added after compression, an improvement was observed when WBC had a compression ratio of about 5, attack time 1 ms, and release time 30 ms. Other compression modes (e.g., with high-frequency pre-emphasis) did not show an improvement. The results indicate that WBC with a compression ratio around 5, attack time shorter than 3 ms, and release time between 30 and 90 ms can be beneficial if the signal-to-noise ratio is large, or if, in a noisy or reverberant environment, the effects of noise or reverberation are eliminated by using listening systems.

18.
Perceptual distances among single tokens of American English vowels were established for nonreverberant and reverberant conditions. Fifteen vowels in the phonetic context (b-t), embedded in the sentence "Mark the (b-t) again" were recorded by a male talker. For the reverberant condition, the sentences were played through a room with a reverberation time of 1.2 s. The CVC syllables were removed from the sentences and presented in pairs to ten subjects with audiometrically normal hearing, who judged the similarity of the syllable pairs separately for the nonreverberant and reverberant conditions. The results were analyzed by multidimensional scaling procedures, which showed that the perceptual data were accounted for by a three-dimensional vowel space. Correlations were obtained between the coordinates of the vowels along each dimension and selected acoustic parameters. For both conditions, dimensions 1 and 2 were highly correlated with formant frequencies F2 and F1, respectively, and dimension 3 was correlated with the product of the duration of the vowels and the difference between F3 and F1 expressed on the Bark scale. These observations are discussed in terms of the influence of reverberation on speech perception.

19.
The reverberation time (RT) is an important parameter for characterizing the quality of an auditory space. Sounds in reverberant environments are subject to coloration. This affects speech intelligibility and sound localization. Many state-of-the-art audio signal processing algorithms, for example in hearing-aids and telephony, are expected to have the ability to characterize the listening environment, and turn on an appropriate processing strategy accordingly. Thus, a method for characterization of room RT based on passively received microphone signals represents an important enabling technology. Current RT estimators, such as Schroeder's method, depend on a controlled sound source, and thus cannot produce an online, blind RT estimate. Here, a method for estimating RT without prior knowledge of sound sources or room geometry is presented. The diffusive tail of reverberation was modeled as an exponentially damped Gaussian white noise process. The time-constant of the decay, which provided a measure of the RT, was estimated using a maximum-likelihood procedure. The estimates were obtained continuously, and an order-statistics filter was used to extract the most likely RT from the accumulated estimates. The procedure was illustrated for connected speech. Results obtained for simulated and real room data are in good agreement with the real RT values.
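The damped-Gaussian model above admits a compact maximum-likelihood estimate. Under the model x[n] = a^n w[n] with w[n] white Gaussian, the noise variance has a closed form for each candidate damping a, so a can be found by a one-dimensional search. The sketch below uses a simple grid search on a single free-decay segment; the grid range and segment handling are illustrative, and the full method additionally screens segments and filters the accumulated estimates with an order-statistics filter.

```python
import numpy as np

def blind_rt60(segment, fs, a_grid=None):
    """ML sketch of blind RT estimation from a free-decay segment.

    Models the segment as x[n] = a**n * w[n] (exponentially damped
    Gaussian white noise). For each candidate a, the ML noise variance
    is sigma^2 = mean(x[n]^2 * a^(-2n)); the profile log-likelihood is
    then maximized over a. RT60 follows from a via the sample index at
    which a**n reaches -60 dB.
    """
    x = np.asarray(segment, float)
    n = np.arange(len(x))
    if a_grid is None:
        a_grid = np.linspace(0.990, 0.99999, 400)  # illustrative range
    best_ll, best_a = -np.inf, a_grid[0]
    for a in a_grid:
        sigma2 = np.mean(x**2 * a**(-2.0 * n))
        # profile log-likelihood: -0.5 * (N ln(2 pi sigma^2) + 2 ln(a) sum(n) + N)
        ll = -0.5 * np.sum(np.log(2 * np.pi * sigma2) + 2 * n * np.log(a) + 1.0)
        if ll > best_ll:
            best_ll, best_a = ll, a
    return -3.0 * np.log(10.0) / np.log(best_a) / fs
```

Because the estimator needs only passively received signals, it can run continuously in a hearing aid or telephone front end, as the abstract motivates.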

20.
Overlap-masking degrades speech intelligibility in reverberation [R. H. Bolt and A. D. MacDonald, J. Acoust. Soc. Am. 21(6), 577-580 (1949)]. To reduce the effect of this degradation, steady-state suppression has been proposed as a preprocessing technique [Arai et al., Proc. Autumn Meet. Acoust. Soc. Jpn., 2001; Acoust. Sci. Tech. 23(8), 229-232 (2002)]. This technique automatically suppresses steady-state portions of speech that have more energy but are less crucial for speech perception. The present paper explores the effect of steady-state suppression on syllable identification preceded by /a/ under various reverberant conditions. In each of two perception experiments, stimuli were presented to 22 subjects with normal hearing. The stimuli consisted of mono-syllables in a carrier phrase with and without steady-state suppression and were presented under different reverberant conditions using artificial impulse responses. The results indicate that steady-state suppression statistically improves consonant identification for reverberation times of 0.7 to 1.2 s. Analysis of confusion matrices shows that identification of voiced consonants, stop and nasal consonants, and bilabial, alveolar, and velar consonants were especially improved by steady-state suppression. The steady-state suppression is demonstrated to be an effective preprocessing method for improving syllable identification by reducing the effect of overlap-masking under specific reverberant conditions.
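A minimal sketch of the steady-state suppression idea: frames whose normalized short-term spectrum changes little relative to the previous frame are treated as steady-state and attenuated before the signal reaches the reverberant room. The frame length, spectral-change measure, threshold, and 10 dB attenuation are all illustrative assumptions, not the parameters of Arai et al.

```python
import numpy as np

def steady_state_suppress(x, fs, frame=0.02, atten_db=10.0, change_thresh=0.2):
    """Attenuate spectrally steady frames (steady-state suppression sketch).

    Computes a normalized magnitude spectrum per frame; if the Euclidean
    distance to the previous frame's spectrum is below `change_thresh`,
    the frame is deemed steady-state and attenuated by `atten_db`.
    """
    n = int(frame * fs)
    y = x.astype(float).copy()
    prev = None
    for i in range(len(x) // n):
        seg = y[i * n:(i + 1) * n]           # view: edits write into y
        mag = np.abs(np.fft.rfft(seg * np.hanning(n)))
        mag /= np.linalg.norm(mag) + 1e-12   # scale-invariant comparison
        if prev is not None and np.linalg.norm(mag - prev) < change_thresh:
            seg *= 10.0 ** (-atten_db / 20.0)
        prev = mag
    return y
```

Because steady vowels carry most of the energy that overlap-masks following consonants, attenuating them reduces the masking while leaving transient consonant cues intact.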
