Similar documents
20 similar documents found (search time: 505 ms)
1.
Effect of the early/late reverberation division on the speech recognition performance of ideal ratio masks
Noise and reverberation in real environments degrade the performance of speech recognition systems. Reverberation in an enclosed space consists of three parts: the direct sound, early reflections, and late reverberation, each of which affects speech recognition differently. We studied different ways of dividing early reflections from late reverberation, took the early reflections as the target speech, computed the corresponding ideal ratio masks, and examined their effect on speech recognition performance. On this basis, a bidirectional long short-term memory network (BLSTM) was used to estimate the ideal ratio mask, and the effect of the estimated masks on recognition performance was tested. Experimental results show that, with the Abel-based division between early reflections and late reverberation, the ideal ratio mask reduces the word error rate by about 2.8%; the BLSTM-based estimator underestimates the ideal ratio mask and fails to improve recognition performance effectively.
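As a concrete illustration of the mask definition used here, the following is a minimal numpy/scipy sketch of an ideal ratio mask that takes the direct sound plus early reflections as the target, assuming the reverberant speech has already been split into an early part and a residual (late reverberation plus noise) by convolving with the corresponding segments of the room impulse response; the function names and STFT settings are illustrative.

```python
import numpy as np
from scipy.signal import stft, istft

def ideal_ratio_mask(target_early, residual, fs=16000, nfft=512):
    """IRM with the direct sound + early reflections as the desired signal.

    target_early : clean speech convolved with the early part of the RIR
    residual     : late reverberation plus any additive noise
    """
    _, _, S = stft(target_early, fs=fs, nperseg=nfft)
    _, _, R = stft(residual, fs=fs, nperseg=nfft)
    return np.abs(S) ** 2 / (np.abs(S) ** 2 + np.abs(R) ** 2 + 1e-12)

def apply_mask(mixture, mask, fs=16000, nfft=512):
    """Apply a T-F mask to the reverberant, noisy mixture and resynthesize."""
    _, _, Y = stft(mixture, fs=fs, nperseg=nfft)
    _, enhanced = istft(mask * Y, fs=fs, nperseg=nfft)
    return enhanced
```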

2.
For a mixture of target speech and noise in anechoic conditions, the ideal binary mask is defined as follows: It selects the time-frequency units where target energy exceeds noise energy by a certain local threshold and cancels the other units. In this study, the definition of the ideal binary mask is extended to reverberant conditions. Given the division between early and late reflections in terms of speech intelligibility, three ideal binary masks can be defined: an ideal binary mask that uses the direct path of the target as the desired signal, an ideal binary mask that uses the direct path and early reflections of the target as the desired signal, and an ideal binary mask that uses the reverberant target as the desired signal. The effects of these ideal binary mask definitions on speech intelligibility are compared across two types of interference: speech shaped noise and concurrent female speech. As suggested by psychoacoustical studies, the ideal binary mask based on the direct path and early reflections of target speech outperforms the other masks as reverberation time increases and produces substantial reductions in terms of speech reception threshold for normal hearing listeners.
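A minimal sketch of the ideal binary mask described above, assuming the desired signal (direct path, direct path plus early reflections, or the full reverberant target) and the interference are available separately; the parameter lc_db plays the role of the local threshold in the definition.

```python
import numpy as np
from scipy.signal import stft

def ideal_binary_mask(desired, interference, fs=16000, nfft=512, lc_db=0.0):
    """Keep T-F units where the desired signal exceeds the interference by lc_db."""
    _, _, D = stft(desired, fs=fs, nperseg=nfft)
    _, _, N = stft(interference, fs=fs, nperseg=nfft)
    local_snr_db = 10.0 * np.log10((np.abs(D) ** 2 + 1e-12) /
                                   (np.abs(N) ** 2 + 1e-12))
    return (local_snr_db > lc_db).astype(float)   # 1 = keep, 0 = cancel
```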

3.
When a target speech signal is obscured by an interfering speech wave form, comprehension of the target message depends both on the successful detection of the energy from the target speech wave form and on the successful extraction and recognition of the spectro-temporal energy pattern of the target out of a background of acoustically similar masker sounds. This study attempted to isolate the effects that energetic masking, defined as the loss of detectable target information due to the spectral overlap of the target and masking signals, has on multitalker speech perception. This was achieved through the use of ideal time-frequency binary masks that retained those spectro-temporal regions of the acoustic mixture that were dominated by the target speech but eliminated those regions that were dominated by the interfering speech. The results suggest that energetic masking plays a relatively small role in the overall masking that occurs when speech is masked by interfering speech but a much more significant role when speech is masked by interfering noise.

4.
This paper presents a new method for speech enhancement based on time-frequency analysis and adaptive digital filtering. The proposed dual-channel method tracks the frequencies of the corrupting signal with the discrete Gabor transform (DGT) and implements a multi-notch adaptive digital filter (MNADF) at those frequencies. Since no a priori knowledge of the noise source statistics is required, the method differs from traditional speech enhancement methods. It addresses the case where speech quality and intelligibility deteriorate in the presence of background noise: speech coders and automatic speech recognition (ASR) systems are designed to act on clean speech, so signals corrupted by noise must be enhanced before processing. The method uses a primary input containing the corrupted speech signal and a reference input containing the noise only. An MNADF was designed instead of a single-notch adaptive digital filter, and the DGT was used to track the frequencies of the corrupting signal, because fast filtering and a fast measure of the time-dependent noise frequency are of great importance in speech enhancement. Different types of noise from the Noisex-92 database were used to degrade real speech signals. Objective measures, including inspection of speech spectrograms, global signal-to-noise ratio (SNR), segmental SNR (segSNR), and the Itakura-Saito distance, as well as a subjective listening test, demonstrated consistently superior enhancement performance of the proposed method over traditional methods such as spectral subtraction. Combining the MNADF and DGT, excellent speech enhancement was obtained.
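The MNADF itself adapts its coefficients, which is not reproduced here; as a simplified illustration of notch filtering at tracked interference frequencies, the following sketch cascades fixed IIR notch filters at a list of frequencies assumed to come from a time-frequency analysis (e.g., the DGT) of the reference channel.

```python
import numpy as np
from scipy.signal import iirnotch, lfilter

def multi_notch_filter(primary, noise_freqs_hz, fs=16000, q=30.0):
    """Cascade of notch filters at the tracked interference frequencies.

    noise_freqs_hz : dominant frequencies of the corrupting signal, assumed to
                     come from a time-frequency analysis of the reference input.
    """
    out = np.asarray(primary, dtype=float)
    for f0 in noise_freqs_hz:
        b, a = iirnotch(f0, q, fs=fs)   # notch centred on f0
        out = lfilter(b, a, out)
    return out
```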

5.
This paper proposes a speech feature extraction method that utilizes periodicity and nonperiodicity for robust automatic speech recognition. The method was motivated by the auditory comb filtering hypothesis proposed in speech perception research. The method divides input signals into subband signals, which it then decomposes into their periodic and nonperiodic components using comb filters independently designed in each subband. Both features are used as feature parameters. This representation exploits the robustness of periodicity measurements as regards noise while preserving the overall speech information content. In addition, periodicity is estimated independently in each subband, providing robustness as regards noise spectrum bias. The framework is similar to that of a previous study [Jackson et al., Proc. of Eurospeech. (2003), pp. 2321-2324], which is based on cascade processing motivated by speech production. However, the proposed method differs in its design philosophy, which is based on parallel distributed processing motivated by speech perception. Continuous digit speech recognition experiments in the presence of noise confirmed that the proposed method performs better than conventional methods when the noise in the training and test data sets differs.
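A minimal sketch of the periodic/nonperiodic decomposition idea, using a first-order FIR comb filter on one subband with a known pitch period; the paper's subband-specific comb filter design is not reproduced, and the period is assumed to be estimated elsewhere.

```python
import numpy as np

def comb_decompose(subband, period_samples):
    """Split one subband signal into periodic and nonperiodic components.

    period_samples : estimated pitch period for this subband, in samples.
    """
    x = np.asarray(subband, dtype=float)
    delayed = np.concatenate([np.zeros(period_samples), x[:-period_samples]])
    periodic = 0.5 * (x + delayed)      # comb filter: reinforces harmonics of 1/T
    nonperiodic = 0.5 * (x - delayed)   # complementary comb: what is rejected
    return periodic, nonperiodic
```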

6.
Recent results have shown that listeners attending to the quieter of two speech signals in one ear (the target ear) are highly susceptible to interference from normal or time-reversed speech signals presented in the unattended ear. However, speech-shaped noise signals have little impact on the segregation of speech in the opposite ear. This suggests that there is a fundamental difference between the across-ear interference effects of speech and nonspeech signals. In this experiment, the intelligibility and contralateral-ear masking characteristics of three synthetic speech signals with parametrically adjustable speech-like properties were examined: (1) a modulated noise-band (MNB) speech signal composed of fixed-frequency bands of envelope-modulated noise; (2) a modulated sine-band (MSB) speech signal composed of fixed-frequency amplitude-modulated sinewaves; and (3) a "sinewave speech" signal composed of sine waves tracking the first four formants of speech. In all three cases, a systematic decrease in performance in the two-talker target-ear listening task was found as the number of bands in the contralateral speech-like masker increased. These results suggest that speech-like fluctuations in the spectral envelope of a signal play an important role in determining the amount of across-ear interference that a signal will produce in a dichotic cocktail-party listening task.

7.
A speech enhancement method combining magnitude-spectrum and power-spectrum dictionaries
An improved sparse-representation speech enhancement method over spectral features is proposed from the perspectives of two-path dictionary learning, noise power spectrum estimation, and speech magnitude spectrum reconstruction. In the dictionary learning stage, power-spectrum and magnitude-spectrum features are fused, and discriminative dictionaries are used to reduce the coherence between the speech and noise dictionaries; in the enhancement stage, a noise power spectrum estimation method is proposed to track and estimate non-stationary noise; considering that magnitude-spectrum and power-spectrum features adapt differently to different noises, a speech reconstruction weight table is designed....
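The abstract does not detail the discriminative dictionary learning or the reconstruction weight table; as a generic sketch of the two-dictionary idea, the following learns separate magnitude-spectrum dictionaries for speech and noise with scikit-learn and keeps only the speech part of a sparse code of the noisy spectrogram. All parameter values and function names are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning, SparseCoder

def learn_dictionaries(speech_mag, noise_mag, n_atoms=64):
    """Learn separate magnitude-spectrum dictionaries for speech and noise.
    Inputs are (frames x frequency bins) magnitude spectrograms."""
    d_speech = DictionaryLearning(n_components=n_atoms).fit(speech_mag).components_
    d_noise = DictionaryLearning(n_components=n_atoms).fit(noise_mag).components_
    return d_speech, d_noise

def enhance(noisy_mag, d_speech, d_noise, n_nonzero=8):
    """Sparse-code noisy frames over the stacked dictionary, keep the speech part."""
    stacked = np.vstack([d_speech, d_noise])
    coder = SparseCoder(dictionary=stacked, transform_algorithm="omp",
                        transform_n_nonzero_coefs=n_nonzero)
    codes = coder.transform(noisy_mag)              # (frames x 2 * n_atoms)
    return codes[:, :d_speech.shape[0]] @ d_speech  # speech-only reconstruction
```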

8.
Considerable advances in automatic speech recognition have been made in recent decades, thanks especially to the use of hidden Markov models. In the field of speech signal analysis, different techniques have been developed. However, recognizer performance deteriorates when systems are trained on clean signals and tested on noisy signals, and this is still an open problem in the field. Continuous multiresolution entropy has been shown to be robust to additive noise in applications to different physiological signals. In previous works we have included Shannon and Tsallis entropies, and their corresponding divergences, in different speech analysis and recognition systems. In this paper we present an extension of the continuous multiresolution entropy to different divergences and propose them as new dimensions for the pre-processing stage of a speech recognition system. This approach takes into account information about changes in the dynamics of the speech signal at different scales. The proposed methods are tested with speech signals corrupted with babble and white noise, and their performance is compared with classical mel-cepstral parametrization. The results suggest that these continuous multiresolution entropy-related measures provide valuable information to the speech recognition system and could be included as extra components in the pre-processing stage.
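A minimal sketch of a multiresolution entropy feature, assuming PyWavelets is available for the wavelet decomposition: for each analysis frame, the Shannon entropy of the coefficients at every resolution level is appended to the feature vector. The divergence-based variants proposed in the paper are not reproduced here.

```python
import numpy as np
import pywt   # PyWavelets, assumed available

def multiresolution_entropy(frame, wavelet="db4", level=4, n_bins=16):
    """Shannon entropy of the wavelet coefficients at each resolution level."""
    feats = []
    for coeffs in pywt.wavedec(frame, wavelet, level=level):
        hist, _ = np.histogram(coeffs, bins=n_bins)
        p = hist / max(hist.sum(), 1)
        p = p[p > 0]
        feats.append(-np.sum(p * np.log2(p)))   # entropy of this scale
    return np.array(feats)                      # one value per decomposition level
```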

9.
The article presents spectral models of additive and modulation noise in speech. The purpose is to learn about the causes of noise in the spectra of normal and disordered voices and to gauge whether the spectral properties of the perturbations of the phonatory excitation signal can be inferred from the spectral properties of the speech signal. The approach to modeling consists of deducing the Fourier series of the perturbed speech, assuming that the Fourier series of the noise and of the clean monocycle-periodic excitation are known. The models explain published data, take into account the effects of supraglottal tremor, demonstrate the modulation distortion owing to vocal tract filtering, establish conditions under which noise cues of different speech signals may be compared, and predict the impossibility of inferring the spectral properties of the frequency modulating noise from the spectral properties of the frequency modulation noise (e.g., phonatory jitter and frequency tremor). The general conclusion is that only phonatory frequency modulation noise is spectrally relevant. Other types of noise in speech are either epiphenomenal, or their spectral effects are masked by the spectral effects of frequency modulation noise.

10.
In this paper we present a model called the Modified Phase-Opponency (MPO) model for single-channel speech enhancement when the speech is corrupted by additive noise. The MPO model is based on the auditory PO model, proposed for detection of tones in noise. The PO model includes a physiologically realistic mechanism for processing the information in neural discharge times and exploits the frequency-dependent phase properties of the tuned filters in the auditory periphery by using a cross-auditory-nerve-fiber coincidence detection for extracting temporal cues. The MPO model alters the components of the PO model such that the basic functionality of the PO model is maintained but the properties of the model can be analyzed and modified independently. The MPO-based speech enhancement scheme does not need to estimate the noise characteristics nor does it assume that the noise satisfies any statistical model. The MPO technique leads to the lowest value of the LPC-based objective measures and the highest value of the perceptual evaluation of speech quality measure compared to other methods when the speech signals are corrupted by fluctuating noise. Combining the MPO speech enhancement technique with our aperiodicity, periodicity, and pitch detector further improves its performance.

11.
Among various speech enhancement methods, dual-microphone methods are of great importance for their low-cost implementation and for exploiting the spatial-filtering benefits of microphone arrays. Coherence-based methods are well known as efficient two-microphone noise reduction techniques, but they do not work well when the received noise signals are correlated. They can be improved when the cross power spectral density (CPSD) of the noise is available. In this paper, we propose an iterative approach for estimating the noise CPSD to be employed in coherence-based methods. We compare the proposed iterative noise CPSD estimation with a CPSD estimation technique based on a voice activity detector (VAD), each employed separately in a two-microphone speech enhancement scheme. Evaluation results show that the scheme using the proposed noise CPSD estimation outperforms the system using the VAD-based estimation.
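The iterative CPSD estimator itself is not described in the abstract; the sketch below only illustrates how an available noise CPSD can be used in a coherence-style dual-microphone gain, assuming noise_cpsd is a per-frequency estimate (for example, obtained from noise-only frames). It is a simplified stand-in, not the paper's enhancement rule.

```python
import numpy as np
from scipy.signal import stft, istft

def coherence_enhance(x1, x2, noise_cpsd, fs=16000, nfft=512):
    """Dual-microphone gain built from the cross-spectrum of the two channels
    after subtracting an estimate of the noise CPSD (one value per frequency)."""
    _, _, X1 = stft(x1, fs=fs, nperseg=nfft)
    _, _, X2 = stft(x2, fs=fs, nperseg=nfft)
    cross = X1 * np.conj(X2)                                   # instantaneous CPSD
    speech_cpsd = np.maximum(np.abs(cross) - np.abs(noise_cpsd)[:, None], 0.0)
    gain = speech_cpsd / (np.abs(cross) + 1e-12)
    _, enhanced = istft(gain * X1, fs=fs, nperseg=nfft)
    return enhanced
```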

12.
13.
Speech-intelligibility tests auralized in a virtual classroom were used to investigate the optimal reverberation times for verbal communication for normal-hearing and hearing-impaired adults. The idealized classroom had simple geometry, uniform surface absorption, and an approximately diffuse sound field. It contained a speech source, a listener at a receiver position, and a noise source located at one of two positions. The relative output levels of the speech and noise sources were varied, along with the surface absorption and the corresponding reverberation time. The binaural impulse responses of the speech and noise sources in each classroom configuration were convolved with Modified Rhyme Test (MRT) and babble-noise signals. The resulting signals were presented to normal-hearing and hearing-impaired adult subjects to identify the configurations that gave the highest speech intelligibilities for the two groups. For both subject groups, when the speech source was closer to the listener than the noise source, the optimal reverberation time was zero. When the noise source was closer to the listener than the speech source, the optimal reverberation time included both zero and nonzero values. The results generally support previous theoretical results.
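A minimal sketch of the auralization step, assuming dry Modified Rhyme Test speech, a babble signal, and binaural impulse responses for the speech and noise sources are available; the noise is scaled to set the desired speech-to-noise level difference before mixing.

```python
import numpy as np
from scipy.signal import fftconvolve

def auralize(speech, babble, brir_speech, brir_noise, snr_db):
    """Convolve dry test material with binaural impulse responses and mix the
    speech and noise sources at the requested level difference.
    brir_* are (n_samples, 2) arrays holding the left/right impulse responses."""
    s = np.stack([fftconvolve(speech, brir_speech[:, ch]) for ch in range(2)], axis=1)
    n = np.stack([fftconvolve(babble, brir_noise[:, ch]) for ch in range(2)], axis=1)
    if len(n) >= len(s):                       # match lengths before mixing
        n = n[:len(s)]
    else:
        n = np.pad(n, ((0, len(s) - len(n)), (0, 0)))
    gain = 10 ** (-snr_db / 20) * np.sqrt(np.mean(s ** 2) / (np.mean(n ** 2) + 1e-12))
    return s + gain * n                        # binaural signal for presentation
```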

14.
I. Introduction. Kalman filtering is simply a method to estimate statistically the state of the observed system from corrupted signals, and this kind of estimation is a recurrence estimation based on linearity, unbiasedness and minimum variance. Moreover, Kalman filtering is applicable to non-stationary signals and time-variant dynamic systems. Therefore, Kalman filtering is very applicable to enhancing speech signals that are corrupted by noise. This paper reports the concrete method of enhancement of noisy speech and its experiment results. Experiments indicate: After the s…
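A minimal sketch of Kalman filtering for noisy speech, assuming the speech is modeled as an AR(p) process with known coefficients and excitation variance (in practice these are estimated frame by frame) and the observation noise is white with known variance; the function name and parameters are illustrative.

```python
import numpy as np

def kalman_enhance(noisy, ar_coeffs, drive_var, noise_var):
    """Kalman filtering of noisy speech with an AR(p) speech model.

    ar_coeffs : AR coefficients a_1..a_p of the speech model (assumed known)
    drive_var : variance of the AR excitation
    noise_var : variance of the additive observation noise
    """
    p = len(ar_coeffs)
    F = np.zeros((p, p))                # companion-form state transition
    F[0, :] = ar_coeffs
    F[1:, :-1] = np.eye(p - 1)
    H = np.zeros((1, p))
    H[0, 0] = 1.0                       # we observe the newest speech sample + noise
    Q = np.zeros((p, p))
    Q[0, 0] = drive_var
    x = np.zeros((p, 1))
    P = np.eye(p)
    out = np.zeros(len(noisy))
    for n, y in enumerate(noisy):
        x = F @ x                                   # predict
        P = F @ P @ F.T + Q
        k = P @ H.T / (H @ P @ H.T + noise_var)     # Kalman gain
        x = x + k * (y - (H @ x).item())            # update with the noisy sample
        P = (np.eye(p) - k @ H) @ P
        out[n] = x[0, 0]                            # filtered (enhanced) sample
    return out
```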

15.
We describe an arrangement for simultaneous recording of speech and vocal tract geometry in patients undergoing surgery involving this area. Experimental design is considered from an articulatory phonetic point of view. The speech signals are recorded with an acoustic-electrical arrangement. The vocal tract is simultaneously imaged with MRI. A MATLAB-based system controls the timing of speech recording and MR image acquisition. The speech signals are cleaned of acoustic MRI noise by an adaptive signal processing algorithm. Finally, a vowel data set from pilot experiments is qualitatively compared both with validation data from the anechoic chamber and with Helmholtz resonances of the vocal tract volume, obtained using FEM.
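The abstract does not name the adaptive algorithm; as one common choice, the following NLMS sketch cancels the MRI noise in the primary (speech) recording using a reference recording of the scanner noise. Filter length and step size are illustrative assumptions.

```python
import numpy as np

def nlms_cancel(primary, reference, n_taps=128, mu=0.5, eps=1e-6):
    """Adaptive noise cancellation: subtract the filtered reference (scanner
    noise) from the primary (speech + noise) recording using NLMS."""
    primary = np.asarray(primary, dtype=float)
    reference = np.asarray(reference, dtype=float)
    w = np.zeros(n_taps)
    cleaned = np.zeros(len(primary))
    for n in range(n_taps, len(primary)):
        x = reference[n - n_taps:n][::-1]        # most recent reference samples
        y = w @ x                                # estimate of the noise in primary
        e = primary[n] - y                       # error = cleaned speech sample
        w += mu * e * x / (x @ x + eps)          # normalised LMS update
        cleaned[n] = e
    return cleaned
```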

16.
This letter points out that, although in the audio signal domain low-pass filtering has been used to prevent aliasing noise from entering the baseband of speech signals, an antialias process in the speech feature domain is still needed to prevent high modulation frequency components from entering the baseband of speech features. The existence of aliasing noise in speech features is revealed via spectral analysis of speech feature streams. A method for suppressing such aliasing noise is proposed. Experiments on large vocabulary speech recognition show that antialias processing of speech features can improve speech recognition, especially for noisy speech.
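A minimal sketch of antialias processing in the feature domain: each feature trajectory (e.g., an MFCC stream) is low-pass filtered along time to suppress high modulation-frequency components. The frame rate and cutoff here are illustrative assumptions, not the letter's settings.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def antialias_features(features, frame_rate_hz=100.0, cutoff_hz=25.0, order=4):
    """Low-pass filter each feature trajectory along time to suppress high
    modulation-frequency components before recognition.

    features : (n_frames x n_dims) array, e.g. an MFCC stream at 100 frames/s.
    """
    b, a = butter(order, cutoff_hz, btype="low", fs=frame_rate_hz)
    return filtfilt(b, a, features, axis=0)
```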

17.
In everyday listening, both background noise and reverberation degrade the speech signal. Psychoacoustic evidence suggests that human speech perception under reverberant conditions relies mostly on monaural processing. While speech segregation based on periodicity has achieved considerable progress in handling additive noise, little research in monaural segregation has been devoted to reverberant scenarios. Reverberation smears the harmonic structure of speech signals, and our evaluations using a pitch-based segregation algorithm show that an increase in the room reverberation time causes degraded performance due to weakened periodicity in the target signal. We propose a two-stage monaural separation system that combines the inverse filtering of the room impulse response corresponding to target location and a pitch-based speech segregation method. As a result of the first stage, the harmonicity of a signal arriving from target direction is partially restored while signals arriving from other directions are further smeared, and this leads to improved segregation. A systematic evaluation of the system shows that the proposed system results in considerable signal-to-noise ratio gains across different conditions. Potential applications of this system include robust automatic speech recognition and hearing aid design.
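A minimal sketch of the first stage, assuming the room impulse response for the target location is known: a least-squares FIR inverse filter is computed so that filtering the mixture partially restores the target's harmonicity before pitch-based segregation. Filter length and modeling delay are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import toeplitz
from scipy.signal import lfilter

def ls_inverse_filter(rir, n_taps=1024, delay=64):
    """Least-squares FIR inverse of the room impulse response for the target
    location, so that rir convolved with g approximates a delayed impulse."""
    rir = np.asarray(rir, dtype=float)
    col = np.concatenate([rir, np.zeros(n_taps - 1)])
    row = np.zeros(n_taps)
    row[0] = rir[0]
    C = toeplitz(col, row)                    # convolution matrix of the RIR
    d = np.zeros(C.shape[0])
    d[delay] = 1.0                            # desired response: delayed impulse
    g, *_ = np.linalg.lstsq(C, d, rcond=None)
    return g

# Stage 1: dereverberate the mixture before pitch-based segregation, e.g.
# dereverbed = lfilter(ls_inverse_filter(rir), [1.0], mixture)
```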

18.
When listeners hear a target signal in the presence of competing sounds, they are quite good at extracting information at instances when the local signal-to-noise ratio of the target is most favorable. Previous research suggests that listeners can easily understand a periodically interrupted target when it is interleaved with noise. It is not clear if this ability extends to the case where an interrupted target is alternated with a speech masker rather than noise. This study examined speech intelligibility in the presence of noise or speech maskers, which were either continuous or interrupted at one of six rates between 4 and 128 Hz. Results indicated that with noise maskers, listeners performed significantly better with interrupted, rather than continuous maskers. With speech maskers, however, performance was better in continuous, rather than interrupted masker conditions. Presumably the listeners used continuity as a cue to distinguish the continuous masker from the interrupted target. Intelligibility in the interrupted masker condition was improved by introducing a pitch difference between the target and speech masker. These results highlight the role that target-masker differences in continuity and pitch play in the segregation of competing speech signals.

19.
In this contribution, a novel dual-channel speech enhancement technique is introduced. The proposed approach uses the dissimilarity between the powers of the signals received in the two channels as a criterion for speech enhancement and noise reduction. We claim that in near-field conditions, where the distances between the microphones and the sound source are short, the difference in the received power levels at the two microphones is an estimate of the clean speech signal power, and we apply this observation to derive an optimum method for speech enhancement. The method is able to cope with problems such as transient noise and nearby microphones, two of the main problems of existing dual-microphone speech enhancement techniques. Using objective speech quality measures and spectrogram analysis, we show that the proposed method results in improved speech quality.
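A minimal sketch of the power-difference idea under the stated near-field assumption: the difference between the two received power spectra is used as the clean-speech power estimate and turned into a spectral gain on the channel closer to the talker. The STFT settings and function name are illustrative.

```python
import numpy as np
from scipy.signal import stft, istft

def power_difference_enhance(near_mic, far_mic, fs=16000, nfft=512):
    """Near-field dual-channel enhancement: the difference between the two
    received power spectra is taken as the clean-speech power estimate."""
    _, _, X1 = stft(near_mic, fs=fs, nperseg=nfft)   # channel closer to the talker
    _, _, X2 = stft(far_mic, fs=fs, nperseg=nfft)
    speech_pow = np.maximum(np.abs(X1) ** 2 - np.abs(X2) ** 2, 0.0)
    gain = np.sqrt(speech_pow / (np.abs(X1) ** 2 + 1e-12))
    _, enhanced = istft(gain * X1, fs=fs, nperseg=nfft)
    return enhanced
```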

20.
A large number of single-channel noise-reduction algorithms have been proposed based largely on mathematical principles. Most of these algorithms, however, have been evaluated with English speech. Given the different perceptual cues used by native listeners of different languages, including tonal languages, it is of interest to examine whether there are any language effects when the same noise-reduction algorithm is used to process noisy speech in different languages. This study undertakes a comparative evaluation of various single-channel noise-reduction algorithms applied to noisy speech in three languages: Chinese, Japanese, and English. Clean speech signals (Chinese words and Japanese words) were first corrupted by three types of noise at two signal-to-noise ratios and then processed by five single-channel noise-reduction algorithms. The processed signals were finally presented to normal-hearing listeners for recognition. Intelligibility evaluation showed that the majority of noise-reduction algorithms did not improve speech intelligibility. Consistent with a previous study with the English language, the Wiener filtering algorithm produced small, but statistically significant, improvements in intelligibility for car and white noise conditions. Significant differences between the performances of noise-reduction algorithms across the three languages were observed.
