Similar Articles
1.
In everyday listening, both background noise and reverberation degrade the speech signal. Psychoacoustic evidence suggests that human speech perception under reverberant conditions relies mostly on monaural processing. While speech segregation based on periodicity has achieved considerable progress in handling additive noise, little research in monaural segregation has been devoted to reverberant scenarios. Reverberation smears the harmonic structure of speech signals, and our evaluations using a pitch-based segregation algorithm show that an increase in the room reverberation time degrades performance because periodicity in the target signal is weakened. We propose a two-stage monaural separation system that combines inverse filtering of the room impulse response corresponding to the target location with a pitch-based speech segregation method. As a result of the first stage, the harmonicity of a signal arriving from the target direction is partially restored, while signals arriving from other directions are further smeared, and this leads to improved segregation. A systematic evaluation shows that the proposed system yields considerable signal-to-noise ratio gains across different conditions. Potential applications of this system include robust automatic speech recognition and hearing aid design.
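The first stage of this system inverts the room impulse response (RIR) for the target location. As a minimal sketch, assuming a known and minimum-phase RIR, a truncated inverse filter can be obtained by recursive deconvolution. Real RIRs are usually non-minimum-phase, so practical systems need delayed least-squares or similar designs; `truncated_inverse` and the three-tap `rir` are hypothetical illustrations, not the authors' filter, and the pitch-based second stage is not sketched.

```python
def truncated_inverse(h, length):
    """Recursive deconvolution: choose g so that (h * g)[n] equals delta[n] for n < length.

    Only yields a stable, decaying inverse when h is minimum-phase (assumed here).
    """
    if h[0] == 0:
        raise ValueError("h[0] must be non-zero")
    g = [0.0] * length
    g[0] = 1.0 / h[0]
    for n in range(1, length):
        # Cancel the contribution of the earlier inverse taps through h[1:].
        acc = sum(h[k] * g[n - k] for k in range(1, min(n, len(h) - 1) + 1))
        g[n] = -acc / h[0]
    return g


def convolve(x, h):
    """Full linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


# Hypothetical three-tap minimum-phase "room" response (both zeros at radius 0.5).
rir = [1.0, 0.5, 0.25]
g = truncated_inverse(rir, 64)
equalized = convolve(rir, g)   # approximately a unit impulse
```

Here `equalized` is 1 at index 0 and essentially zero elsewhere; with a non-minimum-phase response the same recursion would blow up, which is why the minimum-phase assumption matters.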

2.
Lü Zhao  Wu Xiaopei  Zhang Chao  Li Mi 《Acta Acustica》2010,35(4):465-470
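A robust speech feature extraction algorithm based on independent component analysis (ICA) is proposed to address the mismatch between training and recognition features of speech signals in convolutive noise environments. The algorithm converts the noisy speech signal from the time domain to the frequency domain with the short-time Fourier transform, uses complex-valued ICA to separate the short-time spectrum of the speech from the short-time spectrum of the noisy speech, and then computes Mel-frequency cepstral coefficients (MFCCs) and their first-order differences from the recovered speech short-time spectrum as feature parameters. In Mandarin digit recognition experiments in simulated and real environments, the proposed algorithm improved recognition accuracy by 34.8% and 32.6%, respectively, compared with conventional MFCCs. The experimental results show that the ICA-based speech features are robust in convolutive noise environments.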
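The ICA separation is not sketched here. For the final step, appending first-order differences (delta coefficients) to MFCC frames, a minimal sketch assuming the common regression formula d_t = Σ n(c_{t+n} − c_{t−n}) / (2 Σ n²) with half-width N = 2 (the abstract does not state the formula or N) could look like this; `delta_features` and the toy frames are hypothetical.

```python
def delta_features(frames, half_width=2):
    """First-order regression deltas over time for a list of per-frame coefficient lists.

    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2),
    with the first/last frame repeated at the edges.
    """
    num_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, half_width + 1))
    deltas = []
    for t in range(num_frames):
        d = [0.0] * len(frames[t])
        for n in range(1, half_width + 1):
            nxt = frames[min(t + n, num_frames - 1)]   # replicate last frame past the end
            prv = frames[max(t - n, 0)]                # replicate first frame before the start
            for i in range(len(d)):
                d[i] += n * (nxt[i] - prv[i])
        deltas.append([v / denom for v in d])
    return deltas


# Hypothetical 2-coefficient "MFCC" frames that grow linearly over time.
mfcc = [[float(t), 2.0 * t] for t in range(10)]
deltas = delta_features(mfcc)
features = [c + d for c, d in zip(mfcc, deltas)]   # static + delta coefficients per frame
```

For a linear ramp the interior deltas equal the slope (here 1.0 and 2.0); edge frames are attenuated by the padding.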

3.
This letter points out that, although in the audio signal domain low-pass filtering has been used to prevent aliasing noise from entering the baseband of speech signals, an antialias process in the speech feature domain is still needed to prevent high modulation frequency components from entering the baseband of speech features. The existence of aliasing noise in speech features is revealed via spectral analysis of speech feature streams. A method for suppressing such aliasing noise is proposed. Experiments on large vocabulary speech recognition show that antialias processing of speech features can improve speech recognition, especially for noisy speech.
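One way to realise antialiasing in the feature domain, assumed here rather than taken from the letter, is to low-pass each feature trajectory along time before any frame-rate reduction. The sketch uses a Hamming-windowed sinc FIR with its cutoff at the new Nyquist rate; `smooth_and_decimate`, the 31-tap length, and the decimation factor of 2 are hypothetical choices.

```python
import math


def lowpass_fir(cutoff, num_taps=31):
    """Hamming-windowed sinc low-pass; `cutoff` in cycles per frame (0 < cutoff < 0.5)."""
    mid = num_taps // 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2 * cutoff if k == 0 else math.sin(2 * math.pi * cutoff * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]          # unit gain at 0 Hz modulation frequency


def smooth_and_decimate(trajectory, factor=2, num_taps=31):
    """Low-pass one feature trajectory at the new Nyquist rate, then keep every `factor`-th frame."""
    taps = lowpass_fir(0.5 / factor, num_taps)
    mid = num_taps // 2
    last = len(trajectory) - 1
    filtered = []
    for t in range(len(trajectory)):
        acc = 0.0
        for n, c in enumerate(taps):
            acc += c * trajectory[min(max(t + n - mid, 0), last)]   # replicate edge frames
        filtered.append(acc)
    return filtered[::factor]


constant = [3.0] * 100
alternating = [(-1.0) ** t for t in range(100)]   # highest possible modulation frequency
smooth_out = smooth_and_decimate(constant)
alias_out = smooth_and_decimate(alternating)
naive = alternating[::2]                          # decimation without any low-pass filter
```

Decimating the alternating stream without filtering keeps only the +1 samples, so the highest modulation frequency aliases to a constant offset in the baseband; the filtered version stays near zero away from the edges, while the constant stream passes unchanged.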

4.
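Recovering clean speech from noisy speech signals has long been a hot topic in signal processing. In recent years, researchers have proposed a number of single-channel speech enhancement algorithms based on dictionary learning and sparse representation. These algorithms exploit the sparsity of speech in the time-frequency domain: they construct dictionaries by learning the structural features and regularities of training samples, and then project the noisy speech onto these dictionaries to estimate the clean speech. For the case where the training samples and test data are mismatched, supervised non-negative matrix factorization (NMF) has been combined with traditional statistical-model-based speech enhancement methods, updating the speech and noise dictionaries during the enhancement stage to estimate the clean speech. This paper first introduces the signal model for single-channel speech enhancement, then describes four typical enhancement methods, and finally discusses possible future research directions.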
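The supervised NMF idea can be sketched as follows, assuming pre-trained speech and noise dictionaries that are held fixed, Euclidean (Lee–Seung) multiplicative updates for the activations, and a Wiener-style soft mask. The dictionary updating during enhancement mentioned above is omitted, and the tiny dictionaries and spectrogram are hypothetical.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def supervised_nmf_enhance(V, W_speech, W_noise, iters=100, eps=1e-12):
    """V: F x T noisy magnitude spectrogram; W_speech: F x Ks; W_noise: F x Kn (fixed)."""
    Ks = len(W_speech[0])
    W = [ws + wn for ws, wn in zip(W_speech, W_noise)]   # F x (Ks + Kn): concatenated dictionaries
    Wt = transpose(W)
    H = [[1.0] * len(V[0]) for _ in range(len(Wt))]      # activations, initialised to ones
    WtV = matmul(Wt, V)
    WtW = matmul(Wt, W)
    for _ in range(iters):
        # Multiplicative update H <- H * (W^T V) / (W^T W H); W stays fixed.
        WtWH = matmul(WtW, H)
        H = [[h * n / (d + eps) for h, n, d in zip(hr, nr, dr)] for hr, nr, dr in zip(H, WtV, WtWH)]
    speech_model = matmul(W_speech, H[:Ks])
    full_model = matmul(W, H)
    # Wiener-style soft mask: share of the model attributed to the speech dictionary.
    return [[v * s / (m + eps) for v, s, m in zip(vr, sr, mr)]
            for vr, sr, mr in zip(V, speech_model, full_model)]


W_speech = [[1.0], [1.0], [0.0], [0.0]]                  # hypothetical 1-atom speech dictionary
W_noise = [[0.0], [0.0], [1.0], [1.0]]                   # hypothetical 1-atom noise dictionary
V = [[2.0, 3.0], [2.0, 3.0], [1.0, 4.0], [1.0, 4.0]]     # 4 bins x 2 frames
speech_est = supervised_nmf_enhance(V, W_speech, W_noise)
```

Because the toy dictionaries do not overlap, the estimate recovers the speech rows and zeroes the noise rows; with realistic, overlapping dictionaries the separation is only approximate.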

5.
I. Introduction. Kalman filtering is a method for statistically estimating the state of an observed system from corrupted signals; the estimate is computed recursively and is linear, unbiased, and of minimum variance. Moreover, Kalman filtering is applicable to non-stationary signals and time-varying dynamic systems. Therefore, Kalman filtering is well suited to enhancing speech signals that are corrupted by noise. This paper reports the concrete method for enhancing noisy speech and its experimental results. Experiments indicate: After the s…
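The paper's concrete method is truncated above. A common textbook formulation, assumed here, models speech as an autoregressive AR(p) process in companion state-space form and observes it in additive white noise; the AR coefficients and both variances are taken as known (in practice they would be estimated per frame, e.g. by LPC). `kalman_ar_enhance` and the AR(2) parameters are hypothetical.

```python
import random


def kalman_ar_enhance(noisy, ar, q, r):
    """Kalman-filter a noisy AR(p) signal.

    Model (assumed known): s[k] = ar[0]*s[k-1] + ... + ar[p-1]*s[k-p] + w[k], Var(w) = q
                           y[k] = s[k] + v[k],                              Var(v) = r
    """
    p = len(ar)
    x = [0.0] * p                      # state: [s_k, s_{k-1}, ..., s_{k-p+1}]
    P = [[0.0] * p for _ in range(p)]  # state error covariance
    estimates = []
    for y in noisy:
        # Predict with the companion-form transition matrix F.
        x = [sum(a * xi for a, xi in zip(ar, x))] + x[:-1]
        FP = [[sum(ar[j] * P[j][c] for j in range(p)) for c in range(p)]] + P[:-1]
        P_pred = [[sum(ar[j] * row[j] for j in range(p))] + row[:-1] for row in FP]  # F P F^T
        P_pred[0][0] += q
        # Update with the scalar observation y = s + v (observation vector H = e_1).
        gain = [P_pred[i][0] / (P_pred[0][0] + r) for i in range(p)]
        innovation = y - x[0]
        x = [xi + g * innovation for xi, g in zip(x, gain)]
        P = [[P_pred[i][c] - gain[i] * P_pred[0][c] for c in range(p)] for i in range(p)]
        estimates.append(x[0])
    return estimates


rng = random.Random(0)
ar, q, r = [1.5, -0.8], 1.0, 4.0      # illustrative, stable AR(2) "speech" model
s = [0.0, 0.0]
for _ in range(2000):
    s.append(ar[0] * s[-1] + ar[1] * s[-2] + rng.gauss(0.0, q ** 0.5))
clean = s[2:]
noisy = [v + rng.gauss(0.0, r ** 0.5) for v in clean]
enhanced = kalman_ar_enhance(noisy, ar, q, r)
mse_noisy = sum((a - b) ** 2 for a, b in zip(noisy, clean)) / len(clean)
mse_enhanced = sum((a - b) ** 2 for a, b in zip(enhanced, clean)) / len(clean)
```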

6.
This paper presents a new speech enhancement method based on time-frequency analysis and adaptive digital filtering. The proposed dual-channel method tracks the frequencies of the corrupting signal with the discrete Gabor transform (DGT) and applies a multi-notch adaptive digital filter (MNADF) at those frequencies. Because it requires no a priori knowledge of the noise source statistics, this method differs from traditional speech enhancement methods. Specifically, the method was applied to the case where speech quality and intelligibility deteriorate in the presence of background noise. Speech coders and automatic speech recognition (ASR) systems are designed to act on clean speech signals; therefore, speech signals corrupted by noise must be enhanced before they are processed. The method uses a primary input containing the corrupted speech signal and a reference input containing only the noise. We designed an MNADF instead of a single-notch adaptive digital filter and used the DGT to track the frequencies of the corrupting signal, because fast filtering and fast measurement of the time-dependent noise frequencies are of great importance in speech enhancement; the MNADF was implemented to take advantage of fast filtering. Different types of noise from the Noisex-92 database were used to degrade real speech signals. Objective measures (speech spectrograms, global signal-to-noise ratio (SNR), segmental SNR (segSNR), and the Itakura-Saito distance) as well as a subjective listening test consistently demonstrated superior enhancement performance of the proposed method over traditional speech enhancement methods such as spectral subtraction. Combining the MNADF and the DGT yielded excellent speech enhancement.
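The DGT frequency tracking is not reproduced here. Assuming the interference frequencies have already been tracked, a multi-notch canceller can be sketched with Widrow's adaptive noise-canceller form: one in-phase/quadrature sinusoidal reference pair per frequency, with weights adapted by LMS. Building the references from tracked frequencies, rather than from the paper's noise-only channel, is an assumption; `multi_notch_lms`, the step size `mu`, and the test tones are hypothetical.

```python
import math


def multi_notch_lms(primary, freqs, fs, mu=0.005):
    """Cancel sinusoidal interference at known frequencies (Hz) from `primary` with LMS."""
    weights = [[0.0, 0.0] for _ in freqs]     # in-phase / quadrature weight per notch
    out = []
    for n, d in enumerate(primary):
        refs = [(math.cos(2 * math.pi * f * n / fs), math.sin(2 * math.pi * f * n / fs)) for f in freqs]
        y = sum(w[0] * c + w[1] * s for w, (c, s) in zip(weights, refs))  # interference estimate
        e = d - y                                                          # enhanced output sample
        for w, (c, s) in zip(weights, refs):
            w[0] += 2 * mu * e * c            # LMS update driven by the output error
            w[1] += 2 * mu * e * s
        out.append(e)
    return out


fs = 8000
n_samples = 4000
clean = [0.5 * math.sin(2 * math.pi * 400 * n / fs) for n in range(n_samples)]
hum = [math.sin(2 * math.pi * 1600 * n / fs + 0.3) + 0.8 * math.sin(2 * math.pi * 2400 * n / fs + 1.0)
       for n in range(n_samples)]
primary = [c + h for c, h in zip(clean, hum)]
enhanced = multi_notch_lms(primary, freqs=[1600.0, 2400.0], fs=fs)
tail_mse = sum((o - c) ** 2 for o, c in zip(enhanced[-1000:], clean[-1000:])) / 1000
```

A larger `mu` converges faster but widens each notch and lets more of the wanted signal leak into the weight updates, which is the usual speed/distortion trade-off of this structure.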

7.
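Effect of early/late reverberation division on the speech recognition performance of ideal ratio masking   Total citations: 2, self-citations: 0, citations by others: 2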
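Noise and reverberation in real environments degrade the performance of speech recognition systems. Reverberation in an enclosed space consists of three parts, the direct sound, early reflections, and late reverberation, which affect speech recognition systems differently. We studied different ways of dividing early reflections from late reverberation, computed the corresponding ideal ratio masks (IRMs) with the early reflections as the target speech, and examined their effects on speech recognition performance. On this basis, we used a bidirectional long short-term memory network (BLSTM) to estimate the IRMs and tested their effect on recognition performance. Experimental results show that, with the Abel division of early reflections and late reverberation, the ideal ratio mask reduces the word error rate by about 2.8%, whereas the BLSTM-based estimation underestimates the ideal ratio mask and fails to effectively improve recognition performance.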
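A minimal sketch of how such an IRM can be computed, assuming a fixed boundary of 50 ms after the direct-path peak (the Abel division itself is not reproduced) and the common form IRM = (S²/(S² + N²))^β with β = 0.5, where the target S is the source convolved with the early part of the RIR and the interference N is the late part. The synthetic RIR, noise "source", and all function names are hypothetical; a direct DFT keeps the example dependency-free.

```python
import cmath
import math
import random


def convolve(x, h):
    """Full linear convolution (pure Python, fine for short toy signals)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def split_rir(rir, fs, boundary_ms=50.0):
    """Split a RIR at `boundary_ms` after the direct-path peak into early and late parts."""
    direct = max(range(len(rir)), key=lambda i: abs(rir[i]))
    cut = direct + int(round(boundary_ms * 1e-3 * fs))
    early = [v if i < cut else 0.0 for i, v in enumerate(rir)]
    late = [v if i >= cut else 0.0 for i, v in enumerate(rir)]
    return early, late


def power_spectrogram(x, frame=64, hop=32):
    """Hann-windowed STFT power computed with a direct DFT (slow but dependency-free)."""
    win = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame) for n in range(frame)]
    spec = []
    for start in range(0, len(x) - frame + 1, hop):
        seg = [x[start + n] * win[n] for n in range(frame)]
        spec.append([abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame) for n in range(frame))) ** 2
                     for k in range(frame // 2 + 1)])
    return spec


def ideal_ratio_mask(target_pow, interf_pow, beta=0.5, eps=1e-12):
    """IRM = (S^2 / (S^2 + N^2))^beta for every time-frequency unit."""
    return [[((t + eps) / (t + n + eps)) ** beta for t, n in zip(tr, nr)]
            for tr, nr in zip(target_pow, interf_pow)]


rng = random.Random(1)
fs = 8000
# Hypothetical RIR: direct path at sample 10 plus an exponentially decaying noise tail.
rir = [0.0] * 10 + [1.0] + [0.1 * rng.gauss(0.0, 1.0) * math.exp(-n / 300.0) for n in range(1, 1200)]
early_rir, late_rir = split_rir(rir, fs)
source = [rng.gauss(0.0, 1.0) for _ in range(400)]          # stand-in for dry speech
early_pow = power_spectrogram(convolve(source, early_rir))   # target: direct sound + early reflections
late_pow = power_spectrogram(convolve(source, late_rir))     # interference: late reverberation
mask = ideal_ratio_mask(early_pow, late_pow)
```

Moving `boundary_ms` changes which reflections count as target, which is exactly the design variable the study compares.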

8.
A large number of single-channel noise-reduction algorithms have been proposed, based largely on mathematical principles. Most of these algorithms, however, have been evaluated with English speech. Given the different perceptual cues used by native listeners of different languages, including tonal languages, it is of interest to examine whether there are any language effects when the same noise-reduction algorithm is used to process noisy speech in different languages. This study comparatively evaluates various single-channel noise-reduction algorithms applied to noisy speech in three languages: Chinese, Japanese, and English. Clean speech signals (Chinese words and Japanese words) were first corrupted by three types of noise at two signal-to-noise ratios and then processed by five single-channel noise-reduction algorithms. The processed signals were finally presented to normal-hearing listeners for recognition. Intelligibility evaluation showed that most of the noise-reduction algorithms did not improve speech intelligibility. Consistent with a previous study in English, the Wiener filtering algorithm produced small but statistically significant improvements in intelligibility for the car and white noise conditions. Significant differences in the performance of the noise-reduction algorithms were observed across the three languages.
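Corrupting clean words "at a given SNR" usually means scaling the noise so that 10·log10(P_speech / P_noise) equals the target before adding it. The abstract does not state the two SNR values; the 0 dB target, the 200 Hz tone standing in for a word, and `mix_at_snr` are placeholders.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that 10*log10(P_speech / P_noise) equals `snr_db`, then add it."""
    p_speech = sum(v * v for v in speech) / len(speech)
    p_noise = sum(v * v for v in noise) / len(noise)
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    scaled = [gain * v for v in noise]
    return [s + n for s, n in zip(speech, scaled)], scaled


rng = random.Random(0)
speech = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(8000)]   # stand-in for a word
noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]                      # stand-in for white noise
noisy, scaled_noise = mix_at_snr(speech, noise, snr_db=0.0)
achieved = 10 * math.log10(sum(v * v for v in speech) / sum(v * v for v in scaled_noise))
```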

9.
Wang Mengjiao  Zhou Zequan  Li Zhijun  Zeng Yicheng 《Acta Physica Sinica》2018,67(6):60501-060501
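The collaborative filtering denoising algorithm for chaotic signals makes full use of the self-similar structure of chaotic signals and achieves good signal-to-noise ratio improvement. To address the problem of optimizing this algorithm's filter parameters, and considering that the optimal filter parameters depend on the signal characteristics, the sampling frequency, and the noise level, we propose an automatic filter-parameter optimization criterion based on permutation entropy, which makes the algorithm more adaptive and better suited to practical applications. Because chaotic signals at different noise levels have different permutation entropies, the noisy chaotic signal is first denoised with different filter parameters; the permutation entropy of the reconstructed signal corresponding to each parameter is then computed; finally, by comparing these permutation entropies, the filter parameter whose reconstructed signal has the smallest permutation entropy is selected as the optimal parameter. The rules for choosing the filter parameters under different signal characteristics, sampling frequencies, and noise levels are analyzed. Simulation results show that the criterion effectively and automatically optimizes the filter parameters under different conditions, improving the adaptivity of the collaborative filtering denoising algorithm for chaotic signals.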
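The selection loop described above (denoise with each candidate parameter, compute the permutation entropy of each output, keep the minimum) can be sketched with the normalized Bandt–Pompe permutation entropy. The collaborative filter itself is not reproduced: `moving_average` is a hypothetical stand-in denoiser whose window width plays the role of the filter parameter, and the order-3 embedding is an assumed setting.

```python
import math
import random
from collections import Counter


def permutation_entropy(x, order=3, delay=1):
    """Normalized Bandt-Pompe permutation entropy in [0, 1]."""
    counts = Counter(
        tuple(sorted(range(order), key=lambda k: x[i + k * delay]))   # ordinal pattern
        for i in range(len(x) - (order - 1) * delay)
    )
    total = sum(counts.values())
    h = -sum(c / total * math.log(c / total) for c in counts.values())
    return h / math.log(math.factorial(order))


def moving_average(x, width):
    """Hypothetical stand-in denoiser: centred moving average of odd `width`."""
    half = width // 2
    out = []
    for i in range(len(x)):
        seg = x[max(0, i - half):i + half + 1]
        out.append(sum(seg) / len(seg))
    return out


def pick_parameter(noisy, candidates, denoise):
    """Return the candidate whose denoised output has the lowest permutation entropy."""
    return min(candidates, key=lambda p: permutation_entropy(denoise(noisy, p)))


rng = random.Random(0)
x, chaotic = 0.3, []
for _ in range(2000):
    x = 4.0 * x * (1.0 - x)          # logistic map in its chaotic regime
    chaotic.append(x)
noisy = [v + rng.gauss(0.0, 0.05) for v in chaotic]
best_width = pick_parameter(noisy, [1, 3, 5, 7], moving_average)
reference_pe = permutation_entropy([rng.random() for _ in range(5000)])   # i.i.d. noise: close to 1
```

With a plain smoother like this, the minimum-entropy choice may simply be the heaviest smoothing, so the stand-in only exercises the selection criterion and is no substitute for the structure-preserving collaborative filter.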

10.
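Speech enhancement method combining magnitude-spectrum and power-spectrum dictionaries   Total citations: 1, self-citations: 0, citations by others: 1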
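An improved spectral-feature sparse-representation speech enhancement method is proposed from three angles: dual-path dictionary learning, noise power spectrum estimation, and speech magnitude-spectrum reconstruction. In the dictionary learning stage, power-spectrum and magnitude-spectrum features are fused, and discriminative dictionaries are used to reduce the coherence between the speech and noise dictionaries. In the enhancement stage, a noise power spectrum estimation method is proposed to track non-stationary noise. Because magnitude-spectrum and power-spectrum features suit different noises to different degrees, a speech reconstruction weight table is designed. The two signals recovered from the magnitude spectrum and the power spectrum, respectively, are adaptively weighted and combined, and a phase compensation function is applied to obtain the enhanced speech. Experimental results show that, in both stationary and non-stationary noise, the method improves on speech enhancement methods that use a single spectral feature by 31.6% on average, improving speech enhancement performance.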
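The abstract does not describe its own noise power spectrum estimator. As a swapped-in illustration of tracking noise through speech activity, the sketch below follows the minimum-statistics idea: recursively smooth each bin's power and take the minimum over a sliding window of frames. Bias compensation and adaptive smoothing, used in full minimum-statistics methods, are omitted; `track_noise_psd`, `alpha`, and `window` are hypothetical.

```python
from collections import deque


def track_noise_psd(power_frames, alpha=0.8, window=40):
    """Simplified minimum-statistics noise PSD tracker.

    power_frames: list of frames, each a list of |Y(k)|^2 values per frequency bin.
    Returns, for each frame, the per-bin minimum of the recursively smoothed power
    over the last `window` frames (no bias compensation in this sketch).
    """
    history = deque(maxlen=window)
    smoothed = list(power_frames[0])
    estimates = []
    for frame in power_frames:
        smoothed = [alpha * s + (1.0 - alpha) * p for s, p in zip(smoothed, frame)]
        history.append(smoothed)
        estimates.append([min(values) for values in zip(*history)])
    return estimates


# Two bins of stationary noise (powers 1 and 4) with a loud "speech" burst in frames 20-25.
frames = [[100.0, 400.0] if 20 <= t <= 25 else [1.0, 4.0] for t in range(200)]
noise_est = track_noise_psd(frames)
```

During the burst the estimate stays at the noise floor because the window still holds earlier noise-only frames; the window must be longer than typical speech activity, at the cost of slower reaction to rising noise.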

11.
Vicen R  Gil R  Jarabo P  Rosa M  López F  Martínez D 《Ultrasonics》2004,42(1-9):355-360
Structure noise from inhomogeneous microstructures makes it difficult to detect flaws in highly scattering materials. Several techniques have been applied to improve the signal-to-noise ratio (SNR) and thus make flaw detection easier. Linear filtering does not give good results because the structure noise and the flaw signal concentrate their energy in the same frequency band. Non-linear filtering, however, can be used to reduce the structure noise of ultrasonic signals, so neural networks are applied in this work for that purpose. To use neural networks for non-linear filtering, dynamic structures must be applied. The easiest way to give a neural network the ability to process temporal patterns is to treat them as spatial ones, feeding the signal into a tapped delay line of finite length whose taps form the input of a static neural network (for example, a multi-layer perceptron). In this work, a dynamic neural network was built to filter ultrasonic signals with structure noise and was trained with the real-time back-propagation algorithm, using as inputs 3000 synthetic ultrasonic signals of 896 samples each. The training targets are the same signals as the inputs but without noise, so the network is trained to output the target signal when the noisy input is applied. To test the performance of the non-linear filter, a new set of 500 noisy signals was used. The SNR improvement is about 6 dB on average. The results show that this non-linear filtering method is quite useful as a preprocessing stage in flaw detection systems.
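The tapped-delay-line idea (turn a temporal pattern into a spatial input vector for a static network) can be sketched as follows. This is a minimal illustration, assuming a single tanh hidden layer trained by plain per-sample back-propagation, with a sinusoid plus Gaussian noise standing in for ultrasonic echoes and structure noise; the network size, learning rate, and all names are hypothetical, not the paper's configuration.

```python
import math
import random


def tapped_delay_windows(x, taps):
    """Tapped delay line: each input vector is `taps` consecutive samples."""
    return [x[i:i + taps] for i in range(len(x) - taps + 1)]


class TinyMLP:
    """One tanh hidden layer and a linear output, trained by per-sample back-propagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(self.w1, self.b1)]
        return sum(w * hj for w, hj in zip(self.w2, h)) + self.b2, h

    def train_step(self, x, target, lr):
        y, h = self.forward(x)
        err = y - target                                   # gradient of 0.5 * err**2 w.r.t. y
        for j, hj in enumerate(h):
            grad_pre = err * self.w2[j] * (1.0 - hj * hj)  # back-propagate through tanh
            self.w2[j] -= lr * err * hj
            self.b1[j] -= lr * grad_pre
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * grad_pre * xi
        self.b2 -= lr * err


def mse(net, inputs, targets):
    return sum((net.forward(x)[0] - t) ** 2 for x, t in zip(inputs, targets)) / len(targets)


rng = random.Random(0)
taps = 9
clean = [math.sin(2 * math.pi * n / 25) for n in range(400)]   # stand-in echo signal
noisy = [c + rng.gauss(0.0, 0.3) for c in clean]                # stand-in noise
inputs = tapped_delay_windows(noisy, taps)
targets = clean[taps // 2: taps // 2 + len(inputs)]            # clean sample at the centre tap
net = TinyMLP(taps, 8)
mse_before = mse(net, inputs, targets)
for _ in range(20):
    for x, t in zip(inputs, targets):
        net.train_step(x, t, lr=0.01)
mse_after = mse(net, inputs, targets)
```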

12.
Although cochlear implant (CI) users have enjoyed good speech recognition in quiet, they still have difficulty understanding speech in noise. We conducted three experiments to determine whether a directional microphone and an adaptive multichannel noise reduction algorithm could enhance CI performance in noise, and whether the Speech Transmission Index (STI) can be used to predict CI performance in various acoustic and signal processing conditions. In Experiment I, CI users listened to speech in noise processed by four hearing aid settings: omnidirectional microphone, omnidirectional microphone plus noise reduction, directional microphone, and directional microphone plus noise reduction. The directional microphone significantly improved speech recognition in noise. Both the directional microphone and the noise reduction algorithm improved overall preference. In Experiment II, normal-hearing individuals listened to the recorded speech processed by 4- or 8-channel CI simulations. The 8-channel simulation yielded speech recognition results similar to those in Experiment I, whereas the 4-channel simulation produced no significant difference among the four settings. In Experiment III, we examined the relationship between STIs and speech recognition. The results suggested that the STI could predict actual and simulated CI speech intelligibility with acoustic degradation and with the directional microphone, but not with the noise reduction algorithm. Implications for intelligibility enhancement are discussed.

13.
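Noise removal from chemiluminescence spectra using wavelet multiresolution analysis   Total citations: 3, self-citations: 1, citations by others: 2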
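In this paper, the wavelet transform is used to perform a multiresolution decomposition of chemiluminescence spectra, effectively removing noise and improving the signal-to-noise ratio of the spectra. The effects of different wavelet bases and decomposition levels on the results are discussed. The analysis shows that wavelet analysis has certain advantages for processing discrete signals.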
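The abstract compares wavelet bases and decomposition levels without naming them. A minimal sketch of multiresolution denoising, assuming the Haar basis, three levels, and the Donoho–Johnstone universal soft threshold σ·√(2 ln N) with σ estimated from the median absolute finest-scale detail, could look like this; the sinusoid standing in for a spectrum and all names are hypothetical.

```python
import math
import random


def haar_forward(x):
    """One Haar analysis step: approximation and detail coefficients."""
    s = [(x[2 * i] + x[2 * i + 1]) / math.sqrt(2) for i in range(len(x) // 2)]
    d = [(x[2 * i] - x[2 * i + 1]) / math.sqrt(2) for i in range(len(x) // 2)]
    return s, d


def haar_inverse(s, d):
    """Exact inverse of haar_forward."""
    x = []
    for a, b in zip(s, d):
        x.extend([(a + b) / math.sqrt(2), (a - b) / math.sqrt(2)])
    return x


def soft(v, t):
    return math.copysign(max(abs(v) - t, 0.0), v)


def wavelet_denoise(x, levels=3, threshold=None):
    """Haar multiresolution decomposition, soft-threshold all details, reconstruct."""
    if len(x) % (2 ** levels):
        raise ValueError("length must be divisible by 2**levels")
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = haar_forward(approx)
        details.append(d)                               # details[0] is the finest scale
    if threshold is None:
        finest = sorted(abs(v) for v in details[0])
        sigma = finest[len(finest) // 2] / 0.6745       # robust noise estimate from finest details
        threshold = sigma * math.sqrt(2 * math.log(len(x)))
    for d in reversed(details):                         # coarsest scale first
        approx = haar_inverse(approx, [soft(v, threshold) for v in d])
    return approx


rng = random.Random(0)
clean = [math.sin(2 * math.pi * n / 128) for n in range(512)]   # smooth stand-in "spectrum"
noisy = [c + rng.gauss(0.0, 0.3) for c in clean]
denoised = wavelet_denoise(noisy, levels=3)
mse_noisy = sum((a - b) ** 2 for a, b in zip(noisy, clean)) / len(clean)
mse_denoised = sum((a - b) ** 2 for a, b in zip(denoised, clean)) / len(clean)
```

Haar keeps the code short but gives blocky reconstructions; smoother bases and the choice of level are exactly the trade-offs the paper reports investigating.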

14.
Although many audio-visual speech experiments have focused on situations where the presence of an incongruent visual speech signal influences the perceived utterance heard by an observer, there are also documented examples of a related effect in which the presence of an incongruent audio speech signal influences the perceived utterance seen by an observer. This study examined the effects that different distracting audio signals had on performance in a color and number keyword speechreading task. When the distracting sound was noise, time-reversed speech, or continuous speech, it had no effect on speechreading. However, when the distracting audio signal consisted of speech that started at the same time as the visual stimulus, speechreading performance was substantially degraded. This degradation did not depend on the semantic similarity between the target and masker speech, but it was substantially reduced when the onset of the audio speech was shifted relative to that of the visual stimulus. Overall, these results suggest that visual speech perception is impaired by the presence of a simultaneous mismatched audio speech signal, but that other types of audio distracters have little effect on speechreading performance.

15.
When listeners hear a target signal in the presence of competing sounds, they are quite good at extracting information at instances when the local signal-to-noise ratio of the target is most favorable. Previous research suggests that listeners can easily understand a periodically interrupted target when it is interleaved with noise. It is not clear if this ability extends to the case where an interrupted target is alternated with a speech masker rather than noise. This study examined speech intelligibility in the presence of noise or speech maskers, which were either continuous or interrupted at one of six rates between 4 and 128 Hz. Results indicated that with noise maskers, listeners performed significantly better with interrupted, rather than continuous maskers. With speech maskers, however, performance was better in continuous, rather than interrupted masker conditions. Presumably the listeners used continuity as a cue to distinguish the continuous masker from the interrupted target. Intelligibility in the interrupted masker condition was improved by introducing a pitch difference between the target and speech masker. These results highlight the role that target-masker differences in continuity and pitch play in the segregation of competing speech signals.

16.
Recent results have shown that listeners attending to the quieter of two speech signals in one ear (the target ear) are highly susceptible to interference from normal or time-reversed speech signals presented in the unattended ear. However, speech-shaped noise signals have little impact on the segregation of speech in the opposite ear. This suggests that there is a fundamental difference between the across-ear interference effects of speech and nonspeech signals. In this experiment, the intelligibility and contralateral-ear masking characteristics of three synthetic speech signals with parametrically adjustable speech-like properties were examined: (1) a modulated noise-band (MNB) speech signal composed of fixed-frequency bands of envelope-modulated noise; (2) a modulated sine-band (MSB) speech signal composed of fixed-frequency amplitude-modulated sinewaves; and (3) a "sinewave speech" signal composed of sine waves tracking the first four formants of speech. In all three cases, a systematic decrease in performance in the two-talker target-ear listening task was found as the number of bands in the contralateral speech-like masker increased. These results suggest that speech-like fluctuations in the spectral envelope of a signal play an important role in determining the amount of across-ear interference that a signal will produce in a dichotic cocktail-party listening task.

17.
《Physica A》2006,361(1):337-354
The detection of changes in the parameter values of a nonlinear dynamic system is a field of study with multiple applications. In this paper, we explore a variant of the automatic detector and clusterer of slight parameter variations in nonlinear dynamic systems proposed by Torres et al. [Automatic detection of slight changes in nonlinear dynamical systems using multiresolution entropy tools, Int. J. Bifurc. Chaos 11(4) (2001) 967–981]. The new method takes advantage of the continuous multiresolution entropy to localize slight changes in the parameters, and uses self-organizing maps to quantify and cluster these changes. We discuss the performance of this method when applied to the automatic segmentation of natural and synthetic diphthongs in the presence of additive noise. Our results show the potential of the proposed method.

18.
Xie Yinghai  Yang Wei  Zhang Yu 《Acta Physica Sinica》2010,59(11):8255-8263
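Minimum-energy (wavelet) frames have broad application prospects in signal processing, but at present they can only be applied to continuous signals. To solve this problem, we define minimum-energy frames on the discrete signal space and prove several of their desirable properties. For a practical application, we propose a new denoising algorithm for binary rectangular pulse signals corrupted by additive white Gaussian noise in communications: a minimum-energy frame on the discrete space is used to denoise the sampled discrete sequence of the received waveform, with good results. Simulation results show that using this algorithm to preprocess the received waveform lowers the receiver's bit error rate, giving a performance gain of 3.4 dB at a signal-to-noise ratio of 4 dB. Keywords: discrete signal space; minimum-energy (wavelet) frame; binary rectangular pulse signal; denoising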

19.
Recent research results show that combined electric and acoustic stimulation (EAS) significantly improves speech recognition in noise, and it is generally established that access to the improved F0 representation of target speech, along with the glimpse cues, provides the EAS benefits. Under noisy listening conditions, noise degrades these important cues by introducing undesired time-frequency components and corrupting the harmonic structure. In this study, the potential of combining noise reduction and harmonics regeneration techniques was investigated as a way to further improve speech intelligibility in noise by providing improved beneficial cues for EAS. Three hypotheses were tested: (1) noise reduction methods can improve speech intelligibility in noise for EAS; (2) harmonics regeneration after noise reduction can further improve speech intelligibility in noise for EAS; and (3) harmonic sideband constraints in the frequency domain (or, equivalently, amplitude modulation in the time domain), even deterministic ones, can provide additional benefits. Test results demonstrate that combining noise reduction and harmonics regeneration can significantly improve speech recognition in noise for EAS, and that it is also beneficial to preserve the harmonic sidebands under adverse listening conditions. This finding warrants further work on algorithms that regenerate harmonics and the related sidebands for EAS processing under noisy conditions.

20.
Several dereverberation algorithms have been studied. The sampling frequencies used in conventional studies are typically 8–16 kHz because their main purpose is preprocessing to improve the intelligibility of speech communication and the articulation for automatic speech recognition. In next-generation communication systems, however, techniques that analyze and reproduce not only the semantic information of sound but also higher-definition components such as spatial information and directivity will be increasingly necessary. To decompose these sound field characteristics with high definition, a dereverberation algorithm that works at high sampling frequencies is an important technique for processing sound with high-frequency content, such as music. The LInear-predictive Multichannel Equalization (LIME) algorithm is a promising dereverberation method. With the LIME algorithm, however, the dereverberated signal cannot be obtained at high sampling frequencies when the source signal is colored, as speech and musical signals are. Because the correlation matrix calculated from such a colored signal is not of full rank, the characteristic polynomial cannot be calculated precisely. To alleviate this problem, we propose preprocessing all input signals with filters that whiten their spectra, so that the algorithm can function for colored signals at high sampling frequencies.
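The abstract does not say how the whitening filters are designed. One common choice, assumed here, is a linear-prediction inverse filter A(z) estimated with the Levinson–Durbin recursion and applied as an FIR filter; whether one shared filter or per-channel filters are applied to the microphone signals is also an assumption left to the caller. The AR(1) test signal and all names are hypothetical.

```python
import random


def autocorr(x, max_lag):
    """Biased autocorrelation estimate r[0..max_lag]."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) / n for k in range(max_lag + 1)]


def lpc_inverse_filter(x, order):
    """Levinson-Durbin: return a = [1, a1, ..., ap] of the prediction-error filter A(z)."""
    r = autocorr(x, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        k = -sum(a[j] * r[i - j] for j in range(i)) / err   # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                                  # updated prediction-error power
    return a


def apply_fir(a, x):
    """e[n] = sum_k a[k] * x[n-k] (samples before the start are treated as zero)."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0) for n in range(len(x))]


def lag1(x):
    r = autocorr(x, 1)
    return r[1] / r[0]


rng = random.Random(0)
colored = [0.0]
for _ in range(5000):
    colored.append(0.9 * colored[-1] + rng.gauss(0.0, 1.0))   # strongly low-pass (colored) signal
a = lpc_inverse_filter(colored, order=4)
whitened = apply_fir(a, colored)   # the same `a` could be applied to every channel
```

The lag-1 correlation drops from about 0.9 to near zero, i.e. the spectrum is flattened, which is the property that keeps the correlation matrix well conditioned.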

