Similar Literature
 20 similar documents found (search time: 31 ms)
1.
This paper addresses the problem of speech quality improvement using adaptive filtering algorithms. In Djendi and Bendoumia (2014) [1], we proposed a new two-channel backward algorithm for noise reduction and speech intelligibility enhancement. The main drawback of that two-channel subband algorithm is its poor performance when the number of subbands is high, which is most apparent in its steady-state values. The source of this problem is the fixed step sizes of the cross-adaptive filtering algorithms, which distort the speech signal when chosen large and slow the convergence when chosen small. In this paper, we propose four modifications of this algorithm that improve both the convergence speed and the steady-state values, even in very noisy conditions and with a high number of subbands. To confirm the good performance of the four proposed variable-step-size SBBSS algorithms, we carried out several simulations in various noisy environments, evaluating objective and subjective criteria such as the system mismatch, the cepstral distance, the output signal-to-noise ratio, and the mean opinion score (MOS) to compare the four proposed variable-step-size versions of the SBBSS algorithm with their original versions and with the two-channel fullband backward (2CFB) least-mean-square algorithm.
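As an illustration of the variable-step-size idea argued for above, the following is a minimal sketch of a variable-step-size NLMS update, in which the step size shrinks as the smoothed error power falls. The function name, the particular step-size recursion, and the toy example are illustrative assumptions, not the authors' SBBSS algorithm.

```python
import numpy as np

def vss_nlms(x, d, L=64, mu_max=1.0, mu_min=1e-3, alpha=0.95, eps=1e-8):
    """Variable-step-size NLMS: adapt w so that w * x predicts d.

    The step size is scaled by a smoothed, normalized error power, so it is
    large while the filter is misadjusted and small near convergence
    (illustrative rule, not the SBBSS criterion).
    """
    w = np.zeros(L)
    e_hist = np.zeros(len(x))
    p_err, p_d = 0.0, eps
    for n in range(L, len(x)):
        xv = x[n - L:n][::-1]               # most recent L reference samples
        y = w @ xv                          # filter output
        e = d[n] - y                        # a-priori error
        p_err = alpha * p_err + (1 - alpha) * e * e
        p_d = alpha * p_d + (1 - alpha) * d[n] * d[n]
        mu = np.clip(mu_max * p_err / (p_d + eps), mu_min, mu_max)
        w += mu * e * xv / (xv @ xv + eps)  # normalized update
        e_hist[n] = e
    return w, e_hist

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(20000)                            # reference (noise) channel
    h = rng.standard_normal(64) * np.exp(-0.1 * np.arange(64))    # unknown coupling path
    d = np.convolve(noise, h)[:20000] + 0.1 * rng.standard_normal(20000)
    w, e = vss_nlms(noise, d)
    print("final error power:", np.mean(e[-2000:] ** 2))
```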

2.
Although cochlear implant (CI) users enjoy good speech recognition in quiet, they still have difficulty understanding speech in noise. We conducted three experiments to determine whether a directional microphone and an adaptive multichannel noise reduction algorithm could enhance CI performance in noise and whether the Speech Transmission Index (STI) can be used to predict CI performance in various acoustic and signal processing conditions. In Experiment I, CI users listened to speech in noise processed with four hearing aid settings: omni-directional microphone, omni-directional microphone plus noise reduction, directional microphone, and directional microphone plus noise reduction. The directional microphone significantly improved speech recognition in noise, and both the directional microphone and the noise reduction algorithm improved overall preference. In Experiment II, normal-hearing listeners heard the recorded speech processed by 4- or 8-channel CI simulations. The 8-channel simulation yielded speech recognition results similar to those of Experiment I, whereas the 4-channel simulation produced no significant difference among the four settings. In Experiment III, we examined the relationship between STIs and speech recognition. The results suggested that STI can predict actual and simulated CI speech intelligibility under acoustic degradation and with the directional microphone, but not with the noise reduction algorithm. Implications for intelligibility enhancement are discussed.

3.
This paper addresses the problem of speech intelligibility enhancement by adaptive filtering algorithms combined with subband techniques. Two structures, the forward and backward blind source separation (BSS) structures, are widely used in speech enhancement and source separation and have been studied extensively in the literature with convolutive and non-convolutive mixtures. Both structures use two microphones to obtain the convolutive/non-convolutive mixture signals and provide the target and jammer signal components at their outputs. In this paper, we focus on the backward structure employed to enhance the speech signal from a convolutive mixture, and we propose a subband implementation of this structure to improve its behaviour with speech signals. The proposed subband backward BSS (SBBSS) structure substantially improves the convergence speed of the adaptive filtering algorithms when the number of subbands is high. To improve the robustness of the proposed subband structure, we have adapted and applied a new criterion that combines minimization of the system mismatch and of the mean error. When combined with this new minimization criterion, the proposed subband backward structure enhances the output speech signal by reducing both the distortion and the noise components. The performance of the proposed subband backward structure is validated through several objective criteria that are given and described in this paper.
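The backward separation structure discussed above can be sketched as two cross-coupled adaptive filters, each cancelling from one mixture the component predicted from the opposite output. The sketch below assumes a simple NLMS adaptation and omits the subband decomposition that is the paper's actual contribution.

```python
import numpy as np

def backward_bss(m1, m2, L=32, mu=0.05, eps=1e-8):
    """Two-channel backward (feedback) separation structure.

    Outputs:  u1[n] = m1[n] - w12 * past(u2)   (estimate of source 1)
              u2[n] = m2[n] - w21 * past(u1)   (estimate of source 2)
    Each cross filter is adapted by NLMS to decorrelate its own output from
    the opposite output (illustrative update rule).
    """
    N = len(m1)
    w12, w21 = np.zeros(L), np.zeros(L)
    u1, u2 = np.zeros(N), np.zeros(N)
    for n in range(N):
        # most recent L samples of the opposite output, zero-padded at start-up
        u2_past = u2[max(0, n - L):n][::-1]
        u1_past = u1[max(0, n - L):n][::-1]
        u2_past = np.pad(u2_past, (0, L - len(u2_past)))
        u1_past = np.pad(u1_past, (0, L - len(u1_past)))
        u1[n] = m1[n] - w12 @ u2_past
        u2[n] = m2[n] - w21 @ u1_past
        w12 += mu * u1[n] * u2_past / (u2_past @ u2_past + eps)
        w21 += mu * u2[n] * u1_past / (u1_past @ u1_past + eps)
    return u1, u2
```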

4.
Recent research shows that combined electric and acoustic stimulation (EAS) significantly improves speech recognition in noise, and it is generally accepted that access to an improved F0 representation of the target speech, along with glimpse cues, provides the EAS benefit. Under noisy listening conditions, noise degrades these important cues by introducing undesired temporal-frequency components and corrupting the harmonic structure. In this study, the potential of combining noise reduction and harmonics regeneration techniques was investigated to further improve speech intelligibility in noise by providing improved beneficial cues for EAS. Three hypotheses were tested: (1) noise reduction methods can improve speech intelligibility in noise for EAS; (2) harmonics regeneration after noise reduction can further improve speech intelligibility in noise for EAS; and (3) harmonics sideband constraints in the frequency domain (or, equivalently, amplitude modulation in the temporal domain), even deterministic ones, can provide additional benefits. Test results demonstrate that combining noise reduction and harmonics regeneration significantly improves speech recognition in noise for EAS, and that it is also beneficial to preserve the harmonics sidebands under adverse listening conditions. This finding warrants further work on algorithms that regenerate harmonics and the related sidebands for EAS processing in noisy conditions.
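A common way to realize harmonics regeneration after noise reduction is to pass the enhanced signal through a nonlinearity, which recreates energy at multiples of F0, and then to blend the regenerated spectrum back in. The sketch below follows that generic recipe (spectral subtraction, half-wave rectification, spectral blending) purely for illustration; the noise-estimation rule, the blending factor beta, and the assumption that the first frames are speech-free are all illustrative choices, not the authors' EAS algorithm.

```python
import numpy as np
from scipy.signal import stft, istft

def harmonic_regeneration(noisy, fs, noise_frames=10, beta=0.5):
    """Spectral subtraction followed by simple harmonic regeneration.

    1) Subtract a noise-floor estimate (first `noise_frames` frames assumed
       to be speech-free -- an illustrative assumption).
    2) Half-wave rectify the enhanced waveform; the nonlinearity regenerates
       energy at harmonics of the fundamental.
    3) Blend the two magnitude spectra and resynthesize.
    """
    f, t, X = stft(noisy, fs, nperseg=512)
    noise_mag = np.mean(np.abs(X[:, :noise_frames]), axis=1, keepdims=True)
    mag = np.maximum(np.abs(X) - noise_mag, 0.05 * np.abs(X))   # subtraction with a floor
    phase = np.angle(X)
    _, enhanced = istft(mag * np.exp(1j * phase), fs, nperseg=512)

    rect = np.maximum(enhanced, 0.0)                 # half-wave rectification
    _, _, R = stft(rect, fs, nperseg=512)
    n_frames = min(mag.shape[1], R.shape[1])
    regen = (1 - beta) * mag[:, :n_frames] + beta * np.abs(R[:, :n_frames])
    _, out = istft(regen * np.exp(1j * phase[:, :n_frames]), fs, nperseg=512)
    return out
```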

5.
Synthesis (carrier) signals in acoustic models embody assumptions about the perception of auditory electric stimulation. This study compared the speech intelligibility of consonants and vowels processed through a set of nine acoustic models that used Spectral Peak (SPEAK) and Advanced Combination Encoder (ACE)-like speech processing, using synthesis signals representative of those used previously in acoustic models as well as two new ones. Performance of the synthesis signals was determined in terms of correspondence with cochlear implant (CI) listener results for 12 attributes of phoneme perception (consonant and vowel recognition; F1, F2, and duration information transmission for vowels; voicing, manner, place of articulation, affrication, burst, nasality, and amplitude envelope information transmission for consonants) using four measures of performance. Modulated synthesis signals produced the best correspondence with CI consonant intelligibility, while sinusoids, narrow noise bands, and varying noise bands produced the best correspondence with CI vowel intelligibility. The signals that performed best overall (in terms of correspondence with both vowel and consonant attributes) were modulated and unmodulated noise bands of varying bandwidth, corresponding to an excitation width that varied linearly from 0.4 mm at the apical channels to 8 mm at the basal channels.
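A minimal noise-band vocoder of the kind used as a CI acoustic model can be sketched as follows: the speech is bandpass filtered, each band's Hilbert envelope is extracted, and the envelope modulates a noise carrier limited to the same band (swapping the carrier for a sinusoid at the channel centre frequency gives tone vocoding). Channel count, band edges, and filter orders are illustrative assumptions, and a sampling rate above about 14 kHz is assumed so that the 7 kHz upper edge is valid.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_band_vocoder(speech, fs, n_channels=8, fmin=100.0, fmax=7000.0):
    """N-channel noise-band vocoder (CI acoustic model sketch).

    Channels are spaced logarithmically between fmin and fmax; each channel's
    Hilbert envelope modulates a band-limited noise carrier.
    """
    edges = np.geomspace(fmin, fmax, n_channels + 1)
    rng = np.random.default_rng(0)
    out = np.zeros(len(speech))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, speech)
        env = np.abs(hilbert(band))                                       # channel envelope
        carrier = sosfiltfilt(sos, rng.standard_normal(len(speech)))      # noise carrier in the same band
        out += env * carrier
    return out / np.max(np.abs(out))                                      # simple normalization
```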

6.
This paper presents a new method for speech enhancement based on time-frequency analysis and adaptive digital filtering. The proposed dual-channel speech enhancement method tracks the frequencies of the corrupting signal with the discrete Gabor transform (DGT) and applies a multi-notch adaptive digital filter (MNADF) at those frequencies. Since no a priori knowledge of the noise source statistics is required, this method differs from traditional speech enhancement methods. Specifically, the proposed method was applied to the case where speech quality and intelligibility deteriorate in the presence of background noise. Speech coders and automatic speech recognition (ASR) systems are designed to act on clean speech signals, so speech corrupted by noise must be enhanced before processing. The method uses a primary input containing the corrupted speech signal and a reference input containing the noise only. An MNADF was designed instead of a single-notch adaptive digital filter, and the DGT was used to track the frequencies of the corrupting signal, because a fast filtering process and a fast measure of the time-dependent noise frequency are of great importance in speech enhancement. Different types of noise from the Noisex-92 database were used to degrade real speech signals. Objective measures, inspection of speech spectrograms, global signal-to-noise ratio (SNR), segmental SNR (segSNR), and the Itakura-Saito distance, as well as subjective listening tests, consistently demonstrated superior enhancement performance of the proposed method over traditional methods such as spectral subtraction. By combining the MNADF and the DGT, excellent speech enhancement was obtained.
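The structure described above, tracking the dominant frequencies of a noise-only reference and notching them out of the primary channel, can be sketched as below. An ordinary STFT stands in for the discrete Gabor transform and fixed second-order IIR notches stand in for the adaptive notch filters, so this illustrates only the overall idea, not the MNADF itself.

```python
import numpy as np
from scipy.signal import stft, iirnotch, lfilter

def multi_notch_enhance(primary, reference, fs, n_notches=3, Q=30.0):
    """Notch the primary input at the strongest frequencies of the reference.

    The reference channel is assumed to contain noise only; its average STFT
    magnitude is peak-picked to find the interfering tones, and one IIR notch
    per detected frequency is cascaded on the primary channel.
    """
    f, _, R = stft(reference, fs, nperseg=1024)
    spectrum = np.mean(np.abs(R), axis=1)
    # pick the n_notches strongest bins, skipping DC and Nyquist
    peaks = f[np.argsort(spectrum[1:-1])[-n_notches:] + 1]
    y = np.asarray(primary, dtype=float)
    for f0 in peaks:
        b, a = iirnotch(f0, Q, fs=fs)
        y = lfilter(b, a, y)
    return y, peaks
```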

7.
This paper describes an application of the multichannel signal processing technique of adaptive decorrelation filtering to the design of an assistive listening system. A simulated "dinner table" scenario was studied, in which the speech of a desired talker was corrupted by three simultaneous speech jammers and by speech-shaped diffuse noise. Adaptive decorrelation filtering was used to extract the desired speech from the interfering speech and noise. The effectiveness of the assistive listening system was evaluated by measuring improvements in A-weighted signal-to-noise ratio (SNR) and in sentence intelligibility, the latter in a listening test with eight normal-hearing subjects and three subjects with hearing impairments. Significant improvements in SNR and sentence intelligibility were achieved with the assistive listening system: for subjects with normal hearing, the speech reception threshold improved by 3 to 5 dBA, and for subjects with hearing impairments, by 4 to 8 dBA.

8.
It is well known that the human auditory system has excellent performance that automatic speech recognition (ASR) systems cannot match, and that the fractional Fourier transform (FrFT) has unique advantages for non-stationary signal processing. In this paper, a Gammatone filterbank is applied to speech signals for front-end temporal filtering, and acoustic features of the output subband signals are then extracted based on the fractional Fourier transform. Considering the critical effect of the transform order on the FrFT, an order adaptation method based on the instantaneous frequency is proposed, and its performance is compared with a method based on the ambiguity function. ASR experiments are conducted on clean and noisy Putonghua digits, and the results show that the proposed features achieve a significantly higher recognition rate than the MFCC baseline, and that the order adaptation method based on instantaneous frequency has much lower complexity than the one based on the ambiguity function. Furthermore, the FrFT-based features achieve the highest recognition rate when the proposed order adaptation method is used.

9.
This paper reports the results of a large-scale, detailed acoustic survey of 42 open plan classrooms of varying design in the UK, each of which contained between 2 and 14 teaching areas or classbases. The objective survey procedure, designed specifically for use in open plan classrooms, is described. Acoustic measurements relating to speech intelligibility within a classbase, including ambient noise level, intrusive noise level, speech-to-noise ratio, speech transmission index, and reverberation time, are presented. The effects on speech intelligibility of critical physical design variables, such as the number of classbases within an open plan unit and the selection of acoustic finishes for control of reverberation, are examined. This analysis enables the limitations of open plan classrooms to be discussed and acoustic design guidelines to be developed to ensure good listening conditions. The types of teaching activity for which adequate acoustic conditions can be provided, as well as the speech intelligibility requirements of younger children, are also discussed.

10.
The intelligibility of sentences processed to remove temporal envelope information, as far as possible, was assessed. Sentences were filtered into N analysis channels, and each channel signal was divided by its Hilbert envelope to remove envelope information but leave the temporal fine structure (TFS) intact. The channel signals were combined to give TFS speech. The effect of adding low-level low-noise noise (LNN) to each channel signal before processing was assessed. The addition of LNN reduced the amplification of low-level signal portions containing large excursions in instantaneous frequency and improved the intelligibility of simple TFS speech sentences, but not of more complex sentences. It also reduced the time needed to reach a stable level of performance. The recovery of envelope cues by peripheral auditory filtering was investigated by measuring the intelligibility of 'recovered-envelope speech', formed by filtering TFS speech with an array of simulated auditory filters and using the envelopes at the outputs of these filters to modulate sinusoids at the filter center frequencies (i.e., tone vocoding). The intelligibility of both TFS speech and recovered-envelope speech fell as N increased, although TFS speech remained highly intelligible at values of N for which the intelligibility of recovered-envelope speech was low.
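The TFS processing described above can be sketched as follows: filter into bands, optionally add low-level noise, divide each band by its Hilbert envelope, and sum. Band layout and noise level are illustrative assumptions, ordinary band-limited noise is used here in place of the study's special low-noise noise (LNN), and a sampling rate above about 14 kHz is assumed.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def tfs_speech(x, fs, n_bands=8, fmin=100.0, fmax=7000.0, lnn_level=0.0, eps=1e-12):
    """Remove envelope information, keeping temporal fine structure (TFS).

    Each band signal is divided by its Hilbert envelope; lnn_level adds
    low-level noise to each band before processing, limiting the
    amplification of very quiet segments (as discussed in the abstract).
    """
    edges = np.geomspace(fmin, fmax, n_bands + 1)
    rng = np.random.default_rng(0)
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)
        if lnn_level > 0:
            band = band + lnn_level * np.std(band) * sosfiltfilt(
                sos, rng.standard_normal(len(x)))    # low-level noise in the same band
        env = np.abs(hilbert(band)) + eps
        out += band / env                            # unit-envelope (TFS-only) channel
    return out
```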

11.
Most information in speech is carried in spectral changes over time rather than in static spectral shape per se. A form of signal processing aimed at enhancing spectral changes over time was developed and evaluated with hearing-impaired listeners. The signal processing was based on the overlap-add method, and the degree and type of enhancement could be manipulated via four parameters. Two experiments were conducted to assess speech intelligibility and clarity preferences, using three sets of parameter values (one corresponding to a control condition), two types of masker (steady speech-spectrum noise and two-talker speech), and two signal-to-masker ratios (SMRs) for each masker type. Generally, the effects of the processing were small, although intelligibility improved by about 8 percentage points relative to the control condition for one set of parameter values with the steady noise masker at -6 dB SMR. The processed signals were not preferred over those of the control condition, except for the steady noise masker at -6 dB SMR. Further work is needed to determine whether tailoring the processing to the characteristics of the individual hearing-impaired listener is beneficial.
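One minimal reading of "enhancing spectral changes over time" in an overlap-add framework is to boost time-frequency bins whose magnitude changed most since the previous frame, as sketched below. The gain rule and its two parameters are illustrative stand-ins for the paper's four-parameter scheme.

```python
import numpy as np
from scipy.signal import stft, istft

def enhance_spectral_change(x, fs, gain=4.0, floor=0.2):
    """Boost frame-to-frame spectral changes (overlap-add sketch).

    Bins whose log-magnitude differs strongly from the previous frame are
    left near unity gain; nearly static bins are attenuated toward `floor`.
    """
    f, t, X = stft(x, fs, nperseg=512)
    logmag = np.log(np.abs(X) + 1e-10)
    delta = np.abs(np.diff(logmag, axis=1, prepend=logmag[:, :1]))
    delta = delta / (np.max(delta) + 1e-10)            # normalize change to [0, 1]
    g = floor + (1 - floor) * (1 + gain * delta) / (1 + gain)
    _, y = istft(g * X, fs, nperseg=512)
    return y
```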

12.
A wavelet representation of speech was used to display the instantaneous amplitude and phase within 14 octave frequency bands, representing the envelope and the carrier within each band. Adding stationary noise alters the wavelet pattern, which can be understood as a combination of three simultaneously occurring subeffects: two effects on the wavelet levels (one systematic and one stochastic) and one effect on the wavelet phases. Specific types of signal processing were applied to the speech so that each effect could be either included or excluded, and the impact of each effect (and of combinations) on speech intelligibility was measured with CVCs. The systematic level effect (i.e., the increase of each speech wavelet intensity with the mean noise intensity) proved to have the most degrading effect on speech intelligibility, in accordance with measures such as the modulation transfer function and the speech transmission index. However, the introduction of stochastic level fluctuations and the disturbance of the carrier phase also contribute seriously to reduced intelligibility in noise. It is argued that these stochastic effects are responsible for the limited success of spectral subtraction as a means of improving speech intelligibility. The results can provide clues for noise suppression that is effective with respect to intelligibility.

13.
王辉  张玲华 《声学学报》2012,37(5):534-538
Adaptive beamforming is one of the core algorithms of digital hearing aids. To address the speech leakage that is unavoidable in adaptive beamforming, this paper first studies the conventional GSC-structured adaptive beamforming algorithm theoretically and then proposes a Chinese-speech processing technique to compensate for the leaked speech. The technique exploits the pitch information characteristic of Chinese speech to adjust the amplitude spectral envelope, increasing the similarity between the envelope shape and the pitch contour so as to improve intelligibility. Because the leaked speech suffers large losses in the high-frequency unvoiced-consonant segments, the unvoiced consonants are amplified in the frequency domain, raising their energy without changing the formant structure, while the noise energy leaked by the GSC algorithm during speech pauses is reduced, improving the discrimination of speech. Simulation results show that this Chinese speech processing can compensate for the speech leakage caused by adaptive beamforming and improve speech intelligibility.
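The GSC structure that the paper starts from can be sketched for two microphones as below: a fixed beamformer (sum) forms the speech path, a blocking matrix (difference) forms the noise reference, and an adaptive filter cancels the predictable noise. The pitch-based compensation stage proposed in the paper is not reproduced here, and the NLMS adaptation shown is only an illustrative choice.

```python
import numpy as np

def gsc_two_mic(x1, x2, L=64, mu=0.1, eps=1e-8):
    """Two-microphone GSC for a broadside (0 degree) target.

    Fixed beamformer:  b[n] = (x1[n] + x2[n]) / 2   (passes the target)
    Blocking matrix:   u[n] =  x1[n] - x2[n]         (cancels the target)
    The NLMS filter w predicts the residual noise in b from u; the output
    y = b - w * u is the enhanced speech (speech leakage occurs when the
    blocking matrix does not fully remove the target).
    """
    N = len(x1)
    b = 0.5 * (x1 + x2)
    u = x1 - x2
    w = np.zeros(L)
    y = np.zeros(N)
    for n in range(L, N):
        uv = u[n - L:n][::-1]
        y[n] = b[n] - w @ uv
        w += mu * y[n] * uv / (uv @ uv + eps)   # adapt; ideally only during speech pauses
    return y
```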

14.
Binaural speech intelligibility of individual listeners under realistic conditions was predicted using a model consisting of a gammatone filter bank, an independent equalization-cancellation (EC) process in each frequency band, gammatone resynthesis, and the speech intelligibility index (SII). Hearing loss was simulated by adding uncorrelated masking noises (according to the pure-tone audiogram) to the ear channels. Speech intelligibility measurements were carried out with 8 normal-hearing and 15 hearing-impaired listeners, collecting speech reception threshold (SRT) data for three room acoustic conditions (anechoic, office room, cafeteria hall) and eight directions of a single noise source (speech from the front). Artificial EC processing errors derived from binaural masking level difference data obtained with pure tones were incorporated into the model. Except for an adjustment of the SII-to-intelligibility mapping function, no model parameter was fitted to the SRT data of this study. The overall correlation coefficient between predicted and observed SRTs was 0.95. The dependence of an individual listener's SRT on noise direction and on room acoustics was predicted with a median correlation coefficient of 0.91, and the effect of individual hearing impairment with a median correlation coefficient of 0.95. However, for mild hearing losses the release from masking was overestimated.
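The core equalization-cancellation step of such a model can be sketched for one frequency band as below: search over interaural delay and gain for the parameters that best cancel the noise, then apply them to speech and noise to obtain the band's effective SNR. The gammatone filterbank, the processing errors, and the SII stages are omitted, and the exhaustive delay/gain search is an illustrative simplification.

```python
import numpy as np

def ec_band_snr(speech_l, speech_r, noise_l, noise_r, fs, max_delay_ms=0.7):
    """Equalization-cancellation in one band: best delay/gain for noise cancellation.

    Returns (snr_dB, delay_samples, gain) after applying the noise-minimizing
    EC parameters to both speech and noise (a simplified, error-free EC stage).
    """
    max_lag = int(max_delay_ms * 1e-3 * fs)
    best = None
    for lag in range(-max_lag, max_lag + 1):
        nr = np.roll(noise_r, lag)
        sr = np.roll(speech_r, lag)
        # least-squares gain for cancelling the noise at this delay
        g = np.dot(noise_l, nr) / (np.dot(nr, nr) + 1e-12)
        n_res = noise_l - g * nr
        s_res = speech_l - g * sr
        snr = 10 * np.log10((np.mean(s_res ** 2) + 1e-12) / (np.mean(n_res ** 2) + 1e-12))
        if best is None or snr > best[0]:
            best = (snr, lag, g)
    return best
```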

15.
This paper proposes a speech feature extraction method that utilizes periodicity and nonperiodicity for robust automatic speech recognition. The method is motivated by the auditory comb-filtering hypothesis proposed in speech perception research. It divides input signals into subband signals, which are then decomposed into periodic and nonperiodic components using comb filters independently designed in each subband; both components are used as feature parameters. This representation exploits the noise robustness of periodicity measurements while preserving the overall speech information content. In addition, periodicity is estimated independently in each subband, providing robustness against noise spectrum bias. The framework is similar to that of a previous study [Jackson et al., Proc. of Eurospeech (2003), pp. 2321-2324] based on cascade processing motivated by speech production; however, the proposed method differs in its design philosophy, which is based on parallel distributed processing motivated by speech perception. Continuous-digit speech recognition experiments in the presence of noise confirmed that the proposed method performs better than conventional methods when the noise in the training and test data sets differs.
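The periodic/nonperiodic decomposition can be sketched for a single (subband) frame as below, using an autocorrelation pitch estimate and a two-tap comb filter. The pitch range, the filter form, and the frame-length assumption (roughly 25 ms or more so that the pitch lag fits) are illustrative choices rather than the paper's exact design.

```python
import numpy as np

def periodic_aperiodic(frame, fs, f0_min=60.0, f0_max=400.0):
    """Split one (subband) frame into periodic and nonperiodic parts.

    The pitch period T is taken from the autocorrelation peak within the
    plausible pitch range; the periodic part is a two-tap comb average of
    x[n] and x[n-T], and the nonperiodic part is the residual.
    """
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo = int(fs / f0_max)
    hi = min(int(fs / f0_min), len(frame) - 1)       # frame assumed >= ~25 ms
    T = lo + int(np.argmax(ac[lo:hi]))
    delayed = np.concatenate([np.zeros(T), frame[:-T]])
    periodic = 0.5 * (frame + delayed)               # comb filter passing harmonics of fs/T
    aperiodic = frame - periodic                     # complementary comb (nonperiodic part)
    return periodic, aperiodic, fs / T               # components plus the pitch estimate (Hz)
```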

16.
Fast-convergence minimum variance distortionless response algorithm: research and applications
周胜增  杜选民 《声学学报》2009,34(6):515-520
Conventional minimum variance distortionless response (MVDR) adaptive beamforming is a high-resolution narrowband beamformer that estimates the adaptive weight vector from the narrowband cross-spectral density matrix (CSDM) of the actual sound field. In practice, the MVDR algorithm requires a long observation time to estimate the covariance matrix, which is unfavourable for localizing fast-moving targets; for broadband target signals, MVDR must invert every CSDM, which is computationally expensive; and under coherent-source conditions the target signals "cancel" each other, so MVDR performance deteriorates sharply. This paper proposes a fast-convergence MVDR adaptive beamforming method based on subband-subarray processing. The full band is first divided into a set of subbands and the receiving line array into a set of subarrays; a reduced-dimension steered covariance matrix (STCM) is then computed for each subband, from which the weights and the spatial spectrum estimate of the fast-convergence MVDR adaptive beamformer are obtained. A forward-backward spatial smoothing method is also applied for MVDR spatial spectrum estimation with coherent sources. Simulation and sea-trial data processing results show that the algorithm maintains high resolution while achieving near-instantaneous convergence, and that the forward-backward spatial smoothing technique has good decoherence performance.
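The core MVDR computation on one subband can be sketched as below, with diagonal loading as one common way to keep the covariance inverse usable when only a few snapshots are available. The subarray/STCM dimension reduction and the forward-backward spatial smoothing that are the paper's contributions are not reproduced.

```python
import numpy as np

def mvdr_weights(snapshots, steering, loading=1e-2):
    """MVDR weights for one subband from an M x K snapshot matrix.

    snapshots : complex array, shape (M sensors, K snapshots)
    steering  : complex array, shape (M,) -- steering vector for the look direction
    """
    M, K = snapshots.shape
    R = snapshots @ snapshots.conj().T / K                   # sample covariance
    R = R + loading * np.trace(R).real / M * np.eye(M)       # diagonal loading
    Ri_a = np.linalg.solve(R, steering)
    return Ri_a / (steering.conj() @ Ri_a)                   # w = R^-1 a / (a^H R^-1 a)

def steering_vector(M, d_over_lambda, theta_deg):
    """Plane-wave steering vector for a uniform line array."""
    m = np.arange(M)
    return np.exp(-2j * np.pi * d_over_lambda * m * np.sin(np.deg2rad(theta_deg)))
```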

17.
Reverberation usually degrades speech intelligibility for spatially separated speech and noise sources, since spatial unmasking is reduced and late reflections decrease the fidelity of the received speech signal. The latter effect could not be satisfactorily predicted by a recently presented binaural speech intelligibility model [Beutelmann et al. (2010). J. Acoust. Soc. Am. 127, 2479-2497]. This study therefore evaluated three extensions of the model to improve its predictions: (1) an extension of the speech intelligibility index based on modulation transfer functions, (2) a correction factor based on the room acoustical quantity "definition," and (3) a separation of the speech signal into useful and detrimental parts. The predictions were compared to the results of two experiments in which speech reception thresholds were measured in a reverberant room, in quiet and in the presence of a noise source, for listeners with normal hearing. All extensions yielded better predictions than the original model when the influence of reverberation was strong, while the predictions were similar for conditions with less reverberation. Although model (3) differed substantially in the assumed interaction of binaural processing and early reflections, its predictions were very similar to those of model (2), which achieved the best fit to the data.
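Extension (3) above can be sketched by splitting the room impulse response at an early-reflection boundary and convolving the speech with each part separately, the early part being treated as useful signal and the late part as interference. The 50 ms boundary used below is a conventional but assumed value.

```python
import numpy as np

def split_useful_detrimental(speech, rir, fs, boundary_ms=50.0):
    """Split reverberant speech into useful (early) and detrimental (late) parts."""
    k = int(boundary_ms * 1e-3 * fs)
    early, late = rir[:k], rir[k:]
    useful = np.convolve(speech, early)[:len(speech)]
    # keep the late part time-aligned by re-inserting the leading zeros
    detrimental = np.convolve(speech, np.concatenate([np.zeros(k), late]))[:len(speech)]
    return useful, detrimental
```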

18.
The Speech Transmission Index (STI) is a physical metric that is well correlated with the intelligibility of speech degraded by additive noise and reverberation. The traditional STI uses modulated noise as a probe signal and is valid for assessing degradations that result from linear operations on the speech signal. Researchers have attempted to extend the STI to predict the intelligibility of nonlinearly processed speech by proposing variants that use speech as the probe signal. This work considers four previously proposed speech-based STI methods and four novel methods, studied under conditions of additive noise, reverberation, and two nonlinear operations (envelope thresholding and spectral subtraction). Analysis of intermediate metrics in the STI calculation reveals why some methods fail for nonlinear operations. The results indicate that none of the previously proposed methods is adequate for all of the conditions considered, while the four novel methods produce qualitatively reasonable results and warrant further study. The discussion considers the relevance of this work to predicting the intelligibility of cochlear-implant-processed speech.
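A simplified speech-based STI-style computation is sketched below: octave-band intensity envelopes of clean and degraded speech are compared by regression to obtain a modulation transfer value per band, converted to an apparent SNR, clipped, mapped to a transmission index, and averaged with classical octave weights. This is only one of the several estimator variants the paper compares, and the sampling-rate assumption (at least about 24 kHz, so all seven octave bands fit below Nyquist) and the weight table are illustrative.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

OCTAVES = [125, 250, 500, 1000, 2000, 4000, 8000]
WEIGHTS = [0.13, 0.14, 0.11, 0.12, 0.19, 0.17, 0.14]   # classical STI octave weights

def band_envelope(x, fs, fc):
    """Octave band around fc, then a 50 Hz lowpass of the squared signal (intensity envelope)."""
    sos = butter(4, [fc / np.sqrt(2), fc * np.sqrt(2)], btype="bandpass", fs=fs, output="sos")
    band = sosfiltfilt(sos, x)
    lp = butter(2, 50.0, btype="lowpass", fs=fs, output="sos")
    return sosfiltfilt(lp, band ** 2)

def speech_based_sti(clean, degraded, fs):
    """Envelope-regression style speech-based STI (simplified sketch; fs >= 24 kHz assumed)."""
    tis = []
    for fc in OCTAVES:
        ex, ey = band_envelope(clean, fs, fc), band_envelope(degraded, fs, fc)
        m = np.dot(ex, ey) / (np.dot(ex, ex) + 1e-12)          # apparent modulation transfer
        m = np.clip(m, 1e-4, 0.9999)
        snr = np.clip(10 * np.log10(m / (1 - m)), -15, 15)     # apparent SNR, clipped to +/-15 dB
        tis.append((snr + 15) / 30)                            # transmission index in [0, 1]
    return float(np.dot(WEIGHTS, tis))
```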

19.
20.
A masking function estimation method for adaptive beamforming with spherical arrays is proposed. The method uses the spherical harmonic coefficients, which carry the spatial information, to extract low-dimensional spatial vectors, and estimates the masking function with two schemes: a complex Gaussian mixture model and deep learning. The estimated masking function is then used to design a minimum variance distortionless response beamformer to achieve spatial filtering. Theoretical analysis and simulation experiments show that, for sound signals of the same duration, the computational complexity of the spherical-harmonic-domain masking function estimation method, compared with conven...
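The final masking-based beamforming step described above can be sketched for one frequency bin as below: the estimated time-frequency mask weights the noise covariance, and the MVDR weights are formed with a steering vector taken as the principal eigenvector of the speech covariance (a common but assumed choice). The complex GMM / deep-learning mask estimation itself and the spherical-harmonic-domain details are not reproduced.

```python
import numpy as np

def masked_mvdr_weights(Y, noise_mask, loading=1e-3):
    """Mask-weighted MVDR for one frequency bin.

    Y          : complex array (M channels, T frames) of STFT coefficients
    noise_mask : array (T,) in [0, 1], close to 1 where a frame is noise-dominated
    The noise covariance is the mask-weighted outer-product average; the
    steering vector is the principal eigenvector of the speech covariance.
    """
    M, T = Y.shape
    w_n = noise_mask / (noise_mask.sum() + 1e-12)
    Rn = (Y * w_n) @ Y.conj().T                      # noise covariance estimate
    w_s = (1 - noise_mask) / ((1 - noise_mask).sum() + 1e-12)
    Rs = (Y * w_s) @ Y.conj().T                      # speech covariance estimate
    Rn = Rn + loading * np.trace(Rn).real / M * np.eye(M)
    a = np.linalg.eigh(Rs)[1][:, -1]                 # principal eigenvector as steering vector
    Ri_a = np.linalg.solve(Rn, a)
    return Ri_a / (a.conj() @ Ri_a)
```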
