1.

An effective cluster-based model for robust speech detection and speech recognition in noisy environments

Górriz JM Ramírez J Segura JC Puntonet CG 《The Journal of the Acoustical Society of America》2006,120(1):470-481

This paper presents an accurate speech detection algorithm for improving the performance of speech recognition systems working in noisy environments. The proposed method is based on a hard decision clustering approach where a set of prototypes is used to characterize the noisy channel. Detecting the presence of speech is enabled by a decision rule formulated in terms of an averaged distance between the observation vector and a cluster-based noise model. The algorithm benefits from using contextual information, a strategy that considers not only a single speech frame but also a neighborhood of data in order to smooth the decision function and improve speech detection robustness. The proposed scheme exhibits reduced computational cost, making it adequate for real time applications, i.e., automated speech recognition systems. An exhaustive analysis is conducted on the AURORA 2 and AURORA 3 databases in order to assess the performance of the algorithm and to compare it to existing standard voice activity detection (VAD) methods. The results show significant improvements in detection accuracy and speech recognition rate over standard VADs such as ITU-T G.729, ETSI GSM AMR, and ETSI AFE for distributed speech recognition and a representative set of recently reported VAD algorithms.
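The decision rule in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the Euclidean distance, the nearest-prototype choice, the symmetric context window, and the names `detect_speech`, `context`, and `threshold` are all assumptions made for illustration, and the noise prototypes are taken as already trained.

```python
import math

def nearest_prototype_distance(frame, prototypes):
    """Euclidean distance from one feature vector to its closest noise prototype."""
    return min(math.dist(frame, p) for p in prototypes)

def detect_speech(frames, prototypes, context=2, threshold=1.0):
    """Return one boolean per frame: True where speech is detected.

    frames      -- list of feature vectors (e.g. sub-band log energies)
    prototypes  -- noise cluster centroids (assumed already trained)
    context     -- number of neighbouring frames used on each side (assumption)
    threshold   -- hypothetical decision threshold on the averaged distance
    """
    distances = [nearest_prototype_distance(f, prototypes) for f in frames]
    decisions = []
    for i in range(len(distances)):
        lo = max(0, i - context)
        hi = min(len(distances), i + context + 1)
        window = distances[lo:hi]
        # Averaging over the neighbourhood smooths the frame-level decision.
        decisions.append(sum(window) / len(window) > threshold)
    return decisions

# Toy data: two noise-like frames, three speech-like frames, one noise-like frame.
noise_prototypes = [(0.0, 0.0), (0.5, 0.5)]
frames = [(0.1, 0.0), (0.4, 0.5), (3.0, 3.0), (3.2, 2.9), (3.1, 3.0), (0.0, 0.1)]
flags = detect_speech(frames, noise_prototypes, context=1, threshold=2.0)
```

With a lower threshold, the averaging would also flag the frames next to the speech segment, which shows how the context window trades boundary precision for robustness.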

2.

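A compensation method for noisy Lombard and Loud speech for speech recognition in high-noise environments

Tian Bin, Yi Kechu 《Acta Acustica (声学学报)》2003,28(1):28-32

This paper studies the Lombard and Loud effects that strong noise induces in speech recognition and proposes a training-data-based method that jointly compensates for additive noise and for the Lombard and Loud effects. For additive noise, the method takes the inverse view of spectral subtraction and applies spectral addition to the training data in the spectral domain. For Lombard and Loud speech, it compensates the training data using hidden Markov model (HMM) state labeling. This accounts both for the various cepstral-domain changes in the different states of different acoustic units and for the changes in duration and relative duration of different acoustic units under multiple kinds of speech variation. This data-based, multi-mode compensation lets the models adapt automatically to a range of noise and speech-variation conditions, gives strong robustness in high-noise environments, and does not degrade recognition performance in normal environments or with normal speech. Because the compensation is obtained during training, it adds no computational complexity at recognition time.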

3.

Phoneme grouping for speech recognition

D R Reddy 《The Journal of the Acoustical Society of America》1967,41(5):1295-1300

4.

Intelligibility enhancement for noisy whispered speech using asymmetric cost function

ZHOU Jian; ZHENG Wenming; WANG Qingyun; ZHAO Li 《Chinese Journal of Acoustics (声学学报：英文版)》2014,(3):312-322

This paper proposes two whispered speech enhancement methods based on asymmetric cost functions that treat the amplification and attenuation distortions of whispered speech differently. The modified Itakura-Saito (MIS) distance function penalizes speech amplification distortion more heavily, whereas the Kullback-Leibler (KL) divergence function penalizes speech attenuation distortion more heavily. The experimental results show that the MIS-based method significantly improves intelligibility over conventional speech enhancement algorithms when the signal-to-noise ratio (SNR) falls below -6 dB, whereas the KL-based method achieves results similar to those of the minimum mean square error (MMSE) speech enhancement method. The results also show that amplification and attenuation distortions affect the intelligibility of the enhanced whisper differently: larger attenuation distortion may result in better intelligibility at low SNR, whereas attenuation distortion has little effect on intelligibility at high SNR.

5.

Recognizing articulatory gestures from speech for robust speech recognition

Mitra V Nam H Espy-Wilson C Saltzman E Goldstein L 《The Journal of the Acoustical Society of America》2012,131(3):2270-2287

Studies have shown that supplementary articulatory information can help to improve the recognition rate of automatic speech recognition systems. Unfortunately, articulatory information is not directly observable, necessitating its estimation from the speech signal. This study describes a system that recognizes articulatory gestures from speech, and uses the recognized gestures in a speech recognition system. Recognizing gestures for a given utterance involves recovering the set of underlying gestural activations and their associated dynamic parameters. This paper proposes a neural network architecture for recognizing articulatory gestures from speech and presents ways to incorporate articulatory gestures for a digit recognition task. The lack of a natural speech database containing gestural information prompted us to use three stages of evaluation. First, the proposed gestural annotation architecture was tested on a synthetic speech dataset, which showed that the use of estimated tract-variable-time-functions improved gesture recognition performance. In the second stage, gesture-recognition models were applied to natural speech waveforms and word recognition experiments revealed that the recognized gestures can improve the noise-robustness of a word recognition system. In the final stage, a gesture-based Dynamic Bayesian Network was trained and the results indicate that incorporating gestural information can improve word recognition performance compared to acoustic-only systems.

6.

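Speaker-independent recognition of the four tones  Cited by: 5 (self-citations: 0, citations by others: 5)

Guan Cuntai, Chen Yongbin 《Acta Acustica (声学学报)》1993,18(5):379-385

This paper proposes a reliable speaker-independent method for recognizing the four tones of Mandarin Chinese. The method detects the pitch period using the unbiased autocorrelation of the center-clipped signal, then passes the pitch periods through data selection, error correction, smoothing, and fitting to obtain a two-dimensional decision vector for tone classification. The four-tone recognition rate for isolated Mandarin monosyllables exceeds 98%.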
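The pitch-detection step named in the abstract above (unbiased autocorrelation of a center-clipped signal) can be sketched as follows. This is only an illustration under assumptions: the clipping ratio of 0.3, the lag search range, and all function names are hypothetical, and the later stages the abstract mentions (data selection, error correction, smoothing, fitting) are not shown.

```python
import math

def center_clip(signal, ratio=0.3):
    """Zero samples below the clipping level and shift the rest toward zero."""
    level = ratio * max(abs(s) for s in signal)
    clipped = []
    for s in signal:
        if s > level:
            clipped.append(s - level)
        elif s < -level:
            clipped.append(s + level)
        else:
            clipped.append(0.0)
    return clipped

def unbiased_autocorrelation(signal, lag):
    """Autocorrelation at one lag, normalised by the number of overlapping samples."""
    n = len(signal)
    total = sum(signal[i] * signal[i + lag] for i in range(n - lag))
    return total / (n - lag)

def pitch_period(signal, min_lag, max_lag):
    """Return the lag (in samples) with the largest unbiased autocorrelation."""
    clipped = center_clip(signal)
    return max(range(min_lag, max_lag + 1),
               key=lambda lag: unbiased_autocorrelation(clipped, lag))

# Synthetic 200 Hz tone sampled at 8 kHz: the true period is 40 samples.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(400)]
period = pitch_period(tone, min_lag=20, max_lag=60)
```

The search range is kept below twice the expected period so that the peak at the double period cannot be selected. A real detector would need an explicit check against such octave errors.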

7.

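A study on synchronizing multichannel noisy speech signals

Fang Yuan 《Acta Acustica (声学学报)》2001,26(4):324-328

A method for synchronizing two signal channels is proposed. Compared with simply equalizing the signal delays, it cancels the effect of the room impulse response more effectively in rooms with a certain length of reverberation time. Experimental results show that the method still performs well at low signal-to-noise ratios, and that summing the synchronized signals comes close to the limit of achievable SNR improvement.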

8.

Nanophotonic technologies for single-photon devices

A. Gerardino M. Francardi A. Gaggero F. Mattioli R. Leoni L. Balet N. Chauvin F. Marsili A. Fiore 《Opto-Electronics Review》2010,18(4):352-365

9.

Cochlear implant speech recognition with speech maskers

Stickney GS Zeng FG Litovsky R Assmann P 《The Journal of the Acoustical Society of America》2004,116(2):1081-1091

Speech recognition performance was measured in normal-hearing and cochlear-implant listeners with maskers consisting of either steady-state speech-spectrum-shaped noise or a competing sentence. Target sentences from a male talker were presented in the presence of one of three competing talkers (same male, different male, or female) or speech-spectrum-shaped noise generated from this talker at several target-to-masker ratios. For the normal-hearing listeners, target-masker combinations were processed through a noise-excited vocoder designed to simulate a cochlear implant. With unprocessed stimuli, a normal-hearing control group maintained high levels of intelligibility down to target-to-masker ratios as low as 0 dB and showed a release from masking, producing better performance with single-talker maskers than with steady-state noise. In contrast, no masking release was observed in either implant or normal-hearing subjects listening through an implant simulation. The performance of the simulation and implant groups did not improve when the single-talker masker was a different talker compared to the same talker as the target speech, as was found in the normal-hearing control. These results are interpreted as evidence for a significant role of informational masking and modulation interference in cochlear implant speech recognition with fluctuating maskers. This informational masking may originate from increased target-masker similarity when spectral resolution is reduced.

10.

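An asymmetric-compression speech enhancement method for improving whispered-speech intelligibility

Zhou Jian, Zheng Wenming, Wang Qingyun, Zhao Li 《Acta Acustica (声学学报)》2014,39(4):501-508

Two whispered-speech enhancement algorithms based on asymmetric cost functions are proposed, which treat amplification distortion and attenuation (compression) distortion differently during enhancement. The Modified Itakura-Saito (MIS) algorithm penalizes amplification distortion more heavily, whereas the Kullback-Leibler (KL) algorithm penalizes attenuation distortion more heavily. Experimental results show that at low signal-to-noise ratios below -6 dB, whispered speech enhanced by the MIS algorithm is significantly more intelligible than with conventional algorithms, while the KL algorithm yields an intelligibility improvement similar to that of the minimum mean square error speech enhancement algorithm. This confirms that amplification and attenuation distortions affect whispered-speech intelligibility differently: larger attenuation distortion helps intelligibility at low SNR, whereas attenuation distortion has little effect on intelligibility at high SNR.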

11.

Intelligibility of reverberant noisy speech with ideal binary masking

Roman N Woodruff J 《The Journal of the Acoustical Society of America》2011,130(4):2153-2161

For a mixture of target speech and noise in anechoic conditions, the ideal binary mask is defined as follows: It selects the time-frequency units where target energy exceeds noise energy by a certain local threshold and cancels the other units. In this study, the definition of the ideal binary mask is extended to reverberant conditions. Given the division between early and late reflections in terms of speech intelligibility, three ideal binary masks can be defined: an ideal binary mask that uses the direct path of the target as the desired signal, an ideal binary mask that uses the direct path and early reflections of the target as the desired signal, and an ideal binary mask that uses the reverberant target as the desired signal. The effects of these ideal binary mask definitions on speech intelligibility are compared across two types of interference: speech shaped noise and concurrent female speech. As suggested by psychoacoustical studies, the ideal binary mask based on the direct path and early reflections of target speech outperforms the other masks as reverberation time increases and produces substantial reductions in terms of speech reception threshold for normal hearing listeners.
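The anechoic mask definition in the abstract above translates almost directly into code. The sketch below assumes the per-unit target and noise energies are already available as nested lists. The function name, the 0 dB default threshold, and the handling of zero-energy units are illustrative choices, not details from the paper.

```python
import math

def ideal_binary_mask(target_energy, noise_energy, local_threshold_db=0.0):
    """Return a 0/1 mask over time-frequency units.

    target_energy, noise_energy -- equally shaped nested lists of non-negative
                                   energies, indexed [frame][channel]
    local_threshold_db          -- local criterion in dB (0 dB is an assumed default)
    """
    mask = []
    for target_row, noise_row in zip(target_energy, noise_energy):
        row = []
        for t, n in zip(target_row, noise_row):
            if t <= 0.0:
                keep = False   # no target energy: cancel the unit
            elif n <= 0.0:
                keep = True    # target with no noise: keep the unit
            else:
                # Keep the unit only if the local SNR exceeds the threshold.
                keep = 10.0 * math.log10(t / n) > local_threshold_db
            row.append(1 if keep else 0)
        mask.append(row)
    return mask

target = [[4.0, 1.0, 0.0], [2.0, 8.0, 1.0]]
noise = [[1.0, 2.0, 1.0], [2.0, 1.0, 0.0]]
mask = ideal_binary_mask(target, noise)
```

Under this sketch, the three reverberant variants described in the abstract would differ only in what is passed as `target_energy`: the direct path, the direct path plus early reflections, or the full reverberant target.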

12.

Whither speech recognition?

A L Samuel J R Pierce 《The Journal of the Acoustical Society of America》1970,47(6):1616-1617

13.

Nanophotonic light-trapping theory for solar cells

Zongfu Yu Aaswath Raman Shanhui Fan 《Applied Physics A: Materials Science & Processing》2011,105(2):329-339

Conventional light-trapping theory, based on a ray-optics approach, was developed for standard thick photovoltaic cells. The classical theory established an upper limit for possible absorption enhancement in this context and provided a design strategy for reaching this limit. This theory has become the foundation for light management in bulk silicon PV cells, and has had enormous influence on the optical design of solar cells in general. This theory, however, is not applicable in the nanophotonic regime. Here we develop a statistical temporal coupled-mode theory of light trapping based on a rigorous electromagnetic approach. Our theory reveals that the standard limit can be substantially surpassed when optical modes in the active layer are confined to deep-subwavelength scale, opening new avenues for highly efficient next-generation solar cells.

14.

Design of correlation filters for pattern recognition using a noisy reference

Pablo Mario Aguilar-González Vitaly Kober 《Optics Communications》2012,285(5):574-583

We present the design of correlation filters for detection of a target in a noisy input scene when the object of interest is given in a noisy reference image. The target signal, shape and location in the reference image are assumed to be unknown. Two signal models are considered for the input scene: additive and nonoverlapping. The design of the filters consists of automated estimation of needed parameters from a noisy reference image and maximization of the peak-to-output energy ratio criterion. Two filter variants are proposed. The matching error metric is used to determine the regions of the parameter space where each filter variant performs better. Computer simulation results obtained with the proposed filters are presented and evaluated in terms of discrimination capability, location errors, and tolerance to input noise.

15.

An evaluation of objective measures for intelligibility prediction of time-frequency weighted noisy speech

Taal CH Hendriks RC Heusdens R Jensen J 《The Journal of the Acoustical Society of America》2011,130(5):3013-3027

Existing objective speech-intelligibility measures are suitable for several types of degradation; however, it turns out that they are less appropriate in cases where noisy speech is processed by a time-frequency weighting. To this end, an extensive evaluation is presented of objective measures for intelligibility prediction of noisy speech processed with a technique called ideal time frequency (TF) segregation. In total 17 measures are evaluated, including four advanced speech-intelligibility measures (CSII, CSTI, NSEC, DAU), the advanced speech-quality measure (PESQ), and several frame-based measures (e.g., SSNR). Furthermore, several additional measures are proposed. The study comprised a total number of 168 different TF-weightings, including unprocessed noisy speech. Out of all measures, the proposed frame-based measure MCC gave the best results (ρ = 0.93). An additional experiment shows that the good performing measures in this study also show high correlation with the intelligibility of single-channel noise reduced speech.

16.

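Robust speech recognition based on maximum likelihood polynomial regression  Cited by: 2 (self-citations: 0, citations by others: 2)

Lü Yong, Wu Zhenyang 《Acta Acustica (声学学报)》2010,35(1):88-96

To address the linearity assumption that limits maximum likelihood linear regression (MLLR), this paper applies polynomial regression to model adaptation and builds a nonlinear model adaptation algorithm based on maximum likelihood polynomial regression. In the log-spectral domain, the algorithm uses polynomial regression to approximate, in each Mel subband, the nonlinear relationship between the model means of the recognition environment and those of the training environment. The polynomial coefficients are estimated from a small amount of adaptation data from the recognition environment using the EM algorithm and the maximum likelihood criterion. Experimental results show that a second-order polynomial already approximates the nonlinear environmental transformation of the model means well. In noise compensation and speaker adaptation experiments, the error rates of maximum likelihood polynomial regression are markedly lower than those of MLLR. The proposed algorithm largely overcomes the linearity assumption of linear model adaptation, can simultaneously reduce the effects of noise, speaker changes, and other factors on a speech recognition system, and is especially suited to joint adaptation to speaker and noise.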

17.

Maximum likelihood polynomial regression for robust speech recognition

L Yong WU Zhenyang 《声学学报：英文版》2011,30(3):358-370

The linear hypothesis is the main disadvantage of maximum likelihood linear regression (MLLR).This paper applies the polynomial regression method to model adaptation and establishes a nonlinear model adaptation algorithm using maximum likelihood polynomial regression(MLPR)for robust speech recognition.In this algorithm,the nonlinear relationship between training and testing Gaussian means in every Mel channel is approximated by a set of polynomials and the polynomial coefficients are estimated from adaptation data in test environment using the expectation-maximization(EM)algorithm and maximum likelihood(ML) criterion.The experimental results show that the second-order polynomial can approximate the actual nonlinear function better and in noise compensation and speaker adaptation,the word error rates of MLPR are significantly lower than those of MLLR.The proposed MLPR algorithm overcomes the limitation of linear hypothesis well and can decrease the impact of noise,speaker and other factors simultaneously.It is especially suitable for joint adaptation of speaker and noise. 相似文献
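The per-channel polynomial mapping described in the abstract above can be written out explicitly. The abstract gives no symbols, so the notation here is assumed: $\mu_k$ is a training-environment Gaussian mean in Mel channel $k$, $\hat{\mu}_k$ is the adapted mean for the test environment, $a_{k,j}$ are the polynomial coefficients estimated by EM under the maximum likelihood criterion, and $P$ is the polynomial order.

```latex
\hat{\mu}_{k} \;=\; \sum_{j=0}^{P} a_{k,j}\,\mu_{k}^{\,j},
\qquad k = 1,\dots,K
```

In this notation the second-order case reported in the abstract is $P = 2$, that is, $\hat{\mu}_k = a_{k,0} + a_{k,1}\mu_k + a_{k,2}\mu_k^2$, and $P = 1$ reduces to a per-channel affine map.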

18.

Automatic speech recognition in cocktail-party situations: a specific training for separated speech

Marti A Cobos M Lopez JJ 《The Journal of the Acoustical Society of America》2012,131(2):1529-1535

19.

Computer recognition of connected speech

D R Reddy 《The Journal of the Acoustical Society of America》1967,42(2):329-347

20.

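Pattern recognition under noisy conditions with an improved logarithmic joint transform correlator

Zhao Yu, Shen Xuanguo 《Optical Technique (光学技术)》2006,32(6):803-805

Applying a logarithmic transform to the power spectrum of a joint transform correlator strengthens its high-frequency components and sharpens the correlation peak, but it also amplifies the noise components and degrades noise robustness. To address this problem, an improved logarithmic joint transform correlator model is proposed, which raises the noise robustness of the logarithmic joint transform correlator by changing the form of the logarithmic function. Computer simulation results show that, with additive white Gaussian noise and with nonoverlapping low-frequency and high-frequency colored noise, the improved logarithmic joint transform correlator is more robust to noise than the conventional one.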