Similar Documents
20 similar documents found (search time: 0 ms)
1.
The present study examined the application of the articulation index (AI) as a predictor of the speech-recognition performance of normal and hearing-impaired listeners with and without hearing protection. The speech-recognition scores of 12 normal and 12 hearing-impaired subjects were measured for a wide range of conditions designed to be representative of those in the workplace. Conditions included testing in quiet, in two types of background noise (white versus speech spectrum), at three signal-to-noise ratios (+5, 0, -5 dB), and in three conditions of protection (unprotected, earplugs, earmuffs). The mean results for all 21 listening conditions and both groups of subjects were accurately described by the AI. Moreover, a single transfer function relating performance to the AI could describe all the data from both groups.
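The AI-to-intelligibility mapping described above can be sketched as follows. The band importances, the audibility normalization range, and the fitting constant are invented for illustration and are not the ANSI values:

```python
# Hedged sketch: band-importance-weighted audibility mapped to a predicted
# recognition score through a logistic-style transfer function.  The band
# importances, the (-12 dB, +18 dB) audibility range, and the fitting
# constant q are invented for illustration (not the ANSI S3.5 values).

def articulation_index(band_snrs_db, importance):
    """AI = sum over bands of importance * clipped, normalized band SNR."""
    assert abs(sum(importance) - 1.0) < 1e-9, "importances must sum to 1"
    ai = 0.0
    for snr, w in zip(band_snrs_db, importance):
        audibility = min(max((snr + 12.0) / 30.0, 0.0), 1.0)  # clip to [0, 1]
        ai += w * audibility
    return ai

def predicted_score(ai, q=0.5):
    """Transfer function mapping AI onto proportion-correct recognition."""
    return 1.0 - 10.0 ** (-ai / q)

importance = [0.2, 0.3, 0.3, 0.2]                    # hypothetical band weights
ai_quiet = articulation_index([30, 30, 30, 30], importance)
ai_noisy = articulation_index([0, -5, -5, 0], importance)
```

A single function like `predicted_score` playing the role of the transfer function for both listener groups is the key empirical claim of the abstract.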

2.
The purpose of this investigation was to study the effects of consonant environment on vowel duration for normally hearing males, hearing-impaired males with intelligible speech, and hearing-impaired males with semi-intelligible speech. The results indicated that the normally hearing and intelligible hearing-impaired speakers exhibited similar trends with respect to consonant influence on vowel duration; i.e., vowels were longer in a voiced environment than in a voiceless one, and in a fricative environment than in a plosive one. The semi-intelligible hearing-impaired speakers, however, failed to demonstrate a consonant effect on vowel duration, and produced the vowels with significantly longer durations than the other two groups of speakers. These data provide information regarding temporal factors that may contribute to the decreased intelligibility of hearing-impaired persons.

3.
This paper proposes a rejection algorithm for isolated-word speech recognition systems based on the difference between posterior probabilities. The difference between using an input word's posterior probability and its posterior-probability difference as the rejection feature is examined, and a multilayer perceptron neural network is used to learn the rejection feature. Compared with several existing rejection algorithms, this algorithm requires almost no additional computation or storage. At a recognition rate of 98.2%, the rejection rate reaches 95.4%.
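A minimal sketch of the rejection rule implied by the abstract, using the gap between the top two posteriors as the confidence feature. The paper learns the decision with a multilayer perceptron; here a fixed, arbitrary threshold stands in for that stage:

```python
import math

# Hedged sketch of posterior-difference rejection: accept an isolated-word
# hypothesis only when the gap between the top two class posteriors is large.
# The 0.2 threshold is arbitrary; the MLP stage from the paper is omitted.

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def accept(scores, threshold=0.2):
    """Accept iff (top posterior - runner-up posterior) > threshold."""
    post = sorted(softmax(scores), reverse=True)
    return (post[0] - post[1]) > threshold

confident = accept([5.0, 1.0, 0.5])   # one clear winner
ambiguous = accept([2.0, 1.9, 0.5])   # two competing hypotheses
```

The appeal of this feature is visible in the sketch: it reuses posteriors the recognizer already computes, so rejection adds almost no cost.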

4.
In face-to-face speech communication, the listener extracts and integrates information from the acoustic and optic speech signals. Integration occurs within the auditory modality (i.e., across the acoustic frequency spectrum) and across sensory modalities (i.e., across the acoustic and optic signals). The difficulties experienced by some hearing-impaired listeners in understanding speech could be attributed to losses in the extraction of speech information, the integration of speech cues, or both. The present study evaluated the ability of normal-hearing and hearing-impaired listeners to integrate speech information within and across sensory modalities in order to determine the degree to which integration efficiency may be a factor in the performance of hearing-impaired listeners. Auditory-visual nonsense syllables consisting of eighteen medial consonants surrounded by the vowel [a] were processed into four nonoverlapping acoustic filter bands between 300 and 6000 Hz. A variety of one, two, three, and four filter-band combinations were presented for identification in auditory-only and auditory-visual conditions; a visual-only condition was also included. Integration efficiency was evaluated using a model of optimal integration. Results showed that normal-hearing and hearing-impaired listeners integrated information across the auditory and visual sensory modalities with a high degree of efficiency, independent of differences in auditory capabilities. However, across-frequency integration for auditory-only input was less efficient for hearing-impaired listeners. These individuals exhibited particular difficulty extracting information from the highest frequency band (4762-6000 Hz) when speech information was presented concurrently in the next lower-frequency band (1890-2381 Hz). Results suggest that integration of speech information within the auditory modality, but not across auditory and visual modalities, affects speech understanding in hearing-impaired listeners.

5.
An improved DNN-HMM speech recognition method
To address the limited modeling capability of acoustic models that combine deep neural networks with hidden Markov models (DNN-HMM), an improved DNN-HMM speech recognition algorithm is proposed. First, a deep neural network acoustic model is built from a deep belief network (DBN) combined with a deep Boltzmann machine (DBM); then Mel-frequency cepstral coefficients (MFCC) and log-domain Mel filter-bank coefficients (Fbank) are extracted as acoustic features, and experiments are run on the TIMIT speech corpus. The results show that the DBM-augmented DNN-HMM model outperforms the plain DNN-HMM model: with MFCC features, the word and sentence error rates drop by 1.26% and 0.20%, respectively; with Fbank features using the default filter bank, they drop by 0.48% and 0.82%, and a moderate increase in the number of filters further reduces the error rates. Overall, the study reduces the sentence and word error rates to 21.06% and 3.12%, respectively.

6.
Evaluating the articulation index for auditory-visual input
An investigation of the auditory-visual (AV) articulation index (AI) correction procedure outlined in the ANSI standard [ANSI S3.5-1969 (R1986)] was made by evaluating auditory (A), visual (V), and auditory-visual sentence identification for both wideband speech degraded by additive noise and a variety of bandpass-filtered speech conditions presented in quiet and in noise. When the data for each of the different listening conditions were averaged across talkers and subjects, the procedure outlined in the standard was fairly well supported, although deviations from the predicted AV score were noted for individual subjects as well as individual talkers. For filtered speech signals with an auditory articulation index (AI_A) less than 0.25, there was a tendency for the standard to underpredict AV scores. Conversely, for signals with AI_A greater than 0.25, the standard consistently overpredicted AV scores. Additionally, synergistic effects, where the AI_A obtained from the combination of different bandpass-filtered conditions was greater than the sum of the individual AI_A's, were observed for all nonadjacent filter-band combinations (e.g., the addition of a low-pass band with a 630-Hz cutoff and a high-pass band with a 3150-Hz cutoff). These latter deviations from the standard violate the basic assumption of additivity stated by Articulation Theory, but are consistent with earlier reports by Pollack [I. Pollack, J. Acoust. Soc. Am. 20, 259-266 (1948)], Licklider [J. C. R. Licklider, Psychology: A Study of a Science, Vol. 1, edited by S. Koch (McGraw-Hill, New York, 1959), pp. 41-144], and Kryter [K. D. Kryter, J. Acoust. Soc. Am. 32, 547-556 (1960)].

7.
A fast adaptation algorithm for speech recognition based on regression analysis
吕萍  颜永红 《声学学报》2005,30(3):222-228
From the standpoint of regression analysis, an algorithm equivalent to maximum likelihood linear regression (MLLR) is derived: least-squares linear regression (LSLR), together with the corresponding multiple linear regression model. The regressors in this model are multicollinear, which causes the algorithm to fail when adaptation data are scarce. To mitigate the effect of multicollinearity, an improved algorithm, the pseudo-adaptation-data algorithm, is proposed. Experiments show that with only 1-3 s of adaptation data, the new algorithm reduces the system's error rate by a relative 2%-6%; as the adaptation data increase, its performance converges to that of MLLR (or LSLR).
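The least-squares view of linear-regression adaptation can be illustrated on synthetic data. This sketch estimates an affine transform of the model mean vectors by ordinary least squares; it ignores occupancy weighting and the pseudo-adaptation-data extension, and all values are made up:

```python
import numpy as np

# Hedged sketch of least-squares linear regression adaptation: estimate an
# affine transform (W, b) that maps speaker-independent mean vectors toward
# adaptation-data means, then apply it to all model means.  Synthetic data.

rng = np.random.default_rng(0)
mu_si = rng.normal(size=(10, 3))                  # speaker-independent means
W_true = np.array([[1.1, 0.0, 0.0],
                   [0.0, 0.9, 0.1],
                   [0.0, 0.0, 1.0]])
b_true = np.array([0.5, -0.2, 0.1])
mu_obs = mu_si @ W_true.T + b_true                # "observed" adapted means

# Solve for [W | b] with ordinary least squares on augmented regressors.
X = np.hstack([mu_si, np.ones((10, 1))])          # append a bias column
theta, *_ = np.linalg.lstsq(X, mu_obs, rcond=None)
W_est, b_est = theta[:3].T, theta[3]

mu_adapted = mu_si @ W_est.T + b_est              # adapted model means
```

The failure mode the abstract describes corresponds, in this formulation, to `X` becoming ill-conditioned when too few adaptation observations are available.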

8.
The word recognition ability of 4 normal-hearing and 13 cochlearly hearing-impaired listeners was evaluated. Filtered and unfiltered speech in quiet and in noise were presented monaurally through headphones. The noise varied over listening situations with regard to spectrum, level, and temporal envelope. Articulation index theory was applied to predict the results. Two calculation methods were used, both based on the ANSI S3.5-1969 20-band method [S3.5-1969 (American National Standards Institute, New York)]. Method I was almost identical to the ANSI method. Method II included a level- and hearing-loss-dependent calculation of masking of stationary and on-off gated noise signals and of self-masking of speech. Method II provided the best prediction capability, and it is concluded that speech intelligibility of cochlearly hearing-impaired listeners may also, to a first approximation, be predicted from articulation index theory.

9.
A digital processing method is described for altering spectral contrast (the difference in amplitude between spectral peaks and valleys) in natural utterances. Speech processed with programs implementing the contrast alteration procedure was presented to listeners with moderate to severe sensorineural hearing loss. The task was a three alternative (/b/,/d/, or /g/) stop consonant identification task for consonants at a fixed location in short nonsense utterances. Overall, tokens with enhanced contrast showed moderate gains in percentage correct stop consonant identification when compared to unaltered tokens. Conversely, reducing spectral contrast generally reduced percent correct stop consonant identification. Contrast alteration effects were inconsistent for utterances containing /d/. The observed contrast effects also interacted with token intelligibility.
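One simple way to realize contrast alteration is sketched below with an invented toy log-spectrum and smoothing window; this is an illustration of the general idea, not the authors' processing programs:

```python
import numpy as np

# Hedged sketch of spectral contrast alteration: scale the difference between
# a log-magnitude spectrum and a smoothed version of itself.  alpha > 1
# enhances peak-to-valley contrast, alpha < 1 reduces it.  The toy spectrum
# and the 5-point moving-average smoother are invented values.

def alter_contrast(log_spectrum, alpha, win=5):
    kernel = np.ones(win) / win
    smooth = np.convolve(log_spectrum, kernel, mode="same")  # envelope proxy
    return smooth + alpha * (log_spectrum - smooth)

spec = np.array([0.0, 6.0, 0.0, -6.0, 0.0, 6.0, 0.0, -6.0, 0.0])
enhanced = alter_contrast(spec, alpha=2.0)   # exaggerate peaks and valleys
reduced = alter_contrast(spec, alpha=0.5)    # flatten peaks and valleys
```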

10.
A model estimation method for joint compensation of additive noise and channel distortion
The performance of speech recognition systems degrades sharply under the varying additive noise and channel differences of real environments, so suppressing the degradation they cause is of practical importance. The authors propose a model-compensation algorithm: the additive noise is estimated from the non-speech segments of an utterance, the channel function is then estimated with the EM algorithm, and the mismatched acoustic models are jointly compensated in the cepstral domain. Experiments show that, relative to the baseline system, the algorithm improves average performance by more than 50% relative. The algorithm can dynamically track environmental changes and outperforms several traditional robustness techniques for speech recognition.

11.
This paper presents an accurate speech detection algorithm for improving the performance of speech recognition systems working in noisy environments. The proposed method is based on a hard-decision clustering approach where a set of prototypes is used to characterize the noisy channel. Detecting the presence of speech is enabled by a decision rule formulated in terms of an averaged distance between the observation vector and a cluster-based noise model. The algorithm benefits from using contextual information, a strategy that considers not only a single speech frame but also a neighborhood of data in order to smooth the decision function and improve speech detection robustness. The proposed scheme exhibits reduced computational cost, making it adequate for real-time applications such as automated speech recognition systems. An exhaustive analysis is conducted on the AURORA 2 and AURORA 3 databases in order to assess the performance of the algorithm and to compare it to existing standard voice activity detection (VAD) methods. The results show significant improvements in detection accuracy and speech recognition rate over standard VADs such as ITU-T G.729, ETSI GSM AMR, and ETSI AFE for distributed speech recognition, and over a representative set of recently reported VAD algorithms.
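The decision rule can be sketched as follows; the prototypes, the 2-D "features", the context width, and the threshold are all synthetic stand-ins for the paper's cluster-based noise model:

```python
import numpy as np

# Hedged sketch of the cluster-based VAD rule: a frame is labeled speech when
# its averaged distance to a set of noise prototypes, smoothed over a context
# window of neighboring frames, exceeds a threshold.  All values are invented.

def vad(frames, noise_protos, threshold, context=1):
    dist = np.array([np.mean([np.linalg.norm(f - p) for p in noise_protos])
                     for f in frames])
    # contextual smoothing of the decision function
    kernel = np.ones(2 * context + 1) / (2 * context + 1)
    smoothed = np.convolve(dist, kernel, mode="same")
    return smoothed > threshold

protos = [np.zeros(2), np.array([0.1, 0.1])]          # noise model prototypes
frames = [np.array([0.0, 0.1]), np.array([0.1, 0.0]), np.array([0.0, 0.0]),
          np.array([2.0, 2.0]), np.array([2.1, 1.9]), np.array([2.0, 2.0])]
decisions = vad(frames, protos, threshold=1.2)
```

The smoothing step is what the abstract calls contextual information: isolated spikes in the distance function are damped, so single anomalous frames do not flip the decision.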

12.
Decision-tree based Chinese triphone models
高升  徐波  黄泰翼 《声学学报》2000,25(6):504-509
Context-dependent acoustic models based on decision-tree theory have been studied and applied extensively in English speech recognition, but much less so in Chinese. This paper builds context-dependent Chinese models (triphone models) on decision-tree theory and discusses the key problems of decision-tree modeling: (1) choice of the basic modeling unit set; (2) design of the phone-class question set; (3) choice of the evaluation function; (4) choice of the stopping criterion; (5) construction of the decision tree and generation of the triphone models. The paper analyzes the performance of two different modeling units, proposes general guidelines for designing the phone-class question set together with a statistical analysis of the designed set, and analyzes the coverage of the triphone models over the speech corpus. Experiments show that, compared with a system using biphone acoustic models, the recognition system built on decision-tree triphone models reduces the error rate by 24.7%.
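A toy illustration of question selection in decision-tree clustering follows. Real systems split Gaussian sufficient statistics by log-likelihood gain; this sketch substitutes label-entropy reduction, and the contexts, labels, and questions are invented:

```python
import math

# Hedged toy sketch of decision-tree triphone clustering: from a small set of
# phonetic questions about the left context, pick the split that maximizes the
# reduction in acoustic-class label entropy.  All data here are invented.

def entropy(counts):
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values() if c)

def tally(items):
    out = {}
    for label in items:
        out[label] = out.get(label, 0) + 1
    return out

# (left-context phone, acoustic-class label) pairs for one center phone
data = [("b", "A"), ("p", "A"), ("d", "A"), ("a", "B"), ("i", "B"), ("u", "B")]
questions = {"is_stop": {"b", "p", "d"}, "is_front": {"i", "d"}}

def split_gain(question):
    yes = [lab for ctx, lab in data if ctx in question]
    no = [lab for ctx, lab in data if ctx not in question]
    n = len(data)
    after = (len(yes) * entropy(tally(yes)) + len(no) * entropy(tally(no))) / n
    return entropy(tally(lab for _, lab in data)) - after

best = max(questions, key=lambda q: split_gain(questions[q]))
```

Here the `is_stop` question separates the two acoustic classes perfectly, so it is chosen; growing the tree repeats this greedy choice until the stopping criterion fires.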

13.
In current deep neural network models, multiple hidden layers can be stacked and adapt well to complex problems, but the node connections between layers are mutually independent; this structure cannot exploit contextual information within a speech sequence to improve recognition. Traditional recurrent neural networks improve on this, but can exploit only the preceding context. To address these problems, this paper combines a bidirectional recurrent neural network, which can use both the preceding and the following context of a speech sequence, with a deep neural network, and applies the combination to speech recognition. A model with five hidden layers is built, in which the third layer has a bidirectional recurrent structure and the remaining layers are deep neural network layers. Experiments show that the model with the bidirectional recurrent layer achieves better recognition accuracy than the other models; that noise strongly affects bidirectional-recurrent Chinese speech recognition, especially when the noise types added to the training and test sets differ, so a model trained on a single noisy condition cannot adapt to speech with other noise types; and that when the number of neurons in the hidden layers is adjusted, accuracy does not rise indefinitely with more neurons but begins to decline beyond a certain point.

14.
Although the speech transmission index (STI) is a well-accepted and standardized method for objective prediction of speech intelligibility in a wide range of environments and applications, it is essentially a monaural model. Advantages of binaural hearing in speech intelligibility are disregarded. In specific conditions, this leads to considerable mismatches between subjective intelligibility and the STI. A binaural version of the STI was developed based on interaural cross correlograms, which shows a considerably improved correspondence with subjective intelligibility in dichotic listening conditions. The new binaural STI is designed to be a relatively simple model, which adds only few parameters to the original standardized STI and changes none of the existing model parameters. For monaural conditions, the outcome is identical to the standardized STI. The new model was validated on a set of 39 dichotic listening conditions, featuring anechoic, classroom, listening room, and strongly echoic environments. For these 39 conditions, speech intelligibility [consonant-vowel-consonant (CVC) word score] and binaural STI were measured. On the basis of these conditions, the relation between binaural STI and CVC word scores closely matches the STI reference curve (standardized relation between STI and CVC word score) for monaural listening. A better-ear STI appears to perform quite well in relation to the binaural STI model; the monaural STI performs poorly in these cases.
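The better-ear STI mentioned at the end is simple enough to sketch directly: per listening condition, take the larger of the two monaural STI values. The numbers below are invented:

```python
# Hedged sketch of a better-ear STI: given per-ear monaural STI values for a
# set of listening conditions, the combined value per condition is simply the
# maximum of the two ears.  The STI values are invented placeholders.

def better_ear_sti(left_sti, right_sti):
    return [max(l, r) for l, r in zip(left_sti, right_sti)]

left = [0.45, 0.60, 0.30]    # hypothetical left-ear STIs per condition
right = [0.50, 0.40, 0.70]   # hypothetical right-ear STIs per condition
combined = better_ear_sti(left, right)
```

The abstract's point is that even this crude rule tracks the full cross-correlogram model reasonably well, whereas a purely monaural STI does not.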

15.
The purpose of this experiment was to determine the applicability of the Articulation Index (AI) model for characterizing the speech recognition performance of listeners with mild-to-moderate hearing loss. Performance-intensity functions were obtained from five normal-hearing listeners and 11 hearing-impaired listeners using a closed-set nonsense syllable test for two frequency responses (uniform and high-frequency emphasis). For each listener, the fitting constant Q of the nonlinear transfer function relating AI and speech recognition was estimated. Results indicated that the function mapping AI onto performance was approximately the same for normal and hearing-impaired listeners with mild-to-moderate hearing loss and high speech recognition scores. For a hearing-impaired listener with poor speech recognition ability, the AI procedure was a poor predictor of performance. The AI procedure as presently used is inadequate for predicting performance of individuals with reduced speech recognition ability and should be used conservatively in applications predicting optimal or acceptable frequency response characteristics for hearing-aid amplification systems.

16.
In this study, the effect of articulation rate and speaking style on the perceived speech rate is investigated. The articulation rate is measured both in terms of the intended phones, i.e., phones present in the assumed canonical form, and as the number of actual, realized phones per second. The combination of these measures reflects the deletion of phones, which is related to speaking style. The effect of the two rate measures on the perceived speech rate is compared in two listening experiments on the basis of a set of intonation phrases with carefully balanced intended and realized phone rates, selected from a German database of spontaneous speech. Because the balance between input-oriented (effort) and output-oriented (communicative) constraints may be different at fast versus slow speech rates, the effect of articulation rate is compared both for fast and for slow phrases from the database. The effect of the listeners' own speaking habits is also investigated to evaluate if listeners' perception is based on a projection of their own behavior as a speaker. It is shown that listener judgments reflect both the intended and realized phone rates, and that their judgments are independent of the constraint balance and their own speaking habits.

17.
Effect of noise on the performance of feature-combination speech recognition
This paper examines the effect of noise on feature-combination speech recognition. Experiments show that, under various environmental noise conditions, the three feature-combination methods proposed in Ref. [1] for improving speech recognition performance still raise the recognition rate by about 7%.

18.
The relationships between spatial speech recognition (SSR; the ability to understand speech in complex spatial environments), binaural temporal fine structure (TFS) sensitivity, and three cognitive tasks were assessed for 17 hearing-impaired listeners. Correlations were observed between SSR, TFS sensitivity, and two of the three cognitive tasks, which became non-significant when age effects were controlled for, suggesting that reduced TFS sensitivity and certain cognitive deficits may share a common age-related cause. The third cognitive measure was also significantly correlated with SSR, but not with TFS sensitivity or age, suggesting an independent non-age-related cause.

19.
A discriminative framework of tone model integration in continuous speech recognition was proposed. The method uses model-dependent weights to scale the probabilities of the hidden Markov models based on spectral features and of the tone models based on tonal features. The weights are discriminatively trained by the minimum phone error criterion. An update equation for the model weights based on the extended Baum-Welch algorithm is derived. Various schemes of model weight combination are evaluated, and a smoothing technique is introduced to make training robust to overfitting. The proposed method is evaluated on tonal-syllable output and character output speech recognition tasks. The experimental results show that the proposed method obtains 9.5% and 4.7% relative error reductions over a global weight on the two tasks, owing to a better interpolation of the given models. This demonstrates the effectiveness of discriminatively trained model weights for tone model integration.
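The weighted combination of HMM and tone-model scores can be sketched as a per-unit log-linear interpolation; the unit names and weight values below are invented placeholders, not the discriminatively (MPE) trained values:

```python
import math

# Hedged sketch of model-dependent weight combination: the score of a tonal
# unit log-linearly interpolates the spectral-feature HMM probability and the
# tone-model probability with per-unit weights.  Units and weights invented.

weights = {"tone1": (1.0, 0.6), "tone4": (1.0, 0.9)}  # (w_hmm, w_tone)

def combined_log_prob(unit, p_hmm, p_tone):
    w_hmm, w_tone = weights[unit]
    return w_hmm * math.log(p_hmm) + w_tone * math.log(p_tone)

# Same spectral evidence, different tone evidence: the tone weight decides.
hyp_a = combined_log_prob("tone1", 0.20, 0.70)
hyp_b = combined_log_prob("tone1", 0.20, 0.30)
```

With a single global weight pair, every unit would share one (w_hmm, w_tone); the paper's point is that letting the pair vary per model, and training it discriminatively, interpolates the two knowledge sources better.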

20.
黄浩  朱杰 《声学学报》2008,33(1):1-8
A discriminative method is proposed to integrate tone information into a large-vocabulary continuous speech recognition system. Under the minimum phone error criterion, model-dependent probability weights are discriminatively trained. These weights scale the probabilities of the conventional spectral-feature hidden Markov models and of the tone models, improving the recognition rate by adjusting the relative contribution of the two model types. An update equation for the weights based on the extended Baum-Welch algorithm is derived. Various weight-combination strategies are evaluated, and a smoothing method between weights is used to overcome overfitting in weight training. The performance of discriminative model-weight training is validated on two large-vocabulary continuous speech recognition tasks: tonal-syllable output and Chinese-character output. Experimental results show that on the two tasks, the discriminative model weights achieve 9.5% and 4.7% relative error-rate reductions, respectively, over a global model weight, demonstrating the effectiveness of discriminative model weights for improving tone integration.
