期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

钟明辉曹乃文黄汉明郑建华陈芽玲《广西物理》2007,(4)

探讨了连续隐马尔可夫模型的基本原理及其在汉语数码语音识别中的应用,实现了一个汉语数码语音识别系统,其正确识别率达到99%以上。相似文献

2.

景春进陈东东周琳琦《应用声学》2014,22(8):2571-2573

针对舰艇指挥训练系统的特点,提出了一种利用语音识别技术提高其训练效率的方法;首先分析了舰艇指挥指令的语言特点,然后研究了基于Sphinx平台的汉语连续语音识别的相关问题,包括声学模型的训练、语言模型的训练及语音识别引擎等;最后设计并实现了一个非特定人,中等专用词汇量的连续汉语语音识别系统;实验采用了一定数量的数字和专用词汇进行验证,结果表明,经过声学模型训练后,该系统的识别率有较大提高;该方法对提高舰艇指挥训练系统的自动化水平具有一定的指导意义。 相似文献

3.

基于连续高斯混合密度HMM的汉语全音节语音识别研究 总被引：5，自引：0，他引：5

国立新莫福源李昌立《声学学报》1995,(5)

本文在大量语音分析实验的基础上，对ＨＭＭ用于汉语全音节语音识别进行了较为深入的探讨，建立了一个连续高斯混合密度ＨＭＭ的汉语全音节语音识别系统．该系统在训练算法上撇开了传统的Ｂａｕｍ－Ｗｅｌｃｈ算法，代之以计算复杂度小、存储量小、迭代次数少且具有自动分割效应的分段Ｋ平均算法。对于ＨＭＭ的模型单元的选择，单元的结构以及模型参数的选取，充分考虑了汉语语音的特点；并在语音特征上做了深入的实验分析工作，采用了符合人耳听觉特性的Ｍｅｌ－Ｓｃａｌｅｄ参数，用ＦＦＴ倒谱代替了ＬＰＣ倒谱，同时利用了语音的动态谱特征和能量特征。另外，本文还针对汉语声母的特点，独特地提出了变帧移分析策略。整个识别系统的首选正识率为９１．１％．相似文献

4.

基于听觉模型的耳语音的声韵切分 总被引：5，自引：0，他引：5

下载免费PDF全文

丁慧栗学丽徐柏龄《应用声学》2004,23(2):20-25,44

本文分析了耳语音的特点，并根据生理声学及心理声学的基本理论与实验资料，提出了一种利用听觉模型来进行耳语音声韵切分的方法。这种适用于耳语音声韵切分的听觉感知模型主要分为四个层次：耳蜗对声音频率的分解机理；听觉系统的时域和频域非线性变化；中枢神经系统的侧抑制机理。这种模型能反映在噪声环境下人对低能量语音的听觉感知特性，因而适于耳语音识别，在耳语音声韵母切分实验中得到了满意的结果。相似文献

5.

重庆方言语音识别系统的设计与实现

张策韦鹏程陆晓燕石熙《应用声学》2018,26(1)

语音识别赋予了计算机能够识别出语音内容的功能，是人机交互技术领域的重要研究内容。随着计算机技术的发展，语音识别已经得到了成熟的发展。但是关于方言的语音识别还有很大的发展空间。中国是一个幅员辽阔、人口众多的国家，因此方言种类繁多，其中有3000多万人交流使用的重庆方言就是其中之一。采集了重庆方言的部分词语的文本文件和对应的语音文件建立语料库，根据重庆方言的发音特点，选取重庆方言的声韵母作为声学建模基元，选取隐马尔可夫模型(Hidden Markov Model, HMM)为声学模型设计了一个基于HMM的重庆方言语音识别系统。在训练过程利用语料库中训练集语料对声学模型进行训练，形成HMM模型库；在识别过程利用语料库中的测试集语料进行识别测试。实验结果表明，该系统能够实现重庆方言的语音识别，并且识别的正确率为100%。相似文献

6.

基于随机轨迹模型的汉语连续语音识别方法研究 总被引：1，自引：0，他引：1

马小辉富煜清陆佶人龚一凡《声学学报》1997,(2)

本文在指出隐马尔可夫模型（HMM）不合理假设的基础上,介绍了随机轨迹模型（STM）的理论机制及优越性。随机轨迹模型将语音基元的声学观察表示为参数空间中轨迹的聚类,并将轨迹建模为状态随机序列概率密度函数的混合,该模型可以克服HMM的不合理假设,在理论上更合理。根据STM的特点及汉语语音特色,本文对汉语连续语音识别基元的选取进行了讨论,提出了音素类单元作为识别系统的识别基元。基于STM的汉语连续语音识别的实验结果证明了STM的有效性和音素类单元的一致性。相似文献

7.

采用低维特征映射的耳语音向正常音转换 总被引：1，自引：0，他引：1

下载免费PDF全文

周健窦云峰刘荣敏王华彬陶亮《声学学报》2018,43(5):855-863

在将耳语音转换为正常音时,为了研究降维后语音特征对耳语音转换的影响,分别对耳语音和正常音谱包络进行自适应编码以提取耳语音和正常音的低维特征,然后使用BP网络建立耳语音和正常音低维谱包络特征之间的映射关系以及正常音基频和耳语音低维谱包络特征之间的关系。转换时,根据耳语音低维谱包络特征获得对应正常音的低维谱包络特征和基频,对低维谱包络特征进行解码后获得对应的正常音谱包络。实验结果表明,采用此方法转换后的语音与正常音之间的倒谱距离相比高斯混合模型方法下降了10%,转换后语音的自然度和可懂度都有所提高。相似文献

8.

基于发音特征的汉语普通话语音声学建模 总被引：3，自引：0，他引：3

张晴晴潘接林颜永红《声学学报》2010,35(2)

将表征汉语普通话语音特点的发音特征引入汉语普通话语音识别的声学建模中,根据普通话发音特点,确定了用于区别普通话元音、辅音以及声调信息的9种发音特征,并以此为目标值训练神经网络得到语音信号属于各类发音特征的后验概率,将此概率作为语音识别的输入特征建立声学模型。在汉语普通话非特定人大词表自然口语对话识别系统中进行了实验验证,并与基于频谱特征的声学模型进行了比较,在相同解码速度下,由此方法建立的声学模型汉字错误率相对下降6.8%;将发音特征和频谱特征进行了融合实验,融合以后的识别系统相对基于频谱特征系统的汉字错误率相对下降10.1%。上述结果表明,基于发音特征的声学模型更加有效的实现了对语音特性的表征,通过利用发音特征和频谱特征的互补性,能够进一步实现对语音识别性能的提高。相似文献

9.

基于特征分量输出概率加权的多数据流鲁棒语音识别方法

张军韦岗余华《声学学报》2008,33(2):102-108

针对传统多数据流语音识别方法不考虑数据流内各特征分量受噪声影响差异的缺点,提出了一种基于特征分量输出概率加权的数据流结合新方法,分析了特征分量输出概率加权对识别的影响,并结合丢失数据技术中的边缘化(Marginalisation)模型和软判决(Soft decision)模型给出了两种具体的数据流结合方案.将所提数据流结合方案应用到复合子带语音识别系统中,实验结果表明,所提识别方法可以根据噪声环境的不同自适应地调整数据流对识别影响的大小,其性能显著优于传统的多数据流识别方法. 相似文献

10.

认人的限定主题的连续汉语语音识别系统的研究 总被引：3，自引：0，他引：3

林道发杨家沅罗万伯王跟东《声学学报》1992,(6)

本文描述一个基于矢量量化(VQ)、隐马尔可夫模型和有限态文法的认人的限定主题的连续汉语语音识别系统。引入跨零幅度差函数作为判定语音有无的特征参量之一,HMM训练用的各单个词语的语音数据由连续话句的语音数据经自动切分而得,识别过程中,每帧都考虑多个可能过渡到其它模型的文法节点。这些技术措施显著地提高了识别系统的准确率。这类系统能用于特定人操作的、特定主题的信息查询任务。待进一步解决非特定人的连续语音识别问题后,可用于特定主题的公用信息查询系统。相似文献

11.

Study of the formant and duration in Chinese whispered vowel speech

Yue Zhao Wei Lin 《Applied Acoustics》2016

Study on the acoustical characteristic is important to speech and speaker recognition in Chinese whispered speech. In this paper, the characteristics of whispered speech are introduced and the acoustical characteristics in Chinese whispered speech are discussed. There is no fundamental frequency in the whispered speech, so other characteristics such as the duration and frequency of formant are extracted and analyzed. From experiments with six simple Chinese whispered vowels, it is proved that the duration and the frequency of formant can be used as the main acoustical characteristics in the Chinese whispered recognition. 相似文献

12.

Study on the method and system implementation of an isolated word recognition system used in noisy environment

YANG Ziyun HAN Jiqing XU Jinpei WANG Chengfa REN Weimin 《声学学报：英文版》1996,(2)

I.IntroductionRecentlytherearemanykindsofsystemsandproductsforspeechrecognition,butalmostallofthemareworkinginquietenvironment,theperformancearedegradedorevencan'tworkwhenitisoperatedinhighnoisyenvironmentssuchasincockpits,vehicle,workshopsetc.SonoiserobustnesshasbecomeoneofthemainobstaclesfortherealaPplicationsoftheautomaticspeechrecognizersanditattractstheattentionofresearchersinspeechtechnologyareas.Since1978,substantialeffortshavebeendevotedtotestandevaluatethespeechrecognizersusedinfight… 相似文献

13.

一种改进的DNN-HMM的语音识别方法* 总被引：2，自引：1，他引：1

下载免费PDF全文

李云红梁思程贾凯莉张秋铭宋鹏何琛王刚毅李禹萱《应用声学》2019,38(3):371-377

针对深度神经网络与隐马尔可夫模型(DNN-HMM)结合的声学模型在语音识别过程中建模能力有限等问题,提出了一种改进的DNN-HMM模型语音识别算法。首先根据深度置信网络(DBN)结合深度玻尔兹曼机(DBM),建立深度神经网络声学模型,然后提取梅尔频率倒谱系数(MFCC)和对数域的Mel滤波器组系数(Fbank)作为声学特征参数,通过TIMIT语音数据集进行实验。实验结果表明:结合了DBM的DNN-HMM模型相比DNN-HMM模型更具优势,其中,使用MFCC声学特征在词错误率与句错误率方面分别下降了1.26%和0.20%。此外,使用默认滤波器组的Fbank特征在词错误率与句错误率方面分别下降了0.48%和0.82%,并且适量增加滤波器组可以降低错误率。总之,研究取得句错误率与词错误率分别降低到21.06%和3.12%的好成绩。相似文献

14.

Statistical properties of Chinese phonemic networks

Shuiyuan Yu Chunshan Xu 《Physica A》2011,390(7):1370-1380

The study of properties of speech sound systems is of great significance in understanding the human cognitive mechanism and the working principles of speech sound systems. Some properties of speech sound systems, such as the listener-oriented feature and the talker-oriented feature, have been unveiled with the statistical study of phonemes in human languages and the research of the interrelations between human articulatory gestures and the corresponding acoustic parameters. With all the phonemes of speech sound systems treated as a coherent whole, our research, which focuses on the dynamic properties of speech sound systems in operation, investigates some statistical parameters of Chinese phoneme networks based on real text and dictionaries. The findings are as follows: phonemic networks have high connectivity degrees and short average distances; the degrees obey normal distribution and the weighted degrees obey power law distribution; vowels enjoy higher priority than consonants in the actual operation of speech sound systems; the phonemic networks have high robustness against targeted attacks and random errors. In addition, for investigating the structural properties of a speech sound system, a statistical study of dictionaries is conducted, which shows the higher frequency of shorter words and syllables and the tendency that the longer a word is, the shorter the syllables composing it are. From these structural properties and dynamic properties one can derive the following conclusion: the static structure of a speech sound system tends to promote communication efficiency and save articulation effort while the dynamic operation of this system gives preference to reliable transmission and easy recognition. In short, a speech sound system is an effective, efficient and reliable communication system optimized in many aspects. 相似文献

15.

Whispered speaker identification based on feature and model hybrid compensation

GU Xiaojiang ZHAO Heming LU|¨ Gang 《声学学报：英文版》2012,(4):499-508

In order to increase short time whispered speaker recognition rate in variable channel conditions,the hybrid compensation in model and feature domains was proposed.This method is based on joint factor analysis in training model stage.It extracts speaker factor and eliminates channel factor by estimating training speech speaker and channel spaces.Then in the test stage,the test speech channel factor is projected into feature space to engage in feature compensation,so it can remove channel information both in model and feature domains in order to improve recognition rate.The experiment result shows that the hybrid compensation can obtain the similar recognition rate in the three different training channel conditions and this method is more effective than joint factor analysis in the test of short whispered speech. 相似文献

16.

基于双向循环神经网络的汉语语音识别*

李鹏杨元维杜李慧高贤君周意蒋梦月张净波《应用声学》2020,39(3):464-471

当前基于深度神经网络模型中,虽然其隐含层可设置多层,对复杂问题适应能力强,但每层之间的节点连接是相互独立的,这种结构特性导致了在语音序列中无法利用上下文相关信息来提高识别效果,而传统的循环神经网络虽然做出了改进,但是只能对上文信息进行利用。针对以上问题,该文采用可以同时利用语音序列中上下文相关信息的双向循环神经网络模型与深度神经网络模型相结合,并应用于语音识别。构建具有5层隐含层的模型,其中第3层为双向循环神经网络结构,其他层采用深度神经网络结构。实验结果表明:加入了双向循环神经网络结构的模型与其他模型相比,较好地提高了识别正确率;噪声对双向循环神经网络汉语识别有重要影响,尤其是训练集和测试集附加噪声类型不同时,单一的含噪声语音的训练模型无法适应不同噪声类型的语音识别;调整神经网络模型中隐含层神经元数量后,识别正确率并不是一直随着隐含层中神经元数量的增加而增加,神经元数量数目增加到一定程度后正确率出现了降低的趋势。相似文献

17.

Effects of speaking style on speech intelligibility for Mandarin-speaking cochlear implant users

Li Y Zhang G Kang HY Liu S Han D Fu QJ 《The Journal of the Acoustical Society of America》2011,129(6):EL242-EL247

Cochlear implant (CI) users' speech understanding may be influenced by different speaking styles. In this study, speech recognition was measured in Mandarin-speaking CI and normal-hearing (NH) subjects for sentences produced according to four styles: slow, normal, fast, and whispered. CI subjects were tested using their clinical processors; NH subjects were tested while listening to a four-channel CI simulation. Performance gradually worsened with increasing speaking rate and was much poorer with whispered speech. CI performance was generally similar to NH performance with the four-channel simulation. Results suggest that some speaking styles, especially whispering, may negatively affect Mandarin-speaking CI users' speech understanding. 相似文献

18.

Specification of cross-modal source information in isolated kinematic displays of speech

Lachs L Pisoni DB 《The Journal of the Acoustical Society of America》2004,116(1):507-518

Information about the acoustic properties of a talker's voice is available in optical displays of speech, and vice versa, as evidenced by perceivers' ability to match faces and voices based on vocal identity. The present investigation used point-light displays (PLDs) of visual speech and sinewave replicas of auditory speech in a cross-modal matching task to assess perceivers' ability to match faces and voices under conditions when only isolated kinematic information about vocal tract articulation was available. These stimuli were also used in a word recognition experiment under auditory-alone and audiovisual conditions. The results showed that isolated kinematic displays provide enough information to match the source of an utterance across sensory modalities. Furthermore, isolated kinematic displays can be integrated to yield better word recognition performance under audiovisual conditions than under auditory-alone conditions. The results are discussed in terms of their implications for describing the nature of speech information and current theories of speech perception and spoken word recognition. 相似文献

19.

Robust speech recognition from binary masks

Narayanan A Wang D 《The Journal of the Acoustical Society of America》2010,128(5):EL217-EL222

Inspired by recent evidence that a binary pattern may provide sufficient information for human speech recognition, this letter proposes a fundamentally different approach to robust automatic speech recognition. Specifically, recognition is performed by classifying binary masks corresponding to a word utterance. The proposed method is evaluated using a subset of the TIDigits corpus to perform isolated digit recognition. Despite dramatic reduction of speech information encoded in a binary mask, the proposed system performs surprisingly well. The system is compared with a traditional HMM based approach and is shown to perform well under low SNR conditions. 相似文献