首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 109 毫秒
1.
汉语连续语音识别中一种新的音节间相关识别单元   总被引:1,自引:0,他引:1  
考虑汉语连续语音中的协同发音现象对语音识别性能的提高是非常重要的。针对汉语语音的特点,提出了一种新的在汉语连续语音识别中考虑音节间协同发音现象,对声学模型进行细化的识别单元。然后基于语音学知识对音节间上下文影响进行分类,实现单元间状态参数的共享,降低了模型的复杂程度,保证了模型的可训练度。这种方法和传统方法的最大不同在于:这种方法完全利用语音学知识进行聚类,而传统方法采用数据驱动的聚类方式。识别实验表明,基于语音学分类的音节间相关识别单元对识别性能有明显的改善,系统的首选误识率降低了17%。  相似文献   

2.
基于声调建模的带噪汉语数字串语音识别   总被引:1,自引:1,他引:1  
尝试利用声调信息来改善噪声下汉语数字串语音识别性能。为解决声调特征不连续问题,提出采用基于多空间概率分布的隐马尔可夫模型进行声调建模。简要分析噪声对声调特征提取的影响,论证了在带噪数字串语音识别中利用声调信息的可行性。实验结果显示,与不采用声调信息的方法相比,在5 dB到20 dB的测试数据上,所提方法可使错误率平均相对下降17.2%。这说明声调信息及所提建模方法对于改善带噪汉语数字串语音识别性能是有效的。  相似文献   

3.
将表征汉语普通话语音特点的发音特征引入汉语普通话语音识别的声学建模中,根据普通话发音特点,确定了用于区别普通话元音、辅音以及声调信息的9种发音特征,并以此为目标值训练神经网络得到语音信号属于各类发音特征的后验概率,将此概率作为语音识别的输入特征建立声学模型。在汉语普通话非特定人大词表自然口语对话识别系统中进行了实验验证,并与基于频谱特征的声学模型进行了比较,在相同解码速度下,由此方法建立的声学模型汉字错误率相对下降6.8%;将发音特征和频谱特征进行了融合实验,融合以后的识别系统相对基于频谱特征系统的汉字错误率相对下降10.1%。上述结果表明,基于发音特征的声学模型更加有效的实现了对语音特性的表征,通过利用发音特征和频谱特征的互补性,能够进一步实现对语音识别性能的提高。  相似文献   

4.
本项研究从实际语音材料出发,运用曲线拟合和矢量量化的方法,对汉语双音节调位的模型进行了系统地研究。研究结果表明;(1)利用曲线拟合和矢量量化技术对汉语双字词声调组合进行模式归类是一种可行和有效的方法。(2)虽然在部分声调组合中不同样品间的离散比较小,但大部分双音节词的声调组合模式可以分为几个子类,因此用一个总体平均的统计模式是不够的.(3)音高曲线的差异明显表现为声调高音点的不同,它是与双字词的音节重音模式相关联的。  相似文献   

5.
基于随机轨迹模型的汉语连续语音识别方法研究   总被引:1,自引:0,他引:1  
本文在指出隐马尔可夫模型(HMM)不合理假设的基础上,介绍了随机轨迹模型(STM)的理论机制及优越性。随机轨迹模型将语音基元的声学观察表示为参数空间中轨迹的聚类,并将轨迹建模为状态随机序列概率密度函数的混合,该模型可以克服HMM的不合理假设,在理论上更合理。根据STM的特点及汉语语音特色,本文对汉语连续语音识别基元的选取进行了讨论,提出了音素类单元作为识别系统的识别基元。基于STM的汉语连续语音识别的实验结果证明了STM的有效性和音素类单元的一致性。  相似文献   

6.
研究汉语自然口语识别中的建模单元选择问题。在HMM三状态模型中,声韵母单元与音素单元作为两种最流行的建模单元各有优劣。一方面从自然口语音变严重的问题出发,倾向采用粗粒度的声韵母单元以概括各种音变;另一方面从三状态结构可能无法有效描述复杂单元的问题出发,又倾向采用细粒度的音素单元。本文在实验语音学理论研究成果与声韵母时长分析实验结果的基础上,主张对扩展声韵母单元进行有选择的拆分,提出了基于鼻韵尾分离的声韵母拆分方法。实验结果表明本文的方法与扩展声韵母单元、音素单元相比,识别性能有了明显改善,其字错误率分别降低2.23%和9.45%。  相似文献   

7.
提出了一种在汉语连续语音识别中基于 3维空间 Viterbi算法的音素模型和声调模型识别概率的统合方法。该方法采用60个音素单位的HMM和8个声调单位的HMM作为识别用基元模型。音素和声调基元模型识别结果的统合,采用音素的HMM状态、声调的HMM状态和时间的3 维空间帧同步Viterbi 算法来实现。本文还探讨了在该方法的基础上,给予不同路径限制时的匹配统合效果,并且通过和传统的匹配统合方式的比较,证明了提出的方法的有效性。  相似文献   

8.
基于递阶模糊聚类的混沌时间序列预测   总被引:5,自引:0,他引:5       下载免费PDF全文
刘福才  孙立萍  梁晓明 《物理学报》2006,55(7):3302-3306
提出一种新的基于递阶模糊聚类系统的模糊建模方法.目的在于通过一系列的步骤优化T-S模糊模型结构,实现非线性系统的建模和预测.首先利用最近邻聚类法初始划分输入空间,得到规则数及初始聚类中心,用模糊C均值算法(FCM)进一步优化聚类中心;然后利用加权最小二乘法估计模糊模型的初始参数,进一步利用带遗忘因子的递推最小二乘法优化结论参数.采用该方法对Mackey-Glass混沌时间序列进行预测实验,结果表明可以对Mackey-Glass混沌时间序列进行准确建模和预测,证明了本方法的有效性. 关键词: 递阶模糊聚类 模糊建模 混沌时间序列 最小二乘  相似文献   

9.
随着现代社会信息的全球化,双语以及多语混合的语言现象日趋普遍,随之而产生的双语或多语语音识别也成为语音识别研究领域的热门课题。在双语混合语音识别中,主要面临的问题有两个:一是在保证双语识别率的前提下控制系统的复杂度;二是有效处理插入语中原用语引起的非母语口音现象。为了解决双语混合现象以及减少统计建模所需的数据量,通过音素混合聚类方法建立起一个统一的双语识别系统。在聚类算法中,提出了一种新型基于混淆矩阵的两遍音素聚类算法,并将该方法与传统的基于声学似然度准则的聚类方法进行比较;针对双语语音中非母语语音识别性能较低的问题,提出一种新型的双语模型修正算法用于提高非母语语音的识别性能。实验结果表明,通过上述方法建立起来的中英双语语音识别系统在有效控制模型规模的同时,实现了同时对两种语言的识别,且在单语言语音和混合语言语音上的识别性能也能得到有效保证。  相似文献   

10.
刘福才  张彦柳  陈超 《物理学报》2008,57(5):2784-2790
采用一种基于鲁棒模糊聚类算法的模糊辨识方法,通过引入局部划分关联度因子,增强了系统辨识的抗干扰能力,提高了系统辨识的鲁棒性.首先用最近邻模糊聚类法划分初始输入空间,得到模糊规则数及初始聚类中心;然后用鲁棒模糊聚类算法求解并优化模糊隶属度和聚类中心,建立高精度的T-S模糊模型;最后利用最小二乘法辨识模型的初始结论参数,进一步利用带遗忘因子的递推最小二乘法优化结论参数.采用该方法对Mackey-Glass混沌时间序列进行建模和预测,仿真结果表明利用本方法可以进行准确建模和预测,验证了本方法的鲁棒性、有效性和实 关键词: 最近邻模糊聚类 鲁棒模糊聚类 混沌时间序列 最小二乘法  相似文献   

11.
Whether or not categorical perception results from the operation of a special, language-specific, speech mode remains controversial. In this cross-language (Mandarin Chinese, English) study of the categorical nature of tone perception, we compared native Mandarin and English speakers' perception of a physical continuum of fundamental frequency contours ranging from a level to rising tone in both Mandarin speech and a homologous (nonspeech) harmonic tone. This design permits us to evaluate the effect of language experience by comparing Chinese and English groups; to determine whether categorical perception is speech-specific or domain-general by comparing speech to nonspeech stimuli for both groups; and to examine whether categorical perception involves a separate categorical process, distinct from regions of sensory discontinuity, by comparing speech to nonspeech stimuli for English listeners. Results show evidence of strong categorical perception of speech stimuli for Chinese but not English listeners. Categorical perception of nonspeech stimuli was comparable to that for speech stimuli for Chinese but weaker for English listeners, and perception of nonspeech stimuli was more categorical for English listeners than was perception of speech stimuli. These findings lead us to adopt a memory-based, multistore model of perception in which categorization is domain-general but influenced by long-term categorical representations.  相似文献   

12.
汉语耳语音孤立字识别研究   总被引:6,自引:0,他引:6       下载免费PDF全文
杨莉莉  林玮  徐柏龄 《应用声学》2006,25(3):187-192
耳语音识别有着广泛的应用前景,是一个全新的课题.但是由于耳语音本身的特点,如声级低、没有基频等,给耳语音识别研究带来了困难.本文根据耳语音信号发音模型,结合耳语音的声学特性,建立了一个汉语耳语音孤立字识别系统.由于耳语音信噪比低,必须对其进行语音增强处理,同时在识别系统中应用声调信息提高了识别性能.实验结果说明了MFCC结合幅值包络可作为汉语耳语音自动识别的特征参数,在小字库内用HMM模型识别得出的识别率为90.4%.  相似文献   

13.
In the past 10 years a Chinese text-to-speech system including aphonetic library,static tone model and basic synthesis rules had been estab-lished in IAAS.The Chinese synthesis of unrestricted vocabulary had beenachieved,but further steps must be taken to improve the naturalness ofsynthesized Chinese.The effect of segmental and suprasegmental features ofsynthetic speech upon naturalness have been studied by use of subjective as-sessment method.The results show that the rhythm in time domain andcoarticulation occupy a basic position for improving the naturalness of synthet-ic speech.And the fundamental frequency curve decided by tone model onlysuit to synthesize short sentence of Chinese.If the synthesis of larger linguisticunit than simple sentence is considered,the fundamental frequency curveshould be carefully manipulated.This paper presents the experimental methodand results,and discusses the way how to improve the naturalness of syntheticChinese.  相似文献   

14.
Tone recognition is important for speech understanding in tonal languages such as Mandarin Chinese. Cochlear implant patients are able to perceive some tonal information by using temporal cues such as periodicity-related amplitude fluctuations and similarities between the fundamental frequency (F0) contour and the amplitude envelope. The present study investigates whether modifying the amplitude envelope to better resemble the F0 contour can further improve tone recognition in multichannel cochlear implants. Chinese tone and vowel recognition were measured for six native Chinese normal-hearing subjects listening to a simulation of a four-channel cochlear implant speech processor with and without amplitude envelope enhancement. Two algorithms were proposed to modify the amplitude envelope to more closely resemble the F0 contour. In the first algorithm, the amplitude envelope as well as the modulation depth of periodicity fluctuations was adjusted for each spectral channel. In the second algorithm, the overall amplitude envelope was adjusted before multichannel speech processing, thus reducing any local distortions to the speech spectral envelope. The results showed that both algorithms significantly improved Chinese tone recognition. By adjusting the overall amplitude envelope to match the F0 contour before multichannel processing, vowel recognition was better preserved and less speech-processing computation was required. The results suggest that modifying the amplitude envelope to more closely resemble the F0 contour may be a useful approach toward improving Chinese-speaking cochlear implant patients' tone recognition.  相似文献   

15.
胡航烨  王蔚 《应用声学》2023,42(1):76-83
情感语声合成技术对于人机交互具有重要的意义。面对儿童情感语声合成所需汉语语声数据资源缺乏以及模型训练时长较长等问题,该文提出利用迁移学习实现汉语儿童情感语声合成的方法。首先基于汉语语声数据库训练深度学习模型实现中文语声端到端合成模型,再使用高质量大样本的中文情感语料库完成情感语声合成模型,最后利用自行采样的小样本汉语儿童情感语料对模型进行迁移学习实现低资源的语声合成。客观实验结果中梅尔倒谱失真指标为4.91,主观听辨实验指标分别为3.61和4.17。通过实验对比表明,该文的方法在情感语声合成技术的应用上具有良好的性能表现,并且优于现有先进的低资源情感语声合成方法。  相似文献   

16.
Chinese sentence recognition strongly relates to the reception of tonal information. For cochlear implant (CI) users with residual acoustic hearing, tonal information may be enhanced by restoring low-frequency acoustic cues in the nonimplanted ear. The present study investigated the contribution of low-frequency acoustic information to Chinese speech recognition in Mandarin-speaking normal-hearing subjects listening to acoustic simulations of bilaterally combined electric and acoustic hearing. Subjects listened to a 6-channel CI simulation in one ear and low-pass filtered speech in the other ear. Chinese tone, phoneme, and sentence recognition were measured in steady-state, speech-shaped noise, as a function of the cutoff frequency for low-pass filtered speech. Results showed that low-frequency acoustic information below 500 Hz contributed most strongly to tone recognition, while low-frequency acoustic information above 500 Hz contributed most strongly to phoneme recognition. For Chinese sentences, speech reception thresholds (SRTs) improved with increasing amounts of low-frequency acoustic information, and significantly improved when low-frequency acoustic information above 500 Hz was preserved. SRTs were not significantly affected by the degree of spectral overlap between the CI simulation and low-pass filtered speech. These results suggest that, for CI patients with residual acoustic hearing, preserving low-frequency acoustic information can improve Chinese speech recognition in noise.  相似文献   

17.
The paper examines physical mechanisms of frequency modulations in acoustics of the vocal tract and methods of estimation of these modulations in the speech signal. It has been found that vibrations of the tract walls make a negligibly small effect on modulations of its resonance frequencies. The model of the process of speech formation with account for the subglottal cavity shows that a change in boundary conditions at the open glottis produces noticeable variations in resonance frequencies. Along with this type of modulations, modulations determined by the shape of the source of excitation also arise in the speech signal. They substantially depend on the ratio of the frequency of the fundamental tone to the resonance frequency and of the parameters of methods estimating modulations and methods of analysis of the speech signal. Overall, this may sometimes cause unstable and unpredictable modulations of estimated formant frequencies in the speech signal.  相似文献   

18.
A discriminative framework of tone model integration in continuous speech recognition was proposed. The method uses model dependent weights to scale probabilities of the hidden Markov models based on spectral features and tone models based on tonal features. The weights are discriminatively trained by minimum phone error criterion. Update equation of the model weights based on extended Baum-Welch algorithm is derived. Various schemes of model weight combination are evaluated and a smoothing technique is introduced to make training robust to over fitting. The proposed method is ewluated on tonal syllable output and character output speech recognition tasks. The experimental results show the proposed method has obtained 9.5% and 4.7% relative error reduction than global weight on the two tasks due to a better interpolation of the given models. This proves the effectiveness of discriminative trained model weights for tone model integration.  相似文献   

19.
Cochlear implant (CI) users in tone language environments report great difficulty in perceiving lexical tone. This study investigated the augmentation of simulated cochlear implant audio by visual (facial) speech information for tone. Native speakers of Mandarin and Australian English were asked to discriminate between minimal pairs of Mandarin tones in five conditions: Auditory-Only, Auditory-Visual, CI-simulated Auditory-Only, CI-simulated Auditory-Visual, and Visual-Only (silent video). Discrimination in CI-simulated audio conditions was poor compared with normal audio, and varied according to tone pair, with tone pairs with strong non-F0 cues discriminated the most easily. The availability of visual speech information also improved discrimination in the CI-simulated audio conditions, particularly on tone pairs with strong durational cues. In the silent Visual-Only condition, both Mandarin and Australian English speakers discriminated tones above chance levels. Interestingly, tone-nai?ve listeners outperformed native listeners in the Visual-Only condition, suggesting firstly that visual speech information for tone is available, and may in fact be under-used by normal-hearing tone language perceivers, and secondly that the perception of such information may be language-general, rather than the product of language-specific learning. This may find application in the development of methods to improve tone perception in CI users in tone language environments.  相似文献   

20.
Several approaches to Chinese dialect identification based on segmental and prosodic features of speech are described in this paper. When using segmental information only, the system performs phonotactic analysis after speech utterances have been tokenized into sequences of broad phonetic classes. The second scheme comprises prosodic models which are trained to capture tone sequence information for individual dialects. Also proposed is a novel approach that examines differences between Chinese dialects at broad phonetic and prosodic levels. These algorithms were evaluated via a multispeaker read-speech mode. Simulation results indicate that the combined use of segmental and prosodic features allows the proposed system to discriminate among three major Chinese dialects spoken in Taiwan with 93.0% accuracy.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号