1.

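A New Inter-Syllable Context-Dependent Recognition Unit for Chinese Continuous Speech Recognition (Cited by: 1; self-citations: 0; citations by others: 1)

Li Chun (李春), Wang Zuoying (王作英) 《Acta Acustica (声学学报)》2003,(2)

Accounting for coarticulation in continuous Chinese speech is very important for improving recognition performance. Based on the characteristics of Chinese speech, this paper proposes a new recognition unit for Chinese continuous speech recognition that takes inter-syllable coarticulation into account and refines the acoustic model. Inter-syllable contextual effects are then classified using phonetic knowledge, so that state parameters are shared across units, which reduces model complexity and keeps the model trainable. The main difference from conventional methods is that this approach clusters entirely on the basis of phonetic knowledge, whereas conventional methods use data-driven clustering. Recognition experiments show that the phonetically classified inter-syllable context-dependent units clearly improve recognition performance: the system's top-1 error rate is reduced by 17%.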

2.

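Tone-Modeling-Based Recognition of Noisy Chinese Digit Strings (Cited by: 1; self-citations: 1; citations by others: 1)

Wang Huanliang (王欢良), Qian Yao (钱瑶), F. K. Soong, Han Jiqing (韩纪庆) 《Acta Acustica (声学学报)》2007,32(5):454-460

This paper attempts to use tone information to improve the recognition of Chinese digit strings in noise. To deal with the discontinuity of tone features, tones are modeled with hidden Markov models based on multi-space probability distributions. The effect of noise on tone feature extraction is briefly analyzed, and the feasibility of using tone information in noisy digit-string recognition is demonstrated. Experiments show that, compared with a method that uses no tone information, the proposed method lowers the error rate by an average of 17.2% relative on test data at 5 dB to 20 dB. This indicates that tone information and the proposed modeling method are effective for improving noisy Chinese digit-string recognition.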

3.

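Acoustic Modeling of Mandarin Chinese Speech Based on Articulatory Features

Zhang Qingqing (张晴晴), Pan Jielin (潘接林), Yan Yonghong (颜永红) 《Acta Acustica (声学学报)》2010,35(2)

Articulatory features that characterize Mandarin Chinese speech are introduced into acoustic modeling for Mandarin speech recognition. Based on the pronunciation characteristics of Mandarin, nine articulatory features are defined to distinguish Mandarin vowels, consonants, and tone information. A neural network is trained with these features as targets to obtain the posterior probability that the speech signal belongs to each articulatory feature class, and these probabilities are used as input features for building the acoustic model. The approach was evaluated on a speaker-independent, large-vocabulary Mandarin spontaneous conversational speech recognition system and compared with an acoustic model based on spectral features. At the same decoding speed, the proposed acoustic model lowers the character error rate by 6.8% relative. In a fusion experiment combining articulatory and spectral features, the fused system lowers the character error rate by 10.1% relative to the spectral-feature system. These results show that the articulatory-feature-based acoustic model represents speech characteristics more effectively, and that exploiting the complementarity of articulatory and spectral features can further improve recognition performance.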
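
Several abstracts in this list report relative reductions in error rate. As a reminder of how such a figure is usually derived, here is a minimal sketch. The function name `relative_reduction` and the error rates in the example are hypothetical and are not taken from any of the papers, which report only the relative figures.

```python
def relative_reduction(baseline_error: float, new_error: float) -> float:
    """Return the error reduction as a fraction of the baseline error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical character error rates, chosen only for illustration
baseline = 0.50
improved = 0.466
print(f"{relative_reduction(baseline, improved):.1%}")
```

A relative figure depends on the baseline: the same absolute gain is a larger relative reduction when the baseline error is already low.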

4.

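A Vector Quantization (VQ) Study of Chinese Disyllabic Tonemes

Kong Jiangping (孔江平), Lü Shinan (吕士楠) 《Acta Acustica (声学学报)》2000,(2)

Starting from real speech material, this study uses curve fitting and vector quantization to systematically investigate models of Chinese disyllabic tonemes. The results show that: (1) curve fitting combined with vector quantization is a feasible and effective way to classify the tone-combination patterns of Chinese disyllabic words; (2) although the dispersion among samples is small for some tone combinations, the tone-combination patterns of most disyllabic words fall into several subclasses, so a single overall-average statistical pattern is not sufficient; (3) differences in pitch contours show up clearly as differences in the tonal high point, which is related to the syllable stress pattern of the disyllabic word.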
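
The abstract above does not describe how its vector quantizer was built. The sketch below shows generic vector quantization only: a codebook refined by Lloyd (k-means) iterations, and nearest-codeword assignment. The three-point contours, the initial codebook and all function names are assumptions made for illustration, not the authors' data or procedure.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(vector, codebook):
    """Return the index of the codeword nearest to `vector`."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(vector, codebook[i]))


def train_codebook(vectors, initial_codebook, iterations=10):
    """Refine a codebook with Lloyd (k-means) iterations."""
    codebook = [list(c) for c in initial_codebook]
    for _ in range(iterations):
        cells = [[] for _ in codebook]
        for v in vectors:
            cells[quantize(v, codebook)].append(v)
        for i, members in enumerate(cells):
            if members:  # an empty cell keeps its previous codeword
                codebook[i] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


# Hypothetical pitch contours sampled at three points (arbitrary units)
contours = [[1.0, 2.0, 3.0], [1.2, 2.1, 3.1],   # rising shapes
            [3.0, 2.0, 1.0], [3.1, 1.9, 0.9]]   # falling shapes
codebook = train_codebook(contours, [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]])
labels = [quantize(c, codebook) for c in contours]
print(labels)  # [0, 0, 1, 1]
```

Each codeword acts as a prototype contour, so finding several well-separated codewords for one tone combination would correspond to the "several subclasses" mentioned in the abstract.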

5.

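A Study of Chinese Continuous Speech Recognition Based on the Stochastic Trajectory Model (Cited by: 1; self-citations: 0; citations by others: 1)

Ma Xiaohui (马小辉), Fu Yuqing (富煜清), Lu Jiren (陆佶人), Gong Yifan (龚一凡) 《Acta Acustica (声学学报)》1997,(2)

After pointing out the unrealistic assumptions of the hidden Markov model (HMM), this paper introduces the theory and advantages of the stochastic trajectory model (STM). The STM represents the acoustic observations of a speech unit as clusters of trajectories in parameter space, and models each trajectory as a mixture of probability density functions of random state sequences. The model overcomes the unrealistic assumptions of the HMM and is theoretically better founded. Based on the properties of the STM and the characteristics of Chinese speech, the paper discusses the choice of recognition units for Chinese continuous speech recognition and proposes phoneme-like units as the basic units of the recognition system. Experimental results for STM-based Chinese continuous speech recognition confirm the effectiveness of the STM and the consistency of the phoneme-like units.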

6.

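A Chinese Initial/Final Recognition Model Based on Nasal Coda Separation

Shao Jian (邵健), Zhao Qingwei (赵庆卫), Yan Yonghong (颜永红) 《Acta Acustica (声学学报)》2010,35(5)

This paper studies the choice of modeling units for recognizing spontaneous spoken Chinese. In three-state HMMs, initial/final units and phoneme units are the two most popular modeling units, and each has strengths and weaknesses. On the one hand, because sound changes are severe in spontaneous speech, coarse-grained initial/final units are preferred since they can absorb the various changes; on the other hand, because a three-state structure may fail to describe complex units effectively, fine-grained phoneme units are preferred. Drawing on results from experimental phonetics and on a duration analysis of initials and finals, the paper argues for selectively splitting the extended initial/final units and proposes a splitting method based on separating the nasal coda. Experiments show that the proposed method clearly improves recognition performance over extended initial/final units and phoneme units, lowering the character error rate by 2.23% and 9.45%, respectively.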

7.

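A Method for Integrating the Recognition Probabilities of Phoneme Models and Tone Models Based on a 3-Dimensional Viterbi Algorithm (Cited by: 2; self-citations: 1; citations by others: 1)

Zhao Li (赵力), Zou Cairong (邹采荣), Wu Zhenyang (吴镇扬) 《Acta Acustica (声学学报)》2001,(3)

A method is proposed for integrating the recognition probabilities of phoneme models and tone models in Chinese continuous speech recognition, based on a 3-dimensional Viterbi algorithm. The method uses HMMs for 60 phoneme units and HMMs for 8 tone units as the basic recognition models. The recognition results of the phoneme and tone models are integrated by a frame-synchronous Viterbi algorithm over a 3-dimensional space spanned by the phoneme HMM state, the tone HMM state, and time. The paper also examines the matching and integration performance under different path constraints, and demonstrates the effectiveness of the proposed method by comparison with conventional matching and integration schemes.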
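
For readers unfamiliar with the underlying recursion, the sketch below shows the ordinary frame-synchronous Viterbi search for a single HMM (states by time) in the log domain. It is not the paper's 3-dimensional method, whose details the abstract does not give. The two-state left-to-right model and its probabilities are hypothetical.

```python
import math

NEG_INF = float("-inf")


def viterbi(log_init, log_trans, log_emit):
    """Return (best state path, its log probability).

    log_init[s]     : log prior of starting in state s
    log_trans[p][s] : log probability of moving from state p to state s
    log_emit[t][s]  : log likelihood of frame t under state s
    """
    n_states = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    backpointers = []
    for t in range(1, len(log_emit)):
        new_score, pointers = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + log_trans[p][s])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + log_emit[t][s])
        score = new_score
        backpointers.append(pointers)
    last = max(range(n_states), key=lambda s: score[s])
    path = [last]
    for pointers in reversed(backpointers):  # trace back from the last frame
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]


# Hypothetical two-state left-to-right model observed over three frames
log_init = [0.0, NEG_INF]
log_trans = [[math.log(0.5), math.log(0.5)], [NEG_INF, 0.0]]
log_emit = [[math.log(0.9), math.log(0.1)],
            [math.log(0.8), math.log(0.2)],
            [math.log(0.1), math.log(0.9)]]
path, log_prob = viterbi(log_init, log_trans, log_emit)
print(path)  # [0, 0, 1]
```

One conceivable way to search two models jointly, stated here purely as an assumption, is to run the same recursion over composite states made of a phoneme state and a tone state, with path constraints deciding which pairs may follow each other.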

8.

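Chaotic Time Series Prediction Based on Hierarchical Fuzzy Clustering (Cited by: 5; self-citations: 0; citations by others: 5)

Liu Fucai (刘福才), Sun Liping (孙立萍), Liang Xiaoming (梁晓明) 《Acta Physica Sinica (物理学报)》2006,55(7):3302-3306

A new fuzzy modeling method based on a hierarchical fuzzy clustering system is proposed. The aim is to optimize the structure of a T-S fuzzy model through a series of steps, so as to model and predict nonlinear systems. First, nearest-neighbor clustering makes an initial partition of the input space, giving the number of rules and the initial cluster centers, which are then refined with the fuzzy C-means (FCM) algorithm. Next, weighted least squares estimates the initial parameters of the fuzzy model, and recursive least squares with a forgetting factor further optimizes the consequent parameters. Prediction experiments on the Mackey-Glass chaotic time series show that the method can model and predict the series accurately, which demonstrates its effectiveness. Keywords: hierarchical fuzzy clustering; fuzzy modeling; chaotic time series; least squares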
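
The abstract mentions recursive least squares with a forgetting factor. The sketch below shows one textbook form of that update for a plain linear model y = x·θ. The T-S fuzzy weighting used in the paper is not modelled here; the function name `rls_update`, the forgetting factor 0.98, the initial covariance and the test data are all assumptions.

```python
def rls_update(theta, P, x, y, lam=0.98):
    """One recursive-least-squares step with forgetting factor `lam`.

    theta : current parameter estimate (list of n floats)
    P     : current n-by-n covariance-like matrix, assumed symmetric
    x, y  : new regressor vector and observed output
    """
    n = len(x)
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(x[i] * Px[i] for i in range(n))
    gain = [v / denom for v in Px]
    error = y - sum(theta[i] * x[i] for i in range(n))
    theta = [theta[i] + gain[i] * error for i in range(n)]
    # P is symmetric, so the row vector x^T P equals (P x)^T
    P = [[(P[i][j] - gain[i] * Px[j]) / lam for j in range(n)]
         for i in range(n)]
    return theta, P


# Hypothetical noise-free data from y = 2*x1 + 1, with regressor [x1, 1]
theta = [0.0, 0.0]
P = [[1000.0, 0.0], [0.0, 1000.0]]
for k in range(20):
    theta, P = rls_update(theta, P, [float(k), 1.0], 2.0 * k + 1.0)
print([round(v, 3) for v in theta])
```

A forgetting factor below 1 discounts older samples, which lets the estimate follow slowly changing parameters at the cost of higher variance.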

9.

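A Study of Mixed Bilingual Speech Recognition

Zhang Qingqing (张晴晴), Pan Jielin (潘接林), Yan Yonghong (颜永红) 《Acta Acustica (声学学报)》2010,35(2)

With the globalization of information in modern society, bilingual and multilingual mixing is increasingly common, and bilingual or multilingual speech recognition has become a hot topic in speech recognition research. Mixed bilingual speech recognition faces two main problems: first, controlling system complexity while maintaining recognition accuracy for both languages; second, effectively handling non-native accents in inserted (embedded) words caused by the speaker's primary language. To handle bilingual mixing and reduce the amount of data needed for statistical modeling, a unified bilingual recognition system is built through mixed phoneme clustering. For clustering, a new two-pass phoneme clustering algorithm based on a confusion matrix is proposed and compared with the conventional clustering method based on the acoustic likelihood criterion. To address the lower recognition performance on non-native speech in bilingual speech, a new bilingual model correction algorithm is proposed to improve recognition of non-native speech. Experiments show that the resulting Chinese-English bilingual speech recognition system keeps the model size under control while recognizing both languages simultaneously, and that recognition performance on both monolingual and mixed-language speech is maintained.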

10.

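Chaotic Time Series Prediction Based on Robust Fuzzy Clustering

Liu Fucai (刘福才), Zhang Yanliu (张彦柳), Chen Chao (陈超) 《Acta Physica Sinica (物理学报)》2008,57(5):2784-2790

A fuzzy identification method based on a robust fuzzy clustering algorithm is adopted. Introducing a local partition correlation-degree factor strengthens the disturbance rejection of system identification and improves its robustness. First, nearest-neighbor fuzzy clustering partitions the initial input space, giving the number of fuzzy rules and the initial cluster centers. Then the robust fuzzy clustering algorithm solves for and optimizes the fuzzy memberships and cluster centers, building a high-accuracy T-S fuzzy model. Finally, least squares identifies the model's initial consequent parameters, and recursive least squares with a forgetting factor further optimizes them. The method is applied to modeling and predicting the Mackey-Glass chaotic time series; simulation results show that it models and predicts accurately, confirming its robustness, effectiveness and … Keywords: nearest-neighbor fuzzy clustering; robust fuzzy clustering; chaotic time series; least squares method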
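
The Mackey-Glass series used as the benchmark above is generated by the delay differential equation dx/dt = βx(t-τ)/(1 + x(t-τ)^n) - γx(t). The abstract does not state the parameter values or the integration scheme, so the sketch below assumes the commonly used benchmark setting (β = 0.2, γ = 0.1, n = 10, τ = 17) and a coarse forward-Euler step; the function name and the constant initial history are likewise assumptions.

```python
def mackey_glass(length, tau=17, beta=0.2, gamma=0.1, n=10, x0=1.2, dt=1.0):
    """Generate `length` samples of the Mackey-Glass series by forward Euler.

    The history before t = 0 is assumed constant and equal to x0.
    """
    history = int(round(tau / dt))          # delay expressed in steps
    series = [x0] * (history + 1)
    for _ in range(length - 1):
        delayed = series[-history - 1]      # x(t - tau)
        current = series[-1]                # x(t)
        derivative = beta * delayed / (1.0 + delayed ** n) - gamma * current
        series.append(current + dt * derivative)
    return series[history:]


series = mackey_glass(500)
print(len(series), min(series) >= 0.0)
```

A smaller `dt` or a higher-order integrator gives a more faithful trajectory; the Euler step is used here only to keep the sketch short.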

11.

Effects of language experience and stimulus complexity on the categorical perception of pitch direction

Xu Y Gandour JT Francis AL 《The Journal of the Acoustical Society of America》2006,120(2):1063-1074

Whether or not categorical perception results from the operation of a special, language-specific, speech mode remains controversial. In this cross-language (Mandarin Chinese, English) study of the categorical nature of tone perception, we compared native Mandarin and English speakers' perception of a physical continuum of fundamental frequency contours ranging from a level to rising tone in both Mandarin speech and a homologous (nonspeech) harmonic tone. This design permits us to evaluate the effect of language experience by comparing Chinese and English groups; to determine whether categorical perception is speech-specific or domain-general by comparing speech to nonspeech stimuli for both groups; and to examine whether categorical perception involves a separate categorical process, distinct from regions of sensory discontinuity, by comparing speech to nonspeech stimuli for English listeners. Results show evidence of strong categorical perception of speech stimuli for Chinese but not English listeners. Categorical perception of nonspeech stimuli was comparable to that for speech stimuli for Chinese but weaker for English listeners, and perception of nonspeech stimuli was more categorical for English listeners than was perception of speech stimuli. These findings lead us to adopt a memory-based, multistore model of perception in which categorization is domain-general but influenced by long-term categorical representations.

12.

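Isolated-Word Recognition of Whispered Chinese Speech (Cited by: 6; self-citations: 0; citations by others: 6)

Yang Lili (杨莉莉), Lin Wei (林玮), Xu Boling (徐柏龄) 《Journal of Applied Acoustics (应用声学)》2006,25(3):187-192

Whispered speech recognition has broad application prospects and is an entirely new topic. However, properties of whispered speech, such as its low sound level and lack of a fundamental frequency, make recognition difficult. Based on a production model of the whispered speech signal and the acoustic characteristics of whispered speech, this paper builds an isolated-word recognition system for whispered Chinese. Because whispered speech has a low signal-to-noise ratio, speech enhancement is required; in addition, using tone information in the recognition system improves performance. Experiments show that MFCCs combined with the amplitude envelope can serve as feature parameters for automatic recognition of whispered Chinese; with an HMM recognizer on a small vocabulary, the recognition rate is 90.4%.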

13.

Experimental study on the naturalness of synthetic speech

LU Shinan ZHANG Jialu QI Shiqian 《Chinese Journal of Acoustics (声学学报：英文版)》1993,(3)

Over the past 10 years, a Chinese text-to-speech system including a phonetic library, a static tone model and basic synthesis rules has been established at IAAS. Chinese synthesis with an unrestricted vocabulary has been achieved, but further steps must be taken to improve the naturalness of synthesized Chinese. The effects of segmental and suprasegmental features of synthetic speech on naturalness have been studied using a subjective assessment method. The results show that rhythm in the time domain and coarticulation play a basic role in improving the naturalness of synthetic speech, and that the fundamental frequency curve determined by the tone model alone is suitable only for synthesizing short Chinese sentences. If a linguistic unit larger than a simple sentence is to be synthesized, the fundamental frequency curve should be carefully manipulated. This paper presents the experimental method and results, and discusses how to improve the naturalness of synthetic Chinese.

14.

Enhancing Chinese tone recognition by manipulating amplitude envelope: implications for cochlear implants 总被引：1，自引：0，他引：1

Luo X Fu QJ 《The Journal of the Acoustical Society of America》2004,116(6):3659-3667

Tone recognition is important for speech understanding in tonal languages such as Mandarin Chinese. Cochlear implant patients are able to perceive some tonal information by using temporal cues such as periodicity-related amplitude fluctuations and similarities between the fundamental frequency (F0) contour and the amplitude envelope. The present study investigates whether modifying the amplitude envelope to better resemble the F0 contour can further improve tone recognition in multichannel cochlear implants. Chinese tone and vowel recognition were measured for six native Chinese normal-hearing subjects listening to a simulation of a four-channel cochlear implant speech processor with and without amplitude envelope enhancement. Two algorithms were proposed to modify the amplitude envelope to more closely resemble the F0 contour. In the first algorithm, the amplitude envelope as well as the modulation depth of periodicity fluctuations was adjusted for each spectral channel. In the second algorithm, the overall amplitude envelope was adjusted before multichannel speech processing, thus reducing any local distortions to the speech spectral envelope. The results showed that both algorithms significantly improved Chinese tone recognition. By adjusting the overall amplitude envelope to match the F0 contour before multichannel processing, vowel recognition was better preserved and less speech-processing computation was required. The results suggest that modifying the amplitude envelope to more closely resemble the F0 contour may be a useful approach toward improving Chinese-speaking cochlear implant patients' tone recognition.

15.

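Emotional Speech Synthesis for Chinese Children

Hu Hangye (胡航烨), Wang Wei (王蔚) 《Journal of Applied Acoustics (应用声学)》2023,42(1):76-83

Emotional speech synthesis is important for human-computer interaction. To address the scarcity of Chinese speech data needed for children's emotional speech synthesis and the long model training time, this paper proposes a transfer-learning method for Chinese children's emotional speech synthesis. First, a deep learning model is trained on a Chinese speech database to obtain an end-to-end Chinese speech synthesis model. Then a high-quality, large-sample Chinese emotional corpus is used to build an emotional speech synthesis model. Finally, the model is adapted by transfer learning on a small, self-collected corpus of Chinese children's emotional speech, achieving low-resource speech synthesis. In the objective evaluation the mel-cepstral distortion is 4.91, and the subjective listening-test scores are 3.61 and 4.17. Experimental comparisons show that the method performs well in emotional speech synthesis and outperforms existing state-of-the-art low-resource emotional speech synthesis methods.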
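
The abstract reports a mel-cepstral distortion figure without saying how it was computed. The sketch below uses one widely used definition, MCD = (10 / ln 10) · sqrt(2 · Σ_d (c_d - ĉ_d)²) in dB, averaged over time-aligned frames; that this is the variant used in the paper is an assumption. The function name and the toy frames are hypothetical, and the caller is assumed to have aligned the frames and dropped the 0th (energy) coefficient.

```python
import math


def mel_cepstral_distortion(reference, synthesized):
    """Mean mel-cepstral distortion in dB over time-aligned frames.

    Each argument is a list of frames; each frame is a list of mel-cepstral
    coefficients with the 0th (energy) coefficient already removed.
    """
    if not reference or len(reference) != len(synthesized):
        raise ValueError("inputs must be non-empty and equally long")
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)
    total = 0.0
    for ref, syn in zip(reference, synthesized):
        total += scale * math.sqrt(sum((r - s) ** 2 for r, s in zip(ref, syn)))
    return total / len(reference)


# Toy frames for illustration only
same = mel_cepstral_distortion([[0.3, -0.1]], [[0.3, -0.1]])
unit = mel_cepstral_distortion([[1.0, 0.0]], [[0.0, 0.0]])
print(same, round(unit, 4))
```

Lower values indicate that the synthesized spectra are closer to the reference; values are only comparable when the same coefficient order and alignment method are used.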

16.

Contribution of low-frequency acoustic information to Chinese speech recognition in cochlear implant simulations

Luo X Fu QJ 《The Journal of the Acoustical Society of America》2006,120(4):2260-2266

Chinese sentence recognition strongly relates to the reception of tonal information. For cochlear implant (CI) users with residual acoustic hearing, tonal information may be enhanced by restoring low-frequency acoustic cues in the nonimplanted ear. The present study investigated the contribution of low-frequency acoustic information to Chinese speech recognition in Mandarin-speaking normal-hearing subjects listening to acoustic simulations of bilaterally combined electric and acoustic hearing. Subjects listened to a 6-channel CI simulation in one ear and low-pass filtered speech in the other ear. Chinese tone, phoneme, and sentence recognition were measured in steady-state, speech-shaped noise, as a function of the cutoff frequency for low-pass filtered speech. Results showed that low-frequency acoustic information below 500 Hz contributed most strongly to tone recognition, while low-frequency acoustic information above 500 Hz contributed most strongly to phoneme recognition. For Chinese sentences, speech reception thresholds (SRTs) improved with increasing amounts of low-frequency acoustic information, and significantly improved when low-frequency acoustic information above 500 Hz was preserved. SRTs were not significantly affected by the degree of spectral overlap between the CI simulation and low-pass filtered speech. These results suggest that, for CI patients with residual acoustic hearing, preserving low-frequency acoustic information can improve Chinese speech recognition in noise.

17.

Frequency modulations in the speech signal

A. S. Leonov I. S. Makarov V. N. Sorokin 《Acoustical Physics》2009,55(6):876-887

The paper examines physical mechanisms of frequency modulations in the acoustics of the vocal tract and methods of estimating these modulations in the speech signal. It has been found that vibrations of the tract walls have a negligibly small effect on modulations of its resonance frequencies. A model of the speech formation process that accounts for the subglottal cavity shows that a change in boundary conditions at the open glottis produces noticeable variations in resonance frequencies. Along with this type of modulation, modulations determined by the shape of the excitation source also arise in the speech signal. They depend substantially on the ratio of the fundamental frequency to the resonance frequency and on the parameters of the methods used to estimate modulations and to analyze the speech signal. Overall, this may sometimes cause unstable and unpredictable modulations of estimated formant frequencies in the speech signal.

18.

Tone model integration based on discriminative weight training for Putonghua speech recognition

HUANG Hao ZHU Jie 《Chinese Journal of Acoustics (声学学报：英文版)》2008,27(3):193-202

A discriminative framework for tone model integration in continuous speech recognition is proposed. The method uses model-dependent weights to scale the probabilities of the hidden Markov models based on spectral features and of the tone models based on tonal features. The weights are discriminatively trained using the minimum phone error criterion. An update equation for the model weights based on the extended Baum-Welch algorithm is derived. Various schemes of model weight combination are evaluated, and a smoothing technique is introduced to make training robust to overfitting. The proposed method is evaluated on tonal syllable output and character output speech recognition tasks. The experimental results show that the proposed method obtains 9.5% and 4.7% relative error reduction over a global weight on the two tasks, owing to a better interpolation of the given models. This demonstrates the effectiveness of discriminatively trained model weights for tone model integration.

19.

Facilitation of Mandarin tone perception by visual speech in clear and degraded audio: implications for cochlear implants

Smith D Burnham D 《The Journal of the Acoustical Society of America》2012,131(2):1480-1489

Cochlear implant (CI) users in tone language environments report great difficulty in perceiving lexical tone. This study investigated the augmentation of simulated cochlear implant audio by visual (facial) speech information for tone. Native speakers of Mandarin and Australian English were asked to discriminate between minimal pairs of Mandarin tones in five conditions: Auditory-Only, Auditory-Visual, CI-simulated Auditory-Only, CI-simulated Auditory-Visual, and Visual-Only (silent video). Discrimination in CI-simulated audio conditions was poor compared with normal audio, and varied according to tone pair, with tone pairs with strong non-F0 cues discriminated the most easily. The availability of visual speech information also improved discrimination in the CI-simulated audio conditions, particularly on tone pairs with strong durational cues. In the silent Visual-Only condition, both Mandarin and Australian English speakers discriminated tones above chance levels. Interestingly, tone-naïve listeners outperformed native listeners in the Visual-Only condition, suggesting firstly that visual speech information for tone is available, and may in fact be under-used by normal-hearing tone language perceivers, and secondly that the perception of such information may be language-general, rather than the product of language-specific learning. This may find application in the development of methods to improve tone perception in CI users in tone language environments.

20.

Chinese dialect identification using segmental and prosodic features

Chang WW Tsai WH 《The Journal of the Acoustical Society of America》2000,108(4):1906-1913

Several approaches to Chinese dialect identification based on segmental and prosodic features of speech are described in this paper. When using segmental information only, the system performs phonotactic analysis after speech utterances have been tokenized into sequences of broad phonetic classes. The second scheme comprises prosodic models which are trained to capture tone sequence information for individual dialects. Also proposed is a novel approach that examines differences between Chinese dialects at broad phonetic and prosodic levels. These algorithms were evaluated via a multispeaker read-speech mode. Simulation results indicate that the combined use of segmental and prosodic features allows the proposed system to discriminate among three major Chinese dialects spoken in Taiwan with 93.0% accuracy.