1.

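A rule-based synthesis system for Mandarin Chinese (Total citations: 1; self-citations: 0; cited by others: 1)

石波, 吕士楠 《声学学报》 1995,20(2):146-155

Compared with other speech synthesis techniques, synthesis by rule has two clear advantages: the speech inventory occupies very little memory, and the acoustic and prosodic features of the synthesized speech can be controlled flexibly. This paper describes the Chinese synthesis-by-rule system that the first author developed at the University of London between 1986 and 1990, and the improvements made in 1992 during a collaboration at the Chinese Academy of Sciences. It covers the principles and strategies of Chinese synthesis by rule, in particular the method of prosody control, and an evaluation of the system.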

2.

Experimental study on the naturalness of synthetic speech

LU Shinan, ZHANG Jialu, QI Shiqian 《声学学报：英文版》 1993,(3)

In the past 10 years, a Chinese text-to-speech system including a phonetic library, a static tone model, and basic synthesis rules has been established at IAAS. Chinese synthesis with unrestricted vocabulary has been achieved, but further steps must be taken to improve the naturalness of synthesized Chinese. The effects of segmental and suprasegmental features of synthetic speech on naturalness have been studied using a subjective assessment method. The results show that rhythm in the time domain and coarticulation are fundamental to improving the naturalness of synthetic speech, and that the fundamental frequency curve determined by the tone model is suitable only for synthesizing short Chinese sentences. If a linguistic unit larger than a simple sentence is to be synthesized, the fundamental frequency curve must be carefully manipulated. This paper presents the experimental method and results, and discusses how to improve the naturalness of synthetic Chinese.

3.

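A Chinese text-to-speech system with high intelligibility and high naturalness

初敏, 吕士楠 《声学学报》 1996,21(S1):639-647

Based on the pitch-synchronous overlap-add technique, using Chinese monosyllables as synthesis units, and equipped with a prosodic rule base containing word-tone patterns, stress patterns, and sentence-intonation patterns, this Chinese text-to-speech system can synthesize Chinese speech of high intelligibility and high naturalness. The study shows that the main factors affecting the naturalness of synthesized Chinese speech are the variation of pitch and intensity over time, the distribution of syllable durations, and coarticulation between syllables, with pitch and duration having the most significant influence. Time-domain pitch-synchronous overlap-add provides a way to modify the pitch and duration of a speech waveform in the time domain, which makes word-level and sentence-level prosodic adjustment possible when Chinese is synthesized by waveform concatenation. An analysis of the acoustic features of news-broadcast speech provides a theoretical basis for establishing prosodic adjustment rules for Chinese synthesis. This paper describes the structure and processing flow of the new Chinese text-to-speech system, preliminary findings on the prosodic features of broadcast speech, the Chinese synthesis rules, and the results of evaluating the system's speech quality.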

4.

Prosodic strengthening and featural enhancement: evidence from acoustic and articulatory realizations of /a,i/ in English

Cho T 《The Journal of the Acoustical Society of America》2005,117(6):3867-3878

In this study the effects of accent and prosodic boundaries on the production of English vowels (/a,i/), by concurrently examining acoustic vowel formants and articulatory maxima of the tongue, jaw, and lips obtained with EMA (Electromagnetic Articulography) are investigated. The results demonstrate that prosodic strengthening (due to accent and/or prosodic boundaries) has differential effects depending on the source of prominence (in accented syllables versus at edges of prosodic domains; domain initially versus domain finally). The results are interpreted in terms of how the prosodic strengthening is related to phonetic realization of vowel features. For example, when accented, /i/ was fronter in both acoustic and articulatory vowel spaces (enhancing [-back]), accompanied by an increase in both lip and jaw openings (enhancing sonority). By contrast, at edges of prosodic domains (especially domain-finally), /i/ was not necessarily fronter, but higher (enhancing [+high]), accompanied by an increase only in the lip (not jaw) opening. This suggests that the two aspects of prosodic structure (accent versus boundary) are differentiated by distinct phonetic patterns. Further, it implies that prosodic strengthening, though manifested in fine-grained phonetic details, is not simply a low-level phonetic event but a complex linguistic phenomenon, closely linked to the enhancement of phonological features and positional strength that may license phonological contrasts.

5.

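A multi-channel, real-time, high-quality digit-string synthesis system

刘庆峰, 膝永盛, 王仁华 《声学学报》 1999,24(5):510-515

Based on the prosodic rules and characteristics of digit-string pronunciation in Mandarin Chinese, and using an LMA speech synthesizer, a new digit-string announcement system is implemented. With a very small speech inventory of less than 300 kbytes, and using fast processing by precomputation and table-lookup concatenation, the system can synthesize highly natural, high-quality speech for digit strings of arbitrary length over multiple channels in real time in a variety of specific applications. Listening tests and user feedback both indicate that the perceived quality of the synthesized output is already comparable to the announcer's original recordings.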

6.

A text-to-speech system with high intelligibility and naturalness for Chinese (Total citations: 1; self-citations: 0; cited by others: 1)

CHU Min, LU Shinan 《声学学报：英文版》 1996,(1)

I. Introduction. Research on Chinese synthesis discloses that only when both the segmental and suprasegmental features of the synthetic speech are similar to those of the natural one will the synthetic speech sound intelligible and natural [1]. Among existing synthetic techniques, the approach based on acoustic parameters can adjust both the segmental and suprasegmental features of synthetic units flexibly and can be considered the most reasonable synthetic technique in theory. However, the parameter-based synthesizer is overdependent on the developments of paramet…

7.

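A study of rule-based synthesis of unlimited-vocabulary Chinese speech

黄载禄, 姬丽 《声学学报》 1990,15(3):194-201

This paper studies a method in which Chinese pinyin characters are entered into a computer, Chinese phonemes serve as the sound units, and continuous Chinese speech is synthesized according to the prosodic rules of speech. The method requires little data and makes it easy to control pronunciation and intonation. Experiments show that building a Chinese text-to-speech system with this method is feasible.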

8.

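A study of pitch declination in Chinese based on a speech database (Total citations: 4; self-citations: 3; cited by others: 1)

王安红, 陈明, 吕士楠 《声学学报》 2004,29(4):353-358

A method is proposed for deriving the bottom line of an intonation phrase from the lowest and second-lowest points of the tonal pitch range in an utterance. Using the bottom line as a reference, the intonation features of natural Chinese utterances randomly drawn from a large-scale speech corpus are examined. The results show that, contrary to existing theories of Chinese sentence-final intonation, Chinese also exhibits the pitch declination and reset that are common across languages. Declination and reset of the bottom line can occur in prosodic units of different sizes, such as the foot, the intonation phrase, and the utterance, and are most evident at the intonation-phrase level. The experimental results support the view that, in the two-line model of Chinese intonation, the top line and the bottom line serve different linguistic functions.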

9.

Entropy-Argumentative Concept of Computational Phonetic Analysis of Speech Taking into Account Dialect and Individuality of Phonation

Viacheslav Kovtun Oksana Kovtun Andriy Semenov 《Entropy (Basel, Switzerland)》2022,24(7)

10.

Discrimination of non-native consonant contrasts varying in perceptual assimilation to the listener's native phonological system (Total citations: 2; self-citations: 0; cited by others: 2)

Best CT McRoberts GW Goodell E 《The Journal of the Acoustical Society of America》2001,109(2):775-794

Classic non-native speech perception findings suggested that adults have difficulty discriminating segmental distinctions that are not employed contrastively in their own language. However, recent reports indicate a gradient of performance across non-native contrasts, ranging from near-chance to near-ceiling. Current theoretical models argue that such variations reflect systematic effects of experience with phonetic properties of native speech. The present research addressed predictions from Best's perceptual assimilation model (PAM), which incorporates both contrastive phonological and noncontrastive phonetic influences from the native language in its predictions about discrimination levels for diverse types of non-native contrasts. We evaluated the PAM hypotheses that discrimination of a non-native contrast should be near-ceiling if perceived as phonologically equivalent to a native contrast, lower though still quite good if perceived as a phonetic distinction between good versus poor exemplars of a single native consonant, and much lower if both non-native segments are phonetically equivalent in goodness of fit to a single native consonant. Two experiments assessed native English speakers' perception of Zulu and Tigrinya contrasts expected to fit those criteria. Findings supported the PAM predictions, and provided evidence for some perceptual differentiation of phonological, phonetic, and nonlinguistic information in perception of non-native speech. Theoretical implications for non-native speech perception are discussed, and suggestions are made for further research.

11.

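A study of the naturalness of synthetic speech

吕士楠, 齐士钤, 张家 《声学学报》 1994,19(1):59-65

Over the past ten years, the Institute of Acoustics, Chinese Academy of Sciences, has built a text-to-speech system that includes a speech library, a tone model, and basic synthesis rules. The problem of unlimited-vocabulary Chinese synthesis has been solved in a preliminary way, but the naturalness of the synthetic speech must be improved further. We studied how the segmental and suprasegmental features of speech affect the naturalness of synthetic speech; the results show that the basic factors are rhythm and coarticulation. The tone model used in this system is suitable for synthesizing single sentences; for units larger than a single sentence, intonation must be controlled very carefully to achieve high naturalness. This paper describes the method and results of studying the naturalness of synthetic speech by subjective evaluation.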

12.

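Application of long-term speech features to speaker recognition (Total citations: 1; self-citations: 0; cited by others: 1)

张建平, 李明, 索宏彬, 杨琳, 付强, 颜永红 《声学学报》 2010,35(2):267-269

In addition to introducing commonly used speaker recognition techniques, this paper mainly describes a speaker recognition method based on long-term time-frequency features. The input speech first undergoes VAD processing to obtain clean speech, from which basic time-frequency features are extracted. Within each speech unit, the trajectories of time-frequency features such as fundamental frequency, formants, and harmonics are fitted with Legendre polynomials to extract the main fitting parameters; HLDA is then used to reduce the feature dimension, and the mean supervector of a Gaussian mixture model represents the statistics of the time-frequency features of each utterance. On the NIST06 speaker recognition 1side-1side test set, the method achieves an equal error rate of 18.7%; when it is fused with a conventional MFCC-based speaker recognition system, the equal error rate drops from 4.9% to 4.6%, a relative reduction of 6%.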
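The Legendre-fitting step and the relative equal-error-rate figure in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the polynomial order, the mapping of frame indices onto [-1, 1], and all function names (`legendre_basis`, `fit_legendre`, `relative_reduction`) are assumptions made for illustration.

```python
def legendre_basis(x, order):
    """Return [P_0(x), ..., P_order(x)] using Bonnet's recurrence."""
    values = [1.0, x]
    for n in range(1, order):
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values[: order + 1]


def solve_linear(a, b):
    """Solve a small dense system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_legendre(track, order=3):
    """Least-squares Legendre coefficients of a feature trajectory (e.g. F0 per frame).

    Frame indices are mapped linearly onto [-1, 1] (an assumption of this sketch).
    The track needs at least order + 1 frames.
    """
    n = len(track)
    xs = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    basis = [legendre_basis(x, order) for x in xs]
    k = order + 1
    # Normal equations: (A^T A) c = A^T y
    ata = [[sum(b[i] * b[j] for b in basis) for j in range(k)] for i in range(k)]
    aty = [sum(b[i] * y for b, y in zip(basis, track)) for i in range(k)]
    return solve_linear(ata, aty)


def relative_reduction(before, after):
    """Relative reduction of an error rate, as a fraction of the starting value."""
    return (before - after) / before


# A toy, linearly rising F0 contour in Hz (made-up data, not from the paper).
f0_track = [100.0 + 20.0 * (-1.0 + 2.0 * i / 9) for i in range(10)]
coeffs = fit_legendre(f0_track, order=3)
eer_gain = relative_reduction(4.9, 4.6)
```

In this sketch the coefficient of P_0 captures the mean level of the trajectory and the coefficient of P_1 its overall slope, which is why a handful of coefficients can summarize a contour of any length. The last line reproduces the arithmetic behind the reported figure: (4.9 - 4.6) / 4.9 is roughly 6%.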

13.

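Age voice conversion using a group of short-time spectral universal background models combined with prosody

惠琳, 俞一彪 《声学学报》 2017,42(6):762-768

A method is proposed for age voice conversion that combines a group of short-time spectral universal background models with prosodic parameters. For spectral conversion, short-time spectral coefficients are extracted from each speaker within the same age group and a Gaussian mixture model is built; the speakers are then clustered by the similarity of their speech features, and one universal background model is trained per cluster, finally yielding a group of universal background models and a set of short-time spectral conversion functions. After spectral conversion, the formants are further fine-tuned. For prosodic conversion, single-Gaussian and average-duration-ratio models are built for fundamental frequency and speaking rate, respectively, to derive the conversion functions. Experimental results show that the proposed method has a clear advantage over the conventional bilinear method on evaluation measures such as ABX and MOS, and that its rate of change in log-likelihood is 4% higher than that of the single universal background model method. This indicates that the proposed method gives the converted speech a good tendency toward the target while maintaining good speech quality, a clear improvement over conventional methods.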
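The abstract above states only that a single-Gaussian model of fundamental frequency is used to derive the conversion function. A common way to do this, shown here as a hedged sketch rather than as the paper's method, is a mean-variance transformation of log-F0. Working in the log domain, the treatment of unvoiced frames, and the names `fit_log_f0_gaussian` and `convert_f0` are all assumptions.

```python
import math
from statistics import mean, pstdev


def fit_log_f0_gaussian(f0_values):
    """Estimate (mean, standard deviation) of log-F0 over voiced frames (F0 > 0)."""
    logs = [math.log(f) for f in f0_values if f > 0]
    return mean(logs), pstdev(logs)


def convert_f0(f0, source, target):
    """Map one F0 value from the source distribution to the target distribution.

    source and target are (mean, std) pairs of log-F0. Unvoiced frames,
    marked by F0 <= 0, are passed through as 0.0 (an assumption of this sketch).
    """
    if f0 <= 0:
        return 0.0
    mu_s, sd_s = source
    mu_t, sd_t = target
    return math.exp(mu_t + (math.log(f0) - mu_s) * sd_t / sd_s)


# Hypothetical statistics for a source and a target age group.
source_stats = (math.log(200.0), 0.20)
target_stats = (math.log(120.0), 0.10)
converted = convert_f0(200.0, source_stats, target_stats)
```

With these made-up statistics, a frame at the source mean (200 Hz) lands on the target mean (120 Hz), and deviations from the mean are scaled by the ratio of the two standard deviations.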

14.

A probabilistic framework for landmark detection based on phonetic features for automatic speech recognition

Juneja A Espy-Wilson C 《The Journal of the Acoustical Society of America》2008,123(2):1154-1168

A probabilistic framework for a landmark-based approach to speech recognition is presented for obtaining multiple landmark sequences in continuous speech. The landmark detection module uses as input acoustic parameters (APs) that capture the acoustic correlates of some of the manner-based phonetic features. The landmarks include stop bursts, vowel onsets, syllabic peaks and dips, fricative onsets and offsets, and sonorant consonant onsets and offsets. Binary classifiers of the manner phonetic features (syllabic, sonorant, and continuant) are used for probabilistic detection of these landmarks. The probabilistic framework exploits two properties of the acoustic cues of phonetic features: (1) sufficiency of acoustic cues of a phonetic feature for a probabilistic decision on that feature and (2) invariance of the acoustic cues of a phonetic feature with respect to other phonetic features. Probabilistic landmark sequences are constrained using manner class pronunciation models for isolated word recognition with known vocabulary. The performance of the system is compared with (1) the same probabilistic system but with mel-frequency cepstral coefficients (MFCCs), (2) a hidden Markov model (HMM) based system using APs and (3) a HMM based system using MFCCs.

15.

An approach to normalization of coarticulation effects for vowels in connected speech

H Kuwabara 《The Journal of the Acoustical Society of America》1985,77(2):686-694

A method is proposed to reduce the ambiguity of vowels in connected speech by normalizing the coarticulation effects. The method is applied to vowels in phonetic environments where great ambiguity would be likely to occur, taking as their features the first and second formant trajectories. The separability between vowel clusters is found to be greatly improved for the vowel samples. In addition, distribution of the vowels on a feature plane characterized by this method seems to reflect their perceptual nature when presented to listeners without isolation from their phonetic environments. The results suggest that the method proposed here is useful for automatic speech recognition and help infer some possible mechanisms underlying dynamic aspects of human speech recognition.

16.

On the perception of similarity among talkers

Remez RE Fellowes JM Nagel DS 《The Journal of the Acoustical Society of America》2007,122(6):3688-3696

A listener who recognizes a talker notices characteristic attributes of the talker's speech despite the novelty of each utterance. Accounts of talker perception have often presumed that consistent aspects of an individual's speech, termed indexical properties, are ascribable to a talker's unique anatomy or consistent vocal posture distinct from acoustic correlates of phonetic contrasts. Accordingly, the perception of a talker is acknowledged to occur independently of the perception of a linguistic message. Alternatively, some studies suggest that attention to attributes of a talker includes indexical linguistic attributes conveyed in the articulation of consonants and vowels. This investigation sought direct evidence of attention to phonetic attributes of speech in perceiving talkers. Natural samples and sine-wave replicas derived from them were used in three experiments assessing the perceptual properties of natural and sine-wave sentences; of temporally veridical and reversed natural and sine-wave sentences; and of an acoustic correlate of vocal tract scale to judgments of sine-wave talker similarity. The results revealed that the subjective similarity of individual talkers is preserved in the absence of natural vocal quality; and that local phonetic segmental attributes as well as global characteristics of speech can be exploited when listeners notice characteristics of talkers.

17.

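A new inter-syllable context-dependent recognition unit for Chinese continuous speech recognition (Total citations: 1; self-citations: 0; cited by others: 1)

李春, 王作英 《声学学报》 2003,28(2):187-191

Accounting for coarticulation in continuous Chinese speech is very important for improving speech recognition performance. Based on the characteristics of Chinese speech, a new recognition unit is proposed that takes inter-syllable coarticulation into account in Chinese continuous speech recognition and refines the acoustic model. Inter-syllable contextual influences are then classified on the basis of phonetic knowledge so that state parameters are shared across units, which reduces model complexity and ensures that the model can be trained. The main difference from conventional methods is that this method clusters entirely on phonetic knowledge, whereas conventional methods use data-driven clustering. Recognition experiments show that inter-syllable context-dependent units based on phonetic classification clearly improve recognition performance: the system's top-choice error rate falls by 17%.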

18.

Automatic speech recognition using articulatory features from subject-independent acoustic-to-articulatory inversion

Ghosh PK Narayanan S 《The Journal of the Acoustical Society of America》2011,130(4):EL251-EL257

An automatic speech recognition approach is presented which uses articulatory features estimated by a subject-independent acoustic-to-articulatory inversion. The inversion allows estimation of articulatory features from any talker's speech acoustics using only an exemplary subject's articulatory-to-acoustic map. Results are reported on a broad class phonetic classification experiment on speech from English talkers using data from three distinct English talkers as exemplars for inversion. Results indicate that the inclusion of the articulatory information improves classification accuracy but the improvement is more significant when the speaking style of the exemplar and the talker are matched compared to when they are mismatched.

19.

Review of text-to-speech conversion for English (Total citations: 7; self-citations: 0; cited by others: 7)

D H Klatt 《The Journal of the Acoustical Society of America》1987,82(3):737-793

The automatic conversion of English text to synthetic speech is presently being performed, remarkably well, by a number of laboratory systems and commercial devices. Progress in this area has been made possible by advances in linguistic theory, acoustic-phonetic characterization of English sound patterns, perceptual psychology, mathematical modeling of speech production, structured programming, and computer hardware design. This review traces the early work on the development of speech synthesizers, discovery of minimal acoustic cues for phonetic contrasts, evolution of phonemic rule programs, incorporation of prosodic rules, and formulation of techniques for text analysis. Examples of rules are used liberally to illustrate the state of the art. Many of the examples are taken from Klattalk, a text-to-speech system developed by the author. A number of scientific problems are identified that prevent current systems from achieving the goal of completely human-sounding speech. While the emphasis is on rule programs that drive a formant synthesizer, alternatives such as articulatory synthesis and waveform concatenation are also reviewed. An extensive bibliography has been assembled to show both the breadth of synthesis activity and the wealth of phenomena covered by rules in the best of these programs. A recording of selected examples of the historical development of synthetic speech, enclosed as a 33 1/3-rpm record, is described in the Appendix.

20.

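Choice of positional encoding for a Transformer-based Mandarin speech recognition model

徐冬冬 《应用声学》 2021,40(2):194-199

Transformer networks with self-attention have gradually attracted wide attention in speech recognition research. Focusing on combining positional information embeddings with speech features, this paper studies positional encoding methods better suited to Mandarin speech recognition models. The experimental results show that replacing sinusoidal positional encoding with a convolutionally encoded input representation better fuses the contextual relations of speech features with relative position information and yields better recognition performance. The trained speech recognition system is in Tr...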
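For context, the sinusoidal positional encoding that the abstract above treats as the baseline can be written in a few lines. This is a minimal sketch of the standard formulation from the original Transformer literature, not code from the paper; the function name and the requirement that `d_model` be even are assumptions of the sketch.

```python
import math


def sinusoidal_positional_encoding(num_positions, d_model):
    """Return a num_positions x d_model table of sinusoidal position codes.

    Even dimensions hold sin(pos / 10000^(i / d_model)) and the following odd
    dimension holds the matching cosine, where i is the even dimension index.
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even in this sketch")
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            row.extend([math.sin(angle), math.cos(angle)])
        table.append(row)
    return table


# Four frames, eight dimensions: small enough to inspect by hand.
pe = sinusoidal_positional_encoding(4, 8)
```

The table depends only on the frame index, never on the acoustic content of the frame. That fixed, content-independent behavior is the property a learned convolutional input representation, as studied in the paper, does not share.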