Similar literature
1.
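Phonetic labeling and manual segmentation of a Chinese speech database   Total citations: 2 (self-citations: 0, citations by others: 2)
This paper introduces CAS-SYL, a sub-corpus of the Chinese comprehensive speech database. The database covers all 1267 tonal syllables of Chinese, recorded by 10 speakers; all speech data were segmented and phonetically labeled by hand. In view of the initial-final structure of Chinese syllables, phonetic labeling is set at the demisyllable level. The labeling symbols use a machine-readable phonetic alphabet, Chinese SAMPA-X (extended SAM Phonetic Alphabet). The paper also describes the labeling strategy, the principles for locating segment boundaries, and the acoustic cues, based on the speech waveform, to the glottal closure instant (GCI, Glottal Closed Instant). It further summarizes the acoustic manifestations of coarticulation between initials and finals, and finally analyzes the instability introduced by manual segmentation.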

2.
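SAMPA-SC: a machine-readable phonetic alphabet for Standard Chinese (Putonghua)   Total citations: 2 (self-citations: 0, citations by others: 2)
The machine-readable phonetic alphabet SAMPA is widely used for the languages of the European Community and, under the auspices of the International Coordinating Committee on Speech Databases and Speech I/O Systems Assessment (CO-COSDA), has been extended to many of the world's languages. This paper revises and extends the machine-readable alphabet for Chinese. Based on the Hanyu Pinyin scheme, it lists the SAMPA and International Phonetic Alphabet symbols for the initials (consonants), vowels, finals, and tones of Standard Chinese, with example characters for each. It also extends the alphabet to rhotacized (erhua) finals to meet the needs of speech recognition, text-to-speech conversion, and language teaching.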

3.
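This paper briefly reviews research on speech databases and on speech input/output assessment methods: its historical development, the international and national organizations involved, the current state of the field in China and abroad, and future work.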

4.
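Based on the characteristics of Chinese speech, a set of speech test materials reflecting the main features and phenomena of Chinese speech was designed for a Chinese human-machine dialogue system. The set serves as source material for the system's speech knowledge base and is used to study speech synthesis rules, train speech recognition templates, and so on. The materials were selected to be comprehensive, representative, and moderate in size, with the aim of reflecting how the prosodic and timbre features of Chinese speech vary; they were chosen on the basis of tones and tone combinations and of initial-final combinations, respectively. All materials have been recorded on a digital recorder.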

5.
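Evaluation methods for Chinese speech synthesis systems   Total citations: 1 (self-citations: 0, citations by others: 1)
Since 1994, national evaluations of the performance of Chinese speech synthesis systems have been held periodically. In 1994, five different synthesis systems were evaluated and diagnosed using speech intelligibility tests. The listeners were 16 university students (8 male, 8 female) with no prior experience of synthetic speech. Listener responses were open-response written transcriptions. Speech naturalness was also measured with a ten-point subjective rating (MOS). To provide segment-level diagnostic information for each system, perceptual confusion matrices for the consonants of the synthetic speech were analyzed. Deficiencies in each system's handling of prosodic features were examined by comparing the statistical relationships between the intelligibility scores of natural and synthetic speech at different linguistic levels. The results show that these methods yield stable and reasonable indices of synthesis-system performance. Evaluation methods for prosodic features need further development.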
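The three measures named in this abstract (intelligibility score, consonant confusion matrix, and mean opinion score) can be sketched as follows. This is a minimal illustration and not the evaluation procedure used in the study; the function names and the listener responses are hypothetical.

```python
from collections import Counter


def mean_opinion_score(ratings):
    """Average a list of subjective naturalness ratings (e.g. on a ten-point scale)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    return sum(ratings) / len(ratings)


def confusion_matrix(pairs):
    """Count (presented, heard) consonant pairs.

    Returns a dict mapping each presented consonant to a Counter of responses.
    """
    matrix = {}
    for presented, heard in pairs:
        matrix.setdefault(presented, Counter())[heard] += 1
    return matrix


def intelligibility(pairs):
    """Fraction of items whose response matches the presented item."""
    if not pairs:
        raise ValueError("at least one response is required")
    correct = sum(1 for presented, heard in pairs if presented == heard)
    return correct / len(pairs)


# Hypothetical listener responses, invented for illustration only.
responses = [("b", "b"), ("b", "p"), ("p", "p"), ("d", "d"), ("d", "t"), ("t", "t")]
matrix = confusion_matrix(responses)
score = intelligibility(responses)
mos = mean_opinion_score([6, 7, 5, 8])
```

Off-diagonal counts such as `matrix["b"]["p"]` show which consonants listeners confuse, which is the kind of segment-level diagnostic information the abstract refers to.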

6.
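The Institute of Acoustics, Chinese Academy of Sciences, has built a speech database of Standard Chinese consisting of initials, finals, 1282 monosyllables, several hundred disyllabic and trisyllabic words, phonetic test sentences, short passages, and the digits 0-9. The database has six speakers (three male, three female), who are teachers at the Broadcasting Institute and professional announcers and speak standard Putonghua. The material was recorded on high-quality magnetic tape, and part of it has been digitized. Many Chinese speech research groups have used the database.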

7.
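颜永红, 《应用声学》 (Applied Acoustics), 2009, 28(2): 81-89
This paper reviews recent progress in speech acoustics research. It first introduces recent developments in human speech production and perception and in acoustic analysis, then focuses on the latest research and results in computer processing of human speech (including speech recognition and synthesis, pronunciation assessment, and singing evaluation), and mentions related applications of these results. It closes with a summary and outlook.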

8.
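张家騄, 《应用声学》 (Applied Acoustics), 1990, 9(4): 35-40
Language is humanity's most important means of communication, and in the modern information society it is also becoming an important means of interaction between people and machines. Achieving this goal, human-machine speech communication, requires in-depth study of the properties of language, both spoken and written. To compete for the market in information machines, many countries have set up state-funded projects over the past five years, along with some international cooperative projects, to advance language information processing technology. Because language has strong national and regional characteristics, each country needs to carry out the necessary research on its own official language in order to keep pace with the modern information society. This paper focuses on several influential national projects abroad, covering their research goals, organizational structure, research approach, and results.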

9.
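Over the past ten years, the Institute of Acoustics, Chinese Academy of Sciences, has built a text-to-speech system comprising a speech library, a tone model, and basic synthesis rules. The problem of unlimited-vocabulary Chinese synthesis has been provisionally solved, but the naturalness of the synthetic speech must be improved further. We studied how segmental and suprasegmental features affect the naturalness of synthetic speech; the results show that the basic factors are speech rhythm and coarticulation. The tone patterns used in the system are suitable for synthesizing single sentences; for units larger than a single sentence, intonation must be controlled very carefully to achieve high naturalness. This paper presents the methods and results of studying the naturalness of synthetic speech through subjective evaluation.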

10.
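CAVSR1.0: a Chinese audio-visual bimodal database   Total citations: 8 (self-citations: 0, citations by others: 8)
Audio-visual bimodal speech recognition has gradually become one of the active topics in speech recognition internationally, and research on bimodal recognition for Chinese has also begun. However, because visual information is very difficult to acquire and process, bimodal speech databases remain underdeveloped, and none existed for Chinese. We therefore built CAVSR1.0, the first bimodal database of Chinese speech, alongside our research on key techniques for audio-visual bimodal speech recognition, drawing on an analysis of the structure of comparable databases abroad and on the characteristics of Chinese speech. It has the following features: the corpus covers all initials and finals, and its size (total data volume and number of syllables) exceeds that of comparable databases internationally; the distribution of the corpus matches the actual distribution probabilities of Chinese initials and finals, so the patterns it reflects are representative; and it is bundled with an automatic syllable segmentation program and a program for labeling the main facial features, which makes the database highly extensible.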

11.
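Acoustic modeling of Standard Chinese speech based on articulatory features   Total citations: 3 (self-citations: 0, citations by others: 3)
Articulatory features that characterize Standard Chinese speech are introduced into acoustic modeling for Mandarin speech recognition. Based on the pronunciation characteristics of Mandarin, nine articulatory features were defined to distinguish Mandarin vowels, consonants, and tone information. Neural networks were trained with these features as targets to obtain the posterior probability that the speech signal belongs to each articulatory feature class, and these probabilities were used as input features to build the acoustic model. The approach was evaluated on a speaker-independent, large-vocabulary, spontaneous conversational Mandarin recognition system and compared with an acoustic model based on spectral features. At the same decoding speed, the acoustic model built this way reduced the character error rate by a relative 6.8%. When articulatory and spectral features were fused, the character error rate of the fused system fell by a relative 10.1% compared with the spectral-feature system. These results show that the articulatory-feature acoustic model represents speech characteristics more effectively, and that exploiting the complementarity of articulatory and spectral features can further improve recognition performance.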
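The abstract reports relative, not absolute, reductions in character error rate. The sketch below shows how a relative reduction relates to absolute error rates. The baseline value of 30% is a hypothetical figure chosen only to make the arithmetic concrete; the abstract does not state the baseline.

```python
def relative_reduction(baseline_cer, new_cer):
    """Relative error-rate reduction: (baseline - new) / baseline."""
    if baseline_cer <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline_cer - new_cer) / baseline_cer


def apply_relative_reduction(baseline_cer, reduction):
    """Absolute error rate after a given relative reduction."""
    return baseline_cer * (1.0 - reduction)


# Hypothetical baseline character error rate; NOT taken from the abstract.
baseline = 0.30
articulatory = apply_relative_reduction(baseline, 0.068)  # 6.8% relative reduction
fused = apply_relative_reduction(baseline, 0.101)         # 10.1% relative reduction
```

Under that assumed baseline, a 6.8% relative reduction corresponds to about two percentage points of absolute improvement, which is why the two kinds of figure should not be compared directly.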

12.
The problems of evaluating the phonetic quality of speech and the characteristic features of a speaker’s articulatory base from acoustic-phonetic measurement data are considered. It is recommended that the evaluation be performed using the GOST 50840-95 standard “Speech Transmission over Varied Communication Channels: Techniques for Measurement of Speech Quality, Intelligibility, and Voice Identification,” which came into effect in Russia in 1997. Examples of experimental evaluation measurements in speech communication and forensic examination are presented.

13.
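肖东, 莫福源, 陈庚, 郭圣明, 马力, 《声学学报》 (Acta Acustica), 2013, 38(5): 589-596
In medium- to long-range (>10 km) underwater acoustic voice communication, the information rate is constrained by adverse factors such as narrow usable bandwidth and complex, highly variable conditions, so the speech coding rate should be made as low as possible. Exploiting the large propagation delay of the underwater acoustic channel together with the perceptual characteristics of human hearing, and following an in-depth study of the mixed excitation linear prediction (MELP) coding standard, a variable-bit-rate speech coding algorithm with an adjustable coding rate is proposed. Its average bit rate is about 600 bps, and its average speech quality score (PESQ MOS) is about 2.8. The algorithm's performance was verified by computer simulation and sea trials. The experiments and simulations show that when the bit error rate is no higher than 10^-3, the algorithm performs well and stably: the synthesized speech is clear and intelligible, and the speaker is easy to recognize.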

14.
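To address the limited bandwidth available in wireless voice communication, a low-rate speech coding algorithm based on compressive sampling is proposed. Using the Gini index as the measure, the sparsity of speech signals in different sparsifying transform domains is compared, and the reconstruction behavior of common reconstruction algorithms on compressively sampled speech measurements is analyzed. The parameters of a standard cochlear filter, the gammachirp filter bank, are studied, and the speech signal is reconstructed with the gradient projection for sparse reconstruction (GPSR) algorithm. The quality of the decoded synthesized speech was assessed using Perceptual Evaluation of Speech Quality (PESQ), signal-to-noise ratio, and subjective listening tests. Experiments show that when the compressed-sensing speech coder encodes speech at a low rate of 4 kbps, the PESQ score reaches 3.16 with relatively low computational complexity, so the coder can be used in practical speech coding settings.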
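The abstract uses the Gini index to compare sparsity across transform domains but does not give the formula. A minimal sketch follows, assuming the commonly used definition over coefficient magnitudes sorted in ascending order, G = 1 - 2 Σ_k (|c_(k)| / ||c||_1) · ((N - k + 1/2) / N); whether the authors used exactly this variant is an assumption. The function name is hypothetical.

```python
def gini_index(coefficients):
    """Gini index of a coefficient vector as a sparsity measure.

    0 means all magnitudes are equal (least sparse); values near 1 mean
    the energy is concentrated in very few coefficients (most sparse).
    """
    magnitudes = sorted(abs(c) for c in coefficients)  # ascending order
    n = len(magnitudes)
    l1_norm = sum(magnitudes)
    if n == 0 or l1_norm == 0:
        raise ValueError("need a non-empty vector with at least one non-zero entry")
    weighted = sum(
        (m / l1_norm) * ((n - k + 0.5) / n)
        for k, m in enumerate(magnitudes, start=1)
    )
    return 1.0 - 2.0 * weighted


# Toy vectors for illustration: a flat one and one with a single non-zero entry.
dense = gini_index([1, 1, 1, 1])
sparse = gini_index([0, 0, 0, 1])
```

A transform domain in which speech frames give a higher Gini index is, by this measure, a better candidate sparsifying basis for compressive sampling.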

15.
In speech communication, the speech signal is unavoidably corrupted by noisy environments such as subways, factories, and restaurants, or by speech from other speakers. Speech enhancement methods have been widely studied to minimize the influence of noise in different linear transform domains, such as the discrete Fourier transform, Karhunen-Loeve transform, or discrete cosine transform domain. Kernel methods, as nonlinear transforms, have received a lot of interest recently and are commonly used in many applications, including audio signal processing. However, methods of this kind typically suffer from high computational complexity. In this paper, we propose a speech enhancement algorithm using low-rank approximation in a reproducing kernel Hilbert space to reduce storage space and running time with very little performance loss in the enhanced speech. We also analyze the root mean squared error bound between the enhanced vectors obtained with the approximate kernel matrix and with the full kernel matrix. Simulations show that the proposed method improves the computation speed of the algorithm while achieving performance close to that of the full kernel matrix.

16.
Currently there are few standardized speech testing materials for Mandarin-speaking cochlear implant (CI) listeners. In this study, Mandarin speech perception (MSP) sentence test materials were developed and validated in normal-hearing subjects listening to acoustic simulations of CI processing. Percent distribution of vowels, consonants, and tones within each MSP sentence list was similar to that observed across commonly used Chinese characters. There was no significant difference in sentence recognition across sentence lists. Given the phonetic balancing within lists and the validation with spectrally degraded speech, the present MSP test materials may be useful for assessing speech performance of Mandarin-speaking CI listeners.

17.
A Speech Intelligibility Index (SII) for the sentences in the Cantonese version of the Hearing In Noise Test (CHINT) was derived using conventional procedures described previously in studies such as Studebaker and Sherbecoe [J. Speech Hear. Res. 34, 427-438 (1991)]. Two studies were conducted to determine the signal-to-noise ratios and high- and low-pass filtering conditions that should be used and to measure speech intelligibility in these conditions. Normal-hearing subjects listened to the sentences presented in speech-spectrum-shaped noise. Compared to English speech assessment materials such as the English Hearing In Noise Test [Nilsson et al., J. Acoust. Soc. Am. 95, 1085-1099 (1994)], the frequency importance function of the CHINT suggests that low-frequency information is more important for Cantonese speech understanding. The difference in frequency importance weight in Chinese, compared to English, was attributed to the redundancy of test material, the tonal nature of the Cantonese language, or a combination of these factors.
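To illustrate how a frequency importance function enters an SII calculation, the sketch below uses the simplified form SII = Σ_i I_i · A_i, with band audibility A_i = (SNR_i + 15) / 30 clipped to [0, 1]. It omits the masking-spread and level-distortion terms of the full standard, and it is not the derivation procedure used in the study. The four-band importance weights are hypothetical and are not the CHINT or English HINT values.

```python
def band_audibility(snr_db):
    """Map a band SNR in dB to audibility in [0, 1] over a 30 dB range (-15 to +15 dB)."""
    return min(1.0, max(0.0, (snr_db + 15.0) / 30.0))


def speech_intelligibility_index(importance, snr_db):
    """Simplified SII: importance-weighted sum of band audibilities."""
    if len(importance) != len(snr_db):
        raise ValueError("importance and snr_db must have the same number of bands")
    if abs(sum(importance) - 1.0) > 1e-6:
        raise ValueError("band importance weights must sum to 1")
    return sum(w * band_audibility(s) for w, s in zip(importance, snr_db))


# Hypothetical four-band example, ordered from low to high frequency.
low_weighted = [0.4, 0.3, 0.2, 0.1]   # importance concentrated at low frequencies
flat = [0.25, 0.25, 0.25, 0.25]       # equal importance in every band
snr = [15.0, 0.0, -15.0, -15.0]       # only the low bands are audible

sii_low_weighted = speech_intelligibility_index(low_weighted, snr)
sii_flat = speech_intelligibility_index(flat, snr)
```

With the same low-pass-like listening condition, the low-frequency-weighted importance function yields the higher index, which is the sense in which low-frequency information can matter more for one set of materials than for another.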

18.
Speech intelligibility in classrooms directly affects the learning efficiency of students, especially those who are using a second language. Speech intelligibility is determined by many factors, such as speech level, signal-to-noise ratio, and reverberation time in the room. This paper investigates the contributions of these factors with subjective tests, with particular attention to speech level, which is needed for designing the optimal gain of sound amplification systems in classrooms. The test material was generated by convolving the English Coordinate Response Measure corpus with room impulse responses and mixing the result with background noise. The subjects were all Chinese students who use English as a second language. It is found that speech intelligibility first increases and then decreases as speech level increases, and that the optimal English speech level is about 71 dBA in classrooms for Chinese listeners when the signal-to-noise ratio and the reverberation time are held constant. Finally, a regression equation is proposed to predict speech intelligibility from speech level, signal-to-noise ratio, and reverberation time.

19.
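Measurement and calculation of the standard spectrum of Chinese whispered speech   Total citations: 1 (self-citations: 0, citations by others: 1)
孙飞, 沈勇, 李炬, 安康, 《声学学报》 (Acta Acustica), 2010, 35(4): 477-480
A relationship between the power spectral density level of Chinese whispered speech and frequency is proposed that differs from the one in GB7348-87 《耳语标准频谱》 (Standard Spectrum of Whisper). Measurements were made in an anechoic chamber to improve the measurement signal-to-noise ratio. A real-time analyzer was used to measure the long-term sound pressure spectrum of each individual's whispered speech, each person's long-term spectrum was self-normalized, and the samples were "mixed" mathematically to compute the power spectral density level of Chinese whispered speech. The measured and calculated standard spectrum of Chinese whisper can serve as a basis for the design of any system or electroacoustic device that generates, transmits, receives, or processes Chinese whispered speech signals.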
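The abstract does not define its self-normalization or mathematical "mixing" steps. One plausible reading, offered only as an assumption, is to scale each talker's band spectrum to unit total power, average the band powers across talkers, and convert band levels to power spectral density levels by subtracting 10·log10 of the bandwidth. All function names and the two-band talker data below are hypothetical.

```python
import math


def overall_level_db(band_levels_db):
    """Total level in dB obtained by summing the band powers."""
    return 10.0 * math.log10(sum(10.0 ** (level / 10.0) for level in band_levels_db))


def self_normalize(band_levels_db):
    """Shift one talker's band levels so that the overall level is 0 dB."""
    total = overall_level_db(band_levels_db)
    return [level - total for level in band_levels_db]


def average_spectrum_db(spectra_db):
    """Average the normalized band powers of several talkers (assumed 'mixing' step)."""
    normalized = [self_normalize(spectrum) for spectrum in spectra_db]
    n_bands = len(normalized[0])
    averaged = []
    for band in range(n_bands):
        mean_power = sum(10.0 ** (s[band] / 10.0) for s in normalized) / len(normalized)
        averaged.append(10.0 * math.log10(mean_power))
    return averaged


def psd_level_db(band_level_db, bandwidth_hz):
    """Convert a band level to a power spectral density level (dB re 1 Hz bandwidth)."""
    return band_level_db - 10.0 * math.log10(bandwidth_hz)


# Hypothetical two-band spectra for two talkers with the same shape but different loudness.
talker_a = [60.0, 50.0]
talker_b = [70.0, 60.0]
average = average_spectrum_db([talker_a, talker_b])
```

Normalizing before averaging keeps a loud talker from dominating the result, so the average reflects spectral shape rather than individual level differences.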

20.
The effects of intensity on monosyllabic word recognition were studied in adults with normal hearing and mild-to-moderate sensorineural hearing loss. The stimuli were bandlimited NU#6 word lists presented in quiet and talker-spectrum-matched noise. Speech levels ranged from 64 to 99 dB SPL and S/N ratios from 28 to -4 dB. In quiet, the performance of normal-hearing subjects remained essentially constant. In noise, at a fixed S/N ratio, it decreased as a linear function of speech level. Hearing-impaired subjects performed like normal-hearing subjects tested in noise when the data were corrected for the effects of audibility loss. From these and other results, it was concluded that: (1) speech intelligibility in noise decreases when speech levels exceed 69 dB SPL and the S/N ratio remains constant; (2) the effects of speech and noise level are synergistic; (3) the deterioration in intelligibility can be modeled as a relative increase in the effective masking level; (4) normal-hearing and hearing-impaired subjects are affected similarly by increased signal level when differences in speech audibility are considered; (5) the negative effects of increasing speech and noise levels on speech recognition are similar for all adult subjects, at least up to 80 years; and (6) the effective dynamic range of speech may be larger than the commonly assumed value of 30 dB.
