Similar literature
Found 18 similar documents (search time: 46 ms)
1.
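张全, 《应用声学》 (Journal of Applied Acoustics), 2002, 21(1): 35-39
This paper covers phonetic research in speech acoustics, text-to-speech conversion, speech recognition, and natural language processing. It briefly reviews progress and development directions for Chinese in the first three areas, and focuses on the main content and progress of Hierarchical Network of Concepts (HNC) theory, a new theory aimed at natural language understanding as a whole, in an attempt to outline HNC theory at the theoretical level.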

2.
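颜永红, 《应用声学》 (Journal of Applied Acoustics), 2012, 31(1): 35-41
This paper reviews recent progress in speech acoustics and content understanding. It first covers advances in human speech production, perception, and acoustic analysis. It then presents recent results on using computers to extract information from speech (including speech, speaker, and language recognition) and to analyze and understand content (including document content analysis and understanding, and dialogue). Finally, it summarizes research in the field and discusses its outlook.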

3.
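Research on the auditory perception of vowels and consonants in speech (cited by: 1 in total; self-citations: 0; citations by others: 1)
This paper reviews research on the auditory perception of vowels and consonants in speech. More than 80 years ago, authoritative experiments based on nonsense syllables indicated that consonants matter more to human auditory perception. Owing to the experimenters' academic standing and authority, this conclusion became common knowledge until, nearly 20 years ago, experiments based on natural sentences challenged it and triggered a new round of research. The paper gives a fairly systematic account of the relative importance of vowels and consonants for speech perception, the influence of steady-state information and dynamic boundary information of vowels and consonants on speech perception, and potential applications of related research, and closes with a summary and outlook.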

4.
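刘育坤, 郑霖, 黎塔, 张鹏远, 《声学学报》 (Acta Acustica), 2023, (6): 1260-1268
This paper proposes a method (SAE) for adaptively designing acoustic encoders for diverse acoustic scenarios. By learning how the acoustic features contained in speech differ across acoustic scenarios, the method adaptively designs a suitable acoustic encoder for end-to-end speech recognition. Introducing neural architecture search improves the effectiveness of encoder design and thus the performance of the downstream recognition task. Experiments on three commonly used Chinese and English datasets, Aishell-1, HKUST, and SWBD, show that the acoustic encoders obtained with the proposed scenario-adaptive design method achieve an average error-rate improvement of more than 5% over existing acoustic encoders. The proposed method is an effective way to analyze speech characteristics in a specific scenario in depth and to design a high-performance acoustic encoder targeted to it.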

5.
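This paper studies automatic computer scoring of English passage reading. Following the perspectives and criteria of human scoring, it analyzes the speech with speech recognition technology and extracts a set of evaluation features, including reading completeness, pronunciation accuracy, and fluency features, and then maps these features to a quality score by SVM regression. In an automatic test of the English proficiency of 4000 secondary school students, the system was trained on the human scores of 3200 students; automatic machine scoring of the remaining 800 students achieved a score difference of 1.18, whereas the average score difference between expert scores and reference scores was 1.31. The experiments indicate that the technique has reached a practical level.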

6.
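This journal is edited by the editorial office of 《声学进展》 (Advances in Acoustics) at the Institute of Acoustics, Chinese Academy of Sciences, and published by Fujian Science and Technology Publishing House. It mainly reports new results, technologies, applications, processes, and materials in acoustics from China and other countries; reviews trends and the latest progress in acoustics research; and reflects applications of acoustic technology in industry, agriculture, the military industry, education, science and culture, transportation, medicine and health, ocean development, environmental protection, and other fields. ...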

7.
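Optimization of acoustic modeling methods based on the connectionist temporal classification criterion (cited by: 2 in total; self-citations: 1; citations by others: 1)
This work studies and optimizes end-to-end acoustic modeling based on the connectionist temporal classification (CTC) criterion. It analyzes how different acoustic features, modeling units, and neural network structures affect the performance of CTC acoustic models; proposes a modeling-unit-dependent non-shared blank method to address the modeling deficiency caused by sharing the blank symbol in CTC models; and introduces a model initialization method that incorporates association information among modeling units to further improve performance. Experiments on the standard 300-hour English dataset Switchboard show that, by combining non-shared blank, time-delay neural networks, and the initialization method that incorporates modeling-unit association information, the CTC acoustic model achieves an absolute word error rate reduction of 1.1% relative to the baseline system, together with a 3.3-fold increase in training speed. These results show that the proposed optimizations for end-to-end acoustic modeling are effective.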
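
As background for the shared-blank issue mentioned in this abstract, the following minimal sketch shows the conventional CTC many-to-one mapping, in which one blank symbol shared by all units separates repeated labels and is then dropped. It illustrates only that standard rule, not the non-shared blank method the paper proposes; the function name `ctc_collapse` and the blank token `"_"` are illustrative assumptions.

```python
def ctc_collapse(path, blank="_"):
    """Map a frame-level CTC path to a label sequence.

    Standard CTC rule: merge consecutive repeated symbols,
    then delete every blank symbol.
    """
    labels = []
    previous = None
    for symbol in path:
        # Keep a symbol only if it differs from the previous frame
        # and is not the (shared) blank.
        if symbol != previous and symbol != blank:
            labels.append(symbol)
        previous = symbol
    return labels


# The blank between the two "ll" runs is what preserves the double letter.
print(ctc_collapse(list("__hh_e_ll_ll_oo")))  # ['h', 'e', 'l', 'l', 'o']
```

Because every unit shares the same blank in this rule, frames labelled blank carry no information about which unit they sit next to, which is the kind of limitation a unit-dependent blank is meant to address.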

8.
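This study performed an acoustic analysis of stop consonants produced by adults who stutter during fluent reading aloud. Voice onset time (VOT) was measured and spectral moments at the moment of the stop burst were computed, and people who stutter, before and after speech therapy, were compared with people who do not stutter. Multi-factor analysis of variance showed that the VOT of people who stutter was slightly longer than that of non-stutterers, but the difference did not reach statistical significance, and VOT was strongly affected by place of articulation and by the syllable final (韵母). Before therapy, the spectral mean of the stop burst differed significantly between people who stutter and non-stutterers. The lower spectral mean in people who stutter may arise because the place at which the tongue forms the closure with the alveolar ridge or the soft or hard palate lies further back in the vocal tract. The syllable final also had less influence on the burst spectrum in people who stutter, which indicates relatively weaker coarticulation. After speech therapy, the VOT and burst spectrum of people who stutter tended to approach those of non-stutterers.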
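
The spectral moments referred to in this abstract can be illustrated with a short sketch. The abstract does not state the exact definition used, so this assumes the common formulation that treats the normalised magnitude spectrum as a distribution over frequency and reports excess kurtosis; `spectral_moments` and its arguments are illustrative names, and the sample values are made up.

```python
import math


def spectral_moments(freqs, magnitudes):
    """Return (mean, standard deviation, skewness, excess kurtosis) of a spectrum.

    freqs:      bin centre frequencies in Hz
    magnitudes: non-negative spectral magnitudes, one per bin
    """
    total = sum(magnitudes)
    if total <= 0:
        raise ValueError("spectrum has no energy")
    weights = [m / total for m in magnitudes]  # normalise to sum to 1
    mean = sum(f * w for f, w in zip(freqs, weights))
    var = sum((f - mean) ** 2 * w for f, w in zip(freqs, weights))
    sd = math.sqrt(var)
    if sd == 0:
        # All energy in one bin: higher moments are undefined, report zeros.
        return mean, 0.0, 0.0, 0.0
    skew = sum((f - mean) ** 3 * w for f, w in zip(freqs, weights)) / sd ** 3
    kurt = sum((f - mean) ** 4 * w for f, w in zip(freqs, weights)) / sd ** 4 - 3.0
    return mean, sd, skew, kurt


# Hypothetical three-bin burst spectra: shifting energy towards low
# frequencies lowers the spectral mean (first moment).
front = spectral_moments([1000.0, 2000.0, 3000.0], [1.0, 2.0, 1.0])
back = spectral_moments([1000.0, 2000.0, 3000.0], [3.0, 1.0, 0.0])
print(front[0], back[0])  # 2000.0 1250.0
```

Under these assumptions, a lower first moment corresponds to burst energy concentrated at lower frequencies, which is consistent with the abstract's interpretation of a more posterior closure.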

9.
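黄泰翼, 《应用声学》 (Journal of Applied Acoustics), 1993, 12(1): 46-46
The Second National Conference on Man-Machine Speech Communication, jointly sponsored by five academic societies including the Chinese Association of Automation and the Acoustical Society of China, was held in Guilin on 18-20 September 1992. Nearly one hundred scholars, experts, and young scientific and technical staff attended. Ninety-five papers were presented at the conference, covering auditory models and feature extraction, speech recognition methods and systems, speaker-independent speech recognition, continuous speech recognition and language models, speaker recognition, neural networks ...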

10.
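Language is humanity's most important tool for communication. As human society advances, communicative activity becomes more frequent and more important, so the functions and role of language as a communication tool keep expanding, as shown by the continual development and improvement of modern means of language communication. Communication is chiefly the exchange of information. Language, spoken and written, is the most direct, convenient, and effective source of information, because it is a tool not only for communication but also for thought. Modern language communication technology has not only extended the working range of the human mouth and ears but also broadened their functions. Language was originally a tool for communication between people. With the advent of the electronic computer and the development of computing technology, the human brain and hands have been further extended, and a new mode of communication, human-machine dialogue, has appeared. The fifth-generation computer is characterized by natural-language input and output. Today language communication has not only gone beyond the Earth on which humans live and reached into outer space in terms of distance, but has also extended in function to "interaction" between people and machines. Language communication can be regarded as being at a new stage of development, which makes it necessary to examine its history and look ahead to its future prospects.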

11.
This paper examines the impact of room acoustic conditions on the speech intelligibility of four languages (English, Polish, Arabic and Mandarin). Listening test scores (diagnostic rhyme tests, phonemically balanced word tests and phonemically balanced sentence tests) of the four languages were compared under four room acoustic conditions defined by their speech transmission index (STI = 0.2, 0.4, 0.6 and 0.8). The results obtained indicated that there was a statistically significant difference between the word intelligibility scores of languages under all room acoustic conditions, apart from the STI = 0.8 condition. English was the most intelligible language under all conditions, and differences with other languages were larger when conditions were poor (maximum difference of 29% at STI = 0.2, 33% at STI = 0.4 and 14% at STI = 0.6). Results also showed that Arabic and Polish were particularly sensitive to background noise, and that Mandarin was significantly more intelligible than those languages at STI = 0.4. Consonant-to-vowel ratios and languages’ distinctive features and acoustical properties explained some of the scores obtained. Sentence intelligibility scores confirmed variations between languages, but these variations were statistically significant only at the STI = 0.4 condition (sentence tests being less sensitive to very good and very poor room acoustic conditions). Overall, the results indicate that large variations between the speech intelligibility of different languages can occur, especially for spaces that are expected to be challenging in terms of room acoustic conditions. Recommendations solely based on room acoustic parameters (e.g. STI) might then prove to be insufficient for designing a multilingual environment.

12.
The purpose of this study was to determine whether individuals show differences in speech and voice during reading of the same news before and after attending a radio announcing course. Twenty-five students of a Radio Announcing Course in Sao Paulo city, 17 men and 8 women, aged 19 to 55 years, participated in this study. The readings were recorded in a professional audio studio, and the speech samples were submitted to perceptual and acoustic analysis. For the perceptual analysis, the samples were randomly presented in pairs and five trained speech pathologists identified each recording as pre- or posttraining, and also justified their choices by indicating which parameters best supported their judgment: type of voice, articulation and pronunciation, loudness, pitch, resonance, speech rate, respiratory coordination, and use of emphasis. The acoustic parameters analyzed were mean, minimum, and maximum fundamental frequency, frequency range, text duration, and pause duration. The perceptual analysis showed that the posttraining speech samples were considered the best productions in 80% of the evaluations. Emphasis characterized the readings (70.4%), followed by type of voice (44.8%) and pitch (40.8%). Acoustic analysis showed higher mean fundamental frequency and an increase in frequency range posttraining. These results indicated richer modulation in the posttraining readings. There are differences in the readings of the same news pre- and posttraining in a radio announcing course, and the posttraining reading was considered the best production, indicating the positive effect of the training.

13.
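张家騄, 《应用声学》 (Journal of Applied Acoustics), 1998, 17(2): 44-48
Focusing on the Fifth European Conference on Speech Communication and Technology (Eurospeech'97) and its satellite meetings, this paper gives an overview of international academic conferences in speech science and technology and of the latest developments in the field.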

14.
The reliability of algorithms for room acoustic simulations has often been confirmed on the basis of the verification of predicted room acoustical parameters. This paper presents a complementary perceptual validation procedure consisting of two experiments, dealing respectively with speech intelligibility and with sound source front–back localisation. The evaluated simulation algorithm, implemented in the software ODEON®, is a hybrid method that is based on an image source algorithm for the prediction of early sound reflections and on ray-tracing for the later part, using a stochastic scattering process with secondary sources. The binaural room impulse response (BRIR) is calculated from a simulated room impulse response where information about the arrival time, intensity and spatial direction of each sound reflection is collected and convolved with a measured Head Related Transfer Function (HRTF). The listening stimuli for the speech intelligibility and localisation tests are auralised convolutions of anechoic sound samples with measured and simulated BRIRs. Perception tests were performed with human subjects in two acoustical environments, i.e. an anechoic and a reverberant room, by presenting the stimuli to subjects in a natural way, and via headphones by using two non-individualized HRTFs (artificial head and hearing aids placed on the ears of the artificial head) of both a simulated and a real room. Very good correspondence is found between the results obtained with simulated and measured BRIRs, both for speech intelligibility in the presence of noise and for sound source localisation tests. In the anechoic room an increase in speech intelligibility is observed when noise and signal are presented from sources located at different angles. This improvement is not so evident in the reverberant room, with the sound sources at 1-m distance from the listener. Interestingly, the performance of people for front–back localisation is better in the reverberant room than in the anechoic room. The correlation between people’s ability for sound source localisation on one hand, and their ability for recognition of binaurally received speech in reverberation on the other hand, is found to be weak.
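
The auralisation step described in this abstract, convolving an anechoic sample with a binaural room impulse response, can be sketched in a few lines. This is a minimal direct-form illustration under stated assumptions, not the ODEON implementation: the names `convolve` and `auralise` are hypothetical, signals are plain lists of samples at a common sample rate, and a practical system would normally use FFT-based convolution for speed.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample sequences."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            # Each input sample excites a scaled, delayed copy of the response.
            out[i + j] += s * h
    return out


def auralise(anechoic, brir_left, brir_right):
    """Return (left, right) ear signals for one source position."""
    return convolve(anechoic, brir_left), convolve(anechoic, brir_right)


# Toy example: a direct sound followed by one weaker reflection.
print(convolve([1.0, 2.0, 3.0], [1.0, 0.5]))  # [1.0, 2.5, 4.0, 1.5]
```

The same routine applies whether the BRIR is measured or simulated, which is what allows the two to be compared with identical anechoic material.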

15.
Performance-oriented children who encounter voice problems benefit from a team approach to intervention. Developmental and psychosocial issues in addition to the acquisition of information and vocal skills must be addressed. This article presents information from a speech-language pathologist's perspective and includes some examples of clinical approaches and strategies. The importance of building motivation to change vocal habits and the need for the child to develop insight and self-evaluation strategies are emphasized.

16.
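张志浩, 王坤侠, 《应用声学》 (Journal of Applied Acoustics), 2022, 41(5): 843-850
Speech emotion recognition plays an important role in human-computer interaction and affective computing, and research methods of many kinds keep emerging. Recent studies have applied convolutional neural networks (CNNs) and long short-term memory (LSTM) networks to extract spatial and temporal features from log-Mel spectrograms, with some success. However, both CNNs and LSTM networks produce redundant features during extraction, which degrades speech emotion recognition. To address this problem, the paper proposes a convolutional-recurrent neural network model based on a spatio-temporal attention mechanism. It takes the log-Mel spectrogram and its first- and second-order differences as input features, and adds spatial and temporal attention when the CNN extracts spatial features and the LSTM network extracts temporal features, so that the networks can better extract the spatial and temporal features in the log-Mel spectrogram that effectively represent emotion. On the Emo-DB and IEMOCAP speech datasets, the model reaches weighted accuracies of 86.8% and 69.4% and unweighted accuracies of 84.7% and 65.5%, respectively, outperforming most current state-of-the-art methods.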
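
The input described in this abstract, a log-Mel spectrogram stacked with its first- and second-order differences, can be sketched as follows. The abstract does not say how the differences are computed, so this assumes the widely used regression-style delta with edge frames repeated; `delta`, `stack_with_deltas`, and the `width` parameter are illustrative names and choices, and the log-Mel frames themselves are taken as given.

```python
def delta(frames, width=2):
    """First-order difference (delta) features via the regression formula.

    frames: list of frames, each a list of floats (e.g. log-Mel energies)
    width:  number of neighbouring frames used on each side (assumed value)
    """
    if not frames:
        return []
    denom = 2 * sum(n * n for n in range(1, width + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                ahead = frames[min(t + n, last)][k]  # repeat last frame at the end
                behind = frames[max(t - n, 0)][k]    # repeat first frame at the start
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_with_deltas(frames, width=2):
    """Concatenate static, delta, and delta-delta features for each frame."""
    d1 = delta(frames, width)
    d2 = delta(d1, width)  # second-order difference is the delta of the delta
    return [s + a + b for s, a, b in zip(frames, d1, d2)]
```

With 40 Mel bands, for example, each stacked frame would have 120 values; models of this kind often arrange the three parts as separate input channels instead of concatenating them, which is a design choice the abstract leaves open.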

17.
The generation of magnetic microbubbles and their basic magnetic and acoustic mechanisms are reviewed. Ultrasound (US) and magnetic resonance (MR) dual imaging, controlled therapeutic delivery, and theranostic multifunctions are all introduced based on recent research results. Some ongoing research is also discussed.

18.