Similar literature
1.
张家騄 《应用声学》1998,17(2):44-48
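This paper focuses on the Fifth European Conference on Speech Communication and Technology (Eurospeech '97) and its satellite meetings, and uses them to survey international academic conferences in speech science and technology and the latest developments in the field.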

2.
颜永红 《应用声学》2009,28(2):81-89
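This paper reviews recent progress in speech acoustics research. It first introduces recent developments in human speech production and perception and in acoustic analysis, then focuses on the latest research and results in computer processing of human speech, including speech recognition and synthesis, pronunciation assessment, and singing evaluation. Related applications of these results are also mentioned. The paper closes with a summary and outlook.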

3.
It has been suggested that pauses between words could act as indices of processes such as selection, retrieval or planning that are required before an utterance is articulated. For normal meaningful phrase utterances, there is hardly any information regarding the relationship between articulation and pause duration and their subsequent relation to the final phrase duration. Such associations could provide insights into the mechanisms underlying the planning and execution of a vocal utterance. To execute a fluent vocal utterance, children might adopt different strategies in development. We investigate this hypothesis by examining the roles of articulation time and pause duration in meaningful phrase utterances in 46 children between the ages of 4 and 8 years, learning English as a second language. Our results indicate a significant reduction in phrase, word and interword pause duration with increasing age. A comparison of pause, word and phrase duration for individual subjects belonging to different age groups indicates a changing relationship between pause and word duration for the production of fluent speech. For the youngest children, a strong correlation between pause and word duration indicates local planning at word level for speech production and thus greater dependence of pause on immediate word utterance. In contrast, for the oldest children we find a significant drop in correlation between word and pause duration, indicating the emergence of articulation and pause planning as two independent processes directed at producing a fluent utterance. Strong correlations between other temporal parameters indicate a more holistic approach being adopted by the older children for language production.
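The abstract above rests on correlations between pause and word durations. As a minimal sketch of that kind of measurement, assuming a plain Pearson coefficient (the abstract does not say which statistic the authors used), the function below computes the correlation between two duration series. The name `pearson_r` and the sample values are hypothetical and are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up durations in seconds, purely illustrative (not from the study).
word_durations = [0.40, 0.55, 0.62, 0.48, 0.70]
pause_durations = [0.20, 0.31, 0.35, 0.26, 0.41]
print(f"r = {pearson_r(word_durations, pause_durations):.3f}")
```

Computed per child and then compared across age groups, a coefficient like this is one way the reported drop in word-pause correlation could be observed.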

4.
A study of isolated-word recognition for Chinese whispered speech (cited 6 times: 0 self-citations, 6 by others)
杨莉莉  林玮  徐柏龄 《应用声学》2006,25(3):187-192
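Whispered speech recognition has broad application prospects and is an entirely new topic. However, the characteristics of whispered speech, such as its low sound level and lack of a fundamental frequency, make it difficult to study. Based on the production model of whispered speech signals and on their acoustic properties, this paper builds an isolated-word recognition system for Chinese whispered speech. Because whispered speech has a low signal-to-noise ratio, speech enhancement must be applied to it; tone information is also used in the recognition system to improve performance. Experimental results show that MFCC combined with the amplitude envelope can serve as the feature parameters for automatic recognition of Chinese whispered speech, and that recognition with an HMM model on a small vocabulary reaches a recognition rate of 90.4%.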

5.
I. Introduction. Many systems and products for speech recognition have appeared recently, but almost all of them work only in quiet environments; their performance degrades, or they fail entirely, in very noisy environments such as cockpits, vehicles, and workshops. Noise robustness has therefore become one of the main obstacles to real applications of automatic speech recognizers, and it attracts the attention of researchers in speech technology. Since 1978, substantial efforts have been devoted to testing and evaluating the speech recognizers used in fight…

6.
The acoustical characteristics of 14 university classrooms at the University of British Columbia were measured before and after renovation; seven of these are discussed in detail here. From these measurements, and theoretical considerations, values of quantities used to assess each classroom configuration were predicted, and used to evaluate renovation quality. Information on each renovation was determined with the help of the university campus-planning office and/or the project acoustical consultant. These were related to the evaluation results in order to determine the relationship between design and acoustical quality. The criteria focused on the quality of verbal communication in the classrooms. Room-average Speech Intelligibility (SI) and its physical correlate, Speech Transmission Index (STI), were used to quantify verbal-communication quality. A simplified STI-calculation procedure was applied. The results indicate that some renovations were beneficial, others were not. Verbal-communication quality varied from ‘poor’ to ‘good’. The effect of a renovation depends on a complex interplay between changes in the reverberation and changes in the signal-to-noise level difference, as affected by sound absorption and the source outputs. Renovations which reduce noise are beneficial unless signal-to-noise level differences remain optimal. Renovations often put too much emphasis on adding sound absorption to control reverberation, at the expense of lower speech levels, particularly at the backs of classrooms. The absorption and noise contributed by room occupants have apparently often been neglected.
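The interplay between reverberation and signal-to-noise level difference described above can be sketched numerically. The code below is not the paper's simplified procedure, which the abstract does not specify. It is a single-band approximation based on the standard modulation transfer function for an exponentially decaying field with steady noise, m(F) = [1 + (2πF·T/13.8)²]^(-1/2) · [1 + 10^(-SNR/10)]^(-1), evaluated at 14 one-third-octave modulation frequencies. The function names are hypothetical, and the octave-band weighting of the full STI is deliberately left out.

```python
import math

# One-third-octave modulation frequencies (Hz) conventionally used for STI.
MODULATION_FREQS_HZ = [0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
                       3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5]


def modulation_transfer(f_mod, rt60, snr_db):
    """Modulation reduction from reverberation (rt60, s) and noise (snr_db, dB)."""
    reverb = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * f_mod * rt60 / 13.8) ** 2)
    noise = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    return reverb * noise


def single_band_sti(rt60, snr_db):
    """Average transmission index over the modulation frequencies (one band only)."""
    indices = []
    for f_mod in MODULATION_FREQS_HZ:
        m = modulation_transfer(f_mod, rt60, snr_db)
        # Keep m strictly inside (0, 1) so the logarithm below is defined.
        m = min(max(m, 1e-12), 1.0 - 1e-12)
        apparent_snr = 10.0 * math.log10(m / (1.0 - m))
        # Clip the apparent signal-to-noise ratio to +/-15 dB, then map to 0..1.
        apparent_snr = max(-15.0, min(15.0, apparent_snr))
        indices.append((apparent_snr + 15.0) / 30.0)
    return sum(indices) / len(indices)
```

Calling `single_band_sti` with a shorter `rt60` but a lower `snr_db` makes the trade-off concrete: absorption that shortens reverberation can still lower the index if it also reduces the speech level at the back of the room.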

7.
The purpose of this study was to determine whether individuals show differences in speech and voice during reading of the same news before and after attending a radio announcing course. Twenty-five students of a Radio Announcing Course in Sao Paulo city, 17 men and 8 women, aged 19 to 55 years, participated in this study. The readings were recorded in a professional audio studio, and the speech samples were submitted to perceptual and acoustic analysis. For the perceptual analysis, the samples were randomly presented in pairs and five trained speech pathologists identified each recording as pre- or posttraining, and also justified their choices by indicating which parameters best supported their judgment: type of voice, articulation and pronunciation, loudness, pitch, resonance, speech rate, respiratory coordination, and use of emphasis. The acoustic parameters analyzed were mean, minimum, and maximum fundamental frequency, frequency range, text duration, and pause duration. The perceptual analysis showed that the posttraining speech samples were considered the best productions in 80% of the evaluations. Emphasis characterized the readings (70.4%), followed by type of voice (44.8%) and pitch (40.8%). Acoustic analysis showed higher mean fundamental frequency and increase of frequency range posttraining. These results indicated richer modulation in the posttraining readings. There are differences in the readings of the same news pre- and posttraining in a radio announcing course, and the posttraining reading was considered the best production, indicating the positive effect of the training.

8.
颜永红 《应用声学》2012,31(1):35-41
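This paper reviews recent progress in research on speech acoustics and content understanding. It first introduces progress in human speech production, perception, and acoustic analysis. It then presents the latest results in using computers to extract the various kinds of information carried by speech (including speech, speaker, and language recognition) and in content analysis and understanding (including document content analysis and understanding, and dialogue). It closes with a summary and outlook for research on speech acoustics and content understanding.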

9.
A number of objective evaluation methods are currently used to quantify the speech intelligibility in a built environment, including the speech transmission index (STI), rapid speech transmission index (RASTI), articulation index (AI), and the percent articulation loss of consonants (%ALCons). Certain software programs can quickly evaluate STI, RASTI, and %ALCons from a measured room impulse response. In this project, two impulse-response-based software packages (WinMLS and SIA-Smaart Acoustic Tools) were evaluated for their ability to determine intelligibility accurately. In four different spaces with background noise levels less than NC 45, speech intelligibility was measured via three methods: (1) with WinMLS 2000; (2) with SIA-Smaart Acoustic Tools (v4.0.2); and (3) from listening tests with humans. The study found that WinMLS measurements of speech intelligibility based on STI, RASTI, and %ALCons corresponded well with performance on the listening tests. SIA-Smaart results were correlated to human responses, but tended to under-predict intelligibility based on STI and RASTI, and over-predict intelligibility based on %ALCons.

10.
朱应俊  周文君  朱川  马建敏 《应用声学》2023,42(5):1090-1098
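To help machines better understand human emotion and improve human-computer interaction, speech features and classification networks can be fused to improve emotion recognition performance. Taking a network-fusion perspective, this paper uses two front-end networks: a two-dimensional convolutional neural network based on Mel-frequency cepstral coefficients and inverted Mel-frequency cepstral coefficients, and a long short-term memory network based on scattering convolution network coefficients. Intermediate layers of the front-end networks are extracted as utterance-level feature representations; a squeeze-and-excitation (SE) channel attention mechanism adjusts the weights of these intermediate layers and fuses them; and a deep neural network back-end classifier then outputs the emotion classification. In comparative experiments with five-fold cross-validation on a Chinese emotion dataset, the results show that network fusion based on SE channel attention effectively exploits the strengths of the different front-end networks in speech emotion recognition and improves recognition accuracy.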
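The squeeze-and-excitation step named in the abstract above can be illustrated in isolation. The sketch below is a dependency-free toy version of generic SE channel gating: squeeze by global averaging, excite through two small fully connected layers, then rescale each channel. It is not the authors' network. The function name `se_reweight`, the weight matrices `w1` and `w2`, and the absence of bias terms are all assumptions made for illustration.

```python
import math


def se_reweight(features, w1, w2):
    """Apply squeeze-and-excitation gating to a list of channels.

    features: one list of floats per channel.
    w1: reduction weights, shape (reduced, channels).
    w2: expansion weights, shape (channels, reduced).
    Returns (rescaled_features, gates).
    """
    # Squeeze: one global average per channel.
    squeezed = [sum(channel) / len(channel) for channel in features]
    # Excitation, part 1: fully connected layer followed by ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, part 2: fully connected layer followed by a sigmoid gate.
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every value in a channel by that channel's gate.
    rescaled = [[gate * x for x in channel]
                for gate, channel in zip(gates, features)]
    return rescaled, gates
```

In a fusion setting like the one the abstract describes, the channels would be intermediate-layer features drawn from both front-end networks, so the learned gates decide how much each front end contributes before classification.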

11.
This paper examines the impact of room acoustic conditions on the speech intelligibility of four languages (English, Polish, Arabic and Mandarin). Listening test scores (diagnostic rhyme tests, phonemically balanced word tests and phonemically balanced sentence tests) of the four languages were compared under four room acoustic conditions defined by their speech transmission index (STI = 0.2, 0.4, 0.6 and 0.8). The results obtained indicated that there was a statistically significant difference between the word intelligibility scores of languages under all room acoustic conditions, apart from the STI = 0.8 condition. English was the most intelligible language under all conditions, and differences with other languages were larger when conditions were poor (maximum difference of 29% at STI = 0.2, 33% at STI = 0.4 and 14% at STI = 0.6). Results also showed that Arabic and Polish were particularly sensitive to background noise, and that Mandarin was significantly more intelligible than those languages at STI = 0.4. Consonant-to-vowel ratios and languages’ distinctive features and acoustical properties explained some of the scores obtained. Sentence intelligibility scores confirmed variations between languages, but these variations were statistically significant only at the STI = 0.4 condition (sentence tests being less sensitive to very good and very poor room acoustic conditions). Overall, the results indicate that large variations between the speech intelligibility of different languages can occur, especially for spaces that are expected to be challenging in terms of room acoustic conditions. Recommendations solely based on room acoustic parameters (e.g. STI) might then prove to be insufficient for designing a multilingual environment.

12.
Over the last few decades, researchers have been investigating the mechanisms involved in speech production. Image analysis can be a valuable aid in the understanding of the morphology of the vocal tract. The application of magnetic resonance imaging to study these mechanisms has been proven to be reliable and safe. We have applied deformable models in magnetic resonance images to conduct an automatic study of the vocal tract; mainly, to evaluate the shape of the vocal tract in the articulation of some European Portuguese sounds, and then to successfully automatically segment the vocal tract's shape in new images. Thus, a point distribution model has been built from a set of magnetic resonance images acquired during artificially sustained articulations of 21 sounds, which successfully extracts the main characteristics of the movements of the vocal tract. The combination of that statistical shape model with the gray levels of its points is subsequently used to build active shape models and active appearance models. Those models have then been used to segment the modeled vocal tract into new images in a successful and automatic manner. The computational models have thus been revealed to be useful for the specific area of speech simulation and rehabilitation, namely to simulate and recognize the compensatory movements of the articulators during speech production.

13.
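This paper proposes a time-frequency-energy representation of speech signals and gives an algorithm for it that can be used for isolated-word speech recognition. The representation has two features: nonlinear time normalization based on the short-time energy gradient, which preserves the transitional characteristics of the speech signal in the frequency domain while discarding its steady-state characteristics; and a low computational cost, which suits real-time applications.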

14.
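This paper proposes a time-frequency-energy representation of speech signals and gives an algorithm for it that can be used for isolated-word speech recognition. The representation has two features: nonlinear time normalization based on the short-time energy gradient, which preserves the transitional characteristics of the speech signal in the frequency domain while discarding its steady-state characteristics; and a low computational cost, which suits real-time applications.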
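The abstract above does not give the algorithm, so the following is only one plausible reading of "nonlinear time normalization based on the short-time energy gradient". It computes short-time energy per frame, accumulates the absolute frame-to-frame energy change, and picks a fixed number of frames at equal steps along that cumulative curve. Frames where energy changes quickly (transitions) are then sampled densely, and steady stretches are mostly skipped. The function names and the fallback to uniform spacing are assumptions.

```python
def short_time_energy(samples, frame_len, hop):
    """Sum of squared samples for each full frame of length frame_len."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame))
    return energies


def normalize_time(energies, n_out):
    """Pick n_out frame indices spaced evenly along cumulative |energy gradient|."""
    if n_out < 2 or len(energies) < 2:
        raise ValueError("need at least two frames and two output points")
    # Cumulative absolute change in energy from the first frame onward.
    cumulative = [0.0]
    for prev, cur in zip(energies, energies[1:]):
        cumulative.append(cumulative[-1] + abs(cur - prev))
    total = cumulative[-1]
    last = len(energies) - 1
    if total == 0:
        # Perfectly flat energy: fall back to uniform spacing (an assumption).
        return [round(k * last / (n_out - 1)) for k in range(n_out)]
    picks = []
    idx = 0
    for k in range(n_out):
        target = total * k / (n_out - 1)
        # Advance to the first frame whose cumulative change reaches the target.
        while idx < last and cumulative[idx] < target:
            idx += 1
        picks.append(idx)
    return picks
```

Because the output always has `n_out` frames regardless of utterance length, words spoken at different rates map to fixed-size patterns, which is convenient for isolated-word template matching and keeps the computation small.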

15.
Installing open-ceiling meeting rooms inside a large open-plan office provides a solution to increase speech privacy and to reduce speech disturbance in the office. The open-ceiling meeting rooms have the advantages of low-cost construction and flexibility, but have lower speech privacy than enclosed rooms because of the open ceiling. Existing research shows that many factors should be taken into account to achieve good speech privacy in open-plan offices and that improving only one of these factors may result in little improvement, so it is important to distinguish the contributions of different acoustic transmission paths of open-ceiling meeting rooms in open-plan offices. This paper proposes an impulse response separation method to quantify the contributions of various acoustic paths of open-ceiling rooms to speech privacy in open-plan offices. The method is verified with simulations based on the Odeon software and with experiments carried out in 3 different types of rooms. Finally, the proposed method is applied to the Fabpod, a semi-enclosed meeting room located in a large indoor office at the Design Research Institute of the RMIT University, to obtain the contributions of different acoustic transmission paths to its speech privacy. The method proposed in this paper and the knowledge obtained are useful for architects to improve the acoustic performance of the next-generation Fabpods which are now under design at RMIT University.

16.
Prader-Willi syndrome (PWS) is a multisystem disorder caused by DNA abnormalities involving chromosome 15. Major characteristics are infant hypotonia, hypogonadism, mental retardation, a short stature, atypical facial appearance, and the onset of obesity due to insatiable hunger in early childhood. Also, speech and language abnormalities have been reported including voice disorders. These have seldom been studied in detail, however. This paper reports the results of an acoustic and aerodynamic investigation of the voice in 22 individuals with PWS. Two age groups were distinguished, a group of children [chronological age (CA) 6 years, 7 months through 11 years, 7 months; total intelligence quotient (TIQ) 40-88] and a group of adolescents and adults (CA 17 years, 1 month through 29 years, 5 months; TIQ 41-94). Both aerodynamic and acoustic parameters were obtained and compared with normative data from the Belgian Study Group on Voice Disorders. It was found that voice difficulties do commonly occur in individuals with PWS including impairment of frequency levels, voice quality, and poor aerodynamic capabilities.

17.
This paper examines how breathing differs in the upright and supine body positions. Passive and active forces and associated chest wall motions are described for resting tidal breathing and speech breathing performed in the two positions. Clinical implications are offered regarding evaluation and treatment of breathing behavior in clients with speech and voice disorders.

18.
S.K. Tang 《Applied Acoustics》2008,69(12):1318-1331
A survey of the speech-related acoustical parameters in Hong Kong classrooms having standardized architectural layouts is carried out in the present study. Results suggest that these acoustical parameters are highly correlated with each other even across different octave bands. It is also found that the relationships between parameters of different kinds do not depend on the frequency bands. Besides, the present results indicate that the sound pulse decay inside a not very reverberant classroom consists of an initial fast decay, leading to deviations of the field survey results from those predicted by the exponential decay under the uniform sound energy decay assumption. It is believed that the strong correlations between the various speech-related acoustical parameters and the regression information obtained in the present study can help the estimation of the speech quality of the classrooms in the design stage.

19.
Customarily, speaking and singing have tended to be regarded as two completely separate sets of behaviors in clinical and educational settings. The treatment of speech and voice disorders has focused on the client's speaking ability, as this is perceived to be the main vocal behavior of concern. However, according to a broader voice-science perspective, given that the same vocal structure is used for speaking and singing, it may be possible to include singing in speech and voice therapy. In this article, a theoretical framework is proposed that indicates possible benefits from the inclusion of singing in such therapeutic settings. Based on a literature review, it is demonstrated theoretically why singing activities can potentially be exploited in the treatment of prepubertal children suffering from speech and voice disorders. Based on this theoretical framework, implications for further empirical research and practice are suggested.

20.
To evaluate the functional difference of the pars recta and pars oblique during speech production, the electromyographic activities of these muscles were measured in thyroidectomized patients. The hooked wire electrodes were inserted into the normal side of the bellies of the pars recta and pars oblique bundles. Two kinds of sentences were used to obtain pitch changes, a simple interrogative sentence and a complex sentence with stress contrasts. The pars recta and pars oblique were simultaneously activated for initial lengthening and tensing of vocal folds to produce speech. The pars oblique might be more active than the pars recta at the onset of speech, and the pars recta might be more active at the pitch elevation in the interrogative sentence and the stress contrast of the complex sentence. The maximum electromyographic activity range of the pars recta and pars oblique seemed to be nearly equal. These results demonstrated that the patterns of electrical activities of the two bellies are different during speech and the combined activities of the pars recta and pars oblique are important in the adjustment of the vocal fold length during speech.
