Similar articles
 20 similar articles found (search time: 125 ms)
1.
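This paper discusses the structural regularities of Chinese words and their effect on articulation scores. Experiments show that the articulation score depends not only on the physical properties of the speech signal received by the listener (external information) but also strongly on the structural regularities of the language itself (internal information). A new statistical relation is proposed that fits a large body of experimental data better.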

2.
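A new method is used to derive the articulation index for Standard Chinese (Putonghua). The paper gives the relationship between syllable articulation and the cutoff frequencies of ideal high-pass and low-pass filter systems, and the relationship between the articulation index and syllable articulation. The resulting articulation index was checked experimentally.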

3.
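张家■: research professor at the Institute of Acoustics, Chinese Academy of Sciences; executive council member of the Acoustical Society of China; chair of its Speech, Hearing and Music Acoustics branch; deputy editor-in-chief of 《应用声学》 (Applied Acoustics). Works mainly on speech science and speech technology. Designed the articulation test method for Standard Chinese and established the theoretical basis of Chinese intelligibility; derived the Chinese articulation index; established statistical relations among articulation test scores for different linguistic units, and showed that Chinese syllable structure helps improve intelligibility. Measured long-term average speech spectra in the far field and the near field at different speaking rates and sound levels. Quantitatively demonstrated the role of Chinese tones in improving intelligibility. Revealed the intrinsic pitch patterns of Chinese vowels and experimentally studied interactions in speech production. Has received a third-class National Natural Science Award, Chinese Academy of Sciences…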

4.
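The important role of Chinese tones in speech intelligibility   (total citations: 11; self-citations: 0; citations by others: 11)
Speech articulation tests were conducted under four distortion conditions (high-pass and low-pass filtering combined with nonlinear distortion). They show that tones are highly robust to interference under distortion: tone articulation is almost unaffected. Articulation tests were also run on speech synthesized with the following excitation sources: (1) white noise; (2) a triangular wave with fixed fundamental frequency; (3) a triangular wave whose fundamental frequency follows that of the vocal folds; (4) condition 1 plus condition 3. The results show that tone information greatly improves speech intelligibility, that tone information resides mainly in the variation of fundamental frequency over time, that intensity variation compensates for tone information, and that the presence or absence of voiceless consonants has some effect on tone articulation.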

5.
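Prosodic manifestations of syntactic boundaries   (total citations: 7; self-citations: 1; citations by others: 6)
This paper studies the systematic relationship between syntactic boundaries of different levels in read sentences, the prosodic parameters of nearby syllables, and the pause duration at the boundary. The results show that the time-domain and frequency-domain parameters of the pre-boundary syllable vary systematically with boundary level. In the time domain, the sum of the pre-boundary syllable duration and the pause grows almost linearly as the boundary level rises; the pre-boundary syllable duration changes in both directions with boundary level, reaching its maximum at phrase boundaries; the pause lengthens rapidly at major syntactic boundaries; and within the syllable, the consonant duration and the normalized position of the energy peak also vary systematically with boundary level. In the frequency domain, the mean fundamental frequency of the pre-boundary syllable falls as the boundary level rises, and the pitch range gradually narrows. These results provide an experimental basis for handling the relationship between syntactic structure and speech in continuous speech synthesis, recognition, and understanding systems.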

6.
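This paper discusses speech intelligibility in places where people gather, the cocktail party effect, and its influence on speech intelligibility. Based on measurements and articulation tests in conference halls, theaters, classrooms, airport departure halls, station waiting rooms, gymnasiums, skating rinks, and similar venues, it establishes a relationship between subjective evaluation and articulation score and gives thresholds of allowable noise level that do not give rise to the cocktail party effect in some of these venues. Finally, a computer model of the cocktail party effect is discussed.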

7.
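A real-time recognition system for all isolated Chinese syllables   (total citations: 1; self-citations: 0; citations by others: 1)
Building on a large number of speech experiments, this paper examines Chinese speech recognition methods in some depth and describes a speaker-dependent, real-time recognition system for all syllables of Standard Chinese, built on an IBM PC/AT equipped with an in-house TMS320C25-E high-speed signal processing board. The system uses a hierarchical recognition strategy tailored to the phonetic characteristics of Standard Chinese. Overall response time is under 0.2 s. Rigorous testing with 4 passes of 1240 full-syllable utterances shows an average tone recognition accuracy of about 99%, and syllable recognition accuracy within the top 1 to 5 candidates of 82%, 91%, 94%, 96%, and 97%, respectively. From these test results the paper also builds confusion matrices for initials and finals and a similarity cluster-analysis dendrogram based on Shepard's method. By comparison with articulation test results for Chinese speech synthesis and with cluster-analysis results on the perceptual structure of Chinese speech, it analyzes each part of the system in some depth and proposes corresponding improvements.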
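The "top 1 to 5 candidates" figures in the abstract above are cumulative top-k accuracies. As a minimal sketch of how such a figure is computed, assuming each test utterance yields a ranked candidate list: the function name `top_k_accuracy` and the toy syllables are hypothetical illustrations, not taken from the paper or its system.

```python
def top_k_accuracy(ranked_candidates, targets, k):
    """Fraction of utterances whose true label is among the first k candidates.

    ranked_candidates: list of candidate lists, best candidate first.
    targets: list of true labels, one per utterance.
    """
    if len(ranked_candidates) != len(targets):
        raise ValueError("need exactly one candidate list per target")
    if not targets:
        raise ValueError("need at least one utterance")
    hits = sum(
        1 for cands, target in zip(ranked_candidates, targets)
        if target in cands[:k]
    )
    return hits / len(targets)


# Hypothetical toy data (pinyin syllables with tone numbers), for illustration only.
ranked = [
    ["ma1", "na1", "ba1"],
    ["ni3", "li3", "mi3"],
    ["shi4", "si4", "zhi4"],
    ["bo1", "po1", "mo1"],
]
truth = ["ma1", "li3", "zhi4", "fo1"]

for k in (1, 2, 3):
    print(k, top_k_accuracy(ranked, truth, k))
```

Because the first k candidates always include the first k-1, the accuracy can only stay level or rise as k grows, which matches the increasing sequence reported in the abstract.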

8.
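Using an acoustic head simulator (dummy head), this study examined how the positions of speech and noise sources in the horizontal plane affect speech intelligibility measurement. It compared binaural STIPA measured with the dummy head against conventional STIPA measured without it, ran subjective evaluations of perceived Chinese speech intelligibility under identical conditions using both recorded playback and live listening, and analyzed the correlation between the subjective and objective results. The results show that source position significantly affects STIPA measured with the dummy head, as well as perceived intelligibility for both dummy-head recordings and live listening. STIPA measured without the dummy head is closer to the STIPA of the worse of the two ears measured with the dummy head. Monaural perceived intelligibility is higher for the ear on the same side as the speech source or on the opposite side from the noise source. Binaural perceived intelligibility is lowest when the speech and noise sources coincide, and separating the sources significantly improves it. Perceived intelligibility for dummy-head recordings and live listening is uncorrelated with STIPA measured without the dummy head and highly correlated with STIPA measured with it: monaural perceived intelligibility correlates highly with that ear's STIPA, and binaural perceived intelligibility correlates most strongly with the higher of the left-ear and right-ear STIPA values.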

10.
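When the syllable duration exceeds 60 ms, the recognition rate for the six monophthongs of Standard Chinese stays above 95%. When the duration is shorter than 60 ms, the vowel recognition rate falls gradually as the duration shortens. When the syllable is as short as 8 to 13 ms, the vowels can no longer be recognized. The distribution of recognition errors is closely related to the vowels' phonemic positions: vowels with neighboring positions are more easily confused with one another. The paper compares the different roles that signal duration plays in vowel recognition and in pitch discrimination, and discusses the reasons for the difference.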

11.
The contribution of the nasal murmur and vocalic formant transition to the perception of the [m]-[n] distinction by adult listeners was investigated for speakers of different ages in both consonant-vowel (CV) and vowel-consonant (VC) syllables. Three children in each of the speaker groups 3, 5, and 7 years old, and three adult females and three adult males produced CV and VC syllables consisting of either [m] or [n] and followed or preceded by [i ae u a], respectively. Two productions of each syllable were edited into seven murmur and transition segments. Across speaker groups, a segment including the last 25 ms of the murmur and the first 25 ms of the vowel yielded higher perceptual identification of place of articulation than any other segment edited from the CV syllable. In contrast, the corresponding vowel+murmur segment in the VC syllable position improved nasal identification relative to other segment types for only the adult talkers. Overall, the CV syllable was perceptually more distinctive than the VC syllable, but this distinctiveness interacted with speaker group and stimulus duration. As predicted by previous studies and the current results of perceptual testing, acoustic analyses of adult syllable productions showed systematic differences between labial and alveolar places of articulation, but these differences were only marginally observed in the youngest children's speech. Also predicted by the current perceptual results, these acoustic properties differentiating place of articulation of nasal consonants were reliably different for CV syllables compared to VC syllables. A series of comparisons of perceptual data across speaker groups, segment types, and syllable shape provided strong support, in adult speakers, for the "discontinuity hypothesis" [K. N. Stevens, in Phonetic Linguistics: Essays in Honor of Peter Ladefoged, edited by V. A. Fromkin (Academic, London, 1985), pp. 243-255], according to which spectral discontinuities at acoustic boundaries provide critical cues to the perception of place of articulation. In child speakers, the perceptual support for the "discontinuity hypothesis" was weaker and the results indicative of developmental changes in speech production.

12.
The purpose of this exploratory study was to examine the relationship between undergraduate vocal music majors' diction acquisition abilities for singing in a nonnative language (as rated both by themselves and by their studio voice teachers) and their scores on an objective test of phonemic and stress perception. Ten students with varying levels of university voice training served as participants. The results showed significant negative correlations between each of the teachers' four ratings and the students' scores on the phonemic awareness subtest. In addition, 20% of the students demonstrated evidence of underdeveloped phonemic awareness skills, as indicated by their below-average test performance. Considerable individual differences were also observed in the students' abilities to track phonemes within a sequence of phonemes, count and track syllables within a sequence of syllables, and track combinations of phoneme and syllable changes in sequence, as evidenced by subtest performance scores. These findings corroborate existing reports which indicate that approximately 30% of the population does not fully develop phonemic awareness skills in the absence of special training. The findings support the utility of this objective test of phonemic and stress perception as a means of identifying students who will have difficulty with diction acquisition, and point to possibilities for pretraining to improve their response to diction instruction.

13.
An acoustic analysis of whispered consonants in comparison to normally phonated consonants was conducted in the time and intensity domains. Consonant duration and average root mean square intensity were measured for six speakers in both articulation modes. Each of 25 Serbian consonants (C) was placed between two /a/ vowels, forming a syllable of the /aCa/ type. Such a syllable was placed in initial, medial, and final position in the carrier sentence. Results showed that whispered consonants are prolonged by about 10% on average (statistically significant, ANOVA test), and that unvoiced consonants are lengthened less (5.8%) than voiced ones (15.3%). Examination at the subphonemic level showed that there is no difference in voice-onset time and affrication duration between the whispered and phonated modes of articulation for unvoiced plosives and affricates, but the difference is significant for voiced ones. Analysis of consonant duration versus place of articulation showed that the palatal place is most sensitive in the process of whispering. In all experiments, the results are very consistent with respect to the subjects and test material (Pearson's correlation was between 0.6 and 0.9). In the intensity domain, all unvoiced consonants in the whispered mode of articulation have almost unchanged intensity in comparison to the phonated mode (the difference is at most 3.5 dB). In contrast, voiced consonants in the whispered mode were reduced in intensity by as much as 25 dB, as for nasals and semivowels. The average intensity of whispered consonants is lowered by 12 dB in comparison to phonated ones, and does not depend on syllabic position inside the sentences.

14.
According to recent theoretical accounts of place of articulation perception, global, invariant properties of the stop CV syllable onset spectrum serve as primary, innate cues to place of articulation, whereas contextually variable formant transitions constitute secondary, learned cues. By this view, one might expect that young infants would find the discrimination of place of articulation contrasts signaled by formant transition differences more difficult than those cued by gross spectral differences. Using an operant head-turning paradigm, we found that 6-month-old infants were able to discriminate two-formant stimuli contrasting in place of articulation as well as they did five-formant + burst stimuli. Apparently, neither the global properties of the onset spectrum nor simply the additional acoustic information contained in the five-formant + burst stimuli afford the infant any advantage in the discrimination task. Rather, formant transition information provides a sufficient basis for discriminating place of articulation differences.

15.
Natural speech consonant-vowel (CV) syllables ([f, s, θ, ʃ, v, z, ð] followed by [i, u, a]) were computer edited to include 20-70 ms of their frication noise in 10-ms steps as measured from their onset, as well as the entire frication noise. These stimuli, and the entire syllables, were presented to 12 subjects for consonant identification. Results show that the listener does not require the entire fricative-vowel syllable in order to correctly perceive a fricative. The required frication duration depends on the particular fricative, ranging from approximately 30 ms for [s, z] to 50 ms for [f, s, v], while [θ, ð] are identified with reasonable accuracy in only the full frication and syllable conditions. Analysis in terms of the linguistic features of voicing, place, and manner of articulation revealed that fricative identification in terms of place of articulation is much more affected by a decrease in frication duration than identification in terms of voicing and manner of articulation.

16.
17.
The complexities of how prosodic structure, both at the phrasal and syllable levels, shapes speech production have begun to be illuminated through studies of articulatory behavior. The present study contributes to an understanding of prosodic signatures on articulation by examining the joint effects of phrasal and syllable position on the production of consonants. Articulatory kinematic data were collected for five subjects using electromagnetic articulography (EMA) to record target consonants (labial, labiodental, and tongue tip), located in (1) either syllable final or initial position and (2) either at a phrase edge or phrase medially. Spatial and temporal characteristics of the consonantal constriction formation and release were determined based on kinematic landmarks in the articulator velocity profiles. The results indicate that syllable and phrasal position consistently affect the movement duration; however, effects on displacement were more variable. For most subjects, the boundary-adjacent portions of the movement (constriction release for a preboundary coda and constriction formation for a postboundary onset) are not differentially affected in terms of phrasal lengthening: both lengthen comparably.

18.
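Design of a comprehensive Chinese speech corpus   (total citations: 1; self-citations: 0; citations by others: 1)
Language is humanity's most important means of communication, and with the development of modern information technology it is also an effective means of communication between humans and machines. In recent years many countries have built national speech databases as a foundation for speech science research and speech technology development. The speech material in the comprehensive Chinese corpus includes all Chinese tonal syllables, digit strings, words, and prosodic-feature material, together with the syllable lists, word lists, and sentence lists used in articulation testing, and representative short passages. The corpus is designed to reflect the basic characteristics of Chinese in its linguistic, phonetic, and acoustic features. The first problem to solve is the selection of material: the usage frequency of each linguistic unit is taken into account, so that the corpus not only contains all high-frequency words but also covers a comprehensive range of phonetic phenomena. The database has an open, modular structure and comes with a flexible database management system.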

19.
A controversial issue in neurolinguistics is whether basic neural auditory representations found in many animals can account for human perception of speech. This question was addressed by examining how a population of neurons in the primary auditory cortex (A1) of the naive awake ferret encodes phonemes and whether this representation could account for the human ability to discriminate them. When neural responses were characterized and ordered by spectral tuning and dynamics, perceptually significant features, including formant patterns in vowels and place and manner of articulation in consonants, were readily visualized by activity in distinct neural subpopulations. Furthermore, these responses faithfully encoded the similarity between the acoustic features of these phonemes. A simple classifier trained on the neural representation was able to simulate human phoneme confusion when tested with novel exemplars. These results suggest that A1 responses are sufficiently rich to encode and discriminate phoneme classes and that humans and animals may build upon the same general acoustic representations to learn boundaries for categorical and robust sound classification.

20.
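李贤, 於俊, 汪增福. 《声学学报》 (Acta Acustica), 2014, 39(4): 509-516
This paper proposes a prosody conversion method for emotional voice conversion. The method has two parts, fundamental frequency (F0) conversion and duration conversion. The former parameterizes F0 with the discrete cosine transform (DCT); following the hierarchical structure of F0, it decomposes F0 into a phrase level and a syllable level and converts each level separately with a conversion method based on Gaussian mixture models (GMM). The latter converts durations with a method based on classification and regression trees (CART), using initials and finals as the basic units. A corpus containing three basic emotions is used for training and testing. Objective and subjective evaluation results show that the method converts emotional prosody effectively; the sad emotion reached close to 100% accuracy in the subjective experiment.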
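To make the DCT parameterization of F0 mentioned in the abstract above more concrete, here is a minimal sketch that represents a contour by its first few DCT coefficients and reconstructs a smoothed contour from them. It is not the authors' implementation: the orthonormal DCT-II convention, the function names, the number of retained coefficients, and the sample contour are all assumptions made for illustration.

```python
import math


def dct_ii(x):
    """Orthonormal DCT-II of a sequence of samples."""
    n = len(x)
    coeffs = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * s)
    return coeffs


def idct_ii(coeffs, n):
    """Rebuild n samples from the first len(coeffs) orthonormal DCT-II coefficients."""
    samples = []
    for t in range(n):
        s = coeffs[0] * math.sqrt(1.0 / n)
        for k in range(1, len(coeffs)):
            s += coeffs[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (t + 0.5) * k / n)
        samples.append(s)
    return samples


def parameterize_f0(f0, num_coeffs):
    """Keep only the lowest num_coeffs DCT coefficients of an F0 contour."""
    return dct_ii(f0)[:num_coeffs]


# Hypothetical syllable-level F0 contour in Hz, for illustration only.
f0 = [200.0, 210.0, 220.0, 215.0, 205.0, 190.0]
params = parameterize_f0(f0, 3)
smoothed = idct_ii(params, len(f0))
print(params)
print(smoothed)
```

With this convention the zeroth coefficient is proportional to the mean F0 of the segment, and higher coefficients capture progressively finer contour shape, so a short coefficient vector gives a fixed-length description of a variable-length contour. Applying the same idea separately at the phrase and syllable levels is one plausible reading of the two-level decomposition described in the abstract.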
