Similar literature
 20 similar documents found
1.
In this study, the effect of articulation rate and speaking style on the perceived speech rate is investigated. The articulation rate is measured both in terms of the intended phones, i.e., phones present in the assumed canonical form, and as the number of actual, realized phones per second. The combination of these measures reflects the deletion of phones, which is related to speaking style. The effect of the two rate measures on the perceived speech rate is compared in two listening experiments on the basis of a set of intonation phrases with carefully balanced intended and realized phone rates, selected from a German database of spontaneous speech. Because the balance between input-oriented (effort) and output-oriented (communicative) constraints may be different at fast versus slow speech rates, the effect of articulation rate is compared both for fast and for slow phrases from the database. The effect of the listeners' own speaking habits is also investigated to evaluate whether listeners' perception is based on a projection of their own behavior as speakers. It is shown that listener judgments reflect both the intended and realized phone rates, and that their judgments are independent of the constraint balance and their own speaking habits.
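
The two rate measures described in this abstract can be illustrated with a minimal sketch. The function name `phone_rates`, its parameters, and the deletion-ratio summary are assumptions made for illustration; the abstract does not specify how the study computed or combined the measures.

```python
def phone_rates(intended_phones, realized_phones, duration_s):
    """Return (intended rate, realized rate, deletion ratio) for one phrase.

    Hypothetical helper: counts are phones in the canonical form and phones
    actually realized; duration_s is the phrase duration in seconds.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    if intended_phones <= 0:
        raise ValueError("intended_phones must be positive")
    intended_rate = intended_phones / duration_s   # intended phones per second
    realized_rate = realized_phones / duration_s   # realized phones per second
    # Share of canonical phones that were not realized (assumed summary).
    deletion_ratio = (intended_phones - realized_phones) / intended_phones
    return intended_rate, realized_rate, deletion_ratio
```

For example, a 2-second phrase with 30 canonical phones of which 24 are realized has an intended rate of 15 phones/s and a realized rate of 12 phones/s.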

2.
Patterns of durational variation were examined by applying 15 previously published rhythm measures to a large corpus of speech from five languages. In order to achieve consistent segmentation across all languages, an automatic speech recognition system was developed to divide the waveforms into consonantal and vocalic regions. The resulting duration measurements rest strictly on acoustic criteria. Machine classification showed that rhythm measures could separate languages at rates above chance. Within-language variability in rhythm measures, however, was large and comparable to that between languages. Therefore, different languages could not be identified reliably from single paragraphs. In experiments separating pairs of languages, a rhythm measure that was relatively successful at separating one pair often performed very poorly on another pair: there was no broadly successful rhythm measure. Separation of all five languages at once required a combination of three rhythm measures. Many triplets were about equally effective, but the confusion patterns between languages varied with the choice of rhythm measures.

3.
Variability is perhaps the most notable characteristic of speech, and it is particularly noticeable in spontaneous conversational speech. The current research examines how speakers realize the American English stops /p, k, b, g/ and flaps (ɾ from /t, d/), in casual conversation and in careful speech. Target consonants appear after stressed syllables (e.g., "lobby") or between unstressed syllables (e.g., "humanity"), in one of six segmental/word-boundary environments. This work documents the degree and types of variability listeners encounter and must parse. Findings show greater reduction in connected and spontaneous speech, greater reduction in high-frequency phrases (but not within high-frequency words), and greater reduction between unstressed syllables than after a stress. Although highly reduced productions of stops and flaps occur often, with approximant-like tokens even in careful speech, reduction does not lead to a large amount of overlap between phonological categories. Approximant-like realizations of expected stops and flaps in some conditions constitute the majority of tokens. This shows that reduced speech is something that listeners encounter, and must perceive, in a large proportion of the speech they hear.

4.
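Research on tone recognition in spontaneous Chinese speech   Cited by: 2 (self-citations: 0; citations by others: 2)
刘赵杰  邵健  张鹏远  赵庆卫  颜永红  冯稷 《物理学报》2007,56(12):7064-7069
Chinese is a tonal language, and tone information is very important in Chinese speech recognition. Traditional tone recognition generally studies only the relatively standard tones of read speech and rarely gives dedicated treatment to spontaneous speech, whose tone contours are more complex. To address the characteristics of spontaneous Chinese speech, a true-context model is proposed for selecting the tone modeling units. In addition, to model tone patterns in fine detail, a hierarchical clustering method is used to obtain more tone patterns. Experimental results demonstrate the effectiveness of the method. Keywords: tone recognition; spontaneous speech; true-context model; clustering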

5.
We investigate two identical Λ-type atoms in free space, and focus on the entanglement between the two atoms. We derive a master equation for the atomic subspace and solve it analytically to show how spontaneous emission from the two-atom system induces entanglement. The magnitude of the entanglement and the steady-state entanglement are found to be strongly dependent on the initial states and the orientation of the dipoles of the two atoms.

6.
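丁鹏  徐波 《声学学报》2004,29(1):23-28
Using two research approaches, one based on data clustering and one based on prior knowledge, this work examines in depth how non-contextual factors such as gender, accent, speaking rate, and channel affect the classification and modeling of speech data. To model contextual and non-contextual factors jointly within a unified framework, a decision-tree method extended with non-contextual factors is adopted. For the multiple sets of high-precision acoustic models dependent on non-contextual factors that this method generates, an algorithm is proposed that dynamically combines them, according to the maximum-likelihood criterion, into an acoustic model specific to the test speaker. This method lowers the system's relative misrecognition rate by 8% to 10% on average. The experimental results show that classifying and modeling non-contextual factors improves the modeling capacity of the acoustic model, and that the model-combination algorithm effectively solves the model-selection problem introduced by unified modeling.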

7.
Accented speech recognition is more challenging than standard speech recognition due to the effects of phonetic and acoustic confusions. Phonetic confusion in accented speech occurs when an expected phone is pronounced as a different one, which leads to erroneous recognition. Acoustic confusion occurs when the pronounced phone is found to lie acoustically between two baseform models and can be equally recognized as either one. We propose that it is necessary to analyze and model these confusions separately in order to improve accented speech recognition without degrading standard speech recognition. Since low phonetic confusion units in accented speech do not give rise to automatic speech recognition errors, we focus on analyzing and reducing phonetic and acoustic confusability under high phonetic confusion conditions. We propose using a likelihood ratio test to measure phonetic confusion, and an asymmetric acoustic distance to measure acoustic confusion. Only accent-specific phonetic units with low acoustic confusion are used in an augmented pronunciation dictionary, while phonetic units with high acoustic confusion are reconstructed using decision tree merging. Experimental results show that our approach is effective and superior to methods modeling phonetic confusion or acoustic confusion alone in accented speech, with a significant 5.7% absolute WER reduction, without degrading standard speech recognition.

8.
9.
10.
This study investigates whether the mora is used in controlling timing in Japanese speech, or is instead a structural unit in the language not involved in timing. Unlike most previous studies of mora-timing in Japanese, this article investigates timing in spontaneous speech. Predictability of word duration from number of moras is found to be much weaker than in careful speech. Furthermore, the number of moras predicts word duration only slightly better than number of segments. Syllable structure also has a significant effect on word duration. Finally, comparison of the predictability of whole words and arbitrarily truncated words shows better predictability for truncated words, which would not be possible if the truncated portion were compensating for remaining moras. The results support an accumulative model of variance with a final lengthening effect, and do not indicate the presence of any compensation related to mora-timing. It is suggested that the rhythm of Japanese derives from several factors about the structure of the language, not from durational compensation.

11.
Hands-free speech input is required in many modern telecommunication applications that employ autoregressive (AR) techniques such as linear predictive coding. When the hands-free input is obtained in enclosed reverberant spaces such as typical office rooms, the speech signal is distorted by the room transfer function. This paper utilizes theoretical results from statistical room acoustics to analyze the AR modeling of speech under these reverberant conditions. Three cases are considered: (i) AR coefficients calculated from a single observation; (ii) AR coefficients calculated jointly from an M-channel observation (M > 1); and (iii) AR coefficients calculated from the output of a delay-and-sum beamformer. The statistical analysis, with supporting simulations, shows that the spatial expectations of the AR coefficients for cases (i) and (ii) are approximately equal to the coefficients of the original speech, while for case (iii) there is a discrepancy, due to spatial correlation between the microphones, which can be significant. It is subsequently demonstrated that at each individual source-microphone position (without spatial expectation), the M-channel AR coefficients from case (ii) provide the best approximation to the clean speech coefficients when microphones are closely spaced (<0.3 m).

12.
13.
This paper describes two experiments aimed at exploring the relationship between objective properties of speech and perceived fluency in read and spontaneous speech. The aim is to determine whether such quantitative measures can be used to develop objective fluency tests. Fragments of read speech (Experiment 1) of 60 non-native speakers of Dutch and of spontaneous speech (Experiment 2) of another group of 57 non-native speakers of Dutch were scored for fluency by human raters and were analyzed by means of a continuous speech recognizer to calculate a number of objective measures of speech quality known to be related to perceived fluency. The results show that the objective measures investigated in this study can be employed to predict fluency ratings, but the predictive power of such measures is stronger for read speech than for spontaneous speech. Moreover, the adequacy of the variables to be employed appears to be dependent on the specific type of speech material investigated and the specific task performed by the speaker.

14.
Temporal constraints on the perception of variable-size speech fragments produced by interruption rates between 0.5 and 16 Hz were investigated by contrasting the intelligibility of gated sentences with and without silent intervals. Concatenation of consecutive speech fragments produced a significant decrease in intelligibility at 2 and 4 Hz, while having little effect at lower and higher rates. Consistent with previous studies, these findings (1) indicate that syllable-sized intervals associated with intermediate-rate interruptions are more susceptible to temporal distortions than the longer word-size or shorter phoneme-size intervals and (2) suggest qualitative differences in underlying perceptual processes at different rates.

15.
Variability in inspired lung volume prior to speech is only partially accounted for by speech-related concerns such as the length and loudness of the planned utterance. Control mechanisms known to influence volume variability in non-speech breathing could potentially account for some of this variability, but only if they operate during speech as well. This investigation was designed to test for the presence of several such mechanisms during reading aloud. Lung volumes were recorded from 5 normal females as they read silently, then aloud. Inspired volumes were correlated with the volumes of the previous and following expirations and with inspiratory duration. Coefficients of variation were calculated for inspiratory volume, duration, and mean flow. Time-series analyses were used to compare periodicity in inspired volume for quiet and speech breathing. Control mechanisms operating during both quiet breathing and reading aloud included slow oscillations in inspired volume and minimized variability in mean flow. Inspired volume prior to speech was weakly but significantly correlated with preceding and following expired volume. It is concluded that some control strategies typical of quiet breathing contribute to volume variability in speech breathing.
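
The coefficient of variation mentioned in this abstract is the standard deviation divided by the mean. A minimal sketch follows; the function name is hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the abstract does not say which estimator was used.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return standard deviation / mean for a sequence of measurements.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; coefficient of variation is undefined")
    return stdev(values) / m
```

Applied separately to inspiratory volume, duration, and mean flow, a lower value indicates a more tightly controlled quantity.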

16.
The present study aimed to examine the size of the acoustic vowel space in talkers who had previously been identified as having slow and fast habitual speaking rates [Tsao, Y.-C. and Weismer, G. (1997) J. Speech Lang. Hear. Res. 40, 858-866]. Within talkers, it is fairly well known that faster speaking rates result in a compression of the vowel space relative to that measured for slower rates, so the current study was completed to determine if the same differences in the size of the vowel space occur across talkers who differ significantly in their habitual speaking rates. Results indicated that there was no difference in the average size of the vowel space for slow vs fast talkers, and no relationship across talkers between vowel duration and formant frequencies. One difference between the slow and fast talkers was in intertalker variability of the vowel spaces, which was clearly greater for the slow talkers, for both speaker sexes. Results are discussed relative to theories of speech production and vowel normalization in speech perception.

17.
18.
In this paper we present an algorithm for building an empirical model of facial biomechanics from a set of displacement records of markers located on the face of a subject producing speech. Markers are grouped into clusters, which have a unique primary marker and a number of secondary markers with an associated weight. Motion of the secondary markers is computed as the weighted sum of the primary markers of the clusters to which they belong. This model may be used to produce facial animations, by driving the primary markers with measured kinematic signals.
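
The weighted-sum rule stated in this abstract can be sketched as follows. The function name, the dictionary layout, the marker names, and the use of 3-D displacement tuples are all assumptions for illustration; the abstract does not describe how the weights are estimated or how the data are stored.

```python
def secondary_displacement(primary_displacements, weights):
    """Displacement of one secondary marker as a weighted sum of primaries.

    primary_displacements: maps a primary-marker name to its (dx, dy, dz).
    weights: maps each primary marker whose cluster contains this secondary
             marker to the associated weight (hypothetical layout).
    """
    total = [0.0, 0.0, 0.0]
    for name, weight in weights.items():
        dx, dy, dz = primary_displacements[name]
        total[0] += weight * dx
        total[1] += weight * dy
        total[2] += weight * dz
    return tuple(total)
```

Driving the primary markers with measured kinematic signals and evaluating this sum per frame would then move every secondary marker, which is the animation use the abstract mentions.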

19.
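韩疆  尹宝林 《声学学报》2000,25(2):182-190
A speech recognition model based on joint modeling of frame features and segment features is proposed. The model uses segment features that describe spectral-parameter trajectories, which explicitly models the inter-frame correlation of the speech signal at the segment scale. It uses a segment-feature-dependent non-stationary time-series generation model to capture the correlation between segment features and frame features and, at the frame scale, implicitly models inter-frame correlation through a parameterized mean-trajectory function. The paper presents a segmentation algorithm based on optimizing a joint statistical distance over frame and segment features, as well as a model parameter estimation algorithm with embedded EM iterations. Recognition experiments on speaker-independent isolated Chinese finals and on multi-speaker Chinese base syllables show that the model outperforms both the standard HMM and the trended HMM.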

20.
We perform a theoretical investigation of two modeling approaches for the amplified spontaneous emission (ASE) noise of a semiconductor optical amplifier (SOA), namely a stochastic time-domain and a deterministic frequency-domain approach. The theoretical results are compared with one another, using experimental measurements obtained from a commercial device as a reference. Special emphasis is placed on the modeling of the material gain as it is a key parameter in determining the ASE spectral characteristics. A comprehensive set of equations for both modeling approaches is developed and their numerical solution is analyzed in detail.
