Similar Documents
20 similar documents found.
1.
I. Introduction  The F0 patterns of speech are important not only for the prosodic features but also for voice source characteristics. Now more and more speech scientists have recognized that the voice excitation source in text-to-speech systems plays an important role in both intelligibility and naturalness of synthetic speech. Especially for Chinese, a tone language with a multi-tone system, the tonal patterns, which are mainly demonstrated in the F0 contours, carry lexical meaning. Some comparative studies of the F0 patterns between a tone language (Chinese) and a stress language (En…

2.
Stress is an important parameter for prosody processing in speech synthesis. In this paper, we compare the acoustic features of neutral-tone syllables and strongly stressed syllables with those of moderately stressed syllables, including pitch, syllable duration, intensity, and the pause length after the syllable. The relations between duration and pitch, and between the Third Tone (T3) and pitch, are also studied. Three ANN-based stress prediction models, namely an acoustic model, a linguistic model, and a mixed model, are presented for predicting Chinese sentential stress. The results show that the mixed model performs better than the other two. To address the diversity of manual stress labeling, an evaluation index called the support ratio is proposed.
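To make the modeling idea concrete, here is a minimal, hypothetical sketch of the kind of "mixed" ANN stress predictor described above: a small feed-forward network fed with both acoustic features (pitch, duration, intensity, following pause) and linguistic features for each syllable, trained to output a stress level. The feature set, network size, and scikit-learn implementation are illustrative assumptions, not the paper's actual setup.

```python
# Hypothetical sketch of a "mixed" stress-prediction model: an ANN that takes both
# acoustic features (pitch, duration, intensity, following-pause length) and
# linguistic features (e.g. tone identity, position in phrase) per syllable and
# predicts a stress level (0 = neutral, 1 = moderate, 2 = strong).
# The exact feature set and network size here are illustrative, not the paper's.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Toy training data: 200 syllables x 7 features
#   [mean_f0, f0_range, duration_ms, intensity_db, pause_ms, tone_id, pos_in_phrase]
X = rng.normal(size=(200, 7))
y = rng.integers(0, 3, size=200)          # stress labels from manual annotation

model = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
model.fit(X, y)

new_syllable = rng.normal(size=(1, 7))
print("predicted stress level:", model.predict(new_syllable)[0])
```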

3.
The goal of this study was to establish the ability of normal-hearing listeners to discriminate formant frequency in vowels in everyday speech. Vowel formant discrimination in syllables, phrases, and sentences was measured for high-fidelity (nearly natural) speech synthesized by STRAIGHT [Kawahara et al., Speech Commun. 27, 187-207 (1999)]. Thresholds were measured for changes in F1 and F2 for the vowels /ɪ, ɛ, æ, ʌ/ in /bVd/ syllables. Experimental factors manipulated included phonetic context (syllables, phrases, and sentences), sentence discrimination with the addition of an identification task, and word position. Results showed that neither longer phonetic context nor the addition of the identification task significantly affected thresholds, while thresholds for word final position showed significantly better performance than for either initial or middle position in sentences. Results suggest that an average of 0.37 barks is required for normal-hearing listeners to discriminate vowel formants in modest length sentences, elevated by 84% compared to isolated vowels. Vowel formant discrimination in several phonetic contexts was slightly elevated for STRAIGHT-synthesized speech compared to formant-synthesized speech stimuli reported in the study by Kewley-Port and Zheng [J. Acoust. Soc. Am. 106, 2945-2958 (1999)]. These elevated thresholds appeared related to greater spectral-temporal variability for high-fidelity speech produced by STRAIGHT than for formant-synthesized speech.
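For readers unfamiliar with the bark scale, the following sketch shows how a threshold stated in barks (such as the 0.37-bark sentence-context average above) translates into a frequency difference in Hz, using the Traunmüller (1990) Hz-to-bark approximation. The 500 Hz and 1500 Hz reference formants are arbitrary examples, not values from the study.

```python
# Convert a 0.37-bark formant-discrimination threshold into Hz near typical
# F1 and F2 frequencies, using the Traunmueller (1990) Hz-to-bark approximation.
def hz_to_bark(f_hz: float) -> float:
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def bark_to_hz(z: float) -> float:
    # Algebraic inverse of hz_to_bark.
    return 1960.0 * (z + 0.53) / (26.81 - (z + 0.53))

for f_ref in (500.0, 1500.0):            # example F1 and F2 regions (assumed values)
    delta_hz = bark_to_hz(hz_to_bark(f_ref) + 0.37) - f_ref
    print(f"0.37 bark above {f_ref:.0f} Hz is about +{delta_hz:.0f} Hz")
```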

4.
Chinese utterances generally show pitch declination, but the specific pitch behavior of prosodic words within an utterance has so far received little study. The conversational data used in this study were selected from the 973 telephone-speech corpus, comprising 69 dialogues involving 79 speakers; the read-speech data were news broadcasts by two radio announcers, 221 utterances in total. The pitch maxima, pitch minima, and pitch ranges of prosodic words within utterances were analyzed. The results show that in most utterances of both conversational and read speech, pitch follows a high-to-low trend from beginning to end, although in longer conversational utterances the downward trend over the first half is less obvious. Compared with read speech, the pitch range of prosodic words in conversation is usually smaller. The pitch range of the last prosodic word in a conversational utterance is relatively large, whereas the prosodic words within read utterances mostly show no difference in pitch range. These results should help in modeling the pitch register and pitch range of prosodic words within utterances for speech synthesis.

5.
Downstep in the pitch contour of Chinese Putonghua is examined using carefully designed sentences with controlled tone combinations. The results show that both automatic and non-automatic downstep phenomena exist in Chinese. In non-automatic downstep, low tones compress the pitch range of the following syllables downwards, and the main influence of downstep is on the topline. A low tone not only lowers the topline after it but also raises the high tones before it; the two effects are compatible with each other. In automatic downstep, the topline of the pitch contour within an intonational phrase shows a linear downtrend, but it differs among speakers because of individual stress habits. Compared with downstep in other tone and non-tone languages, the downstep ratio in Chinese is not constant, and the domain of downstep is not limited to adjacent tones.

6.
A series of experiments was carried out to investigate how fundamental frequency declination is perceived by speakers of English. Using linear predictor coded speech, nonsense sentences were constructed in which fundamental frequency on the last stressed syllable had been systematically varied. Listeners were asked to judge which stressed syllable was higher in pitch. Their judgments were found to reflect normalization for expected declination; in general, when two stressed syllables sounded equal in pitch, the second was actually lower. The pattern of normalization reflected certain major features of production patterns: A greater correction for declination was made for wide pitch range stimuli than for narrow pitch range stimuli. The slope of expected declination was less for longer stimuli than for shorter ones. Lastly, amplitude was found to have a significant effect on judgments, suggesting that the amplitude downdrift which normally accompanies fundamental frequency declination may have an important role in the perception of phrasing.

7.
Numerical estimates of pitch for stimulation of electrodes along the 22-electrode array of the Cochlear Limited cochlear implant were obtained from 18 subjects who became deaf very early in life. Examined were the relationships between subject differences in pitch estimation, subject variables related to auditory deprivation and experience, and speech-perception scores for closed-set monosyllabic words and open-set Bamford-Kowal-Bench (BKB) sentences. Reliability in the estimation procedure was examined by comparing subject performance in pitch estimation with that for loudness estimation for current levels between hearing threshold and comfortable listening level. For 56% of subjects, a tonotopic order of pitch percepts for electrodes on the array was found. A deviant but reliable order of pitch percepts was found for 22% of subjects, and essentially no pitch order was found for the remaining 22% of subjects. Subject differences in pitch estimation were significantly related to the duration of auditory deprivation prior to implantation, with the poorest performance for subjects who had a longer duration of deafness and a later age at implantation. Subjects with no tonotopic order of pitch percepts had the lowest scores for the BKB sentence test, but there were no differences across subjects for monosyllabic words. Performance in pitch estimation for electrodes did not appear to be related to performance in the estimation procedure, as all subjects were successful in loudness estimation for current level.

8.
The powerful techniques of covariance structure modeling (CSM) long have been used to study complex behavioral phenomena in the social and behavioral sciences. This study employed these same techniques to examine simultaneous effects on vowel duration in American English. Additionally, this study investigated whether a single population model of vowel duration fits observed data better than a dual population model where separate parameters are generated for syllables that carry large information loads and for syllables that specify linguistic relationships. For the single population model, intrinsic duration, phrase final position, lexical stress, post-vocalic consonant voicing, and position in word all were significant predictors of vowel duration. However, the dual population model, in which separate model parameters were generated for (1) monosyllabic content words and lexically stressed syllables and (2) monosyllabic function words and lexically unstressed syllables, fit the data better than the single population model. Intrinsic duration and phrase final position affected duration similarly for both the populations. On the other hand, the effects of post-vocalic consonant voicing and position in word, while significant predictors of vowel duration in content words and stressed syllables, were not significant predictors of vowel duration in function words or unstressed syllables. These results are not unexpected, based on previous research, and suggest that covariance structure analysis can be used as a complementary technique in linguistic and phonetic research.

9.
Perceptual distances among single tokens of American English vowels were established for nonreverberant and reverberant conditions. Fifteen vowels in the phonetic context (b-t), embedded in the sentence "Mark the (b-t) again" were recorded by a male talker. For the reverberant condition, the sentences were played through a room with a reverberation time of 1.2 s. The CVC syllables were removed from the sentences and presented in pairs to ten subjects with audiometrically normal hearing, who judged the similarity of the syllable pairs separately for the nonreverberant and reverberant conditions. The results were analyzed by multidimensional scaling procedures, which showed that the perceptual data were accounted for by a three-dimensional vowel space. Correlations were obtained between the coordinates of the vowels along each dimension and selected acoustic parameters. For both conditions, dimensions 1 and 2 were highly correlated with formant frequencies F2 and F1, respectively, and dimension 3 was correlated with the product of the duration of the vowels and the difference between F3 and F1 expressed on the Bark scale. These observations are discussed in terms of the influence of reverberation on speech perception.

10.
Variations in duration and frequency during three readings of each of eight sentences by 9 normal and 4 voice-disordered subjects are compared. Instructions to the subjects varied with respect to the amount and type of cognitive cueing presented in the trials, and the sentences were read in random order. Variability in fundamental frequency (F0) was greater when pitch variation was specifically cued. Also, the portion of the sentence that was cued had greater variability in F0 than other parts of the sentence. Variation in fundamental frequency was significantly greater in the cued versus uncued sentence trials for the voice-disordered subjects but not for the normal subjects. However, all subjects exhibited significantly greater duration for cued versus uncued readings of the same sentences. Implications for theory and practice are discussed.

11.
Dynamic specification of coarticulated vowels spoken in sentence context
According to a dynamic specification account, coarticulated vowels are identified on the basis of time-varying acoustic information, rather than solely on the basis of "target" information contained within a single spectral cross section of an acoustic syllable. Three experiments utilizing digitally segmented portions of consonant-vowel-consonant (CVC) syllables spoken rapidly in a carrier sentence were designed to examine the relative contribution of (1) target information available in vocalic nuclei, (2) intrinsic duration information specified by syllable length, and (3) dynamic spectral information defined over syllable onsets and offsets. In experiments 1 and 2, vowels produced in three consonantal contexts by an adult male were examined. Results showed that vowels in silent-center (SC) syllables (in which vocalic nuclei were attenuated to silence leaving initial and final transitional portions in their original temporal relationship) were perceived relatively accurately, although not as well as unmodified syllables (experiment 1); random versus blocked presentation of consonantal contexts did not affect performance. Error rates were slightly greater for vowels in SC syllables in which intrinsic duration differences were neutralized by equating the duration of silent intervals between initial and final transitional portions. However, performance was significantly better than when only initial transitions or final transitions were presented alone (experiment 2). Experiment 3 employed CVC stimuli produced by another adult male, and included six consonantal contexts. Both SC syllables and excised syllable nuclei with appropriate intrinsic durations were identified no less accurately than unmodified controls. Neutralizing duration differences in SC syllables increased identification errors only slightly, while truncating excised syllable nuclei yielded a greater increase in errors. These results demonstrate that time-varying information is necessary for accurate identification of coarticulated vowels. Two hypotheses about the nature of the dynamic information specified over syllable onsets and offsets are discussed.

12.
The goal of this study was to measure the ability of adult hearing-impaired listeners to discriminate formant frequency for vowels in isolation, syllables, and sentences. Vowel formant discrimination for F1 and F2 for the vowels /ɪ, ɛ, æ/ was measured. Four experimental factors were manipulated including linguistic context (isolated vowels, syllables, and sentences), signal level (70 and 95 dB SPL), formant frequency, and cognitive load. A complex identification task was added to the formant discrimination task only for sentences to assess effects of cognitive load. Results showed significant elevation in formant thresholds as formant frequency and linguistic context increased. Higher signal level also elevated formant thresholds primarily for F2. However, no effect of the additional identification task on the formant discrimination was observed. In comparable conditions, these hearing-impaired listeners had elevated thresholds for formant discrimination compared to young normal-hearing listeners primarily for F2. Altogether, poorer performance for formant discrimination for these adult hearing-impaired listeners was mainly caused by hearing loss rather than cognitive difficulty for tasks implemented in this study.

13.
Linguistic modality effects on fundamental frequency in speech
This paper examines the effects on fundamental frequency (F0) patterns of modality operators, such as sentential adverbs, modals, negatives, and quantifiers. These words form inherently contrastive classes which have varying tendencies to produce emphasis deviations in F0 contours. Three speakers read a set of 186 sentences and three paragraphs to provide data for F0 analysis. The important words in each sentence were marked intonationally with rises or sharp falls in F0, compared to gradually falling F0 in unemphasized words. These emphasis deviations were measured in terms of F0 variations from the norm; they were larger toward the beginning of sentences, in longer sentences, on syllables surrounded by unemphasized syllables, and in contrastive contexts. Other results showed that embedded clauses tended to have lower F0, and negative contractions were emphasized on their first syllables. Individual speakers differed in overall F0 levels, while using roughly similar emphasis strategies. F0 levels changed in paragraphs, with emphasis going to contextually new information.

14.
In tone languages there are potential conflicts in the perception of lexical tone and intonation, as both depend mainly on the differences in fundamental frequency (F0) patterns. The present study investigated the acoustic cues associated with the perception of sentences as questions or statements in Cantonese, as a function of the lexical tone in sentence final position. Cantonese listeners performed intonation identification tasks involving complete sentences, isolated final syllables, and sentences without the final syllable (carriers). Sensitivity (d' scores) was similar for complete sentences and final syllables but was significantly lower for carriers. Sensitivity was also affected by tone identity. These findings show that the perception of questions and statements relies primarily on the F0 characteristics of the final syllables (local F0 cues). A measure of response bias (c) provided evidence for a general bias toward the perception of statements. Logistic regression analyses showed that utterances were accurately classified as questions or statements by using average F0 and F0 interval. Average F0 of carriers (global F0 cue) was also found to be a reliable secondary cue. These findings suggest that the use of F0 cues for the perception of question intonation in tone languages is likely to be language-specific.
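As a rough illustration of the logistic-regression step mentioned above, the sketch below classifies utterances as questions or statements from two final-syllable cues, average F0 and F0 interval. The feature values, units, and sample sizes are invented placeholders, not data from the study.

```python
# Hypothetical re-creation of a question/statement classifier built on two cues
# measured on the final syllable: its average F0 and its F0 interval
# (end minus start). Values below are made up for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

# columns: [average_f0_semitones, f0_interval_semitones]; label 1 = question
X = np.array([[ 2.0,  4.5], [ 1.5,  3.8], [ 2.4,  5.1],    # question-like: high, rising
              [-1.0, -2.0], [-0.5, -1.2], [-1.8, -2.6]])   # statement-like: low, falling
y = np.array([1, 1, 1, 0, 0, 0])

clf = LogisticRegression().fit(X, y)
print("P(question) for a high rising final syllable:",
      clf.predict_proba([[2.0, 4.0]])[0, 1])
```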

15.
I. Introduction  Eady [1] examined how the fundamental frequency F0 patterns in the tone language Chinese are different from those in the stress language English. He argued that the F0 patterns in a tone language are systematically different from those in a stress language, and his finding contradicts the claim of Bolinger that "human speakers everywhere do essentially the same thing with fundamental pitch". Eady used the cepstral method to do F0 extraction, so he cannot get an insight into the microstructure of the laryngeal vibrations precisely, although an average rate of change in F0 for every…

16.
The pitch strength of rippled noise and iterated rippled noise has recently been fitted by an exponential function of the height of the first peak in the normalized autocorrelation function [Yost, J. Acoust. Soc. Am. 100, 3329-3335 (1996)]. The current study compares the pitch strengths and autocorrelation functions of rippled noise (RN) and another regular-interval noise, "AABB." RN is generated by delaying a copy of a noise sample and adding it to the undelayed version. AABB with the same pitch is generated by taking a sample of noise (A) with the same duration as the RN delay and repeating it to produce AA, and then concatenating many of these once-repeated sequences to produce AABBCCDD.... The height of the first peak (h1) in the normalized autocorrelation function of AABB is 0.5, identical to that of RN. The current experiments show the following: (1) AABB and RN can be discriminated when the pitch is less than about 250 Hz. (2) For these low pitches, the pitch strength of AABB is greater than that for RN whereas it is about the same for pitches above 250 Hz. (3) When RN is replaced by iterated rippled noise (IRN) adjusted to match the pitch strength of AABB, the two are no longer discriminable. The pitch-strength difference between AABB and RN below 250 Hz is explained in terms of a three-stage, running-autocorrelation model. It is suggested that temporal integration of pitch information is achieved in two stages separated by a nonlinearity. The first integration stage is implemented as running autocorrelation with a time constant of 1.5 ms. The second model stage is a nonlinear transformation. In the third model stage, the output of the nonlinear transformation is long-term averaged (second integration stage) to provide a measure of pitch strength. The model provides an excellent fit to the pitch-strength matching data over a wide range of pitches.
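The two stimulus types and the h1 statistic are easy to reproduce. The following sketch (an assumed implementation, not the authors' code) generates rippled noise by delay-and-add, builds an AABB sequence from once-repeated noise segments of the same length, and confirms that the first peak of the normalized autocorrelation is about 0.5 for both; the sampling rate and 4 ms delay are example values.

```python
# Generate RN and AABB regular-interval noises and measure h1, the normalized
# autocorrelation at the lag equal to the delay / segment length.
import numpy as np

fs = 16000
delay = int(0.004 * fs)            # 4 ms delay -> ~250 Hz pitch (assumed example)
n_segments = 200
rng = np.random.default_rng(1)

# Rippled noise: noise plus a copy of itself delayed by `delay` samples.
noise = rng.normal(size=delay * n_segments)
rn = noise.copy()
rn[delay:] += noise[:-delay]

# AABB noise: fresh segments A, B, C, ... each played twice (AABBCC...).
segments = [rng.normal(size=delay) for _ in range(n_segments // 2)]
aabb = np.concatenate([np.tile(seg, 2) for seg in segments])

def h1(x, lag):
    """Normalized autocorrelation at the given lag (height of the first peak)."""
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

print("RN   h1:", round(h1(rn, delay), 2))    # ~0.5
print("AABB h1:", round(h1(aabb, delay), 2))  # ~0.5 as well
```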

17.
A general-purpose real-time speech recognition system: RTSRS(01)
俞铁城 (Yu Tiecheng), 《物理学报》 (Acta Physica Sinica), 1978, 27(5): 508-515
This paper describes a general-purpose real-time speech recognition system, RTSRS(01). Building on previous work, the parameters of each spoken command are normalized in the time domain and a binary spectrum is used, which greatly reduces the storage required for the reference parameters; a new method of computing the distance measure further shortens the time needed for recognition, so that real-time recognition is possible with a vocabulary of 200 items. Recognition results for a single designated speaker are: spoken digits, 99.7%; 20 sentences (7 characters each), 99.7%; 100 four-character idioms, 99.5%; 150 four-character idioms, 99.3%; 200 four-character idioms, 98.8%; 400 four-character idioms, 99.7%. Informal experiments show high recognition accuracy for vocabularies with different numbers of syllables, and even for spoken English digits and BASIC statement names.
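A speculative sketch of the binary-spectrum matching idea described above: each time-normalized command is reduced to binarized spectral frames, so reference storage is small and the distance measure reduces to counting mismatched bits. The frame count, channel count, and binarization rule here are assumptions, not the system's actual parameters.

```python
# Illustrative binary-spectrum template matching: binarize each spectral frame
# against its own mean energy, then recognize an unknown utterance as the
# reference with the smallest number of mismatched bits.
import numpy as np

N_FRAMES, N_CHANNELS = 16, 20          # time-normalized template size (assumed)

def binary_spectrum(frames: np.ndarray) -> np.ndarray:
    """Binarize each spectral frame against its own mean energy."""
    return (frames > frames.mean(axis=1, keepdims=True)).astype(np.uint8)

def distance(a: np.ndarray, b: np.ndarray) -> int:
    """Hamming-style distance: number of mismatched spectrum bits."""
    return int(np.count_nonzero(a != b))

rng = np.random.default_rng(2)
references = {w: binary_spectrum(rng.random((N_FRAMES, N_CHANNELS)))
              for w in ["zero", "one", "two"]}      # stand-ins for stored commands
unknown = binary_spectrum(rng.random((N_FRAMES, N_CHANNELS)))
best = min(references, key=lambda w: distance(unknown, references[w]))
print("recognized as:", best)
```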

18.
This study examined the perception and acoustics of a large corpus of vowels spoken in consonant-vowel-consonant syllables produced in citation-form (lists) and spoken in sentences at normal and rapid rates by a female adult. Listeners correctly categorized the speaking rate of sentence materials as normal or rapid (2% errors) but did not accurately classify the speaking rate of the syllables when they were excised from the sentences (25% errors). In contrast, listeners accurately identified the vowels produced in sentences spoken at both rates when presented the sentences and when presented the excised syllables blocked by speaking rate or randomized. Acoustical analysis showed that formant frequencies at syllable midpoint for vowels in sentence materials showed "target undershoot" relative to citation-form values, but little change over speech rate. Syllable durations varied systematically with vowel identity, speaking rate, and voicing of final consonant. Vowel-inherent-spectral-change was invariant in direction of change over rate and context for most vowels. The temporal location of maximum F1 frequency further differentiated spectrally adjacent lax and tense vowels. It was concluded that listeners were able to utilize these rate- and context-independent dynamic spectrotemporal parameters to identify coarticulated vowels, even when sentential information about speaking rate was not available.

19.
I. Introduction  Researches on Chinese synthesis disclose that only when both the segmental and suprasegmental features of the synthetic speech are similar to those of the natural one, the synthetic speech will sound intelligible and natural [1]. Among existing synthetic techniques, the approach based on acoustic parameters can adjust both the segmental and suprasegmental features of synthetic units flexibly and can be considered as the most reasonable synthetic technique in theory. However, the parameter-based synthesizer is over-dependent on the developments of paramet…

20.
Prosodic realization and perception of focus in Uyghur
Through strictly controlled phonetic experiments, this study investigated how focus modulates pitch and duration in Uyghur declarative sentences. Two target sentences were designed, and the speakers were asked to naturally emphasize the corresponding word in the sentence according to context; the perception of focus was then examined. The results show: (1) taking sentence-final focus as the baseline, the prosodic encoding of focus in Uyghur resembles the "tri-zone" adjustment pattern found in Beijing Mandarin and English, with raised pitch and expanded pitch range on the focused word and a sharp post-focus pitch drop (narrowed range), while pre-focus pitch changes little; (2) the durations of the focused word and the pre-focus words are lengthened, while post-focus words show no obvious change; (3) the average accuracy of focus perception is about 90%, indicating that this prosodic encoding of focus is an effective perceptual cue; (4) the perception experiment and intonation analysis also show that the "neutral focus" intonation of Uyghur differs from that of English and Chinese, being closer to sentence-initial focus than to sentence-final focus. In addition, the paper specifically discusses the distribution and origin of post-focus pitch drop among languages of China.

