Similar literature
1.
Stress is an important parameter for prosody processing in speech synthesis. In this paper, we compare the acoustic features of neutral-tone syllables and strongly stressed syllables with those of moderately stressed syllables, including pitch, syllable duration, intensity, and the length of the pause after the syllable. The relation between duration and pitch, and between the Third Tone (T3) and pitch, is also studied. Three ANN-based stress prediction models (the acoustic model, the linguistic model, and the mixed model) are presented for predicting Chinese sentential stress. The results show that the mixed model performs better than the other two models. To address the diversity of manual labeling, an evaluation index called the support ratio is proposed.

2.
张璐  祖漪清  闫润强 《声学学报》2012,37(4):448-456
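This study examines how focus, lexical stress position, and rising boundary tones at intonational phrase boundaries affect the F0 pattern of the phrase-final word. By analyzing the acoustic features of intonational-phrase-final words in two American English corpora, we found that when the word is in focus, the F0 peak of the stress is higher than the final value of the boundary tone; the boundary tone is fully realized only after the stress has been realized; and moving the lexical stress later in the syllable structure compresses the post-stress F0 range. When the phrase-final word is not in focus, the rising trend of the boundary tone appears from the start of the word and suppresses the F0 prominence of the lexical stress. We conclude that focus can be realized by raising the F0 peak of the lexical stress; the strength with which focus and the boundary tone are realized is constrained by the position of the lexical stress, and in the extreme case the boundary tone can be realized only at the end of the final syllable of the intonational phrase. Within a limited stretch of segments, each of these prosodic features has a location where its function is expressed most fully; they compete for realization, one gaining as another recedes.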

3.
This paper formalizes and tests two key assumptions of the concept of suprasegmental timing: segmental independence and suprasegmental mediation. Segmental independence holds that the duration of a suprasegmental unit such as a syllable or foot is only minimally dependent on its segments. Suprasegmental mediation states that the duration of a segment is determined by the duration of its suprasegmental unit and its identity, but not directly by the specific prosodic context responsible for suprasegmental unit duration. Both assumptions are made by various versions of the isochrony hypothesis [I. Lehiste, J. Phonetics 5, 253-263 (1977)], and by the syllable timing hypothesis [W. Campbell, Speech Commun. 9, 57-62 (1990)]. The validity of these assumptions was studied using the syllable as suprasegmental unit in American English and Mandarin Chinese. To avoid unnatural timing patterns that might be induced when reading carrier phrase material, meaningful, nonrepetitive sentences were used with a wide range of lengths. Segmental independence was tested by measuring how the average duration of a syllable in a fixed prosodic context depends on its segmental composition. A strong association was found; in many cases the increase in average syllabic duration when one segment was substituted for another (e.g., bin versus pin) was the same as the difference in average duration between the two segments (i.e., [b] versus [p]). Thus, the [i] and [n] were not compressed to make room for the longer [p], which is inconsistent with segmental independence. Syllabic mediation was tested by measuring which locations in a syllable are most strongly affected by various contextual factors, including phrasal position, within-word position, tone, and lexical stress. Systematic differences were found between these factors in terms of the intrasyllabic locus of maximal effect. These and earlier results obtained by van Son and van Santen [R. J. J. H. van Son and J. P. H. van Santen, "Modeling the interaction between factors affecting consonant duration," Proceedings Eurospeech-97, 1997, pp. 319-322] showing a three-way interaction between consonantal identity (coronals vs labials), within-word position of the syllable, and stress of surrounding vowels, imply that segmental duration cannot be predicted by compressing or elongating segments to fit into a predetermined syllabic time interval. In conclusion, while there is little doubt that suprasegmental units play important predictive and explanatory roles as phonological units, the concept of suprasegmental timing is less promising.

4.
Ma EP  Baken RJ  Roark RM  Li PM 《Journal of voice》2012,26(5):670.e1-670.e6
Vocal attack time (VAT) is the time lag between the growth of the sound pressure signal and the development of physical contact of vocal folds at vocal initiation. It can be derived by a cross-correlation of short-time amplitude changes occurring in the sound pressure and electroglottographic (EGG) signals. Cantonese is a tone language in which tone determines the lexical meaning of the syllable. Such linguistic function of tone has implications for the physiology of tone production. The aim of the present study was to investigate the possible effects of Cantonese tones on VAT. Sound pressure and EGG signals were simultaneously recorded from 59 native Cantonese speakers (31 females and 28 males). The subjects were asked to read aloud 12 disyllabic words comprising homophone pairs of the six Cantonese lexical tones. Results revealed a gender difference in VAT values, with the mean VAT significantly smaller in females than in males. There was also a significant difference in VAT values between the two tone categories, with the mean VAT values of the three level tones (tones 1, 3, and 6) significantly smaller than those of the three contour tones (tones 2, 4, and 5). The findings support the notion that norms and interpretations based on nontone European languages may not be directly applied to tone languages.
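The cross-correlation step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the synthetic onset curves, the sampling rate `fs`, the use of a first difference as the "short-time amplitude change", and the sign convention (positive lag means the EGG signal rises after the sound pressure signal) are all assumptions made for illustration.

```python
import math

def amplitude_changes(signal):
    """First difference, used here as a crude stand-in for short-time amplitude change."""
    return [b - a for a, b in zip(signal, signal[1:])]

def best_lag(x, y, max_lag):
    """Return the lag k (in samples) that maximizes sum_n x[n] * y[n + k]."""
    best_k, best_score = 0, float("-inf")
    for k in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, xv in enumerate(x):
            m = n + k
            if 0 <= m < len(y):
                score += xv * y[m]
        if score > best_score:
            best_k, best_score = k, score
    return best_k

def sigmoid_onset(n_samples, center, width):
    """Synthetic amplitude envelope that grows smoothly around sample `center`."""
    return [1.0 / (1.0 + math.exp(-(n - center) / width)) for n in range(n_samples)]

fs = 10_000                          # hypothetical sampling rate in Hz
sp = sigmoid_onset(400, 150, 10)     # hypothetical sound pressure envelope
egg = sigmoid_onset(400, 180, 10)    # hypothetical EGG envelope, rising 30 samples later

lag = best_lag(amplitude_changes(sp), amplitude_changes(egg), max_lag=100)
lag_ms = 1000.0 * lag / fs
print(f"estimated lag: {lag} samples ({lag_ms:.1f} ms)")
```

With real recordings the envelopes would come from the measured signals, and the lag search window would need to be chosen to suit the expected range of VAT values.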

5.
I. Introduction. Researches on Chinese synthesis disclose that only when both the segmental and suprasegmental features of the synthetic speech are similar to those of the natural one, the synthetic speech will sound intelligible and natural [1]. Among existing synthetic techniques, the approach based on acoustic parameters can adjust both the segmental and suprasegmental features of synthetic units flexibly and can be considered as the most reasonable synthetic technique in theory. However, the parameter based synthesizer is over-dependent on the developments of paramet…

6.
This paper reports two series of experiments that examined the phonetic correlates of lexical stress in Vietnamese compounds in comparison to their phrasal constructions. In the first series of experiments, acoustic and perceptual characteristics of Vietnamese compound words and their phrasal counterparts were investigated on five likely acoustic correlates of stress or prominence (f0 range and contour, duration, intensity and spectral slope, vowel reduction), elicited under two distinct speaking conditions: a "normal speaking" condition and a "maximum contrast" condition which encouraged speakers to employ prosodic strategies for disambiguation. The results suggested that Vietnamese lacks phonetic resources for distinguishing compounds from phrases lexically and that native speakers may employ a phrase-level prosodic disambiguation strategy (juncture marking), when required to do so. However, in a second series of experiments, minimal pairs of bisyllabic coordinative compounds with reversible syllable positions were examined for acoustic evidence of asymmetrical prominence relations. Clear evidence of asymmetric prominences in coordinative compounds was found, supporting independent results obtained from an analysis of reduplicative compounds and tone sandhi in Vietnamese [Nguyễn and Ingram, 2006]. A reconciliation of these apparently conflicting findings on word stress in Vietnamese is presented and discussed.

7.
Post-low bouncing is a phenomenon whereby after reaching a very low pitch in a low lexical tone, F0 bounces up and then gradually drops back in the following syllables. This paper reports the results of an acoustic analysis of the phenomenon in two Mandarin Chinese corpora and presents a simple mechanical model that can effectively simulate this bouncing effect. The acoustic analysis shows that most of the F0 dynamic features profiling the bouncing effect strongly correlate with the amount of F0 lowering in the preceding low-tone syllable, and that the additional F0 raising commences at the onset of the first post-low syllable. Using the quantitative Target Approximation model, this bouncing effect was simulated by adding an acceleration adjustment to the initial F0 state of the first post-low syllable. A highly linear relation between F0 lowering and estimated acceleration adjustment was found. This relation was then used to effectively simulate the bouncing effect in both the neutral tone and the full tones. The results of the analysis and simulation are consistent with the hypothesis that the bouncing effect is due to a temporary perturbation of the balance between antagonistic forces in the laryngeal control in producing a very low pitch.
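To make the idea of an "acceleration adjustment to the initial F0 state" concrete, the sketch below uses a third-order critically damped response toward a linear target, which is how the quantitative Target Approximation model is commonly formulated. The abstract does not give the equations, so the functional form, the parameter names (`m`, `b`, `lam`), and every numeric value here are assumptions rather than the paper's settings.

```python
import math

def qta_f0(t, m, b, lam, f0_init, vel_init, acc_init):
    """F0 at time t (s) when approaching the linear target m*t + b.

    The transient (c1 + c2*t + c3*t^2) * exp(-lam*t) is chosen so that F0,
    its velocity, and its acceleration at t = 0 match the given initial state.
    """
    c1 = f0_init - b
    c2 = vel_init + c1 * lam - m
    c3 = (acc_init + 2.0 * c2 * lam - c1 * lam ** 2) / 2.0
    return (m * t + b) + (c1 + c2 * t + c3 * t * t) * math.exp(-lam * t)

# Hypothetical post-low syllable: static target at 90 semitones, starting at 85.
target = dict(m=0.0, b=90.0, lam=40.0)
for t in (0.0, 0.05, 0.10, 0.20):
    plain = qta_f0(t, f0_init=85.0, vel_init=0.0, acc_init=0.0, **target)
    # Same initial F0 and velocity, plus an extra upward acceleration.
    boosted = qta_f0(t, f0_init=85.0, vel_init=0.0, acc_init=20000.0, **target)
    print(f"t={t:.2f}s  plain={plain:.2f}  boosted={boosted:.2f}")
```

In this toy setting the extra initial acceleration makes the contour rise faster, overshoot the target briefly, and then settle back, which is the qualitative shape the abstract describes as bouncing.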

8.
Listeners' auditory discrimination of vowel sounds depends in part on the order in which stimuli are presented. Such presentation order effects have been argued to be language independent, and to result from psychophysical (not speech- or language-specific) factors such as the decay of memory traces over time or increased weighting of later-occurring stimuli. In the present study, native Cantonese speakers' discrimination of a linguistic tone continuum is shown to exhibit order of presentation effects similar to those shown for vowels in previous studies. When presented with two successive syllables differing in fundamental frequency by approximately 4 Hz, listeners were significantly more sensitive to this difference when the first syllable was higher in frequency than the second. However, American English-speaking listeners with no experience listening to Cantonese showed no such contrast effect when tested in the same manner using the same stimuli. Neither English nor Cantonese listeners showed any order of presentation effects in the discrimination of a nonspeech continuum in which tokens had the same fundamental frequencies as the Cantonese speech tokens but had a qualitatively non-speech-like timbre. These results suggest that tone presentation order effects, unlike vowel effects, may be language specific, possibly resulting from the need to compensate for utterance-related pitch declination when evaluating fundamental frequency for tone identification.

9.
Relying on a corpus of thirty narrative discourses, the roles of pitch and duration of prosodic words in sentence accent were studied in discourse context. First, the pitch was normalized. Then, according to pitch range, sentences and prosodic words were each classified into three ranks: strengthened, normal, and weakened. At the same time, sentence accent was classified into two levels, primary and secondary, by perceptual evaluation. The results showed that the pitch range of a prosodic word relative to that of its sentence contributed dominantly to sentence accent. Furthermore, the roles of pitch and duration in sentence accent were affected interactively by the rank of the sentence and of the prosodic words. In normal prosodic words, primary sentence accents were realized by pitch and duration together, while secondary sentence accents depended mainly on the variation of pitch. In strengthened prosodic words, the role of duration in sentence accent was more significant when the pitch range of the sentence was more compressed. Finally, it was found that the correlation between pitch and duration was influenced primarily by the strength of prosodic words: in weakened, normal, and strengthened prosodic words, the correlations between pitch and duration were positive, null, and negative, respectively.

10.
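Using laboratory sentences designed with specific tone combinations and contexts, this study examined how prosodic phrase boundaries affect downstep and post-focus pitch lowering within a sentence, as well as the domains of downstep and focus. In sentences composed of two prosodic phrases, the prosodic phrase boundary blocked the downstep effect of the preceding phrase, so the domain of downstep is the prosodic phrase. Focus behaves differently: the forward (post-focus) pitch lowering crosses the prosodic phrase boundary and clearly lowers the top line of the following prosodic phrase; if the following phrase contains downstep, the cross-boundary lowering from focus accumulates with the downstep effect and yields an even lower top line, indicating that the domain of focus is the intonational phrase. When the following prosodic phrase also contains a focus, however, pitch reset blocks the forward pitch lowering from the focus in the preceding phrase, and the two foci are realized independently.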

11.
The effects of the prosodic phrase (PP) boundary on the pitch lowering of downstep and focus, as well as their domains, were investigated in Chinese Putonghua using designed sentences consisting of two prosodic phrases (PP1 and PP2). The results showed that: (1) The PP boundary blocked the downstep effect of the preceding phrase, indicating that the PP is the domain of downstep. (2) The post-focus F0 lowering effect in PP1 spread across the PP boundary and lowered the F0 contour of PP2. If there is a downstep effect in PP2, the post-boundary compression effect of the prior focus accumulates with the downstep, producing a further lowered contour. Therefore, the domain of focus is the intonational phrase (IP). (3) When there is one contrastive focus in each phrase, the prominent pitch reset elicited by the second focus blocks the F0 lowering effect of PP1 on PP2, and the two foci are realized independently.

12.
Recent research has found that while speaking, subjects react to perturbations in pitch of voice auditory feedback by changing their voice fundamental frequency (F0) to compensate for the perceived pitch-shift. The long response latencies (150-200 ms) suggest they may be too slow to assist in on-line control of the local pitch contour patterns associated with lexical tones on a syllable-to-syllable basis. In the present study, we introduced pitch-shifted auditory feedback to native speakers of Mandarin Chinese while they produced disyllabic sequences /ma ma/ with different tonal combinations at a natural speaking rate. Voice F0 response latencies (100-150 ms) to the pitch perturbations were shorter than syllable durations reported elsewhere. Response magnitudes increased from 50 cents during static tone to 85 cents during dynamic tone productions. Response latencies and peak times decreased in phrases involving a dynamic change in F0. The larger response magnitudes and shorter latency and peak times in tasks requiring accurate, dynamic control of F0, indicate this automatic system for regulation of voice F0 may be task-dependent. These findings suggest that auditory feedback may be used to help regulate voice F0 during production of bi-tonal Mandarin phrases.
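The response magnitudes above are given in cents, a logarithmic unit with 1200 cents per octave. As a small aid to reading those figures, the sketch below converts between cents and Hz; the 200 Hz baseline is a hypothetical value, not one reported in the abstract.

```python
import math

def cents(f_hz, ref_hz):
    """Interval from ref_hz to f_hz in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(f_hz / ref_hz)

def shift_by_cents(f_hz, n_cents):
    """Frequency reached by moving f_hz up (or down) by n_cents."""
    return f_hz * 2.0 ** (n_cents / 1200.0)

baseline_hz = 200.0  # hypothetical speaking F0
for magnitude in (50, 85):  # response magnitudes reported in the abstract
    shifted = shift_by_cents(baseline_hz, magnitude)
    print(f"{magnitude} cents above {baseline_hz:.0f} Hz is {shifted:.1f} Hz")
```

Because the unit is logarithmic, the same number of cents corresponds to a larger change in Hz for a higher baseline F0.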

13.
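An analysis of read news discourse in Mandarin Chinese (Putonghua) shows that when a clause is placed within a discourse segment, the pitch and duration that mark its accent change. Compared with isolated clauses, clauses within a segment show pitch changes concentrated on the clause nucleus, in the form of an overall lowering of the high pitch points, and the degree of lowering differs across accent types. Within a segment, clause accents that are not the segment accent are noticeably weakened, with lower pitch and shorter syllable duration. In a segment made up of several clauses, speakers can vary the strength of each clause accent to adjust the prosody of the segment, and thereby control discourse prosody as a whole and express meaning fluently. Research on segment accent and clause accent brings experimental phonetics into the teaching of broadcast speech and also helps with prosody control in Chinese speech synthesis.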

14.
Infant-directed speech (IDS) is believed to facilitate language learning. However, the benefit may be either due to clearer acoustic correlates to linguistic structures, or simply increased attention from infants induced by the exaggerated prosody of IDS. This study investigated the pure effect of IDS pitch on lexical tone learning, with attentional/affective factors removed by using artificial neural networks. Following training with the pitch of Mandarin tones in IDS versus adult-directed speech, the networks yielded equal tonal categorization for both registers. IDS pitch produced no additional linguistic support. IDS pitch appears to strictly play the non-linguistic role of attention/affect, which may indirectly benefit learning.

15.
There is a tendency across languages to use a rising pitch contour to convey question intonation and a falling pitch contour to convey a statement. In a lexical tone language such as Mandarin Chinese, rising and falling pitch contours are also used to differentiate lexical meaning. How, then, does the multiplexing of the F0 channel affect the perception of question and statement intonation in a lexical tone language? This study investigated the effects of lexical tones and focus on the perception of intonation in Mandarin Chinese. The results show that lexical tones and focus impact the perception of sentence intonation. Question intonation was easier for native speakers to identify on a sentence with a final falling tone and more difficult to identify on a sentence with a final rising tone, suggesting that tone identification intervenes in the mapping of F0 contours to intonational categories and that tone and intonation interact at the phonological level. In contrast, there is no evidence that the interaction between focus and intonation goes beyond the psychoacoustic level. The results provide insights that will be useful for further research on tone and intonation interactions in both acoustic modeling studies and neurobiological studies.

16.
A novel method based on a statistical model for fundamental-frequency (F0) synthesis in Mandarin text-to-speech is proposed. Specifically, a statistical model is employed to determine the relationship between F0 contour patterns of syllables and linguistic features representing the context. Parameters of the model were empirically estimated from a large training set of sentential utterances. Phonologic rules are then automatically deduced through the training process and implicitly memorized in the model. In the synthesis process, contextual features are extracted from a given input text, and the best estimates of the F0 contour patterns of syllables are then found by a Viterbi algorithm using the well-trained model. This method can be regarded as employing a stochastic grammar to reduce the number of candidate F0 contour patterns at each decision point of synthesis. Although linguistic features on various levels of input text can be incorporated into the model, only some relevant contextual features extracted from neighboring syllables were used in this study. Performance of this method was examined by simulation using a database composed of nine repetitions of 112 declarative sentential utterances of the same text, all spoken by a single speaker. By closely examining the well-trained model, some evidence was found to show that the declination effect as well as several sandhi rules are implicitly contained in the model. Experimental results show that 77.56% of synthesized F0 contours coincide with the VQ-quantized counterpart of the original natural speech. Naturalness of the synthesized speech was confirmed by an informal listening test.
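The Viterbi search named in this abstract can be sketched on a toy first-order model. Everything specific below is hypothetical: the two F0 pattern labels, the tone-label "contextual features", and all probabilities are invented for illustration and are not taken from the paper's trained model.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Most probable state sequence for obs under a first-order hidden Markov model."""
    score = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    back = []  # back[i][s] = best predecessor of state s at step i + 1
    for o in obs[1:]:
        prev, score, ptr = score, {}, {}
        for s in states:
            best = max(states, key=lambda p: prev[p] + log_trans[p][s])
            score[s] = prev[best] + log_trans[best][s] + log_emit[s][o]
            ptr[s] = best
        back.append(ptr)
    last = max(states, key=lambda s: score[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

def logs(table):
    """Convert a nested dict of probabilities to log probabilities."""
    return {k: {j: math.log(p) for j, p in row.items()} for k, row in table.items()}

# Hypothetical inventory: two F0 contour patterns, observed features are tone labels.
states = ["P_high", "P_low"]
log_start = {"P_high": math.log(0.5), "P_low": math.log(0.5)}
log_trans = logs({"P_high": {"P_high": 0.6, "P_low": 0.4},
                  "P_low": {"P_high": 0.5, "P_low": 0.5}})
log_emit = logs({"P_high": {"T1": 0.8, "T3": 0.2},
                 "P_low": {"T1": 0.3, "T3": 0.7}})

best_path = viterbi(["T1", "T3", "T3"], states, log_start, log_trans, log_emit)
print(best_path)
```

Working in log probabilities avoids numerical underflow on long sentences, and the search cost grows linearly with the number of syllables instead of exponentially.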

17.
The differences in pitch and duration of Chinese syllables between Putonghua (PTH) and Taiwan Mandarin (TM) were studied. The speech materials included both isolated syllables and sentences. The results reveal the following. For isolated syllables, T1 and T2 in TM are influenced by the Minnan dialect, so their pitch is lower than in PTH. T3 is fall-rise in PTH but falling in TM. Moreover, the ordering of syllable duration by tone is T3 > T2 > T1 > T4 in PTH but T1 > T2 > T3 > T4 in TM. For syllables in sentences, T2 is mid-rise in PTH but mid-level in TM. T3 is longer than T4 but shorter than T1 or T2 in PTH, while it is the shortest in TM. Furthermore, the effects of the prosodic phrase boundary on duration are almost the same across tones in PTH, but the lengthened part of T1 or T2 is longer than that of T3 or T4 in TM.

18.
The three experiments reported here compare the effectiveness of natural prosodic and vocal-tract size cues at overcoming spatial cues in selective attention. Listeners heard two simultaneous sentences and decided which of two simultaneous target words came from the attended sentence. Experiment 1 used sentences that had natural differences in pitch and in level caused by a change in the location of the main sentence stress. The sentences' pitch contours were moved apart or together in order to separate out effects due to pitch and those due to other prosodic factors such as intensity. Both pitch and the other prosodic factors had an influence on which target word was reported, but the effects were not strong enough to override the spatial difference produced by an interaural time difference of ±91 microseconds. In experiment 2, a large (±15%) difference in apparent vocal-tract size between the speakers of the two sentences had an additional and strong effect, which, in conjunction with the original prosodic differences, overrode an interaural time difference of ±181 microseconds. Experiment 3 showed that vocal-tract size differences of ±4% or less had no detectable effect. Overall, the results show that prosodic and vocal-tract size cues can override spatial cues in determining which target word belongs in an attended sentence.

19.
The multidimensional perceptual structure of Chinese syllables is studied by psychophysical experiment. The results indicate that F0 and duration are related to the two main dimensions of the perceptual structure of Chinese syllables. Prosodic characteristics, such as the position of the syllable in the prosodic hierarchy and stress, lead to different distributions of syllables in the perceptual space.

20.
Schematic fundamental frequency curves of simple statements and questions are generated for Hausa, a two-tone language of Nigeria, using a modified version of an intonational model developed by Gårding and Bruce [Nordic Prosody II, edited by T. Fretheim (Tapir, Trondheim, 1981), pp. 33-39]. In this model, rules for intonation and tones are separated. Intonation is represented as sloping grids of (near) parallel lines, inside which tones are placed. The tones are associated with turning points of the fundamental frequency contour. Local rules may also modify the exact placement of a tone within the grid. The continuous fundamental frequency contour is modeled by concatenating the tonal points using polynomial equations. Thus the final pitch contour is modeled as an interaction between global and local factors. The slope of the intonational grid lines depends at least on sentence type (statement or question), sentence length, and tone pattern. The model is tested by reference to data from nine speakers of Kano Hausa.
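A rough sketch of the grid idea described above: tones are placed as turning points on the top or bottom line of a sloping grid, and the points are then joined by polynomial segments. The abstract does not specify the polynomial or any parameter values, so the cubic with zero slope at each turning point, the two-line grid, and all numbers below are assumptions for illustration only.

```python
def grid_value(t, tone, start_hz, slope_hz_per_s, width_hz):
    """F0 of a turning point at time t: H sits on the top grid line, L on the bottom."""
    top = start_hz + slope_hz_per_s * t
    return top if tone == "H" else top - width_hz

def contour(t, points):
    """Join consecutive turning points with a cubic that is flat at both ends."""
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            u = (t - t0) / (t1 - t0)
            return f0 + (f1 - f0) * (3 * u ** 2 - 2 * u ** 3)
    raise ValueError("t outside the modeled interval")

# Hypothetical statement: falling grid, tone pattern H L H at three time points.
grid = dict(start_hz=140.0, slope_hz_per_s=-20.0, width_hz=40.0)
tones = [(0.1, "H"), (0.3, "L"), (0.5, "H")]
points = [(t, grid_value(t, tone, **grid)) for t, tone in tones]

for t in (0.1, 0.2, 0.3, 0.4, 0.5):
    print(f"t={t:.1f}s  F0={contour(t, points):.1f} Hz")
```

Separating `grid_value` (global intonation) from `contour` (local tone-to-tone movement) mirrors the split between intonation rules and tone rules that the abstract describes; a question could be sketched by changing only the grid slope.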
