1.

A text-to-speech system with high intelligibility and naturalness for Chinese (total citations: 1; self-citations: 0; citations by others: 1)

CHU Min LU Shinan 《声学学报：英文版》1996,(1)

I. Introduction. Researches on Chinese synthesis disclose that only when both the segmental and suprasegmental features of the synthetic speech are similar to those of the natural one, the synthetic speech will sound intelligible and natural [1]. Among existing synthetic techniques, the approach based on acoustic parameters can adjust both the segmental and suprasegmental features of synthetic units flexibly and can be considered as the most reasonable synthetic technique in theory. However, the parameter based synthesizer is over-dependent on the developments of paramet…

2.

Interacting effects of syllable and phrase position on consonant articulation

Byrd D Lee S Riggs D Adams J 《The Journal of the Acoustical Society of America》2005,118(6):3860-3873

The complexities of how prosodic structure, both at the phrasal and syllable levels, shapes speech production have begun to be illuminated through studies of articulatory behavior. The present study contributes to an understanding of prosodic signatures on articulation by examining the joint effects of phrasal and syllable position on the production of consonants. Articulatory kinematic data were collected for five subjects using electromagnetic articulography (EMA) to record target consonants (labial, labiodental, and tongue tip), located in (1) either syllable final or initial position and (2) either at a phrase edge or phrase medially. Spatial and temporal characteristics of the consonantal constriction formation and release were determined based on kinematic landmarks in the articulator velocity profiles. The results indicate that syllable and phrasal position consistently affect the movement duration; however, effects on displacement were more variable. For most subjects, the boundary-adjacent portions of the movement (constriction release for a preboundary coda and constriction formation for a postboundary onset) are not differentially affected in terms of phrasal lengthening; both lengthen comparably.

3.

Modulation of the acoustic features of clauses by discourse-segment stress in Mandarin

陈玉东 吕士楠 杨玉芳 《声学学报》2009,34(4):378-384

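An analysis of read news discourse in Mandarin Chinese shows that when a clause is embedded in a discourse segment, the pitch and duration that mark its stress change. Compared with isolated clauses, the pitch changes in embedded clauses are concentrated on the clause nucleus and take the form of an overall lowering of the high pitch points, with the degree of lowering differing across stress types. Within a discourse segment, clause stresses that do not carry the discourse-segment stress are clearly weakened, showing lower pitch and shorter syllable duration. In a discourse segment made up of several clauses, speakers can use the varying strength of the clause stresses to adjust the prosody of the segment, and thereby achieve overall control of discourse prosody and fluent expression of meaning. Research on discourse-segment stress and clause stress brings experimental phonetics into the teaching of broadcast speech and also benefits prosody control in Chinese speech synthesis.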

4.

Suprasegmental and segmental timing models in Mandarin Chinese and American English

van Santen JP Shih C 《The Journal of the Acoustical Society of America》2000,107(2):1012-1026

This paper formalizes and tests two key assumptions of the concept of suprasegmental timing: segmental independence and suprasegmental mediation. Segmental independence holds that the duration of a suprasegmental unit such as a syllable or foot is only minimally dependent on its segments. Suprasegmental mediation states that the duration of a segment is determined by the duration of its suprasegmental unit and its identity, but not directly by the specific prosodic context responsible for suprasegmental unit duration. Both assumptions are made by various versions of the isochrony hypothesis [I. Lehiste, J. Phonetics 5, 253-263 (1977)], and by the syllable timing hypothesis [W. Campbell, Speech Commun. 9, 57-62 (1990)]. The validity of these assumptions was studied using the syllable as suprasegmental unit in American English and Mandarin Chinese. To avoid unnatural timing patterns that might be induced when reading carrier phrase material, meaningful, nonrepetitive sentences were used with a wide range of lengths. Segmental independence was tested by measuring how the average duration of a syllable in a fixed prosodic context depends on its segmental composition. A strong association was found; in many cases the increase in average syllabic duration when one segment was substituted for another (e.g., bin versus pin) was the same as the difference in average duration between the two segments (i.e., [b] versus [p]). Thus, the [i] and [n] were not compressed to make room for the longer [p], which is inconsistent with segmental independence. Syllabic mediation was tested by measuring which locations in a syllable are most strongly affected by various contextual factors, including phrasal position, within-word position, tone, and lexical stress. Systematic differences were found between these factors in terms of the intrasyllabic locus of maximal effect. These and earlier results obtained by van Son and van Santen [R. J. J. H van Son and J. P. H. van Santen, "Modeling the interaction between factors affecting consonant duration," Proceedings Eurospeech-97, 1997, pp. 319-322] showing a three-way interaction between consonantal identity (coronals vs labials), within-word position of the syllable, and stress of surrounding vowels, imply that segmental duration cannot be predicted by compressing or elongating segments to fit into a predetermined syllabic time interval. In conclusion, while there is little doubt that suprasegmental units play important predictive and explanatory roles as phonological units, the concept of suprasegmental timing is less promising.
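
The substitution test described in this abstract (bin versus pin) can be illustrated with a minimal sketch. The function name `substitution_check` and all duration values are hypothetical and invented for illustration; they are not the authors' procedure or data. The idea is only to compare the change in mean syllable duration with the change in mean segment duration.

```python
from statistics import mean


def substitution_check(syll_a, syll_b, seg_a, seg_b):
    """Compare the syllable-level duration change with the segment-level change.

    Each argument is a list of durations in milliseconds.
    Returns (syllable_diff, segment_diff, ratio).
    """
    syllable_diff = mean(syll_b) - mean(syll_a)
    segment_diff = mean(seg_b) - mean(seg_a)
    # Guard against division by zero when the two segments have equal means.
    ratio = syllable_diff / segment_diff if segment_diff else float("nan")
    return syllable_diff, segment_diff, ratio


# Invented durations (ms), for illustration only; not data from the paper.
bin_syllables = [240.0, 250.0, 260.0]
pin_syllables = [270.0, 280.0, 290.0]
b_segments = [60.0, 70.0, 80.0]
p_segments = [90.0, 100.0, 110.0]

syll_diff, seg_diff, ratio = substitution_check(
    bin_syllables, pin_syllables, b_segments, p_segments
)
print(syll_diff, seg_diff, ratio)
```

Under this reading, a ratio near 1 means the remaining segments were not compressed to absorb the longer consonant, which is the pattern the abstract reports as inconsistent with segmental independence; a ratio near 0 would mean the syllable duration stayed roughly fixed.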

5.

A statistics-based pitch contour model for Mandarin speech

Chen SH Lai WH Wang YR 《The Journal of the Acoustical Society of America》2005,117(2):908-925

6.

Segmental durations in the vicinity of prosodic phrase boundaries

C W Wightman S Shattuck-Hufnagel M Ostendorf P J Price 《The Journal of the Acoustical Society of America》1992,91(3):1707-1717

Numerous studies have indicated that prosodic phrase boundaries may be marked by a variety of acoustic phenomena including segmental lengthening. It has not been established, however, whether this lengthening is restricted to the immediate vicinity of the boundary, or if it extends over some larger region. In this study, segmental lengthening in the vicinity of prosodic boundaries is examined and found to be restricted to the rhyme of the syllable preceding the boundary. By using a normalized measure of segmental lengthening, and by compensating for differences in speaking rate, it is also shown that at least four distinct types of boundaries can be distinguished on the basis of this lengthening.
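
One simple way to picture a normalized measure of segmental lengthening is a per-phone z-score, sketched below. This is an assumption chosen for illustration: the abstract does not specify the normalization, and the sketch does not model the speaking-rate compensation it mentions. The function name `normalize_durations` and the sample tokens are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_durations(tokens):
    """Return one z-score per token.

    tokens: list of (phone_label, duration_ms) pairs.
    Each duration is normalized against the mean and population standard
    deviation of all tokens that share the same phone label.
    """
    by_phone = defaultdict(list)
    for phone, duration in tokens:
        by_phone[phone].append(duration)

    stats = {p: (mean(d), pstdev(d)) for p, d in by_phone.items()}

    scores = []
    for phone, duration in tokens:
        mu, sigma = stats[phone]
        # A phone with no spread (or a single token) gets a neutral score.
        scores.append((duration - mu) / sigma if sigma > 0 else 0.0)
    return scores


# Invented tokens, for illustration only.
tokens = [("a", 80.0), ("a", 120.0), ("n", 50.0), ("n", 70.0)]
print(normalize_durations(tokens))
```

Normalizing within each phone removes intrinsic duration differences between phones, so a positive score can be read as lengthening relative to that phone's typical duration.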

7.

Effects of focus, word stress, and boundary tone on the F0 pattern of intonational-phrase-final words

张璐 祖漪清 闫润强 《声学学报》2012,37(4):448-456

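This study examined how focus, word stress position, and a rising boundary tone at intonational phrase boundaries affect the F0 pattern of the intonational-phrase-final word. By analyzing the acoustic features of intonational-phrase-final words in two American English corpora, we found that when the word is in focus, the F0 peak of the stressed syllable is higher than the final value of the boundary tone; the boundary tone is fully expressed only after the stress has been realized; and moving the word stress later in the syllable structure compresses the post-stress F0 range. When the final word is not in focus, the rising trend of the boundary tone appears from the start of the word and suppresses the F0 prominence of the word stress. We conclude that focus can be realized by raising the F0 peak of the word stress; that the strength with which focus and the boundary tone are realized is limited by the position of the word stress; and that, in the extreme case, the boundary tone can only be realized at the end of the last syllable of the intonational phrase. On a limited stretch of segments, each of these prosodic features has a region in which it expresses its function most fully, and they compete for expression, one gaining as another recedes.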

8.

Chinese dialect identification using segmental and prosodic features

Chang WW Tsai WH 《The Journal of the Acoustical Society of America》2000,108(4):1906-1913

Several approaches to Chinese dialect identification based on segmental and prosodic features of speech are described in this paper. When using segmental information only, the system performs phonotactic analysis after speech utterances have been tokenized into sequences of broad phonetic classes. The second scheme comprises prosodic models which are trained to capture tone sequence information for individual dialects. Also proposed is a novel approach that examines differences between Chinese dialects at broad phonetic and prosodic levels. These algorithms were evaluated via a multispeaker read-speech mode. Simulation results indicate that the combined use of segmental and prosodic features allows the proposed system to discriminate among three major Chinese dialects spoken in Taiwan with 93.0% accuracy.

9.

The effects of prosodic boundaries on nasality in Taiwan Min

Pan HH 《The Journal of the Acoustical Society of America》2007,121(6):3755-3769

This study explores the effects of prosodic boundaries on nasality at intonational phrase, word, and syllable boundaries. The subjects were recorded saying phrases that contained a syllable-final nasal consonant followed by a syllable-initial stop. The timing, duration, and magnitude of the nasal airflows measured were used to determine the extent of nasality across boundaries. Nasal amplitudes were found to vary in a speaker-dependent manner among boundary types. However, the patterns of nasal contours and temporal aspects of the airflow parameters consistently varied with boundary type across all the speakers. In general, the duration of nasal airflow and nasal plateau were the longest at the intonational phrase boundary, followed by word boundary and then syllable boundary. In addition to the hierarchical influence of boundary strength, there were unique phonetic markings associated with individual boundaries. In particular, two nasal rises interrupted by nasal inhalation occurred only across an intonation phrase boundary. Also, unexpectedly, a word boundary was marked by the longest postboundary vowel, whereas a syllable boundary was marked with the shortest nasal duration. The results here support the hierarchical effect of boundary on both domain-edge strengthening and cross-boundary coarticulation.

10.

Focus in production: tonal shape, intensity and word order

Vainio M Järvikivi J 《The Journal of the Acoustical Society of America》2007,121(2):EL55-EL61

The effect of word order and prosodic focus on the tonal shape and intensity in the production of prosody was studied. The results show that the production of focus in Finnish follows a global pattern with regard to tonal features. The relative pitch height difference between contrasted words is the most important pitch-related factor in signaling narrow prosodic focus. Narrow focus is not localized to prosodically emphasized words only but relates to the utterance as a whole. It was also found that syntactic structure with respect to both intensity and tonal structure modulated relative prosodic prominence of individual words.

11.

Contrastive analysis on tone between Putonghua and Taiwan Mandarin

DENG Dan SHI Feng LÜ Shinan 《声学学报：英文版》2008,27(2):150-158

The differences in pitch and duration of Chinese syllables between Putonghua (PTH) and Taiwan Mandarin (TM) were studied. The speech materials comprised both isolated syllables and sentences. The results reveal the following. For isolated syllables, T1 and T2 in TM are influenced by the Minnan dialect, so their pitch is lower than in PTH. T3 is fall-rise in PTH but falling in TM. Moreover, the ordering of syllable duration across tones is T3 > T2 > T1 > T4 in PTH but T1 > T2 > T3 > T4 in TM. For syllables in sentences, T2 is mid-rise in PTH but mid-level in TM, and T3 is longer than T4 but shorter than T1 or T2 in PTH, whereas it is the shortest in TM. Furthermore, the effects of prosodic phrase boundaries on duration are almost the same across tones in PTH, but in TM the lengthened part of T1 or T2 is longer than that of T3 or T4.

12.

Experimental study on declination in Chinese intonation

GAO Lu HUANG Xianjun YANG Yufang LÜ Shinan 《声学学报：英文版》2011,30(2):214-226

Sentences with special tone combinations were designed to investigate the declination tendency of intonation in Chinese Putonghua. The results show that the baselines of prosodic phrases (PP), as the basic declination units, clearly exhibit declination, and that the declination slope is inversely proportional to the length of the PP. Different tone combinations within a PP yield different declination slopes. The primary conclusions are as follows: (1) if the low points are at the beginning of a prosodic word (PW), the absolute value of the baseline slope is larger than when they are at the end; (2) the declination of a PP is larger at the beginning of a sentence than at the end; (3) when a sentence contains several PPs, the degree of declination differs with their syntactic relations, but they share the common property of PP declination.
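
A baseline declination slope of the kind discussed in this abstract could be estimated with an ordinary least-squares line through the low pitch points of a prosodic phrase. The sketch below is an assumption for illustration only: the abstract does not state how the baseline was fitted, the function name `baseline_slope` is hypothetical, and the sample values (time in seconds, F0 in Hz) are invented.

```python
def baseline_slope(times, f0_lows):
    """Least-squares slope of low F0 points against time.

    times: time stamps of the low points, in seconds.
    f0_lows: F0 values at those points, in Hz.
    A negative result indicates declination.
    """
    n = len(times)
    if n < 2 or n != len(f0_lows):
        raise ValueError("need at least two points and equal-length inputs")

    t_mean = sum(times) / n
    f_mean = sum(f0_lows) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time stamps must not all be identical")
    sxy = sum((t - t_mean) * (f - f_mean) for t, f in zip(times, f0_lows))
    return sxy / sxx


# Invented low points, for illustration only.
times = [0.0, 0.5, 1.0, 1.5]
lows = [120.0, 115.0, 110.0, 105.0]
print(baseline_slope(times, lows))
```

With slopes computed per prosodic phrase, the reported relation between slope and phrase length could then be examined by comparing the absolute slope against each phrase's duration or syllable count.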

13.

Acoustic and perceptual cues for compound-phrasal contrasts in Vietnamese

Nguyen AT Ingram JC 《The Journal of the Acoustical Society of America》2007,122(3):1746

This paper reports two series of experiments that examined the phonetic correlates of lexical stress in Vietnamese compounds in comparison to their phrasal constructions. In the first series of experiments, acoustic and perceptual characteristics of Vietnamese compound words and their phrasal counterparts were investigated on five likely acoustic correlates of stress or prominence (f0 range and contour, duration, intensity and spectral slope, vowel reduction), elicited under two distinct speaking conditions: a "normal speaking" condition and a "maximum contrast" condition which encouraged speakers to employ prosodic strategies for disambiguation. The results suggested that Vietnamese lacks phonetic resources for distinguishing compounds from phrases lexically and that native speakers may employ a phrase-level prosodic disambiguation strategy (juncture marking), when required to do so. However, in a second series of experiments, minimal pairs of bisyllabic coordinative compounds with reversible syllable positions were examined for acoustic evidence of asymmetrical prominence relations. Clear evidence of asymmetric prominences in coordinative compounds was found, supporting independent results obtained from an analysis of reduplicative compounds and tone sandhi in Vietnamese [Nguyễn and Ingram, 2006]. A reconciliation of these apparently conflicting findings on word stress in Vietnamese is presented and discussed.

14.

Statistical relationship between syllable articulation and phoneme articulation

张家騄 《物理学报》1974,23(5):17-22

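On the basis of a large number of Mandarin articulation tests and a detailed analysis of Chinese syllable structure, we argue that the internal information of the language, that is, its structural regularities, is a very important factor in the subjective recognition of speech. This is especially so when the speech signal is subject to heavy interference or distortion and its physical characteristics no longer provide sufficient conditions for recognition; the internal information then plays a larger role. This paper proposes a statistical relationship between syllable articulation and phoneme articulation that takes this internal information into account. The relationship agrees with the results of a large number of tests better than the statistical relationship established by Fletcher and Steinberg.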

15.

Prosodic strengthening and featural enhancement: evidence from acoustic and articulatory realizations of /a,i/ in English

Cho T 《The Journal of the Acoustical Society of America》2005,117(6):3867-3878

This study investigates the effects of accent and prosodic boundaries on the production of English vowels (/a,i/) by concurrently examining acoustic vowel formants and articulatory maxima of the tongue, jaw, and lips obtained with EMA (Electromagnetic Articulography). The results demonstrate that prosodic strengthening (due to accent and/or prosodic boundaries) has differential effects depending on the source of prominence (in accented syllables versus at edges of prosodic domains; domain initially versus domain finally). The results are interpreted in terms of how the prosodic strengthening is related to phonetic realization of vowel features. For example, when accented, /i/ was fronter in both acoustic and articulatory vowel spaces (enhancing [-back]), accompanied by an increase in both lip and jaw openings (enhancing sonority). By contrast, at edges of prosodic domains (especially domain-finally), /i/ was not necessarily fronter, but higher (enhancing [+high]), accompanied by an increase only in the lip (not jaw) opening. This suggests that the two aspects of prosodic structure (accent versus boundary) are differentiated by distinct phonetic patterns. Further, it implies that prosodic strengthening, though manifested in fine-grained phonetic details, is not simply a low-level phonetic event but a complex linguistic phenomenon, closely linked to the enhancement of phonological features and positional strength that may license phonological contrasts.

16.

An experimental study of pitch declination in Chinese intonation (total citations: 2; self-citations: 0; citations by others: 2)

黄贤军 高路 杨玉芳 吕士楠 《声学学报》2009,34(2):158-166

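By extracting and analyzing the pitch contours of laboratory sentences with specific tone combinations, this study examined pitch declination in Chinese sentences under controlled conditions. The results show that, in declarative sentences with different types of tone combinations, the baseline (the line through the low pitch points) clearly declines, with the prosodic phrase as the basic unit, and the slope of declination is inversely proportional to the length of the prosodic phrase. The baseline slope differs with the tone combination and with the position, within the prosodic word, of the syllable that carries the low point. Specifically: (1) when the low points fall at the beginning of the prosodic word, the absolute value of the baseline slope is larger than when they fall at the end; (2) the degree of declination of a prosodic phrase also depends on its position in the sentence, with sentence-initial prosodic phrases declining more than sentence-final ones; (3) when the main clause contains several prosodic phrases, the starting points of their baselines may fall monotonically one after another, and the specific declination pattern is constrained by the syntactic and semantic relations between the phrases.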

17.

Statistical characteristics of Mandarin initials and finals

孙金城 倪宏 莫福源 李昌立 《应用声学》1995,14(3):35-41

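This paper presents a statistical analysis of the dynamic and static distributions of initials and finals in written Mandarin and of the differences between them. The results show that the relative distributions among initials, and among finals, are consistent between the dynamic and static counts; the relative distribution of speech sounds depends mainly on the vocal system and is not affected by frequency of use. The differences between the dynamic and static occurrence rates of initials and finals are related to the manner of articulation of the initial and the combinatorial structure of the final, to the pairing of the initial's place of articulation with the four classes (sihu) of finals, and to the proportion of syllables realized as characters and the frequency of those characters. The difference between dynamic and static occurrence rates is largest for aspirated versus unaspirated initials and for finals; the dynamic [counts] of finals in polysyllabic words…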

18.

Spectral structure across the syllable specifies final-stop voicing for adults and children alike

Nittrouer S Lowenstein JH 《The Journal of the Acoustical Society of America》2008,123(1):377-385

Traditional accounts of speech perception generally hold that listeners use isolable acoustic "cues" to label phonemes. For syllable-final stops, duration of the preceding vocalic portion and formant transitions at syllable's end have been considered the primary cues to voicing decisions. The current experiment tried to extend traditional accounts by asking two questions concerning voicing decisions by adults and children: (1) What weight is given to vocalic duration versus spectral structure, both at syllable's end and across the syllable? (2) Does the naturalness of stimuli affect labeling? Adults and children (4, 6, and 8 years old) labeled synthetic stimuli that varied in vocalic duration and spectral structure, either at syllable's end or earlier in the syllable. Results showed that all listeners weighted dynamic spectral structure, both at syllable's end and earlier in the syllable, more than vocalic duration, and listeners performed with these synthetic stimuli as listeners had performed previously with natural stimuli. The conclusion for accounts of human speech perception is that rather than simply gathering acoustic cues and summing them to derive strings of phonemic segments, listeners are able to attend to global spectral structure, and use it to help recover explicitly phonetic structure.

19.

Study on automatic prediction of sentential stress for Chinese Putonghua Text-to-Speech system with natural style

SHAO Yanqiu HAN Jiqing ZHAO Yongzhen LIU Ting 《声学学报：英文版》2007,26(1):49-62

Stress is an important parameter for prosody processing in speech synthesis. In this paper, we compare the acoustic features of neutral tone syllables and strong stress syllables with those of moderate stress syllables, including pitch, syllable duration, intensity, and pause length after the syllable. The relations between duration and pitch, and between the Third Tone (T3) and pitch, are also studied. Three stress prediction models based on ANN, i.e. the acoustic model, the linguistic model and the mixed model, are presented for predicting Chinese sentential stress. The results show that the mixed model performs better than the other two models. In order to solve the problem of the diversity of manual labeling, an evaluation index of support ratio is proposed.

20.

Phrase boundary effects on the temporal kinematics of sequential tongue tip consonants

Byrd D Lee S Campos-Astorkiza R 《The Journal of the Acoustical Society of America》2008,123(6):4456-4465

This study evaluates the effects of phrase boundaries on the intra- and intergestural kinematic characteristics of blended gestures, i.e., overlapping gestures produced with a single articulator. The sequences examined are the juncture geminate [d(#)d], the sequence [d(#)z], and, for comparison, the singleton tongue tip gesture in [d(#)b]. This allows the investigation of the process of gestural aggregation [Munhall, K. G., and Lofqvist, A. (1992). "Gestural aggregation in speech: laryngeal gestures," J. Phonetics 20, 93-110] and the manner in which it is affected by prosodic structure. Juncture geminates are predicted to be affected by prosodic boundaries in the same way as other gestures; that is, they should display prosodic lengthening and lesser overlap across a boundary. Articulatory prosodic lengthening is also investigated using a signal alignment method of the functional data analysis framework [Ramsay, J. O., and Silverman, B. W. (2005). Functional Data Analysis, 2nd ed. (Springer-Verlag, New York)]. This provides the ability to examine a time warping function that characterizes relative timing difference (i.e., lagging or advancing) of a test signal with respect to a given reference, thus offering a way of illuminating local nonlinear deformations at work in prosodic lengthening. These findings are discussed in light of the pi-gesture framework of Byrd and Saltzman [(2003) "The elastic phrase: Modeling the dynamics of boundary-adjacent lengthening," J. Phonetics 31, 149-180].