Similar literature
 20 similar articles found (search time: 218 ms)
1.
Over the past 10 years, a Chinese text-to-speech system, including a phonetic library, a static tone model, and basic synthesis rules, has been established at IAAS. Synthesis of unrestricted-vocabulary Chinese has been achieved, but further steps must be taken to improve the naturalness of the synthesized speech. The effects of segmental and suprasegmental features of synthetic speech on naturalness were studied using a subjective assessment method. The results show that rhythm in the time domain and coarticulation are fundamental to improving the naturalness of synthetic speech, and that the fundamental frequency curve determined by the tone model alone is suitable only for synthesizing short Chinese sentences. If linguistic units larger than a simple sentence are to be synthesized, the fundamental frequency curve must be carefully manipulated. This paper presents the experimental method and results, and discusses how to improve the naturalness of synthetic Chinese.

2.
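Analysis of read-aloud Mandarin (Putonghua) news discourse shows that when a clause is placed within a discourse segment, the pitch and duration that mark its stress change. Compared with the same clause in isolation, the pitch change in a discourse-segment clause is concentrated on the clause nucleus and takes the form of an overall lowering of the pitch peaks, with the degree of lowering differing across types of stress. Within a discourse segment, clause stresses that are not the discourse-segment stress are clearly weakened, showing lower pitch and shorter syllable duration. In a discourse segment made up of several clauses, speakers can vary the strength of each clause stress to adjust the prosody of the segment, and thereby achieve overall control of discourse prosody and fluent semantic expression. The study of discourse-segment stress and clause stress brings experimental phonetics into the teaching of broadcast speech and also contributes to prosodic control in Chinese speech synthesis.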

3.
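Chinese utterances usually exhibit pitch declination, but research on the specific pitch behavior of prosodic words within an utterance is still lacking. The dialogue data used in this study were selected from the 973 telephone corpus and comprise 69 dialogues involving 79 speakers; the read-speech data are news broadcasts by two radio presenters, 221 utterances in total. The high points, low points, and pitch ranges of prosodic words within utterances were analyzed. The results show that in both dialogue and read speech the pitch of most utterances runs from high to low, although in longer utterances of spoken dialogue the downward trend in the first half is not very pronounced. Compared with read speech, the pitch range of prosodic words in spoken dialogue is usually smaller. The last prosodic word of a dialogue utterance has a relatively large pitch range, whereas the pitch ranges of utterance-internal prosodic words in read speech mostly do not differ. These results will help in modeling the pitch register and pitch range of utterance-internal prosodic words in speech synthesis.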

4.
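Research and implementation of a Chinese text-to-speech conversion system   Total citations: 4, self-citations: 0, citations by others: 4
This paper focuses on a phonetics-based word segmentation algorithm and on the study of speech prosody rules. It also introduces the unlimited-vocabulary text-to-speech conversion system we developed, which converts text stored in a computer into speech output.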

5.
6.
The effect of accentuation and word duration on the naturalness of speech.   Total citations: 1, self-citations: 0, citations by others: 1
In this study the effect of appropriate word duration and correct (pitch) accentuation on the naturalness of speech was investigated. In the stimulus material, the information value of the target word determined the correctness of accentuation ([new, +accent] and [old, -accent] were defined as correct). Appropriate word duration was defined as either "in agreement with accentuation" ([long, +accent] and [short, -accent]) or "in agreement with information value" ([long, new] and [short, old]). Listeners were asked to give naturalness judgments along a scale from 1 (very unnatural) to 10 (very natural) on fragments consisting of two sentences. Duration and accentuation of the target word, which always occurred in the second sentence, were manipulated separately and in combinations. Judgments show that accentuation that is not in agreement with information value causes a significant decrease of naturalness. When accentuation is in agreement with information value but duration is inappropriate for both factors, the perceived naturalness decreases significantly. However, listeners were unable to give consistent naturalness judgments on the manipulated word durations in fragments with incorrect accent distributions. Based on these results and the findings of an earlier production study [W. Eefting, J. Acoust. Soc. Am. 89, 412-424 (1991)], which showed that duration is not involved in the realization of pitch accent, the following is suggested. Speakers adapt both accentuation and word duration in order to indicate that a word contains relevant information. Presence of an accent distinguishes the word from its (less relevant) environment. A longer duration provides the listener with the extra time that is needed in order to process the word's content adequately.

7.
The perceptual multidimensional structure of Chinese syllables is studied by means of a psychophysical experiment. The results indicate that F0 and duration are correlated with the two main dimensions of the perceptual structure of Chinese syllables. Prosodic characteristics, such as the position of a syllable in the prosodic hierarchy and stress, lead to different distributions of syllables in the perceptual space.

8.
The differences in pitch and duration of Chinese syllables between Putonghua (PTH) and Taiwan Mandarin (TM) were studied. The speech materials include both isolated syllables and sentences. The results are as follows. For isolated syllables, T1 and T2 in TM are influenced by the Minnan dialect, so their pitch is lower than in PTH. T3 is falling-rising in PTH but falling in TM. Moreover, the ordering of syllable duration by tone is T3 > T2 > T1 > T4 in PTH but T1 > T2 > T3 > T4 in TM. For syllables in sentences, T2 is mid-rising in PTH but mid-level in TM. T3 is longer than T4 but shorter than T1 or T2 in PTH, whereas it is the shortest in TM. Furthermore, the effects of prosodic phrase boundaries on duration are almost the same for all tones in PTH, but in TM the lengthened portion of T1 or T2 is longer than that of T3 or T4.

9.
Stress is an important parameter for prosody processing in speech synthesis. In this paper, we compare the acoustic features of neutral-tone syllables and strongly stressed syllables with those of moderately stressed syllables, including pitch, syllable duration, intensity, and the length of the pause after the syllable. The relations between duration and pitch, and between the third tone (T3) and pitch, are also studied. Three ANN-based stress prediction models, namely the acoustic model, the linguistic model, and the mixed model, are presented for predicting Chinese sentential stress. The results show that the mixed model performs better than the other two models. To address the variability of manual labeling, an evaluation index called the support ratio is proposed.

10.
By analyzing the acoustic data of a Chinese news report, the present research explores how the syllable duration and pitch of stress change when isolated clauses are connected into a discourse. When the same clause is compared in isolation and in discourse context, the pitch variation is most evident on the clause nucleus: the top points as a whole fall markedly, and the degree of pitch lowering varies with the kind of stress. When clause stresses are not assigned the status of discourse stress, they are weakened, that is, pitch falls and syllable duration shortens. In a discourse composed of several clauses, speakers can modulate clause prosody by varying the strength of stresses, and thereby achieve overall control of the discourse prosody and exact semantic expression. The findings, based on phonetic material from broadcasts, will shed light on the teaching of news broadcasting and contribute to the prosodic control of Chinese Putonghua synthesis.

11.
Review of text-to-speech conversion for English   Total citations: 7, self-citations: 0, citations by others: 7
The automatic conversion of English text to synthetic speech is presently being performed, remarkably well, by a number of laboratory systems and commercial devices. Progress in this area has been made possible by advances in linguistic theory, acoustic-phonetic characterization of English sound patterns, perceptual psychology, mathematical modeling of speech production, structured programming, and computer hardware design. This review traces the early work on the development of speech synthesizers, discovery of minimal acoustic cues for phonetic contrasts, evolution of phonemic rule programs, incorporation of prosodic rules, and formulation of techniques for text analysis. Examples of rules are used liberally to illustrate the state of the art. Many of the examples are taken from Klattalk, a text-to-speech system developed by the author. A number of scientific problems are identified that prevent current systems from achieving the goal of completely human-sounding speech. While the emphasis is on rule programs that drive a formant synthesizer, alternatives such as articulatory synthesis and waveform concatenation are also reviewed. An extensive bibliography has been assembled to show both the breadth of synthesis activity and the wealth of phenomena covered by rules in the best of these programs. A recording of selected examples of the historical development of synthetic speech, enclosed as a 33 1/3-rpm record, is described in the Appendix.

12.
Using a corpus of thirty narrative discourses, the roles of the pitch and duration of prosodic words in sentence accent were studied in discourse context. First, the pitch was normalized. Then, according to pitch range, sentences and prosodic words were each classified into three ranks: strengthened, normal, and weakened. At the same time, sentence accent was classified into two levels, primary and secondary, by perceptual evaluation. The results showed that the pitch range of a prosodic word relative to that of its sentence contributed dominantly to sentence accent. Furthermore, the roles of pitch and duration in sentence accent were affected interactively by the ranks of the sentence and the prosodic word. In normal prosodic words, primary sentence accents were realized by pitch and duration jointly, while secondary sentence accents depended mainly on pitch variation. In strengthened prosodic words, the role of duration in sentence accent was more significant when the pitch range of the sentence was more compressed. Finally, the correlation between pitch and duration was found to be influenced primarily by the strength of prosodic words: in weakened, normal, and strengthened prosodic words, the correlations were positive, null, and negative, respectively.
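
The abstract above does not say how pitch was normalized or how the pitch-duration correlation was computed. As a rough sketch only, the following assumes a semitone scale relative to a per-speaker reference frequency, a pitch range defined as maximum minus minimum, and a plain Pearson correlation; the function names and the sample numbers are hypothetical and are not taken from the paper.

```python
import math


def hz_to_semitones(f0_hz, reference_hz):
    """Convert an F0 value in Hz to semitones above a reference (assumed normalization)."""
    return 12.0 * math.log2(f0_hz / reference_hz)


def pitch_range_semitones(f0_values_hz, reference_hz):
    """Pitch range of one prosodic word: highest minus lowest voiced F0, in semitones."""
    # Non-positive values are treated as unvoiced frames and skipped.
    voiced = [hz_to_semitones(f, reference_hz) for f in f0_values_hz if f > 0]
    return max(voiced) - min(voiced)


def pearson(xs, ys):
    """Pearson correlation coefficient; assumes neither input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up values for three prosodic words: pitch ranges (semitones) and durations (s).
ranges = [pitch_range_semitones(w, 100.0) for w in
          ([110, 130, 0, 120], [100, 160, 150], [105, 200, 180])]
durations = [0.32, 0.41, 0.55]
print(round(pearson(ranges, durations), 3))
```

Computing such a correlation separately for weakened, normal, and strengthened prosodic words would be one way to obtain the kind of sign pattern the abstract reports.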

13.
Automatic speech recognition using psychoacoustic models.   Total citations: 1, self-citations: 0, citations by others: 1
An approach to automatic speech recognition is described, which, in a straightforward way, follows the concept of (1) preprocessing in terms of auditory parameters and (2) subsequent classification and recognition. The preprocessing system has been realized in analog hardware, while recognition is carried out on a digital computer. In the preprocessing system, the essential psychoacoustic principles of the perception of loudness, pitch, roughness, and subjective duration are implemented with some approximation. The system essentially consists of 24 bandpass filters, nonlinear transformation of each filter output into specific loudness and specific roughness, and final transformation of these parameters into total loudness, total roughness, and three spectral momenta. As a means to further reduce the information flow, continuous selection of dominant parameters is also considered on the basis of psychoacoustic data. The subsequent recognition process is mainly characterized by (1) discrimination between speech and silent periods, (2) detection of syllable peaks and classification of syllable nuclei, and (3) assumption of syllable boundaries and classification of consonant clusters. Though the entire system as yet is far from being complete and perfect, the present results indicate that the concept provides a systematic and promising way towards automatic recognition of continuous speech.

14.
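A medium-sized corpus is analyzed using psychological statistical methods to explore the relationships among syntax, prosody, and their acoustic correlates. Based on the regularities of normal stress distribution in spoken Chinese, the rules of normal stress distribution in Putonghua and the order in which they apply in actual utterances are studied, and a system of normal stress distribution rules suitable for Chinese text-to-speech systems is finally established.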

15.
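Application of long-term speech features in speaker recognition   Total citations: 1, self-citations: 0, citations by others: 1
In addition to introducing commonly used speaker recognition techniques, this paper mainly describes a speaker recognition method based on long-term time-frequency features. Voice activity detection (VAD) is first applied to the input speech; once clean speech is obtained, basic time-frequency features are extracted from it. Within each speech unit, the trajectories of time-frequency features such as fundamental frequency, formants, and harmonics are fitted with Legendre polynomials to extract the main fitting parameters. HLDA is then used to reduce the feature dimensionality, and the mean supervector of a Gaussian mixture model represents the statistics of the time-frequency features of each utterance. On the NIST06 speaker recognition 1side-1side test set, the method achieved an equal error rate (EER) of 18.7%. When it was fused with a conventional MFCC-based speaker recognition system, the EER fell from 4.9% to 4.6%, a relative EER reduction of 6%.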
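
The Legendre-polynomial trajectory fitting mentioned in the abstract above can be illustrated with a small least-squares sketch. Everything specific here is an assumption for illustration: the polynomial order (3), the mapping of each speech unit's frames onto [-1, 1], and the function names. The paper's actual settings are not given in the abstract. The last helper simply reproduces the reported relative EER reduction from 4.9% to 4.6%.

```python
def legendre_basis(x, order):
    """Return [P_0(x), ..., P_order(x)] using Bonnet's recurrence."""
    values = [1.0, x]
    for n in range(1, order):
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values[: order + 1]


def solve_linear(matrix, rhs):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        rest = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - rest) / aug[r][r]
    return solution


def fit_legendre(contour, order=3):
    """Least-squares Legendre coefficients of a contour sampled at equal time steps."""
    m = len(contour)
    xs = [-1.0 + 2.0 * i / (m - 1) for i in range(m)]  # normalize time to [-1, 1]
    basis = [legendre_basis(x, order) for x in xs]
    k = order + 1
    # Normal equations: (B^T B) c = B^T y
    gram = [[sum(b[i] * b[j] for b in basis) for j in range(k)] for i in range(k)]
    moment = [sum(b[i] * y for b, y in zip(basis, contour)) for i in range(k)]
    return solve_linear(gram, moment)


def relative_reduction(before, after):
    """Relative reduction, e.g. of an equal error rate."""
    return (before - after) / before


# Hypothetical F0 contour (Hz) for one speech unit: rising, then falling.
f0_contour = [180, 190, 201, 210, 214, 212, 205, 196, 188, 181, 176]
coefficients = fit_legendre(f0_contour, order=3)
print([round(c, 2) for c in coefficients])
print(round(100 * relative_reduction(4.9, 4.6), 1), "% relative EER reduction")
```

In a sketch like this, the low-order coefficients act as a compact, fixed-length description of each trajectory (roughly its mean level, slope, and curvature), which is the kind of per-unit parameter vector that could then be passed on to dimensionality reduction and modeling.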

16.
The complexities of how prosodic structure, both at the phrasal and syllable levels, shapes speech production have begun to be illuminated through studies of articulatory behavior. The present study contributes to an understanding of prosodic signatures on articulation by examining the joint effects of phrasal and syllable position on the production of consonants. Articulatory kinematic data were collected for five subjects using electromagnetic articulography (EMA) to record target consonants (labial, labiodental, and tongue tip), located in (1) either syllable final or initial position and (2) either at a phrase edge or phrase medially. Spatial and temporal characteristics of the consonantal constriction formation and release were determined based on kinematic landmarks in the articulator velocity profiles. The results indicate that syllable and phrasal position consistently affect the movement duration; however, effects on displacement were more variable. For most subjects, the boundary-adjacent portions of the movement (constriction release for a preboundary coda and constriction formation for a postboundary onset) are not differentially affected in terms of phrasal lengthening; both lengthen comparably.

17.
As text-to-speech systems develop, it becomes necessary to compare various solutions and to evaluate whether a change in the synthesis procedure has an effect on the listener's attitude to the system. The possibility of directly scaling intelligibility, naturalness, and user's satisfaction (i.e., acceptability) with the magnitude estimation technique is investigated. A magnitude estimation protocol suitable for this purpose is described. In general, within the limits of the methodological constraints discussed in this paper, the procedure appears to be reliable and valid for quantifying the perceived attributes of synthesized speech.

18.
To improve the naturalness of TTS speech so that it conveys the cadence of natural speech, it is necessary to study the pitch of spontaneous speech. Based on the 973 telephone corpus, the pitch ranges and pitch registers of 1084 intonation phrases are analyzed. It is found that intonation phrases can be classified according to their ranges and registers, and that this classification is related to their positions in the dialogue exchange. Compared with read speech, the pitch patterns in dialogue are more variable.

19.
Traditional accounts of speech perception generally hold that listeners use isolable acoustic "cues" to label phonemes. For syllable-final stops, duration of the preceding vocalic portion and formant transitions at syllable's end have been considered the primary cues to voicing decisions. The current experiment tried to extend traditional accounts by asking two questions concerning voicing decisions by adults and children: (1) What weight is given to vocalic duration versus spectral structure, both at syllable's end and across the syllable? (2) Does the naturalness of stimuli affect labeling? Adults and children (4, 6, and 8 years old) labeled synthetic stimuli that varied in vocalic duration and spectral structure, either at syllable's end or earlier in the syllable. Results showed that all listeners weighted dynamic spectral structure, both at syllable's end and earlier in the syllable, more than vocalic duration, and listeners performed with these synthetic stimuli as listeners had performed previously with natural stimuli. The conclusion for accounts of human speech perception is that rather than simply gathering acoustic cues and summing them to derive strings of phonemic segments, listeners are able to attend to global spectral structure, and use it to help recover explicitly phonetic structure.

20.
Several approaches to Chinese dialect identification based on segmental and prosodic features of speech are described in this paper. When using segmental information only, the system performs phonotactic analysis after speech utterances have been tokenized into sequences of broad phonetic classes. The second scheme comprises prosodic models which are trained to capture tone sequence information for individual dialects. Also proposed is a novel approach that examines differences between Chinese dialects at broad phonetic and prosodic levels. These algorithms were evaluated via a multispeaker read-speech mode. Simulation results indicate that the combined use of segmental and prosodic features allows the proposed system to discriminate among three major Chinese dialects spoken in Taiwan with 93.0% accuracy.
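
The phonotactic step in the abstract above (scoring sequences of broad phonetic classes against per-dialect models) can be sketched with a bigram model. This is a minimal illustration, not the paper's system: the class inventory (V, S, N, F), the add-one smoothing, the placeholder dialect names, and the toy training sequences are all assumptions.

```python
import math
from collections import Counter

# Hypothetical broad phonetic classes: vowel, stop, nasal, fricative.
VOCAB = ["V", "S", "N", "F"]


def train_bigram(sequences, vocab):
    """Return add-one-smoothed bigram log-probabilities for one dialect."""
    pair_counts = Counter()
    context_counts = Counter()
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            pair_counts[(prev, cur)] += 1
            context_counts[prev] += 1
    size = len(vocab)
    return {
        (p, c): math.log((pair_counts[(p, c)] + 1) / (context_counts[p] + size))
        for p in vocab
        for c in vocab
    }


def score(model, seq):
    """Log-likelihood of a class sequence under one dialect's bigram model."""
    return sum(model[(p, c)] for p, c in zip(seq, seq[1:]))


def identify(models, seq):
    """Pick the dialect whose model gives the sequence the highest score."""
    return max(models, key=lambda name: score(models[name], seq))


# Toy training data with placeholder dialect names.
training = {
    "dialect_a": [["S", "V", "N", "S", "V", "N"], ["S", "V", "N"]],
    "dialect_b": [["F", "V", "F", "V"], ["F", "V", "S", "V"]],
}
models = {name: train_bigram(seqs, VOCAB) for name, seqs in training.items()}
print(identify(models, ["S", "V", "N", "S", "V"]))
```

A prosodic model of the kind the abstract describes could be sketched the same way by replacing the phonetic-class tokens with tone labels, and the two log-likelihoods could then be combined, though how the paper fuses them is not stated in the abstract.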
