Similar literature
 20 similar documents found (search time: 15 ms)
1.
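Over the past decade, the Institute of Acoustics, Chinese Academy of Sciences, has built a text-to-speech system comprising a speech library, a tone model, and basic synthesis rules. The problem of synthesizing Chinese with unrestricted vocabulary has been solved in a preliminary way, but the naturalness of the synthesized speech must be improved further. We studied how the segmental and suprasegmental features of speech affect the naturalness of synthesized speech; the results show that the basic factors affecting naturalness are rhythm and coarticulation. The tone patterns used in the system are suitable for synthesizing single sentences; for linguistic units larger than a single sentence, intonation must be controlled very carefully to achieve high naturalness. This paper presents the method and results of a study of synthesized-speech naturalness based on subjective evaluation.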

2.
Over the past 10 years, a Chinese text-to-speech system comprising a phonetic library, a static tone model, and basic synthesis rules has been established at IAAS. Synthesis of Chinese with unrestricted vocabulary has been achieved, but further steps must be taken to improve the naturalness of the synthesized Chinese. The effects of segmental and suprasegmental features of synthetic speech on naturalness were studied using a subjective assessment method. The results show that rhythm in the time domain and coarticulation are fundamental to improving the naturalness of synthetic speech, and that the fundamental frequency curve determined by the tone model is suitable only for synthesizing short Chinese sentences. When linguistic units larger than a simple sentence are synthesized, the fundamental frequency curve must be manipulated carefully. This paper presents the experimental method and results and discusses how to improve the naturalness of synthetic Chinese.

3.
We determined how the perceived naturalness of music and speech (male and female talkers) signals was affected by various forms of linear filtering, some of which were intended to mimic the spectral "distortions" introduced by transducers such as microphones, loudspeakers, and earphones. The filters introduced spectral tilts and ripples of various types, variations in upper and lower cutoff frequency, and combinations of these. All of the differently filtered signals (168 conditions) were intermixed in random order within one block of trials. Levels were adjusted to give approximately equal loudness in all conditions. Listeners were required to judge the perceptual quality (naturalness) of the filtered signals on a scale from 1 to 10. For spectral ripples, perceived quality decreased with increasing ripple density up to 0.2 ripple/ERB(N) and with increasing ripple depth. Spectral tilts also degraded quality, and the effects were similar for positive and negative tilts. Ripples and/or tilts degraded quality more when they extended over a wide frequency range (87-6981 Hz) than when they extended over subranges. Low- and mid-frequency ranges were roughly equally important for music, but the mid-range was most important for speech. For music, the highest quality was obtained for the broadband signal (55-16,854 Hz). Increasing the lower cutoff frequency from 55 Hz resulted in a clear degradation of quality. There was also a distinct degradation as the upper cutoff frequency was decreased from 16,845 Hz. For speech, there was a marked degradation when the lower cutoff frequency was increased from 123 to 208 Hz and when the upper cutoff frequency was decreased from 10,869 Hz. Typical telephone bandwidth (313 to 3547 Hz) gave very poor quality.

4.
5.
Hearing talkers produce shorter vowel and word durations in multisyllabic contexts than in monosyllabic contexts. This investigation determined whether a similar effect occurs for deaf talkers, a population often characterized as lacking coarticulation in their speech. Four prelingually deafened adults and two hearing controls produced three sets of word sequences. Each set included a kernel word and six derived forms (e.g., "speed," "speedy," "speeding," etc.). The derived forms were created by adding unstressed and stressed syllables to the kernel form. A spectrographic analysis indicated that the deaf subjects did not always decrease word and vowel durations for the derivatives. Unlike hearing speakers, they often did not reduce vowel segments more than consonant segments. Three explanations are forwarded for the shortening effects. One relates to the implementation of temporal rules, the second concerns the organization imposed upon the articulators to produce speech, and the third suggests a language-independent vocal tract characteristic. The role of auditory information in developing the shortening effects is also considered.

6.
The purpose of this study was to quantify the effect of timing errors on the intelligibility of deaf children's speech. Deviant timing patterns were corrected in the recorded speech samples of six deaf children using digital speech processing techniques. The speech waveform was modified to correct timing errors only, leaving all other aspects of the speech unchanged. The following six-stage approximation procedure was used to correct the deviant timing patterns: (1) original, unaltered utterances, (2) correction of pauses only, (3) correction of relative timing, (4) correction of absolute syllable duration, (5) correction of relative timing and pauses, and (6) correction of absolute syllable duration and pauses. Measures of speech intelligibility were obtained for the original and the computer-modified utterances. On the average, the highest intelligibility score was obtained when relative timing errors only were corrected. The correction of this type of error improved the intelligibility of both stressed and unstressed words within a phrase. Improvements in word intelligibility, which occurred when relative timing was corrected, appeared to be closely related to the number of phonemic errors present within a word. The second highest intelligibility score was obtained for the original, unaltered sentences. On the average, the intelligibility scores obtained for the other four forms of timing modification were poorer than those obtained for the original sentences. Thus, the data show that intelligibility improved, on the average, when only one type of error, relative timing, was corrected.

7.
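Stress perception of disyllabic prosodic words in continuous speech   Total citations: 5 (self-citations: 1; citations by others: 4)
王韫佳, 初敏, 贺琳, 冯勇强. 《声学学报》 (Acta Acustica), 2003, 28(6): 534-539
A stress perception experiment was carried out on 1,898 disyllabic prosodic words from 300 sentences taken from the Mandarin speech corpus of Microsoft Research Asia. The results show that stress perception of disyllabic words in continuous speech differs from that of isolated words and is significantly affected by the prosodic boundary at which the word occurs. In the perception experiment, the difference between the stress scores of the two syllables in a word correlated positively with both their difference in peak pitch and their difference in duration, but the correlation with the peak-pitch difference was stronger than that with the duration difference. The peak-pitch difference and the duration difference were uncorrelated in non-prepausal position and weakly positively correlated in prepausal position. The results also show that stress perception of a syllable is significantly affected by its tone pattern.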

8.
Control of rate and duration of speech movements   Total citations: 4 (self-citations: 0; citations by others: 4)
A computerized pulsed-ultrasound system was used to monitor tongue dorsum movements during the production of consonant-vowel sequences in which speech rate, vowel, and consonant were varied. The kinematics of tongue movement were analyzed by measuring the lowering gesture of the tongue to give estimates of movement amplitude, duration, and maximum velocity. All three subjects in the study showed reliable correlations between the amplitude of the tongue dorsum movement and its maximum velocity. Further, the ratio of the maximum velocity to the extent of the gesture, a kinematic indicator of articulator stiffness, was found to vary inversely with the duration of the movement. This relationship held both within individual conditions and across all conditions in the study such that a single function was able to accommodate a large proportion of the variance due to changes in movement duration. As similar findings have been obtained both for abduction and adduction gestures of the vocal folds and for rapid voluntary limb movements, the data suggest that a wide range of changes in the duration of individual movements might all have a similar origin. The control of movement rate and duration through the specification of biomechanical characteristics of speech articulators is discussed.
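The kinematic measures named in this abstract (movement amplitude, duration, maximum velocity, and the velocity-to-extent ratio used as a stiffness indicator) can be illustrated with a minimal sketch. It assumes a uniformly sampled position trace; the function name, the sample values, and the sampling rate are hypothetical and are not taken from the study.

```python
def gesture_kinematics(positions_mm, sample_rate_hz):
    """Return (extent, peak_velocity, duration, peak_velocity / extent).

    positions_mm: uniformly sampled articulator positions (hypothetical data).
    sample_rate_hz: sampling rate of the trace (an assumption of this sketch).
    """
    if len(positions_mm) < 2:
        raise ValueError("need at least two samples")
    # Finite-difference velocity between consecutive samples, in mm/s.
    velocities = [(b - a) * sample_rate_hz
                  for a, b in zip(positions_mm, positions_mm[1:])]
    extent = max(positions_mm) - min(positions_mm)   # movement amplitude, mm
    if extent == 0:
        raise ValueError("trace contains no movement")
    peak_velocity = max(abs(v) for v in velocities)  # mm/s
    duration = (len(positions_mm) - 1) / sample_rate_hz  # seconds
    # Ratio of peak velocity to extent: the kinematic stiffness indicator (1/s).
    return extent, peak_velocity, duration, peak_velocity / extent


# Hypothetical lowering gesture sampled at 100 Hz.
extent, vmax, duration, ratio = gesture_kinematics([10.0, 9.0, 6.0, 3.0, 2.0], 100.0)
print(extent, vmax, duration, ratio)
```

The ratio has units of 1/s, which is consistent with the abstract's report that it varies inversely with movement duration.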

9.
Reverberation interferes with the ability to understand speech in rooms. Overlap-masking explains this degradation by assuming reverberant phonemes endure in time and mask subsequent reverberant phonemes. Most listeners benefit from binaural listening when reverberation exists, indicating that the listener's binaural system processes the two channels to reduce the reverberation. This paper investigates the hypothesis that the binaural word intelligibility advantage found in reverberation is a result of binaural overlap-masking release with the reverberation acting as masking noise. The tests utilize phonetically balanced word lists (ANSI-S3.2 1989) that are presented diotically and binaurally with recorded reverberation and reverberation-like noise. A small room, 62 m³, reverberates the words. These are recorded using two microphones without additional noise sources. The reverberation-like noise is a modified form of these recordings and has a similar spectral content. It does not contain binaural localization cues due to a phase randomization procedure. Listening to the reverberant words binaurally improves the intelligibility by 6.0% over diotic listening. The binaural intelligibility advantage for reverberation-like noise is only 2.6%. This indicates that binaural overlap-masking release is insufficient to explain the entire binaural word intelligibility advantage in reverberation.

10.
The effects of intensity on monosyllabic word recognition were studied in adults with normal hearing and mild-to-moderate sensorineural hearing loss. The stimuli were bandlimited NU#6 word lists presented in quiet and talker-spectrum-matched noise. Speech levels ranged from 64 to 99 dB SPL and S/N ratios from 28 to -4 dB. In quiet, the performance of normal-hearing subjects remained essentially constant; in noise, at a fixed S/N ratio, it decreased as a linear function of speech level. Hearing-impaired subjects performed like normal-hearing subjects tested in noise when the data were corrected for the effects of audibility loss. From these and other results, it was concluded that: (1) speech intelligibility in noise decreases when speech levels exceed 69 dB SPL and the S/N ratio remains constant; (2) the effects of speech and noise level are synergistic; (3) the deterioration in intelligibility can be modeled as a relative increase in the effective masking level; (4) normal-hearing and hearing-impaired subjects are affected similarly by increased signal level when differences in speech audibility are considered; (5) the negative effects of increasing speech and noise levels on speech recognition are similar for all adult subjects, at least up to 80 years; and (6) the effective dynamic range of speech may be larger than the commonly assumed value of 30 dB.

11.
Acoustic measurements were conducted to determine the degree to which vowel duration, closure duration, and their ratio distinguish voicing of word-final stop consonants across variations in sentential and phonetic environments. Subjects read CVC test words containing three different vowels and ending in stops of three different places of articulation. The test words were produced either in nonphrase-final or phrase-final position and in several local phonetic environments within each of these sentence positions. Our measurements revealed that vowel duration most consistently distinguished voicing categories for the test words. Closure duration failed to consistently distinguish voicing categories across the contextual variables manipulated, as did the ratio of closure and vowel duration. Our results suggest that vowel duration is the most reliable correlate of voicing for word-final stops in connected speech.

12.
Studying acoustical characteristics is important for speech and speaker recognition in Chinese whispered speech. This paper introduces the characteristics of whispered speech and discusses the acoustical characteristics of Chinese whispered speech. Because whispered speech has no fundamental frequency, other characteristics, such as duration and formant frequency, are extracted and analyzed. Experiments with six simple Chinese whispered vowels show that duration and formant frequency can serve as the main acoustical characteristics for Chinese whispered speech recognition.

13.
Previous research in cross-language perception has shown that non-native listeners often assimilate both single phonemes and phonotactic sequences to native language categories. This study examined whether associating meaning with words containing non-native phonotactics assists listeners in distinguishing the non-native sequences from native ones. In the first experiment, American English listeners learned word-picture pairings including words that contained a phonological contrast between CC and CVC sequences, but which were not minimal pairs (e.g., [ftake], [fətalu]). In the second experiment, the word-picture pairings specifically consisted of minimal pairs (e.g., [ftake], [fətake]). Results showed that the ability to learn non-native CC was significantly improved when listeners learned minimal pairs as opposed to phonological contrast alone. Subsequent investigation of individual listeners revealed that there are both high and low performing participants, where the high performers were much more capable of learning the contrast between native and non-native words. Implications of these findings for second language lexical representations and loanword adaptation are discussed.

14.
A systematic improvement in auditory performance over time, following a change in the acoustic information available to the listener (that cannot be attributed to task, procedural or training effects) is known as auditory acclimatization. However, there is conflicting evidence concerning the existence of auditory acclimatization; some studies show an improvement in performance over time while other studies show no change. In an attempt to resolve this conflict, speech recognition abilities of 16 subjects with bilateral sensorineural hearing impairments were measured over a 12-week period following provision of a monaural hearing instrument for the first time. The not-fitted ear was used as the control. Three presentation levels were used representing quiet, normal, and raised speech. The results confirm the presence of acclimatization. In addition, the results show that acclimatization is evident at the higher presentation levels but not at the lowest.

15.
Three experiments were conducted to study the effect of segmental and suprasegmental corrections on the intelligibility and judged quality of deaf speech. By means of digital signal processing techniques, including LPC analysis, transformations of separate speech sounds, temporal structure, and intonation were carried out on 30 Dutch sentences spoken by ten deaf children. The transformed sentences were tested for intelligibility and acceptability by presenting them to inexperienced listeners. In experiment 1, LPC based reflection coefficients describing segmental characteristics of deaf speakers were replaced by those of hearing speakers. A complete segmental correction caused a dramatic increase in intelligibility from 24% to 72%, which, for a major part, was due to correction of vowels. Experiment 2 revealed that correction of temporal structure and intonation caused only a small improvement from 24% to about 34%. Combination of segmental and suprasegmental corrections yielded almost perfectly understandable sentences, due to a more than additive effect of the two corrections. Quality judgments, collected in experiment 3, were in close agreement with the intelligibility measures. The results show that, in order for these speakers to become more intelligible, improving their articulation is more important than improving their production of temporal structure and intonation.

16.
17.
18.
The rationale for a method to quantify the information content of linguistic stimuli, i.e., the linguistic entropy, is developed. The method is an adapted version of the letter-guessing procedure originally devised by Shannon [Bell Syst. Tech. J. 30, 50-64 (1951)]. It is applied to sentences included in a widely used test to measure speech-reception thresholds and originally selected to be approximately equally redundant. Results of a first experiment reveal that this method enables one to detect subtle differences between sentences and sentence lists with respect to linguistic entropy. Results of a second experiment show that (1) in young listeners and with the sentences employed, manipulating linguistic entropy can result in an effect on SRT of approximately 4 dB in terms of signal-to-noise ratio; (2) the range of this effect is approximately the same in elderly listeners.
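For orientation, the idea behind a letter-guessing estimate can be sketched as follows. In Shannon's original procedure, a subject guesses each letter until correct, and the entropy of the resulting guess-count distribution gives an upper bound on the per-letter entropy of the text. The sketch below computes only that classical bound; the adapted procedure described in the abstract is not specified here, and the function name and sample data are hypothetical.

```python
from collections import Counter
from math import log2


def guess_entropy_upper_bound(guess_counts):
    """Entropy in bits per letter of the distribution of guess counts.

    guess_counts: for each letter position, the number of guesses a subject
    needed before naming the correct letter (hypothetical data).
    """
    if not guess_counts:
        raise ValueError("need at least one observation")
    total = len(guess_counts)
    frequencies = Counter(guess_counts)
    # Shannon entropy of the empirical guess-count distribution.
    return -sum((n / total) * log2(n / total) for n in frequencies.values())


# Hypothetical results: a highly predictable sentence versus a less predictable one.
print(guess_entropy_upper_bound([1, 1, 1, 1, 1, 1, 2, 1]))
print(guess_entropy_upper_bound([1, 3, 2, 5, 1, 4, 2, 7]))
```

A sentence whose letters are mostly guessed on the first attempt yields a value near zero, while one that needs many varied guess counts yields a higher value, which is the sense in which sentences can differ in redundancy.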

19.
Intelligibility tests were performed by teachers and pupils in classrooms under a variety of (road traffic) noise conditions. The intelligibility scores are found to deteriorate at (indoor) noise levels exceeding a critical value of −15 dB relative to a teacher's long-term (reverberant) speech level. The implications for external noise levels are discussed: typically, an external noise level of 50 dB(A) would imply that the critical indoor level is exceeded for about 20 per cent of teachers.

20.
An experiment was performed in which a noise containing frequencies from 10 Hz to 47 Hz was used to mask speech. The behaviour of speech intelligibility with speech presentation level and masking noise level was examined briefly. The infrasonic and low frequency masking noise did reduce the intelligibility of speech. The effect only became significant when the masking noise was present at levels of 115 dB OASPL or above.
