1.

Finding intonational boundaries using acoustic cues related to the voice source

Choi JY Hasegawa-Johnson M Cole J 《The Journal of the Acoustical Society of America》2005,118(4):2579-2587

Acoustic cues related to the voice source, including harmonic structure and spectral tilt, were examined for relevance to prosodic boundary detection. The measurements considered here comprise five categories: duration, pitch, harmonic structure, spectral tilt, and amplitude. Distributions of the measurements and statistical analysis show that the measurements may be used to differentiate between prosodic categories. Detection experiments on the Boston University Radio Speech Corpus show equal error detection rates around 70% for accent and boundary detection, using only the acoustic measurements described, without any lexical or syntactic information. Further investigation of the detection results shows that duration and amplitude measurements, and, to a lesser degree, pitch measurements, are useful for detecting accents, while all voice source measurements except pitch measurements are useful for boundary detection.
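
The "equal error detection rate" in this abstract can be read as the detection rate at the operating point where misses and false alarms are equally frequent. The sketch below shows one way to find that point. The function name, the scores, and the labels are hypothetical illustrations and are not taken from the paper.

```python
def equal_error_rate(scores, labels):
    """Return (eer, threshold) at the threshold where the miss rate and
    the false-alarm rate are closest to each other.

    scores: detector outputs, higher means "more likely accented/boundary"
    labels: 1 for a true accent/boundary token, 0 otherwise
    (Both inputs are hypothetical; the paper does not publish its scores.)
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    best = None
    for t in sorted(set(scores)):
        # A token is "detected" when its score is at or above the threshold.
        miss = sum(s < t for s in positives) / len(positives)
        false_alarm = sum(s >= t for s in negatives) / len(negatives)
        gap = abs(miss - false_alarm)
        if best is None or gap < best[0]:
            best = (gap, (miss + false_alarm) / 2, t)
    return best[1], best[2]


# Toy data, made up for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 1, 0, 0, 0]
eer, threshold = equal_error_rate(scores, labels)
detection_rate = 1.0 - eer
```

Under this reading, a detection rate of about 70% would correspond to an equal error rate of about 30%.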

2.

Effectiveness of spatial cues, prosody, and talker characteristics in selective attention

Darwin CJ Hukin RW 《The Journal of the Acoustical Society of America》2000,107(2):970-977

The three experiments reported here compare the effectiveness of natural prosodic and vocal-tract size cues at overcoming spatial cues in selective attention. Listeners heard two simultaneous sentences and decided which of two simultaneous target words came from the attended sentence. Experiment 1 used sentences that had natural differences in pitch and in level caused by a change in the location of the main sentence stress. The sentences' pitch contours were moved apart or together in order to separate out effects due to pitch and those due to other prosodic factors such as intensity. Both pitch and the other prosodic factors had an influence on which target word was reported, but the effects were not strong enough to override the spatial difference produced by an interaural time difference of +/- 91 microseconds. In experiment 2, a large (+/- 15%) difference in apparent vocal-tract size between the speakers of the two sentences had an additional and strong effect, which, in conjunction with the original prosodic differences overrode an interaural time difference of +/- 181 microseconds. Experiment 3 showed that vocal-tract size differences of +/- 4% or less had no detectable effect. Overall, the results show that prosodic and vocal-tract size cues can override spatial cues in determining which target word belongs in an attended sentence.

3.

Second-language experience and speech-in-noise recognition: effects of talker-listener accent similarity

Pinet M Iverson P Huckvale M 《The Journal of the Acoustical Society of America》2011,130(3):1653-1662

Previous work has shown that the intelligibility of speech in noise is degraded if the speaker and listener differ in accent, in particular when there is a disparity between native (L1) and nonnative (L2) accents. This study investigated how this talker-listener interaction is modulated by L2 experience and accent similarity. L1 Southern British English, L1 French listeners with varying L2 English experience, and French-English bilinguals were tested on the recognition of English sentences mixed in speech-shaped noise that was spoken with a range of accents (French, Korean, Northern Irish, and Southern British English). The results demonstrated clear interactions of accent and experience, with the least experienced French speakers being most accurate with French-accented English, but more experienced listeners being most accurate with L1 Southern British English accents. An acoustic similarity metric was applied to the speech productions of the talkers and the listeners, and significant correlations were obtained between accent similarity and sentence intelligibility for pairs of individuals. Overall, the results suggest that L2 experience affects talker-listener accent interactions, altering both the intelligibility of different accents and the selectivity of accent processing.

4.

Effects of boundary strength on the realization of focus (Cited by: 1; self-citations: 0; citations by others: 1)

刘璐 王蓓 《声学学报》(Acta Acustica) 2020,45(3):289-298

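In Mandarin Chinese, single focus is realized mainly as raised pitch on the focused word and post-focus compression (PFC) of pitch, whereas in double-focus sentences the pitch compression after the first focus is limited. Does prosodic boundary strength affect the realization of focus, in particular post-focus compression? Using the syntactic categories of word, phrase, clause, and sentence, this experiment set four prosodic boundary strengths after a key word (X) in the sentence. The four focus conditions, elicited by questions, were: focus on key word X, focus on sentence-final word Y, focus on both X and Y (double focus), and neutral focus. The acoustic analysis shows that: (1) focused words all show raised pitch and lengthened duration; the size of the increase does not differ significantly between single and double focus and is not affected by the strength of the boundary after the focused word; (2) in double-focus sentences, pitch compression after the first focus is weakened by a boundary of medium strength, whereas only a very strong boundary weakens the compression after a single focus; (3) as prosodic boundary strength increases, the duration of the pre-boundary word increases, but the lengthening has an upper limit and is not affected by focus position. Overall, prosodic boundaries and focus are encoded in parallel in intonation.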

5.

A text-to-speech system with high intelligibility and naturalness for Chinese (Cited by: 1; self-citations: 0; citations by others: 1)

CHU Min LU Shinan 《声学学报：英文版》(Chinese Journal of Acoustics) 1996,(1)

I. Introduction. Researches on Chinese synthesis disclose that only when both the segmental and suprasegmental features of the synthetic speech are similar to those of the natural one, the synthetic speech will sound intelligible and natural [1]. Among existing synthetic techniques, the approach based on acoustic parameters can adjust both the segmental and suprasegmental features of synthetic units flexibly and can be considered as the most reasonable synthetic technique in theory. However, the parameter based synthesizer is over-dependent on the developments of paramet…

6.

Advantage of bimodal fitting in prosody perception for children using a cochlear implant and a hearing aid

Straatman LV Rietveld AC Beijen J Mylanus EA Mens LH 《The Journal of the Acoustical Society of America》2010,128(4):1884-1895

Cochlear implants are largely unable to encode voice pitch information, which hampers the perception of some prosodic cues, such as intonation. This study investigated whether children with a cochlear implant in one ear were better able to detect differences in intonation when a hearing aid was added in the other ear ("bimodal fitting"). Fourteen children with normal hearing and 19 children with bimodal fitting participated in two experiments. The first experiment assessed the just noticeable difference in F0, by presenting listeners with a naturally produced bisyllabic utterance with an artificially manipulated pitch accent. The second experiment assessed the ability to distinguish between questions and affirmations in Dutch words, again by using artificial manipulation of F0. For the implanted group, performance significantly improved in each experiment when the hearing aid was added. However, even with a hearing aid, the implanted group required exaggerated F0 excursions to perceive a pitch accent and to identify a question. These exaggerated excursions are close to the maximum excursions typically used by Dutch speakers. Nevertheless, the results of this study showed that compared to the implant only condition, bimodal fitting improved the perception of intonation.

7.

Comparison and analysis of automatic pitch accent labeling methods for Chinese and English

倪崇嘉 刘文举 徐波 《声学学报》(Acta Acustica) 2012,37(5):553-560

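Although automatic accent labeling has been widely studied for both Chinese and English, comparisons between the two have rarely been reported. Based on the Chinese prosodically annotated corpus ASCCD and the English prosodically annotated Boston University Radio News Corpus, this paper compares the similarities and differences of automatic accent labeling in Chinese and English and examines how well different features generalize across corpora of different languages. From accent labeling experiments based on ensembles of classification and regression trees, feature analysis, and a mutual-information-based acoustic comparison of accent labeling, the following conclusions are drawn: under the same conditions, the accuracy of automatic accent labeling is lower for Chinese than for English; in automatic accent labeling, lexical and syntactic features are more important than acoustic features; different acoustic information sources play different roles in accent labeling, and duration-related features are important for both Chinese and English; most features in English provide more mutual information than the corresponding features in Chinese.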
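
The mutual-information comparison mentioned in this abstract can be illustrated with a small discrete estimate: how many bits a binned feature tells us about the accent label. This is only a sketch under stated assumptions. The function name, the binning into "long"/"short", and the toy data are hypothetical, and the paper's actual estimator is not described in the abstract.

```python
import math
from collections import Counter


def mutual_information(feature_bins, labels):
    """Estimate I(X; Y) in bits from paired discrete observations.

    feature_bins: a discretized acoustic feature per token (hypothetical bins)
    labels: the accent label per token (1 = accented, 0 = unaccented)
    """
    n = len(labels)
    joint = Counter(zip(feature_bins, labels))
    px = Counter(feature_bins)
    py = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) )
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


# Toy data, made up for illustration only.
accent = [1, 1, 0, 0]
informative = mutual_information(["long", "long", "short", "short"], accent)
uninformative = mutual_information(["long", "short", "long", "short"], accent)
```

A feature that perfectly separates the two labels yields 1 bit here, and a feature that is independent of the label yields 0 bits, which is the sense in which one feature can "provide more mutual information" than another.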

8.

Prosodic realization and perception of focus in Uyghur (Cited by: 1; self-citations: 0; citations by others: 1)

王蓓 吐尔逊·卡得 许毅 《声学学报》(Acta Acustica) 2013,38(1):92-98

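Through strictly controlled speech experiments, this study investigated how focus modulates pitch and duration in Uyghur declarative sentences. Two target sentences were designed, and speakers were asked to naturally emphasize the corresponding word in the sentence according to the context; the perception of focus was then also examined. The results show that: (1) taking sentence-final focus as the baseline, the prosodic encoding of focus in Uyghur resembles the "tri-zone" adjustment pattern of Beijing Mandarin and English, with raised pitch and an expanded pitch range on the focused word and a sharp post-focus pitch drop (narrowed pitch range), while pre-focus pitch changes little; (2) both the focused word and the pre-focus words are lengthened, while post-focus words show no obvious change; (3) the accuracy of focus perception averages about 90%, indicating that the prosodic encoding of focus is an effective perceptual cue; (4) the perception experiment and the intonation analysis also show that the intonation of "neutral focus" in Uyghur differs from that of English and Chinese: it is closer to sentence-initial focus than to sentence-final focus. In addition, the paper specifically discusses the distribution and origin of the "sharp post-focus pitch drop" among the languages of China.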

9.

Pitch realization of lexical tones under different intonation conditions

王韫佳 丁多永 东孝拓 《声学学报》(Acta Acustica) 2015,40(6):902-913

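Through phonetic analysis at three levels (tone category identity, position in the sentence, and stress level), this study examines the pitch realization of the four Mandarin tones under different intonation conditions. Target words were placed in three different focus positions (i.e., positions of the strongest sentence stress) and two different non-focus positions (i.e., positions without sentence stress), and the pitch range of the target words and the high and low points of the target tones were analyzed. The results show that: (1) under both focus and non-focus conditions, the pitch of Tone 2 (yangping) lies in the mid-low region of the pitch range; although the theoretical tone value of the low point of Tone 4 (qusheng) is lower than that of the Tone 2 low point, in its pitch realization the Tone 4 low point is often close to, or even higher than, the Tone 2 low point; (2) focus in sentence-initial position is realized as an expansion of the pitch range both upward and downward, whereas in sentence-final position it is realized as an overall raising of the pitch range, but the high points of different tones do not all change in proportion to the upper limit of the pitch range, nor do the low points of different tones all change in proportion to the lower limit; (3) the dependence of the pitch of post-stress syllables on the focused syllable is constrained by foot structure: if the focused and post-focus syllables are in the same foot, the pitch relation between them resembles that between a neutral-tone syllable and its preceding non-neutral-tone syllable; if a foot boundary lies between them, the pitch of the post-focus syllable shows some independence. These results illustrate the complexity of tonal pitch realization in sentences. Building a Mandarin intonation model with good predictive power requires many detailed rules covering different levels of information, including focus structure, prosodic structure, coarticulation, and tone category identity.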

10.

Modulation of the acoustic features of clauses by discourse-segment stress in Mandarin

陈玉东 吕士楠 杨玉芳 《声学学报》(Acta Acustica) 2009,34(4):378-384

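Analysis of read Mandarin news discourse shows that when a clause is placed within a discourse segment, the pitch and duration that mark its stress change. Compared with isolated clauses, the pitch change in clauses within a discourse segment is concentrated on the clause nucleus and takes the form of an overall lowering of the high points, with the degree of lowering differing across stress types. Within a discourse segment, clause stresses that are not the discourse-segment stress are noticeably weakened, showing lowered pitch and shortened syllable duration. In a discourse segment composed of several clauses, speakers can use variation in the strength of the clause stresses to regulate the prosody of the segment, and thereby achieve overall control of discourse prosody and fluent semantic expression. The study of discourse-segment stress and clause stress brings experimental phonetics into the teaching of broadcast speech, and it also helps prosody control in Chinese speech synthesis.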

11.

Vowel normalization for accent: an investigation of best exemplar locations in northern and southern British English sentences

Evans BG Iverson P 《The Journal of the Acoustical Society of America》2004,115(1):352-361

Two experiments investigated whether listeners change their vowel categorization decisions to adjust to different accents of British English. Listeners from different regions of England gave goodness ratings on synthesized vowels embedded in natural carrier sentences that were spoken with either a northern or southern English accent. A computer minimization algorithm adjusted F1, F2, F3, and duration on successive trials according to listeners' goodness ratings, until the best exemplar of each vowel was found. The results demonstrated that most listeners adjusted their vowel categorization decisions based on the accent of the carrier sentence. The patterns of perceptual normalization were affected by individual differences in language background (e.g., whether the individuals grew up in the north or south of England), and were linked to the changes in production that speakers typically make due to sociolinguistic factors when living in multidialectal environments.

12.

The effect of accentuation and word duration on the naturalness of speech. (Cited by: 1; self-citations: 0; citations by others: 1)

W Eefting 《The Journal of the Acoustical Society of America》1992,91(1):411-420

In this study the effect of appropriate word duration and correct (pitch) accentuation on the naturalness of speech was investigated. In the stimulus material, the information value of the target word determined the correctness of accentuation ([new, +accent] and [old, -accent] were defined as correct). Appropriate word duration was defined as either "in agreement with accentuation" ([long, +accent] and [short, -accent]) or "in agreement with information value" ([long, new] and [short, old]). Listeners were asked to give naturalness judgments along a scale from 1 (very unnatural) to 10 (very natural) on fragments consisting of two sentences. Duration and accentuation of the target word, which always occurred in the second sentence, were manipulated separately and in combinations. Judgments show that accentuation that is not in agreement with information value causes a significant decrease of naturalness. When accentuation is in agreement with information value but duration is inappropriate for both factors, the perceived naturalness decreases significantly. However, listeners were unable to give consistent naturalness judgments on the manipulated word durations in fragments with incorrect accent distributions. Based on these results and the findings of an earlier production study [W. Eefting, J. Acoust. Soc. Am. 89, 412-424 (1991)], which showed that duration is not involved in the realization of pitch accent, the following is suggested. Speakers adapt both accentuation and word duration in order to indicate that a word contains relevant information. Presence of an accent distinguishes the word from its (less relevant) environment. A longer duration provides the listener with the extra time that is needed in order to process the word's content adequately.

13.

Accents, focus distribution, and the perceived distribution of given and new information: An experiment

S G Nooteboom J G Kruyt 《The Journal of the Acoustical Society of America》1987,82(5):1512-1524

This article reports on an experiment examining some perceptual consequences of correspondences between accent patterns, the distribution of plus and minus focus, and the distribution of new and given information in Dutch spoken sentences. "Accent patterns" refer here to the distribution of intonational accents over spoken sentences. Each accent marks a sentence constituent as plus focus, i.e., as highlighted by the speaker. Constituents not so marked are called minus focus. The main questions examined here are to what extent are plus focus constituents generally perceived as conveying new information, and minus focus constituents as conveying earlier introduced or given information. The linguistic material for the experiment was formed by brief radio news items, each two sentences long. Leading sentences determined the distribution of new and given information in target sentences. The accent patterns and, hence, the possible focus distributions in the target utterances were varied systematically by manipulating their synthetic pitch contours according to the rules for Dutch intonation. Subjects were asked to rate on a scale from 1-10 the acceptability of each possible combination of a leading with a target utterance. Results showed that the most preferred or acceptable distributions of new and given information closely match the distributions of plus and minus focus. It was also found that new information can hardly ever acceptably be associated with minus focus, but given information can rather often, although not always, acceptably be associated with plus focus. This appears to be limited to certain conditions, defined by a combination of syntactic and focus structure of the sentence. In these conditions, plus focus cannot be perceived only as signaling new information, but also as highlighting thematic relations with the context. These results are related to work on text-to-speech systems.

14.

Acoustic-phonetic characteristics of speech produced in noise and while wearing an oxygen mask

Z S Bond T J Moore B Gable 《The Journal of the Acoustical Society of America》1989,85(2):907-912

The present study investigated changes in the prosodic and acoustic-phonetic features of isolated words by four male talkers speaking in quiet and in pink noise at a level of 95 dB SPL. Speech samples were collected both with and without an oxygen mask. Changes in duration, fundamental frequency, total energy, and formant center frequency were analyzed. In addition to the expected changes of increased pitch and amplitude associated with speaking in noise without an oxygen mask, significant effects were found (particularly in the formant center frequencies) as a result of using the oxygen mask. When the oxygen mask was employed, no further significant changes were caused by adding noise to the speaking situation.

15.

Pitch estimation by early-deafened subjects using a multiple-electrode cochlear implant

Busby PA Clark GM 《The Journal of the Acoustical Society of America》2000,107(1):547-558

Numerical estimates of pitch for stimulation of electrodes along the 22-electrode array of the Cochlear Limited cochlear implant were obtained from 18 subjects who became deaf very early in life. Examined were the relationships between subject differences in pitch estimation, subject variables related to auditory deprivation and experience, and speech-perception scores for closed-set monosyllabic words and open-set Bamford-Kowal-Bench (BKB) sentences. Reliability in the estimation procedure was examined by comparing subject performance in pitch estimation with that for loudness estimation for current levels between hearing threshold and comfortable listening level. For 56% of subjects, a tonotopic order of pitch percepts for electrodes on the array was found. A deviant but reliable order of pitch percepts was found for 22% of subjects, and essentially no pitch order was found for the remaining 22% of subjects. Subject differences in pitch estimation were significantly related to the duration of auditory deprivation prior to implantation, with the poorest performance for subjects who had a longer duration of deafness and a later age at implantation. Subjects with no tonotopic order of pitch percepts had the lowest scores for the BKB sentence test, but there were no differences across subjects for monosyllabic words. Performance in pitch estimation for electrodes did not appear to be related to performance in the estimation procedure, as all subjects were successful in loudness estimation for current level.

16.

Focus in production: tonal shape, intensity and word order

Vainio M Järvikivi J 《The Journal of the Acoustical Society of America》2007,121(2):EL55-EL61

The effect of word order and prosodic focus on the tonal shape and intensity in the production of prosody was studied. The results show that the production of focus in Finnish follows a global pattern with regard to tonal features. The relative pitch height difference between contrasted words is the most important pitch-related factor in signaling narrow prosodic focus. Narrow focus is not localized to prosodically emphasized words only but relates to the utterance as a whole. It was also found that syntactic structure with respect to both intensity and tonal structure modulated relative prosodic prominence of individual words.

17.

Effects of reverberation on spatial, prosodic, and vocal-tract size cues to selective attention

Darwin CJ Hukin RW 《The Journal of the Acoustical Society of America》2000,108(1):335-342

Three experiments explored the resistance to simulated reverberation of various cues for selective attention. Listeners decided which of two simultaneous target words belonged to an attended rather than to a simultaneous unattended sentence. Attended and unattended sentences were spatially separated using interaural time differences (ITDs) of 0, +/-45, +/-91 or +/-181 microseconds. Experiment 1 used sentences resynthesized on a monotone, with sentence pairs having F0 differences of 0, 1, 2, or 4 semitones. Listeners' weak preference for the target word with the same monotonous F0 as the attended sentence was eliminated by reverberation. Experiment 1 also showed that listeners' ability to use ITD differences was seriously impaired by reverberation although some ability remained for the longest ITD tested. In experiment 2 the sentences were spoken with natural prosody, with sentence stress in different places in the attended and unattended sentences. The overall F0 of each sentence was shifted by a constant amount on a log scale to bring the F0 trajectories of the target words either closer together or further apart. These prosodic manipulations were generally more resistant to reverberation than were the ITD differences. In experiment 3, adding a large difference in vocal-tract size (+/- 15%) to the prosodic cues produced a high level of performance which was very resistant to reverberation. The experiments show that the natural prosody and vocal-tract size differences between talkers that were used retain their efficacy in helping selective attention under conditions of reverberation better than do interaural time differences.
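
Shifting F0 "by a constant amount on a log scale", as this abstract describes, amounts to multiplying every F0 value by the same ratio, and a difference of n semitones corresponds to a ratio of 2^(n/12). The sketch below illustrates that arithmetic only. The function names and the contour values are hypothetical and do not reproduce the authors' resynthesis procedure.

```python
import math


def shift_f0(f0_hz, semitones):
    """Shift an F0 contour by a constant number of semitones.

    A constant offset on a log-frequency scale is a constant
    multiplicative factor in Hz: ratio = 2 ** (semitones / 12).
    """
    ratio = 2.0 ** (semitones / 12.0)
    return [f * ratio for f in f0_hz]


def semitone_difference(f_low_hz, f_high_hz):
    """Return the interval between two frequencies in semitones."""
    return 12.0 * math.log2(f_high_hz / f_low_hz)


# Hypothetical F0 contour in Hz, made up for illustration only.
contour = [100.0, 110.0, 120.0]
raised = shift_f0(contour, 4)              # 4 semitones up
octave = semitone_difference(100.0, 200.0)  # one octave
```

Because the shift is multiplicative, the shape of the contour on a log scale is preserved, and only its overall height changes.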

18.

The role of cognitive cueing in eliciting vocal variability

Moya Andrews Rahul Shrivastav Hiroya Yamaguchi 《Journal of voice》2000,14(4):494-501

Variations in duration and frequency during three readings of each of eight sentences by 9 normal and 4 voice-disordered subjects are compared. Instructions to the subjects varied with respect to the amount and type of cognitive cueing presented in the trials, and the sentences were read in random order. Variability in fundamental frequency (F₀) was greater when pitch variation was specifically cued. Also, the portion of the sentence that was cued had greater variability in F₀ than other parts of the sentence. Variation in fundamental frequency was significantly greater in the cued versus uncued sentence trials for the voice-disordered subjects but not for the normal subjects. However, all subjects exhibited significantly greater duration for cued versus uncued readings of the same sentences. Implications for theory and practice are discussed.

19.

Comparison of pitch in declarative sentences of spontaneous and read Chinese speech

王茂林 李金穗 林茂灿 熊子瑜 《声学学报》(Acta Acustica) 2012,37(4):457-464

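Chinese sentences usually show pitch declination, but research on the specific pitch behavior of prosodic words within sentences is still lacking. The dialogue material used in this study was selected from the 973 telephone speech corpus and comprises 69 dialogues involving 79 speakers; the read speech material consists of news broadcasts by two radio announcers, 221 sentences in length. The high points, low points, and pitch ranges of prosodic words within sentences were analyzed. The results show that in both dialogue and read speech the pitch of most sentences runs from high to low, although the downward trend is less evident in the first half of longer sentences in spontaneous dialogue. Compared with read speech, the pitch range of prosodic words in spontaneous dialogue is usually smaller. In dialogue, the last prosodic word of a sentence has a relatively large pitch range, whereas in read speech the pitch ranges of sentence-internal prosodic words mostly do not differ. The results will help in modeling the pitch register and pitch range of sentence-internal prosodic words in speech synthesis.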

20.

Prosodic strengthening and featural enhancement: evidence from acoustic and articulatory realizations of /a,i/ in English

Cho T 《The Journal of the Acoustical Society of America》2005,117(6):3867-3878

This study investigates the effects of accent and prosodic boundaries on the production of English vowels (/a,i/) by concurrently examining acoustic vowel formants and articulatory maxima of the tongue, jaw, and lips obtained with EMA (Electromagnetic Articulography). The results demonstrate that prosodic strengthening (due to accent and/or prosodic boundaries) has differential effects depending on the source of prominence (in accented syllables versus at edges of prosodic domains; domain initially versus domain finally). The results are interpreted in terms of how the prosodic strengthening is related to phonetic realization of vowel features. For example, when accented, /i/ was fronter in both acoustic and articulatory vowel spaces (enhancing [-back]), accompanied by an increase in both lip and jaw openings (enhancing sonority). By contrast, at edges of prosodic domains (especially domain-finally), /i/ was not necessarily fronter, but higher (enhancing [+high]), accompanied by an increase only in the lip (not jaw) opening. This suggests that the two aspects of prosodic structure (accent versus boundary) are differentiated by distinct phonetic patterns. Further, it implies that prosodic strengthening, though manifested in fine-grained phonetic details, is not simply a low-level phonetic event but a complex linguistic phenomenon, closely linked to the enhancement of phonological features and positional strength that may license phonological contrasts.