Similar Documents
Found 20 similar documents (search time: 462 ms)
1.
This study explores the effects of prosodic boundaries on nasality at intonational phrase, word, and syllable boundaries. The subjects were recorded saying phrases that contained a syllable-final nasal consonant followed by a syllable-initial stop. The timing, duration, and magnitude of the nasal airflows measured were used to determine the extent of nasality across boundaries. Nasal amplitudes were found to vary in a speaker-dependent manner among boundary types. However, the patterns of nasal contours and temporal aspects of the airflow parameters consistently varied with boundary type across all the speakers. In general, the duration of nasal airflow and nasal plateau were the longest at the intonational phrase boundary, followed by word boundary and then syllable boundary. In addition to the hierarchical influence of boundary strength, there were unique phonetic markings associated with individual boundaries. In particular, two nasal rises interrupted by nasal inhalation occurred only across an intonation phrase boundary. Also, unexpectedly, a word boundary was marked by the longest postboundary vowel, whereas a syllable boundary was marked with the shortest nasal duration. The results here support the hierarchical effect of boundary on both domain-edge strengthening and cross-boundary coarticulation.

2.
Zhang Lu, Zu Yiqing, Yan Runqiang. Acta Acustica (声学学报), 2012, 37(4): 448-456
This study examines how focus, the position of lexical stress, and a rising boundary tone jointly shape the F0 pattern of the final word of an intonational phrase. By analyzing the acoustic features of phrase-final words in two American English corpora, we found that when the final word carries focus, the F0 peak of the stressed syllable is higher than the ending value of the boundary tone; the boundary tone is fully realized only after the stress has been realized; and shifting lexical stress later within the syllable structure compresses the post-stress F0 range. When the phrase-final word is not focused, the rise of the boundary tone is evident from the outset and suppresses the F0 prominence of lexical stress. We conclude that focus can be realized by raising the F0 peak of the stressed syllable; the strength with which focus and the boundary tone are realized is constrained by the position of lexical stress, and in the extreme case the boundary tone can only be implemented at the tail of the phrase-final syllable. Within this limited segmental span, each of these prosodic features has a stretch in which its function is expressed most fully; they compete for the available material, one expanding as another recedes.

3.
This paper formalizes and tests two key assumptions of the concept of suprasegmental timing: segmental independence and suprasegmental mediation. Segmental independence holds that the duration of a suprasegmental unit such as a syllable or foot is only minimally dependent on its segments. Suprasegmental mediation states that the duration of a segment is determined by the duration of its suprasegmental unit and its identity, but not directly by the specific prosodic context responsible for suprasegmental unit duration. Both assumptions are made by various versions of the isochrony hypothesis [I. Lehiste, J. Phonetics 5, 253-263 (1977)], and by the syllable timing hypothesis [W. Campbell, Speech Commun. 9, 57-62 (1990)]. The validity of these assumptions was studied using the syllable as suprasegmental unit in American English and Mandarin Chinese. To avoid unnatural timing patterns that might be induced when reading carrier phrase material, meaningful, nonrepetitive sentences were used with a wide range of lengths. Segmental independence was tested by measuring how the average duration of a syllable in a fixed prosodic context depends on its segmental composition. A strong association was found; in many cases the increase in average syllabic duration when one segment was substituted for another (e.g., bin versus pin) was the same as the difference in average duration between the two segments (i.e., [b] versus [p]). Thus, the [i] and [n] were not compressed to make room for the longer [p], which is inconsistent with segmental independence. Syllabic mediation was tested by measuring which locations in a syllable are most strongly affected by various contextual factors, including phrasal position, within-word position, tone, and lexical stress. Systematic differences were found between these factors in terms of the intrasyllabic locus of maximal effect. These and earlier results obtained by van Son and van Santen [R. J. J. H van Son and J. P. H. 
van Santen, "Modeling the interaction between factors affecting consonant duration," Proceedings Eurospeech-97, 1997, pp. 319-322] showing a three-way interaction between consonantal identity (coronals vs labials), within-word position of the syllable, and stress of surrounding vowels, imply that segmental duration cannot be predicted by compressing or elongating segments to fit into a predetermined syllabic time interval. In conclusion, while there is little doubt that suprasegmental units play important predictive and explanatory roles as phonological units, the concept of suprasegmental timing is less promising.
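The segmental-independence test described above can be sketched as follows. This is a toy illustration with entirely invented duration values, not the paper's data: it compares the change in mean syllable duration under a segment swap (bin vs. pin) against the independently measured segment difference.

```python
# Hypothetical mean durations (seconds) for syllables sharing one prosodic
# context; values are invented for illustration, not the paper's measurements.
durations = {
    "bin": [0.280, 0.295, 0.288],
    "pin": [0.330, 0.342, 0.336],
}

def mean(xs):
    return sum(xs) / len(xs)

# Change in mean syllable duration when [b] is substituted by [p].
syllable_diff = mean(durations["pin"]) - mean(durations["bin"])

# Independently measured mean segment difference, mean([p]) - mean([b])
# (also hypothetical).
segment_diff = 0.048

# Under strict segmental independence the syllable frame would absorb the
# longer [p], and syllable_diff would be near zero. The study instead found
# syllable_diff tracking segment_diff, as in this constructed example.
print(round(syllable_diff, 3))
```

Here the full segment difference surfaces in the syllable duration, which is the pattern the paper reports as inconsistent with segmental independence.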

4.
Modeling phonological units of speech is a critical issue in speech recognition. In this paper, our recent development of an overlapping-feature-based phonological model that represents long-span contextual dependency in speech acoustics is reported. In this model, high-level linguistic constraints are incorporated in automatic construction of the patterns of feature-overlapping and of the hidden Markov model (HMM) states induced by such patterns. The main linguistic information explored includes word and phrase boundaries, morpheme, syllable, syllable constituent categories, and word stress. A consistent computational framework developed for the construction of the feature-based model and the major components of the model are described. Experimental results on the use of the overlapping-feature model in an HMM-based system for speech recognition show improvements over the conventional triphone-based phonological model.  相似文献   

5.
Dynamic specification of coarticulated vowels spoken in sentence context
According to a dynamic specification account, coarticulated vowels are identified on the basis of time-varying acoustic information, rather than solely on the basis of "target" information contained within a single spectral cross section of an acoustic syllable. Three experiments utilizing digitally segmented portions of consonant-vowel-consonant (CVC) syllables spoken rapidly in a carrier sentence were designed to examine the relative contribution of (1) target information available in vocalic nuclei, (2) intrinsic duration information specified by syllable length, and (3) dynamic spectral information defined over syllable onsets and offsets. In experiments 1 and 2, vowels produced in three consonantal contexts by an adult male were examined. Results showed that vowels in silent-center (SC) syllables (in which vocalic nuclei were attenuated to silence, leaving initial and final transitional portions in their original temporal relationship) were perceived relatively accurately, although not as well as unmodified syllables (experiment 1); random versus blocked presentation of consonantal contexts did not affect performance. Error rates were slightly greater for vowels in SC syllables in which intrinsic duration differences were neutralized by equating the duration of silent intervals between initial and final transitional portions. However, performance was significantly better than when only initial transitions or final transitions were presented alone (experiment 2). Experiment 3 employed CVC stimuli produced by another adult male, and included six consonantal contexts. Both SC syllables and excised syllable nuclei with appropriate intrinsic durations were identified no less accurately than unmodified controls. Neutralizing duration differences in SC syllables increased identification errors only slightly, while truncating excised syllable nuclei yielded a greater increase in errors.
These results demonstrate that time-varying information is necessary for accurate identification of coarticulated vowels. Two hypotheses about the nature of the dynamic information specified over syllable onsets and offsets are discussed.

6.
In a recently proposed approach to isolated-word recognition, word reference templates are constructed from a universal set of demisyllable units by concatenating the appropriate demisyllables for each vocabulary item. A dynamic time warping (DTW) algorithm is used to align test and reference patterns optimally. Nevertheless, some sort of syllable duration preadjustment is necessary because of the large potential difference in duration between isolated and in-context syllables. We have found that a simple rule that reduces the length of rhyme (final) demisyllables in nonword-final stressed syllables to approximately half their isolated-syllable duration provides recognition accuracy as high as that attained through use of complex, highly context-sensitive rules. In addition to its practical application, this result can be regarded as a further demonstration of the power of DTW. We have also investigated the requirements for parameter smoothing at demisyllable boundaries. We find that an optimal window duration for smoothing is about 60-90 ms, but that failure to smooth reduces recognition accuracy only about 2% in a 1109-word test set; that linear and parabolic smoothing are equally effective; and that it does not appear that recognition accuracy can be improved by smoothing in certain phonetic contexts only. Taken together, these results can be viewed as confirming the suitability of the demisyllable as the basic unit in recognition systems.
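The DTW alignment at the heart of this approach can be sketched as below. This is a minimal textbook version, not the paper's implementation: the frame features and the per-frame distance function are placeholders, and real systems add path constraints and normalization.

```python
import math

def dtw_distance(ref, test):
    """Optimal alignment cost between two feature-vector sequences
    (each a list of equal-length tuples) via dynamic time warping."""
    n, m = len(ref), len(test)
    # cost[i][j] = minimal cumulative cost aligning ref[:i] with test[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(ref[i - 1], test[j - 1])  # local frame distance
            cost[i][j] = d + min(cost[i - 1][j],      # stretch the test pattern
                                 cost[i][j - 1],      # stretch the reference
                                 cost[i - 1][j - 1])  # match the two frames
    return cost[n][m]
```

Because DTW warps the time axis, a template still matches a temporally stretched version of itself at zero cost, which is why the duration preadjustment rule only needs to get syllable lengths approximately right.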

7.
Measurements of the temporal characteristics of word-initial stressed syllables in CV CV-type words in Modern Greek showed that the timing of the initial consonant in terms of its closure duration and voice onset time (VOT) is dependent on place and manner of articulation. This is contrary to recent accounts of word-initial voiceless consonants in English which propose that closure and VOT together comprise a voiceless interval independent of place and manner of articulation. The results also contribute to the development of a timing model for Modern Greek which generates closure, VOT, and vowel durations for word-initial, stressed CV syllables. The model is made up of a series of rules operating in an ordered fashion on a given word duration to derive first a stressed syllable duration and then all intrasyllabic acoustic intervals.
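The ordered-rule structure of such a timing model can be sketched as follows. All ratios below are illustrative placeholders, not the fitted values of the Greek model; the point is only the top-down derivation order (word duration, then stressed-syllable duration, then intrasyllabic intervals).

```python
def cv_timing(word_duration, syllable_ratio=0.6,
              closure_frac=0.35, vot_frac=0.15):
    """Ordered-rule sketch of a top-down timing model.

    All ratio parameters are invented for illustration; a real model
    would condition them on place and manner of articulation."""
    syllable = word_duration * syllable_ratio  # rule 1: stressed syllable
    closure = syllable * closure_frac          # rule 2: stop closure
    vot = syllable * vot_frac                  # rule 3: voice onset time
    vowel = syllable - closure - vot           # remainder: the vowel
    return {"closure": closure, "VOT": vot, "vowel": vowel}

intervals = cv_timing(0.5)  # a hypothetical 500-ms word
```

By construction the derived intervals exhaust the stressed-syllable duration, mirroring how the model partitions a given word duration rather than summing independently specified segments.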

8.
Ten American English vowels were sung in a /b/-vowel-/d/ consonantal context by a professional countertenor in full voice (at F0 = 130, 165, 220, 260, and 330 Hz) and in head voice (at F0 = 220, 260, 330, 440, and 520 Hz). Four identification tests were prepared using the entire syllable or the center 200-ms portion of either the full-voice tokens or the head-voice tokens. Listeners attempted to identify each vowel by circling the appropriate word on their answer sheets. Errors were more frequent when the vowels were sung at higher F0. In addition, removal of the consonantal context markedly increased identification errors for both the head-voice and full-voice conditions. Back vowels were misidentified significantly more often than front vowels. For equal F0 values, listeners were significantly more accurate in identifying the head-voice stimuli. Acoustical analysis suggests that the difference of intelligibility between head and full voice may have been due to the head voice having more energy in the first harmonic than the full voice.  相似文献   

9.
The purpose of this study was to quantify the effect of timing errors on the intelligibility of deaf children's speech. Deviant timing patterns were corrected in the recorded speech samples of six deaf children using digital speech processing techniques. The speech waveform was modified to correct timing errors only, leaving all other aspects of the speech unchanged. The following six-stage approximation procedure was used to correct the deviant timing patterns: (1) original, unaltered utterances, (2) correction of pauses only, (3) correction of relative timing, (4) correction of absolute syllable duration, (5) correction of relative timing and pauses, and (6) correction of absolute syllable duration and pauses. Measures of speech intelligibility were obtained for the original and the computer-modified utterances. On the average, the highest intelligibility score was obtained when relative timing errors only were corrected. The correction of this type of error improved the intelligibility of both stressed and unstressed words within a phrase. Improvements in word intelligibility, which occurred when relative timing was corrected, appeared to be closely related to the number of phonemic errors present within a word. The second highest intelligibility score was obtained for the original, unaltered sentences. On the average, the intelligibility scores obtained for the other four forms of timing modification were poorer than those obtained for the original sentences. Thus, the data show that intelligibility improved, on the average, when only one type of error, relative timing, was corrected.

10.
Acoustic correlates of contrastive stress, i.e., fundamental frequency (F0), duration, and intensity, and listener perceptions of stress, were investigated in a profoundly deaf subject (RS) pre/post single-channel cochlear implant and longitudinally, and compared to the overall patterns of age-peer profoundly deaf (JM) and normally hearing subjects (DL). The stimuli were a group of general American English words in which a change of function from noun to verb is associated with a shift of stress from initial to final syllable, e.g., CON'trast versus conTRAST'. Precochlear implant, RS was unable to produce contrastive stress correctly. Hearing one day post-stimulation resulted in significantly higher F0 for initial and final stressed versus unstressed syllables. Four months post-stimulation, RS maintained significantly higher F0 on stressed syllables, as well as generalization of significantly increased intensity and longer syllable duration differences for all stressed versus unstressed syllables. Perceptually, listeners judged RS's contrastive stress placement as incorrect precochlear implant and as always correct post-cochlear implant. JM's contrastive stress was judged as 96% correct, and DL's contrastive stress placement was 100% correct. It was concluded that RS reacquired all acoustic correlates needed for appropriate differentiation of contrastive stress with longitudinal use of the cochlear implant.

11.

Background

How do listeners manage to recognize words in an unfamiliar language? The physical continuity of the signal, which lacks reliable silent pauses between words, makes this a difficult task. However, multiple cues can be exploited to locate word boundaries and segment the acoustic signal. In the present study, word stress was combined with statistical information and placed on different syllables within trisyllabic nonsense words to explore how the two cues combine in an online word-segmentation task.

Results

The behavioral results showed that words were segmented better when stress was placed on the final syllables than when it was placed on the middle or first syllable. The electrophysiological results showed an increase in the amplitude of the P2 component, which seemed to be sensitive to word-stress and its location within words.

Conclusion

The results demonstrated that listeners can integrate specific prosodic and distributional cues when segmenting speech. An ERP component related to word-stress cues was identified: stressed syllables elicited larger amplitudes in the P2 component than unstressed ones.
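The statistical (distributional) cue used in studies of this kind is typically the transitional probability between adjacent syllables, which dips at word boundaries. A minimal sketch, with an invented syllable stream rather than the study's stimuli:

```python
from collections import Counter

def transitional_probabilities(syllables):
    """TP(a -> b) = count(a followed by b) / count(a).

    In statistical segmentation, low-TP transitions are candidate
    word boundaries."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {(a, b): c / first_counts[a]
            for (a, b), c in pair_counts.items()}

# A stream built from one repeated trisyllabic nonsense word "pa-bi-ku":
# transitions inside the word are perfectly predictable (TP = 1.0).
stream = ["pa", "bi", "ku"] * 3
tp = transitional_probabilities(stream)
```

In a real experiment several nonsense words are concatenated in random order, so within-word TPs stay high while TPs spanning a word boundary are much lower; stress then adds a converging prosodic cue.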

12.
Can native listeners rapidly adapt to suprasegmental mispronunciations in foreign-accented speech? To address this question, an exposure-test paradigm was used to test whether Dutch listeners can improve their understanding of non-canonical lexical stress in Hungarian-accented Dutch. During exposure, one group of listeners heard a Dutch story with only initially stressed words, whereas another group also heard 28 words with canonical second-syllable stress (e.g., EEKhorn "squirrel" was replaced by koNIJN "rabbit"; capitals indicate stress). The 28 words, however, were non-canonically marked by the Hungarian speaker with high pitch and amplitude on the initial syllable, both of which are stress cues in Dutch. After exposure, listeners' eye movements were tracked to Dutch target-competitor pairs with segmental overlap but different stress patterns, while they listened to new words from the same Hungarian speaker (e.g., HERsens, herSTEL, "brain," "recovery"). Listeners who had previously heard non-canonically produced words distinguished target-competitor pairs better than listeners who had only been exposed to Hungarian accent with canonical forms of lexical stress. Even a short exposure thus allows listeners to tune into speaker-specific realizations of words' suprasegmental make-up, and use this information for word recognition.

13.
The effect of boundary strength on the realization of focus
Liu Lu, Wang Bei. Acta Acustica (声学学报), 2020, 45(3): 289-298
In Mandarin Chinese, a single focus is mainly realized as an F0 rise on the focused word together with post-focus compression (PFC) of F0, whereas in double-focus sentences the PFC after the first focus is limited. Does prosodic boundary strength affect how focus is realized, and in particular PFC? Drawing on the syntactic categories of word, phrase, clause, and sentence, this experiment placed four levels of prosodic boundary strength after a key word (X) in the sentence. Four focus conditions were elicited by leading questions: focus on X, focus on the sentence-final word Y, focus on both X and Y (double focus), and neutral focus. Acoustic analysis showed that: (1) focused words exhibited F0 raising and durational lengthening, with increments that did not differ between single and double focus and were unaffected by the strength of the boundary after the focused word; (2) in double-focus sentences, the PFC after the first focus was weakened by a boundary of medium strength, whereas only a very strong boundary weakened the PFC after a single focus; (3) the duration of the pre-boundary word increased with boundary strength, but the lengthening had an upper limit and was unaffected by focus position. Overall, prosodic boundaries and focus are encoded in parallel in intonation.

14.
This paper investigates the mechanisms controlling the phonemic quantity contrast and speech rate in nonsense p(1)Np(2)a words read by five Slovak speakers in normal and fast speech rate. N represents a syllable nucleus, which in Slovak corresponds to long and short vowels and liquid consonants. The movements of the lips and the tongue were recorded with an electromagnetometry system. Together with the acoustic durations of p(1), N, and p(2), gestural characteristics of three core movements were extracted: p(1) lip opening, tongue movement for (N)ucleus, and p(2) lip closing. The results show that, although consonantal and vocalic nuclei are predictably different on many kinematic measures, their common phonological behavior as syllabic nuclei may be linked to a stable temporal coordination of the consonantal gestures flanking the nucleus. The functional contrast between phonemic duration and speech rate was reflected in the bias in the control mechanisms they employed: the strategies robustly used for signaling phonemic duration, such as the degree of coproduction of the two lip movements, showed a minimal effect of speech rate, while measures greatly affected by speech rate, such as p(2) acoustic duration, or the degree of p(1)-N gestural coproduction, tended to be minimally influenced by phonemic quantity.

15.
It has been suggested that pauses between words could act as indices of processes such as selection, retrieval or planning that are required before an utterance is articulated. For normal meaningful phrase utterances, there is hardly any information regarding the relationship between articulation and pause duration and their subsequent relation to the final phrase duration. Such associations could provide insights into the mechanisms underlying the planning and execution of a vocal utterance. To execute a fluent vocal utterance, children might adopt different strategies in development. We investigate this hypothesis by examining the roles of articulation time and pause duration in meaningful phrase utterances in 46 children between the ages of 4 and 8 years, learning English as a second language. Our results indicate a significant reduction in phrase, word and interword pause duration with increasing age. A comparison of pause, word and phrase duration for individual subjects belonging to different age groups indicates a changing relationship between pause and word duration for the production of fluent speech. For the youngest children, a strong correlation between pause and word duration indicates local planning at word level for speech production and thus greater dependence of pause on immediate word utterance. In contrast, for the oldest children we find a significant drop in the correlation between word and pause duration, indicating the emergence of articulation and pause planning as two independent processes directed at producing a fluent utterance. Strong correlations between other temporal parameters indicate a more holistic approach being adopted by the older children for language production.

16.
Quadrisyllabic words and phrases with normal stress in Mandarin were used to study tonal coarticulation. It was first found that the F0 perturbation caused by tonal coarticulation at the starting point and ending point of the F0 curve in each syllable is larger than the intrinsic F0 difference of the vowels at those points. As for tonal coarticulation itself, it was found that tonal coarticulation in words and phrases with normal stress differs from that in nonsense sequences with even stress. In words and phrases with normal stress, the tonal coarticulatory effects are unidirectional: the carryover effect does not extend to the ending point of the tone section of the following syllable, and the anticipatory effect does not extend to the starting point of the tone section of the preceding one. Moreover, the F0 perturbation caused by tonal coarticulation follows a regular pattern.

17.
18.
The powerful techniques of covariance structure modeling (CSM) long have been used to study complex behavioral phenomena in the social and behavioral sciences. This study employed these same techniques to examine simultaneous effects on vowel duration in American English. Additionally, this study investigated whether a single population model of vowel duration fits observed data better than a dual population model where separate parameters are generated for syllables that carry large information loads and for syllables that specify linguistic relationships. For the single population model, intrinsic duration, phrase final position, lexical stress, post-vocalic consonant voicing, and position in word all were significant predictors of vowel duration. However, the dual population model, in which separate model parameters were generated for (1) monosyllabic content words and lexically stressed syllables and (2) monosyllabic function words and lexically unstressed syllables, fit the data better than the single population model. Intrinsic duration and phrase final position affected duration similarly for both the populations. On the other hand, the effects of post-vocalic consonant voicing and position in word, while significant predictors of vowel duration in content words and stressed syllables, were not significant predictors of vowel duration in function words or unstressed syllables. These results are not unexpected, based on previous research, and suggest that covariance structure analysis can be used as a complementary technique in linguistic and phonetic research.

19.
Listeners identified a phonetically balanced set of consonant-vowel-consonant (CVC) words and nonsense syllables in noise at four signal-to-noise ratios. The identification scores for phonemes and syllables were analyzed using the j-factor model [Boothroyd and Nittrouer, J. Acoust. Soc. Am. 84, 101-114 (1988)], which measures the perceptual independence of the parts of a whole. Results indicate that nonsense CVC syllables are perceived as having three independent phonemes, while words show j = 2.34 independent units. Among the words, high-frequency words are perceived as having significantly fewer independent units than low-frequency words. Words with dense phonetic neighborhoods are perceived as having 0.5 more independent units than words with sparse neighborhoods. The neighborhood effect in these data is due almost entirely to density as determined by the initial consonant and vowel, demonstrated in analyses by subjects and items, and correlation analyses of syllable recognition with the neighborhood activation model [Luce and Pisoni, Ear Hear. 19, 1-36 (1998)]. The j factors are interpreted as measuring increased efficiency of the perception of word-final consonants of words in sparse neighborhoods during spoken word recognition.
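The j-factor computation itself is compact: it solves p_whole = p_part ** j for j, so that j counts the effectively independent parts of the whole. A minimal sketch with an illustrative probability (0.8), not values from this study:

```python
import math

def j_factor(p_whole, p_part):
    """Number of perceptually independent units j such that
    p_whole = p_part ** j (the Boothroyd-Nittrouer j-factor)."""
    return math.log(p_whole) / math.log(p_part)

# If each phoneme is recognized independently at p = 0.8, a three-phoneme
# nonsense CVC is recognized at 0.8 ** 3 = 0.512, so j recovers 3.
j_nonsense = j_factor(0.512, 0.8)
```

A j below the number of phonemes, as with the words here (j = 2.34), indicates that the parts are not perceived independently: lexical knowledge lets listeners recover the whole from partial phonemic information.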

20.
Multichannel cochlear implant users vary greatly in their word-recognition abilities. This study examined whether their word recognition was related to the use of either highly dynamic or relatively steady-state vowel cues contained in /bVb/ and /wVb/ syllables. Nine conditions were created containing different combinations of formant transition, steady-state, and duration cues. Because processor strategies differ, the ability to perceive static and dynamic information may depend on the type of cochlear implant used. Ten Nucleus and ten Ineraid subjects participated, along with 12 normal-hearing control subjects. Vowel identification did not differ between implanted groups, but both were significantly poorer at identifying vowels than the normal-hearing group. Vowel identification was best when at least two kinds of cues were available. Using only one type of cue, performance was better with excised vowels containing steady-state formants than in "vowelless" syllables, where the center vocalic portion was deleted and transitions were joined. In the latter syllable type, Nucleus subjects identified vowels significantly better when /b/ was the initial consonant; the other two groups were not affected by specific consonantal context. Cochlear implant subjects' word-recognition was positively correlated with the use of dynamic vowel cues, but not with steady-state cues.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号