Similar literature
20 similar documents found
1.
Dynamic specification of coarticulated vowels spoken in sentence context (total citations: 3; self-citations: 0; citations by others: 3)
According to a dynamic specification account, coarticulated vowels are identified on the basis of time-varying acoustic information, rather than solely on the basis of "target" information contained within a single spectral cross section of an acoustic syllable. Three experiments using digitally segmented portions of consonant-vowel-consonant (CVC) syllables spoken rapidly in a carrier sentence were designed to examine the relative contribution of (1) target information available in vocalic nuclei, (2) intrinsic duration information specified by syllable length, and (3) dynamic spectral information defined over syllable onsets and offsets. In experiments 1 and 2, vowels produced in three consonantal contexts by an adult male were examined. Vowels in silent-center (SC) syllables (in which vocalic nuclei were attenuated to silence, leaving the initial and final transitional portions in their original temporal relationship) were perceived relatively accurately, although not as well as in unmodified syllables (experiment 1). Random versus blocked presentation of consonantal contexts did not affect performance. Error rates were slightly higher for vowels in SC syllables in which intrinsic duration differences were neutralized by equating the duration of the silent intervals between initial and final transitional portions. However, performance was significantly better than when only initial transitions or only final transitions were presented (experiment 2). Experiment 3 used CVC stimuli produced by another adult male and included six consonantal contexts. Both SC syllables and excised syllable nuclei with appropriate intrinsic durations were identified no less accurately than unmodified controls. Neutralizing duration differences in SC syllables increased identification errors only slightly, whereas truncating excised syllable nuclei produced a greater increase in errors.
These results demonstrate that time-varying information is necessary for accurate identification of coarticulated vowels. Two hypotheses about the nature of the dynamic information specified over syllable onsets and offsets are discussed.
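The silent-center manipulation can be sketched as a simple waveform edit. This is a minimal illustration and not the authors' editing procedure. It assumes the signal is a plain list of samples and that the nucleus boundaries (`nucleus_start_ms`, `nucleus_end_ms`) were hand-labelled. The function names are hypothetical. Passing `gap_ms` replaces the original silent interval with a fixed one, which is one way to neutralize intrinsic duration differences across vowels.

```python
def ms_to_samples(ms, sr):
    """Convert a time in milliseconds to a sample count at rate sr (Hz)."""
    return int(round(ms * sr / 1000))


def silent_center(samples, sr, nucleus_start_ms, nucleus_end_ms, gap_ms=None):
    """Return a silent-center (SC) version of a CVC waveform.

    The initial transition (before nucleus_start_ms) and the final transition
    (after nucleus_end_ms) are kept; the vocalic nucleus is replaced by silence.
    gap_ms=None keeps the original silent interval (duration preserved);
    a number sets the silence to that length (duration neutralized).
    """
    start = ms_to_samples(nucleus_start_ms, sr)
    end = ms_to_samples(nucleus_end_ms, sr)
    initial = samples[:start]
    final = samples[end:]
    gap = end - start if gap_ms is None else ms_to_samples(gap_ms, sr)
    return initial + [0.0] * gap + final


# Toy "syllable": 300 ms of constant signal at a 1 kHz sampling rate.
syllable = [1.0] * 300
preserved = silent_center(syllable, 1000, 100, 200)           # silence at 100-199 ms
neutral = silent_center(syllable, 1000, 100, 200, gap_ms=50)  # 50 ms silent center
```

An initial-only or final-only control condition corresponds to keeping just `samples[:start]` or `samples[end:]`.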

2.
3.
Post-low bouncing is a phenomenon whereby after reaching a very low pitch in a low lexical tone, F(0) bounces up and then gradually drops back in the following syllables. This paper reports the results of an acoustic analysis of the phenomenon in two Mandarin Chinese corpora and presents a simple mechanical model that can effectively simulate this bouncing effect. The acoustic analysis shows that most of the F(0) dynamic features profiling the bouncing effect strongly correlate with the amount of F(0) lowering in the preceding low-tone syllable, and that the additional F(0) raising commences at the onset of the first post-low syllable. Using the quantitative Target Approximation model, this bouncing effect was simulated by adding an acceleration adjustment to the initial F(0) state of the first post-low syllable. A highly linear relation between F(0) lowering and estimated acceleration adjustment was found. This relation was then used to effectively simulate the bouncing effect in both the neutral tone and the full tones. The results of the analysis and simulation are consistent with the hypothesis that the bouncing effect is due to a temporary perturbation of the balance between antagonistic forces in the laryngeal control in producing a very low pitch.
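The simulation idea can be illustrated with the qTA equations as commonly published. F0 approaches a linear pitch target m·t + b through a critically damped third-order system whose initial F0, velocity and acceleration carry over from the preceding syllable. A minimal sketch, assuming semitone units and illustrative parameter values: the slope `K_BOUNCE` linking F0 lowering to the extra onset acceleration is hypothetical, because the abstract reports only that the relation is highly linear, not its coefficients.

```python
import math

# Hypothetical slope (st/s^2 of extra onset acceleration per st of F0 lowering);
# the abstract reports a highly linear relation but not its coefficients.
K_BOUNCE = 400.0


def qta_f0(t, m, b, lam, x0, v0, a0):
    """F0 (semitones) at time t (s) under the qTA model.

    m, b: slope (st/s) and height (st) of the linear pitch target m*t + b.
    lam: rate of target approximation (1/s).
    x0, v0, a0: F0, velocity and acceleration at syllable onset.
    """
    c1 = x0 - b
    c2 = v0 + c1 * lam - m
    c3 = (a0 + 2 * c2 * lam - c1 * lam ** 2) / 2
    return m * t + b + (c1 + c2 * t + c3 * t ** 2) * math.exp(-lam * t)


def post_low_onset_acceleration(a0, f0_lowering, k=K_BOUNCE):
    """Onset acceleration of the first post-low syllable, raised linearly with lowering."""
    return a0 + k * f0_lowering


# First post-low syllable: starts at 80 st, static target 85 st, after 3 st of lowering.
plain = qta_f0(0.05, 0.0, 85.0, 40.0, 80.0, 0.0, 0.0)
bounced = qta_f0(0.05, 0.0, 85.0, 40.0, 80.0, 0.0,
                 post_low_onset_acceleration(0.0, 3.0))
```

The added acceleration enters only through the c3 term. Its contribution, (delta_a / 2) * t^2 * exp(-lam * t), therefore first rises and then decays as F0 settles toward the target.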

4.
Acoustic correlates of contrastive stress (fundamental frequency (F0), duration, and intensity) and listeners' perceptions of stress were investigated in a profoundly deaf subject (RS) before and after fitting with a single-channel cochlear implant, and longitudinally. The results were compared with the overall patterns of an age-peer profoundly deaf subject (JM) and a normally hearing subject (DL). The stimuli were a group of General American English words in which a change of function from noun to verb is associated with a shift of stress from the initial to the final syllable, e.g., CON'trast versus conTRAST'. Before implantation, RS was unable to produce contrastive stress correctly. One day after stimulation, RS produced significantly higher F0 on stressed than on unstressed syllables, in both initial and final position. Four months after stimulation, RS maintained significantly higher F0 on stressed syllables. RS had also generalized significantly greater intensity and longer syllable duration to all stressed versus unstressed syllables. Perceptually, listeners judged RS's contrastive stress placement as incorrect before implantation and as always correct after implantation. JM's contrastive stress placement was judged 96% correct, and DL's 100% correct. It was concluded that, with longitudinal use of the cochlear implant, RS reacquired all the acoustic correlates needed to differentiate contrastive stress appropriately.

5.
Most investigators agree that the acoustic information for American English vowels includes dynamic (time-varying) parameters as well as static "target" information contained in a single cross section of the syllable. Using the silent-center (SC) paradigm, the present experiment examined syllables built from the initial and final portions of different stop consonant-vowel-stop consonant (CVC) syllables that contained the same vowel but different consonants. These portions were recombined into mixed-consonant SC syllables and presented to listeners for vowel identification. Ten vowels were spoken in six different syllables, /bVb, bVd, bVt, dVb, dVd, dVt/, embedded in a carrier sentence. Initial and final transitional portions of these syllables were cross-matched in two conditions: (1) silent-center syllables with original syllable durations (silences) preserved (mixed-consonant SC condition), and (2) mixed-consonant SC syllables with syllable duration equated across the ten vowels (fixed-duration mixed-consonant SC condition). Vowel-identification accuracy in these two mixed-consonant SC conditions was compared with performance on the original SC and fixed-duration SC stimuli. It was also compared with initial and final control conditions, in which the initial and final transitional portions were each presented alone. Vowels were identified highly accurately in both the mixed-consonant SC and original-syllable SC conditions (only 7%-8% overall errors). Neutralizing duration information led to small but significant increases in identification errors in both the mixed-consonant and original fixed-duration SC conditions (14%-15% errors). Performance was nonetheless much more accurate than in the initial and final control conditions (35% and 52% errors, respectively). Acoustical analysis confirmed that the direction and extent of formant change from initial to final portions of mixed-consonant stimuli differed from those of the original syllables, arguing against a target + offglide explanation of the perceptual results.
The results do support the hypothesis that temporal trajectories specifying "style of movement" provide information for differentiating American English tense and lax vowels, and that this information is invariant over the place of articulation and voicing of the surrounding stop consonants.
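The cross-matching step can be sketched as a splice of two labelled syllables. The sketch assumes lists of samples and hand-labelled nucleus boundaries; the function name and toy signals are hypothetical. The caller chooses the silent-center length. For the mixed-consonant SC condition, that is the vowel's original interval; for the fixed-duration condition, it is one constant shared by all ten vowels.

```python
def ms_to_samples(ms, sr):
    """Convert a time in milliseconds to a sample count at rate sr (Hz)."""
    return int(round(ms * sr / 1000))


def mixed_consonant_sc(syll_a, nucleus_a, syll_b, nucleus_b, gap_ms, sr):
    """Splice A's initial transition, a silence, and B's final transition.

    nucleus_a, nucleus_b: (start_ms, end_ms) of each syllable's vocalic nucleus.
    gap_ms: length of the silent center in ms.
    """
    initial = syll_a[:ms_to_samples(nucleus_a[0], sr)]  # e.g. /b/-to-vowel transition
    final = syll_b[ms_to_samples(nucleus_b[1], sr):]    # e.g. vowel-to-/t/ transition
    return initial + [0.0] * ms_to_samples(gap_ms, sr) + final


bvd = [1.0] * 300  # toy stand-in for a /bVd/ token, 1 kHz sampling
dvt = [2.0] * 300  # toy stand-in for a /dVt/ token with the same vowel
mixed = mixed_consonant_sc(bvd, (100.0, 200.0), dvt, (100.0, 250.0), 80.0, 1000)
```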

6.
By analyzing acoustic data from a Chinese news report, the present research explores how the syllable duration and pitch of stressed syllables change when isolated clauses are connected into a discourse. Comparing the same clause in isolation and in discourse context shows that the pitch change is most evident on the clause nucleus: the pitch peaks as a whole are lowered markedly, and the degree of lowering varies with the type of stress. Clause stresses that are not assigned the status of discourse stress are weakened, i.e., their pitch falls and their syllable duration shortens. In a discourse composed of several clauses, speakers can modulate clause prosody by varying the strength of stresses. They thereby achieve overall control of discourse prosody and precise semantic expression. These findings from broadcast speech material will shed light on the teaching of news broadcasting and contribute to prosodic control in Chinese Putonghua speech synthesis.

7.
Measurements of the temporal characteristics of word-initial stressed syllables in CVCV-type words in Modern Greek showed that the timing of the initial consonant, in terms of its closure duration and voice onset time (VOT), depends on place and manner of articulation. This contradicts recent accounts of word-initial voiceless consonants in English, which propose that closure and VOT together make up a voiceless interval independent of place and manner of articulation. The results also contribute to the development of a timing model for Modern Greek that generates closure, VOT, and vowel durations for word-initial stressed CV syllables. The model consists of a series of ordered rules that operate on a given word duration to derive first a stressed-syllable duration and then all intrasyllabic acoustic intervals.

8.
Differences in the pitch and duration of Chinese syllables between Putonghua (PTH) and Taiwan Mandarin (TM) were studied, using both isolated syllables and sentences as speech materials. For isolated syllables, T1 and T2 in TM are influenced by the Minnan dialect, so their pitch is lower than in PTH. T3 is falling-rising in PTH but falling in TM. Moreover, the order of syllable duration by tone is T3 > T2 > T1 > T4 in PTH but T1 > T2 > T3 > T4 in TM. For syllables in sentences, T2 is mid-rising in PTH but mid-level in TM. T3 is longer than T4 but shorter than T1 or T2 in PTH, whereas it is the shortest tone in TM. Furthermore, the effect of a prosodic phrase boundary on duration is almost the same for all tones in PTH, but in TM the boundary lengthening of T1 or T2 is greater than that of T3 or T4.

9.
The amount of acoustic information that native and non-native listeners need for syllable identification was investigated by comparing monolingual English speakers with native Spanish speakers who had either an earlier or a later age of immersion in an English-speaking environment. Duration-preserved silent-center syllables retaining 10, 20, 30, or 40 ms of the consonant-vowel and vowel-consonant transitions were created for the target vowels /i, ɪ, eɪ, ɛ, æ/ and /a/, spoken by two males in /bVb/ context. Duration-neutral syllables were created by editing the silent portion to equate the duration of all vowels. Listeners identified the syllables in a six-alternative forced-choice task. The earlier learners identified the whole-word and 40 ms duration-preserved syllables as accurately as the monolingual listeners, but identified the silent-center syllables significantly less accurately overall. Only the monolingual listener group identified syllables significantly more accurately in the duration-preserved than in the duration-neutral condition. This suggests that the non-native listeners were unable to recover from the syllable disruption sufficiently to access the duration cues in the silent-center syllables. This effect was most pronounced for the later learners, who also showed the most vowel confusions and the greatest decrease in performance from the whole-word to the 40 ms transition condition.

10.
I. Introduction. Research on Chinese synthesis shows that synthetic speech sounds intelligible and natural only when both its segmental and suprasegmental features are similar to those of natural speech [1]. Among existing synthesis techniques, the approach based on acoustic parameters can adjust both the segmental and suprasegmental features of synthetic units flexibly, and in theory it can be considered the most reasonable synthesis technique. However, the parameter-based synthesizer is over-dependent on the developments of paramet…

11.
The perceptual multidimensional structure of Chinese syllables was studied in psychophysical experiments. The results indicate that F0 and duration are related to the two main dimensions of the perceptual structure of Chinese syllables. Prosodic characteristics, such as a syllable's position in the prosodic hierarchy, and stress induce different distributions of syllables in the perceptual space.

12.
Which acoustic properties of the speech signal differ between rhythmically prominent syllables and non-prominent ones? A production experiment was conducted to identify these acoustic properties. Subjects read out repetitive text to a metronome, trying to match stressed syllables to its beat. The analysis searched for the function of the speech signal that best predicts the timing of the metronome ticks. The most important factor in this function is found to be the contrast in loudness between a syllable and its neighbors. The prominence of a syllable can be deduced from the specific loudness in an (approximately) 360 ms window centered on the syllable in question relative to an (approximately) 800-ms-wide symmetric window.
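A minimal sketch of the window comparison follows. It assumes a precomputed loudness contour with one value per 10 ms frame and uses a simple ratio of window means. The paper's actual predictor is built from specific loudness and may weight or combine the two windows differently. The function names are hypothetical.

```python
def window_mean(contour, center, half_width):
    """Mean of contour over +/- half_width frames around center, clipped at the edges."""
    lo = max(0, center - half_width)
    hi = min(len(contour), center + half_width + 1)
    segment = contour[lo:hi]
    return sum(segment) / len(segment)


def prominence(loudness, center, frame_ms=10.0, short_ms=360.0, long_ms=800.0):
    """Loudness in a short window on a syllable relative to a wider symmetric window.

    loudness: loudness contour (e.g. in sones), one positive value per frame.
    center: frame index of the syllable's center.
    """
    short_half = int(round(short_ms / 2 / frame_ms))
    long_half = int(round(long_ms / 2 / frame_ms))
    return (window_mean(loudness, center, short_half)
            / window_mean(loudness, center, long_half))


contour = [1.0] * 200
contour[90:110] = [5.0] * 20   # toy loud syllable around frame 100
```

Values above 1 mark a syllable that is louder than its wider neighborhood. A flat stretch of signal gives exactly 1.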

13.
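Analysis of a corpus of Mandarin Chinese news discourse read aloud shows that when clauses are placed within a discourse segment, the pitch and duration that mark stress change. Compared with isolated clauses, the pitch change in discourse clauses is concentrated on the clause nucleus, where the high points are lowered as a whole, and different types of stress are lowered to different degrees. Within a discourse segment, clause stresses that are not discourse-segment stresses are noticeably weakened, i.e., their pitch is lowered and their syllable duration shortened. In a segment composed of several clauses, speakers can vary the strength of each clause's stress to regulate the prosody of the segment. In this way they achieve overall control of discourse prosody and smooth semantic expression. Research on discourse-segment stress and clause stress brings experimental phonetics into the teaching of broadcast speech and also helps with prosodic control in Chinese speech synthesis.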

14.
The contribution of the nasal murmur and the vocalic formant transition to adult listeners' perception of the [m]-[n] distinction was investigated for speakers of different ages in both consonant-vowel (CV) and vowel-consonant (VC) syllables. Three children in each of three age groups (3, 5, and 7 years old), three adult females, and three adult males produced CV and VC syllables consisting of either [m] or [n] followed or preceded, respectively, by [i æ u a]. Two productions of each syllable were edited into seven murmur and transition segments. Across speaker groups, a segment including the last 25 ms of the murmur and the first 25 ms of the vowel yielded higher perceptual identification of place of articulation than any other segment edited from the CV syllable. In contrast, the corresponding vowel+murmur segment in the VC syllable position improved nasal identification relative to other segment types only for the adult talkers. Overall, the CV syllable was perceptually more distinctive than the VC syllable, but this distinctiveness interacted with speaker group and stimulus duration. As predicted by previous studies and by the current perceptual results, acoustic analyses of adult syllable productions showed systematic differences between labial and alveolar places of articulation, but these differences were only marginally observed in the youngest children's speech. Also as predicted by the current perceptual results, these acoustic properties differentiating the place of articulation of nasal consonants were reliably different for CV syllables compared with VC syllables. A series of comparisons of perceptual data across speaker groups, segment types, and syllable shapes provided strong support, in adult speakers, for the "discontinuity hypothesis" [K. N. Stevens, in Phonetic Linguistics: Essays in Honor of Peter Ladefoged, edited by V. A. Fromkin (Academic, London, 1985), pp. 243-255]. According to this hypothesis, spectral discontinuities at acoustic boundaries provide critical cues to the perception of place of articulation. In child speakers, the perceptual support for the "discontinuity hypothesis" was weaker, and the results were indicative of developmental changes in speech production.
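The best-performing CV stimulus (the last 25 ms of the murmur plus the first 25 ms of the vowel) is a fixed window straddling the murmur-vowel boundary. A minimal sketch of that excision, assuming a hand-labelled boundary time and a list of samples; the function name is hypothetical. For the VC vowel+murmur segment, the same call applies with the vowel-murmur boundary instead.

```python
def ms_to_samples(ms, sr):
    """Convert a time in milliseconds to a sample count at rate sr (Hz)."""
    return int(round(ms * sr / 1000))


def boundary_segment(samples, sr, boundary_ms, before_ms=25.0, after_ms=25.0):
    """Excise before_ms of signal preceding a segment boundary and after_ms following it."""
    b = ms_to_samples(boundary_ms, sr)
    start = max(0, b - ms_to_samples(before_ms, sr))
    end = min(len(samples), b + ms_to_samples(after_ms, sr))
    return samples[start:end]


signal = list(range(1000))                    # toy 100 ms token at 10 kHz
murmur_vowel = boundary_segment(signal, 10000, 50.0)  # boundary labelled at 50 ms
```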

15.
Perception of syllable timing by prebabbling infants (total citations: 1; self-citations: 0; citations by others: 1)
Adults hear alternating syllables with isochronous syllable onset-onset times as having a long-short, alternating rhythm when the syllables differ in initial consonant. This occurs because adults attend to syllable-internal events, called the "P centers" or "stress beats", rather than to syllable onsets. Thus they report that stress-beat aligned speech is isochronous and stress-beat aligned clicks are synchronized with the speech. The question asked here is whether, like adults, infants attend to the timing of syllable stress beats. In experiment 1, infants showed differences in time to habituate to sequences of alternating monosyllables, [bad] and [strad], having two different onset-onset times (onset- and stress-beat-timed) and two different placements of clicks on the syllables (on syllable onsets and on stress beats). Infants habituated more slowly to sequences with clicks on the stress beats than to sequences with clicks on syllable onsets and most slowly of all to stress-beat-timed speech with clicks on the stress beats. To interpret these findings, a second experiment was run using sequences only of the syllable [strad] so that speech timing measured according to onsets and stress beats was the same. Syllables had isochronous timing or a long-short alternating rhythm, corresponding to two possible ways of hearing the stress-beat-timed speech of experiment 1. In addition, two patterns of click placement were compared, uniform and syncopated, corresponding to two ways of hearing the stress-beat aligned clicks of experiment 1. The patterns of sucking times in the two experiments match exactly if stress-beat aligned speech in experiment 1 is identified with the isochronous speech of experiment 2 and the stress-beat aligned clicks of experiment 1 match with the uniformly timed clicks of experiment 2. It is inferred from this correspondence that infants perceive stress beats and stress-beat timing of syllables as adults do.
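The difference between onset-timed and stress-beat-timed sequences can be sketched by assuming that each syllable type has a known P-center offset from its acoustic onset. The offsets used below (40 ms for [bad], 120 ms for [strad]) are invented for illustration, not measured values. The function names are hypothetical.

```python
def onset_times(pcenters_ms, period_ms, n, align="onset"):
    """Acoustic onset times (ms) for n syllables cycling through pcenters_ms.

    pcenters_ms: P-center (stress-beat) offset from acoustic onset for each
    syllable type, in presentation order, e.g. [bad, strad].
    align="onset": onsets are isochronous (onset-timed speech).
    align="beat": P-centers are isochronous (stress-beat-timed speech).
    """
    lead = max(pcenters_ms) if align == "beat" else 0.0  # keeps all onsets >= 0
    times = []
    for i in range(n):
        offset = pcenters_ms[i % len(pcenters_ms)] if align == "beat" else 0.0
        times.append(lead + i * period_ms - offset)
    return times


def click_times(onsets, pcenters_ms, place="onset"):
    """Click times (ms): on each syllable onset, or on each syllable's stress beat."""
    return [t + (pcenters_ms[i % len(pcenters_ms)] if place == "beat" else 0.0)
            for i, t in enumerate(onsets)]


PCENTERS = [40.0, 120.0]  # hypothetical P-center offsets for [bad] and [strad]
beat_timed = onset_times(PCENTERS, 500.0, 4, align="beat")  # [80.0, 500.0, 1080.0, 1500.0]
```

Clicks from `click_times(beat_timed, PCENTERS, place="beat")` fall every 500 ms. The onsets themselves alternate between 420 ms and 580 ms intervals, which is the long-short pattern described above.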

16.
Previous studies have demonstrated that perturbations in voice pitch or loudness feedback lead to compensatory changes in voice F(0) or amplitude during production of sustained vowels. Responses to pitch-shifted auditory feedback have also been observed during English and Mandarin speech. The present study investigated whether Mandarin speakers would respond to amplitude-shifted feedback during meaningful speech production. Native speakers of Mandarin produced two-syllable utterances with focus on the first syllable, the second syllable, or neither syllable, as prompted by corresponding questions. Their acoustic speech signal was fed back to them with loudness shifted by +/-3 dB for 200 ms. The responses to the feedback perturbations had mean latencies of approximately 142 ms and magnitudes of approximately 0.86 dB. Response magnitudes were greater and latencies were longer when emphasis was placed on the first syllable than when there was no emphasis. Amplitude is not known to be highly effective in encoding linguistic contrasts. Subjects nevertheless reacted to amplitude perturbations as fast as they reacted to F(0) perturbations in previous studies, which provides clear evidence that a highly automatic feedback mechanism controls both the F(0) and the amplitude of speech production.
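A +/-3 dB loudness shift corresponds to an amplitude gain of 10^(+/-3/20), about 1.41 or 0.71. A minimal offline sketch, assuming the feedback signal is a list of samples. A real-time perturbation system processes the signal in short blocks and may ramp the gain on and off; this sketch does neither. The function name is hypothetical.

```python
def shift_level(samples, sr, start_ms, shift_db, dur_ms=200.0):
    """Return a copy of samples with dur_ms of signal scaled by shift_db decibels."""
    gain = 10 ** (shift_db / 20)  # dB to linear amplitude ratio
    start = int(round(start_ms * sr / 1000))
    end = min(len(samples), start + int(round(dur_ms * sr / 1000)))
    out = list(samples)
    for i in range(start, end):
        out[i] *= gain
    return out


up = shift_level([1.0] * 1000, 1000, 100.0, 3.0)     # +3 dB from 100 to 300 ms
down = shift_level([1.0] * 1000, 1000, 100.0, -3.0)  # -3 dB over the same stretch
```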

17.
Systems designed to recognize continuous speech must be able to adapt to many types of acoustic variation, including variation in stress. A speaker-dependent recognition study was conducted on a group of stressed and destressed syllables. These syllables, some containing the short vowel /ɪ/ and others the long vowel /æ/, were excised from continuous speech and transformed into arrays of cepstral coefficients at two levels of precision. From these data, four types of template dictionaries varying in size and stress composition were formed by a time-warping procedure. Recognition performance data were gathered from listeners and from a computer recognition algorithm that also employed warping. First, for a significant portion of the database, stressed and destressed versions of the same syllable were sufficiently different from one another to justify the use of separate dictionary templates. Second, destressed syllables exhibit roughly the same acoustic variance as their stressed counterparts. Third, long vowels tend to be involved in proportionally fewer cross-vowel errors but tend to diminish the warping algorithm's ability to discriminate consonantal information. Finally, the pattern of consonant errors that listeners make as a function of vowel length differs significantly from that produced by the computer.
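Template matching with time warping can be sketched with a basic dynamic-time-warping (DTW) distance and a nearest-template decision. This is a generic textbook DTW, not necessarily the warping procedure the authors used. The sketch treats cepstral vectors as plain lists of floats. The labels and toy templates are hypothetical. Storing several templates per label is one way to keep stressed and destressed versions of a syllable separate.

```python
import math


def dtw_distance(a, b):
    """DTW distance between two sequences of cepstral vectors.

    Uses Euclidean frame distance and the basic symmetric step pattern
    (match, insertion, deletion) with no slope constraints.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(token, templates):
    """Return the label whose closest template has the smallest DTW distance.

    templates: dict mapping a label to a list of template sequences.
    """
    best = min((dtw_distance(token, t), label)
               for label, seqs in templates.items() for t in seqs)
    return best[1]


TEMPLATES = {
    "A": [[[0.0], [1.0], [2.0]]],  # toy 1-coefficient templates
    "B": [[[5.0], [5.0]]],
}
```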

18.
A series of experiments was carried out to investigate how fundamental frequency declination is perceived by speakers of English. Using linear predictive coded (LPC) speech, nonsense sentences were constructed in which fundamental frequency on the last stressed syllable had been systematically varied. Listeners were asked to judge which stressed syllable was higher in pitch. Their judgments were found to reflect normalization for expected declination; in general, when two stressed syllables sounded equal in pitch, the second was actually lower. The pattern of normalization reflected certain major features of production patterns. A greater correction for declination was made for wide pitch range stimuli than for narrow pitch range stimuli. The slope of expected declination was smaller for longer stimuli than for shorter ones. Lastly, amplitude was found to have a significant effect on judgments, suggesting that the amplitude downdrift which normally accompanies fundamental frequency declination may play an important role in the perception of phrasing.

19.
Dynamic specification of coarticulated vowels (total citations: 1; self-citations: 0; citations by others: 1)
An adequate theory of vowel perception must account for perceptual constancy over variations in the acoustic structure of coarticulated vowels contributed by speakers, speaking rate, and consonantal context. We modified recorded consonant-vowel-consonant syllables electronically to investigate the perceptual efficacy of three types of acoustic information for vowel identification: (1) static spectral "targets," (2) duration of syllabic nuclei, and (3) formant transitions into and out of the vowel nucleus. Vowels in /b/-vowel-/b/ syllables spoken by one adult male (experiment 1) and by two females and two males (experiment 2) served as the corpus, and seven modified syllable conditions were generated in which different parts of the digitized waveforms of the syllables were deleted and the temporal relationships of the remaining parts were manipulated. Results of identification tests by untrained listeners indicated that dynamic spectral information, contained in initial and final transitions taken together, was sufficient for accurate identification of vowels even when vowel nuclei were attenuated to silence. Furthermore, the dynamic spectral information appeared to be efficacious even when durational parameters specifying intrinsic vowel length were eliminated.

20.
The articulatory kinematics of final lengthening (total citations: 4; self-citations: 0; citations by others: 4)
In order to understand better the phonetic control of final lengthening, the articulation of phrase-final syllables was compared with that of two other contexts known to increase syllable duration: accent and slow tempo. The kinematics of jaw movements in [pap] sequences and of lower lip movements in [pe] sequences for four subjects were interpreted in terms of a task-dynamic model. There was evidence of two different control strategies: decreasing intragestural stiffness to slow down some part of the syllable, and changing intergestural phasing to decrease overlap of the vowel gesture by the consonant. The first was used in slowing down tempo, whereas the second was used to increase the duration of accented syllables over unaccented syllables. Both strategies were implicated in phrase-final lengthening. In accented syllables, final closing gestures generally were longer and slower, but not more displaced. The two slowest subjects, however, used the other strategy in their slow-tempo final syllables. Final lengthening in reduced syllables was more difficult to interpret. The relationship between peak velocity and displacement suggested that a lesser stiffness is obscured by an increased gestural amplitude. Thus, by comparison to lengthening for accent, final lengthening is like a localized change in speaking tempo, although it cannot be equated directly with the specification of stiffness.
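Why the peak-velocity/displacement relationship serves as a stiffness index can be sketched under the common task-dynamic assumption that a gesture behaves like a critically damped mass-spring system (unit mass here; function names hypothetical). For a movement of amplitude A, position is A(1 - (1 + wt)e^(-wt)) with w = sqrt(k/m). Velocity peaks at t = 1/w with value A*w/e, so the ratio of peak velocity to displacement depends only on stiffness. A larger gesture raises peak velocity proportionally without changing the ratio, whereas a less stiff gesture lowers the ratio and moves more slowly.

```python
import math


def velocity(t, amplitude, omega):
    """Velocity of a critically damped gesture moving from 0 to amplitude.

    Position is amplitude * (1 - (1 + omega*t) * exp(-omega*t)).
    """
    return amplitude * omega ** 2 * t * math.exp(-omega * t)


def peak_velocity(amplitude, stiffness, mass=1.0):
    """Peak velocity; for critical damping it occurs at t = 1/omega."""
    omega = math.sqrt(stiffness / mass)
    return velocity(1.0 / omega, amplitude, omega)  # equals amplitude * omega / e


# Same stiffness, doubled amplitude: peak velocity doubles, ratio unchanged.
ratio_small = peak_velocity(10.0, 400.0) / 10.0
ratio_large = peak_velocity(20.0, 400.0) / 20.0
# Lower stiffness: smaller peak-velocity/displacement ratio.
ratio_slack = peak_velocity(10.0, 100.0) / 10.0
```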


Copyright © Beijing Qinyun Science and Technology Development Co., Ltd. Beijing ICP Registration No. 09084417