Similar Articles
20 similar articles found.
1.
In tone languages there are potential conflicts in the perception of lexical tone and intonation, as both depend mainly on differences in fundamental frequency (F0) patterns. The present study investigated the acoustic cues associated with the perception of sentences as questions or statements in Cantonese, as a function of the lexical tone in sentence-final position. Cantonese listeners performed intonation identification tasks involving complete sentences, isolated final syllables, and sentences without the final syllable (carriers). Sensitivity (d' scores) was similar for complete sentences and final syllables but significantly lower for carriers. Sensitivity was also affected by tone identity. These findings show that the perception of questions and statements relies primarily on the F0 characteristics of the final syllables (local F0 cues). A measure of response bias (c) provided evidence of a general bias toward perceiving statements. Logistic regression analyses showed that utterances were accurately classified as questions or statements using average F0 and F0 interval. The average F0 of carriers (a global F0 cue) was also found to be a reliable secondary cue. These findings suggest that the use of F0 cues in perceiving question intonation in tone languages is likely to be language-specific.
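The sensitivity (d') and response-bias (c) measures in this abstract are standard signal detection quantities. A minimal sketch, assuming "question" responses count as hits on question trials and as false alarms on statement trials (the abstract does not state the coding used):

```python
from statistics import NormalDist

def sdt_measures(hit_rate: float, fa_rate: float) -> tuple[float, float]:
    """Return (d', c) for a yes/no identification task.

    Assumed coding: hit_rate = P("question" | question),
    fa_rate = P("question" | statement). Rates of exactly 0 or 1 must be
    corrected beforehand (e.g. with a 1/(2N) rule), since z(0) and z(1) are infinite.
    """
    z = NormalDist().inv_cdf          # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)
    c = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, c

d, c = sdt_measures(0.80, 0.20)       # unbiased listener: c is 0
```

Under this coding, a positive c means fewer "question" responses overall, i.e. a bias toward perceiving statements.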

2.
There is a tendency across languages to use a rising pitch contour to convey question intonation and a falling pitch contour to convey a statement. In a lexical tone language such as Mandarin Chinese, rising and falling pitch contours are also used to differentiate lexical meaning. How, then, does the multiplexing of the F0 channel affect the perception of question and statement intonation in a lexical tone language? This study investigated the effects of lexical tones and focus on the perception of intonation in Mandarin Chinese. The results show that lexical tones and focus impact the perception of sentence intonation. Question intonation was easier for native speakers to identify on a sentence with a final falling tone and more difficult to identify on a sentence with a final rising tone, suggesting that tone identification intervenes in the mapping of F0 contours to intonational categories and that tone and intonation interact at the phonological level. In contrast, there is no evidence that the interaction between focus and intonation goes beyond the psychoacoustic level. The results provide insights that will be useful for further research on tone and intonation interactions in both acoustic modeling studies and neurobiological studies.

3.
Schematic fundamental frequency curves of simple statements and questions are generated for Hausa, a two-tone language of Nigeria, using a modified version of an intonational model developed by Gårding and Bruce [Nordic Prosody II, edited by T. Fretheim (Tapir, Trondheim, 1981), pp. 33-39]. In this model, rules for intonation and tones are separated. Intonation is represented as sloping grids of (near) parallel lines, inside which tones are placed. The tones are associated with turning points of the fundamental frequency contour. Local rules may also modify the exact placement of a tone within the grid. The continuous fundamental frequency contour is modeled by concatenating the tonal points using polynomial equations. Thus the final pitch contour is modeled as an interaction between global and local factors. The slope of the intonational grid lines depends at least on sentence type (statement or question), sentence length, and tone pattern. The model is tested by reference to data from nine speakers of Kano Hausa.
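The grid-and-turning-point idea can be illustrated with a small sketch. It assumes straight, linearly declining top and base lines with hypothetical Hz values, places H tones on the top line and L tones on the base line, and joins the points with cubic segments that have zero slope at each turning point. This cubic is one simple polynomial choice, not necessarily the one used in the paper, and local placement rules are omitted.

```python
from bisect import bisect_right

def grid_line(t: float, start_hz: float, slope_hz_per_s: float) -> float:
    """F0 of a straight grid line at time t (seconds)."""
    return start_hz + slope_hz_per_s * t

def place_tones(tones, top=(180.0, -20.0), base=(110.0, -10.0)):
    """Map (time, 'H' or 'L') turning points onto a sloping grid.

    top/base are (start Hz, slope Hz/s); the defaults are hypothetical values
    chosen only for illustration.
    """
    points = []
    for t, tone in tones:
        line = top if tone == "H" else base
        points.append((t, grid_line(t, *line)))
    return points

def f0_contour(points, t: float) -> float:
    """Join turning points with cubics y0 + (y1 - y0) * (3s^2 - 2s^3),
    which have zero slope at both ends, so each point is a local extremum."""
    times = [p[0] for p in points]
    if t <= times[0]:
        return points[0][1]
    if t >= times[-1]:
        return points[-1][1]
    i = bisect_right(times, t) - 1
    (t0, y0), (t1, y1) = points[i], points[i + 1]
    s = (t - t0) / (t1 - t0)
    return y0 + (y1 - y0) * (3 * s**2 - 2 * s**3)

# An H L H sequence on a declining (statement-like) grid
pts = place_tones([(0.1, "H"), (0.3, "L"), (0.5, "H")])
```

A question could be sketched by passing a shallower or rising slope for the grid lines, leaving the tone placement rules unchanged.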

4.
Ma EP, Baken RJ, Roark RM, Li PM. Journal of Voice, 2012, 26(5): 670.e1-670.e6
Vocal attack time (VAT) is the time lag between the growth of the sound pressure signal and the development of physical contact of the vocal folds at vocal initiation. It can be derived by cross-correlating the short-time amplitude changes in the sound pressure and electroglottographic (EGG) signals. Cantonese is a tone language in which tone determines the lexical meaning of a syllable, and this linguistic function of tone has implications for the physiology of tone production. The aim of the present study was to investigate possible effects of Cantonese tones on VAT. Sound pressure and EGG signals were recorded simultaneously from 59 native Cantonese speakers (31 females and 28 males), who read aloud 12 disyllabic words comprising homophone pairs of the six Cantonese lexical tones. Results revealed a gender difference, with mean VAT significantly smaller in females than in males. VAT also differed significantly between the two tone categories: mean VAT for the three level tones (tones 1, 3, and 6) was significantly smaller than for the three contour tones (tones 2, 4, and 5). The findings support the notion that norms and interpretations based on non-tone European languages may not apply directly to tone languages.
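The cross-correlation step can be sketched as a search for the lag that best aligns the two amplitude-change signals. This sketch assumes the short-time amplitude-change sequences have already been computed, and it adopts the convention that a positive VAT means EGG contact lags sound-pressure growth. Both assumptions are made here for illustration; the published procedure may differ in windowing and normalization.

```python
def best_lag(x, y, max_lag):
    """Lag (in samples) at which y best matches a delayed copy of x.

    A positive result means y lags x. Uses the raw, unnormalized
    cross-correlation sum over the overlapping samples.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(x[i] * y[i + lag] for i in range(len(x)) if 0 <= i + lag < len(y))
        if score > best_score:
            best, best_score = lag, score
    return best

def vocal_attack_time(sp_change, egg_change, fs_hz, max_lag):
    """VAT in milliseconds (assumed convention: positive = EGG lags sound pressure)."""
    return 1000.0 * best_lag(sp_change, egg_change, max_lag) / fs_hz

# Toy signals: the same amplitude-change bump, 3 samples later in the EGG channel
sp = [0.0] * 30
sp[10:13] = [1.0, 2.0, 1.0]
egg = [0.0] * 30
egg[13:16] = [1.0, 2.0, 1.0]
vat_ms = vocal_attack_time(sp, egg, fs_hz=1000, max_lag=10)   # 3.0 ms
```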

5.
Intonation perception of English speech was examined in native English and native Chinese listeners. The F0 contour of the final word of each of three sentences was manipulated from falling to rising patterns. The listeners' task was to identify and discriminate the intonation of each sentence (question versus statement). English and Chinese listeners differed significantly in their identification functions, including the category boundary and the slope. In the discrimination functions, Chinese listeners showed greater peakedness than their English peers. The cross-linguistic differences in intonation perception were similar to previous findings on the perception of lexical tones and are likely due to differences in the listeners' language backgrounds.
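The category boundary and slope of an identification function are often estimated by fitting a logistic curve. A minimal sketch, assuming a least-squares fit on logit-transformed proportions (a simple alternative to maximum-likelihood fitting); the abstract does not say which fitting method the authors used:

```python
import math

def fit_identification(steps, p_question):
    """Fit logit(p) = a + b * step by least squares; return (boundary, slope).

    boundary is the step at which p = 0.5 (i.e. -a / b); slope is b, in logit
    units per continuum step. Proportions of 0 or 1 are clipped so the logit
    stays finite.
    """
    eps = 0.01
    clipped = [min(max(p, eps), 1 - eps) for p in p_question]
    ys = [math.log(p / (1 - p)) for p in clipped]
    n = len(steps)
    mx, my = sum(steps) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in steps)
    sxy = sum((x - mx) * (y - my) for x, y in zip(steps, ys))
    b = sxy / sxx
    a = my - b * mx
    return -a / b, b

# Proportion of "question" responses along a hypothetical 7-step falling-to-rising continuum
steps = list(range(1, 8))
p_q = [1 / (1 + math.exp(-1.5 * (x - 4))) for x in steps]
boundary, slope = fit_identification(steps, p_q)   # recovers 4.0 and 1.5
```

A steeper slope indicates more categorical identification; comparing boundaries and slopes across listener groups is the kind of comparison the abstract reports.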

6.
This paper reports on a methodology for acoustically analyzing tone production in Cantonese. F0 offset is plotted against F0 onset for a series of tokens of each of the six tones in the language, and the tokens are grouped by tone into six ellipses. Qualitative visual observations about how well the ellipses are differentiated within the tonal space are summarized numerically using two indices, referred to here as Index 1 and Index 2. Index 1 is the ratio of the area of the speaker's tonal space to the average area of the ellipses of the three target tones that make up the tonal space. Index 2 is the ratio of the average distance between all six tonal ellipses to the average of the sum of the two axes for all six tone ellipses. Using this methodology, tonal differentiation is compared for three groups of speakers: normally hearing adults, normally hearing children aged 4-6 years, and prelinguistically deafened cochlear implant users aged 4-11 years. The data from the acoustic analyses explain an apparent conundrum regarding how tone production abilities can outstrip tone perception abilities. It is suggested that young children in the age range tested are still learning to normalize for pitch level differences in tone production. The acoustic analysis thus supports results from tone perception studies and suggests that the methodology is suitable for studies of tone production in both clinical and research contexts.
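The two indices can be sketched as follows. Several details are assumptions not given in the abstract and are labeled hypothetical: each ellipse is taken as a k-standard-deviation ellipse (k = 2) from the covariance of the (onset, offset) points, the tonal space is taken as the triangle joining the centroids of the three target tones, distances are measured between ellipse centroids, and "the two axes" are the full major and minor axis lengths.

```python
import math
from itertools import combinations

def ellipse(points, k=2.0):
    """Centroid, semi-axes (a, b) and area of a k-SD ellipse for 2-D (onset, offset) F0 points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Eigenvalues of the 2x2 covariance matrix give the squared SDs along the axes
    tr, det = sxx + syy, sxx * syy - sxy ** 2
    disc = math.sqrt(max(tr ** 2 / 4 - det, 0.0))
    a = k * math.sqrt(tr / 2 + disc)
    b = k * math.sqrt(max(tr / 2 - disc, 0.0))
    return (mx, my), a, b, math.pi * a * b

def triangle_area(p, q, r):
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])) / 2

def index1(target_ellipses):
    """Hypothetical tonal-space triangle area / mean area of the three target-tone ellipses."""
    c1, c2, c3 = (e[0] for e in target_ellipses)
    mean_area = sum(e[3] for e in target_ellipses) / 3
    return triangle_area(c1, c2, c3) / mean_area

def index2(all_ellipses):
    """Mean pairwise centroid distance / mean of (major + minor axis length)."""
    dists = [math.dist(e1[0], e2[0]) for e1, e2 in combinations(all_ellipses, 2)]
    axes = [2 * e[1] + 2 * e[2] for e in all_ellipses]
    return (sum(dists) / len(dists)) / (sum(axes) / len(axes))
```

In practice index2 would be given all six tone ellipses; larger values of either index indicate better-separated tones relative to token variability.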

7.
Context is important for recovering linguistic information from talker-induced variability in acoustic signals. In tone perception, previous studies reported similar effects of speech and nonspeech contexts in Mandarin, supporting a general perceptual mechanism underlying tone normalization. However, no supporting evidence was obtained in Cantonese, also a tone language. Moreover, no study has compared speech and nonspeech contexts in the multi-talker condition, which is essential for exploring the mechanism by which inter-talker variability in speaking F0 is normalized. A further question is whether a talker's full F0 range and mean F0 equally facilitate normalization. To answer these questions, this study examines the effects of four context conditions (speech/nonspeech × F0 contour/mean F0) in the multi-talker condition in Cantonese. Results show that raising and lowering the F0 of speech contexts changes the perception of identical stimuli from the mid level tone to the low and high level tones, respectively, whereas nonspeech contexts only mildly increase the identification preference. This supports a speech-specific mechanism of tone normalization. Moreover, speech contexts with a flattened F0 trajectory, which neutralizes cues to a talker's full F0 range, fail to facilitate normalization in some conditions, implying that a talker's mean F0 is less efficient for minimizing talker-induced lexical ambiguity in tone perception.
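The contrast between a talker's full F0 range and mean F0 can be made concrete with two simple extrinsic normalizations. This is an illustrative sketch only, not the study's model of listeners: one normalizes the target in semitones by the context's mean and standard deviation (using range information), the other only by the context mean. The 100 Hz semitone reference is an arbitrary choice.

```python
import math
from statistics import mean, stdev

def semitones(f_hz, ref_hz=100.0):
    """Frequency in semitones relative to ref_hz (arbitrary reference)."""
    return 12 * math.log2(f_hz / ref_hz)

def normalize_by_range(target_hz, context_hz):
    """z-score of the target (in semitones) using the context's mean and SD."""
    st = [semitones(f) for f in context_hz]
    return (semitones(target_hz) - mean(st)) / stdev(st)

def normalize_by_mean(target_hz, context_hz):
    """Offset (semitones) of the target from the context mean; range is ignored."""
    return semitones(target_hz) - mean(semitones(f) for f in context_hz)

low_context = [100.0, 120.0, 140.0]
high_context = [180.0, 200.0, 220.0]
# The same 150 Hz target sits high relative to a low context and low relative to a high one.
print(normalize_by_mean(150.0, low_context), normalize_by_mean(150.0, high_context))
```

With a flattened context contour the SD collapses toward zero, so only the mean-based normalization remains available, which mirrors the flattened-F0 condition described above.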

8.
Adult non-native speech perception is subject to influence from multiple factors, including linguistic and extralinguistic experience such as musical training. The present research examines how linguistic and musical factors influence non-native word identification and lexical tone perception. Groups of native tone-language (Thai) and non-tone-language (English) listeners, each subdivided into musician and non-musician groups, underwent Cantonese tone-word training. During training, participants learned to identify words minimally distinguished by five Cantonese tones; they also completed musical aptitude and phonemic tone identification tasks. First, the findings suggest that either musical experience or a tone-language background leads to significantly better non-native word learning than having neither musical training nor tone-language experience. Moreover, the combination of tone-language and musical experience did not give Thai musicians an additional advantage beyond either experience alone. Musicianship was more advantageous than a tone-language background for tone identification. Finally, tone identification and musical aptitude scores were significantly correlated with word learning success for English but not Thai listeners. These findings point to a dynamic influence of musical and linguistic experience, both at the tone identification level and at the word learning stage.

9.
This study explored the relationship between music and speech by examining absolute pitch and lexical tone perception. Taiwanese-speaking musicians were asked to identify musical tones without a reference pitch and to identify multispeaker Taiwanese level tones without the acoustic cues typically present for speaker normalization. The results showed that a high percentage of the participants (65% with an exact match required and 81% with one-semitone errors allowed) possessed absolute pitch, as measured by the musical tone identification task. A negative correlation was found between the occurrence of absolute pitch and the age of onset of musical training, suggesting that the acquisition of absolute pitch resembles the acquisition of speech. The participants identified multispeaker Taiwanese level tones with above-chance accuracy, even though the acoustic cues typically present for speaker normalization were not available in the stimuli. No correlations were found between performance in musical tone identification and performance in Taiwanese tone identification. Potential reasons for the lack of association between the two tasks are discussed.
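Scoring absolute-pitch identification "with an exact match" versus "with one-semitone errors allowed" can be sketched as below. It assumes A4 = 440 Hz equal temperament and ignores octave errors, which is a common convention in absolute-pitch testing but is not stated in the abstract.

```python
import math

def pitch_class(f_hz):
    """Nearest equal-tempered pitch class (0 = C, 9 = A) for a frequency, with A4 = 440 Hz."""
    midi = round(69 + 12 * math.log2(f_hz / 440.0))
    return midi % 12

def semitone_error(response_pc, target_pc):
    """Circular distance between two pitch classes (0-6), ignoring octave errors."""
    d = abs(response_pc - target_pc) % 12
    return min(d, 12 - d)

def ap_score(responses, targets, tolerance=0):
    """Proportion of trials whose pitch-class error is at most `tolerance` semitones."""
    hits = sum(semitone_error(r, t) <= tolerance for r, t in zip(responses, targets))
    return hits / len(targets)

resp, targ = [0, 1, 5], [0, 0, 9]
exact = ap_score(resp, targ)                   # only the first trial is exact
lenient = ap_score(resp, targ, tolerance=1)    # the second trial is within one semitone
```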

10.
Training American listeners to perceive Mandarin tones has been shown to be effective: trainees' identification improved by 21%, the improvement generalized to new stimuli and new talkers, and it was retained six months after training [Y. Wang et al., J. Acoust. Soc. Am. 106, 3649-3658 (1999)]. The present study investigates whether these perceptual gains in tone contrasts transferred to production. Before their perception pretest and after their post-test, the trainees were recorded producing a list of Mandarin words. Their productions were first judged by native Mandarin listeners in an identification task. Identification of the trainees' post-test tone productions improved by 18% relative to their pretest productions, indicating significant improvement in tone production after perceptual training. Acoustic analyses of the pre- and post-training productions further reveal the nature of the improvement, showing that post-training tone contours approximate native norms more closely than pre-training tone contours. Furthermore, pitch height and pitch contour are not mastered in parallel, with the former being more resistant to improvement than the latter. These results are discussed in terms of the relationship between non-native tone perception and production, as well as learning at the suprasegmental level.

11.
This paper addresses a classical but important problem: the coupling of lexical tones and sentence intonation in tone languages such as Chinese, focusing particularly on voice fundamental frequency (F0) contours of speech. The problem is important because it underlies speech synthesis technology and prosody analysis. We provide a solution using a constrained tone transformation technique based on structural modeling of F0 contours, which transforms target values in pairs from norms to variants. These targets are intended to sparsely specify the prosodic contributions to the F0 contours, while the alignment of target pairs between norms and variants is based on the underlying lexical tone structures. When the norms take the citation forms of lexical tones, the technique makes it possible to separate sentence intonation from observed F0 contours. When the norms take normative F0 contours, it is possible to measure intonation variations from the norms to the variants, both having identical lexical tone structures. This paper explains the underlying scientific and linguistic principles and presents an algorithm that has been implemented in software. The method's capability of separating and combining tone and intonation is evaluated through analysis and resynthesis of several hundred observed F0 contours.

12.
Downstep in the pitch contour of Standard Chinese (Putonghua) is examined using carefully designed sentences with controlled tone combinations. The results show that both automatic and non-automatic downstep exist in Chinese. In non-automatic downstep, low tones compress the pitch range of the following syllables downward, and the main effect of downstep is on the topline. A low tone not only lowers the topline after it but also raises the high tones before it; the two effects are compatible. In automatic downstep, the topline of the pitch contour within an intonational phrase shows a linear downtrend, which varies among speakers owing to individual stress habits. Compared with downstep in other tone and non-tone languages, the downstep ratio in Chinese is not constant, and the domain of downstep is not limited to adjacent tones.
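The two quantities discussed here, the downstep ratio and the linear topline trend, can be computed from the peak F0 values of successive high tones. A minimal sketch, assuming the ratio is defined as each peak divided by the preceding peak (one common definition; the paper's exact measure is not given in the abstract):

```python
def downstep_ratios(peaks_hz):
    """Ratios between successive high-tone peaks; a constant ratio would indicate fixed downstep."""
    return [b / a for a, b in zip(peaks_hz, peaks_hz[1:])]

def topline_slope(times_s, peaks_hz):
    """Least-squares slope (Hz/s) of a straight topline through the high-tone peaks."""
    n = len(times_s)
    mt, mp = sum(times_s) / n, sum(peaks_hz) / n
    num = sum((t - mt) * (p - mp) for t, p in zip(times_s, peaks_hz))
    den = sum((t - mt) ** 2 for t in times_s)
    return num / den

ratios = downstep_ratios([200.0, 180.0, 162.0])               # constant 0.9 in this toy case
slope = topline_slope([0.0, 1.0, 2.0], [200.0, 190.0, 180.0])  # -10 Hz/s
```

A finding that the downstep ratio "is not constant" corresponds to the values returned by downstep_ratios varying across a phrase.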

13.
The perception of pitch for pure tones with frequencies falling inside low- or high-frequency dead regions (DRs) was examined. Subjects adjusted a variable-frequency tone to match the pitch of a fixed tone. Matches within one ear were often erratic for tones falling in a DR, indicating unclear pitch percepts. Matches across ears of subjects with asymmetric hearing loss, and octave matches within ears, indicated that tones falling within a DR were perceived with an unclear pitch and/or a pitch different from "normal" whenever the tones fell more than 0.5 octave within a low- or high-frequency DR. One unilaterally impaired subject, with only a small surviving region between 3 and 4 kHz, matched a fixed 0.5-kHz tone in his impaired ear with, on average, a 3.75-kHz tone in his better ear. When asked to match the 0.5-kHz tone with an amplitude-modulated tone, he adjusted the carrier and modulation frequencies to about 3.8 and 0.5 kHz, respectively, suggesting that some temporal information was still available. Overall, the results indicate that the pitch of low-frequency tones is not conveyed solely by a temporal code. Possibly, there needs to be a correspondence between place and temporal information for a normal pitch to be perceived.
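Distances such as "more than 0.5 octave within a DR" and the size of the reported cross-ear mismatch are base-2 logarithms of frequency ratios. A short worked example using the figures from the abstract:

```python
import math

def octave_distance(f1_hz, f2_hz):
    """Signed distance in octaves from f1 to f2 (positive if f2 is higher)."""
    return math.log2(f2_hz / f1_hz)

# 0.5-kHz tone in the impaired ear matched, on average, by a 3.75-kHz tone in the better ear
shift = octave_distance(500.0, 3750.0)   # log2(7.5), about 2.9 octaves
```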

14.
The tendency to hear a sequence of alternating low (L) and high (H) frequency tones as two streams can be increased by a preceding induction sequence, even one composed only of same-frequency tones. Four experiments used such an induction sequence (10 identical L tones) to promote segregation in a shorter test sequence comprising L and H tones. Previous studies have shown that the build-up of stream segregation is usually reduced greatly when a sudden change in acoustic properties distinguishes all of the induction tones from their test-sequence counterparts. Experiment 1 showed that a single deviant tone, created by altering the final inducer (in frequency, level, duration, or replacement with silence) reduced reported segregation, often substantially. Experiment 2 partially replicated this finding, using changes in temporal discrimination as a measure of streaming. Experiments 3 and 4 varied the size of a frequency change applied to the deviant tone; the extent of resetting varied with size only gradually. The results suggest that resetting begins to occur once the change is large enough to be noticeable. Since the prior inducers always remained unaltered in the deviant-tone conditions, it is proposed that a single change actively resets the build-up evoked by the induction sequence.

15.
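Through phonetic analysis at three levels (tone-category identity, position in the sentence, and stress level), this study examines the pitch realization of the four Mandarin tones under different intonation conditions. Target words were placed in three different focus positions (i.e., positions bearing the strongest sentence stress) and two different non-focus positions (i.e., positions without sentence stress), and the pitch range of the target words as well as the high and low pitch points of the target tones were analyzed. The results show the following. (1) Under both focus and non-focus conditions, the pitch of the rising tone (Yangping) lies in the mid-to-low region of the pitch range; although the theoretical tone value of the low point of the falling tone (Qusheng) is lower than that of the rising tone, in actual realization the falling-tone low point is often close to, or even higher than, the rising-tone low point. (2) Sentence-initial focus is realized as an expansion of the pitch range both upward and downward, whereas sentence-final focus is realized as an overall raising of the pitch range; however, the high points of different tones do not all change in proportion to the upper limit of the pitch range, nor do the low points of different tones all change in proportion to its lower limit. (3) The dependence of the pitch of post-stress syllables on the focused syllable is constrained by foot structure: if the focused and post-focus syllables are in the same foot, the pitch relation between them resembles that between a neutral-tone syllable and the preceding full-tone syllable; if a foot boundary separates them, the post-focus syllable's pitch shows a degree of independence. These results illustrate the complexity of tonal pitch realization in sentences; building a Mandarin intonation model with good predictive power will require many detailed rules drawing on information at different levels, including focus structure, prosodic structure, coarticulation, and tone-category identity.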

16.
The ability of five profoundly hearing-impaired subjects to "track" connected speech and to make judgments about the intonation and stress in spoken sentences was evaluated under a variety of auditory-visual conditions. These included speechreading alone, speechreading plus speech (low-pass filtered at 4 kHz), and speechreading plus a tone whose frequency, intensity, and temporal characteristics were matched to the speaker's fundamental frequency (F0). In addition, several frequency transfer functions were applied to the normal F0 range, resulting in new ranges that were both transposed and expanded with respect to the original F0 range. Three of the five subjects were able to use several of the tonal representations of F0 nearly as well as speech to improve their speechreading rates and to make appropriate judgments concerning sentence intonation and stress. The remaining two subjects greatly improved their identification of intonation and stress patterns when expanded F0 signals were presented alone (i.e., without speechreading), but had difficulty integrating visual and auditory information at the connected-discourse level, despite intensive training in the connected discourse tracking procedure lasting 27.8 to 33.8 h.
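A transfer function that both transposes and expands an F0 range can be sketched as a linear mapping in log frequency. The mapping type and the source and destination ranges below are hypothetical; the abstract says only that several transfer functions were used.

```python
import math

def map_f0(f0_hz, src=(80.0, 200.0), dst=(100.0, 400.0)):
    """Map an F0 value linearly in log frequency from range `src` to range `dst`.

    Hypothetical ranges: dst spans 2 octaves versus about 1.3 for src (expansion),
    and its floor is higher (upward transposition).
    """
    (s_lo, s_hi), (d_lo, d_hi) = src, dst
    pos = math.log(f0_hz / s_lo) / math.log(s_hi / s_lo)  # 0 at s_lo, 1 at s_hi
    return d_lo * (d_hi / d_lo) ** pos

mid = map_f0(math.sqrt(80.0 * 200.0))   # geometric midpoint maps to about 200 Hz
```

Working in log frequency keeps equal musical intervals of the original contour equal after mapping, which is one reason to prefer it over a linear-in-Hz mapping.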

17.
It is well known and universally accepted that people's ability to use ongoing interaural temporal disparities conveyed via pure tones is limited to frequencies below 1600 Hz. We wish to determine whether this limitation results from the constant amplitude and periodic axis crossings that characterize pure tones. To this end, an acoustic pointing task was employed in which listeners varied the interaural intensity difference of a 500-Hz narrow-band noise (the pointer) so that the position of its intracranial image matched that of a second, experimenter-controlled stimulus (the target). Targets were either pure tones or narrow bands of noise (50 or 100 Hz wide). The narrow bands of noise were delayed interaurally in two distinct ways: either the entire waveform or only the carrier was delayed. In the latter case, the envelopes and phase functions of the bands of noise were identical interaurally. This resulted in noises that resemble the pure-tone case in that the interaural delay is manifested as a constant phase shift, and resemble ordinary noises in that the envelope and phase are random functions of time. Surprisingly, all three targets appear to have been lateralized virtually identically regardless of frequency or bandwidth. Apparently, the dynamically changing envelopes and phases did not affect the listeners' use of interaural temporal disparities in any discernible fashion.
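The difference between delaying the whole waveform and delaying only the carrier can be sketched by writing the noise as Re{z(t) exp(j2πfc t)}, where z(t) is a random low-pass complex envelope. Here z(t) is built from a hypothetical sum of random-phase components within ±bw/2 of the centre frequency; the paper's stimulus generation method is not described in the abstract.

```python
import cmath
import math
import random

def narrowband_pair(fc, bw, tau, fs, dur, carrier_only, n_comp=50, seed=0):
    """Left/right narrow-band noises centred on fc (Hz) with bandwidth bw (Hz).

    carrier_only=False: the right ear receives the whole waveform delayed by tau (s).
    carrier_only=True: only the carrier is delayed, so the envelope and phase
    function are identical at the two ears and the ITD is a constant phase shift.
    """
    rng = random.Random(seed)
    comps = [(rng.uniform(-bw / 2, bw / 2), rng.uniform(0, 2 * math.pi))
             for _ in range(n_comp)]

    def z(t):  # random low-pass complex envelope
        return sum(cmath.exp(1j * (2 * math.pi * df * t + ph)) for df, ph in comps)

    left, right = [], []
    for n in range(int(dur * fs)):
        t = n / fs
        left.append((z(t) * cmath.exp(2j * math.pi * fc * t)).real)
        t_env = t if carrier_only else t - tau      # envelope time for the right ear
        right.append((z(t_env) * cmath.exp(2j * math.pi * fc * (t - tau))).real)
    return left, right

# 500-Hz, 100-Hz-wide band; a 1-ms carrier-only delay is a half-cycle (pi) phase shift
L, R = narrowband_pair(500.0, 100.0, 0.001, 8000, 0.01, carrier_only=True)
```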

18.
Dynamic specification of coarticulated vowels spoken in sentence context
According to a dynamic specification account, coarticulated vowels are identified on the basis of time-varying acoustic information, rather than solely on the basis of "target" information contained within a single spectral cross section of an acoustic syllable. Three experiments utilizing digitally segmented portions of consonant-vowel-consonant (CVC) syllables spoken rapidly in a carrier sentence were designed to examine the relative contribution of (1) target information available in vocalic nuclei, (2) intrinsic duration information specified by syllable length, and (3) dynamic spectral information defined over syllable onsets and offsets. In experiments 1 and 2, vowels produced in three consonantal contexts by an adult male were examined. Results showed that vowels in silent-center (SC) syllables (in which vocalic nuclei were attenuated to silence, leaving initial and final transitional portions in their original temporal relationship) were perceived relatively accurately, although not as well as unmodified syllables (experiment 1); random versus blocked presentation of consonantal contexts did not affect performance. Error rates were slightly greater for vowels in SC syllables in which intrinsic duration differences were neutralized by equating the duration of silent intervals between initial and final transitional portions. However, performance was significantly better than when only initial transitions or final transitions were presented alone (experiment 2). Experiment 3 employed CVC stimuli produced by another adult male and included six consonantal contexts. Both SC syllables and excised syllable nuclei with appropriate intrinsic durations were identified no less accurately than unmodified controls. Neutralizing duration differences in SC syllables increased identification errors only slightly, while truncating excised syllable nuclei yielded a greater increase in errors.
These results demonstrate that time-varying information is necessary for accurate identification of coarticulated vowels. Two hypotheses about the nature of the dynamic information specified over syllable onsets and offsets are discussed.
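The silent-center manipulation can be sketched on a list of audio samples. This is a simplified illustration: the nucleus is set to hard zeros, whereas real stimulus preparation would likely attenuate it with amplitude ramps, and the boundary sample counts are hypothetical inputs rather than values from the study.

```python
def silent_center(samples, n_onset, n_offset, silent_len=None):
    """Replace the vocalic nucleus of a CVC token with silence.

    Keeps the first n_onset and last n_offset samples (the transitions).
    By default the silent gap keeps the nucleus's original length, preserving
    the intrinsic-duration cue; passing silent_len gives every token the same
    gap length, neutralizing that cue.
    """
    onset = list(samples[:n_onset])
    offset = list(samples[len(samples) - n_offset:])  # safe even when n_offset == 0
    gap = len(samples) - n_onset - n_offset if silent_len is None else silent_len
    return onset + [0.0] * gap + offset

token = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
sc = silent_center(token, 2, 2)                   # original timing preserved
sc_neutral = silent_center(token, 2, 2, silent_len=1)
```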

19.
An acoustic analysis of a German read-speech corpus showed that utterance-final /t/ aspirations differ systematically depending on the accompanying nuclear accent contour. Two contours were included: terminal-falling early and late F0 peaks in terms of the Kiel Intonation Model, which correspond to H+L*L-% and L*+HL-%, respectively, in the autosegmental-metrical (AM) model. Aspirations in early-peak contexts were characterized by (a) "short", (b) "high-intensity" noise with (c) "low" frequency values for the spectral energy maximum above the lower spectral energy boundary. The opposite holds for aspirations accompanying late-peak productions. Building on the acoustic analysis, a perception experiment was performed using a variant of the semantic differential paradigm. The stimuli were varied in the duration and intensity pattern as well as the spectral energy pattern of the final /t/ aspiration. Results revealed that the different noise patterns found in connection with early- and late-peak productions were able to shift the attitudinal meaning of the stimuli toward the meaning profile of the respective F0 peak category. This suggests that final aspirations can be part of the coding of meanings so far associated solely with intonation contours. Hence, the traditionally separated segmental and suprasegmental coding levels seem to be more intertwined than previously thought.

20.
A set of intonation contours has been synthesized using "yes" as the carrier word. Discrimination and identification functions have been obtained for these intonation contours. It was found that the stimuli could be classified as "statement," "question," "emphatic statement," and, possibly, "hesitation" and "weak statement." Predicted discrimination functions were calculated from the identification functions, and these were found to be positively correlated with the obtained discrimination functions. This provides evidence for category boundary effects in the perception of intonation.
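Predicting discrimination from identification is classically done with a covert-labeling model. A minimal sketch, assuming an ABX task in which listeners covertly label A, B and X and guess whenever the labels do not decide the trial; the abstract does not state which discrimination paradigm or prediction formula was used. Under these assumptions the predicted proportion correct is 0.5 + 0.25 * sum over categories of (pA - pB)^2, which for two categories reduces to the familiar 0.5 * (1 + (pA - pB)^2).

```python
def predicted_abx(p_a, p_b):
    """Predicted ABX proportion correct from identification probabilities.

    p_a, p_b: probability of each response category (e.g. statement, question,
    emphatic statement, ...) for stimuli A and B. Assumes covert labeling of
    A, B and X, with a guess whenever labels do not decide the trial.
    """
    return 0.5 + 0.25 * sum((a - b) ** 2 for a, b in zip(p_a, p_b))

# Two stimuli identified mostly as different categories
pc = predicted_abx([0.8, 0.2], [0.2, 0.8])   # 0.68, same as 0.5 * (1 + 0.6 ** 2)
```

Computing this for each adjacent stimulus pair along the continuum yields a predicted discrimination function that can then be correlated with the obtained one.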
