Similar articles
20 similar articles found.
1.
Loudness predicts prominence: fundamental frequency lends little (total citations: 1; self-citations: 0; citations by others: 1)
We explored a database covering seven dialects of British and Irish English and three different styles of speech to find acoustic correlates of prominence. We built classifiers, trained the classifiers on human prominence/nonprominence judgments, and then evaluated how well they behaved. The classifiers operate on 452 ms windows centered on syllables, using different acoustic measures. By comparing the performance of classifiers based on different measures, we can learn how prominence is expressed in speech. Contrary to textbooks and common assumption, fundamental frequency (f0) played a minor role in distinguishing prominent syllables from the rest of the utterance. Instead, speakers primarily marked prominence with patterns of loudness and duration. Two other acoustic measures that we examined also played a minor role, comparable to f0. All dialects and speaking styles studied here share a common definition of prominence. The result is robust to differences in labeling practice and the dialect of the labeler.
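As a rough illustration of the windowing step, the sketch below cuts a 452 ms window centered on a syllable and computes an RMS level in dB. The function names, the zero-padding past the signal edges, and the use of RMS level as a crude stand-in for the study's loudness measure are all assumptions for illustration, not the authors' pipeline.

```python
import math
from typing import Sequence

WINDOW_MS = 452.0  # window length reported in the abstract


def centered_window(signal: Sequence[float], sample_rate: int, center_s: float,
                    window_ms: float = WINDOW_MS) -> list[float]:
    """Return window_ms worth of samples centered on center_s (seconds).

    Samples that fall before the start or after the end of the signal are
    zero-padded (an assumption; the study may have handled edges differently).
    """
    half = int(round(window_ms / 1000.0 * sample_rate / 2))
    center = int(round(center_s * sample_rate))
    return [signal[i] if 0 <= i < len(signal) else 0.0
            for i in range(center - half, center + half)]


def rms_db(samples: Sequence[float], floor: float = 1e-12) -> float:
    """RMS level in dB re 1.0; a crude proxy, not a perceptual loudness model."""
    mean_sq = sum(x * x for x in samples) / len(samples)
    return 10.0 * math.log10(max(mean_sq, floor))
```

A classifier in the spirit of the abstract would compare such per-window features for syllables labeled prominent versus nonprominent.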

2.
Because they consist, in large part, of random turbulent noise, fricatives present a challenge to attempts to specify the phonetic correlates of phonological features. Previous research has focused on temporal properties, acoustic power, and a variety of spectral properties of fricatives in a number of contexts [Jongman et al., J. Acoust. Soc. Am. 108, 1252-1263 (2000); Jesus and Shadle, J. Phonet. 30, 437-467 (2002); Crystal and House, J. Acoust. Soc. Am. 83, 1553-1573 (1988a)]. However, no systematic investigation of the effects of focus and prosodic context on fricative production has been carried out. Manipulation of explicit focus can serve to selectively exaggerate linguistically relevant properties of speech in much the same manner as stress [de Jong, J. Acoust. Soc. Am. 97, 491-504 (1995); de Jong, J. Phonet. 32, 493-516 (2004); de Jong and Zawaydeh, J. Phonet. 30, 53-75 (2002)]. This experimental technique was exploited to investigate acoustic power along with temporal and spectral characteristics of American English fricatives in two prosodic contexts, to probe whether native speakers selectively attend to subsegmental features, and to consider variability in fricative production across speakers. While focus in general increased noise power and duration, speakers did not selectively enhance spectral features of the target fricatives.
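Among the spectral properties examined in earlier fricative work such as Jongman et al. (2000) are spectral moments. A minimal sketch of how the first four moments can be computed from a precomputed power spectrum, treating the normalized spectrum as a probability distribution over frequency, follows; the function name and the excess-kurtosis convention (subtracting 3) are assumptions, and windowing and FFT choices are left out.

```python
import math
from typing import Sequence


def spectral_moments(freqs_hz: Sequence[float],
                     power: Sequence[float]) -> tuple[float, float, float, float]:
    """Return (mean, variance, skewness, excess kurtosis) of a power spectrum.

    The power values are normalized to sum to 1 and used as weights on the
    bin frequencies, so the moments describe the shape of the spectrum.
    """
    if len(freqs_hz) != len(power) or not power:
        raise ValueError("need equal-length, non-empty inputs")
    total = sum(power)
    if total <= 0:
        raise ValueError("total power must be positive")
    weights = [p / total for p in power]
    mean = sum(f * w for f, w in zip(freqs_hz, weights))
    var = sum((f - mean) ** 2 * w for f, w in zip(freqs_hz, weights))
    if var == 0:
        raise ValueError("degenerate spectrum: all power in one bin")
    skew = sum((f - mean) ** 3 * w for f, w in zip(freqs_hz, weights)) / var ** 1.5
    kurt = sum((f - mean) ** 4 * w for f, w in zip(freqs_hz, weights)) / var ** 2 - 3.0
    return mean, var, skew, kurt
```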

3.
This study investigates the effects of accent and prosodic boundaries on the production of English vowels (/a, i/) by concurrently examining acoustic vowel formants and articulatory maxima of the tongue, jaw, and lips obtained with EMA (electromagnetic articulography). The results demonstrate that prosodic strengthening (due to accent and/or prosodic boundaries) has differential effects depending on the source of prominence (in accented syllables versus at edges of prosodic domains; domain initially versus domain finally). The results are interpreted in terms of how the prosodic strengthening is related to phonetic realization of vowel features. For example, when accented, /i/ was fronter in both acoustic and articulatory vowel spaces (enhancing [-back]), accompanied by an increase in both lip and jaw openings (enhancing sonority). By contrast, at edges of prosodic domains (especially domain-finally), /i/ was not necessarily fronter, but higher (enhancing [+high]), accompanied by an increase only in the lip (not jaw) opening. This suggests that the two aspects of prosodic structure (accent versus boundary) are differentiated by distinct phonetic patterns. Further, it implies that prosodic strengthening, though manifested in fine-grained phonetic details, is not simply a low-level phonetic event but a complex linguistic phenomenon, closely linked to the enhancement of phonological features and positional strength that may license phonological contrasts.

4.
This study assessed the acoustic coarticulatory effects of phrasal accent on [V1.CV2] sequences, when separately applied to V1 or V2, surrounding the voiced stops [b], [d], and [g]. Three adult speakers each produced 360 tokens (six V1 contexts x ten V2 contexts x three stops x two emphasis conditions). Realizing that anticipatory coarticulation of V2 onto the intervocalic C can be influenced by prosodic effects, as well as by vowel context effects, a modified locus equation regression metric was used to isolate the effect of phrasal accent on consonantal F2 onsets, independently of prosodically induced vowel expansion effects. The analyses revealed two main emphasis-dependent effects: systematic differences in F2 onset values and the expected expansion of vowel space. By accounting for the confounding variable of stress-induced vowel space expansion, a small but consistent coarticulatory effect of emphatic stress on the consonant was uncovered in lingually produced stops, but absent in labial stops. Formant calculations based on tube models indicated similarly increased F2 onsets when stressed /d/ and /g/ were simulated with deeper occlusions resulting from more forceful closure movements during phrasal accented speech.
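A standard locus equation is a linear regression of F2 at consonant release (F2 onset) on F2 at the following vowel's midpoint, fitted across vowel contexts for one stop. The study used a modified version of this metric whose details the abstract does not give, so the sketch below fits only the standard form by ordinary least squares; the function name and example values are hypothetical. It also spells out the per-speaker token count stated in the abstract.

```python
from typing import Sequence

# Design stated in the abstract: 6 V1 x 10 V2 x 3 stops x 2 emphasis conditions.
TOKENS_PER_SPEAKER = 6 * 10 * 3 * 2  # = 360


def fit_locus_equation(f2_vowel_hz: Sequence[float],
                       f2_onset_hz: Sequence[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of F2_onset = slope * F2_vowel + intercept."""
    n = len(f2_vowel_hz)
    if n != len(f2_onset_hz) or n < 2:
        raise ValueError("need at least two paired measurements")
    mx = sum(f2_vowel_hz) / n
    my = sum(f2_onset_hz) / n
    sxx = sum((x - mx) ** 2 for x in f2_vowel_hz)
    if sxx == 0:
        raise ValueError("vowel F2 values must vary")
    sxy = sum((x - mx) * (y - my) for x, y in zip(f2_vowel_hz, f2_onset_hz))
    slope = sxy / sxx
    return slope, my - slope * mx
```

Fitting one equation per stop and emphasis condition would let slopes and intercepts be compared across accented and unaccented tokens.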

5.
It was hypothesized that the retrieval of prosodic and phonemic information from the acoustic signal is facilitated when prosodic information is encoded by co-occurring suprasegmental cues. To test the hypothesis, two-choice speeded classification experiments were conducted, which examined processing interaction between prosodic phrase-boundary vs. stop-place information in speakers of Southern British English. Results confirmed that the degree of interaction between boundary and stop-place information diminished when the pre-boundary vowel was signaled by duration and F0, compared to when it was signaled by either duration or F0 alone. It is argued that the relative ease of retrieval of prosodic and phonemic information arose from advantages of prosodic cue integration.

6.
Articulatory activity underlying changes in stress and speaking rate was studied by means of x-ray cinefilm and acoustic speech records. Two Swedish subjects produced vowel-consonant-vowel (VCV) utterances under controlled rate-stress conditions. The vowels were tense /i a u/, and the consonants were the voiceless stops, notably /p/. The spectral characteristics of the vowels were not significantly influenced by changes in the speaking rate. They were, however, significantly emphasized under stress. At the articulatory level, stressed vowels displayed narrower oral tract constrictions than unstressed vowels at the two speaking rates studied. At the faster speaking rate, vowel- and consonant-related gestures were coproduced to a greater extent than at the slower rate. The data, failing to produce evidence for an "undershoot" mechanism, support the view that dialect-specific correlates of stress are actively safeguarded by means of articulatory reorganization.

7.
This study tested the hypothesis that heritage speakers of a minority language, due to their childhood experience with two languages, would outperform late learners in producing contrast: language-internal phonological contrast, as well as cross-linguistic phonetic contrast between similar, yet acoustically distinct, categories of different languages. To this end, production of Mandarin and English by heritage speakers of Mandarin was compared to that of native Mandarin speakers and native American English-speaking late learners of Mandarin in three experiments. In experiment 1, back vowels in Mandarin and English were produced distinctly by all groups, but the greatest separation between similar vowels was achieved by heritage speakers. In experiment 2, Mandarin aspirated and English voiceless plosives were produced distinctly by native Mandarin speakers and heritage speakers, who both put more distance between them than late learners. In experiment 3, the Mandarin retroflex and English palato-alveolar fricatives were distinguished by more heritage speakers and late learners than native Mandarin speakers. Thus, overall the hypothesis was supported: across experiments, heritage speakers were found to be the most successful at simultaneously maintaining language-internal and cross-linguistic contrasts, a result that may stem from a close approximation of phonetic norms that occurs during early exposure to both languages.

8.
This study explores the hypothesis that clear speech is produced with greater "articulatory effort" than normal speech. Kinematic and acoustic data were gathered from seven subjects as they pronounced multiple repetitions of utterances in different speaking conditions, including normal, fast, clear, and slow. Data were analyzed within a framework based on a dynamical model of single-axis frictionless movements, in which peak movement speed is used as a relative measure of articulatory effort (Nelson, 1983). There were differences in peak movement speed, distance and duration among the conditions and among the speakers. Three speakers produced the "clear" condition utterances with movements that had larger distances and durations than those for "normal" utterances. Analyses of the data within a peak speed, distance, duration "performance space" indicated increased effort (reflected in greater peak speed) in the clear condition for the three speakers, in support of the hypothesis. The remaining four speakers used other combinations of parameters to produce the clear condition. The validity of the simple dynamical model for analyzing these complex movements was considered by examining several additional parameters. Some movement characteristics differed from those required for the model-based analysis, presumably because the articulators are complicated structurally and interact with one another mechanically. More refined tests of control strategies for different speaking styles will depend on future analyses of more complicated movements with more realistic models.
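The three coordinates of the "performance space" can be read off a sampled single-axis position trace. The sketch below assumes the trace already spans one movement from onset to offset and estimates speed by first differences; the names and this differencing choice are assumptions, since the abstract does not describe the measurement procedure.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class MovementMeasures:
    distance: float     # net displacement along the single axis
    duration_s: float   # time from first to last sample
    peak_speed: float   # maximum absolute first-difference speed


def performance_space(positions: Sequence[float], sample_rate: float) -> MovementMeasures:
    """Distance, duration, and peak speed of one movement.

    Assumes `positions` covers exactly one movement (onset to offset),
    sampled uniformly at `sample_rate` Hz.
    """
    if len(positions) < 2:
        raise ValueError("need at least two samples")
    speeds = [abs(b - a) * sample_rate for a, b in zip(positions, positions[1:])]
    return MovementMeasures(
        distance=abs(positions[-1] - positions[0]),
        duration_s=(len(positions) - 1) / sample_rate,
        peak_speed=max(speeds),
    )
```

In the framework described above, a higher peak speed for a comparable distance and duration would be read as greater articulatory effort.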

9.
The role of language-specific factors in phonetically based trading relations was examined by assessing the ability of 20 native Japanese speakers to identify and discriminate stimuli of two synthetic /r/-/l/ series that varied temporal and spectral parameters independently. Results of forced-choice identification and oddity discrimination tasks showed that the nine Japanese subjects who were able to identify /r/ and /l/ reliably demonstrated a trading relation similar to that of Americans. Discrimination results reflected the perceptual equivalence of temporal and spectral parameters. Discrimination by the 11 Japanese subjects who were unable to identify the /r/-/l/ series differed significantly from the skilled Japanese subjects and native English speakers. However, their performance could not be predicted on the basis of acoustic dissimilarity alone. These results provide evidence that the trading relation between temporal and spectral cues for the /r/-/l/ contrast is not solely attributable to general auditory or language-universal phonetic processing constraints, but rather is also a function of phonemic processes that can be modified in the course of learning a second language.

10.
Native speakers of Mandarin Chinese have difficulty producing native-like English stress contrasts. Acoustically, English lexical stress is multidimensional, involving manipulation of fundamental frequency (F0), duration, intensity and vowel quality. Errors in any or all of these correlates could interfere with perception of the stress contrast, but it is unknown which correlates are most problematic for Mandarin speakers. This study compares the use of these correlates in the production of lexical stress contrasts by 10 Mandarin and 10 native English speakers. Results showed that Mandarin speakers produced significantly less native-like stress patterns, although they did use all four acoustic correlates to distinguish stressed from unstressed syllables. Mandarin and English speakers' use of amplitude and duration were comparable for both stressed and unstressed syllables, but Mandarin speakers produced stressed syllables with a higher F0 than English speakers. There were also significant differences in formant patterns across groups, such that Mandarin speakers produced English-like vowel reduction in certain unstressed syllables, but not in others. Results suggest that Mandarin speakers' production of lexical stress contrasts in English is influenced partly by native-language experience with Mandarin lexical tones, and partly by similarities and differences between Mandarin and English vowel inventories.

11.
Coarticulation studies in speech of deaf individuals have so far focused on intrasyllabic patterning of various consonant-vowel sequences. In this study, both inter- and intrasyllabic patterning were examined in disyllables /ə#CVC/ and the effects of phonetic context, speaking rate, and segment type were explored. Systematic observation of F2 and durational measurements in disyllables minimally contrasting in vocalic ([i], [u], [a]) and in consonant ([b], [d]) context, respectively, was made at selected locations in the disyllable, in order to relate inferences about articulatory adjustments with their temporal coordinates. Results indicated that intervocalic coarticulation across hearing and deaf speakers varied as a function of the phonetic composition of disyllables (b_b or d_d). The deaf speakers showed reduced intervocalic coarticulation for bilabial but not for alveolar disyllables compared to the hearing speakers. Furthermore, they showed less marked consonant influences on the schwa and stressed vowel of disyllables compared to the hearing controls. Rate effects were minimal and did not alter the coarticulatory patterns observed across hearing status. The above findings modify the conclusions drawn from previous studies and suggest that the speech of deaf and hearing speakers is guided by different gestural organization.

12.
The use of prosody in syntactic disambiguation. (total citations: 6; self-citations: 0; citations by others: 6)
Prosodic structure and syntactic structure are not identical; neither are they unrelated. Knowing when and how the two correspond could yield better quality speech synthesis, could aid in the disambiguation of competing syntactic hypotheses in speech understanding, and could lead to a more comprehensive view of human speech processing. In a set of experiments involving 35 pairs of phonetically similar sentences representing seven types of structural contrasts, the perceptual evidence shows that some, but not all, of the pairs can be disambiguated on the basis of prosodic differences. The phonological evidence relates the disambiguation primarily to boundary phenomena, although prominences sometimes play a role. Finally, phonetic analyses describing the attributes of these phonological markers indicate the importance of both absolute and relative measures.

13.
Numerous studies have indicated that prosodic phrase boundaries may be marked by a variety of acoustic phenomena including segmental lengthening. It has not been established, however, whether this lengthening is restricted to the immediate vicinity of the boundary, or if it extends over some larger region. In this study, segmental lengthening in the vicinity of prosodic boundaries is examined and found to be restricted to the rhyme of the syllable preceding the boundary. By using a normalized measure of segmental lengthening, and by compensating for differences in speaking rate, it is also shown that at least four distinct types of boundaries can be distinguished on the basis of this lengthening.
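The abstract does not state its normalization formula. One simple possibility, assumed here purely for illustration, is to divide each raw duration by a speaking-rate factor for its utterance and then z-score it against the mean and standard deviation of the same phone across the corpus. All names and the rate-factor convention are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev
from typing import Iterable


def phone_stats(tokens: Iterable[tuple[str, float]]) -> dict[str, tuple[float, float]]:
    """Mean and standard deviation of duration (s) for each phone label.

    Phones seen fewer than twice are skipped, since stdev needs two values.
    """
    by_phone: dict[str, list[float]] = defaultdict(list)
    for phone, duration_s in tokens:
        by_phone[phone].append(duration_s)
    return {p: (mean(d), stdev(d)) for p, d in by_phone.items() if len(d) >= 2}


def normalized_duration(phone: str, duration_s: float,
                        stats: dict[str, tuple[float, float]],
                        rate_factor: float = 1.0) -> float:
    """z-score of a rate-adjusted segment duration.

    Hypothetical convention: rate_factor > 1 means the utterance is slower
    than average, so its durations are scaled down before comparison.
    """
    mu, sd = stats[phone]
    return (duration_s / rate_factor - mu) / sd
```

Under this scheme, pre-boundary rhyme segments would show positive normalized durations whose size could then be compared across boundary types.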

14.
A listener who recognizes a talker notices characteristic attributes of the talker's speech despite the novelty of each utterance. Accounts of talker perception have often presumed that consistent aspects of an individual's speech, termed indexical properties, are ascribable to a talker's unique anatomy or consistent vocal posture distinct from acoustic correlates of phonetic contrasts. Accordingly, the perception of a talker is acknowledged to occur independently of the perception of a linguistic message. Alternatively, some studies suggest that attention to attributes of a talker includes indexical linguistic attributes conveyed in the articulation of consonants and vowels. This investigation sought direct evidence of attention to phonetic attributes of speech in perceiving talkers. Natural samples and sine-wave replicas derived from them were used in three experiments assessing the perceptual properties of natural and sine-wave sentences; of temporally veridical and reversed natural and sine-wave sentences; and of an acoustic correlate of vocal tract scale to judgments of sine-wave talker similarity. The results revealed that the subjective similarity of individual talkers is preserved in the absence of natural vocal quality; and that local phonetic segmental attributes as well as global characteristics of speech can be exploited when listeners notice characteristics of talkers.
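Sine-wave replicas replace the speech signal with a few time-varying sinusoids that follow the formant center frequencies and amplitudes. A minimal synthesis sketch follows, assuming formant frequency and amplitude tracks have already been estimated and interpolated to one value per output sample; the track estimation step and all names here are assumptions.

```python
import math
from typing import Sequence


def sine_wave_replica(freq_tracks: Sequence[Sequence[float]],
                      amp_tracks: Sequence[Sequence[float]],
                      sample_rate: float) -> list[float]:
    """Sum one sinusoid per formant track.

    Phase is accumulated sample by sample, so changing frequencies do not
    introduce waveform discontinuities.
    """
    if not freq_tracks or len(freq_tracks) != len(amp_tracks):
        raise ValueError("need one amplitude track per frequency track")
    n = len(freq_tracks[0])
    if any(len(t) != n for t in (*freq_tracks, *amp_tracks)):
        raise ValueError("all tracks must have the same length")
    out = [0.0] * n
    for freqs, amps in zip(freq_tracks, amp_tracks):
        phase = 0.0
        for i in range(n):
            out[i] += amps[i] * math.sin(phase)
            phase += 2.0 * math.pi * freqs[i] / sample_rate
    return out
```

Such a signal keeps the formant pattern while discarding natural voice quality, which is the property the experiments above exploit.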

15.
This study presents EMA (electromagnetic articulography) data on articulation of the vowel /a/ at different prosodic boundaries in French. Three speakers of metropolitan French produced utterances containing the vowel /a/, preceded by /t/ and followed by one of six consonants /b d g f s ʃ/ (three stops and three fricatives), with different prosodic boundaries intervening between the /a/ and the six different consonants. The prosodic boundaries investigated are the Utterance, the Intonational phrase, the Accentual phrase, and the Word. Data for the Tongue Tip, Tongue Body, and Jaw are presented. The articulatory data presented here were recorded at the same time as the acoustic data presented in Tabain [J. Acoust. Soc. Am. 113, 516-531 (2003)]. Analyses show that there is a strong effect on peak displacement of the vowel according to the prosodic hierarchy, with the stronger prosodic boundaries inducing a much lower Tongue Body and Jaw position than the weaker prosodic boundaries. Durations of both the opening movement into and the closing movement out of the vowel are also affected. Peak velocity of the articulatory movements is also examined, and, contrary to results for phrase-final lengthening, it is found that peak velocity of the opening movement into the vowel tends to increase with the higher prosodic boundaries, together with the increased magnitude of the movement between the consonant and the vowel. Results for the closing movement out of the vowel and into the consonant are not so clear. Since one speaker shows evidence of utterance-level articulatory declension, it is suggested that the competing constraints of articulatory declension and prosodic effects might explain some previous results on phrase-final lengthening.

16.
The purpose of the present study was to describe the effects of acute laryngitis on some aerodynamic, acoustic, and perceptual measures. Eleven subjects with diagnosed acute laryngitis due to upper respiratory infection were recorded during a laryngitic episode and 1 week to 10 days after amelioration of the laryngitic condition. Fundamental frequency values, collapsed across the five vowels, were significantly reduced in the laryngitic compared with the normal speaking condition. The decrease in fundamental frequency associated with acute laryngitis suggests an increase in the mass of the vocal folds. In addition, aerodynamic values differed significantly for the laryngitic condition compared with the normal speaking condition, suggesting the presence of laryngeal hypofunction. Perceptual data indicated that speakers in the laryngitic condition were judged to have a hoarse voice when compared with the normal speaking condition.

17.
This study presents various acoustic measures used to examine the sequence /a # C/, where "#" represents different prosodic boundaries in French. The 6 consonants studied are /b d g f s ʃ/ (3 stops and 3 fricatives). The prosodic units investigated are the utterance, the intonational phrase, the accentual phrase, and the word. It is found that vowel target values, formant transitions into the stop consonant, and the rate of change in spectral tilt into the fricative are affected by the strength of the prosodic boundary. F1 becomes higher for /a/ the stronger the prosodic boundary, with the exception of one speaker's utterance data, which show the effects of articulatory declension at the utterance level. Various effects of the stop consonant context are observed, the most notable being a tendency for the vowel /a/ to be displaced in the direction of the F2 consonant "locus" for /d/ (the F2 consonant values for which remain relatively stable across prosodic boundaries) and for /g/ (the F2 consonant values for which are displaced in the direction of the velar locus in weaker prosodic boundaries, together with those of the vowel). Velocity of formant transition may be affected by prosodic boundary (with greater velocity at weaker boundaries), though results are not consistent across speakers. There is also a tendency for the rate of change in spectral tilt moving from the vowel to the fricative to be affected by the presence of a prosodic boundary, with a greater rate of change at the weaker prosodic boundaries. It is suggested that spectral cues, in addition to duration, amplitude, and F0 cues, may alert listeners to the presence of a prosodic boundary.

18.
This study investigated whether the production of prosodic focus and phrasing contrasts was modified when interlocutors could only hear each other [auditory only (AO)], compared to when they could hear and see each other [face to face (FTF)]. The prosodic characteristics of utterances produced by six talkers were examined using both acoustic and perceptual measures (ratings of the degree of focus or clarity of the statement-question contrast). The acoustic measures showed a range of differences between narrow focus and between phrasing contrasts and some of these differences were greater in the AO setting than the FTF one. Listeners' ratings of focus and phrasing showed a clear difference between the AO and FTF conditions, with perceptual attributes of both narrow focus and echoic question phrasing being rated as clearer in the AO condition. To explain these results it is proposed that talkers compensate for the lack of visual prosodic cues in the AO condition by taking extra care (relative to FTF conditions) to ensure the effective transmission of prosodic cues.

19.
The complexities of how prosodic structure, both at the phrasal and syllable levels, shapes speech production have begun to be illuminated through studies of articulatory behavior. The present study contributes to an understanding of prosodic signatures on articulation by examining the joint effects of phrasal and syllable position on the production of consonants. Articulatory kinematic data were collected for five subjects using electromagnetic articulography (EMA) to record target consonants (labial, labiodental, and tongue tip), located in (1) either syllable final or initial position and (2) either at a phrase edge or phrase medially. Spatial and temporal characteristics of the consonantal constriction formation and release were determined based on kinematic landmarks in the articulator velocity profiles. The results indicate that syllable and phrasal position consistently affect the movement duration; however, effects on displacement were more variable. For most subjects, the boundary-adjacent portions of the movement (constriction release for a preboundary coda and constriction formation for a postboundary onset) are not differentially affected in terms of phrasal lengthening: both lengthen comparably.
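Kinematic landmarks are often located on the speed profile relative to its peak. A common convention, assumed here and not necessarily the one used in this study, marks movement onset and offset where speed falls below 20% of the peak on either side of it. A minimal sketch returning sample indices; the function name and threshold are assumptions.

```python
from typing import Sequence


def velocity_landmarks(speed: Sequence[float],
                       threshold_frac: float = 0.2) -> tuple[int, int, int]:
    """Return (onset, peak, offset) sample indices of one movement.

    Starting from the peak, walk outward while speed stays at or above
    threshold_frac * peak speed; the last qualifying samples are the
    onset and offset landmarks.
    """
    if not speed:
        raise ValueError("empty speed profile")
    peak = max(range(len(speed)), key=speed.__getitem__)
    threshold = threshold_frac * speed[peak]
    onset = peak
    while onset > 0 and speed[onset - 1] >= threshold:
        onset -= 1
    offset = peak
    while offset < len(speed) - 1 and speed[offset + 1] >= threshold:
        offset += 1
    return onset, peak, offset
```

Movement duration then follows as (offset - onset) divided by the sampling rate, and displacement as the position difference between those two samples.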

20.
Three experiments were conducted to study relative contributions of speaking rate, temporal envelope, and temporal fine structure to clear speech perception. Experiment I used uniform time scaling to match the speaking rate between clear and conversational speech. Experiment II decreased the speaking rate in conversational speech without processing artifacts by increasing silent gaps between phonetic segments. Experiment III created "auditory chimeras" by mixing the temporal envelope of clear speech with the fine structure of conversational speech, and vice versa. Speech intelligibility in normal-hearing listeners was measured over a wide range of signal-to-noise ratios to derive speech reception thresholds (SRT). The results showed that processing artifacts in uniform time scaling, particularly time compression, reduced speech intelligibility. Inserting gaps in conversational speech improved the SRT by 1.3 dB, but this improvement might be a result of increased short-term signal-to-noise ratios during level normalization. Data from auditory chimeras indicated that the temporal envelope cue contributed more to the clear speech advantage at high signal-to-noise ratios, whereas the temporal fine structure cue contributed more at low signal-to-noise ratios. Taken together, these results suggest that acoustic cues for the clear speech advantage are multiple and distributed.
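An SRT is the signal-to-noise ratio at which intelligibility reaches a criterion level, often 50% correct. The abstract does not say how the SRT was derived from the measured scores; a simple option, assumed here in place of fitting a psychometric function, is linear interpolation between the two measured SNRs that bracket the criterion. The function name and the 50% default are assumptions.

```python
from typing import Sequence


def speech_reception_threshold(snrs_db: Sequence[float],
                               scores: Sequence[float],
                               target: float = 0.5) -> float:
    """SNR (dB) at which the proportion correct reaches `target`.

    Uses linear interpolation between adjacent measured points after
    sorting by SNR; raises ValueError if no pair of points brackets target.
    """
    points = sorted(zip(snrs_db, scores))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 == target:
            return x0
        if (y0 - target) * (y1 - target) < 0:
            return x0 + (target - y0) * (x1 - x0) / (y1 - y0)
    if points and points[-1][1] == target:
        return points[-1][0]
    raise ValueError("target score is not bracketed by the measurements")
```

Because a lower SRT means the criterion is reached in more noise, the 1.3 dB improvement reported above corresponds to a 1.3 dB drop in SRT.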


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417