Similar Articles
20 similar articles found (search time: 15 ms)
1.
This study examined the perception and acoustics of a large corpus of vowels spoken in consonant-vowel-consonant syllables produced in citation form (lists) and spoken in sentences at normal and rapid rates by a female adult. Listeners correctly categorized the speaking rate of sentence materials as normal or rapid (2% errors) but did not accurately classify the speaking rate of the syllables when these were excised from the sentences (25% errors). In contrast, listeners accurately identified the vowels produced in sentences spoken at both rates, both when presented with the sentences and when presented with the excised syllables, blocked by speaking rate or randomized. Acoustical analysis showed that formant frequencies at syllable midpoint for vowels in sentence materials showed "target undershoot" relative to citation-form values, but little change over speech rate. Syllable durations varied systematically with vowel identity, speaking rate, and voicing of the final consonant. Vowel-inherent spectral change was invariant in direction of change over rate and context for most vowels. The temporal location of the maximum F1 frequency further differentiated spectrally adjacent lax and tense vowels. It was concluded that listeners were able to utilize these rate- and context-independent dynamic spectrotemporal parameters to identify coarticulated vowels, even when sentential information about speaking rate was not available.

2.
This study investigated the effects of age and hearing loss on the perception of accented speech presented in quiet and in noise. The relative importance of alterations in phonetic segments versus temporal patterns in a carrier phrase with accented speech was also examined. English sentences recorded by a native English speaker and a native Spanish speaker, together with hybrid sentences that varied the native language of the speaker of the carrier phrase and of the final target word of the sentence, were presented in quiet and in noise to younger and older listeners with normal hearing and to older listeners with hearing loss. Effects of age and hearing loss were observed in both listening environments, but varied with speaker accent. All groups exhibited lower recognition performance for the final target word spoken by the accented speaker compared to that spoken by the native speaker, indicating that alterations in segmental cues due to accent play a prominent role in intelligibility. Effects of the carrier phrase were minimal. The findings indicate that recognition of accented speech, especially in noise, is a particularly challenging communication task for older people.

3.
Several studies have demonstrated that when talkers are instructed to speak clearly, the resulting speech is significantly more intelligible than speech produced in ordinary conversation. These speech intelligibility improvements are accompanied by a wide variety of acoustic changes. The current study explored the relationship between acoustic properties of vowels and their identification in clear and conversational speech, for young normal-hearing (YNH) and elderly hearing-impaired (EHI) listeners. Monosyllabic words excised from sentences spoken either clearly or conversationally by a male talker were presented in 12-talker babble for vowel identification. While vowel intelligibility was significantly higher in clear speech than in conversational speech for the YNH listeners, no clear speech advantage was found for the EHI group. Regression analyses were used to assess the relative importance of spectral target, dynamic formant movement, and duration information for perception of individual vowels. For both listener groups, all three types of information emerged as primary cues to vowel identity. However, the relative importance of the three cues for individual vowels differed greatly for the YNH and EHI listeners. This suggests that hearing loss alters the way acoustic cues are used for identifying vowels.

4.
Traditional accounts of speech perception generally hold that listeners use isolable acoustic "cues" to label phonemes. For syllable-final stops, duration of the preceding vocalic portion and formant transitions at syllable's end have been considered the primary cues to voicing decisions. The current experiment tried to extend traditional accounts by asking two questions concerning voicing decisions by adults and children: (1) What weight is given to vocalic duration versus spectral structure, both at syllable's end and across the syllable? (2) Does the naturalness of stimuli affect labeling? Adults and children (4, 6, and 8 years old) labeled synthetic stimuli that varied in vocalic duration and spectral structure, either at syllable's end or earlier in the syllable. Results showed that all listeners weighted dynamic spectral structure, both at syllable's end and earlier in the syllable, more than vocalic duration, and listeners performed with these synthetic stimuli as listeners had performed previously with natural stimuli. The conclusion for accounts of human speech perception is that rather than simply gathering acoustic cues and summing them to derive strings of phonemic segments, listeners are able to attend to global spectral structure, and use it to help recover explicitly phonetic structure.

5.
6.
Three experiments were conducted to study the effect of segmental and suprasegmental corrections on the intelligibility and judged quality of deaf speech. By means of digital signal processing techniques, including LPC analysis, transformations of separate speech sounds, temporal structure, and intonation were carried out on 30 Dutch sentences spoken by ten deaf children. The transformed sentences were tested for intelligibility and acceptability by presenting them to inexperienced listeners. In experiment 1, LPC-based reflection coefficients describing segmental characteristics of deaf speakers were replaced by those of hearing speakers. A complete segmental correction caused a dramatic increase in intelligibility from 24% to 72%, which was largely due to correction of the vowels. Experiment 2 revealed that correction of temporal structure and intonation caused only a small improvement, from 24% to about 34%. Combining segmental and suprasegmental corrections yielded almost perfectly understandable sentences, due to a more than additive effect of the two corrections. Quality judgments, collected in experiment 3, were in close agreement with the intelligibility measures. The results show that, for these speakers to become more intelligible, improving their articulation is more important than improving their production of temporal structure and intonation.
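The segmental correction in experiment 1 hinges on LPC reflection coefficients. The abstract does not specify the analysis settings, but the standard route to reflection (PARCOR) coefficients is the autocorrelation method combined with the Levinson-Durbin recursion. The sketch below is a minimal NumPy illustration of that computation, demonstrated on a synthetic first-order signal (the frame length and model order are assumptions, not the authors' settings):

```python
import numpy as np

def reflection_coefficients(frame, order=12):
    """LPC reflection (PARCOR) coefficients for one frame, via the
    autocorrelation method and the Levinson-Durbin recursion."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Biased autocorrelation at lags 0..order
    r = np.array([frame[: n - k] @ frame[k:] for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    ks = []
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err          # reflection coefficient at stage i
        ks.append(k)
        new_a = a.copy()
        new_a[i] = k
        for j in range(1, i):   # update predictor polynomial
            new_a[j] = a[j] + k * a[i - j]
        a = new_a
        err *= 1.0 - k * k      # shrink prediction error
    return np.array(ks)

# Demonstration: an AR(1) signal x[t] = 0.9 x[t-1] + noise should give a
# first reflection coefficient near -0.9.
rng = np.random.default_rng(0)
x = np.zeros(20000)
for t in range(1, len(x)):
    x[t] = 0.9 * x[t - 1] + rng.standard_normal()
ks = reflection_coefficients(x, order=4)
```

Because reflection coefficients are bounded by magnitude one for any valid autocorrelation sequence, they are well suited to the kind of frame-by-frame replacement between speakers that the study describes.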

7.
Several studies have shown that extensive training with synthetic speech sounds can result in substantial improvements in listeners' perception of intraphonemic differences. The purpose of the present study was to investigate the effects of listening experience on the perception of intraphonemic differences in the absence of specific training with the synthetic speech sounds being tested. Phonetically trained listeners, musicians, and untrained listeners were tested on a two-choice identification task, a three-choice identification task, and an ABX discrimination task using a synthetic [bi]-[phi] continuum and a synthetic [wei]-[rei] continuum. The three-choice identification task included the identification of stimuli with an "indefinite" or "ambiguous" quality in addition to clear instances of the opposing phonetic categories. Results included: (1) All three subject groups showed some ability to identify ambiguous stimuli; (2) phonetically trained listeners were better at identifying ambiguous stimuli than musicians and untrained listeners; (3) phonetically trained listeners performed better on the discrimination task than musicians and untrained listeners; (4) musicians and untrained listeners did not differ on any of the listening tasks; and (5) participation by the inexperienced listeners in a 10-week introductory phonetics course did not result in improvements in either the three-choice identification task or the discrimination task.

8.
Intonation perception of English speech was examined for English- and Chinese-native listeners. The F0 contour was manipulated from falling to rising patterns for the final words of three sentences. The listeners' task was to identify and discriminate the intonation of each sentence (question versus statement). English and Chinese listeners differed significantly in their identification functions, both in categorical boundary and in slope. In the discrimination functions, Chinese listeners showed greater peakedness than their English peers. The cross-linguistic differences in intonation perception were similar to previous findings on the perception of lexical tones, likely due to differences in the listeners' language backgrounds.
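The categorical boundary and slope compared across listener groups here are typically obtained by fitting a logistic psychometric function to the proportion of "question" responses along the F0 continuum. A minimal SciPy sketch, where the 7-step continuum and the generating parameters are invented for illustration:

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, boundary, slope):
    """Proportion of 'question' responses at continuum step x:
    boundary is the 50% crossover, slope the steepness."""
    return 1.0 / (1.0 + np.exp(-slope * (x - boundary)))

def fit_identification(steps, p_question):
    """Fit boundary and slope of an identification function."""
    (b, s), _ = curve_fit(logistic, steps, p_question,
                          p0=[float(np.mean(steps)), 1.0])
    return b, s

# Noiseless synthetic data from a hypothetical 7-step falling-to-rising
# F0 continuum with boundary at step 4 and slope 2.
steps = np.arange(1, 8, dtype=float)
p = logistic(steps, boundary=4.0, slope=2.0)
b, s = fit_identification(steps, p)
```

Group differences of the kind reported (boundary shift, shallower or steeper slope) then reduce to comparisons of the fitted `b` and `s` values.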

9.
When listening to natural speech, listeners are fairly adept at using cues such as pitch, vocal tract length, prosody, and level differences to extract a target speech signal from an interfering speech masker. However, little is known about the cues that listeners might use to segregate synthetic speech signals that retain the intelligibility characteristics of speech but lack many of the features that listeners normally use to segregate competing talkers. In this experiment, intelligibility was measured in a diotic listening task that required the segregation of two simultaneously presented synthetic sentences. Three types of synthetic signals were created: (1) sine-wave speech (SWS); (2) modulated noise-band speech (MNB); and (3) modulated sine-band speech (MSB). The listeners performed worse for all three types of synthetic signals than they did with natural speech signals, particularly at low signal-to-noise ratio (SNR) values. Of the three synthetic signals, the results indicate that SWS signals preserve more of the voice characteristics used for speech segregation than MNB and MSB signals. These findings have implications for cochlear implant users, who rely on signals very similar to MNB speech and thus are likely to have difficulty understanding speech in cocktail-party listening environments.
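Sine-wave speech replaces each formant with a single time-varying sinusoid. As a rough illustration of the signal type, not the authors' synthesis pipeline, the sketch below sums sinusoids whose phase is the running integral of per-formant frequency tracks; the constant three-track input is a toy stand-in for measured formant trajectories:

```python
import numpy as np

def sinewave_replica(freq_tracks, amp_tracks, fs=16000):
    """Sum one sinusoid per formant track. Phase is the cumulative
    integral of instantaneous frequency, so tracks may vary over time."""
    out = np.zeros(len(freq_tracks[0]))
    for f, a in zip(freq_tracks, amp_tracks):
        phase = 2.0 * np.pi * np.cumsum(f) / fs
        out += np.asarray(a) * np.sin(phase)
    return out

# One second of a toy three-"formant" replica with constant tracks
fs = 16000
n = fs
tracks = [np.full(n, 500.0), np.full(n, 1500.0), np.full(n, 2500.0)]
amps = [np.full(n, 1.0), np.full(n, 0.5), np.full(n, 0.25)]
y = sinewave_replica(tracks, amps, fs)
```

The resulting signal has no harmonic structure or vocal-quality cues, which is exactly why it isolates the intelligibility-bearing spectral dynamics this study manipulates.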

10.
This paper reports acoustic measurements and results from a series of perceptual experiments on the voiced-voiceless distinction for syllable-final stop consonants in absolute final position and in the context of a following syllable beginning with a different stop consonant. The focus is on temporal cues to the distinction, with vowel duration and silent closure duration as the primary and secondary dimensions, respectively. The main results are that adding a second syllable to a monosyllable increases the number of voiced stop consonant responses, as does shortening of the closure duration in disyllables. Both of these effects are consistent with temporal regularities in speech production: Vowel durations are shorter in the first syllable of disyllables than in monosyllables, and closure durations are shorter for voiced than for voiceless stops in disyllabic utterances of this type. While the perceptual effects thus may derive from two separate sources of tacit phonetic knowledge available to listeners, the data are also consistent with an interpretation in terms of a single effect: temporal proximity of the following context.

11.
The purpose of this study was to examine the contribution of information provided by vowels versus consonants to sentence intelligibility in young normal-hearing (YNH) and typical elderly hearing-impaired (EHI) listeners. Sentences were presented in three conditions, unaltered or with either the vowels or the consonants replaced with speech shaped noise. Sentences from male and female talkers in the TIMIT database were selected. Baseline performance was established at a 70 dB SPL level using YNH listeners. Subsequently EHI and YNH participants listened at 95 dB SPL. Participants listened to each sentence twice and were asked to repeat the entire sentence after each presentation. Words were scored correct if identified exactly. Average performance for unaltered sentences was greater than 94%. Overall, EHI listeners performed more poorly than YNH listeners. However, vowel-only sentences were always significantly more intelligible than consonant-only sentences, usually by a ratio of 2:1 across groups. In contrast to written English or words spoken in isolation, these results demonstrated that for spoken sentences, vowels carry more information about sentence intelligibility than consonants for both young normal-hearing and elderly hearing-impaired listeners.

12.
The ability to recognize spoken words interrupted by silence was investigated with young normal-hearing listeners and older listeners with and without hearing impairment. Target words from the revised SPIN test by Bilger et al. [J. Speech Hear. Res. 27(1), 32-48 (1984)] were presented in isolation and in the original sentence context using a range of interruption patterns in which portions of speech were replaced with silence. The number of auditory "glimpses" of speech and the glimpse proportion (total duration glimpsed/word duration) were varied using a subset of the SPIN target words that ranged in duration from 300 to 600 ms. The words were presented in isolation, in the context of low-predictability (LP) sentences, and in high-predictability (HP) sentences. The glimpse proportion was found to have a strong influence on word recognition, with relatively little influence of the number of glimpses, glimpse duration, or glimpse rate. Although older listeners tended to recognize fewer interrupted words, there was considerable overlap in recognition scores across listener groups in all conditions, and all groups were affected by interruption parameters and context in much the same way.
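The two interruption parameters this study varies, the number of glimpses and the glimpse proportion (total duration glimpsed / word duration), can both be read off a boolean audibility mask over time. A minimal sketch with an invented 10-frame mask:

```python
import numpy as np

def glimpse_stats(mask):
    """mask: boolean/0-1 array over time frames, True where speech
    is audible. Returns (number of glimpses, glimpse proportion)."""
    m = np.asarray(mask).astype(int)
    # A glimpse begins wherever the mask rises from 0 to 1
    onsets = np.flatnonzero(np.diff(np.concatenate(([0], m))) == 1)
    return len(onsets), m.mean()

# Toy 10-frame word containing three audible glimpses (6 frames total)
mask = [0, 1, 1, 0, 0, 1, 0, 1, 1, 1]
n_glimpses, proportion = glimpse_stats(mask)
```

The study's finding is then that recognition tracks `proportion` closely while being largely insensitive to `n_glimpses` at a fixed proportion.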

13.
Dynamic specification of coarticulated vowels spoken in sentence context (cited 3 times: 0 self-citations, 3 by others)
According to a dynamic specification account, coarticulated vowels are identified on the basis of time-varying acoustic information, rather than solely on the basis of "target" information contained within a single spectral cross section of an acoustic syllable. Three experiments utilizing digitally segmented portions of consonant-vowel-consonant (CVC) syllables spoken rapidly in a carrier sentence were designed to examine the relative contribution of (1) target information available in vocalic nuclei, (2) intrinsic duration information specified by syllable length, and (3) dynamic spectral information defined over syllable onsets and offsets. In experiments 1 and 2, vowels produced in three consonantal contexts by an adult male were examined. Results showed that vowels in silent-center (SC) syllables (in which vocalic nuclei were attenuated to silence, leaving initial and final transitional portions in their original temporal relationship) were perceived relatively accurately, although not as well as unmodified syllables (experiment 1); random versus blocked presentation of consonantal contexts did not affect performance. Error rates were slightly greater for vowels in SC syllables in which intrinsic duration differences were neutralized by equating the duration of silent intervals between initial and final transitional portions. However, performance was significantly better than when only initial transitions or final transitions were presented alone (experiment 2). Experiment 3 employed CVC stimuli produced by another adult male, and included six consonantal contexts. Both SC syllables and excised syllable nuclei with appropriate intrinsic durations were identified no less accurately than unmodified controls. Neutralizing duration differences in SC syllables increased identification errors only slightly, while truncating excised syllable nuclei yielded a greater increase in errors. These results demonstrate that time-varying information is necessary for accurate identification of coarticulated vowels. Two hypotheses about the nature of the dynamic information specified over syllable onsets and offsets are discussed.

14.
A listener who recognizes a talker notices characteristic attributes of the talker's speech despite the novelty of each utterance. Accounts of talker perception have often presumed that consistent aspects of an individual's speech, termed indexical properties, are ascribable to a talker's unique anatomy or consistent vocal posture distinct from acoustic correlates of phonetic contrasts. Accordingly, the perception of a talker is acknowledged to occur independently of the perception of a linguistic message. Alternatively, some studies suggest that attention to attributes of a talker includes indexical linguistic attributes conveyed in the articulation of consonants and vowels. This investigation sought direct evidence of attention to phonetic attributes of speech in perceiving talkers. Natural samples, and sine-wave replicas derived from them, were used in three experiments assessing the perceptual properties of natural and sine-wave sentences; of temporally veridical and reversed natural and sine-wave sentences; and of the contribution of an acoustic correlate of vocal-tract scale to judgments of sine-wave talker similarity. The results revealed that the subjective similarity of individual talkers is preserved in the absence of natural vocal quality, and that local phonetic segmental attributes as well as global characteristics of speech can be exploited when listeners notice characteristics of talkers.

15.
With as few as 10-20 sentences of exposure, listeners are able to adapt to speech that is highly distorted compared to that which is encountered in everyday conversation. The current study examines the extent to which adaptation to time-compressed speech can be impeded by disrupting the continuity of the exposure sentences, and whether this differs between young and older adult listeners when they are equated for starting accuracy. In separate sessions conducted one week apart, the degree of adaptation was assessed in four exposure conditions, all of which involved exposure to the same number of time-compressed sentences. A continuous exposure condition involved presentation of the time-compressed sentences without interruption. Two alternation conditions alternated time-compressed speech and uncompressed speech by single sentences or groups of four sentences. A fourth condition presented sentences that were separated by a period of silence but no uncompressed speech. For all conditions, neither young nor older adults' overall level of learning was influenced by disruptions to the exposure sentences. In addition, participants' performance showed reliable improvement across the first and subsequent sessions. These results support robust learning mechanisms in speech perception that remain functional throughout the lifespan.

16.
The reiterant speech of ten native speakers of French was analyzed to develop baseline measures for syllable and consonant/vowel timing for a series of two-, three-, four-, and five-syllable French words spoken in isolation. Ten native speakers of English, who learned French as a second language, produced reiterant versions of both the French words and a comparable set of English words. The native speakers of English were divided into two groups on the basis of their second language experience. The first group consisted of four university-level teachers, who were relatively experienced learners of French, and the second group of six less experienced learners of French. The French reiterant imitations of the two groups of native speakers of English were compared to the native French speakers' productions. The timing patterns of the experienced group of non-native speakers did not differ significantly from those of the native French speakers, whereas there was a significant difference between these two groups and the group of six less experienced second-language learners. Deviations from the French baseline measures produced by the less experienced group are discussed in terms of the influence of the timing patterns of English and the literature on a sensitive period for second language acquisition.

17.
The departure point of the present paper is our effort to characterize and understand the spatiotemporal structure of articulatory patterns in speech. To do so, we removed segmental variation as much as possible while retaining the spoken act's stress and prosodic structure. Subjects produced two sentences from the "rainbow passage" using reiterant speech in which normal syllables were replaced by /ba/ or /ma/. This task was performed at two self-selected rates, conversational and fast. Infrared LEDs were placed on the jaw and lips and monitored using a modified SELSPOT optical tracking system. As expected, when pauses marking major syntactic boundaries were removed, a high degree of rhythmicity within rate was observed, characterized by well-defined periodicities and small coefficients of variation. When articulatory gestures were examined geometrically on the phase plane, the trajectories revealed a scaling relation between a gesture's peak velocity and displacement. Further quantitative analysis of articulator movement as a function of stress and speaking rate was indicative of a language-modulated dynamical system with linear stiffness and equilibrium (or rest) position as key control parameters. Preliminary modeling was consonant with this dynamical perspective which, importantly, does not require that time per se be a controlled variable.
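The linear-stiffness account predicts the phase-plane scaling reported here: for an undamped second-order gesture x'' = -k(x - x0) released from rest, peak speed equals displacement times the square root of stiffness, so peak velocity scales linearly with displacement at fixed k. A small simulation (my own illustration of the prediction, not the authors' model fit) checks this:

```python
def peak_speed(displacement, stiffness, dt=1e-4, t_end=2.0):
    """Simulate an undamped linear 'gesture' x'' = -k (x - x0) released
    from rest `displacement` away from equilibrium (x0 = 0 here), using
    semi-implicit Euler; return the peak speed along the trajectory."""
    x, v = float(displacement), 0.0
    peak = 0.0
    for _ in range(int(t_end / dt)):
        v += -stiffness * x * dt   # acceleration step
        x += v * dt                # position step
        peak = max(peak, abs(v))
    return peak

# With k = 100, theory gives peak speed = displacement * sqrt(100) = 10 d,
# so doubling the displacement should double the peak speed.
p1 = peak_speed(1.0, 100.0)
p2 = peak_speed(2.0, 100.0)
```

The same framework treats stiffness and equilibrium position as the controlled parameters, with movement time emerging from the dynamics rather than being specified directly, matching the paper's closing point.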

18.
In a recent study [S. Gordon-Salant, J. Acoust. Soc. Am. 80, 1599-1607 (1986)], young and elderly normal-hearing listeners demonstrated significant improvements in consonant-vowel (CV) recognition with acoustic modification of the speech signal incorporating increments in the consonant-vowel ratio (CVR). Acoustic modification of consonant duration failed to enhance performance. The present study investigated whether consonant recognition deficits of elderly hearing-impaired listeners would be reduced by these acoustic modifications, as well as by increases in speech level. Performance of elderly hearing-impaired listeners with gradually sloping and sharply sloping sensorineural hearing losses was compared to performance of elderly normal-threshold listeners (reported previously) for recognition of a variety of nonsense syllable stimuli. These stimuli included unmodified CVs, CVs with increases in CVR, CVs with increases in consonant duration, and CVs with increases in both CVR and consonant duration. Stimuli were presented at each of two speech levels with a background of noise. Results obtained from the hearing-impaired listeners agreed with those observed previously from normal-hearing listeners. Differences in performance between the three subject groups as a function of level were observed also.

19.
In tone languages there are potential conflicts in the perception of lexical tone and intonation, as both depend mainly on differences in fundamental frequency (F0) patterns. The present study investigated the acoustic cues associated with the perception of sentences as questions or statements in Cantonese, as a function of the lexical tone in sentence-final position. Cantonese listeners performed intonation identification tasks involving complete sentences, isolated final syllables, and sentences without the final syllable (carriers). Sensitivity (d' scores) was similar for complete sentences and final syllables but was significantly lower for carriers. Sensitivity was also affected by tone identity. These findings show that the perception of questions and statements relies primarily on the F0 characteristics of the final syllables (local F0 cues). A measure of response bias (c) provided evidence for a general bias toward the perception of statements. Logistic regression analyses showed that utterances were accurately classified as questions or statements by using average F0 and F0 interval. Average F0 of carriers (global F0 cue) was also found to be a reliable secondary cue. These findings suggest that the use of F0 cues for the perception of question intonation in tonal languages is likely to be language-specific.
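The sensitivity and bias measures reported here follow the standard signal-detection definitions: d' = z(H) - z(F) and c = -(z(H) + z(F))/2, where z is the inverse of the standard normal CDF, H the hit rate (e.g., "question" responses to questions), and F the false-alarm rate. A minimal sketch with invented rates:

```python
from scipy.stats import norm

def dprime_and_bias(hit_rate, fa_rate):
    """Signal-detection sensitivity d' = z(H) - z(F) and response bias
    c = -(z(H) + z(F)) / 2; under the usual convention, positive c marks
    a conservative bias toward the 'no' (here, 'statement') response."""
    zh, zf = norm.ppf(hit_rate), norm.ppf(fa_rate)
    return zh - zf, -(zh + zf) / 2.0

# An unbiased listener: symmetric rates H = 0.84, F = 0.16 give
# d' near 2 and c of zero.
d, c = dprime_and_bias(0.84, 0.16)
```

On these definitions, the "general bias toward statements" the study reports corresponds to positive c values when "question" is treated as the signal.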

20.
The 36 basic "sentences" in the experiment were six-syllable nonsense sequences of the form DAS a LAS a GAS a or a DAS a BAS a LAS. Either (a) one vowel in the sentence was lengthened or shortened by about 50, 90, or 130 ms by computer-editing routines, or (b) the sentence was left intact (as spoken). The resulting perceptual impression after the vowel change was a change of tempo within the sentence. Vowel changes occurred systematically throughout the sentences, in one of syllables one through five. Reaction time (RT) was recorded to assigned target segments /b, d, or g/ in one of syllables one through six, and RT was compared to targets in tempo-changed versus intact sentences (these were acoustically identical except for the distorted vowel). The listeners responded to over 2000 versions of the sentences. The results were: (a) Tempo changes generally increased segment target RT. (b) Tempo-change effects were ubiquitous; for instance, vowel changes in the first syllable increased RT to targets in later syllables, and the reach of effects spanned four syllables. Both vowel shortening and lengthening increased target RT. (c) Effects attributed to processing time decreased, whereas effects attributed to stimulus expectancy increased, with time into the sentence. (d) Tempo-change effects persisted throughout the experiment despite practice and familiarity with stimuli. The conclusions were: (a) The effects of time distortion of the stimulus on target RT were produced mainly by changes in stimulus-induced expectancy, not changes in processing time. (b) The expected input to perception is the acoustically intact utterance in both its rhythmic and segmental aspects; these aspects are not perceived independently.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号