Similar Documents
 20 similar documents found (search time: 15 ms)
1.
Thresholds of vowel formant discrimination for F1 and F2 of isolated vowels with full and partial vowel spectra were measured for normal-hearing listeners at fixed and roving speech levels. Formant discrimination was significantly better at fixed levels than at roving levels with both full and partial spectra. An effect of vowel spectral range was present at roving levels but not at fixed levels. These results, consistent with studies of profile analysis, indicate that listeners use different perceptual mechanisms to discriminate vowel formant frequency at fixed and roving levels.

2.
Several studies have demonstrated that when talkers are instructed to speak clearly, the resulting speech is significantly more intelligible than speech produced in ordinary conversation. These speech intelligibility improvements are accompanied by a wide variety of acoustic changes. The current study explored the relationship between acoustic properties of vowels and their identification in clear and conversational speech, for young normal-hearing (YNH) and elderly hearing-impaired (EHI) listeners. Monosyllabic words excised from sentences spoken either clearly or conversationally by a male talker were presented in 12-talker babble for vowel identification. While vowel intelligibility was significantly higher in clear speech than in conversational speech for the YNH listeners, no clear speech advantage was found for the EHI group. Regression analyses were used to assess the relative importance of spectral target, dynamic formant movement, and duration information for perception of individual vowels. For both listener groups, all three types of information emerged as primary cues to vowel identity. However, the relative importance of the three cues for individual vowels differed greatly for the YNH and EHI listeners. This suggests that hearing loss alters the way acoustic cues are used for identifying vowels.
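
The regression approach mentioned in this abstract can be sketched in a few lines. The sketch below uses entirely made-up token-level data and hypothetical variable names: identification accuracy is regressed on standardized predictors for spectral target, formant movement, and duration, and the relative magnitudes of the coefficients index relative cue importance. It illustrates the technique, not the study's analysis.

    # A minimal sketch (not the authors' analysis) of assessing relative
    # cue importance by regression. All data below are made up.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 60  # hypothetical number of vowel tokens

    # Hypothetical acoustic predictors for each token.
    spectral_target = rng.normal(0.0, 1.0, n)
    formant_movement = rng.normal(0.0, 1.0, n)
    duration = rng.normal(0.0, 1.0, n)

    # Hypothetical identification accuracy, driven mostly by spectral target.
    accuracy = (0.6 * spectral_target + 0.3 * formant_movement
                + 0.1 * duration + rng.normal(0.0, 0.2, n))

    # Standardize predictors so coefficient sizes are comparable.
    X = np.column_stack([spectral_target, formant_movement, duration])
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    X = np.column_stack([np.ones(n), X])  # add an intercept column

    beta, *_ = np.linalg.lstsq(X, accuracy, rcond=None)
    for name, b in zip(["intercept", "spectral target",
                        "formant movement", "duration"], beta):
        print(f"{name:18s} beta = {b:+.3f}")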

3.
The purpose of this study was to examine the acoustic characteristics of children's speech and voices that account for listeners' ability to identify gender. In Experiment I, vocal recordings and gross physical measurements of 4-, 8-, 12-, and 16-year-olds were taken (10 girls and 10 boys per age group). The speech sample consisted of seven nondiphthongal vowels of American English (/æ/ "had," /ɛ/ "head," /i/ "heed," /ɪ/ "hid," /ɑ/ "hod," /ʌ/ "hud," and /u/ "who'd") produced in the carrier phrase, "Say /hVd/ again." Fundamental frequency (f0) and formant frequencies (F1, F2, F3) were measured from these syllables. In Experiment II, 20 adults rated the syllables produced by the children in Experiment I on a six-point gender rating scale. The results from these experiments indicate that (1) vowel formant frequencies differentiate gender for children as young as four years of age, while formant frequencies and f0 differentiate gender after 12 years of age; (2) the relationship between gross measures of physical size and vocal characteristics is apparent for at least the 12- and 16-year-olds; and (3) listeners can identify gender from the speech and voice of children as young as four years of age, and with respect to young children, listeners appear to base their gender ratings on vowel formant frequencies. The findings are discussed in relation to the development of gender identity and its perceptual representation in speech and voice.

4.
A method is proposed to reduce the ambiguity of vowels in connected speech by normalizing coarticulation effects. The method is applied to vowels in phonetic environments where great ambiguity is likely to occur, using the first and second formant trajectories as features. The separability between vowel clusters is found to be greatly improved for the vowel samples. In addition, the distribution of the vowels on a feature plane characterized by this method seems to reflect their perceptual nature when they are presented to listeners without isolation from their phonetic environments. The results suggest that the proposed method is useful for automatic speech recognition and may help to infer mechanisms underlying dynamic aspects of human speech recognition.

5.
The identification of front vowels was studied in normal-hearing listeners using stimuli whose spectra had been altered to approximate the spectrum of vowels processed by auditory filters similar to those that might accompany sensorineural hearing loss. In the first experiment, front vowels were identified with greater than 95% accuracy when the first formant was specified in a normal manner and the higher frequency formants were represented by a broad, flat spectral plateau ranging from approximately 1600 to 3500 Hz. In the second experiment, the bandwidth of the first formant was systematically widened for stimuli with already flattened higher frequency formants. Normal vowel identification was preserved until the first formant was widened to six times its normal bandwidth. These results may account for the coexistence of abnormal vowel masking patterns (indicating flattened auditory spectra) and normal vowel recognition.

6.
In face-to-face speech communication, the listener extracts and integrates information from the acoustic and optic speech signals. Integration occurs within the auditory modality (i.e., across the acoustic frequency spectrum) and across sensory modalities (i.e., across the acoustic and optic signals). The difficulties experienced by some hearing-impaired listeners in understanding speech could be attributed to losses in the extraction of speech information, the integration of speech cues, or both. The present study evaluated the ability of normal-hearing and hearing-impaired listeners to integrate speech information within and across sensory modalities in order to determine the degree to which integration efficiency may be a factor in the performance of hearing-impaired listeners. Auditory-visual nonsense syllables consisting of eighteen medial consonants surrounded by the vowel [a] were processed into four nonoverlapping acoustic filter bands between 300 and 6000 Hz. A variety of one-, two-, three-, and four-filter-band combinations were presented for identification in auditory-only and auditory-visual conditions; a visual-only condition was also included. Integration efficiency was evaluated using a model of optimal integration. Results showed that normal-hearing and hearing-impaired listeners integrated information across the auditory and visual sensory modalities with a high degree of efficiency, independent of differences in auditory capabilities. However, across-frequency integration for auditory-only input was less efficient for hearing-impaired listeners. These individuals exhibited particular difficulty extracting information from the highest frequency band (4762-6000 Hz) when speech information was presented concurrently in the next lower-frequency band (1890-2381 Hz). Results suggest that integration of speech information within the auditory modality, but not across auditory and visual modalities, affects speech understanding in hearing-impaired listeners.
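
The abstract does not specify the "model of optimal integration"; one simple, widely used formalization is a multiplicative (FLMP-style) combination of unimodal response probabilities, sketched below with invented numbers. Observed auditory-visual scores at or near this prediction would indicate efficient integration.

    # A minimal sketch of one common "optimal integration" formalization
    # (a multiplicative, FLMP-style rule) -- not necessarily the study's
    # actual model. Response probabilities below are invented.
    import numpy as np

    def predict_av(p_auditory, p_visual):
        """Normalized product of unimodal response probabilities."""
        joint = np.asarray(p_auditory) * np.asarray(p_visual)
        return joint / joint.sum()

    # Hypothetical response probabilities for three consonant alternatives.
    p_a = np.array([0.60, 0.30, 0.10])  # auditory-only
    p_v = np.array([0.50, 0.10, 0.40])  # visual-only
    print(predict_av(p_a, p_v))         # approx [0.811 0.081 0.108]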

7.
Research on the perception of vowels in the last several years has given rise to new conceptions of vowels as articulatory, acoustic, and perceptual events. Starting from a "simple" target model in which vowels were characterized articulatorily as static vocal tract shapes and acoustically as points in a first and second formant (F1/F2) vowel space, this paper briefly traces the evolution of vowel theory in the 1970s and 1980s in two directions. (1) Elaborated target models represent vowels as target zones in perceptual spaces whose dimensions are specified as formant ratios. These models have been developed primarily to account for perceivers' solution of the "speaker normalization" problem. (2) Dynamic specification models emphasize the importance of formant trajectory patterns in specifying vowel identity. These models deal primarily with the problem of "target undershoot" associated with the coarticulation of vowels with consonants in natural speech and with the issue of "vowel-inherent spectral change" or diphthongization of English vowels. Perceptual studies are summarized that motivate these theoretical developments.  相似文献   

8.
The goal of this study was to establish the ability of normal-hearing listeners to discriminate formant frequency in vowels in everyday speech. Vowel formant discrimination in syllables, phrases, and sentences was measured for high-fidelity (nearly natural) speech synthesized by STRAIGHT [Kawahara et al., Speech Commun. 27, 187-207 (1999)]. Thresholds were measured for changes in F1 and F2 for the vowels /ɪ, ɛ, æ, ʌ/ in /bVd/ syllables. Experimental factors manipulated included phonetic context (syllables, phrases, and sentences), sentence discrimination with the addition of an identification task, and word position. Results showed that neither longer phonetic context nor the addition of the identification task significantly affected thresholds, while thresholds for word-final position showed significantly better performance than for either initial or middle position in sentences. Results suggest that an average of 0.37 barks is required for normal-hearing listeners to discriminate vowel formants in modest-length sentences, an 84% elevation relative to isolated vowels. Vowel formant discrimination in several phonetic contexts was slightly elevated for STRAIGHT-synthesized speech compared to the formant-synthesized speech stimuli reported in the study by Kewley-Port and Zheng [J. Acoust. Soc. Am. 106, 2945-2958 (1999)]. These elevated thresholds appeared related to greater spectral-temporal variability for high-fidelity speech produced by STRAIGHT than for formant-synthesized speech.
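
For intuition about the 0.37-bark result, the sketch below converts a 0.37-bark step into hertz at representative formant frequencies, using Traunmueller's (1990) Hz-to-bark approximation as an assumed conversion (the abstract does not give the paper's exact formula).

    # What a 0.37-bark threshold means in hertz, assuming Traunmueller's
    # (1990) Hz-to-bark approximation.
    def hz_to_bark(f):
        return 26.81 * f / (1960.0 + f) - 0.53

    def bark_to_hz(z):
        # Algebraic inverse of hz_to_bark.
        return 1960.0 * (z + 0.53) / (26.28 - z)

    for f in (500.0, 1500.0, 2500.0):        # representative F1/F2 values
        z = hz_to_bark(f)
        delta_hz = bark_to_hz(z + 0.37) - f  # just-discriminable change
        print(f"{f:6.0f} Hz: 0.37 bark ~ {delta_hz:5.1f} Hz")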

9.
Spectral integration refers to the summation of activity beyond the bandwidth of the peripheral auditory filter. Several lines of experiments have sought to determine the bandwidth of this "supracritical" band phenomenon. This paper reports on two experiments that tested the limit on spectral integration in the same listeners. Experiment I verified the critical separation of 3.5 bark in two-formant synthetic vowels, as advocated by the center-of-gravity (COG) hypothesis. According to the COG effect, two formants are integrated into a single perceived peak if their separation does not exceed approximately 3.5 bark. With several modifications to the methods of a classic COG matching task, the present listeners responded to changes in pitch in two-formant synthetic vowels rather than estimating their phonetic quality. When the amplitude ratio of the formants was changed, the frequency of the perceived peak moved toward that of the stronger formant. This COG effect disappeared with larger formant separation. In a second experiment, auditory spectral resolution bandwidths were measured for the same listeners using common-envelope, two-tone complex signals. Results showed that the limits of spectral averaging in two-formant vowels and two-tone spectral resolution bandwidths were related for two of the three listeners; the third failed to perform the discrimination task. For the two subjects who completed both tasks, the results suggest that the critical region in the vowel task and the complex-tone discriminability estimates are linked to a common mechanism, i.e., to an auditory spectral resolving power. A signal-processing model is proposed to predict the COG effect in two-formant synthetic vowels. The model introduces two modifications to Hermansky's [J. Acoust. Soc. Am. 87, 1738-1752 (1990)] perceptual linear predictive (PLP) model. The model predictions are generally compatible with the present experimental results and with the predictions of several earlier models accounting for the COG effect.
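
The COG prediction itself can be sketched as an amplitude-weighted average of the two formant frequencies on the bark scale, applied only when the formants fall within the 3.5-bark limit. This illustrates the effect under test; it is not the paper's modified PLP model, and the bark conversion is an assumed approximation.

    # A minimal sketch of the center-of-gravity (COG) prediction for a
    # two-formant vowel, assuming Traunmueller's bark approximation.
    def hz_to_bark(f):
        return 26.81 * f / (1960.0 + f) - 0.53

    def bark_to_hz(z):
        return 1960.0 * (z + 0.53) / (26.28 - z)

    def cog_peak(f1, f2, a1, a2, limit=3.5):
        z1, z2 = hz_to_bark(f1), hz_to_bark(f2)
        if abs(z2 - z1) > limit:
            return None  # formants resolved separately: no single COG peak
        # Amplitude-weighted mean position on the bark scale.
        return bark_to_hz((a1 * z1 + a2 * z2) / (a1 + a2))

    # Raising A2 relative to A1 pulls the perceived peak toward F2.
    print(cog_peak(800.0, 1200.0, a1=1.0, a2=1.0))  # midway on bark scale
    print(cog_peak(800.0, 1200.0, a1=1.0, a2=3.0))  # closer to 1200 Hz
    print(cog_peak(500.0, 2500.0, a1=1.0, a2=1.0))  # None: > 3.5 bark apart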

10.
To clarify the role of formant frequency in the perception of pitch in whispering, we conducted a preliminary experiment to determine (1) whether speakers change their pitch when whispering, (2) whether listeners can perceive differences in pitch, and (3) what the acoustical features are when speakers change their pitch. A listening test with whispered Japanese speech demonstrated that listeners can categorize the perceived pitch of the vowel /a/ as ordinary, high, or low. Acoustical analysis revealed that the perceived pitch corresponds to certain formant frequencies. Further data with synthesized whispered voices are needed to confirm in detail the importance of the formant frequencies for the perceived pitch of whispered vowels.

11.
Previous work has demonstrated that normal-hearing individuals use fine-grained phonetic variation, such as formant movement and duration, when recognizing English vowels. The present study investigated whether these cues are used by adult postlingually deafened cochlear implant users, and normal-hearing individuals listening to noise-vocoder simulations of cochlear implant processing. In Experiment 1, subjects gave forced-choice identification judgments for recordings of vowels that were signal processed to remove formant movement and/or equate vowel duration. In Experiment 2, a goodness-optimization procedure was used to create perceptual vowel space maps (i.e., best exemplars within a vowel quadrilateral) that included F1, F2, formant movement, and duration. The results demonstrated that both cochlear implant users and normal-hearing individuals use formant movement and duration cues when recognizing English vowels. Moreover, both listener groups used these cues to the same extent, suggesting that postlingually deafened cochlear implant users have category representations for vowels that are similar to those of normal-hearing individuals.

12.
Recent studies have shown that time-varying changes in formant pattern contribute to the phonetic specification of vowels. This variation could be especially important in children's vowels, because children have higher fundamental frequencies (f0's) than adults, and formant-frequency estimation is generally less reliable when f0 is high. To investigate the contribution of time-varying changes in formant pattern to the identification of children's vowels, three experiments were carried out with natural and synthesized versions of 12 American English vowels spoken by children (ages 7, 5, and 3 years) as well as adult males and females. Experiment 1 showed that (i) vowels generated with a cascade formant synthesizer (with hand-tracked formants) were less accurately identified than natural versions; and (ii) vowels synthesized with steady-state formant frequencies were harder to identify than those which preserved the natural variation in formant pattern over time. The decline in intelligibility was similar across talker groups, and there was no evidence that formant movement plays a greater role in children's vowels than in adults'. Experiment 2 replicated these findings using a semi-automatic formant-tracking algorithm. Experiment 3 showed that the effects of formant movement were the same for vowels synthesized with noise excitation (as in whispered speech) and pulsed excitation (as in voiced speech), although, on average, the whispered vowels were less accurately identified than their voiced counterparts. Taken together, the results indicate that the cues provided by changes in the formant frequencies over time contribute materially to the intelligibility of vowels produced by children and adults, but these time-varying formant-frequency cues do not interact with properties of the voicing source.

13.
The purpose of this study was to examine the effect of reduced vowel working space on dysarthric talkers' speech intelligibility using both acoustic and perceptual approaches. In experiment 1, the acoustic-perceptual relationship between vowel working space area and speech intelligibility was examined in Mandarin-speaking young adults with cerebral palsy. Subjects read aloud 18 bisyllabic words containing the vowels /i/, /a/, and /u/ at their normal speaking rate. Each talker's words were identified by three normal listeners. The percentages of correct vowel and word identification were calculated as vowel intelligibility and word intelligibility, respectively. Results revealed that talkers with cerebral palsy exhibited smaller vowel working space areas than ten age-matched controls. The vowel working space area was significantly correlated with vowel intelligibility (r=0.632, p<0.005) and with word intelligibility (r=0.684, p<0.005). Experiment 2 examined whether tokens of expanded vowel working spaces were perceived as better vowel exemplars and represented with greater perceptual spaces than tokens of reduced vowel working spaces. The results of the perceptual experiment support this prediction. The distorted vowels of talkers with cerebral palsy occupy a smaller acoustic space, which results in shrunken intervowel perceptual distances for listeners.
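
The vowel working space area referred to above is conventionally the area of the F1-F2 triangle spanned by the corner vowels /i/, /a/, and /u/. A minimal sketch with the shoelace formula follows; the formant values are invented for illustration, not taken from the study.

    # Vowel working space area as the area of the F1-F2 corner-vowel
    # triangle, via the shoelace formula. Values are illustrative only.
    def triangle_area(p1, p2, p3):
        """Area of a triangle given three (F1, F2) points in Hz."""
        (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
        return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2.0

    # (F1, F2) in Hz for /i/, /a/, /u/: hypothetical control talker
    # versus a hypothetical reduced (dysarthric) space.
    control = [(300, 2300), (800, 1200), (350, 800)]
    reduced = [(400, 1900), (650, 1300), (450, 1000)]
    print(f"control: {triangle_area(*control):9.0f} Hz^2")
    print(f"reduced: {triangle_area(*reduced):9.0f} Hz^2")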

14.
The purpose of this study is to determine the relative impact of reverberant self-masking and overlap-masking effects on speech intelligibility by cochlear implant listeners. Sentences were presented in two conditions: in one, reverberant consonant segments were replaced with clean consonants; in the other, reverberant vowel segments were replaced with clean vowels. The underlying assumption is that self-masking effects would dominate in the first condition, whereas overlap-masking effects would dominate in the second condition. Results indicated that the degradation of speech intelligibility in reverberant conditions is caused primarily by self-masking effects that give rise to flattened formant transitions.

15.
This investigation examined whether listeners with mild-moderate sensorineural hearing impairment have a deficit in the ability to integrate synchronous spectral information in the perception of speech. In stage 1, the bandwidth of filtered speech centered either on 500 or 2500 Hz was varied adaptively to determine the width required for approximately 15%-25% correct recognition. In stage 2, these criterion bandwidths were presented simultaneously and percent correct performance was determined in fixed block trials. Experiment 1 tested normal-hearing listeners in quiet and in masking noise. The main findings were (1) there was no correlation between the criterion bandwidths at 500 and 2500 Hz; (2) listeners achieved a high percent correct in stage 2 (approximately 80%); and (3) performance in quiet and noise was similar. Experiment 2 tested listeners with mild-moderate sensorineural hearing impairment. The main findings were (1) the impaired listeners showed high variability in stage 1, with some listeners requiring narrower and others requiring wider bandwidths than normal, and (2) hearing-impaired listeners achieved percent correct performance in stage 2 that was comparable to normal. The results indicate that listeners with mild-moderate sensorineural hearing loss do not have an essential deficit in the ability to integrate across-frequency speech information.

16.
This study was designed to examine the role of duration in vowel perception by testing listeners on the identification of CVC syllables generated at different durations. Test signals consisted of synthesized versions of 300 utterances selected from a large, multitalker database of /hVd/ syllables [Hillenbrand et al., J. Acoust. Soc. Am. 97, 3099-3111 (1995)]. Four versions of each utterance were synthesized: (1) an original duration set (vowel duration matched to the original utterance), (2) a neutral duration set (duration fixed at 272 ms, the grand mean across all vowels), (3) a short duration set (duration fixed at 144 ms, two standard deviations below the mean), and (4) a long duration set (duration fixed at 400 ms, two standard deviations above the mean). Experiment 1 used a formant synthesizer, while a second experiment was an exact replication using a sinusoidal synthesis method that represented the original vowel spectrum more precisely than the formant synthesizer. Findings included (1) duration had a small overall effect on vowel identity, since the great majority of signals were identified correctly at their original durations and at all three altered durations; (2) despite the relatively small average effect of duration, some vowels, especially [see text] and [see text], were significantly affected by duration; (3) some vowel contrasts that differ systematically in duration, such as [see text] and [see text], were minimally affected by duration; (4) a simple pattern recognition model appears to be capable of accounting for several features of the listening test results, especially the greater influence of duration on some vowels than others; and (5) because a formant synthesizer does an imperfect job of representing the fine details of the original vowel spectrum, results using the formant-synthesized signals led to a slight overestimate of the role of duration in vowel recognition, especially for the shortened vowels.
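
The "simple pattern recognition model" of point (4) can be illustrated as nearest-category classification in a normalized (F1, F2, duration) space: zeroing the duration weight reveals which classifications depend on duration. The category means, scales, and token below are invented, not the study's templates.

    # A minimal sketch of a nearest-category vowel classifier over
    # (F1, F2, duration). All values are illustrative assumptions.
    import numpy as np

    # Hypothetical category means: (F1 Hz, F2 Hz, duration ms).
    categories = {
        "ae": np.array([660.0, 1720.0, 280.0]),  # as in "had" (long)
        "eh": np.array([580.0, 1800.0, 190.0]),  # as in "head" (short)
    }
    scales = np.array([100.0, 200.0, 50.0])  # rough per-dimension spread

    def classify(token, duration_weight=1.0):
        # Weighted, scale-normalized squared distance to each category mean.
        w = np.array([1.0, 1.0, duration_weight])
        return min(categories, key=lambda v: np.sum(
            (w * (token - categories[v]) / scales) ** 2))

    token = np.array([600.0, 1780.0, 270.0])     # spectrally "eh"-like, long
    print(classify(token, duration_weight=1.0))  # -> "ae": duration decides
    print(classify(token, duration_weight=0.0))  # -> "eh": spectrum alone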

17.
For each of five vowels [i e a o u] following [t], a continuum from non-nasal to nasal was synthesized. Nasalization was introduced by inserting a pole-zero pair in the vicinity of the first formant in an all-pole transfer function. The frequencies and spacing of the pole and zero were systematically varied to change the degree of nasalization. The selection of stimulus parameters was determined from acoustic theory and the results of pilot experiments. The stimuli were presented for identification and discrimination to listeners whose language included a non-nasal–nasal vowel opposition (Gujarati, Hindi, and Bengali) and to American listeners. There were no significant differences between language groups in the 50% crossover points of the identification functions. Some vowels were more influenced by range and context effects than were others. The language groups showed some differences in the shape of the discrimination functions for some vowels. On the basis of the results, it is postulated that (1) there is a basic acoustic property of nasality, independent of the vowel, to which the auditory system responds in a distinctive way regardless of language background; and (2) there are one or more additional acoustic properties that may be used to various degrees in different languages to enhance the contrast between a nasal vowel and its non-nasal congener. A proposed candidate for the basic acoustic property is a measure of the degree of prominence of the spectral peak in the vicinity of the first formant. Additional secondary properties include shifts in the center of gravity of the low-frequency spectral prominence, leading to a change in perceived vowel height, and changes in overall spectral balance.
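
The synthesis manipulation can be sketched directly: starting from an all-pole transfer function, a pole-zero pair is inserted near F1, which flattens the F1 prominence. All formant, pole, and zero placements below are illustrative assumptions, not the stimulus parameters of the study.

    # A minimal sketch of nasalization by a pole-zero pair near F1 in an
    # all-pole transfer function. Parameter values are illustrative only.
    import numpy as np
    from scipy.signal import freqz

    fs = 10000.0  # sampling rate, Hz

    def resonance(freq, bw):
        """Complex-conjugate pole (or zero) pair at freq with bandwidth bw."""
        r = np.exp(-np.pi * bw / fs)
        theta = 2 * np.pi * freq / fs
        return np.array([r * np.exp(1j * theta), r * np.exp(-1j * theta)])

    # All-pole "oral" vowel: formants at 700, 1200, 2500 Hz.
    poles = np.concatenate([resonance(f, bw) for f, bw
                            in [(700, 90), (1200, 100), (2500, 150)]])

    # Nasalization: extra pole at 550 Hz and zero at 650 Hz, near F1.
    nasal_poles = np.concatenate([poles, resonance(550, 100)])
    nasal_zeros = resonance(650, 100)

    w, h_oral = freqz([1.0], np.poly(poles), worN=512, fs=fs)
    w, h_nasal = freqz(np.poly(nasal_zeros), np.poly(nasal_poles),
                       worN=512, fs=fs)

    # The F1 region loses prominence in the nasalized version.
    i = np.argmin(np.abs(w - 700))
    print(f"level near F1, oral : {20*np.log10(abs(h_oral[i])):6.1f} dB")
    print(f"level near F1, nasal: {20*np.log10(abs(h_nasal[i])):6.1f} dB")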

18.
Recent studies have demonstrated that mothers exaggerate phonetic properties of infant-directed (ID) speech. However, these studies focused on a single acoustic dimension (frequency), whereas speech sounds are composed of multiple acoustic cues. Moreover, little is known about how mothers adjust phonetic properties of speech to children with hearing loss. This study examined mothers' production of frequency and duration cues to the American English tense/lax vowel contrast in speech to profoundly deaf (N = 14) and normal-hearing (N = 14) infants, and to an adult experimenter. First and second formant frequencies and vowel duration of tense (/i/, /u/) and lax (/ɪ/, /ʊ/) vowels were measured. Results demonstrated that for both infant groups mothers hyperarticulated the acoustic vowel space and increased vowel duration in ID speech relative to adult-directed speech. Mean F2 values were decreased for the /u/ vowel and increased for the /ɪ/ vowel, and vowel duration was longer for the /i/, /u/, and /ɪ/ vowels in ID speech. However, neither acoustic cue differed in speech to hearing-impaired versus normal-hearing infants. These results suggest that both the formant frequencies and the vowel durations that differentiate the American English tense/lax vowel contrast are modified in ID speech regardless of the hearing status of the addressee.

19.
Formant dynamics in vowel nuclei contribute to vowel classification in English. This study examined listeners' ability to discriminate dynamic second formant transitions in synthetic high front vowels. Acoustic measurements were made from the nuclei (steady state and 20% and 80% of vowel duration) for the vowels /i, ɪ, e, ɛ, æ/ spoken by a female in /bVd/ context. Three synthesis parameters were selected to yield twelve discrimination conditions: initial frequency value for F2 (2525, 2272, or 2068 Hz), slope direction (rising or falling), and duration (110 or 165 ms). F1 frequency was roved. In the standard stimuli, F0 and F1-F4 were steady state. In the comparison stimuli only F2 frequency varied linearly to reach a final frequency. Five listeners were tested under adaptive tracking to estimate the threshold for frequency extent, the minimal detectable difference in frequency between the initial and final F2 values, called ΔF extent. Analysis showed that initial F2 frequency and direction of movement for some F2 frequencies contributed to significant differences in ΔF extent. Results suggested that listeners attended to differences in the stimulus property of frequency extent (hertz), not formant slope (hertz/second). Formant extent thresholds were at least four times smaller than extents measured in the natural speech tokens, and 18 times smaller than for the diphthongized vowel /e/.
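
Adaptive tracking of this kind is typically a staircase procedure. Below is a minimal sketch of a 2-down/1-up track (converging near 70.7% correct) run against a simulated listener; the study's actual tracking rule and the listener model here are assumptions, not details taken from the abstract.

    # A minimal sketch of estimating a deltaF-extent threshold with a
    # 2-down/1-up adaptive staircase and a simulated 2AFC listener.
    import random

    random.seed(1)

    def simulated_listener(extent_hz, true_threshold=30.0):
        """P(correct) grows with the F2 frequency extent; floor at chance."""
        p = 1.0 / (1.0 + (true_threshold / max(extent_hz, 1e-6)) ** 4)
        return random.random() < 0.5 + 0.5 * p

    extent, step = 120.0, 1.3   # start well above threshold; multiplicative step
    reversals, correct_in_row, last_direction = [], 0, None
    while len(reversals) < 8:
        if simulated_listener(extent):
            correct_in_row += 1
            if correct_in_row == 2:          # 2 correct -> make task harder
                correct_in_row, direction = 0, "down"
                if last_direction == "up":
                    reversals.append(extent)  # track direction reversed
                extent, last_direction = extent / step, direction
        else:                                 # 1 wrong -> make task easier
            correct_in_row, direction = 0, "up"
            if last_direction == "down":
                reversals.append(extent)
            extent, last_direction = extent * step, direction

    threshold = sum(reversals[-6:]) / 6.0     # mean of the last reversals
    print(f"estimated deltaF extent threshold: {threshold:.1f} Hz")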

20.
This study examines the perception of short and long vowels in Arabic and Japanese by three groups of listeners differing in their first languages (L1): Arabic, Japanese, and Persian. While Persian uses the same alphabet as Arabic and Iranian students learn Arabic in school, the two languages are typologically unrelated. Further, unlike Arabic or Japanese, vowel length may no longer be contrastive in modern Persian. In this study, a question of interest was whether Persian listeners' foreign language learning experience or Japanese listeners' L1 phonological experience might help them to accurately process short and long vowels in Arabic. In Experiment 1, Arabic and Japanese listeners were more accurate than Persian listeners in discriminating vowel length contrasts in their own L1 only. In Experiment 2, Arabic and Japanese listeners were more accurate than Persian listeners in identifying the length categories in the "other" unknown language as well as in their own L1. The difference in the listeners' perceptual performance between the two experiments supports the view that long-term L1 representations may be invoked to a greater extent in the identification than discrimination test. The present results highlight the importance of selecting the appropriate test for assessing cross-language speech perception.
