Related Articles
20 related articles found.
1.
Vowel identification was tested in quiet, noise, and reverberation with 20 normal-hearing subjects and 20 hearing-impaired subjects. Stimuli were 15 English vowels spoken in a /b-t/ context by six male talkers. Each talker produced five tokens of each vowel. In quiet, all stimuli were identified by two judges as the intended targets. The stimuli were degraded by reverberation or speech-spectrum noise. Vowel identification scores depended upon talker, listening condition, and subject type. The relationship between identification errors and spectral details of the vowels is discussed.
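
Degrading stimuli with noise at a fixed signal-to-noise ratio, as in this study, is typically done by scaling the noise against the speech power. A minimal Python sketch (the function name and the power-based definition of SNR are assumptions of this sketch, not details from the abstract):

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise ratio is `snr_db`,
    then add it to `speech` (1-D float arrays at the same rate)."""
    # Tile or trim the noise to the speech length.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[:len(speech)]
    p_speech = np.mean(speech ** 2)   # speech power
    p_noise = np.mean(noise ** 2)     # noise power
    # Gain that places the noise at the requested level below the speech.
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + gain * noise
```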

2.
The extent to which it is necessary to model the dynamic behavior of vowel formants to enable vowel separation has been the subject of debate in recent years. To investigate this issue, a study has been made of the vowels of 132 Australian English speakers (male and female). The degree of vowel separation from the formant values at the target was contrasted with that from modeling the formant contour with discrete cosine transform coefficients. The findings are that, although it is necessary to model the formant contour to separate out the diphthongs, the formant values at the target, plus vowel duration, are sufficient to separate out the monophthongs. However, further analysis revealed that there are formant contour differences which benefit the within-class separation of the tense/lax monophthong pairs.
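
Modeling a formant contour with discrete cosine transform coefficients can be sketched with scipy: keeping only the first few coefficients captures the contour's mean, slope, and curvature. A hedged example (the coefficient count and the `norm='ortho'` convention are choices of this sketch, not reported parameters of the paper):

```python
import numpy as np
from scipy.fft import dct, idct

def dct_contour_features(formant_track, n_coef=3):
    """Compress a formant contour (Hz, sampled at equal time steps)
    into its first few DCT coefficients; coefficient 0 tracks the
    mean and coefficient 1 the overall slope of the contour."""
    return dct(formant_track, type=2, norm='ortho')[:n_coef]

def reconstruct(coefs, n_points):
    """Invert the truncated DCT to inspect the smoothed contour."""
    full = np.zeros(n_points)
    full[:len(coefs)] = coefs
    return idct(full, type=2, norm='ortho')
```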

3.
Vowel identification in quiet, noise, and reverberation was tested with 40 subjects who varied in age and hearing level. Stimuli were 15 English vowels spoken in a (b-t) context in a carrier sentence, which were degraded by reverberation or noise (a babble of 12 voices). Vowel identification scores were correlated with various measures of hearing loss and with age. The mean of four hearing levels at 0.5, 1, 2, and 4 kHz, termed HTL4, produced the highest correlation coefficients in all three listening conditions. The correlation with age was smaller than with HTL4 and significant only for the degraded vowels. Further analyses were performed for subjects assigned to four groups on the basis of the amount of hearing loss. In noise, performance of all four groups was significantly different, whereas, in both quiet and reverberation, only the group with the greatest hearing loss performed differently from the other groups. The relationship among hearing loss, age, and number and type of errors is discussed in light of acoustic cues available for vowel identification.
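
The HTL4 measure described here is a plain four-frequency average; a small sketch (variable names are hypothetical):

```python
import numpy as np

def htl4(audiogram):
    """Four-frequency average hearing level (dB HL), as in the study:
    the mean of thresholds at 0.5, 1, 2, and 4 kHz.
    `audiogram` maps frequency in kHz to threshold in dB HL."""
    return np.mean([audiogram[f] for f in (0.5, 1.0, 2.0, 4.0)])

# Correlating HTL4 with identification scores across subjects
# (np.corrcoef returns the Pearson correlation matrix):
# r = np.corrcoef(htl4_values, vowel_scores)[0, 1]
```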

4.
English consonant recognition in undegraded and degraded listening conditions was compared for listeners whose primary language was either Japanese or American English. There were ten subjects in each of the two groups, termed the non-native (Japanese) and the native (American) subjects, respectively. The Modified Rhyme Test was degraded either by a babble of voices (S/N = -3 dB) or by room reverberation (reverberation time T = 1.2 s). The Japanese subjects performed at a lower level than the American subjects in both noise and reverberation, although the performance difference in the undegraded, quiet condition was relatively small. There was no difference between the scores obtained in noise and in reverberation for either group. A limited error analysis revealed some differences in the types of errors made by the two groups of listeners. Implications of the results are discussed in terms of the effects of degraded listening conditions on non-native listeners' speech perception.

5.
Effects of noise on speech production: acoustic and perceptual analyses
Acoustical analyses were carried out on a set of utterances produced by two male speakers talking in quiet and in 80, 90, and 100 dB SPL of masking noise. In addition to replicating previous studies demonstrating increases in amplitude, duration, and vocal pitch while talking in noise, these analyses also found reliable differences in the formant frequencies and short-term spectra of vowels. Perceptual experiments were also conducted to assess the intelligibility of utterances produced in quiet and in noise when they were presented at equal S/N ratios for identification. In each experiment, utterances originally produced in noise were found to be more intelligible than utterances produced in quiet. The results of the acoustic analyses showed clear and consistent differences in the acoustic-phonetic characteristics of speech produced in quiet versus noisy environments. Moreover, these acoustic differences produced reliable effects on intelligibility. The findings are discussed in terms of: (1) the nature of the acoustic changes that take place when speakers produce speech under adverse conditions such as noise, psychological stress, or high cognitive load; (2) the role of training and feedback in controlling and modifying a talker's speech to improve the performance of current speech recognizers; and (3) the development of robust algorithms for recognition of speech in noise.

6.
An extensive developmental acoustic study of the speech patterns of children and adults was reported by Lee and colleagues [Lee et al., J. Acoust. Soc. Am. 105, 1455-1468 (1999)]. This paper presents a reexamination of selected fundamental frequency and formant frequency data from their report for ten monophthongs, investigating sex-specific and developmental patterns using two different approaches. The first is the investigation of age- and sex-specific formant frequency patterns in the monophthongs. The second is the investigation of fundamental frequency and formant frequency data using the critical band rate (Bark) scale and a number of acoustic-phonetic dimensions of the monophthongs from an age- and sex-specific perspective. These acoustic-phonetic dimensions include: vowel spaces and distances from speaker centroids; frequency differences between the formant frequencies of males and females; vowel openness/closeness and frontness/backness; the degree of vocal effort; and formant frequency ranges. Both approaches reveal age- and sex-specific developmental patterns, which also appear to depend on whether vowels are peripheral or nonperipheral. The developmental emergence of these sex-specific differences is discussed with reference to anatomical, physiological, sociophonetic, and culturally determined factors. Some directions for further investigation into the age-linked sex differences in speech across the lifespan are also proposed.
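
One common Hz-to-Bark conversion that could underlie a critical-band-rate analysis like this one is Traunmüller's (1990) approximation; the paper may well have used a different formula, so treat this as illustrative:

```python
def hz_to_bark(f_hz):
    """Critical-band rate in Bark from frequency in Hz,
    using Traunmueller's (1990) approximation."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

# e.g., converting a vowel's F1/F2 before computing
# Bark-scaled vowel-space distances between speaker groups:
f1_bark = hz_to_bark(700.0)   # ~6.5 Bark
f2_bark = hz_to_bark(1200.0)  # ~9.6 Bark
```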

7.
Two effects of reverberation on the identification of consonants were evaluated for ten normal-hearing subjects: (1) the overlap of energy of a preceding consonant on the following consonant, called "overlap-masking"; and (2) the internal temporal smearing of energy within each consonant, called "self-masking." The stimuli were eight consonants /p,t,k,f,m,n,l,w/. The consonants were spoken in /s-at/ context (experiment 1) and generated by a speech synthesizer in /s-at/ and /-at/ contexts (experiment 2). In both experiments, identification of consonants was tested in four conditions: (1) in quiet, without degradations; (2) with a babble of voices; (3) with noise shaped like either natural or synthetic /s/, for the two experiments, respectively; and (4) with room reverberation. The results for the natural and synthetic syllables indicated that the effect of reverberation on identification of consonants following /s/ was not comparable to masking by either the /s/-spectrum-shaped noise or the babble. In addition, the results for the synthetic syllables indicated that most of the errors in reverberation for the /s-at/ context were similar to a sum of errors in two conditions: (1) with /s/-shaped noise causing overlap-masking; and (2) with reverberation causing self-masking within each consonant.
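
Room reverberation of the kind used here is conventionally simulated by convolving the dry signal with a room impulse response (RIR). A sketch under that assumption; the exponentially decaying noise RIR is a crude stand-in for a measured response:

```python
import numpy as np
from scipy.signal import fftconvolve

def reverberate(dry, rir):
    """Simulate a room by convolving the dry signal with an RIR;
    normalize to avoid clipping."""
    wet = fftconvolve(dry, rir)
    return wet / np.max(np.abs(wet))

def synthetic_rir(t60, fs, length_s=1.5):
    """Crude synthetic RIR: noise with an exponential decay whose
    rate matches a target reverberation time (e.g., T60 = 1.2 s)."""
    t = np.arange(int(length_s * fs)) / fs
    decay = np.exp(-6.91 * t / t60)   # amplitude is 60 dB down at t = T60
    return np.random.randn(len(t)) * decay
```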

8.
Speech coding in the auditory nerve: V. Vowels in background noise
Responses of auditory-nerve fibers to steady-state, two-formant vowels in low-pass background noise (S/N = 10 dB) were obtained in anesthetized cats. For fibers over a wide range of characteristic frequencies (CFs), the peaks in discharge rate at the onset of the vowel stimuli were nearly eliminated in the presence of noise. In contrast, strong effects of noise on fine time patterns of discharge were limited to CF regions far from the formant frequencies. One effect is a reduction in the amplitude of the response component at the fundamental frequency in the high-CF regions and for CFs between F1 and F2 when the formants are widely separated. A reduction in the amplitude of the response components at the formant frequencies, with a concomitant increase in components near CF or at low frequencies, occurs in CF regions where the signal-to-noise ratio is particularly low. The processing schemes that were effective for estimating the formant frequencies and fundamental frequency of vowels in quiet generally remain adequate in moderate-level background noise. Overall, the discharge patterns contain many cues for distinctions among the vowel stimuli, so that the central processor should be able to identify the different vowels, consistent with psychophysical performance at moderate signal-to-noise ratios.

9.
In order to investigate the group characteristics of Putonghua monophthong formants, tokens from 90 female students were surveyed. The formants were measured using the LPC method. The averaged values and spread of the formant frequencies are reported with statistical measures. The results differ from measurements made by other researchers decades earlier. For all monophthongs, F4/F3 and F5/F4 are generally around 1.4. To discriminate monophthongs, F2/F1 and F3/F2 are possibly two new parameters to use besides the first three formants.
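
LPC-based formant measurement finds the poles of an all-pole model of the vowel and converts pole angles to frequencies. An illustrative sketch using librosa (the model order, frequency threshold, and helper names are choices of this sketch, not parameters reported in the abstract):

```python
import numpy as np
import librosa

def lpc_formants(frame, fs, order=12, n_formants=5):
    """Estimate formant frequencies from a windowed vowel frame with
    LPC: find the poles of the all-pole filter and convert their
    angles to frequencies (a standard textbook method)."""
    a = librosa.lpc(np.asarray(frame, dtype=float), order=order)
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]            # one of each conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)   # pole angle -> Hz
    freqs = np.sort(freqs[freqs > 90])           # drop near-DC poles
    return freqs[:n_formants]

# Ratios like F2/F1 and F3/F2, proposed above as discriminative
# parameters, then follow directly:
# f = lpc_formants(frame, fs); r21, r32 = f[1] / f[0], f[2] / f[1]
```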

10.
Formant discrimination for isolated vowels presented in noise was investigated for normal-hearing listeners. Discrimination thresholds for F1 and F2, for the seven American English vowels /i, I, epsilon, ae, [symbol see text], a, u/, were measured under two types of noise, long-term speech-shaped noise (LTSS) and multitalker babble, and also under quiet listening conditions. Signal-to-noise ratios (SNR) varied from -4 to +4 dB in steps of 2 dB. All three factors, formant frequency, signal-to-noise ratio, and noise type, had significant effects on vowel formant discrimination. Significant interactions among the three factors showed that threshold-frequency functions depended on SNR and noise type. The thresholds at the lowest levels of SNR were highly elevated by a factor of about 3 compared to those in quiet. The masking functions (threshold vs SNR) were well described by a negative exponential over F1 and F2 for both LTSS and babble noise. Speech-shaped noise was a slightly more effective masker than multitalker babble, presumably reflecting small benefits (1.5 dB) due to the temporal variation of the babble.
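
A negative-exponential masking function of the form threshold = a * exp(-b * SNR) + c can be fit with scipy; only the functional form comes from the abstract, and the data values below are hypothetical:

```python
import numpy as np
from scipy.optimize import curve_fit

def masking_fn(snr_db, a, b, c):
    """Negative-exponential threshold-vs-SNR function: thresholds
    grow steeply at low SNR and level off toward the quiet value c."""
    return a * np.exp(-b * snr_db) + c

# Hypothetical data: discrimination thresholds (Hz) at each SNR (dB).
snr = np.array([-4.0, -2.0, 0.0, 2.0, 4.0])
thresh = np.array([42.0, 30.0, 23.0, 18.0, 16.0])
params, _ = curve_fit(masking_fn, snr, thresh, p0=(20.0, 0.3, 14.0))
```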

11.
Perceptual distances among single tokens of American English vowels were established for nonreverberant and reverberant conditions. Fifteen vowels in the phonetic context (b-t), embedded in the sentence "Mark the (b-t) again," were recorded by a male talker. For the reverberant condition, the sentences were played through a room with a reverberation time of 1.2 s. The CVC syllables were removed from the sentences and presented in pairs to ten subjects with audiometrically normal hearing, who judged the similarity of the syllable pairs separately for the nonreverberant and reverberant conditions. The results were analyzed by multidimensional scaling procedures, which showed that the perceptual data were accounted for by a three-dimensional vowel space. Correlations were obtained between the coordinates of the vowels along each dimension and selected acoustic parameters. For both conditions, dimensions 1 and 2 were highly correlated with formant frequencies F2 and F1, respectively, and dimension 3 was correlated with the product of the duration of the vowels and the difference between F3 and F1 expressed on the Bark scale. These observations are discussed in terms of the influence of reverberation on speech perception.
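
Multidimensional scaling of a precomputed dissimilarity matrix, as used here, is available in scikit-learn. A sketch with random placeholder data (the real study used averaged pairwise similarity judgments):

```python
import numpy as np
from sklearn.manifold import MDS

# `dissim` is a symmetric 15 x 15 matrix of perceived dissimilarities
# between vowel pairs (hypothetical; e.g., 1 - mean similarity rating).
dissim = np.random.rand(15, 15)
dissim = (dissim + dissim.T) / 2
np.fill_diagonal(dissim, 0.0)

mds = MDS(n_components=3, dissimilarity='precomputed', random_state=0)
coords = mds.fit_transform(dissim)   # one 3-D point per vowel
# Each column of `coords` can then be correlated with acoustic
# predictors (F2, F1, duration times Bark-scaled F3-F1), as in the study.
```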

12.
The study was designed to test the validity of the American Academy of Ophthalmology and Otolaryngology's (AAOO) 26-dB average hearing threshold level at 500, 1000, and 2000 Hz as a predictor of hearing handicap. To investigate this criterion the performance of a normal-hearing group was compared with that of two groups, categorized according to the AAOO [Trans. Am. Acad. Ophthal. Otolaryng. 63, 236-238 (1959)] guidelines as having no handicap. The latter groups, however, had significant hearing losses in the frequencies above 2000 Hz. Mean hearing threshold levels for 3000, 4000, and 6000 Hz were 54 dB for group II and 63 dB for group III. Two kinds of speech stimuli were presented at an A-weighted sound level of 60 dB in quiet and in three different levels of noise. The resulting speech recognition scores were significantly lower for the hearing-impaired groups than for the normal-hearing group on both kinds of speech materials and in all three noise conditions. Mean scores for group III were significantly lower than those of the normal-hearing group, even in the quiet condition. Speech recognition scores showed significantly better correlation with hearing levels for frequency combinations including frequencies above 2000 Hz than for the 500-, 1000-, and 2000-Hz combination. On the basis of these results the author recommends that the 26-dB fence should be somewhat lower, and that frequencies above 2000 Hz should be included in any scheme for evaluating hearing handicap.

13.
Two experiments investigating the effects of auditory stimulation delivered via a Nucleus multichannel cochlear implant upon vowel production in adventitiously deafened adult speakers are reported. The first experiment contrasts vowel formant frequencies produced without auditory stimulation (implant processor OFF) to those produced with auditory stimulation (processor ON). Significant shifts in second formant frequencies were observed for intermediate vowels produced without auditory stimulation; however, no significant shifts were observed for the point vowels. Higher first formant frequencies occurred in five of eight vowels when the processor was turned ON versus OFF. A second experiment contrasted productions of the word "head" produced with a FULL map, OFF condition, and a SINGLE channel condition that restricted the amount of auditory information received by the subjects. This experiment revealed significant shifts in second formant frequencies between FULL map utterances and the other conditions. No significant differences in second formant frequencies were observed between SINGLE channel and OFF conditions. These data suggest auditory feedback information may be used to adjust the articulation of some speech sounds.

14.
Recent studies have shown that time-varying changes in formant pattern contribute to the phonetic specification of vowels. This variation could be especially important in children's vowels, because children have higher fundamental frequencies (f0's) than adults, and formant-frequency estimation is generally less reliable when f0 is high. To investigate the contribution of time-varying changes in formant pattern to the identification of children's vowels, three experiments were carried out with natural and synthesized versions of 12 American English vowels spoken by children (ages 7, 5, and 3 years) as well as adult males and females. Experiment 1 showed that (i) vowels generated with a cascade formant synthesizer (with hand-tracked formants) were less accurately identified than natural versions; and (ii) vowels synthesized with steady-state formant frequencies were harder to identify than those which preserved the natural variation in formant pattern over time. The decline in intelligibility was similar across talker groups, and there was no evidence that formant movement plays a greater role in children's vowels compared to adults. Experiment 2 replicated these findings using a semi-automatic formant-tracking algorithm. Experiment 3 showed that the effects of formant movement were the same for vowels synthesized with noise excitation (as in whispered speech) and pulsed excitation (as in voiced speech), although, on average, the whispered vowels were less accurately identified than their voiced counterparts. Taken together, the results indicate that the cues provided by changes in the formant frequencies over time contribute materially to the intelligibility of vowels produced by children and adults, but these time-varying formant frequency cues do not interact with properties of the voicing source.

15.
Because laboratory studies are conducted in optimal listening conditions, often with highly stylized stimuli that attenuate or eliminate some naturally occurring cues, results may have constrained applicability to the "real world." Such studies show that English-speaking adults weight vocalic duration greatly and formant offsets slightly in voicing decisions for word-final obstruents. Using more natural stimuli, Nittrouer [J. Acoust. Soc. Am. 115, 1777-1790 (2004)] found different results, raising questions about what would happen if experimental conditions were even more like the real world. In this study noise was used to simulate the real world. Edited natural words with voiced and voiceless final stops were presented in quiet and noise to adults and children (4 to 8 years) for labeling. Hypotheses tested were (1) Adults (and perhaps older children) would weight vocalic duration more in noise than in quiet; (2) Previously reported age-related differences in cue weighting might not be found in this real-world simulation; and (3) Children would experience greater masking than adults. Results showed: (1) no increase for any age listeners in the weighting of vocalic duration in noise; (2) age-related differences in the weighting of cues in both quiet and noise; and (3) masking effects for all listeners, but more so for children than adults.

16.
Two experiments investigated the effect of reverberation on listeners' ability to perceptually segregate two competing voices. Culling et al. [Speech Commun. 14, 71-96 (1994)] found that for competing synthetic vowels, masked identification thresholds were increased by reverberation only when combined with modulation of fundamental frequency (F0). The present investigation extended this finding to running speech. Speech reception thresholds (SRTs) were measured for a male voice against a single interfering female voice within a virtual room with controlled reverberation. The two voices were either (1) co-located in virtual space at 0 degrees azimuth or (2) separately located at +/-60 degrees azimuth. In experiment 1, target and interfering voices were either normally intonated or resynthesized with a fixed F0. In anechoic conditions, SRTs were lower for normally intonated and for spatially separated sources, while, in reverberant conditions, the SRTs were all the same. In experiment 2, additional conditions employed inverted F0 contours. Inverted F0 contours yielded higher SRTs in all conditions, regardless of reverberation. The results suggest that reverberation can seriously impair listeners' ability to exploit differences in F0 and spatial location between competing voices. The levels of reverberation employed had no effect on speech intelligibility in quiet.

17.
Cochlear implant users report difficulty understanding speech in both noisy and reverberant environments. Electric-acoustic stimulation (EAS) is known to improve speech intelligibility in noise. However, little is known about the potential benefits of EAS in reverberation, or about how such benefits relate to those observed in noise. The present study used EAS simulations to examine these questions. Sentences were convolved with impulse responses from a model of a room whose estimated reverberation times were varied from 0 to 1 s. These reverberated stimuli were then vocoded to simulate electric stimulation, or presented as a combination of vocoder plus low-pass filtered speech to simulate EAS. Monaural sentence recognition scores were measured in two conditions: reverberated speech and speech in reverberated noise. The long-term spectrum and amplitude modulations of the noise were equated to the reverberant energy, allowing a comparison of the effects of the interferer (speech vs noise). Results indicate that, at least in simulation, (1) EAS provides significant benefit in reverberation; (2) the benefits of EAS in reverberation may be underestimated by those in a comparable noise; and (3) the EAS benefit in reverberation likely arises from partially preserved cues in this background accessible via the low-frequency acoustic component.
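
An EAS simulation in the spirit described, a vocoder for the electric part plus low-pass speech for the acoustic part, might look like the following; the channel count, filter orders, and 500-Hz acoustic cutoff are assumptions of this sketch, not parameters reported in the abstract:

```python
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

def _band(lo, hi, fs):
    return butter(4, [lo, hi], btype='bandpass', fs=fs, output='sos')

def noise_vocode(x, fs, n_ch=8, f_lo=200.0, f_hi=7000.0):
    """Crude electric-stimulation simulation: split into channels,
    take each channel's envelope, and modulate band-limited noise."""
    x = np.asarray(x, dtype=float)
    edges = np.geomspace(f_lo, f_hi, n_ch + 1)
    out = np.zeros_like(x)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = _band(lo, hi, fs)
        env = np.abs(hilbert(sosfilt(sos, x)))          # channel envelope
        carrier = sosfilt(sos, np.random.randn(len(x)))
        out += sosfilt(sos, env * carrier)              # re-limit the band
    return out

def simulate_eas(x, fs, acoustic_cutoff=500.0):
    """EAS = low-pass acoustic hearing + vocoded electric hearing."""
    sos_lp = butter(4, acoustic_cutoff, btype='lowpass', fs=fs, output='sos')
    return sosfilt(sos_lp, np.asarray(x, dtype=float)) + noise_vocode(x, fs)
```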

18.
Previous studies [Lisker, J. Acoust. Soc. Am. 57, 1547-1551 (1975); Summerfield and Haggard, J. Acoust. Soc. Am. 62, 435-448 (1977)] have shown that voice onset time (VOT) and the onset frequency of the first formant are important perceptual cues of voicing in syllable-initial plosives. Most prior work, however, has focused on speech perception in quiet environments. The present study seeks to determine which cues are important for the perception of voicing in syllable-initial plosives in the presence of noise. Perceptual experiments were conducted using stimuli consisting of naturally spoken consonant-vowel syllables by four talkers in various levels of additive white Gaussian noise. Plosives sharing the same place of articulation and vowel context (e.g., /pa,ba/) were presented to subjects in two alternate forced choice identification tasks, and a threshold signal-to-noise-ratio (SNR) value (corresponding to the 79% correct classification score) was estimated for each voiced/voiceless pair. The threshold SNR values were then correlated with several acoustic measurements of the speech tokens. Results indicate that the onset frequency of the first formant is critical in perceiving voicing in syllable-initial plosives in additive white Gaussian noise, while the VOT duration is not.
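
The 79%-correct threshold SNR can be read off a fitted psychometric function. A sketch for a two-alternative forced-choice task (the data values are hypothetical, and the logistic form is one common choice, not necessarily the paper's):

```python
import numpy as np
from scipy.optimize import curve_fit

def pc_2afc(snr_db, mid, slope):
    """Psychometric function for a 2AFC task:
    chance is 50% correct, the asymptote 100%."""
    return 0.5 + 0.5 / (1.0 + np.exp(-(snr_db - mid) / slope))

# Hypothetical proportions correct at each tested SNR
# for one voiced/voiceless pair:
snr = np.array([-15.0, -10.0, -5.0, 0.0, 5.0])
pc = np.array([0.52, 0.61, 0.78, 0.93, 0.98])
(mid, slope), _ = curve_fit(pc_2afc, snr, pc, p0=(-5.0, 3.0))

# Invert the fitted curve at 79% correct to get the threshold SNR:
snr_79 = mid + slope * np.log(0.58 / 0.42)   # (0.79 - 0.5) / 0.5 = 0.58
```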

19.
Speech intelligibility studies in classrooms
Speech intelligibility tests and acoustical measurements were made in ten occupied classrooms. Octave-band measurements of background noise levels, early decay times, and reverberation times, as well as various early/late sound ratios, and the center time were obtained. Various octave-band useful/detrimental ratios were calculated along with the speech transmission index. The interrelationships of these measures were considered to evaluate which were most appropriate in classrooms, and the best predictors of speech intelligibility scores were identified. From these results ideal design goals for acoustical conditions for classrooms were determined either in terms of the 50-ms useful/detrimental ratios or from combinations of the reverberation time and background noise level.
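
Early/late ratios such as C50, and useful/detrimental ratios that additionally count background noise as detrimental, follow directly from an impulse response. A simplified broadband sketch (the study's octave-band definitions are not reproduced here, and `noise_energy` is assumed to be expressed on the same scale as the RIR energy):

```python
import numpy as np

def early_late_ratio(rir, fs, split_ms=50.0):
    """C50-style early/late energy ratio (dB) from a room impulse
    response: energy in the first 50 ms vs everything after."""
    n = int(split_ms * 1e-3 * fs)
    early = np.sum(rir[:n] ** 2)
    late = np.sum(rir[n:] ** 2)
    return 10.0 * np.log10(early / late)

def useful_detrimental(rir, fs, noise_energy, split_ms=50.0):
    """U50-style ratio: late reverberant energy and background-noise
    energy both count as detrimental (a simplification of the
    measures used in the study)."""
    n = int(split_ms * 1e-3 * fs)
    early = np.sum(rir[:n] ** 2)
    late = np.sum(rir[n:] ** 2)
    return 10.0 * np.log10(early / (late + noise_energy))
```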

20.
Listeners were given the task of identifying the stop consonant [t] in the test word "stir" when the word was embedded in a carrier sentence. Reverberation was added to the test word, but not to the carrier, and the ability to identify the [t] decreased because the amplitude modulations associated with the [t] were smeared. When a similar amount of reverberation was also added to the carrier sentence, the listeners' ability to identify the stop consonant was restored. In previous research, this phenomenon has been taken as evidence for an extrinsic compensation mechanism for reverberation in the human auditory system [Watkins (2005). J. Acoust. Soc. Am. 118, 249-262]. In the present study, the reverberant test word was embedded in additional non-reverberant carriers, such as white noise, speech-shaped noise, and amplitude-modulated noise. In addition, a reference condition was included in which the test word was presented in isolation, i.e., without any carrier stimulus. In all of these conditions, the ability to identify the stop consonant [t] was enhanced relative to the condition using the non-reverberant speech carrier. The results suggest that the non-reverberant speech carrier produces an interference effect that impedes the identification of the stop consonant. These findings raise doubts about the existence of the compensation mechanism.

