首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 127 毫秒
1.
Speech coding in the auditory nerve: V. Vowels in background noise   总被引:1,自引:0,他引:1  
Responses of auditory-nerve fibers to steady-state, two-formant vowels in low-pass background noise (S/N = 10 dB) were obtained in anesthetized cats. For fibers over a wide range of characteristic frequencies (CFs), the peaks in discharge rate at the onset of the vowel stimuli were nearly eliminated in the presence of noise. In contrast, strong effects of noise on fine time patterns of discharge were limited to CF regions that are far from the formant frequencies. One effect is a reduction in the amplitude of the response component at the fundamental frequency in the high-CF regions and for CFs between F1 and F2 when the formants are widely separated. A reduction in the amplitude of the response components at the formant frequencies, with concomitant increase in components near CF or low-frequency components occurs in CF regions where the signal-to-noise ratio is particularly low. The processing schemes that were effective for estimating the formant frequencies and fundamental frequency of vowels in quiet generally remain adequate in moderate-level background noise. Overall, the discharge patterns contain many cues for distinctions among the vowel stimuli, so that the central processor should be able to identify the different vowels, consistent with psychophysical performance at moderate signal-to-noise ratios.  相似文献   

2.
Speech coding in the auditory nerve: III. Voiceless fricative consonants   总被引:1,自引:0,他引:1  
Responses of auditory-nerve fibers in anesthetized cats were recorded for synthetic voiceless fricative consonants. The four stimuli (/x/, /s/, /s/, and /f/) were presented at two levels corresponding to speech in which the levels of the vowels would be approximately 60 and 75 dB SPL, respectively. Discharge patterns were characterized in terms of PST histograms and their power spectra. For both stimulus levels, frequency regions in which the stimuli had considerable energy corresponded well with characteristic-frequency (CF) regions in which average discharge rates were the highest. At the higher level, the profiles of discharge rate against CF were more distinctive for the stimulus onset than for the central portion. Power spectra of PST histograms had large response components near fiber characteristic frequencies for CFs up to 3-4 kHz, as well as low-frequency components for all fibers. The relative amplitudes of these components varied for the different stimuli. In general, the formant frequencies of the fricatives did not correspond with the largest response components, except for formants below about 3 kHz. Processing schemes based on fine time patterns of discharge that were effective for vowel stimuli generally failed to extract the formant frequencies of fricatives.  相似文献   

3.
This paper is concerned with the representation of the spectra of synthesized steady-state vowels in the temporal aspects of the discharges of auditory-nerve fibers. The results are based on a study of the responses of large numbers of single auditory-nerve fibers in anesthetized cats. By presenting the same set of stimuli to all the fibers encountered in each cat, we can directly estimate the population response to those stimuli. Period histograms of the responses of each unit to the vowels were constructed. The temporal response of a fiber to each harmonic component of the stimulus is taken to be the amplitude of the corresponding component in the Fourier transform of the unit's period histogram. At low sound levels, the temporal response to each stimulus component is maximal among units with CFs near the frequency of the component (i.e., near its place). Responses to formant components are larger than responses to other stimulus components. As sound level is increased, the responses to the formants, particularly the first formant, increase near their places and spread to adjacent regions, particularly toward higher CFs. Responses to nonformant components, exept for harmonics and intermodulation products of the formants (2F1,2F2,F1 + F2, etc), are suppressed; at the highest sound levels used (approximately 80 dB SPL), temporal responses occur almost exclusively at the first two or three formants and their harmonics and intermodulation products. We describe a simple calculation which combines rate, place, and temporal information to provide a good representation of the vowels' spectra, including a clear indication of at least the first two formant frequencies. This representation is stable with changes in sound level at least up to 80 dB SPL; its stability is in sharp contrast to the behavior of the representation of the vowels' spectra in terms of discharge rate which degenerates at stimulus levels within the conversational range.  相似文献   

4.
Several processing schemes by which phonetically important information for vowels can be extracted from responses of auditory-nerve fibers are analyzed. The schemes are based on power spectra of period histograms obtained in response to a set of nine two-formant, steady-state, vowel-like stimuli presented at 60 and 75 dB SPL. One class of "local filtering" schemes, which was originally proposed by Young and Sachs [J. Acoust. Soc. Am. 66, 1381-1403 (1979)], consists of analyzing response patterns by filters centered at the characteristic frequencies (CF) of the fibers, so that a tonotopically arranged measure of synchronized response can be obtained. Various schemes in this class differ in the characteristics of the filter. For a wide range of filter bandwidths, formant frequencies correspond approximately to the CFs for which the response measure is maximal. If in addition, the bandwidths of the analyzing filters are made compatible with psychophysical measures of frequency selectivity, low-frequency harmonics of the stimulus fundamental are resolved in the output profile, so that fundamental frequency can also be estimated. In a second class of processing schemes, a dominant response component is defined for each fiber from a 1/6 octave spectral representation of the response pattern, and the formant frequencies are estimated from the most frequent values of the dominant component in the ensemble of auditory-nerve fibers. The local filtering schemes and the dominant component schemes can be related to "place" and "periodicity" models of auditory processing, respectively.  相似文献   

5.
Vowel matching and identification experiments were carried out to investigate the perceptual contribution of harmonics in the first formant region of synthetic front vowels. In the first experiment, listeners selected the best phonetic match from an F1 continuum, for reference stimuli in which a band of two to five adjacent harmonics of equal intensity replaced the F1 peak; F1 values of best matches were near the frequency of the highest frequency harmonic in the band. Attenuation of the highest harmonic in the band resulted in lower F1 matches. Attenuation of the lowest harmonic had no significant effects, except in the case of a 2-harmonic band, where higher F1 matches were selected. A second experiment investigated the shifts in matched F1 resulting from an intensity increment to either one of a pair of harmonics in the F1 region. These shifts were relatively invariant over different harmonic frequencies and proportional to the fundamental frequency. A third experiment used a vowel identification task to determine phoneme boundaries on an F1 continuum. These boundaries were not substantially altered when the stimuli comprised only the two most prominent harmonics in the F1 region, or these plus either the higher or lower frequency subset of the remaining F1 harmonics. The results are consistent with an estimation procedure for the F1 peak which assigns greatest weight to the two most prominent harmonics in the first formant region.  相似文献   

6.
Formant frequencies in an old Estonian folk song performed by two female voices were estimated for two back vowels /a/ and /u/, and for two front vowels /e/ and /i/. Comparison of these estimates with formant frequencies in spoken Estonian vowels indicates a trend of the vowels to be clustered into two sets of front and back ones in the F1/F2 plane. Similar clustering has previously been shown to occur in opera and choir singing, especially with increasing fundamental frequency. The clustering in the present song, however, may also be due to a tendency for a mid vowel to be realized as a higher-beginning diphthong, which is characteristic of the North-Estonian coastal dialect area where the singers come from. No evidence of a "singer's formant" was found.  相似文献   

7.
Psychophysical results using double vowels imply that subjects are able to use the temporal aspects of neural discharge patterns. To investigate the possible temporal cues available, the responses of fibers in the cochlear nerve of the anesthetized guinea pig to synthetic vowels were recorded at a range of sound levels up to 95 dB SPL. The stimuli were the single vowels /i/ [fundamental frequency (f0) 125 Hz], /a/ (f0, 100 Hz), and /c/ (f0, 100 Hz) and the double vowels were /a(100),i(125)/ and /c(100),i(125)/. Histograms synchronized to the period of the double vowels were constructed, and locking of the discharge to individual harmonics was estimated from them by Fourier transformation. One possible cue for identifying the f0's of the constituents of a double vowel is modulation of the neural discharge with a period of 1/f0. Such modulation was found at frequencies between the formant peaks of the double vowel, with modulation at the periods of 100 and 125 Hz occurring at different places in the fiber array. Generation of a population response based on synchronized responses [average localized synchronized rate (ALSR): see Young and Sachs [J. Acoust. Soc. Am. 66, 1381-1403 (1979)] allowed estimation of the f0's by a variety of methods and subsampling the population response at the harmonics of the f0 of the constituent vowel achieved a good reconstruction of its spectrum. Other analyses using interval histograms and autocorrelation, which overcome some problems associated with the ALSR approach, also allowed f0 identification and vowel segregation. The present study has demonstrated unequivocally that the timing of the impulses in auditory-nerve fibers provides copious possible cues for the identification of the fundamental frequencies and spectra associated with each of the constituents of double vowels.  相似文献   

8.
Questions exist as to the intelligibility of vowels sung at extremely high fundamental frequencies and, especially, when the fundamental frequency (F0) produced is above the region where the first vowel formant (F1) would normally occur. Can such vowels be correctly identified and, if so, does context provide the necessary information or are acoustical elements also operative? To this end, 18 professional singers (5 males and 13 females) were recorded when singing 3 isolated vowels at high and low pitches at both loud and soft levels. Aural-perceptual studies employing four types of auditors were carried out to determine the identity of these vowels, and the nature of the confusions with other vowels. Subsequent acoustical analysis focused on the actual fundamental frequencies sung plus those defining the first 2 vowel formants. It was found that F0 change had a profound effect on vowel perception; one of the more important observations was that the target tended to shift toward vowels with an F1 just above the sung frequency.  相似文献   

9.
The four experiments reported here measure listeners' accuracy and consistency in adjusting a formant frequency of one- or two-formant complex sounds to match the timbre of a target sound. By presenting the target and the adjustable sound on different fundamental frequencies, listeners are prevented from performing the task by comparing the absolute or relative levels of resolved spectral components. Experiment 1 uses two-formant vowellike sounds. When the two sounds have the same F0, the variability of matches (within-subject standard deviation) for either the first or the second formant is around 1%-3%, which is comparable to existing data on formant frequency discrimination thresholds. With a difference in F0, variability increases to around 8% for first-formant matches, but to only about 4% for second-formant matches. Experiment 2 uses sounds with a single formant at 1100 or 1200 Hz with both sounds on either low or high fundamental frequencies. The increase in variability produced by a difference in F0 is greater for high F0's (where the harmonics close to the formant peak are resolved) than it is for low F0's (where they are unresolved). Listeners also showed systematic errors in their mean matches to sounds with different high F0's. The direction of the systematic errors was towards the most intense harmonic. Experiments 3 and 4 showed that introduction of a vibratolike frequency modulation (FM) on F0 reduces the variability of matches, but does not reduce the systematic error. The experiments demonstrate, for the specific frequencies and FM used, that there is a perceptual cost to interpolating a spectral envelope across resolved harmonics.  相似文献   

10.
The sound level of the singer's formant in professional singing   总被引:2,自引:0,他引:2  
The relative sound level of the "singer's formant," measured in a 1/3-oct band with a center frequency of 2.5 kHz for males and of 3.16 kHz for females, has been investigated for 14 professional singers, nine different modes of singing, nine different vowels, variations in overall sound-pressure level, and fundamental frequencies ranging from 98 up to 880 Hz. Variation in the sound level of the singer's formant due to differences among male singers was small (4 dB), the factors vowels (16 dB) and fundamental frequency (9-14 dB) had an intermediate effect, while the largest variation was found for differences among female singers (24 dB), between modes of singing (vocal effort) (23 dB), and in overall sound-pressure level (more than 30 dB). In spite of this great potential variability, for each mode of singing the sound level of the singer's formant was remarkably constant up to F0 = 392 Hz, due to adaptation of vocal effort. This may be explained as the result of the perceptual demand of a constant voice quality. The definition of the singer's formant is discussed.  相似文献   

11.
12.
Auditory-nerve fiber spike trains were recorded in response to spoken English stop consonant-vowel syllables, both voiced (/b,d,g/) and unvoiced (/p,t,k/), in the initial position of syllables with the vowels /i,a,u/. Temporal properties of the neural responses and stimulus spectra are displayed in a spectrographic format. The responses were categorized in terms of the fibers' characteristic frequencies (CF) and spontaneous rates (SR). High-CF, high-SR fibers generally synchronize to formants throughout the syllables. High-CF, low/medium-SR fibers may also synchronize to formants; however, during the voicing, there may be sufficient low-frequency energy present to suppress a fiber's synchronized response to a formant near its CF. Low-CF fibers, from both SR groups, synchronize to energy associated with voicing. Several proposed acoustic correlates to perceptual features of stop consonant-vowel syllables, including the initial spectrum, formant transitions, and voice-onset time, are represented in the temporal properties of auditory-nerve fiber responses. Nonlinear suppression affects the temporal features of the responses, particularly those of low/medium-spontaneous-rate fibers.  相似文献   

13.
Imitations of ten synthesized vowels were recorded from 33 speakers including men, women, and children. The first three formant frequencies of the imitations were estimated from spectrograms and considered with respect to developmental patterns in vowel formant structure, uniform scale factors for vowel normalization, and formant variability. Strong linear effects were observed in the group data for imitations of most of the English vowels studied, and straight lines passing through the origin provided a satisfactory fit to linear F1--F2 plots of the English vowel data. Logarithmic transformations of the formant frequencies helped substantially to equalize the dispersion of the group data for different vowels, but formant scale factors were observed to vary somewhat with both formant number and vowel identity. Variability of formant frequency was least for F1 (s.d. of 60 Hz or less for English vowels of adult males) and about equal for F2 and F3 (s.d. of 100 Hz or less for English vowels of adult males).  相似文献   

14.
Phase-locked discharge patterns of single cat auditory-nerve fibers were analyzed in response to complex tones centered at fiber characteristic frequency (CF). Signals were octave-bandwidth harmonic complexes defined by a center frequency F and an intercomponent spacing factor N, such that F/N was the fundamental frequency. Parameters that were manipulated included the phase spectrum, the number of components, and the intensity of the center component. Analyses employed Fourier transforms of period histograms to assess the degree to which responses were synchronized to the frequencies present in the acoustic stimulus. Several nonlinearities were observed in the response as intensity was varied between threshold and 80-90 dB SPL. Response nonlinearities were strong for all signals except those with random phase spectra. The most commonly observed nonlinearity was an emphasis of one or more stimulus components in the response. The degree of nonlinearity usually increased with intensity and signal complexity and decreased with fiber frequency selectivity. Half-wave rectification introduced synchronization to the missing fundamental. The strength of the response at the fundamental was related to stimulus crest factor. Signals with low center frequencies and high crest factors often elicited instantaneous discharge rates at the theoretical maximum of pi CF. This suggests that the probability of spike generation approaches one during high-amplitude waveform segments. Response nonlinearity was interpreted as arising from three sources, namely, cochlear mechanics, compression of instantaneous discharge rate, and saturation of average discharge rate. At near-threshold intensities, fibers with high spontaneous rates exhibited responses that were linear functions of stimulus waveshape, whereas fibers with low spontaneous spike rates produced responses that were best described in terms of an expansive nonlinearity.  相似文献   

15.
We have recorded the responses of fibers in the cochlear nerve and cells in the cochlear nucleus of the anesthetized guinea pig to synthetic vowels [i], [a], and [u] at 60 and 80 dB SPL. Histograms synchronized to the pitch period of the vowel were constructed, and locking of the discharge to individual harmonics was estimated from these by Fourier transformation. In cochlear nerve fibers from the guinea pig, the responses were similar in all respects to those previously described for the cat. In particular, the average-localized-synchronized-rate functions (ALSR), computed from pooled data, had well-defined peaks corresponding to the formant frequencies of the three vowels at both sound levels. Analysis of the components dominating the discharge could also be used to determine the voice pitch and the frequency of the first formants. We have computed similar population measures over a sample of primarylike cochlear nucleus neurons. In these primarylike cochlear nucleus cell responses, the locking to the higher-frequency formants of the vowels is weaker than in the nerve. This results in a severe degradation of the peaks in the ALSR function at the second and third formant frequencies at least for [i] and [u]. This result is somewhat surprising in light of the reports that primarylike cochlear nucleus cells phaselock, as well as do cochlear nerve fibers.  相似文献   

16.
Responses of large populations of auditory-nerve fibers to synthesized steady-state vowels were recorded in anesthetized cats. Driven discharge rate to vowels, normalized by dividing by saturation rate (estimated from the driven rate to CF tones 50 dB above threshold), was plotted versus fiber CF for a number of vowel levels. For the vowels /I/ and /e/, such rate profiles showed a peak in the region of the first formant and another in the region of the second and third formants, for sound levels below about 70 dB SPL. For /a/ at levels below about 40 dB SPL there are peaks in the region of the first and second formants. At higher levels these peaks disappear for all the vowels because of a combination of rate saturation and two-tone suppression. This must be qualified by saying that rate profiles plotted separately for units with spontaneous rates less than one spike per second may retain peaks at higher levels. Rate versus level functions for units with CFs above the first formant can saturate at rates less than the saturation rate to CF to-es or they can be nonmonotonic; these effects are most likely produced by the same mechanism as that involved in two-tone suppression.  相似文献   

17.
Recent studies have shown that time-varying changes in formant pattern contribute to the phonetic specification of vowels. This variation could be especially important in children's vowels, because children have higher fundamental frequencies (f0's) than adults, and formant-frequency estimation is generally less reliable when f0 is high. To investigate the contribution of time-varying changes in formant pattern to the identification of children's vowels, three experiments were carried out with natural and synthesized versions of 12 American English vowels spoken by children (ages 7, 5, and 3 years) as well as adult males and females. Experiment 1 showed that (i) vowels generated with a cascade formant synthesizer (with hand-tracked formants) were less accurately identified than natural versions; and (ii) vowels synthesized with steady-state formant frequencies were harder to identify than those which preserved the natural variation in formant pattern over time. The decline in intelligibility was similar across talker groups, and there was no evidence that formant movement plays a greater role in children's vowels compared to adults. Experiment 2 replicated these findings using a semi-automatic formant-tracking algorithm. Experiment 3 showed that the effects of formant movement were the same for vowels synthesized with noise excitation (as in whispered speech) and pulsed excitation (as in voiced speech), although, on average, the whispered vowels were less accurately identified than their voiced counterparts. Taken together, the results indicate that the cues provided by changes in the formant frequencies over time contribute materially to the intelligibility of vowels produced by children and adults, but these time-varying formant frequency cues do not interact with properties of the voicing source.  相似文献   

18.
The formant frequencies of Malaysian Malay children have not been well studied. This article investigates the first four formant frequencies of sustained vowels in 360 Malay children aged between 7 and 12 years using acoustical analysis. Generally, Malay female children had higher formant frequencies than those of their male counterparts. However, no significant differences in all four formant frequencies were observed between the Malay male and female children in most of the vowels and age groups. Significant differences in all formant frequencies were found across the Malay vowels in both Malay male and female children for all age groups except for F4 in female children aged 12 years. Generally, the Malaysian Malay children showed a nonsystematic decrement in formant frequencies with age. Low levels of significant differences in formant frequencies were observed across the age groups in most of the vowels for F1, F3, and F4 in Malay male children and F1 and F4 in Malay female children.  相似文献   

19.
Four multiple-channel cochlear implant patients were tested with synthesized versions of the words "hid, head, had, hud, hod, hood" containing 1, 2, or 3 formants, and with a natural 2-formant version of the same words. The formant frequencies were encoded in terms of the positions of electrical stimulation in the cochlea. Loudness, duration, and fundamental frequency were kept fixed within the synthetic stimulus sets. The average recognition scores were 47%, 61%, 62%, and 79% for the synthesized 1-, 2-, and 3-format vowels and the natural vowels, respectively. These scores showed that the place coding of the first and second formant frequencies accounted for a large part of the vowel recognition of cochlear implant patients using these coding schemes. The recognition of the natural stimuli was significantly higher than recognition of the synthetic stimuli, indicating that extra cues such as loudness, duration, and fundamental frequency contributed to recognition of the spoken words.  相似文献   

20.
The temporal fine structure of discharge patterns of single auditory-nerve fibers in adult cats was analyzed in response to signals consisting of a variable number of equal-intensity, in-phase harmonics of a common low-frequency fundamental. Two analytic methods were employed. The first method considered Fourier spectra of period histograms based on the period of the fundamental, and the second method considered Fourier spectra of interspike interval histograms (ISIH's). Both analyses provide information about fiber tuning properties, but Fourier spectra of ISIH's also allow estimates to be made of the degree of resolution of individual stimulus components. At low intensities (within 20-40 dB of threshold), indices of synchronization to individual components of complex tones were similar to those obtained for pure tones. This was true even when fibers were capable of responding to several signal components simultaneously. Response spectra obtained at low intensities resembled fibers' tuning curves, and fibers with low spontaneous discharge rates tended to provide better resolution of stimulus components than fibers with high spontaneous rates. Strongly nonlinear behavior existed at higher stimulus intensities. In this, information was transmitted about progressively fewer signal components and about frequencies not present in the acoustic stimulus, and the component eliciting the largest response shifted away from the fiber's characteristic frequency and toward the edges of the stimulus spectrum. This high-intensity "edge enhancement" can result from the combined effects of a compressive input-output nonlinearity, suppression, and the fortuitous addition of internally generated combination tones. The data indicate that sufficient information exists for the auditory system to determine the frequencies of narrowly spaced stimulus components from the temporal fine structure of nerve fiber's responses.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号