首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
2.
Speech coding in the auditory nerve: III. Voiceless fricative consonants   总被引:1,自引:0,他引:1  
Responses of auditory-nerve fibers in anesthetized cats were recorded for synthetic voiceless fricative consonants. The four stimuli (/x/, /s/, /s/, and /f/) were presented at two levels corresponding to speech in which the levels of the vowels would be approximately 60 and 75 dB SPL, respectively. Discharge patterns were characterized in terms of PST histograms and their power spectra. For both stimulus levels, frequency regions in which the stimuli had considerable energy corresponded well with characteristic-frequency (CF) regions in which average discharge rates were the highest. At the higher level, the profiles of discharge rate against CF were more distinctive for the stimulus onset than for the central portion. Power spectra of PST histograms had large response components near fiber characteristic frequencies for CFs up to 3-4 kHz, as well as low-frequency components for all fibers. The relative amplitudes of these components varied for the different stimuli. In general, the formant frequencies of the fricatives did not correspond with the largest response components, except for formants below about 3 kHz. Processing schemes based on fine time patterns of discharge that were effective for vowel stimuli generally failed to extract the formant frequencies of fricatives.  相似文献   

3.
Discharge patterns of auditory-nerve fibers in anesthetized cats were obtained for two stimulus levels in response to synthetic stimuli with dynamic characteristics appropriate for selected consonants. A set of stimuli was constructed by preceding a signal that was identified as /da/by another sound that was systematically manipulated so that the entire complex would sound like either /da/, /ada/, /na/, /sa/, /sa/, or others. Discharge rates of auditory-nerve fibers in response to the common /da/-like formant transitions depended on the preceding context. Average discharge rates during these transitions decreased most for fibers whose CFs were in frequency regions where the context had considerable energy. Some effect of the preceding context on fine time patterns of response to the transitions was also found, but the identity of the largest response components (which often corresponded to the formant frequencies) was in general unaffected. Thus the response patterns during the formant transitions contain cues about both the nature of the transitions and the preceding context. A second set of stimuli sounding like /s/ and /c/ was obtained by varying the duration of the rise in amplitude at the onset of a filtered noise burst. At both 45 and 60 dB SPL, there were fibers which showed a more prominent peak in discharge rate at stimulus onset for /c/ than for /s/, but the CF regions that reflected the clearest distinctions depended on stimulus level. The peaks in discharge rate that occur in response to rapid changes in amplitude or spectrum might be used by the central processor as pointers to portions of speech signals that are rich in phonetic information.  相似文献   

4.
This paper is concerned with the representation of the spectra of synthesized steady-state vowels in the temporal aspects of the discharges of auditory-nerve fibers. The results are based on a study of the responses of large numbers of single auditory-nerve fibers in anesthetized cats. By presenting the same set of stimuli to all the fibers encountered in each cat, we can directly estimate the population response to those stimuli. Period histograms of the responses of each unit to the vowels were constructed. The temporal response of a fiber to each harmonic component of the stimulus is taken to be the amplitude of the corresponding component in the Fourier transform of the unit's period histogram. At low sound levels, the temporal response to each stimulus component is maximal among units with CFs near the frequency of the component (i.e., near its place). Responses to formant components are larger than responses to other stimulus components. As sound level is increased, the responses to the formants, particularly the first formant, increase near their places and spread to adjacent regions, particularly toward higher CFs. Responses to nonformant components, exept for harmonics and intermodulation products of the formants (2F1,2F2,F1 + F2, etc), are suppressed; at the highest sound levels used (approximately 80 dB SPL), temporal responses occur almost exclusively at the first two or three formants and their harmonics and intermodulation products. We describe a simple calculation which combines rate, place, and temporal information to provide a good representation of the vowels' spectra, including a clear indication of at least the first two formant frequencies. This representation is stable with changes in sound level at least up to 80 dB SPL; its stability is in sharp contrast to the behavior of the representation of the vowels' spectra in terms of discharge rate which degenerates at stimulus levels within the conversational range.  相似文献   

5.
6.
Psychophysical results using double vowels imply that subjects are able to use the temporal aspects of neural discharge patterns. To investigate the possible temporal cues available, the responses of fibers in the cochlear nerve of the anesthetized guinea pig to synthetic vowels were recorded at a range of sound levels up to 95 dB SPL. The stimuli were the single vowels /i/ [fundamental frequency (f0) 125 Hz], /a/ (f0, 100 Hz), and /c/ (f0, 100 Hz) and the double vowels were /a(100),i(125)/ and /c(100),i(125)/. Histograms synchronized to the period of the double vowels were constructed, and locking of the discharge to individual harmonics was estimated from them by Fourier transformation. One possible cue for identifying the f0's of the constituents of a double vowel is modulation of the neural discharge with a period of 1/f0. Such modulation was found at frequencies between the formant peaks of the double vowel, with modulation at the periods of 100 and 125 Hz occurring at different places in the fiber array. Generation of a population response based on synchronized responses [average localized synchronized rate (ALSR): see Young and Sachs [J. Acoust. Soc. Am. 66, 1381-1403 (1979)] allowed estimation of the f0's by a variety of methods and subsampling the population response at the harmonics of the f0 of the constituent vowel achieved a good reconstruction of its spectrum. Other analyses using interval histograms and autocorrelation, which overcome some problems associated with the ALSR approach, also allowed f0 identification and vowel segregation. The present study has demonstrated unequivocally that the timing of the impulses in auditory-nerve fibers provides copious possible cues for the identification of the fundamental frequencies and spectra associated with each of the constituents of double vowels.  相似文献   

7.
Responses of large populations of auditory-nerve fibers to synthesized steady-state vowels were recorded in anesthetized cats. Driven discharge rate to vowels, normalized by dividing by saturation rate (estimated from the driven rate to CF tones 50 dB above threshold), was plotted versus fiber CF for a number of vowel levels. For the vowels /I/ and /e/, such rate profiles showed a peak in the region of the first formant and another in the region of the second and third formants, for sound levels below about 70 dB SPL. For /a/ at levels below about 40 dB SPL there are peaks in the region of the first and second formants. At higher levels these peaks disappear for all the vowels because of a combination of rate saturation and two-tone suppression. This must be qualified by saying that rate profiles plotted separately for units with spontaneous rates less than one spike per second may retain peaks at higher levels. Rate versus level functions for units with CFs above the first formant can saturate at rates less than the saturation rate to CF to-es or they can be nonmonotonic; these effects are most likely produced by the same mechanism as that involved in two-tone suppression.  相似文献   

8.
Imitations of ten synthesized vowels were recorded from 33 speakers including men, women, and children. The first three formant frequencies of the imitations were estimated from spectrograms and considered with respect to developmental patterns in vowel formant structure, uniform scale factors for vowel normalization, and formant variability. Strong linear effects were observed in the group data for imitations of most of the English vowels studied, and straight lines passing through the origin provided a satisfactory fit to linear F1--F2 plots of the English vowel data. Logarithmic transformations of the formant frequencies helped substantially to equalize the dispersion of the group data for different vowels, but formant scale factors were observed to vary somewhat with both formant number and vowel identity. Variability of formant frequency was least for F1 (s.d. of 60 Hz or less for English vowels of adult males) and about equal for F2 and F3 (s.d. of 100 Hz or less for English vowels of adult males).  相似文献   

9.
Several processing schemes by which phonetically important information for vowels can be extracted from responses of auditory-nerve fibers are analyzed. The schemes are based on power spectra of period histograms obtained in response to a set of nine two-formant, steady-state, vowel-like stimuli presented at 60 and 75 dB SPL. One class of "local filtering" schemes, which was originally proposed by Young and Sachs [J. Acoust. Soc. Am. 66, 1381-1403 (1979)], consists of analyzing response patterns by filters centered at the characteristic frequencies (CF) of the fibers, so that a tonotopically arranged measure of synchronized response can be obtained. Various schemes in this class differ in the characteristics of the filter. For a wide range of filter bandwidths, formant frequencies correspond approximately to the CFs for which the response measure is maximal. If in addition, the bandwidths of the analyzing filters are made compatible with psychophysical measures of frequency selectivity, low-frequency harmonics of the stimulus fundamental are resolved in the output profile, so that fundamental frequency can also be estimated. In a second class of processing schemes, a dominant response component is defined for each fiber from a 1/6 octave spectral representation of the response pattern, and the formant frequencies are estimated from the most frequent values of the dominant component in the ensemble of auditory-nerve fibers. The local filtering schemes and the dominant component schemes can be related to "place" and "periodicity" models of auditory processing, respectively.  相似文献   

10.
The formant hypothesis of vowel perception, where the lowest two or three formant frequencies are essential cues for vowel quality perception, is widely accepted. There has, however, been some controversy suggesting that formant frequencies are not sufficient and that the whole spectral shape is necessary for perception. Three psychophysical experiments were performed to study this question. In the first experiment, the first or second formant peak of stimuli was suppressed as much as possible while still maintaining the original spectral shape. The responses to these stimuli were not radically different from the ones for the unsuppressed control. In the second experiment, F2-suppressed stimuli, whose amplitude ratios of high- to low-frequency components were systemically changed, were used. The results indicate that the ratio changes can affect perceived vowel quality, especially its place of articulation. In the third experiment, the full-formant stimuli, whose amplitude ratios were changed from the original and whose F2's were kept constant, were used. The results suggest that the amplitude ratio is equal to or more effective than F2 as a cue for place of articulation. We conclude that formant frequencies are not exclusive cues and that the whole spectral shape can be crucial for vowel perception.  相似文献   

11.
Questions exist as to the intelligibility of vowels sung at extremely high fundamental frequencies and, especially, when the fundamental frequency (F0) produced is above the region where the first vowel formant (F1) would normally occur. Can such vowels be correctly identified and, if so, does context provide the necessary information or are acoustical elements also operative? To this end, 18 professional singers (5 males and 13 females) were recorded when singing 3 isolated vowels at high and low pitches at both loud and soft levels. Aural-perceptual studies employing four types of auditors were carried out to determine the identity of these vowels, and the nature of the confusions with other vowels. Subsequent acoustical analysis focused on the actual fundamental frequencies sung plus those defining the first 2 vowel formants. It was found that F0 change had a profound effect on vowel perception; one of the more important observations was that the target tended to shift toward vowels with an F1 just above the sung frequency.  相似文献   

12.
We have recorded the responses of fibers in the cochlear nerve and cells in the cochlear nucleus of the anesthetized guinea pig to synthetic vowels [i], [a], and [u] at 60 and 80 dB SPL. Histograms synchronized to the pitch period of the vowel were constructed, and locking of the discharge to individual harmonics was estimated from these by Fourier transformation. In cochlear nerve fibers from the guinea pig, the responses were similar in all respects to those previously described for the cat. In particular, the average-localized-synchronized-rate functions (ALSR), computed from pooled data, had well-defined peaks corresponding to the formant frequencies of the three vowels at both sound levels. Analysis of the components dominating the discharge could also be used to determine the voice pitch and the frequency of the first formants. We have computed similar population measures over a sample of primarylike cochlear nucleus neurons. In these primarylike cochlear nucleus cell responses, the locking to the higher-frequency formants of the vowels is weaker than in the nerve. This results in a severe degradation of the peaks in the ALSR function at the second and third formant frequencies at least for [i] and [u]. This result is somewhat surprising in light of the reports that primarylike cochlear nucleus cells phaselock, as well as do cochlear nerve fibers.  相似文献   

13.
Formant frequencies in an old Estonian folk song performed by two female voices were estimated for two back vowels /a/ and /u/, and for two front vowels /e/ and /i/. Comparison of these estimates with formant frequencies in spoken Estonian vowels indicates a trend of the vowels to be clustered into two sets of front and back ones in the F1/F2 plane. Similar clustering has previously been shown to occur in opera and choir singing, especially with increasing fundamental frequency. The clustering in the present song, however, may also be due to a tendency for a mid vowel to be realized as a higher-beginning diphthong, which is characteristic of the North-Estonian coastal dialect area where the singers come from. No evidence of a "singer's formant" was found.  相似文献   

14.
Formant discrimination for isolated vowels presented in noise was investigated for normal-hearing listeners. Discrimination thresholds for F1 and F2, for the seven American English vowels /i, I, epsilon, ae, [symbol see text], a, u/, were measured under two types of noise, long-term speech-shaped noise (LTSS) and multitalker babble, and also under quiet listening conditions. Signal-to-noise ratios (SNR) varied from -4 to +4 dB in steps of 2 dB. All three factors, formant frequency, signal-to-noise ratio, and noise type, had significant effects on vowel formant discrimination. Significant interactions among the three factors showed that threshold-frequency functions depended on SNR and noise type. The thresholds at the lowest levels of SNR were highly elevated by a factor of about 3 compared to those in quiet. The masking functions (threshold vs SNR) were well described by a negative exponential over F1 and F2 for both LTSS and babble noise. Speech-shaped noise was a slightly more effective masker than multitalker babble, presumably reflecting small benefits (1.5 dB) due to the temporal variation of the babble.  相似文献   

15.
An analysis is presented of regional variation patterns in the vowel system of Standard Dutch as spoken in the Netherlands (Northern Standard Dutch) and Flanders (Southern Standard Dutch). The speech material consisted of read monosyllabic utterances in a neutral consonantal context (i.e., /sVs/). The analyses were based on measurements of the duration and the frequencies of the first two formants of the vowel tokens. Recordings were made for 80 Dutch and 80 Flemish speakers, who were stratified for the social factors gender and region. These 160 speakers were distributed across four regions in the Netherlands and four regions in Flanders. Differences between regional varieties were found for duration, steady-state formant frequencies, and spectral change of formant frequencies. Variation patterns in the spectral characteristics of the long mid vowels /e o ?/ and the diphthongal vowels /ei oey bacwards c u/ were in accordance with a recent theory of pronunciation change in Standard Dutch. Finally, it was found that regional information was present in the steady-state formant frequency measurements of vowels produced by professional language users.  相似文献   

16.

Background  

The cortical activity underlying the perception of vowel identity has typically been addressed by manipulating the first and second formant frequency (F1 & F2) of the speech stimuli. These two values, originating from articulation, are already sufficient for the phonetic characterization of vowel category. In the present study, we investigated how the spectral cues caused by articulation are reflected in cortical speech processing when combined with phonation, the other major part of speech production manifested as the fundamental frequency (F0) and its harmonic integer multiples. To study the combined effects of articulation and phonation we presented vowels with either high (/a/) or low (/u/) formant frequencies which were driven by three different types of excitation: a natural periodic pulseform reflecting the vibration of the vocal folds, an aperiodic noise excitation, or a tonal waveform. The auditory N1m response was recorded with whole-head magnetoencephalography (MEG) from ten human subjects in order to resolve whether brain events reflecting articulation and phonation are specific to the left or right hemisphere of the human brain.  相似文献   

17.
The purpose of this study was to examine the role of formant frequency movements in vowel recognition. Measurements of vowel duration, fundamental frequency, and formant contours were taken from a database of acoustic measurements of 1668 /hVd/ utterances spoken by 45 men, 48 women, and 46 children [Hillenbrand et al., J. Acoust. Soc. Am. 97, 3099-3111 (1995)]. A 300-utterance subset was selected from this database, representing equal numbers of 12 vowels and approximately equal numbers of tokens produced by men, women, and children. Listeners were asked to identify the original, naturally produced signals and two formant-synthesized versions. One set of "original formant" (OF) synthetic signals was generated using the measured formant contours, and a second set of "flat formant" (FF) signals was synthesized with formant frequencies fixed at the values measured at the steadiest portion of the vowel. Results included: (a) the OF synthetic signals were identified with substantially greater accuracy than the FF signals; and (b) the naturally produced signals were identified with greater accuracy than the OF synthetic signals. Pattern recognition results showed that a simple approach to vowel specification based on duration, steady-state F0, and formant frequency measurements at 20% and 80% of vowel duration accounts for much but by no means all of the variation in listeners' labeling of the three types of stimuli.  相似文献   

18.
Previous studies suggest that speakers are systematically inaccurate, or biased, when imitating self-produced vowels. The direction of these biases in formant space and their variation may offer clues about the organization of the vowel perceptual space. To examine these patterns, three male speakers were asked to imitate 45 self-produced vowels that were systematically distributed in F1/F2 space. All three speakers showed imitation bias, and the bias magnitudes were significantly larger than those predicted by a model of articulatory noise. Each speaker showed a different pattern of bias directions, but the pattern was unrelated to the locations of prototypical vowels produced by that speaker. However, there were substantial quantitative regularities: (1) The distribution of imitation variability and bias magnitudes were similar for all speakers, (2) the imitation variability was independent of the bias magnitudes, and (3) the imitation variability (a production measure) was commensurate with the formant discrimination limen (a perceptual measure). These results indicate that there is additive Gaussian noise in the imitation process that independently affects each formant and that there are speaker-dependent and potentially nonlinguistic biases in vowel perception and production.  相似文献   

19.
This study examined whether individuals with a wide range of first-language vowel systems (Spanish, French, German, and Norwegian) differ fundamentally in the cues that they use when they learn the English vowel system (e.g., formant movement and duration). All subjects: (1) identified natural English vowels in quiet; (2) identified English vowels in noise that had been signal processed to flatten formant movement or equate duration; (3) perceptually mapped best exemplars for first- and second-language synthetic vowels in a five-dimensional vowel space that included formant movement and duration; and (4) rated how natural English vowels assimilated into their L1 vowel categories. The results demonstrated that individuals with larger and more complex first-language vowel systems (German and Norwegian) were more accurate at recognizing English vowels than were individuals with smaller first-language systems (Spanish and French). However, there were no fundamental differences in what these individuals learned. That is, all groups used formant movement and duration to recognize English vowels, and learned new aspects of the English vowel system rather than simply assimilating vowels into existing first-language categories. The results suggest that there is a surprising degree of uniformity in the ways that individuals with different language backgrounds perceive second language vowels.  相似文献   

20.
Formant dynamics in vowel nuclei contribute to vowel classification in English. This study examined listeners' ability to discriminate dynamic second formant transitions in synthetic high front vowels. Acoustic measurements were made from the nuclei (steady state and 20% and 80% of vowel duration) for the vowels /i, I, e, epsilon, ae/ spoken by a female in /bVd/ context. Three synthesis parameters were selected to yield twelve discrimination conditions: initial frequency value for F2 (2525, 2272, or 2068 Hz), slope direction (rising or falling), and duration (110 or 165 ms). F1 frequency was roved. In the standard stimuli, F0 and F1-F4 were steady state. In the comparison stimuli only F2 frequency varied linearly to reach a final frequency. Five listeners were tested under adaptive tracking to estimate the threshold for frequency extent, the minimal detectable difference in frequency between the initial and final F2 values, called deltaF extent. Analysis showed that initial F2 frequency and direction of movement for some F2 frequencies contributed to significant differences in deltaF extent. Results suggested that listeners attended to differences in the stimulus property of frequency extent (hertz), not formant slope (hertz/second). Formant extent thresholds were at least four times smaller than extents measured in the natural speech tokens, and 18 times smaller than for the diphthongized vowel /e/.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号