Similar articles
20 similar articles found.
1.
The responses of populations of auditory-nerve fibers to both a 1.0-kHz tone and a 1.0-kHz tone in broadband noise have been measured. Period histograms were generated from fiber spike trains and discrete Fourier transforms (DFTs) with a resolution of 125 Hz were computed from each histogram. Sample mean and sample variance statistics were generated for period histograms of response and for temporal response measures derived from discrete Fourier transforms. It is demonstrated how the statistical properties of auditory-nerve fiber response determine the strategy for the estimation and discrimination of particular stimulus components. When the tone is presented alone, the entire population of auditory-nerve fibers provides statistically reliable estimates of the 1.0-kHz tone. Upon addition of the broadband noise stimulus only those units with characteristic frequencies which are close in frequency to the 1.0-kHz stimulus provide spectral estimates which have high signal-to-noise ratios (mean-squared-to-variance ratios). Estimates of the 1.0-kHz-tone stimulus derived from auditory-nerve fibers with characteristic frequencies which are far from the 1.0-kHz stimulus are statistically unreliable. Based on the responses of the population of auditory-nerve fibers, the strategy for estimating the 1.0-kHz-tone stimulus is to derive estimates of the 1.0-kHz stimulus from the subpopulation of neurons with characteristic frequencies close to the 1.0-kHz stimulus. It is concluded that neurons which are tuned close to 1.0-kHz provide the central nervous system (CNS) with the most salient information about the 1.0-kHz stimulus in the presence of the broadband background.
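Illustrative sketch (not from the paper): the measures named in this abstract can be approximated as below. The function names, the 64-bin histogram, and the assumption that a 125-Hz DFT resolution corresponds to an 8-ms analysis period (so that 1.0 kHz falls in DFT bin 8) are all hypothetical choices made for the example, not details confirmed by the abstract.

```python
import cmath
import math


def period_histogram(spike_times, period, n_bins):
    """Count spikes by their position within a repeating analysis period."""
    counts = [0] * n_bins
    for t in spike_times:
        index = int((t % period) * n_bins / period)
        counts[min(index, n_bins - 1)] += 1  # guard against float rounding
    return counts


def dft_magnitude(hist, k):
    """Magnitude of the k-th DFT coefficient of a histogram."""
    n = len(hist)
    coeff = sum(c * cmath.exp(-2j * math.pi * k * i / n)
                for i, c in enumerate(hist))
    return abs(coeff)


def mean_squared_to_variance(values):
    """Squared sample mean divided by the unbiased sample variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean ** 2 / var
```

With spike times in microseconds and `period=8000`, `dft_magnitude(hist, 8)` gives the (unnormalized) response at 1.0 kHz for one fiber; collecting that value across repeated measurements and passing the list to `mean_squared_to_variance` yields a reliability figure of the kind the abstract describes.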

2.
Vowels are mainly classified by the positions of peaks in their frequency spectra, the formants. For normal-hearing subjects, change detection and direction discrimination were measured for linear glides in the center frequency (CF) of formantlike sounds. A CF rove was used to prevent subjects from using either the start or end points of the glides as cues. In addition, change detection and starting-phase (start-direction) discrimination were measured for similar stimuli with a sinusoidal 5-Hz formant-frequency modulation. The stimuli consisted of single formants generated using a number of different stimulus parameters including fundamental frequency, spectral slope, frequency region, and position of the formant relative to the harmonic spectrum. The change detection thresholds were in good agreement with the predictions of a model which analyzed and combined the effects of place-of-excitation and temporal cues. For most stimuli, thresholds were approximately equal for change detection and start-direction discrimination. Exceptions were found for stimuli that consisted of only one or two harmonics. In a separate experiment, it was shown that change detection and start-direction discrimination of linear and sinusoidal formant-frequency modulations were impaired by off-frequency frequency-modulated interferers. This frequency modulation detection interference was larger for formants with shallow than for those with steep spectral slopes.

3.
The perceptual significance of the cochlear amplifier was evaluated by predicting level-discrimination performance based on stochastic auditory-nerve (AN) activity. Performance was calculated for three models of processing: the optimal all-information processor (based on discharge times), the optimal rate-place processor (based on discharge counts), and a monaural coincidence-based processor that uses a non-optimal combination of rate and temporal information. An analytical AN model included compressive magnitude and level-dependent-phase responses associated with the cochlear amplifier, and high-, medium-, and low-spontaneous-rate (SR) fibers with characteristic frequencies (CFs) spanning the AN population. The relative contributions of nonlinear magnitude and nonlinear phase responses to level encoding were compared by using four versions of the model, which included and excluded the nonlinear gain and phase responses in all possible combinations. Nonlinear basilar-membrane (BM) phase responses are robustly encoded in near-CF AN fibers at low frequencies. Strongly compressive BM responses at high frequencies near CF interact with the high thresholds of low-SR AN fibers to produce large dynamic ranges. Coincidence performance based on a narrow range of AN CFs was robust across a wide dynamic range at both low and high frequencies, and matched human performance levels. Coincidence performance based on all CFs demonstrated the "near-miss" to Weber's law at low frequencies and the high-frequency "mid-level bump." Monaural coincidence detection is a physiologically realistic mechanism that is extremely general in that it can utilize AN information (average-rate, synchrony, and nonlinear-phase cues) from all SR groups.

4.
This paper is concerned with the representation of the spectra of synthesized steady-state vowels in the temporal aspects of the discharges of auditory-nerve fibers. The results are based on a study of the responses of large numbers of single auditory-nerve fibers in anesthetized cats. By presenting the same set of stimuli to all the fibers encountered in each cat, we can directly estimate the population response to those stimuli. Period histograms of the responses of each unit to the vowels were constructed. The temporal response of a fiber to each harmonic component of the stimulus is taken to be the amplitude of the corresponding component in the Fourier transform of the unit's period histogram. At low sound levels, the temporal response to each stimulus component is maximal among units with CFs near the frequency of the component (i.e., near its place). Responses to formant components are larger than responses to other stimulus components. As sound level is increased, the responses to the formants, particularly the first formant, increase near their places and spread to adjacent regions, particularly toward higher CFs. Responses to nonformant components, except for harmonics and intermodulation products of the formants (2F1, 2F2, F1 + F2, etc.), are suppressed; at the highest sound levels used (approximately 80 dB SPL), temporal responses occur almost exclusively at the first two or three formants and their harmonics and intermodulation products. We describe a simple calculation which combines rate, place, and temporal information to provide a good representation of the vowels' spectra, including a clear indication of at least the first two formant frequencies. This representation is stable with changes in sound level at least up to 80 dB SPL; its stability is in sharp contrast to the behavior of the representation of the vowels' spectra in terms of discharge rate which degenerates at stimulus levels within the conversational range.

5.
Responses of chinchilla auditory-nerve fibers to synthesized stop consonants differing in voice onset time (VOT) were obtained. The syllables, heard as /ga/-/ka/ or /da/-/ta/, were similar to those previously used by others in psychophysical experiments with human and with chinchilla subjects. Average discharge rates of neurons tuned to the frequency region near the first formant generally increased at the onset of voicing, for VOTs longer than 20 ms. These rate increases were closely related to spectral amplitude changes associated with the onset of voicing and with the activation of the first formant; as a result, they provided accurate information about VOT. Neurons tuned to frequency regions near the second and third formants did not encode VOT in their average discharge rates. Modulations in the average rates of these neurons reflected spectral variations that were independent of VOT. The results are compared to other measurements of the peripheral encoding of speech sounds and to psychophysical observations suggesting that syllables with large variations in VOT are heard as belonging to one of only two phonemic categories.

6.
Several processing schemes by which phonetically important information for vowels can be extracted from responses of auditory-nerve fibers are analyzed. The schemes are based on power spectra of period histograms obtained in response to a set of nine two-formant, steady-state, vowel-like stimuli presented at 60 and 75 dB SPL. One class of "local filtering" schemes, which was originally proposed by Young and Sachs [J. Acoust. Soc. Am. 66, 1381-1403 (1979)], consists of analyzing response patterns by filters centered at the characteristic frequencies (CF) of the fibers, so that a tonotopically arranged measure of synchronized response can be obtained. Various schemes in this class differ in the characteristics of the filter. For a wide range of filter bandwidths, formant frequencies correspond approximately to the CFs for which the response measure is maximal. If, in addition, the bandwidths of the analyzing filters are made compatible with psychophysical measures of frequency selectivity, low-frequency harmonics of the stimulus fundamental are resolved in the output profile, so that fundamental frequency can also be estimated. In a second class of processing schemes, a dominant response component is defined for each fiber from a 1/6 octave spectral representation of the response pattern, and the formant frequencies are estimated from the most frequent values of the dominant component in the ensemble of auditory-nerve fibers. The local filtering schemes and the dominant component schemes can be related to "place" and "periodicity" models of auditory processing, respectively.

7.
Classical conditioning of respiration was used to obtain psychometric functions for pulsed tone level discrimination in the goldfish (Carassius auratus). Conditioned respiratory suppression is a graded response that has some properties of a confidence rating measure. These properties were used to obtain receiver operating characteristics (ROC) and psychometric functions using a blocked method of constant stimuli. Empirical ROCs and neurometric functions were also obtained for single auditory-nerve fibers using spike count as the decision variable in order to evaluate a simple rate code for level discrimination. Psychometric and neurometric functions for level discrimination are similar in showing the same general form (summarized by Weibull functions) that is independent of signal duration. The lower slope of neurometric functions compared with behavioral functions for level discrimination is in accord with similar data on sound detection and vision in nonhuman mammals. Both neural and psychophysical level discrimination thresholds decline with increasing duration (20 to 320 ms), with similar slopes except at short signal durations (20 to 50 ms). At these durations, the animal's use of a channel-selection strategy and neural information following stimulus offset could reduce the difference between neural and psychophysical thresholds. The slopes of the neural and psychophysical duration functions are similar to those for human observers, but the majority of auditory-nerve fibers sampled have lower level discrimination thresholds than the behaving animal. Since human observers perform better than the majority of neurons in level discrimination, well-trained human listeners may be able to select channels with superior information, or to combine information across channels in ways that the goldfish and other animals do not. 
In general, one is encouraged to believe that neural mechanisms need not be more complex or sensitive than those considered here to account for pure-tone level discrimination in fishes, humans, and other vertebrates.
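Illustrative sketch (not from the paper): when spike count is the decision variable, the area under an empirical ROC can be estimated directly from two sets of counts, and plotting that area against the level increment gives one point of a neurometric function per increment. The function name and the tie-handling rule below are assumptions made for this example; the abstract does not state how the authors computed their ROCs.

```python
def roc_area(counts_standard, counts_increment):
    """Estimate the area under the ROC from two lists of spike counts.

    Equals the probability that a count drawn from the incremented-level
    trials exceeds one drawn from the standard-level trials, with ties
    counted as one half.
    """
    score = 0.0
    for inc in counts_increment:
        for std in counts_standard:
            if inc > std:
                score += 1.0
            elif inc == std:
                score += 0.5
    return score / (len(counts_standard) * len(counts_increment))
```

An area of 0.5 indicates no discriminability between the two levels, and an area of 1.0 indicates that every incremented-level trial produced more spikes than every standard-level trial.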

8.
The benefit of supplementing speechreading with information about the frequencies of the first and second formants from the voiced sections of the speech signal was studied by presenting short sentences to 18 normal-hearing listeners under the following three conditions: (a) speechreading combined with listening to the formant-frequency information, (b) speechreading only, and (c) formant-frequency information only. The formant frequencies were presented either as pure tones or as a complex speechlike signal, obtained by filtering a periodic pulse sequence of 250 Hz by a cascade of four second-order bandpass filters (with constant bandwidth); the center frequencies of two of these filters followed the frequencies of the first and second formants, whereas the frequencies of the others remained constant. The percentage of correctly identified syllables increased from 22.8 in the case of speechreading only to 82.0 in the case of speechreading while listening to the complex speechlike signal. Listening to the formant information only scored 33.2% correct. However, comparison with the best-scoring condition of our previous study [Breeuwer and Plomp, J. Acoust. Soc. Am. 76, 686-691 (1984)] indicates that information about the sound-pressure levels in two one-octave filter bands with center frequencies of 500 and 3160 Hz is a more effective supplement to speechreading than the formant-frequency information.

9.
Speech coding in the auditory nerve: V. Vowels in background noise
Responses of auditory-nerve fibers to steady-state, two-formant vowels in low-pass background noise (S/N = 10 dB) were obtained in anesthetized cats. For fibers over a wide range of characteristic frequencies (CFs), the peaks in discharge rate at the onset of the vowel stimuli were nearly eliminated in the presence of noise. In contrast, strong effects of noise on fine time patterns of discharge were limited to CF regions that are far from the formant frequencies. One effect is a reduction in the amplitude of the response component at the fundamental frequency in the high-CF regions and for CFs between F1 and F2 when the formants are widely separated. A reduction in the amplitude of the response components at the formant frequencies, with concomitant increase in components near CF or low-frequency components occurs in CF regions where the signal-to-noise ratio is particularly low. The processing schemes that were effective for estimating the formant frequencies and fundamental frequency of vowels in quiet generally remain adequate in moderate-level background noise. Overall, the discharge patterns contain many cues for distinctions among the vowel stimuli, so that the central processor should be able to identify the different vowels, consistent with psychophysical performance at moderate signal-to-noise ratios.

10.
The temporal representation of speechlike stimuli in the auditory-nerve output of a guinea pig cochlea model is described. The model consists of a bank of dual resonance nonlinear filters that simulate the vibratory response of the basilar membrane followed by a model of the inner hair cell/auditory nerve complex. The model is evaluated by comparing its output with published physiological auditory nerve data in response to single and double vowels. The evaluation includes analyses of individual fibers, as well as ensemble responses over a wide range of best frequencies. In all cases the model response closely follows the patterns in the physiological data, particularly the tendency for the temporal firing pattern of each fiber to represent the frequency of a nearby formant of the speech sound. In the model this behavior is largely a consequence of filter shapes; nonlinear filtering has only a small contribution at low frequencies. The guinea pig cochlear model produces a useful simulation of the measured physiological response to simple speech sounds and is therefore suitable for use in more advanced applications including attempts to generalize these principles to the response of the human auditory system, both normal and impaired.

11.
Discharge patterns of auditory-nerve fibers in anesthetized cats were obtained for two stimulus levels in response to synthetic stimuli with dynamic characteristics appropriate for selected consonants. A set of stimuli was constructed by preceding a signal that was identified as /da/ by another sound that was systematically manipulated so that the entire complex would sound like either /da/, /ada/, /na/, /sa/, /sa/, or others. Discharge rates of auditory-nerve fibers in response to the common /da/-like formant transitions depended on the preceding context. Average discharge rates during these transitions decreased most for fibers whose CFs were in frequency regions where the context had considerable energy. Some effect of the preceding context on fine time patterns of response to the transitions was also found, but the identity of the largest response components (which often corresponded to the formant frequencies) was in general unaffected. Thus the response patterns during the formant transitions contain cues about both the nature of the transitions and the preceding context. A second set of stimuli sounding like /s/ and /c/ was obtained by varying the duration of the rise in amplitude at the onset of a filtered noise burst. At both 45 and 60 dB SPL, there were fibers which showed a more prominent peak in discharge rate at stimulus onset for /c/ than for /s/, but the CF regions that reflected the clearest distinctions depended on stimulus level. The peaks in discharge rate that occur in response to rapid changes in amplitude or spectrum might be used by the central processor as pointers to portions of speech signals that are rich in phonetic information.

12.
Responses of auditory-nerve fibers in anesthetized cats to nine different spoken stop- and nasal-consonant/vowel syllables presented at 70 dB SPL in various levels of speech-shaped noise [signal-to-noise (S/N) ratios of 30, 20, 10, and 0 dB] are reported. The temporal aspects of speech encoding were analyzed using spectrograms. The responses of the "lower-spontaneous-rate" fibers (less than 20/s) were found to be more limited than those of the high-spontaneous-rate fibers. The lower-spontaneous-rate fibers did not encode noise-only portions of the stimulus at the lowest noise level (S/N = 30 dB) and only responded to the consonant if there was a formant or major spectral peak near its characteristic frequency. The fibers' responses at the higher noise levels were compared to those obtained at the lowest noise level using the covariance as a quantitative measure of signal degradation. The lower-spontaneous-rate fibers were found to preserve more of their initial temporal encoding than high-spontaneous-rate fibers of the same characteristic frequency. The auditory-nerve fibers' responses were also analyzed for rate-place encoding of the stimuli. The results are similar to those found for temporal encoding.

13.
Psychophysical results using double vowels imply that subjects are able to use the temporal aspects of neural discharge patterns. To investigate the possible temporal cues available, the responses of fibers in the cochlear nerve of the anesthetized guinea pig to synthetic vowels were recorded at a range of sound levels up to 95 dB SPL. The stimuli were the single vowels /i/ [fundamental frequency (f0) 125 Hz], /a/ (f0, 100 Hz), and /c/ (f0, 100 Hz) and the double vowels were /a(100),i(125)/ and /c(100),i(125)/. Histograms synchronized to the period of the double vowels were constructed, and locking of the discharge to individual harmonics was estimated from them by Fourier transformation. One possible cue for identifying the f0's of the constituents of a double vowel is modulation of the neural discharge with a period of 1/f0. Such modulation was found at frequencies between the formant peaks of the double vowel, with modulation at the periods of 100 and 125 Hz occurring at different places in the fiber array. Generation of a population response based on synchronized responses [average localized synchronized rate (ALSR): see Young and Sachs, J. Acoust. Soc. Am. 66, 1381-1403 (1979)] allowed estimation of the f0's by a variety of methods, and subsampling the population response at the harmonics of the f0 of the constituent vowel achieved a good reconstruction of its spectrum. Other analyses using interval histograms and autocorrelation, which overcome some problems associated with the ALSR approach, also allowed f0 identification and vowel segregation. The present study has demonstrated unequivocally that the timing of the impulses in auditory-nerve fibers provides copious possible cues for the identification of the fundamental frequencies and spectra associated with each of the constituents of double vowels.

14.
15.
Activity of isolated auditory-nerve fibers in tree frogs (Eleutherodactylus coqui) exposed to continuous 3-min tones of different intensities at their characteristic frequencies (CFs) was recorded. Period histograms show a retardation in the preferred phase of discharge during and after the cessation of the exposure. Postexposure phase shift is concomitant with an elevation in CF thresholds and related to the level of tone exposure above threshold. Vector strength does not show comparable trends of change; postexposure shifts are related to preexposure CF thresholds. Recovery of phase retardation is rapid; units exposed to successive 3-min tones of the same intensities with intervals of 10-14 min between exposures experienced similar changes in their patterns of temporal discharge. Micromechanical changes affecting stereocilia stiffness or structural alterations in the tectorial membrane of the amphibian papilla may underlie the transitory phase shifts observed in traumatized anuran auditory fibers.
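Illustrative sketch (not from the paper): vector strength and preferred phase are commonly computed by treating each spike as a unit vector at its stimulus phase and averaging. The function name and the sign convention for phase are assumptions made for this example; the abstract does not give the authors' exact procedure.

```python
import cmath
import math


def phase_locking(spike_times, freq):
    """Return (vector_strength, preferred_phase) for spikes relative to a tone.

    spike_times are in seconds and freq is in Hz. The preferred phase is in
    radians in (-pi, pi]; under this convention a larger value means the
    spikes occur later in the stimulus cycle.
    """
    vectors = [cmath.exp(2j * math.pi * freq * t) for t in spike_times]
    resultant = sum(vectors) / len(vectors)
    return abs(resultant), cmath.phase(resultant)
```

Comparing the preferred phase before and after a tone exposure gives a phase shift of the kind reported above, while the vector strength (1 for perfect locking, near 0 for none) can stay unchanged even when the phase moves.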

16.
Frequency difference limens for pure tones preceded by a forward masker or followed by a backward masker were obtained across a wide range of signal levels. Relkin and Doucet [Hear. Res. 55, 215-222 (1991)] have shown that at a masker-signal delay of 100 ms, the thresholds of high-SR (spontaneous rate) auditory-nerve fibers are recovered, while the low-SR fiber thresholds are not. Therefore, forward-masked frequency discrimination potentially offers a method to investigate the role of low-SR fibers in the coding of frequency. It has been shown that when an intense forward masker is presented 100 ms before a pure-tone signal, intensity difference limens are elevated for mid-level signals [Zeng et al., Hear. Res. 55, 223-230 (1991)]. However, Plack and Viemeister [J. Acoust. Soc. Am. 92, 3097-3101 (1992)] have shown that a similar elevation in the intensity difference limen is obtained under conditions of backward masking, where selective adaptation of the auditory neurons would not be expected to occur. A condition of backward-masked frequency discrimination was therefore included to investigate the role of interference resulting from adding additional stimuli to a discrimination task. For signals at 1000 and 6000 Hz, there was no effect of a forward masker upon frequency difference limens. For the backward-masked conditions, an elevation of the frequency difference limen was observed at all signal levels, demonstrating that the effects of forward and backward maskers upon frequency discrimination are dissimilar and suggesting that cognitive effects are present in backward-masked discrimination tasks. (ABSTRACT TRUNCATED AT 250 WORDS)

17.
This study focuses on correlating speech confusion patterns, defined as consonant-vowel confusion as a function of the speech-to-noise ratio, and a model acoustic feature (AF) representation called the AI gram, defined as the articulation index density in the spectrotemporal domain. By collecting many responses from many talkers and listeners, the AF and psychophysical feature (event) are shown to be correlated via the AI-gram model and the confusion matrices at the utterance level, thereby explaining the listener confusion. Consonant /t/ is used as an example to identify its primary robust-to-noise feature, and a precise correlation of the acoustic information with the listeners' confusions is used to label the event. The main spectrotemporal cue defining the /t/ event is an across-frequency temporal coincidence, wherein frequency spread and robustness vary across utterances, while the event remains invariant. The cross-frequency timing event is shown to be the key perceptual feature for consonants in a vowel following context. Coincidences are found to form the basic element of the auditory object. Neural circuits used for coincidence in binaural processing for localization across ears are proposed to be used within one ear across channels. It is further concluded that the event is based on the audibility of the /t/ burst rather than on any superthreshold property.

18.
Auditory-nerve fiber spike trains were recorded in response to spoken English stop consonant-vowel syllables, both voiced (/b,d,g/) and unvoiced (/p,t,k/), in the initial position of syllables with the vowels /i,a,u/. Temporal properties of the neural responses and stimulus spectra are displayed in a spectrographic format. The responses were categorized in terms of the fibers' characteristic frequencies (CF) and spontaneous rates (SR). High-CF, high-SR fibers generally synchronize to formants throughout the syllables. High-CF, low/medium-SR fibers may also synchronize to formants; however, during the voicing, there may be sufficient low-frequency energy present to suppress a fiber's synchronized response to a formant near its CF. Low-CF fibers, from both SR groups, synchronize to energy associated with voicing. Several proposed acoustic correlates to perceptual features of stop consonant-vowel syllables, including the initial spectrum, formant transitions, and voice-onset time, are represented in the temporal properties of auditory-nerve fiber responses. Nonlinear suppression affects the temporal features of the responses, particularly those of low/medium-spontaneous-rate fibers.

19.
Responses of single auditory-nerve fibers in anesthetized cat to spoken nasal consonant-vowel syllables were recorded. Analyses in the form of spectrograms and of three-dimensional spatial-time and spatial-frequency plots were made. Among other features, formant transitions are clearly represented in the fibers' response synchronization properties. During vocalic segments, especially those in /mu/ and /ma/, at a stimulus level near 75 dB SPL, a strong dominance in the responses by frequencies near the second formant (F2) is found for most fibers whose characteristic frequencies (CFs) are at or above F2. In contrast, at more moderate levels, the same fibers may show response synchrony to frequencies closer to their own CFs. There are significant differences in the response properties of high and low/medium-spontaneous-rate fibers.

20.
Changes in magnitude and variability of duration, fundamental frequency, formant frequencies, and spectral envelope of children's speech are investigated as a function of age and gender using data obtained from 436 children, ages 5 to 17 years, and 56 adults. The results confirm that the reduction in magnitude and within-subject variability of both temporal and spectral acoustic parameters with age is a major trend associated with speech development in normal children. Between ages 9 and 12, both magnitude and variability of segmental durations decrease significantly and rapidly, converging to adult levels around age 12. Within-subject fundamental frequency and formant-frequency variability, however, may reach adult range about 2 or 3 years later. Differentiation of male and female fundamental frequency and formant frequency patterns begins at around age 11, becoming fully established around age 15. During that time period, changes in vowel formant frequencies of male speakers are approximately linear with age, while such a linear trend is less obvious for female speakers. These results support the hypothesis of uniform axial growth of the vocal tract for male speakers. The study also shows evidence for an apparent overshoot in acoustic parameter values, somewhere between ages 13 and 15, before converging to the canonical levels for adults. For instance, teenagers around age 14 differ from adults in that, on average, they show shorter segmental durations and exhibit less within-subject variability in durations, fundamental frequency, and spectral envelope measures.
