Similar documents (20 results)
1.
Speech coding in the auditory nerve: III. Voiceless fricative consonants
Responses of auditory-nerve fibers in anesthetized cats were recorded for synthetic voiceless fricative consonants. The four stimuli (/x/, /ʃ/, /s/, and /f/) were presented at two levels corresponding to speech in which the levels of the vowels would be approximately 60 and 75 dB SPL, respectively. Discharge patterns were characterized in terms of PST histograms and their power spectra. For both stimulus levels, frequency regions in which the stimuli had considerable energy corresponded well with characteristic-frequency (CF) regions in which average discharge rates were the highest. At the higher level, the profiles of discharge rate against CF were more distinctive for the stimulus onset than for the central portion. Power spectra of PST histograms had large response components near fiber characteristic frequencies for CFs up to 3-4 kHz, as well as low-frequency components for all fibers. The relative amplitudes of these components varied for the different stimuli. In general, the formant frequencies of the fricatives did not correspond with the largest response components, except for formants below about 3 kHz. Processing schemes based on fine time patterns of discharge that were effective for vowel stimuli generally failed to extract the formant frequencies of fricatives.
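As a rough illustration of the analysis named in this abstract (a hypothetical sketch, not the authors' code), the Python fragment below bins spike times into a PST histogram and takes its power spectrum with NumPy; the 100-µs bin width and the function name are assumptions.

```python
import numpy as np

def pst_power_spectrum(spike_times, stim_dur, bin_width=100e-6):
    """Build a PST histogram from spike times (s) and return its power spectrum.

    spike_times : spike times relative to stimulus onset, in seconds
    stim_dur    : analysis window (stimulus duration), in seconds
    bin_width   : PST histogram bin width, in seconds (100 us is an assumption)
    """
    n_bins = int(round(stim_dur / bin_width))
    edges = np.arange(n_bins + 1) * bin_width
    hist, _ = np.histogram(spike_times, bins=edges)

    # Power spectrum of the histogram; frequency axis in Hz.
    spectrum = np.abs(np.fft.rfft(hist)) ** 2
    freqs = np.fft.rfftfreq(n_bins, d=bin_width)
    return freqs, spectrum
```

Peaks in the returned spectrum near a fiber's CF, or at low frequencies, would correspond to the response components the abstract describes.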

2.
3.
Nonlinear phenomena as observed in the ear canal and at the auditory nerve
We report here several measures of nonlinear effects in the mammalian ear made in the external auditory meatus and in single neurons of the auditory nerve. We have measured the 2f1-f2 and f2-f1 distortion products and found that: (1) the neural distortion-product threshold curve for 2f1-f2 mirrors the low-frequency side of the frequency threshold curve; (2) when the neural 2f1-f2 threshold curve is plotted versus log(f2/f1), its slope is about 50 dB/oct and its intercept is 10-20 dB above the frequency threshold at the characteristic frequency (CF); (3) substantial 2f1-f2 distortion was seen in all animals studied, while the f2-f1 distortion product was only rarely found at substantial levels; and (4) the distortion-product pressure observed in the ear canal was at a level equal to that detected at threshold by the neural units under study. We have also made measurements of two-tone rate suppression thresholds using two new and consistent threshold paradigms. We find that for units of high and intermediate characteristic frequency the suppression threshold is independent of frequency and lies at about 70 dB SPL, that suppression above CF is much weaker than below CF, and that the tip of the frequency tuning curve can be suppressed by up to 40 dB by a low-frequency suppressor.
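The distortion-product frequencies quoted here follow directly from the two primary tones. The sketch below, a minimal illustration rather than the authors' measurement procedure, computes 2f1-f2 and f2-f1 and the slope of a threshold curve plotted against log2(f2/f1); the function names and example values are hypothetical.

```python
import math

def distortion_products(f1, f2):
    """Return the cubic (2f1-f2) and quadratic (f2-f1) distortion-product frequencies in Hz."""
    return 2 * f1 - f2, f2 - f1

def slope_db_per_octave(thresh_db_a, ratio_a, thresh_db_b, ratio_b):
    """Slope of a threshold curve plotted against log2(f2/f1), in dB per octave."""
    return (thresh_db_b - thresh_db_a) / (math.log2(ratio_b) - math.log2(ratio_a))

# Example: primaries at 8 and 9.6 kHz give a 2f1-f2 component at 6.4 kHz.
print(distortion_products(8000.0, 9600.0))   # (6400.0, 1600.0)
```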

4.
We have recorded the responses of fibers in the cochlear nerve and cells in the cochlear nucleus of the anesthetized guinea pig to the synthetic vowels [i], [a], and [u] at 60 and 80 dB SPL. Histograms synchronized to the pitch period of the vowel were constructed, and locking of the discharge to individual harmonics was estimated from these by Fourier transformation. In cochlear nerve fibers of the guinea pig, the responses were similar in all respects to those previously described for the cat. In particular, the average-localized-synchronized-rate (ALSR) functions, computed from pooled data, had well-defined peaks corresponding to the formant frequencies of the three vowels at both sound levels. Analysis of the components dominating the discharge could also be used to determine the voice pitch and the frequency of the first formant. We have computed similar population measures over a sample of primarylike cochlear nucleus neurons. In these primarylike responses, locking to the higher-frequency formants of the vowels is weaker than in the nerve. This results in a severe degradation of the peaks in the ALSR function at the second and third formant frequencies, at least for [i] and [u]. This result is somewhat surprising in light of reports that primarylike cochlear nucleus cells phase lock as well as cochlear nerve fibers do.
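A common way to compute an ALSR of the kind referred to above is to take, for each harmonic of the voice pitch, the Fourier component of each fiber's period histogram and average it over fibers whose CF lies near that harmonic. The sketch below assumes NumPy arrays, a 0.5-octave CF window, and histograms expressed in spikes/s; these are illustrative choices, not the parameters used in the study.

```python
import numpy as np

def alsr(period_histograms, cfs, f0, max_harmonic=30, cf_window_oct=0.5):
    """Average localized synchronized rate (spikes/s) at each harmonic of f0.

    period_histograms : (n_fibers, n_bins) histograms locked to one pitch period,
                        in spikes/s per bin
    cfs               : characteristic frequency of each fiber, in Hz (NumPy array)
    f0                : fundamental (pitch) frequency, in Hz
    """
    n_fibers, n_bins = period_histograms.shape
    # Magnitude of each harmonic component of each fiber's period histogram.
    spectra = np.abs(np.fft.rfft(period_histograms, axis=1)) / n_bins
    alsr_values = {}
    for k in range(1, max_harmonic + 1):
        fk = k * f0
        # Fibers whose CF falls within +/- cf_window_oct/2 octaves of the harmonic.
        in_window = np.abs(np.log2(cfs / fk)) <= cf_window_oct / 2
        if np.any(in_window) and k < spectra.shape[1]:
            alsr_values[fk] = spectra[in_window, k].mean()
    return alsr_values
```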

5.
This paper is concerned with the representation of the spectra of synthesized steady-state vowels in the temporal aspects of the discharges of auditory-nerve fibers. The results are based on a study of the responses of large numbers of single auditory-nerve fibers in anesthetized cats. By presenting the same set of stimuli to all the fibers encountered in each cat, we can directly estimate the population response to those stimuli. Period histograms of the responses of each unit to the vowels were constructed. The temporal response of a fiber to each harmonic component of the stimulus is taken to be the amplitude of the corresponding component in the Fourier transform of the unit's period histogram. At low sound levels, the temporal response to each stimulus component is maximal among units with CFs near the frequency of the component (i.e., near its place). Responses to formant components are larger than responses to other stimulus components. As sound level is increased, the responses to the formants, particularly the first formant, increase near their places and spread to adjacent regions, particularly toward higher CFs. Responses to nonformant components, except for harmonics and intermodulation products of the formants (2F1, 2F2, F1 + F2, etc.), are suppressed; at the highest sound levels used (approximately 80 dB SPL), temporal responses occur almost exclusively at the first two or three formants and their harmonics and intermodulation products. We describe a simple calculation which combines rate, place, and temporal information to provide a good representation of the vowels' spectra, including a clear indication of at least the first two formant frequencies. This representation is stable with changes in sound level at least up to 80 dB SPL; its stability is in sharp contrast to the behavior of the representation of the vowels' spectra in terms of discharge rate, which degenerates at stimulus levels within the conversational range.

6.
Two synthetic vowels, /i/ and /æ/, with a fundamental frequency of 100 Hz served as maskers for brief (5 or 15 ms) sinusoidal signals. Threshold was measured as a function of signal frequency, for signals presented immediately following the masker (forward masking, FM) or just before the cessation of the masker (simultaneous masking, SM). Three different overall masker levels were used: 50, 70, and 90 dB SPL. In order to compare the data from simultaneous and forward masking, and to compensate for the nonlinear characteristics of forward masking, each signal threshold was expressed as the level of a flat-spectrum noise which would give the same masking. The internal representation of the formant structure of the vowels, as inferred from the transformed masking patterns, was enhanced in FM and "blurred" in SM in comparison to the physical spectra, suggesting that suppression plays a role in enhancing spectral contrasts. The first two or three formants were usually visible in the masking patterns, and the representation of the formant structure was impaired only slightly at high masker levels. For high levels, filtering out the relatively intense low-frequency components enhanced the representation of the higher formants in FM but not in SM, indicating a broadly tuned remote suppression from lower formants towards higher ones. The relative phase of the components in the masker had no effect on thresholds in forward masking, indicating that the detailed temporal structure of the masker waveform is not important.

7.
Speech coding in the auditory nerve: V. Vowels in background noise
Responses of auditory-nerve fibers to steady-state, two-formant vowels in low-pass background noise (S/N = 10 dB) were obtained in anesthetized cats. For fibers over a wide range of characteristic frequencies (CFs), the peaks in discharge rate at the onset of the vowel stimuli were nearly eliminated in the presence of noise. In contrast, strong effects of noise on fine time patterns of discharge were limited to CF regions far from the formant frequencies. One effect is a reduction in the amplitude of the response component at the fundamental frequency in high-CF regions and, when the formants are widely separated, for CFs between F1 and F2. A reduction in the amplitude of the response components at the formant frequencies, with a concomitant increase in components near CF or at low frequencies, occurs in CF regions where the signal-to-noise ratio is particularly low. The processing schemes that were effective for estimating the formant frequencies and fundamental frequency of vowels in quiet generally remain adequate in moderate-level background noise. Overall, the discharge patterns contain many cues for distinctions among the vowel stimuli, so that the central processor should be able to identify the different vowels, consistent with psychophysical performance at moderate signal-to-noise ratios.

8.
The goal of this study was to measure the ability of normal-hearing listeners to discriminate formant frequency for vowels in isolation and in sentences at three signal levels. Results showed significant elevation of formant thresholds as formant frequency and linguistic context increased. Signal level showed a rollover effect, especially for F2, in which formant thresholds at 85 dB SPL were lower than thresholds at 70 or 100 dB SPL for both isolated vowels and sentences. This level-related rollover effect could be due to reduced frequency selectivity and to forward/backward masking in sentences at high signal levels for normal-hearing listeners.

9.
The identification of front vowels was studied in normal-hearing listeners using stimuli whose spectra had been altered to approximate the spectrum of vowels processed by auditory filters similar to those that might accompany sensorineural hearing loss. In the first experiment, front vowels were identified with greater than 95% accuracy when the first formant was specified in a normal manner and the higher frequency formants were represented by a broad, flat spectral plateau ranging from approximately 1600 to 3500 Hz. In the second experiment, the bandwidth of the first formant was systematically widened for stimuli with already flattened higher frequency formants. Normal vowel identification was preserved until the first formant was widened to six times its normal bandwidth. These results may account for the coexistence of abnormal vowel masking patterns (indicating flattened auditory spectra) and normal vowel recognition.

10.
Multicomponent stimuli consisting of two to seven tones were used to study suppression of basilar-membrane vibration at the 3-4-mm region of the chinchilla cochlea, where the characteristic frequency lies between 6.5 and 8.5 kHz. Three-component stimuli were amplitude-modulated (AM) sinusoids with modulation depth varied between 0.25 and 2 and modulation frequency varied between 100 and 2000 Hz. For five-component stimuli of equal amplitude, the frequency separation between adjacent components was the same as that used for the AM stimuli. An additional manipulation was to position either the first, third, or fifth component at the characteristic frequency (CF), which allowed study of the basilar-membrane response to off-CF stimuli. CF suppression was as high as 35 dB for two-tone combinations, while for equal-amplitude stimulus components CF suppression never exceeded 20 dB. The latter case occurred both for two-tone stimuli in which the suppressor was below CF and for multitone stimuli with the third component at CF. Suppression was smallest for the AM stimuli, including when the three AM components were equal in amplitude. Maximum suppression was both level- and frequency-dependent, and occurred for component frequency separations of 500 to 600 Hz. Suppression decreased for multicomponent stimuli with component frequency spacing greater than 600 Hz. Mutual suppression occurred whenever stimulus components were within the compressive region of the basilar membrane.
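To make the stimulus classes concrete, the sketch below generates a three-component AM tone (carrier plus two sidebands) and an equal-amplitude five-tone complex; the sample rate, carrier frequency, and durations are illustrative values chosen here, not the experimental settings.

```python
import numpy as np

FS = 100_000  # sample rate in Hz (assumed)

def am_tone(fc, fm, depth, dur, fs=FS):
    """Amplitude-modulated sinusoid: carrier fc, modulator fm, modulation depth `depth`."""
    t = np.arange(int(dur * fs)) / fs
    return (1 + depth * np.cos(2 * np.pi * fm * t)) * np.sin(2 * np.pi * fc * t)

def equal_amplitude_complex(fc, spacing, n_components, dur, fs=FS):
    """n_components equal-amplitude tones spaced `spacing` Hz apart, centered on fc."""
    t = np.arange(int(dur * fs)) / fs
    offsets = (np.arange(n_components) - (n_components - 1) / 2) * spacing
    return sum(np.sin(2 * np.pi * (fc + df) * t) for df in offsets)

# Example: 7.5-kHz carrier (within the CF range cited), 500-Hz modulation, depth 1.
x_am = am_tone(7500.0, 500.0, 1.0, 0.05)
x_multi = equal_amplitude_complex(7500.0, 500.0, 5, 0.05)
```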

11.
The effects of stimulus frequency on two-tone suppression were investigated in single auditory-nerve fibers of anesthetized cats and compared with human psychophysical data. In the physiological experiment, both average discharge rate and phase-locked activity were measured in response to one- and two-tone stimuli. The first component, f1, produced an increase in rate above spontaneous activity when presented alone. The second tone, f2, was always well below the fiber's characteristic frequency (CF) and was held at a fixed sound pressure level appropriate for producing two-tone suppression. Responses were plotted as a function of the level of the first tone, both alone and in the presence of f2. For different values of f1 with f2 fixed, suppression was maximal with f1 near the fiber's CF. In the psychophysical experiment, similar combinations of f1 and f2 were used as the masker in a forward-masking paradigm. In this experiment the addition of the second masker tone at frequency f2 could reduce the masking of the signal. When f1 was varied with f2 fixed, the relative decrease in masking, analogous to suppression, was greatest when f1 was equal to the signal frequency.

12.
Vowel perception strategies were assessed for two "average" and one "star" single-channel 3M/House cochlear implant patients, three "average" and one "star" Nucleus 22-channel cochlear implant patients, and six normal-hearing control subjects. All subjects were tested by computer with real and synthetic speech versions of [hVd] syllables, presented randomly. Duration, fundamental frequency, and first, second, and third formant frequency cues to the vowels were systematically manipulated. Results showed high accuracy for the normal-hearing subjects in all conditions except that of the first formant alone. "Average" single-channel patients classified only real speech [hVd] syllables differently from synthetic steady-state syllables. The "star" single-channel patient identified the vowels at much better than chance levels, with a pattern of results suggesting effective use of first formant and duration information. Both "star" and "average" Nucleus users showed similar response patterns, performing better than chance in most conditions and identifying the vowels using duration and some frequency information from all three formants.

13.
Auditory-nerve fiber spike trains were recorded in response to spoken English stop consonant-vowel syllables, both voiced (/b, d, g/) and unvoiced (/p, t, k/), in the initial position of syllables with the vowels /i, a, u/. Temporal properties of the neural responses and stimulus spectra are displayed in a spectrographic format. The responses were categorized in terms of the fibers' characteristic frequencies (CF) and spontaneous rates (SR). High-CF, high-SR fibers generally synchronize to formants throughout the syllables. High-CF, low/medium-SR fibers may also synchronize to formants; however, during the voicing, there may be sufficient low-frequency energy present to suppress a fiber's synchronized response to a formant near its CF. Low-CF fibers, from both SR groups, synchronize to energy associated with voicing. Several proposed acoustic correlates of perceptual features of stop consonant-vowel syllables, including the initial spectrum, formant transitions, and voice-onset time, are represented in the temporal properties of auditory-nerve fiber responses. Nonlinear suppression affects the temporal features of the responses, particularly those of low/medium-spontaneous-rate fibers.

14.
Two studies were conducted to assess the sensitivity of perioral muscles to vowel-like auditory stimuli. In one study, normal young adults produced an isometric lip rounding gesture while listening to a frequency-modulated tone (FMT). The fundamental of the FMT was modulated over time in a sinusoidal fashion near the frequency ranges of the first and second formants of the vowels /u/ and /i/ (rate of modulation = 4.5 or 7 Hz). In another study, normal young adults produced an isometric lip rounding gesture while listening to synthesized vowels whose formant frequencies were modulated over time in a sinusoidal fashion to simulate repetitive changes from the vowel /u/ to /i/ (rate of modulation = 2 or 4 Hz). The FMTs and synthesized vowels were presented binaurally via headphones at 75 and 60 dB SL, respectively. Muscle activity from the orbicularis oris superior and inferior and from lip retractors was recorded with surface electromyography (EMG). Signal averaging and spectral analysis of the rectified and smoothed EMG failed to show perioral muscle responses to the auditory stimuli. Implications for auditory feedback theories of speech control are discussed.
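The EMG processing chain named in this abstract (rectification, smoothing, trial averaging, spectral analysis) can be sketched as follows; the moving-average smoothing window and the trial layout are assumptions made for illustration.

```python
import numpy as np

def emg_response_spectrum(trials, fs, smooth_ms=10.0):
    """Rectify and smooth EMG trials, average across trials, and return the spectrum.

    trials : (n_trials, n_samples) raw EMG, time-locked to the auditory stimulus
    fs     : sample rate in Hz
    """
    rectified = np.abs(trials)                      # full-wave rectification
    win = max(1, int(fs * smooth_ms / 1000.0))
    kernel = np.ones(win) / win                     # moving-average smoothing
    smoothed = np.array([np.convolve(tr, kernel, mode="same") for tr in rectified])
    avg = smoothed.mean(axis=0)                     # signal averaging across trials
    spectrum = np.abs(np.fft.rfft(avg - avg.mean()))
    freqs = np.fft.rfftfreq(avg.size, d=1.0 / fs)
    return avg, freqs, spectrum
```

A genuine perioral response to the modulated stimuli would appear as a peak in the averaged spectrum at the modulation rate.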

15.
Spectral integration refers to the summation of activity beyond the bandwidth of the peripheral auditory filter. Several lines of experimentation have sought to determine the bandwidth of this "supracritical" band phenomenon. This paper reports on two experiments which tested the limit on spectral integration in the same listeners. Experiment I verified the critical separation of 3.5 bark in two-formant synthetic vowels advocated by the center-of-gravity (COG) hypothesis. According to the COG effect, two formants are integrated into a single perceived peak if their separation does not exceed approximately 3.5 bark. With several modifications to the methods of a classic COG matching task, the present listeners responded to changes in pitch in two-formant synthetic vowels rather than estimating their phonetic quality. When the amplitude ratio of the formants was changed, the frequency of the perceived peak moved closer to that of the stronger formant. This COG effect disappeared at larger formant separations. In a second experiment, auditory spectral resolution bandwidths were measured for the same listeners using common-envelope, two-tone complex signals. Results showed that the limits of spectral averaging in two-formant vowels and the two-tone spectral resolution bandwidth were related for two of the three listeners; the third failed to perform the discrimination task. For the two subjects who completed both tasks, the results suggest that the critical region in the vowel task and the complex-tone discriminability estimates are linked to a common mechanism, i.e., to an auditory spectral resolving power. A signal-processing model is proposed to predict the COG effect in two-formant synthetic vowels. The model introduces two modifications to Hermansky's perceptual linear predictive (PLP) model [J. Acoust. Soc. Am. 87, 1738-1752 (1990)]. The model predictions are generally compatible with the present experimental results and with the predictions of several earlier models accounting for the COG effect.
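A minimal way to express the 3.5-bark criterion is to convert formant frequencies to the bark scale and, when they fall within that distance, compute an amplitude-weighted center of gravity. The sketch below uses Traunmüller's Hz-to-bark approximation and is only an illustration of the COG idea, not the PLP-based model proposed in the paper.

```python
def hz_to_bark(f_hz):
    """Traunmueller's (1990) approximation of the bark scale."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def perceived_peak_bark(f1_hz, f2_hz, a1, a2, limit_bark=3.5):
    """Amplitude-weighted center of gravity of two formants, if they are close enough.

    Returns the COG in bark when the formant separation is within `limit_bark`,
    otherwise None (the formants are heard as separate peaks).
    """
    z1, z2 = hz_to_bark(f1_hz), hz_to_bark(f2_hz)
    if abs(z2 - z1) > limit_bark:
        return None
    return (a1 * z1 + a2 * z2) / (a1 + a2)

# Example: formants at 900 and 1300 Hz, with the second twice as strong.
print(perceived_peak_bark(900.0, 1300.0, 1.0, 2.0))
```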

16.
Previous research with speechlike signals has suggested that upward spread of masking from the first formant (F1) may interfere with the identification of place-of-articulation information signaled by changes in the upper formants. This suggestion was tested by presenting two-formant stop consonant-vowel syllables varying along a /ba/-/da/-/ga/ continuum to hearing-impaired listeners grouped according to the etiological basis of the disorder. The syllables were presented monaurally at 80 dB and 100 dB SPL, both when formant amplitudes were equal and when F1 amplitude was reduced by 6, 12, and 18 dB. Noise-on-tone masking patterns were also generated using narrow bands of noise at 80 and 100 dB SPL to assess the extent of upward spread of masking. Upward spread of masking could be demonstrated in both speech and nonspeech tasks, irrespective of the subject's age, audiometric configuration, or etiology of hearing impairment. Attenuation of F1 had different effects on phonetic identification in different subject groups: while listeners with noise-induced hearing loss showed substantial improvement in identifying place of articulation, upward spread of masking did not consistently account for poor place identification in other types of sensorineural hearing impairment.

17.
This study sought to compare formant frequencies estimated from natural phonation to those estimated using two methods of artificial laryngeal stimulation: (1) stimulation of the vocal tract using an artificial larynx placed on the neck and (2) stimulation of the vocal tract using an artificial larynx with an attached tube placed in the oral cavity. Twenty males between the ages of 18 and 45 performed the following three tasks on the vowels /a/ and /i/: (1) 4 seconds of sustained vowel, (2) 2 seconds of sustained vowel followed by 2 seconds of artificial phonation via the neck placement, and (3) 4 seconds of sustained vowel, the last 2 seconds of which were accompanied by artificial phonation via the oral placement. Frequencies of formants 1-4 were measured for each task at second 1 and second 3 using linear predictive coding. These measures were compared between second 1 and second 3, as well as across all three tasks. Neither of the methods of artificial laryngeal stimulation tested in this study yielded formant frequency estimates that consistently agreed with those obtained from natural phonation for both vowels and all formants. However, when estimating mean formant frequencies from large samples, each of the methods agreed with the mean estimates obtained from natural phonation for specific vowels and formants. The greatest agreement was found for the neck placement of the artificial larynx on the vowel /a/.
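Formant estimation by linear predictive coding, as used in this study, typically fits an all-pole model and reads formant frequencies off the pole angles. The sketch below relies on librosa's LPC routine; the model-order rule of thumb, pre-emphasis coefficient, and bandwidth cutoff are common defaults assumed here, not the study's settings.

```python
import numpy as np
import librosa

def lpc_formants(signal, fs, order=None, max_bw_hz=400.0):
    """Estimate formant frequencies (Hz) from a vowel segment via LPC.

    order     : LPC order; 2 + fs/1000 is a common rule of thumb (assumed here)
    max_bw_hz : discard poles whose bandwidth exceeds this (weak resonances)
    """
    if order is None:
        order = int(2 + fs / 1000)
    # Pre-emphasis flattens the spectrum a little before fitting the all-pole model.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    a = librosa.lpc(emphasized.astype(float), order=order)
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]               # keep one of each conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    bws = -np.log(np.abs(roots)) * fs / np.pi       # 3-dB bandwidth of each pole
    return sorted(f for f, b in zip(freqs, bws) if 90.0 < f < fs / 2 and b < max_bw_hz)
```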

18.
The intelligibility of speech is sustained at lower signal-to-noise ratios when the speech has a different interaural configuration from the noise. This paper argues that the advantage arises in part because listeners combine evidence of the spectrum of speech in the across-frequency profile of interaural decorrelation with evidence in the across-frequency profile of intensity. To support the argument, three experiments examined the ability of listeners to integrate and segregate evidence of vowel formants in these two profiles. In experiment 1, listeners achieved accurate identification of the members of a small set of vowels whose first formant was defined by a peak in one profile and whose second formant was defined by a peak in the other profile. This result demonstrates that integration is possible. Experiment 2 demonstrated that integration is not mandatory, insofar as listeners could report the identity of a vowel defined entirely in one profile despite the presence of a competing vowel in the other profile. The presence of the competing vowel reduced accuracy of identification, however, showing that segregation was incomplete. Experiment 3 demonstrated that segregation of the binaural vowel, in particular, can be increased by the introduction of an onset asynchrony between the competing vowels. The results of experiments 2 and 3 show that the intrinsic cues for segregation of the profiles are relatively weak. Overall, the results are compatible with the argument that listeners can integrate evidence of spectral peaks from the two profiles.

19.
Auditory feedback influences human speech production, as demonstrated by studies using rapid pitch and loudness changes. Feedback has also been investigated using gradual manipulation of formants in adaptation studies with whispered speech. In the work reported here, the first formant of steady-state isolated vowels was unexpectedly altered within trials for voiced speech. This was achieved using a real-time formant tracking and filtering system developed for this purpose. The first formant of the vowel /ɛ/ was manipulated 100% toward either /æ/ or /ɪ/, and participants responded by altering their production, with average F1 compensation as large as 16.3% and 10.6% of the applied formant shift, respectively. Compensation was estimated to begin less than 460 ms after stimulus onset. The rapid formant compensations found here suggest that auditory feedback control is similar for both F0 and formants.
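The compensation figures quoted here are percentages of the applied formant shift, taken in the direction opposite to the shift. A hypothetical helper for that arithmetic is sketched below; the example frequencies are invented for illustration.

```python
def percent_compensation(f1_baseline, f1_feedback, f1_produced):
    """Compensation as a percentage of the applied F1 perturbation.

    f1_baseline : speaker's F1 before the perturbation (Hz)
    f1_feedback : F1 the speaker hears after the manipulation (Hz)
    f1_produced : F1 the speaker actually produces under altered feedback (Hz)

    Positive values mean the production moved opposite to the applied shift.
    """
    shift = f1_feedback - f1_baseline
    response = f1_produced - f1_baseline
    return 100.0 * (-response / shift)

# Example: feedback shifted up by 200 Hz, speaker lowers F1 by 30 Hz -> 15% compensation.
print(percent_compensation(580.0, 780.0, 550.0))  # 15.0
```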

20.
Vowel equalization is a technique used by singers to achieve a more balanced vocal resonance, or chiaroscuro, by balancing corresponding front and back vowels, which share approximate tongue heights, and by producing high and low vowels with a more neutral or centralized lingual posture. The goal of this single-group study was to quantify acoustic changes in vowels after a brief training session in vowel equalization. Fifteen young adults with amateur singing experience sang a passage and sustained isolated vowels both before and after a 15-minute training session in vowel equalization. The first two formants of the target vowels /e, i, ɑ, o, u/ were measured from microphone recordings. An analysis of variance was used to test for changes in formant values after the training session. These formant values mostly changed in a manner reflective of a more central tongue posture. For the sustained vowels, all formant changes suggested a more neutral tongue position after the training session. The vowels in the singing passage mostly changed in the expected direction, with exceptions possibly attributable to coarticulation. The changes in the vowel formants indicate that even a brief training session can result in significant changes in vowel acoustics. Further work to explore the perceptual consequences of vowel equalization is warranted.
