Similar articles
Found 20 similar articles (search time: 31 ms)
1.
The two experiments described here use a formant-matching task to investigate what abstract representations of sound are available to listeners. The first experiment examines how veridically and reliably listeners can adjust the formant frequency of a single-formant sound to match the timbre of a target single-formant sound that has a different bandwidth and either the same or a different fundamental frequency (F0). Comparison with previous results [Dissard and Darwin, J. Acoust. Soc. Am. 106, 960-969 (2000)] shows that (i) for sounds on the same F0, introducing a difference in bandwidth increases the variability of matches regardless of whether the harmonics close to the formant are resolved or unresolved; (ii) for sounds on different F0's, introducing a difference in bandwidth only increases variability for sounds that have unresolved harmonics close to the formant. The second experiment shows that match variability for sounds differing in F0, but with the same bandwidth and with resolved harmonics near the formant peak, is not influenced by the harmonic spacing or by the alignment of harmonics with the formant peak. Overall, these results indicate that match variability increases when the match cannot be made on the basis of the excitation pattern, but match variability does not appear to depend on whether ideal matching performance requires simply interpolation of a spectral envelope or also the extraction of the envelope's peak frequency.

2.
3.
Experiment 1 measured frequency modulation detection thresholds (FMTs) for harmonic complex tones as a function of modulation rate. Six complexes were used, with fundamental frequencies (F0s) of either 88 or 250 Hz, bandpass filtered into a LOW (125-625 Hz), MID (1375-1875 Hz) or HIGH (3900-5400 Hz) frequency region. The FMTs were about an order of magnitude greater for the three complexes whose harmonics were unresolved by the peripheral auditory system (F0 = 88 Hz in the MID region and both F0s in the HIGH region) than for the other three complexes, which contained some resolved harmonics. Thresholds increased with increases in FM rate above 2 Hz for all conditions. The increase was larger when the F0 was 88 Hz than when it was 250 Hz, and was also larger in the LOW than in the MID and HIGH regions. Experiment 2 measured thresholds for detecting mistuning produced by modulating the F0s of two simultaneously presented complexes out of phase by 180 degrees. The size of the resulting mistuning oscillates at a rate equal to the rate of FM applied to the two carriers. At low FM rates, thresholds were lowest when the harmonics were either resolved for both complexes or unresolved for both complexes, and highest when resolvability differed across complexes. For pairs of complexes with resolved harmonics, mistuning thresholds increased dramatically as the FM rate was increased above 2-5 Hz, in a way which could not be accounted for by the effect of modulation rate on the FMTs for the individual complexes. A third experiment, in which listeners detected constant ("static") mistuning between pairs of frequency-modulated complexes, provided evidence that this deterioration was due to the harmonics in one of the two "resolved" complexes becoming unresolved at high FM rates, when analyzed over some finite time window. It is concluded that the detection of time-varying mistuning between groups of harmonics is limited by factors that are not apparent in FM detection data.

4.
Vowel matching and identification experiments were carried out to investigate the perceptual contribution of harmonics in the first formant region of synthetic front vowels. In the first experiment, listeners selected the best phonetic match from an F1 continuum, for reference stimuli in which a band of two to five adjacent harmonics of equal intensity replaced the F1 peak; F1 values of best matches were near the frequency of the highest frequency harmonic in the band. Attenuation of the highest harmonic in the band resulted in lower F1 matches. Attenuation of the lowest harmonic had no significant effects, except in the case of a 2-harmonic band, where higher F1 matches were selected. A second experiment investigated the shifts in matched F1 resulting from an intensity increment to either one of a pair of harmonics in the F1 region. These shifts were relatively invariant over different harmonic frequencies and proportional to the fundamental frequency. A third experiment used a vowel identification task to determine phoneme boundaries on an F1 continuum. These boundaries were not substantially altered when the stimuli comprised only the two most prominent harmonics in the F1 region, or these plus either the higher or lower frequency subset of the remaining F1 harmonics. The results are consistent with an estimation procedure for the F1 peak which assigns greatest weight to the two most prominent harmonics in the first formant region.

5.
Three experiments investigated the relationship between harmonic number, harmonic resolvability, and the perception of harmonic complexes. Complexes with successive equal-amplitude sine- or random-phase harmonic components of a 100- or 200-Hz fundamental frequency (f0) were presented dichotically, with even and odd components to opposite ears, or diotically, with all harmonics presented to both ears. Experiment 1 measured performance in discriminating a 3.5%-5% frequency difference between a component of a harmonic complex and a pure tone in isolation. Listeners achieved at least 75% correct for approximately the first 10 and 20 individual harmonics in the diotic and dichotic conditions, respectively, verifying that only processes before the binaural combination of information limit frequency selectivity. Experiment 2 measured fundamental frequency difference limens (f0 DLs) as a function of the average lowest harmonic number. Similar results at both f0's provide further evidence that harmonic number, not absolute frequency, underlies the order-of-magnitude increase observed in f0 DLs when only harmonics above about the 10th are presented. Similar results under diotic and dichotic conditions indicate that the auditory system, in performing f0 discrimination, is unable to utilize the additional peripherally resolved harmonics in the dichotic case. In experiment 3, dichotic complexes containing harmonics below the 12th, or only above the 15th, elicited pitches of the f0 and twice the f0, respectively. Together, experiments 2 and 3 suggest that harmonic number, regardless of peripheral resolvability, governs the transition between two different pitch percepts, one based on the frequencies of individual resolved harmonics and the other based on the periodicity of the temporal envelope.
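
The dichotic manipulation in this abstract can be illustrated with a short sketch. It only shows why splitting odd and even harmonics between the ears doubles the component spacing within each ear; the function names and the choice of which ear receives the odd harmonics are illustrative assumptions, not details taken from the study.

```python
def harmonic_frequencies(f0_hz, n_harmonics):
    """Frequencies of successive harmonics 1..n of the fundamental."""
    return [k * f0_hz for k in range(1, n_harmonics + 1)]


def split_dichotic(frequencies):
    """Split a harmonic series into odd- and even-numbered components.

    Assumption for illustration: odd harmonics go to one ear and even
    harmonics to the other; the abstract does not say which ear gets which.
    """
    odd = frequencies[0::2]   # harmonics 1, 3, 5, ...
    even = frequencies[1::2]  # harmonics 2, 4, 6, ...
    return odd, even


def min_spacing(frequencies):
    """Smallest gap between neighbouring components, in Hz."""
    return min(b - a for a, b in zip(frequencies, frequencies[1:]))


if __name__ == "__main__":
    diotic = harmonic_frequencies(100, 20)
    odd_ear, even_ear = split_dichotic(diotic)
    print("diotic spacing:", min_spacing(diotic), "Hz")
    print("dichotic spacing per ear:", min_spacing(odd_ear), "Hz")
```

With a 100-Hz fundamental, each ear in the dichotic case receives components spaced 200 Hz apart, which is in line with the roughly doubled number of individually discriminable harmonics reported for experiment 1.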

6.
Vowel identity correlates well with the shape of the transfer function of the vocal tract, in particular the position of the first two or three formant peaks. However, in voiced speech the transfer function is sampled at multiples of the fundamental frequency (F0), and the short-term spectrum contains peaks at those frequencies, rather than at formants. It is not clear how the auditory system estimates the original spectral envelope from the vowel waveform. Cochlear excitation patterns, for example, resolve harmonics in the low-frequency region and their shape varies strongly with F0. The problem cannot be cured by smoothing: lag-domain components of the spectral envelope are aliased and cause F0-dependent distortion. The problem is severe at high F0's where the spectral envelope is severely undersampled. This paper treats vowel identification as a process of pattern recognition with missing data. Matching is restricted to available data, and missing data are ignored using an F0-dependent weighting function that emphasizes regions near harmonics. The model is presented in two versions: a frequency-domain version based on short-term spectra, or tonotopic excitation patterns, and a time-domain version based on autocorrelation functions. It accounts for the relative F0-independency observed in vowel identification.

7.
Better place-coding of the fundamental frequency in cochlear implants (cited by: 1; self-citations: 0; citations by others: 1)
In current cochlear implant systems, the fundamental frequency F0 of a complex sound is encoded by temporal fluctuations in the envelope of the electrical signals presented on the electrodes. In normal hearing, the lower harmonics of a complex sound are resolved, in contrast with a cochlear implant system. In the present study, it is investigated whether "place-coding" of the first harmonic improves the ability of an implantee to discriminate complex sounds with different fundamental frequencies. Therefore, a new filter bank was constructed, for which the first harmonic is always resolved in two adjacent filters, and the balance between both filter outputs is directly related to the frequency of the first harmonic. The new filter bank was compared with a filter bank that is typically used in clinical processors, both with and without the presence of temporal cues in the stimuli. Four users of the LAURA cochlear implant participated in a pitch discrimination task to determine detection thresholds for F0 differences. The results show that these thresholds decrease noticeably for the new filter bank, if no temporal cues are present in the stimuli. If temporal cues are included, the differences between the results for both filter banks become smaller, but a clear advantage is still observed for the new filter bank. This demonstrates the feasibility of using place-coding for the fundamental frequency.

8.
The discrimination of the fundamental frequency (f0) of pairs of complex tones with no common harmonics is worse than the discrimination of f0 for tones with all harmonics in common. These experiments were conducted to assess whether this effect is a result of pitch shifts between pairs of tones without common harmonics or whether it reflects influences of spectral differences (timbre) on the accuracy of pitch perception. In experiment 1, pitch matches were obtained between sounds drawn from the following types: (1) pure tones (P) with frequencies 100, 200, or 400 Hz; (2) a multiple-component complex tone, designated A, with harmonics 3, 4, 8, 9, 10, 14, 15, and f0 = 100, 200, or 400 Hz; (3) a multiple-component complex tone, designated B, with harmonics 5, 6, 7, 11, 12, 13, 16, and with f0 = 100, 200, or 400 Hz. The following matches were made: A vs A, B vs B, A vs P, B vs P and P vs P. Pitch shifts were found between the pure tones and the complex tones (A vs P and B vs P), but not between the A and B tones (A vs B). However, the variability of the A vs B matches was significantly greater than that of the A vs A or B vs B matches. Also, the variability of the A vs P and B vs P matches was greater than that for the A vs B matches. In a second experiment, frequency difference limens (DLCs) were measured for the A vs A, B vs B, and A vs B pairs of sounds. The DLCs were larger for the A vs B pair than for A vs A or B vs B. The results suggest that the poor frequency discrimination of tones with no common harmonics does not result from pitch shifts between the tones. Rather, it seems that spectral differences between tones interfere with judgements of their relative pitch.

9.
Speech coding in the auditory nerve: V. Vowels in background noise (cited by: 1; self-citations: 0; citations by others: 1)
Responses of auditory-nerve fibers to steady-state, two-formant vowels in low-pass background noise (S/N = 10 dB) were obtained in anesthetized cats. For fibers over a wide range of characteristic frequencies (CFs), the peaks in discharge rate at the onset of the vowel stimuli were nearly eliminated in the presence of noise. In contrast, strong effects of noise on fine time patterns of discharge were limited to CF regions that are far from the formant frequencies. One effect is a reduction in the amplitude of the response component at the fundamental frequency in the high-CF regions and for CFs between F1 and F2 when the formants are widely separated. A reduction in the amplitude of the response components at the formant frequencies, with concomitant increase in components near CF or low-frequency components occurs in CF regions where the signal-to-noise ratio is particularly low. The processing schemes that were effective for estimating the formant frequencies and fundamental frequency of vowels in quiet generally remain adequate in moderate-level background noise. Overall, the discharge patterns contain many cues for distinctions among the vowel stimuli, so that the central processor should be able to identify the different vowels, consistent with psychophysical performance at moderate signal-to-noise ratios.

10.
The sound level of the singer's formant in professional singing (cited by: 2; self-citations: 0; citations by others: 2)
The relative sound level of the "singer's formant," measured in a 1/3-oct band with a center frequency of 2.5 kHz for males and of 3.16 kHz for females, has been investigated for 14 professional singers, nine different modes of singing, nine different vowels, variations in overall sound-pressure level, and fundamental frequencies ranging from 98 up to 880 Hz. Variation in the sound level of the singer's formant due to differences among male singers was small (4 dB), the factors vowels (16 dB) and fundamental frequency (9-14 dB) had an intermediate effect, while the largest variation was found for differences among female singers (24 dB), between modes of singing (vocal effort) (23 dB), and in overall sound-pressure level (more than 30 dB). In spite of this great potential variability, for each mode of singing the sound level of the singer's formant was remarkably constant up to F0 = 392 Hz, due to adaptation of vocal effort. This may be explained as the result of the perceptual demand of a constant voice quality. The definition of the singer's formant is discussed.
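
As a rough guide to what the 1/3-octave measurement bands in this abstract cover, the sketch below computes band edges from a center frequency. It assumes the base-2 definition of a one-third octave (edges a factor of 2^(1/6) below and above the center); the abstract does not state which filter definition was used, and the function name is illustrative.

```python
def third_octave_edges(center_hz):
    """Lower and upper edge of a 1/3-octave band around center_hz.

    Assumes the base-2 definition: the band spans one third of an octave,
    so each edge lies a factor of 2**(1/6) from the center frequency.
    """
    half_band_ratio = 2 ** (1 / 6)
    return center_hz / half_band_ratio, center_hz * half_band_ratio


if __name__ == "__main__":
    # Center frequencies quoted in the abstract (males, females).
    for center in (2500.0, 3160.0):
        low, high = third_octave_edges(center)
        print(f"{center:.0f} Hz band: about {low:.0f} to {high:.0f} Hz")
```

Under this assumption the 2.5-kHz band runs from roughly 2.2 to 2.8 kHz, so the reported levels describe energy in a fairly narrow region rather than a single spectral peak.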

11.
Recent studies have shown that time-varying changes in formant pattern contribute to the phonetic specification of vowels. This variation could be especially important in children's vowels, because children have higher fundamental frequencies (f0's) than adults, and formant-frequency estimation is generally less reliable when f0 is high. To investigate the contribution of time-varying changes in formant pattern to the identification of children's vowels, three experiments were carried out with natural and synthesized versions of 12 American English vowels spoken by children (ages 7, 5, and 3 years) as well as adult males and females. Experiment 1 showed that (i) vowels generated with a cascade formant synthesizer (with hand-tracked formants) were less accurately identified than natural versions; and (ii) vowels synthesized with steady-state formant frequencies were harder to identify than those which preserved the natural variation in formant pattern over time. The decline in intelligibility was similar across talker groups, and there was no evidence that formant movement plays a greater role in children's vowels compared to adults. Experiment 2 replicated these findings using a semi-automatic formant-tracking algorithm. Experiment 3 showed that the effects of formant movement were the same for vowels synthesized with noise excitation (as in whispered speech) and pulsed excitation (as in voiced speech), although, on average, the whispered vowels were less accurately identified than their voiced counterparts. Taken together, the results indicate that the cues provided by changes in the formant frequencies over time contribute materially to the intelligibility of vowels produced by children and adults, but these time-varying formant frequency cues do not interact with properties of the voicing source.

12.
Three experiments examined the ability of listeners to identify steady-state synthetic vowel-like sounds presented concurrently in pairs to the same ear. Experiment 1 confirmed earlier reports that listeners identify the constituents of such pairs more accurately when they differ in fundamental frequency (f0) by about a half semitone or more, compared to the condition where they have the same f0. When the constituents have different f0's, corresponding harmonics of the two vowels are misaligned in frequency and corresponding pitch periods are asynchronous in time. These differences provide cues that might aid identification. Experiments 2 and 3 determined whether listeners can use these cues, divorced from a difference in f0, to improve their accuracy of identification. Harmonic misalignment was beneficial when the constituents had an f0 of 200 Hz so that the harmonics of each constituent were well separated in frequency. Pitch-period asynchrony was beneficial when the constituents had an f0 of 50 Hz so that the onsets of the pitch periods of each constituent were well separated in time. Neither cue was beneficial when both constituents had an f0 of 100 Hz. It is unlikely, therefore, that either cue contributed to the improvement in performance found in Experiment 1 where the constituents were given different f0's close to 100 Hz. Rather, it is argued that performance improved in Experiment 1 primarily because the two f0's specified two pitches that could be used to segregate the contributions of each vowel in the composite waveform.
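
To give a sense of scale for the "half semitone" f0 difference and the resulting harmonic misalignment described in this abstract, here is a minimal sketch. It assumes equal-tempered semitones (a frequency ratio of 2^(1/12) per semitone); the function names and the 100-Hz example values are illustrative, not the stimulus parameters of the study.

```python
def shift_semitones(f0_hz, semitones):
    """Frequency obtained by raising f0_hz by a number of equal-tempered semitones."""
    return f0_hz * 2 ** (semitones / 12)


def harmonic_misalignment(f0_a_hz, f0_b_hz, n_harmonics):
    """Frequency gap between corresponding harmonics of two complexes, in Hz."""
    return [k * f0_b_hz - k * f0_a_hz for k in range(1, n_harmonics + 1)]


if __name__ == "__main__":
    f0_a = 100.0
    f0_b = shift_semitones(f0_a, 0.5)  # half a semitone above 100 Hz
    print(f"second f0: {f0_b:.2f} Hz")
    gaps = harmonic_misalignment(f0_a, f0_b, 5)
    print("misalignment of harmonics 1-5 (Hz):", [round(g, 2) for g in gaps])
    # Pitch periods also drift apart: compare the two period lengths in ms.
    print(f"periods: {1000 / f0_a:.3f} ms vs {1000 / f0_b:.3f} ms")
```

The misalignment grows in proportion to harmonic number, so near 100 Hz the lowest harmonics of the two vowels differ by only a few hertz.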

13.
The purpose of this study was to examine the acoustic characteristics of children's speech and voices that account for listeners' ability to identify gender. In Experiment I, vocal recordings and gross physical measurements of 4-, 8-, 12-, and 16-year-olds were taken (10 girls and 10 boys per age group). The speech sample consisted of seven nondiphthongal vowels of American English (/æ/ "had," /ɛ/ "head," /i/ "heed," /ɪ/ "hid," /ɑ/ "hod," /ʌ/ "hud," and /u/ "who'd") produced in the carrier phrase, "Say /hVd/ again." Fundamental frequency (f0) and formant frequencies (F1, F2, F3) were measured from these syllables. In Experiment II, 20 adults rated the syllables produced by the children in Experiment I based on a six-point gender rating scale. The results from these experiments indicate (1) vowel formant frequencies differentiate gender for children as young as four years of age, while formant frequencies and f0 differentiate gender after 12 years of age, (2) the relationship between gross measures of physical size and vocal characteristics is apparent for at least 12- and 16-year-olds, and (3) listeners can identify gender from the speech and voice of children as young as four years of age, and with respect to young children, listeners appear to base their gender ratings on vowel formant frequencies. The findings are discussed in relation to the development of gender identity and its perceptual representation in speech and voice.

14.
A melodic pitch experiment was performed to demonstrate the importance of time-interval resolution for pitch strength. The experiment shows that notes with a low fundamental (75 Hz) and relatively few resolved harmonics support better performance than comparable notes with a higher fundamental (300 Hz) and more resolved harmonics. Two four-note melodies were presented to listeners and one note in the second melody was changed by one or two semitones. Listeners were required to identify the note that changed. There were three orthogonal stimulus dimensions: F0 (75 and 300 Hz); lowest frequency component (3, 7, 11, or 15); and number of harmonics (4 or 8). Performance decreased as the frequency of the lowest component increased for both F0's, but performance was better for the lower F0. The spectral and temporal information in the stimuli were compared using a time-domain model of auditory perception. It is argued that the distribution of time intervals in the auditory nerve can explain the decrease in performance as F0 and spectral resolution increase. Excitation patterns based on the same time-interval information do not contain sufficient resolution to explain listeners' performance on the melody task.

15.
Structure-borne noise originating from a heat pump unit was selected to study the influence on subjective annoyance of low frequency noise (LFN) combined with additional sound. A paired comparison test was used for evaluating the subjective annoyance of LFN combined with different sound pressure levels (SPL) of pink noise, frequency-modulated pure tones (FM pure tones) and natural sounds. The results showed that, with pink noise of 250-1000 Hz combined with the original LFN, the subjective annoyance value (SAV) first dropped then rose with increasing SPL. When the SPL of the pink noise was 15-25 dB, SAV was lower than that of the original LFN. With pink noise of frequency 250-20,000 Hz added to LFN, SAV increased linearly with increasing SPL. SAV and the psychoacoustic annoyance value (PAV) obtained by semi-theoretical formulas were well correlated. The determination coefficient (R²) was 0.966 and 0.881, respectively, when the frequency range of the pink noise was 250-1000 and 250-20,000 Hz. When FM pure tones with central frequencies of 500, 2000 and 8000 Hz, or natural sounds (including the sound of singing birds, flowing water, wind or ticking clock) were, respectively, added to the original sound, the SAV increased as the SPL of the added sound increased. However, when an FM pure tone of 15 dB with a central frequency of 2000 Hz and a modulation frequency of 10 Hz was added, the SAV was lower than that of the original LFN. With SPL and central frequency held constant, the SAV declined primarily when modulation frequency increased. With SPL and modulation frequency held constant, the SAV became lowest when the central frequency was 2000 Hz. This showed a preferable correlation between SAV and fluctuation extent of FM pure tones.

16.
This paper is concerned with the representation of the spectra of synthesized steady-state vowels in the temporal aspects of the discharges of auditory-nerve fibers. The results are based on a study of the responses of large numbers of single auditory-nerve fibers in anesthetized cats. By presenting the same set of stimuli to all the fibers encountered in each cat, we can directly estimate the population response to those stimuli. Period histograms of the responses of each unit to the vowels were constructed. The temporal response of a fiber to each harmonic component of the stimulus is taken to be the amplitude of the corresponding component in the Fourier transform of the unit's period histogram. At low sound levels, the temporal response to each stimulus component is maximal among units with CFs near the frequency of the component (i.e., near its place). Responses to formant components are larger than responses to other stimulus components. As sound level is increased, the responses to the formants, particularly the first formant, increase near their places and spread to adjacent regions, particularly toward higher CFs. Responses to nonformant components, except for harmonics and intermodulation products of the formants (2F1, 2F2, F1 + F2, etc.), are suppressed; at the highest sound levels used (approximately 80 dB SPL), temporal responses occur almost exclusively at the first two or three formants and their harmonics and intermodulation products. We describe a simple calculation which combines rate, place, and temporal information to provide a good representation of the vowels' spectra, including a clear indication of at least the first two formant frequencies. This representation is stable with changes in sound level at least up to 80 dB SPL; its stability is in sharp contrast to the behavior of the representation of the vowels' spectra in terms of discharge rate which degenerates at stimulus levels within the conversational range.

17.
The ability of baboons to discriminate changes in the formant structures of a synthetic baboon grunt call and an acoustically similar human vowel (/ɛ/) was examined to determine how comparable baboons are to humans in discriminating small changes in vowel sounds, and whether or not any species-specific advantage in discriminability might exist when baboons discriminate their own vocalizations. Baboons were trained to press and hold down a lever to produce a pulsed train of a standard sound (e.g., /ɛ/ or a baboon grunt call), and to release the lever only when a variant of the sound occurred. Synthetic variants of each sound had the same first and third through fifth formants (F1 and F3-5), but varied in the location of the second formant (F2). Thresholds for F2 frequency changes were 55 and 67 Hz for the grunt and vowel stimuli, respectively, and were not statistically different from one another. Baboons discriminated changes in vowel formant structures comparable to those discriminated by humans. No distinct advantages in discrimination performance were observed when the baboons discriminated these synthetic grunt vocalizations.

18.
When all of the components in a harmonic complex tone are shifted in frequency by Δf, the pitch of the complex shifts roughly in proportion to Δf. For tones with a small number of components, the shift is usually somewhat larger than predicted from pitch theories, which has been attributed to the influence of combination tones [Smoorenburg, J. Acoust. Soc. Am. 48, 924-941 (1970)]. Experiment 1 assessed whether combination tones influence the pitch of complex tones with more than five harmonics, by using noise to mask the combination tones. The matching stimulus was a harmonic complex. Test complexes were bandpass filtered with passbands centered on harmonic numbers 5 (resolved), 11 (intermediate), or 16 (unresolved) and fundamental frequencies (F0s) were 100, 200, or 400 Hz. For the intermediate and unresolved conditions, the matching stimuli were filtered with the same passband to minimize differences in the excitation patterns of the test and matching stimuli. For the resolved condition, the matching stimulus had a passband centered above that of the test stimulus, to avoid common partials. For resolved and intermediate conditions, pitch shifts were observed that could generally be predicted from the frequencies of the partials. The shifts were unaffected by addition of noise to mask combination tones. For the unresolved condition, no pitch shift was observed, which suggests that pitch is not based on temporal fine structure for stimuli containing only high unresolved harmonics. Experiment 2 used three-component complexes resembling those of Schouten [J. Acoust. Soc. Am. 34, 1418-1424 (1962)]. Nominal harmonic numbers were 3, 4, 5 (resolved), 8, 9, 10 (intermediate), or 13, 14, 15 (unresolved) and F0s were 50, 100, 200, or 400 Hz. Clear shifts in the matches were found for all conditions, including unresolved. For the latter, subjects may have matched the "center of gravity" of the excitation patterns of the test and matching stimuli.
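
The frequency-shifted stimuli in this abstract can be sketched as follows. The first function builds the shifted partials directly from the description (every component moved by the same amount). The second applies a common first-order approximation in which the pitch shift equals the frequency shift divided by the central harmonic number; this rule is an assumption used here for illustration and is not necessarily the prediction used in the study, which reports that observed shifts are usually somewhat larger for tones with few components. Function names and example values are hypothetical.

```python
def shifted_partials(f0_hz, harmonic_numbers, delta_f_hz):
    """Component frequencies after shifting every harmonic by delta_f_hz."""
    return [n * f0_hz + delta_f_hz for n in harmonic_numbers]


def first_order_pitch(f0_hz, center_harmonic, delta_f_hz):
    """Approximate pitch of the shifted complex, in Hz.

    Assumed rule (illustrative): the central partial is treated as the
    n-th harmonic of the perceived pitch, so the pitch moves by delta_f / n.
    """
    return f0_hz + delta_f_hz / center_harmonic


if __name__ == "__main__":
    # Three-component complex with nominal harmonic numbers 3, 4, 5.
    partials = shifted_partials(200, [3, 4, 5], 30)
    print("partials (Hz):", partials)
    print("approximate pitch (Hz):", first_order_pitch(200, 4, 30))
```

Note that the shifted partials keep their 200-Hz spacing, so the envelope repetition rate is unchanged even though the components are no longer exact multiples of a common fundamental.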

19.
Imitations of ten synthesized vowels were recorded from 33 speakers including men, women, and children. The first three formant frequencies of the imitations were estimated from spectrograms and considered with respect to developmental patterns in vowel formant structure, uniform scale factors for vowel normalization, and formant variability. Strong linear effects were observed in the group data for imitations of most of the English vowels studied, and straight lines passing through the origin provided a satisfactory fit to linear F1-F2 plots of the English vowel data. Logarithmic transformations of the formant frequencies helped substantially to equalize the dispersion of the group data for different vowels, but formant scale factors were observed to vary somewhat with both formant number and vowel identity. Variability of formant frequency was least for F1 (s.d. of 60 Hz or less for English vowels of adult males) and about equal for F2 and F3 (s.d. of 100 Hz or less for English vowels of adult males).

20.
That singers under certain circumstances adjust the articulation of the vocal tract (formant tuning) to enhance acoustic output is both apparent from measurements and understood in theory. The precise effect of a formant on an approaching (retreating) harmonic as the latter varies in frequency during actual singing, however, is difficult to isolate. In this study variations in amplitude of radiated sound components as well as supraglottal and subglottal (esophageal) pressures accompanying the vibrato-related sweep of voice harmonics were used as a basis for estimating the effective center frequencies and bandwidths of the first and second formants.
