Similar literature
 20 similar articles found (search time: 93 ms)
1.
Two experiments investigated the role of the regularity of the frequency spacing of harmonics, as a separate factor from harmonicity, on the perception of the virtual pitch of a harmonic series. The first experiment compared the shifts produced by mistuning the 3rd, 4th, and 5th harmonics in the pitch of two harmonic series: the odd-H and the all-H tones. The odd-H tone contained odd harmonics 1 to 11, plus the 4th harmonic; the all-H tone contained harmonics 1 to 12. Both tones had a fundamental frequency of 155 Hz. Pitch shifts produced by mistuning the 3rd harmonic, but not the 4th and 5th harmonics, were found to be significantly larger for the odd-H tone than for the all-H tone. This finding was consistent with the idea that grouping by spectral regularity affects pitch perception since an odd harmonic made a larger contribution than an adjacent even harmonic to the pitch of the odd-H tone. However, an alternative explanation was that the 3rd mistuned harmonic produced larger pitch shifts within the odd-H tone than the 4th mistuned harmonic because of differences in the partial masking of these harmonics by adjacent harmonics. The second experiment tested these explanations by measuring pitch shifts for a modified all-H tone in which each mistuned odd harmonic was tested in the presence of the 4th harmonic, but in the absence of its other even-numbered neighbor. The results showed that, for all mistuned harmonics, pitch shifts for the modified all-H tone were not significantly different from those for the odd-H tone. These findings suggest that the harmonic relations among frequency components, rather than the regularity of their frequency spacing, are the primary factor for the perception of the virtual pitch of complex sounds.

2.
When a low harmonic in a harmonic complex tone is mistuned from its harmonic value by a sufficient amount it is heard as a separate tone, standing out from the complex as a whole. This experiment estimated the degree of mistuning required for this phenomenon to occur, for complex tones with 10 or 12 equal-amplitude components (60 dB SPL per component). On each trial the subject was presented with a complex tone which either had all its partials at harmonic frequencies or had one partial mistuned from its harmonic frequency. The subject had to indicate whether he heard a single complex tone with one pitch or a complex tone plus a pure tone which did not "belong" to the complex. An adaptive procedure was used to track the degree of mistuning required to achieve a d' value of 1. Threshold was determined for each of the first six harmonics of each complex tone. In one set of conditions stimulus duration was held constant at 410 ms, and the fundamental frequency was either 100, 200, or 400 Hz. For most conditions the thresholds fell between 1% and 3% of the harmonic frequency, depending on the subject. However, thresholds tended to be greater for the first two harmonics of the 100-Hz fundamental and, for some subjects, thresholds increased for the fifth and sixth harmonics. In a second set of conditions fundamental frequency was held constant at 200 Hz, and the duration was either 50, 110, 410, or 1610 ms. Thresholds increased by a factor of 3-5 as duration was decreased from 1610 ms to 50 ms. The results are discussed in terms of a hypothetical harmonic sieve and mechanisms for the formation of perceptual streams.
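
As a rough illustration of the kind of stimulus described in the abstract above, the sketch below builds an equal-amplitude harmonic complex in which one partial is shifted by a percentage of its harmonic frequency. It is not the authors' stimulus code: the function names, the 16 kHz sample rate, and the sine starting phase are assumptions made for the example, and level calibration (dB SPL) is not modeled.

```python
import math

def component_frequencies(f0, n_components, mistuned_index=None, mistuning_pct=0.0):
    """Frequencies (Hz) of partials 1..n_components; one partial may be mistuned.

    mistuned_index is the 1-based harmonic number to shift (hypothetical parameter).
    mistuning_pct is the shift as a percentage of that harmonic's frequency.
    """
    freqs = []
    for n in range(1, n_components + 1):
        f = n * f0
        if n == mistuned_index:
            f *= 1.0 + mistuning_pct / 100.0
        freqs.append(f)
    return freqs

def synthesize(freqs, duration_s, sample_rate=16000):
    """Sum equal-amplitude sinusoids (sine phase assumed) into a list of samples."""
    n_samples = int(round(duration_s * sample_rate))
    return [
        sum(math.sin(2.0 * math.pi * f * t / sample_rate) for f in freqs)
        for t in range(n_samples)
    ]

# 12-component complex, 200 Hz fundamental, 3rd harmonic mistuned upward by 2%.
freqs = component_frequencies(200.0, 12, mistuned_index=3, mistuning_pct=2.0)
samples = synthesize(freqs, duration_s=0.41)
```

With these values the third partial moves from 600 Hz to 612 Hz while the other eleven partials stay at integer multiples of 200 Hz.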

3.
When a partial of a periodic complex is mistuned, its change in pitch is greater than expected. Two experiments examined whether these partial-pitch shifts are related to the computation of global pitch. In experiment 1, stimuli were either harmonic or frequency-shifted (25% of F0) complexes. One partial was mistuned by ±4% and played with leading and lagging portions of 500 ms each, relative to the other components (1 s), in both monaural and dichotic contexts. Subjects indicated whether the mistuned partial was higher or lower in pitch when concurrent with the other components. Responses were positively correlated with the direction of mistuning in all conditions. In experiment 2, stimuli from each condition were compared with synchronous equivalents. Subjects matched a pure tone to the pitch of the mistuned partial (component 4). The results showed that partial-pitch shifts are not reduced in size by asynchrony. Similar asynchronies are known to produce a near-exclusion of a mistuned partial from the global-pitch computation. This mismatch indicates that global and partial pitch are derived from different processes. The similarity of the partial-pitch shifts observed for harmonic and frequency-shifted stimuli suggests that they arise from a grouping mechanism that is sensitive to spectral regularity.

4.
It is unclear whether the perceptual segregation of a mistuned harmonic from a periodic complex tone depends specifically on harmonic relations between the other components. A procedure used previously for harmonic complexes [W. M. Hartmann et al., J. Acoust. Soc. Am. 88, 1712-1724 (1990)] was adapted and extended to regular inharmonic complexes. On each trial, subjects heard a 12-component complex followed by a pure tone in a continuous loop. In experiment 1, a mistuning of ±4% was applied to one of the components 2-11. The complex was either harmonic, frequency shifted, or spectrally stretched. Subjects adjusted the pure tone to match the pitch of the mistuned component. Near matches were taken to indicate segregation, and were almost as frequent in the inharmonic conditions as in the harmonic case. Also, small but consistent mismatches, pitch shifts, were found in all conditions. These were similar in direction and size to earlier findings for harmonic complexes. Using a range of mistunings, experiment 2 showed that the segregation of components from regular inharmonic complexes could be sensitive to mistunings of 1.5% or less. These findings are consistent with the proposal that aspects of spectral regularity other than harmonic relations can also influence auditory grouping.

5.
Mistuning one partial of a complex harmonic tone makes that partial easier to hear as a tone separate from the complex. At the same time, two pitch shifts may be observed. First, the low pitch of the complex is shifted in the direction of the mistuning, as if it were "pulled" by the partial. Second, the mistuning of the partial is perceptually exaggerated, as if the pitch of the partial were "pushed" away from the harmonic series defined by the complex. This paper shows how the latter effect can emerge within a hypothetical neural circuit. The circuit involves a gating neuron fed by three pathways, one direct and excitatory and the other two delayed and inhibitory. The neuron responds to any excitatory input spike unless it is accompanied by an inhibitory input spike on either delayed input, thus acting as a kind of "anticoincidence counter." The first delay is fixed and tuned to the period of the background harmonic complex. Its purpose is to weaken correlates of in-tune components and allow the mistuned partial to stand out. The second delay is variable and used to estimate the period of the mistuned partial, by searching for a minimum output as a function of delay. With an appropriate choice of parameters, the estimate is subject to shifts that are of the same sign as the mistuning and that peak at about 4% mistuning and decrease beyond, as observed experimentally.
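
The gating rule in the abstract above can be sketched on idealized spike trains. This is a toy illustration under strong assumptions, not the paper's model: spikes are integer time steps, both inhibitory pathways are assumed to carry delayed copies of the same input train, and the function names and the period values (100 and 26 steps) are hypothetical.

```python
def periodic_train(period, n_steps):
    """Spike times of an idealized, perfectly periodic train (integer time steps)."""
    return list(range(0, n_steps, period))

def gate(spikes, delays):
    """Anticoincidence gate: pass a spike at time t unless a delayed copy
    of the input (delayed by any value in `delays`) also arrives at t."""
    spike_set = set(spikes)
    return [t for t in spikes if not any((t - d) in spike_set for d in delays)]

def estimate_period(spikes, fixed_delay, candidate_delays):
    """Search the variable delay for the minimum gate output (spike count)."""
    return min(candidate_delays, key=lambda d: len(gate(spikes, [fixed_delay, d])))

# Background complex with a period of 100 steps plus a partial with a period of 26 steps.
background = periodic_train(100, 2000)
partial = periodic_train(26, 2000)
spikes = sorted(set(background) | set(partial))

background_only = gate(background, [100])           # only the first spike survives
estimate = estimate_period(spikes, 100, range(20, 33))
```

With noise-free trains like these the search simply recovers the partial's true period, so the sketch shows the gating and minimum-search steps only. Reproducing the pitch shifts reported in the abstract would depend on the "appropriate choice of parameters" it mentions, which the abstract does not specify.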

6.
To clarify the role of formant frequency in the perception of pitch in whispering, we conducted a preliminary experiment to determine (1) whether speakers change their pitch during whispering; (2) whether listeners can perceive differences in pitch; and (3) what the acoustical features are when speakers change their pitch. The listening test of whispered Japanese speech demonstrates that one can determine the perceived pitch of vowel /a/ as ordinary, high, or low. Acoustical analysis revealed that the perception of pitch corresponds to some formant frequencies. Further data with synthesized whispered voice are necessary to confirm the importance of the formant frequencies in detail for perceived pitch of whispered vowels.

7.
The identification of front vowels was studied in normal-hearing listeners using stimuli whose spectra had been altered to approximate the spectrum of vowels processed by auditory filters similar to those that might accompany sensorineural hearing loss. In the first experiment, front vowels were identified with greater than 95% accuracy when the first formant was specified in a normal manner and the higher frequency formants were represented by a broad, flat spectral plateau ranging from approximately 1600 to 3500 Hz. In the second experiment, the bandwidth of the first formant was systematically widened for stimuli with already flattened higher frequency formants. Normal vowel identification was preserved until the first formant was widened to six times its normal bandwidth. These results may account for the coexistence of abnormal vowel masking patterns (indicating flattened auditory spectra) and normal vowel recognition.

8.
The goal of this study was to measure the ability of normal-hearing listeners to discriminate formant frequency for vowels in isolation and in sentences at three signal levels. Results showed significant elevation in formant thresholds as formant frequency and linguistic context increased. The signal level indicated a rollover effect, especially for F2, in which formant thresholds at 85 dB SPL were lower than thresholds at 70 or 100 dB SPL in both isolated vowels and sentences. This rollover level effect could be due to reduced frequency selectivity and forward/backward masking in sentences at high signal levels for normal-hearing listeners.

9.
An experiment was carried out, investigating the relationship between the just noticeable difference of fundamental frequency (jnd f0) of three stationary synthesized vowel sounds in noise and the signal-to-noise ratio. To this end the S/N ratios were measured at which listeners could just discriminate a series of changes in f0 in the range from 10% to 0.5%. Similar measurements were obtained for pulse trains and for pure tones as a reference for the results. A measure of S/N ratio based on an approximation of the critical bandwidth appeared to provide a fairly good predictor of the masked threshold of each signal, measured in a second experiment. Using this measure, it was found that a given change in the fundamental of a pulse train could be discriminated at a lower S/N ratio than in a pure tone with a frequency equal to that fundamental. The results for the vowel sounds were found to be in between those for a low-frequency pure tone and those for a pulse train. Owing to the signal-generation method (viz., changing f0 by changing the sampling frequency), three cues could in principle be used to discriminate a change in the fundamental of a vowel: a change in the residue pitch, a change in the pitch of a single prominent harmonic, or a change in the spectral envelope of the signal. It can be inferred from the results that the subjects used that particular cue which yielded best performance. Which cue was optimal depended not only on the vowel but also on f0 and on the presented change in f0. (ABSTRACT TRUNCATED AT 250 WORDS)

10.
The goal of this study was to measure the ability of adult hearing-impaired listeners to discriminate formant frequency for vowels in isolation, syllables, and sentences. Vowel formant discrimination for F1 and F2 for the vowels /ɪ, ɛ, æ/ was measured. Four experimental factors were manipulated including linguistic context (isolated vowels, syllables, and sentences), signal level (70 and 95 dB SPL), formant frequency, and cognitive load. A complex identification task was added to the formant discrimination task only for sentences to assess effects of cognitive load. Results showed significant elevation in formant thresholds as formant frequency and linguistic context increased. Higher signal level also elevated formant thresholds primarily for F2. However, no effect of the additional identification task on the formant discrimination was observed. In comparable conditions, these hearing-impaired listeners had elevated thresholds for formant discrimination compared to young normal-hearing listeners primarily for F2. Altogether, poorer performance for formant discrimination for these adult hearing-impaired listeners was mainly caused by hearing loss rather than cognitive difficulty for tasks implemented in this study.

11.
We have recorded the responses of fibers in the cochlear nerve and cells in the cochlear nucleus of the anesthetized guinea pig to synthetic vowels [i], [a], and [u] at 60 and 80 dB SPL. Histograms synchronized to the pitch period of the vowel were constructed, and locking of the discharge to individual harmonics was estimated from these by Fourier transformation. In cochlear nerve fibers from the guinea pig, the responses were similar in all respects to those previously described for the cat. In particular, the average-localized-synchronized-rate functions (ALSR), computed from pooled data, had well-defined peaks corresponding to the formant frequencies of the three vowels at both sound levels. Analysis of the components dominating the discharge could also be used to determine the voice pitch and the frequency of the first formants. We have computed similar population measures over a sample of primarylike cochlear nucleus neurons. In these primarylike cochlear nucleus cell responses, the locking to the higher-frequency formants of the vowels is weaker than in the nerve. This results in a severe degradation of the peaks in the ALSR function at the second and third formant frequencies, at least for [i] and [u]. This result is somewhat surprising in light of the reports that primarylike cochlear nucleus cells phase-lock as well as cochlear nerve fibers do.
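
The step of estimating locking to individual harmonics from a pitch-period histogram can be sketched with a plain discrete Fourier transform. This is a generic illustration, not the authors' analysis code: the function name, the 64-bin histogram, and the normalization by total spike count (which gives a 0-to-1 synchronization index) are assumptions made for the example.

```python
import cmath
import math

def harmonic_synchrony(period_histogram, max_harmonic):
    """Synchronization of spiking to each harmonic of the pitch period.

    period_histogram holds spike counts per bin across one pitch period.
    Returns {harmonic number: |DFT coefficient| / total spike count}.
    """
    n_bins = len(period_histogram)
    total = sum(period_histogram)
    synchrony = {}
    for k in range(1, max_harmonic + 1):
        coeff = sum(
            count * cmath.exp(-2j * math.pi * k * b / n_bins)
            for b, count in enumerate(period_histogram)
        )
        synchrony[k] = abs(coeff) / total
    return synchrony

# Synthetic histogram whose firing rate is fully modulated at the 3rd harmonic.
histogram = [10.0 * (1.0 + math.cos(2.0 * math.pi * 3 * b / 64)) for b in range(64)]
sync = harmonic_synchrony(histogram, max_harmonic=5)
```

For this synthetic input the index is 0.5 at harmonic 3 and essentially zero at the other harmonics, which is the pattern expected when the discharge is dominated by a single component.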

12.
The ability to separate simultaneous auditory objects is crucial to infant auditory development. Music in particular relies on the ability to separate musical notes, chords, and melodic lines. Little research addresses how infants process simultaneous sounds. The present study used a conditioned head-turn procedure to examine whether 6-month-old infants are able to discriminate a complex tone (240 Hz, 500 ms, six harmonics in random phase with a 6 dB roll-off per octave) from a version with the third harmonic mistuned. Adults perceive such stimuli as containing two auditory objects, one with the pitch of the mistuned harmonic and the other with pitch corresponding to the fundamental of the complex tone. Adult thresholds were between 1% and 2% mistuning. Infants performed above chance levels for 8%, 6%, and 4% mistunings, with no significant difference between conditions. However, performance was not significantly different from chance for 2% mistuning and significantly worse for 2% compared to all larger mistunings. These results indicate that 6-month-old infants are sensitive to violations of harmonic structure and suggest that they are able to separate two simultaneously sounding objects.

13.
The ability of baboons to discriminate changes in the formant structures of a synthetic baboon grunt call and an acoustically similar human vowel (/ɛ/) was examined to determine how comparable baboons are to humans in discriminating small changes in vowel sounds, and whether or not any species-specific advantage in discriminability might exist when baboons discriminate their own vocalizations. Baboons were trained to press and hold down a lever to produce a pulsed train of a standard sound (e.g., /ɛ/ or a baboon grunt call), and to release the lever only when a variant of the sound occurred. Synthetic variants of each sound had the same first and third through fifth formants (F1 and F3-5), but varied in the location of the second formant (F2). Thresholds for F2 frequency changes were 55 and 67 Hz for the grunt and vowel stimuli, respectively, and were not statistically different from one another. Baboons discriminated changes in vowel formant structures comparable to those discriminated by humans. No distinct advantages in discrimination performances were observed when the baboons discriminated these synthetic grunt vocalizations.

14.
Imitations of ten synthesized vowels were recorded from 33 speakers including men, women, and children. The first three formant frequencies of the imitations were estimated from spectrograms and considered with respect to developmental patterns in vowel formant structure, uniform scale factors for vowel normalization, and formant variability. Strong linear effects were observed in the group data for imitations of most of the English vowels studied, and straight lines passing through the origin provided a satisfactory fit to linear F1-F2 plots of the English vowel data. Logarithmic transformations of the formant frequencies helped substantially to equalize the dispersion of the group data for different vowels, but formant scale factors were observed to vary somewhat with both formant number and vowel identity. Variability of formant frequency was least for F1 (s.d. of 60 Hz or less for English vowels of adult males) and about equal for F2 and F3 (s.d. of 100 Hz or less for English vowels of adult males).

15.
Ciocca and Darwin [V. Ciocca and C. J. Darwin, J. Acoust. Soc. Am. 105, 2421-2430 (1999)] reported that the shift in residue pitch caused by mistuning a single harmonic (the fourth out of the first 12) was the same when the mistuned harmonic was presented after the remainder of the complex as when it was simultaneous, even though subjects were asked to ignore the pure-tone percept. The present study tried to replicate this result, and investigated the role of the presence of the nominally mistuned harmonic in the matching sound. Subjects adjusted a "matching" sound so that its pitch equaled that of a subsequent 90-ms complex tone (12 harmonics of a 155-Hz F0), whose mistuned (±3%) third harmonic was presented either simultaneously with or after the remaining harmonics. In experiment 1, the matching sound was a harmonic complex whose third harmonic was either present or absent. In experiments 2A and 2B, the target and matching sound had nonoverlapping spectra. Pitch shifts were reduced both when the mistuned component was nonsimultaneous, and when the third harmonic was absent in the matching sound. The results indicate a shorter than originally estimated time window for obligatory integration of nonsimultaneous components into a virtual pitch.

16.
The purpose of this study was to determine whether children give more perceptual weight than do adults to dynamic spectral cues versus static cues. Listeners were ten children between the ages of 3;8 and 4;1 (mean 3;11) and ten adults between the ages of 23;10 and 32;0 (mean 25;11). Three experimental stimulus conditions were presented, with each containing stimuli of 30 ms duration. The first experimental condition consisted of unchanging formant onset frequencies ranging in value from frequencies for [i] to those for [a], appropriate for a bilabial stop consonant context. The second two experimental conditions consisted of either an [i] or [a] onset frequency with a 25 ms portion of a formant transition whose trajectory was toward one of a series of target frequencies ranging from those for [i] to those for [a]. Results indicated that the children attended differently than the adults on both the [a] and [i] formant onset frequency cue to identify the vowels. The adults gave more equal weight to the [i]-onset and [a]-onset dynamic cues as reflected in category boundaries than the children did. For the [i]-onset condition, children were not as confident compared to adults in vowel perception, as reflected in slope analyses.

17.
Peta White. Journal of Voice, 1999, 13(4): 570-582
High-pitched productions present difficulties in formant frequency analysis due to wide harmonic spacing and poorly defined formants. As a consequence, there is little reliable data regarding children's spoken or sung vowel formants. Twenty-nine 11-year-old Swedish children were asked to produce 4 sustained spoken and sung vowels. In order to circumvent the problem of wide harmonic spacing, F1 and F2 measurements were taken from vowels produced with a sweeping F0. Experienced choir singers were selected as subjects in order to minimize the larynx height adjustments associated with pitch variation in less skilled subjects. Results showed significantly higher formant frequencies for speech than for singing. Formants were consistently higher in girls than in boys, suggesting longer vocal tracts in these preadolescent boys. Furthermore, formant scaling demonstrated vowel-dependent differences between boys and girls, suggesting non-uniform differences in male and female vocal tract dimensions. These vowel-dependent sex differences were not consistent with adult data.

18.
19.
This study represents a first step toward understanding the contribution formant frequency makes to the perception of female voice categories. The effects of formant frequency and pitch on the perception of voice category were examined by constructing a perceptual study that used two sets of synthetic stimuli at various pitches throughout the female singing range. The first set was designed to test the effects of systematically varying formants 1 through 4. The second set was designed to test the relative effects of lower frequency formants (F1 and F2) versus higher frequency formants (F3 and F4) through construction of mixed stimuli. Generally, as the frequencies of all four formants decreased, perception of soprano voice category decreased at all but the highest pitch, A5. However, perception of soprano voice category also increased as a function of pitch. Listeners appeared to need agreement between all four formants to perceive voice categories. When upper and lower formants were inconsistent in frequency, listeners were unable to judge voice category, but they could use the inconsistent patterns to form perceptions about degree of jaw opening.

20.
The effects of age, sex, and vocal tract configuration on the glottal excitation signal in speech are only partially understood, yet understanding these effects is important for both recognition and synthesis of speech as well as for medical purposes. In this paper, three acoustic measures related to the voice source are analyzed for five vowels from 3145 CVC utterances spoken by 335 talkers (8-39 years old) from the CID database [Miller et al., Proceedings of ICASSP, 1996, Vol. 2, pp. 849-852]. The measures are: the fundamental frequency (F0), the difference between the "corrected" (denoted by an asterisk) first two spectral harmonic magnitudes, H1* - H2* (related to the open quotient), and the difference between the "corrected" magnitudes of the first spectral harmonic and that of the third formant peak, H1* - A3* (related to source spectral tilt). The correction refers to compensating for the influence of formant frequencies on spectral magnitude estimation. Experimental results show that the three acoustic measures are dependent to varying degrees on age and vowel. Age dependencies are more prominent for male talkers, while vowel dependencies are more prominent for female talkers, suggesting a greater vocal tract-source interaction. All talkers show a dependency of F0 on sex and on F3, and of H1* - A3* on vowel type. For low-pitched talkers (F0 ≤ 175 Hz), H1* - H2* is positively correlated with F0 while for high-pitched talkers, H1* - H2* is dependent on F1 or vowel height. For high-pitched talkers there were no significant sex dependencies of H1* - H2* and H1* - A3*. The statistical significance of these results is shown.
