Similar Documents
20 similar documents retrieved.
1.
The harmonic sieve has been proposed as a mechanism for excluding extraneous frequency components from the estimate of the pitch of a complex sound. The experiments reported here examine whether a harmonic sieve could also determine whether a particular harmonic contributes to the phonetic quality of a vowel. Mistuning a harmonic in the first formant region of vowels from an /I/-/e/ continuum gave shifts in the phoneme boundary that could be explained by (i) phase effects for small amounts of mistuning and (ii) a harmonic sievelike grouping mechanism for larger amounts of mistuning. Similar grouping criteria to those suggested for pitch may operate for the determination of first formant frequency in voiced speech.
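As a concrete illustration of the sieve idea, the sketch below (Python/NumPy) passes a spectral component only if it lies within a fractional tolerance of some harmonic slot of a candidate F0. The 3% tolerance and the component values are illustrative assumptions, not figures from the study.

```python
import numpy as np

def harmonic_sieve(component_freqs, f0, tolerance=0.03):
    """Flag each spectral component as passing the sieve or not: a component
    passes if it lies within +/- tolerance (as a fraction of the harmonic
    frequency) of some integer multiple of f0."""
    freqs = np.asarray(component_freqs, dtype=float)
    harmonic_numbers = np.maximum(np.round(freqs / f0), 1)  # nearest harmonic slot
    slot_freqs = harmonic_numbers * f0
    mistuning = np.abs(freqs - slot_freqs) / slot_freqs
    return mistuning <= tolerance

# Example: an 8% mistuned 4th harmonic of a 125-Hz voice is excluded.
components = [125.0, 250.0, 375.0, 540.0, 625.0]   # 500 Hz mistuned to 540 Hz
print(harmonic_sieve(components, f0=125.0))        # -> [ True  True  True False  True]
```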

2.
Four experiments explored the relative contributions of spectral content and phonetic labeling in effects of context on vowel perception. Two 10-step series of CVC syllables ([bVb] and [dVd]) varying acoustically in F2 midpoint frequency and varying perceptually in vowel height from [delta] to [epsilon] were synthesized. In a forced-choice identification task, listeners more often labeled vowels as [delta] in [dVd] context than in [bVb] context. To examine whether spectral content predicts this effect, nonspeech-speech hybrid series were created by appending 70-ms sine-wave glides following the trajectory of CVC F2's to 60-ms members of a steady-state vowel series varying in F2 frequency. In addition, a second hybrid series was created by appending constant-frequency sine-wave tones equivalent in frequency to CVC F2 onset/offset frequencies. Vowels flanked by frequency-modulated glides or steady-state tones modeling [dVd] were more often labeled as [delta] than were the same vowels surrounded by nonspeech modeling [bVb]. These results suggest that spectral content is important in understanding vowel context effects. A final experiment tested whether spectral content can modulate vowel perception when phonetic labeling remains intact. Voiceless consonants, with lower-amplitude more-diffuse spectra, were found to exert less of an influence on vowel perception than do their voiced counterparts. The data are discussed in terms of a general perceptual account of context effects in speech perception.
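The nonspeech flanks described above are straightforward to synthesize; the sketch below (Python/NumPy) produces a 70-ms sine-wave glide with a linear frequency trajectory. The endpoint frequencies are chosen only for illustration.

```python
import numpy as np

def sine_glide(f_start, f_end, dur=0.070, fs=44100):
    """70-ms sine-wave glide whose frequency moves linearly from f_start to
    f_end; the phase is the running integral of instantaneous frequency."""
    t = np.arange(int(dur * fs)) / fs
    freq = f_start + (f_end - f_start) * t / dur
    phase = 2 * np.pi * np.cumsum(freq) / fs
    return np.sin(phase)

# e.g., a nonspeech flank tracing a falling F2 trajectory: 1700 -> 1500 Hz
flank = sine_glide(1700.0, 1500.0)
```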

3.
This paper addresses the problem of automatic identification of vowels uttered in isolation by female and child speakers. In this case, the magnitude spectrum of voiced vowels is sparsely sampled since only frequencies at integer multiples of F0 are significant. This impacts negatively on the performance of vowel identification techniques that either ignore pitch or rely on global shape models. A new pitch-dependent approach to vowel identification is proposed that emerges from the concept of timbre and that defines perceptual spectral clusters (PSC) of harmonic partials. A representative set of static PSC-related features are estimated and their performance is evaluated in automatic classification tests using the Mahalanobis distance. Linear prediction features and Mel-frequency cepstral coefficients (MFCCs) are used as a reference, and a database of five (Portuguese) natural vowel sounds uttered by 44 speakers (including 27 child speakers) is used for training and testing the Gaussian models. Results indicate that PSC features perform better than plain linear prediction features, but perform slightly worse than MFCC features. However, PSC features have the potential to take full advantage of the pitch structure of voiced vowels, namely in the analysis of concurrent voices, or by using pitch as a normalization parameter.
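A minimal sketch of the classification back end described above: one Gaussian model per vowel (mean and covariance of its feature vectors) and assignment by smallest Mahalanobis distance. Feature extraction (PSC, LPC, or MFCC) is outside the sketch.

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

def fit_class_models(X, y):
    """Per-class mean and inverse covariance for the Gaussian models."""
    models = {}
    for label in np.unique(y):
        Xc = X[y == label]
        mu = Xc.mean(axis=0)
        cov_inv = np.linalg.pinv(np.cov(Xc, rowvar=False))
        models[label] = (mu, cov_inv)
    return models

def classify(x, models):
    """Assign feature vector x to the class at smallest Mahalanobis distance."""
    return min(models, key=lambda lbl: mahalanobis(x, models[lbl][0], models[lbl][1]))
```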

4.
Vowel matching and identification experiments were carried out to investigate the perceptual contribution of harmonics in the first formant region of synthetic front vowels. In the first experiment, listeners selected the best phonetic match from an F1 continuum, for reference stimuli in which a band of two to five adjacent harmonics of equal intensity replaced the F1 peak; F1 values of best matches were near the frequency of the highest frequency harmonic in the band. Attenuation of the highest harmonic in the band resulted in lower F1 matches. Attenuation of the lowest harmonic had no significant effects, except in the case of a 2-harmonic band, where higher F1 matches were selected. A second experiment investigated the shifts in matched F1 resulting from an intensity increment to either one of a pair of harmonics in the F1 region. These shifts were relatively invariant over different harmonic frequencies and proportional to the fundamental frequency. A third experiment used a vowel identification task to determine phoneme boundaries on an F1 continuum. These boundaries were not substantially altered when the stimuli comprised only the two most prominent harmonics in the F1 region, or these plus either the higher or lower frequency subset of the remaining F1 harmonics. The results are consistent with an estimation procedure for the F1 peak which assigns greatest weight to the two most prominent harmonics in the first formant region.
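As a rough rendering of the estimation procedure suggested by these results, the sketch below averages the two most prominent harmonics in the F1 region, weighted by linear amplitude. The weighting rule and the example levels are assumptions for illustration, not the authors' model.

```python
import numpy as np

def f1_from_harmonics(freqs, amps_db):
    """Sketch of an F1 estimate consistent with the matching data: an
    amplitude-weighted average of the two most prominent harmonics in the
    F1 region (linear-amplitude weighting is an assumption)."""
    freqs, amps_db = np.asarray(freqs, float), np.asarray(amps_db, float)
    top2 = np.argsort(amps_db)[-2:]          # the two strongest harmonics
    w = 10 ** (amps_db[top2] / 20)           # dB -> linear amplitude
    return float(np.sum(freqs[top2] * w) / np.sum(w))

# Harmonics of a 125-Hz voice in the F1 region, with levels in dB:
print(f1_from_harmonics([375, 500, 625, 750], [55, 62, 60, 48]))  # ~555 Hz
```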

5.
This study assessed the acoustic and perceptual effect of noise on vowel and stop-consonant spectra. Multi-talker babble and speech-shaped noise were added to vowel and stop stimuli at -5 to +10 dB S/N, and the effect of noise was quantified in terms of (a) spectral envelope differences between the noisy and clean spectra in three frequency bands, (b) presence of reliable F1 and F2 information in noise, and (c) changes in burst frequency and slope. Acoustic analysis indicated that F1 was detected more reliably than F2 and the largest spectral envelope differences between the noisy and clean vowel spectra occurred in the mid-frequency band. This finding suggests that in extremely noisy conditions listeners must be relying on relatively accurate F1 frequency information along with partial F2 information to identify vowels. Stop consonant recognition remained high even at -5 dB despite the disruption of burst cues due to additive noise, suggesting that listeners must be relying on other cues, perhaps formant transitions, to identify stops.
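Constructing such stimuli reduces to scaling the masker against the clean token; a minimal sketch, assuming S/N is defined from mean signal and noise powers:

```python
import numpy as np

def add_noise_at_snr(signal, noise, snr_db):
    """Scale `noise` so the mix has the requested S/N in dB, then add it.
    `noise` must be at least as long as `signal`."""
    noise = noise[: len(signal)]
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    target_p_noise = p_signal / (10 ** (snr_db / 10))
    return signal + noise * np.sqrt(target_p_noise / p_noise)

# Example: a vowel token mixed with speech-shaped noise at -5 dB S/N.
# noisy = add_noise_at_snr(vowel, speech_shaped_noise, snr_db=-5)
```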

6.
Two signal-processing algorithms, designed to separate the voiced speech of two talkers speaking simultaneously at similar intensities in a single channel, were compared and evaluated. Both algorithms exploit the harmonic structure of voiced speech and require a difference in fundamental frequency (F0) between the voices to operate successfully. One attenuates the interfering voice by filtering the cepstrum of the combined signal. The other uses the method of harmonic selection [T. W. Parsons, J. Acoust. Soc. Am. 60, 911-918 (1976)] to resynthesize the target voice from fragmentary spectral information. Two perceptual evaluations were carried out. One involved the separation of pairs of vowels synthesized on static F0's; the other involved the recovery of consonant-vowel (CV) words masked by a synthesized vowel. Normal-hearing listeners and four listeners with moderate-to-severe, bilateral, symmetrical, sensorineural hearing impairments were tested. All listeners showed increased accuracy of identification when the target voice was enhanced by processing. The vowel-identification data show that intelligibility enhancement is possible over a range of F0 separations between the target and interfering voice. The recovery of CV words demonstrates that the processing is valid not only for spectrally static vowels but also for less intense time-varying voiced consonants. The results for the impaired listeners suggest that the algorithms may be applicable as components of a noise-reduction system in future digital signal-processing hearing aids. The vowel-separation test, and subjective listening, suggest that harmonic selection, which is the more computationally expensive method, produces the more effective voice separation.
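The cepstral-filtering approach can be caricatured in a few lines: the interfering voice's harmonic ripple concentrates near quefrency 1/F0, so zeroing cepstral bins there attenuates that voice. The sketch below is a deliberately crude single-frame version; it notches only the first rahmonic and assumes the interferer's F0 is known. The published algorithms are considerably more elaborate.

```python
import numpy as np

def cepstral_notch(frame, fs, interferer_f0, half_width=3):
    """Crude single-frame sketch of cepstral filtering: zero cepstral bins
    around quefrency 1/F0 (and their mirror), then resynthesize with the
    original phase. Higher rahmonics (2/F0, 3/F0, ...) are ignored here."""
    n = len(frame)
    spectrum = np.fft.rfft(frame * np.hanning(n))
    log_mag = np.log(np.abs(spectrum) + 1e-12)
    cepstrum = np.fft.irfft(log_mag, n=n)
    q0_bin = int(round(fs / interferer_f0))        # 1/F0, in samples
    idx = np.arange(q0_bin - half_width, q0_bin + half_width + 1)
    cepstrum[idx] = 0.0
    cepstrum[n - idx] = 0.0                        # symmetric counterparts
    filtered_log_mag = np.fft.rfft(cepstrum).real
    return np.fft.irfft(np.exp(filtered_log_mag) * np.exp(1j * np.angle(spectrum)), n=n)
```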

7.
F1 structure provides information for final-consonant voicing
Previous research has shown that F1 offset frequencies are generally lower for vowels preceding voiced consonants than for vowels preceding voiceless consonants. Furthermore, it has been shown that listeners use these differences in offset frequency in making judgments about final-consonant voicing. A recent production study [W. Summers, J. Acoust. Soc. Am. 82, 847-863 (1987)] reported that F1 frequency differences due to postvocalic voicing are not limited to the final transition or offset region of the preceding vowel. Vowels preceding voiced consonants showed lower F1 onset frequencies and lower F1 steady-state frequencies than vowels preceding voiceless consonants. The present study examined whether F1 frequency differences in the initial transition and steady-state regions of preceding vowels affect final-consonant voicing judgments in perception. The results suggest that F1 frequency differences in these early portions of preceding vowels do, in fact, influence listeners' judgments of postvocalic consonantal voicing.

8.
Previous studies suggest that speakers are systematically inaccurate, or biased, when imitating self-produced vowels. The direction of these biases in formant space and their variation may offer clues about the organization of the vowel perceptual space. To examine these patterns, three male speakers were asked to imitate 45 self-produced vowels that were systematically distributed in F1/F2 space. All three speakers showed imitation bias, and the bias magnitudes were significantly larger than those predicted by a model of articulatory noise. Each speaker showed a different pattern of bias directions, but the pattern was unrelated to the locations of prototypical vowels produced by that speaker. However, there were substantial quantitative regularities: (1) the distributions of imitation variability and bias magnitudes were similar for all speakers, (2) the imitation variability was independent of the bias magnitudes, and (3) the imitation variability (a production measure) was commensurate with the formant discrimination limen (a perceptual measure). These results indicate that there is additive Gaussian noise in the imitation process that independently affects each formant and that there are speaker-dependent and potentially nonlinguistic biases in vowel perception and production.

9.
The perception of subphonemic differences between vowels was investigated using multidimensional scaling techniques. Three experiments were conducted with natural-sounding synthetic stimuli generated by linear predictive coding (LPC) formant synthesizers. In the first experiment, vowel sets near the pairs /i/-/ɪ/, /ɛ/-/æ/, or /u/-/ʊ/ were synthesized containing 11 vowels each. Listeners judged the dissimilarities between all pairs of vowels within a set several times. These perceptual differences were mapped into distances between the vowels in an n-dimensional space using two-way multidimensional scaling. Results for each vowel set showed that the physical stimulus space, which was specified by the two parameters F1 and F2, was always mapped into a two-dimensional perceptual space. The best metric for modeling the perceptual distances was the Euclidean distance between F1 and F2 in barks. The second experiment investigated the perception of the same vowels from the first experiment, but embedded in a consonantal context. Following the same procedures as experiment 1, listeners' perception of the /bV/ dissimilarities was not different from their perception of the isolated vowel dissimilarities. The third experiment investigated dissimilarity judgments for the three vowels /æ/-/ɑ/-/ʌ/ located symmetrically in the F1 × F2 vowel space. While the perceptual space was again two dimensional, the influence of phonetic identity on vowel difference judgments was observed. Implications for determining metrics for subphonemic vowel differences using multidimensional scaling are discussed.
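The winning metric is easy to state in code. The sketch below uses Traunmüller's (1990) approximation of the bark scale (the abstract does not name a particular Hz-to-bark formula, so this choice is an assumption), and the example formant values are illustrative.

```python
import numpy as np

def hz_to_bark(f):
    """Traunmueller (1990) approximation of the bark scale."""
    f = np.asarray(f, dtype=float)
    return 26.81 * f / (1960.0 + f) - 0.53

def vowel_distance(v1, v2):
    """Euclidean distance between (F1, F2) pairs, in barks: the metric that
    best modeled perceived vowel dissimilarity in this study."""
    return float(np.linalg.norm(hz_to_bark(np.asarray(v1)) - hz_to_bark(np.asarray(v2))))

# Example: an /i/-like token (280, 2250 Hz) vs an /I/-like token (400, 1900 Hz).
print(vowel_distance((280.0, 2250.0), (400.0, 1900.0)))   # about 1.6 barks
```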

10.
A quantitative perceptual model of human vowel recognition based upon psychoacoustic and speech perception data is described. At an intermediate auditory stage of processing, the specific bark difference level of the model represents the pattern of peripheral auditory excitation as the distance in critical bands (barks) between neighboring formants and between the fundamental frequency (F0) and first formant (F1). At a higher, phonetic stage of processing, represented by the critical bark difference level of the model, the transformed vowels may be dichotomously classified based on whether the difference between formants in each dimension falls within or exceeds the critical distance of 3 bark for the spectral center of gravity effect [Chistovich et al., Hear. Res. 1, 185-195 (1979)]. Vowel transformations and classifications correspond well to several major phonetic dimensions and features by which vowels are perceived and traditionally classified. The F1-F0 dimension represents vowel height, and high vowels have F1-F0 differences within 3 bark. The F3-F2 dimension corresponds to vowel place of articulation, and front vowels have F3-F2 differences of less than 3 bark. As an inherent, speaker-independent normalization procedure, the model provides excellent vowel clustering while it greatly reduces between-speaker variability. It offers robust normalization through feature classification because gross binary categorization allows for considerable acoustic variability. There was generally less formant and bark difference variability for closely spaced formants than for widely spaced formants. These findings agree with independently observed perceptual results and support Stevens' quantal theory of vowel production and perceptual constraints on production predicted from the critical bark difference level of the model.
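A minimal sketch of the critical-bark-difference classification: compute F1-F0 and F3-F2 in barks and compare each against the 3-bark criterion. The Hz-to-bark formula (Traunmüller 1990) and the example formants are assumptions for illustration.

```python
def hz_to_bark(f):                                  # Traunmueller (1990)
    return 26.81 * f / (1960.0 + f) - 0.53

def classify_vowel(f0, f1, f2, f3, critical=3.0):
    """Binary features from the critical bark differences described above:
    F1-F0 under 3 bark -> high vowel; F3-F2 under 3 bark -> front vowel."""
    return {"high": hz_to_bark(f1) - hz_to_bark(f0) < critical,
            "front": hz_to_bark(f3) - hz_to_bark(f2) < critical}

# A male /i/: F0 = 120, F1 = 280, F2 = 2250, F3 = 2900 Hz -> high, front.
print(classify_vowel(120.0, 280.0, 2250.0, 2900.0))
```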

11.
Previous work has demonstrated that normal-hearing individuals use fine-grained phonetic variation, such as formant movement and duration, when recognizing English vowels. The present study investigated whether these cues are used by adult postlingually deafened cochlear implant users, and normal-hearing individuals listening to noise-vocoder simulations of cochlear implant processing. In Experiment 1, subjects gave forced-choice identification judgments for recordings of vowels that were signal processed to remove formant movement and/or equate vowel duration. In Experiment 2, a goodness-optimization procedure was used to create perceptual vowel space maps (i.e., best exemplars within a vowel quadrilateral) that included F1, F2, formant movement, and duration. The results demonstrated that both cochlear implant users and normal-hearing individuals use formant movement and duration cues when recognizing English vowels. Moreover, both listener groups used these cues to the same extent, suggesting that postlingually deafened cochlear implant users have category representations for vowels that are similar to those of normal-hearing individuals.
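Noise-vocoder simulations of implant processing typically follow the channel-vocoder recipe sketched below (Python/SciPy): band-split the signal, extract each band's envelope, and re-impose the envelopes on noise. Channel count, band edges, and filter choices here are generic assumptions, not the parameters of this study.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(x, fs, n_channels=8, lo=100.0, hi=7000.0):
    """Minimal noise-vocoder sketch: split the signal into log-spaced bands,
    take each band's Hilbert envelope, and use it to modulate band-limited
    noise; sum the modulated bands."""
    edges = np.geomspace(lo, hi, n_channels + 1)
    noise = np.random.default_rng(0).standard_normal(len(x))
    out = np.zeros(len(x))
    for f1, f2 in zip(edges[:-1], edges[1:]):
        sos = butter(4, [f1, f2], btype="bandpass", fs=fs, output="sos")
        env = np.abs(hilbert(sosfiltfilt(sos, x)))   # band envelope
        out += env * sosfiltfilt(sos, noise)         # envelope on noise band
    return out
```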

12.
Research on the perception of vowels in the last several years has given rise to new conceptions of vowels as articulatory, acoustic, and perceptual events. Starting from a "simple" target model in which vowels were characterized articulatorily as static vocal tract shapes and acoustically as points in a first and second formant (F1/F2) vowel space, this paper briefly traces the evolution of vowel theory in the 1970s and 1980s in two directions. (1) Elaborated target models represent vowels as target zones in perceptual spaces whose dimensions are specified as formant ratios. These models have been developed primarily to account for perceivers' solution of the "speaker normalization" problem. (2) Dynamic specification models emphasize the importance of formant trajectory patterns in specifying vowel identity. These models deal primarily with the problem of "target undershoot" associated with the coarticulation of vowels with consonants in natural speech and with the issue of "vowel-inherent spectral change" or diphthongization of English vowels. Perceptual studies are summarized that motivate these theoretical developments.

13.
Auditory-perceptual interpretation of the vowel

14.
Acoustic and perceptual similarities between Japanese and American English (AE) vowels were investigated in two studies. In study 1, a series of discriminant analyses were performed to determine acoustic similarities between Japanese and AE vowels, each spoken by four native male speakers using F1, F2, and vocalic duration as input parameters. In study 2, the Japanese vowels were presented to native AE listeners in a perceptual assimilation task, in which the listeners categorized each Japanese vowel token as most similar to an AE category and rated its goodness as an exemplar of the chosen AE category. Results showed that the majority of AE listeners assimilated all Japanese vowels into long AE categories, apparently ignoring temporal differences between 1- and 2-mora Japanese vowels. In addition, not all perceptual assimilation patterns reflected context-specific spectral similarity patterns established by discriminant analysis. It was hypothesized that this incongruity between acoustic and perceptual similarity may be due to differences in distributional characteristics of native and non-native vowel categories that affect the listeners' perceptual judgments.
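The discriminant analyses of study 1 can be sketched with scikit-learn. The training data below are synthetic stand-ins (the means and spreads are invented for illustration), with feature rows of [F1, F2, duration] as in the study.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Hypothetical AE training tokens: rows are [F1 (Hz), F2 (Hz), duration (ms)].
ae_means = {"i": (300, 2300, 240), "I": (430, 2000, 160), "u": (330, 900, 230)}
X = np.vstack([rng.normal(m, (25, 80, 20), size=(20, 3)) for m in ae_means.values()])
y = np.repeat(list(ae_means.keys()), 20)

lda = LinearDiscriminantAnalysis().fit(X, y)
# A (hypothetical) Japanese token is assigned to its acoustically nearest
# AE category: /i/-like formants but a short, 1-mora duration.
print(lda.predict([[320.0, 2200.0, 120.0]]))
```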

15.
Speech coding in the auditory nerve: V. Vowels in background noise
Responses of auditory-nerve fibers to steady-state, two-formant vowels in low-pass background noise (S/N = 10 dB) were obtained in anesthetized cats. For fibers over a wide range of characteristic frequencies (CFs), the peaks in discharge rate at the onset of the vowel stimuli were nearly eliminated in the presence of noise. In contrast, strong effects of noise on fine time patterns of discharge were limited to CF regions that are far from the formant frequencies. One effect is a reduction in the amplitude of the response component at the fundamental frequency in the high-CF regions and for CFs between F1 and F2 when the formants are widely separated. A reduction in the amplitude of the response components at the formant frequencies, with concomitant increase in components near CF or low-frequency components occurs in CF regions where the signal-to-noise ratio is particularly low. The processing schemes that were effective for estimating the formant frequencies and fundamental frequency of vowels in quiet generally remain adequate in moderate-level background noise. Overall, the discharge patterns contain many cues for distinctions among the vowel stimuli, so that the central processor should be able to identify the different vowels, consistent with psychophysical performance at moderate signal-to-noise ratios.

16.
Simultaneous tones that are harmonically related tend to be grouped perceptually to form a unitary auditory image. A partial that is mistuned stands out from the other tones, and harmonic complexes with different fundamental frequencies can readily be perceived as separate auditory objects. These phenomena are evidence for the strong role of harmonicity in perceptual grouping and segregation of sounds. This study measured the discriminability of harmonicity directly. In a two interval, two alternative forced-choice (2I2AFC) paradigm, the listener chose which of two sounds, signal or foil, was composed of tones that more closely matched an exact harmonic relationship. In one experiment, the signal was varied from perfectly harmonic to highly inharmonic by adding frequency perturbation to each component. The foil always had 100% perturbation. Group mean performance decreased from greater than 90% correct for 0% signal perturbation to near chance for 80% signal perturbation. In the second experiment, adding a masker presented simultaneously with the signals and foils disrupted harmonicity. Both monaural and dichotic conditions were tested. Signal level was varied relative to masker level to obtain psychometric functions from which slopes and midpoints were estimated. Dichotic presentation of these audible stimuli improved performance by 3-10 dB, due primarily to a release from "informational masking" by the perceptual segregation of the signal from the masker.
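Signals and foils of this kind can be synthesized as below. The specific perturbation rule here (each component jittered uniformly by up to perturbation × F0/2) is an assumption standing in for the paper's exact procedure.

```python
import numpy as np

def harmonic_complex(f0, n_harmonics, perturbation, dur=0.5, fs=44100, seed=None):
    """Sum of equal-amplitude tones at f0, 2*f0, ..., each jittered by up to
    +/- perturbation * (f0 / 2), so 100% perturbation lets a component
    wander halfway toward its neighbors."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(dur * fs)) / fs
    freqs = f0 * np.arange(1, n_harmonics + 1)
    freqs = freqs + rng.uniform(-1, 1, n_harmonics) * perturbation * (f0 / 2)
    phases = rng.uniform(0, 2 * np.pi, n_harmonics)
    return sum(np.sin(2 * np.pi * f * t + p) for f, p in zip(freqs, phases))

signal = harmonic_complex(200.0, 10, perturbation=0.0, seed=1)  # harmonic
foil = harmonic_complex(200.0, 10, perturbation=1.0, seed=2)    # 100% perturbed
```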

17.
Level and Center Frequency of the Singer's Formant
Johan Sundberg, Journal of Voice, 2001, 15(2): 176-186
The "singer's formant" is a prominent spectrum envelope peak near 3 kHz, typically found in voiced sounds produced by classical operatic singers. According to previous research, it is mainly a resonatory phenomenon produced by a clustering of formants 3, 4, and 5. Its level relative to the first formant peak varies depending on vowel, vocal loudness, and other factors. Its dependence on vowel formant frequencies is examined. Applying the acoustic theory of voice production, the level difference between the first and third formant is calculated for some standard vowels. The difference between observed and calculated levels is determined for various voices. It is found to vary considerably more between vowels sung by professional singers than by untrained voices. The center frequency of the singer's formant as determined from long-term spectrum analysis of commercial recordings is found to increase slightly with the pitch range of the voice classification.

18.
Questions exist as to the intelligibility of vowels sung at extremely high fundamental frequencies and, especially, when the fundamental frequency (F0) produced is above the region where the first vowel formant (F1) would normally occur. Can such vowels be correctly identified and, if so, does context provide the necessary information or are acoustical elements also operative? To this end, 18 professional singers (5 males and 13 females) were recorded when singing 3 isolated vowels at high and low pitches at both loud and soft levels. Aural-perceptual studies employing four types of auditors were carried out to determine the identity of these vowels, and the nature of the confusions with other vowels. Subsequent acoustical analysis focused on the actual fundamental frequencies sung plus those defining the first 2 vowel formants. It was found that F0 change had a profound effect on vowel perception; one of the more important observations was that the target tended to shift toward vowels with an F1 just above the sung frequency.

19.
Previous research has indicated that frequency discrimination performance is poorer for tones presented near the sharp spectral edge of a low-pass noise than for tones presented near the edge of a high-pass noise, or for tones in the same low-pass noise with high-pass noise added [Emmerich et al., J. Acoust. Soc. Am. 80, 1668-1672 (1986)]. The present study extends these findings in order to investigate how the steepness of the spectral edges of low- and high-pass maskers influences the discriminability of tones presented near these edges. Frequency discrimination was measured in each of three high- and low-pass noise backgrounds (which differed in the steepness of their filter skirts). The following results were obtained: (1) In the low-pass noise background, frequency discrimination performance improved as the filter skirt became more gradual; (2) in the high-pass noise background, performance first improved and then became poorer as the filter skirt became shallower; and (3) performance in low-pass noise was poorer than that in high-pass noise for the two steepest slopes employed (96 and 72 dB/oct) but not for the shallower slope (36 dB/oct). Results are discussed in the context of lateral suppression and edge pitch effects, and of a trade-off between possible edge effects and masking.
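Maskers with controlled skirt steepness can be synthesized directly in the frequency domain; a minimal sketch, with the cutoff frequency chosen only for illustration (the slopes match those named above):

```python
import numpy as np

def sloped_lowpass_noise(fs, dur, cutoff, slope_db_per_oct, seed=None):
    """White noise shaped so the magnitude spectrum is flat up to `cutoff`
    and then falls at `slope_db_per_oct` (e.g., 96, 72, or 36 dB/oct)."""
    rng = np.random.default_rng(seed)
    n = int(dur * fs)
    spec = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    gain_db = np.zeros_like(freqs)
    above = freqs > cutoff
    gain_db[above] = -slope_db_per_oct * np.log2(freqs[above] / cutoff)
    return np.fft.irfft(spec * 10 ** (gain_db / 20), n=n)

# e.g., a low-pass masker with the shallowest skirt: 36 dB/oct above 1 kHz
masker = sloped_lowpass_noise(44100, 0.5, cutoff=1000.0, slope_db_per_oct=36, seed=0)
```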

20.
The spectral envelope is a major determinant of the perceptual identity of many classes of sound including speech. When sounds are transmitted from the source to the listener, the spectral envelope is invariably and diversely distorted, by factors such as room reverberation. Perceptual compensation for spectral-envelope distortion was investigated here. Carrier sounds were distorted by spectral envelope difference filters whose frequency response is the spectral envelope of one vowel minus the spectral envelope of another. The filter /I/ minus /e/ and its inverse were used. Subjects identified a test sound that followed the carrier. The test sound was drawn from an /Itch/ to /etch/ continuum. Perceptual compensation produces a phoneme boundary difference between /I/ minus /e/ and its inverse. Carriers were the phrase "the next word is" spoken by the same (male) speaker as the test sounds, signal-correlated noise derived from this phrase, the same phrase spoken by a female speaker, male and female versions played backwards, and a repeated end-point vowel. The carrier and test were presented to the same ear, to different ears, and from different apparent directions (by varying interaural time delay). The results show that compensation is unlike peripheral phenomena, such as adaptation, and unlike phonetic perceptual phenomena. The evidence favors a central, auditory mechanism.
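A spectral-envelope difference filter can be approximated from recordings of the two vowels. The sketch below uses averaged FFT log-magnitude spectra as a crude envelope (the study's envelope estimation method is not specified in the abstract, so this is an assumption) and applies the gain zero-phase.

```python
import numpy as np

def envelope_db(x, n_fft=1024):
    """Average log-magnitude spectrum as a crude spectral envelope."""
    frames = np.lib.stride_tricks.sliding_window_view(x, n_fft)[:: n_fft // 2]
    mags = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1))
    return 20 * np.log10(mags.mean(axis=0) + 1e-12)

def difference_filter_gain(vowel_a, vowel_b, n_fft=1024):
    """Gain (dB) of the 'A minus B' spectral-envelope difference filter,
    e.g., /I/ minus /e/; the inverse filter is just the negated gain."""
    return envelope_db(vowel_a, n_fft) - envelope_db(vowel_b, n_fft)

def apply_gain(x, gain_db):
    """Impose the gain on the rFFT bins of x (zero-phase sketch)."""
    spec = np.fft.rfft(x)
    gain = 10 ** (np.interp(np.linspace(0, 1, len(spec)),
                            np.linspace(0, 1, len(gain_db)), gain_db) / 20)
    return np.fft.irfft(spec * gain, n=len(x))
```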
