Similar articles
 Found 20 similar articles (search time: 0 ms)
1.
2.
3.
Research on the perception of vowels in the last several years has given rise to new conceptions of vowels as articulatory, acoustic, and perceptual events. Starting from a "simple" target model in which vowels were characterized articulatorily as static vocal tract shapes and acoustically as points in a first and second formant (F1/F2) vowel space, this paper briefly traces the evolution of vowel theory in the 1970s and 1980s in two directions. (1) Elaborated target models represent vowels as target zones in perceptual spaces whose dimensions are specified as formant ratios. These models have been developed primarily to account for perceivers' solution of the "speaker normalization" problem. (2) Dynamic specification models emphasize the importance of formant trajectory patterns in specifying vowel identity. These models deal primarily with the problem of "target undershoot" associated with the coarticulation of vowels with consonants in natural speech and with the issue of "vowel-inherent spectral change" or diphthongization of English vowels. Perceptual studies are summarized that motivate these theoretical developments.

4.
5.
The harmonic sieve has been proposed as a mechanism for excluding extraneous frequency components from the estimate of the pitch of a complex sound. The experiments reported here examine whether a harmonic sieve could also determine whether a particular harmonic contributes to the phonetic quality of a vowel. Mistuning a harmonic in the first formant region of vowels from an /I/-/e/ continuum gave shifts in the phoneme boundary that could be explained by (i) phase effects for small amounts of mistuning and (ii) a harmonic sievelike grouping mechanism for larger amounts of mistuning. Similar grouping criteria to those suggested for pitch may operate for the determination of first formant frequency in voiced speech.
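As a rough illustration of the grouping idea in the abstract above, a harmonic sieve can be sketched as a filter that keeps only components lying close to integer multiples of the fundamental. The function name, the 3% tolerance, and the example frequencies are assumptions made for this sketch; the abstract does not give the sieve width used in the experiments.

```python
def harmonic_sieve(components_hz, f0_hz, tolerance=0.03):
    """Split components into (accepted, rejected) by a sieve on multiples of F0.

    tolerance is the allowed relative mistuning; 3% is an assumed value.
    """
    accepted, rejected = [], []
    for f in components_hz:
        n = max(1, round(f / f0_hz))                  # nearest harmonic number
        mistuning = abs(f - n * f0_hz) / (n * f0_hz)  # relative deviation from that harmonic
        (accepted if mistuning <= tolerance else rejected).append(f)
    return accepted, rejected


# Hypothetical vowel with F0 = 125 Hz whose 4th harmonic is mistuned from 500 to 540 Hz.
components = [125.0, 250.0, 375.0, 540.0, 625.0]
accepted, rejected = harmonic_sieve(components, f0_hz=125.0)
```

Under this sketch the mistuned component (8% off its harmonic) falls through the sieve and would not contribute to a first-formant estimate, while the in-tune harmonics are kept.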

6.
Spectral sharpness and vowel dissimilarity   (cited by: 1; self-citations: 0; citations by others: 1)
The effect of sharpening or smoothing the spectral envelopes of synthetic vowel-like sounds on the dissimilarities perceived among these sounds was investigated by means of triadic comparisons. When a spectral envelope (dB on a log-frequency scale) is considered the sum of a series of sinusoidal spectral modulations (or ripples) of different densities (the ripple spectrum), spectral sharpening or smoothing can be described as an amplification or attenuation of a part of the original ripple spectrum. For a set of nine sounds comprising different degrees of spectral sharpening of a single vowel, the perceived dissimilarities were found to be dominated by a specific part of the ripple spectrum, i.e., by spectral modulations with a density of about 2 ripples/oct. The possible role of lateral suppression in relation to this dominant region is discussed. For a set of 18 sounds comprising six vowels, each in three different versions (sharpened, normal, or smoothed), the dissimilarities were found to be determined mainly by the global shape of the spectral envelopes, i.e., by spectral modulations up to about 1.5-2 ripples/oct. Details of the spectral envelope (including the region of 2 ripples/oct where lateral suppression is effective) appear to be of minor influence on vowel dissimilarities.
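The ripple-spectrum view in the abstract above can be sketched numerically: sample the envelope (in dB) uniformly along a log-frequency axis, take its discrete Fourier transform, and scale the components whose density falls in a chosen band. The band limits, the gain, the function names, and the periodic (DFT) treatment of the envelope are all assumptions of this sketch, not details taken from the paper.

```python
import cmath
import math


def ripple_spectrum(envelope_db):
    """Plain DFT of an envelope sampled uniformly on a log-frequency axis."""
    n = len(envelope_db)
    return [sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(envelope_db))
            for k in range(n)]


def reshape_envelope(envelope_db, octaves, gain, lo=1.5, hi=2.5):
    """Scale ripple components with density in [lo, hi] ripples/oct by `gain`.

    gain > 1 sharpens the envelope, gain < 1 smooths it (assumed band limits).
    """
    n = len(envelope_db)
    spec = ripple_spectrum(envelope_db)
    for k in range(1, n):
        density = min(k, n - k) / octaves  # bin k holds k cycles over the whole span
        if lo <= density <= hi:
            spec[k] *= gain
    # Inverse DFT; the input is real and the scaling is symmetric, so keep the real part.
    return [sum(c * cmath.exp(2j * math.pi * k * i / n) for k, c in enumerate(spec)).real / n
            for i in range(n)]


# Hypothetical envelope: a single 10 dB ripple at 2 ripples/oct across 4 octaves.
octaves, n = 4, 64
envelope = [10.0 * math.cos(2 * math.pi * 2.0 * (i * octaves / n)) for i in range(n)]
sharpened = reshape_envelope(envelope, octaves, gain=2.0)
unchanged = reshape_envelope(envelope, octaves, gain=1.0)
```

With a gain of 2 the 2 ripples/oct component doubles in depth, which corresponds to "sharpening" in the abstract's terms; a gain of 1 returns the original envelope.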

7.
Acoustic measurements were conducted to determine the degree to which vowel duration, closure duration, and their ratio distinguish voicing of word-final stop consonants across variations in sentential and phonetic environments. Subjects read CVC test words containing three different vowels and ending in stops of three different places of articulation. The test words were produced either in nonphrase-final or phrase-final position and in several local phonetic environments within each of these sentence positions. Our measurements revealed that vowel duration most consistently distinguished voicing categories for the test words. Closure duration failed to consistently distinguish voicing categories across the contextual variables manipulated, as did the ratio of closure and vowel duration. Our results suggest that vowel duration is the most reliable correlate of voicing for word-final stops in connected speech.

8.
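A new method for vowel detection in speech signals   (cited by: 1; self-citations: 0; citations by others: 1)
Qu Dan (屈丹), Wang Bingxi (王炳锡). 《声学学报》 (Acta Acustica), 2003, 28(1): 17-20
A new method for detecting vowels in speech signals is presented. The method is based on spectral analysis of the acoustic speech signal, requires no training process, and is applicable to multiple languages. The algorithm was tested on English, Chinese, Japanese, and French speech from the OGI multilingual speech corpus. An improved algorithm is also given, together with the detection rates of both algorithms. The experimental results show that the method is an effective way to detect vowels.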

9.
The first three formant frequencies for 778 steady-state tokens of 30 nonretroflex vowel types uttered by a female speaker are found to lie close to a piecewise-planar surface (expressed numerically as 0.634 F1 + 0.603 F2 - 0.485 F3 - 366 = 0 for F2 greater than 0.027 F1 + 1692, and 0.686 F1 - 0.528 F2 - 0.501 F3 + 1569 = 0 otherwise). The rms distance of the vowels from this surface is only 86 Hz. The intersection between the two planes is a line of nearly constant F2, corresponding closely to the F2 of a uniform vocal tract of the same length as our speaker's. The piecewise-planar representation also suggests a way to test the hypotheses of uniform and nonuniform formant-frequency scaling between speakers.
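The two plane equations quoted in the abstract above can be evaluated directly. The sketch below selects the plane from the stated F2 condition and returns the signed perpendicular distance of a formant triple from it. Treating "distance from the surface" as distance to the selected plane is an assumption of this sketch (it is only approximate near the seam between the planes), and the function name is illustrative.

```python
import math

# Coefficients (a, b, c, d) of a*F1 + b*F2 + c*F3 + d = 0, as quoted in the abstract (Hz).
PLANE_HIGH_F2 = (0.634, 0.603, -0.485, -366.0)  # applies when F2 > 0.027*F1 + 1692
PLANE_LOW_F2 = (0.686, -0.528, -0.501, 1569.0)  # applies otherwise


def surface_distance(f1, f2, f3):
    """Signed perpendicular distance (Hz) from (F1, F2, F3) to the applicable plane."""
    a, b, c, d = PLANE_HIGH_F2 if f2 > 0.027 * f1 + 1692 else PLANE_LOW_F2
    # The quoted normals are already close to unit length; normalize anyway.
    return (a * f1 + b * f2 + c * f3 + d) / math.sqrt(a * a + b * b + c * c)


# Hypothetical token, not taken from the paper's data.
d_example = surface_distance(300.0, 2500.0, 3000.0)
```

Applied to a set of measured tokens, the root mean square of these distances would give a figure comparable in kind to the 86 Hz reported, under the assumption stated above.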

10.
11.
Vowel identity correlates well with the shape of the transfer function of the vocal tract, in particular the position of the first two or three formant peaks. However, in voiced speech the transfer function is sampled at multiples of the fundamental frequency (F0), and the short-term spectrum contains peaks at those frequencies, rather than at formants. It is not clear how the auditory system estimates the original spectral envelope from the vowel waveform. Cochlear excitation patterns, for example, resolve harmonics in the low-frequency region and their shape varies strongly with F0. The problem cannot be cured by smoothing: lag-domain components of the spectral envelope are aliased and cause F0-dependent distortion. The problem is severe at high F0's where the spectral envelope is severely undersampled. This paper treats vowel identification as a process of pattern recognition with missing data. Matching is restricted to available data, and missing data are ignored using an F0-dependent weighting function that emphasizes regions near harmonics. The model is presented in two versions: a frequency-domain version based on short-term spectra, or tonotopic excitation patterns, and a time-domain version based on autocorrelation functions. It accounts for the relative F0-independency observed in vowel identification.
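A minimal sketch of the frequency-domain idea in the abstract above: compare an observed spectrum with stored envelope templates, but weight each frequency by how close it lies to a harmonic of F0, so that regions between harmonics (the "missing data") barely count. The Gaussian weight shape, its 20 Hz width, the squared-error distance, and the toy Gaussian-bump envelopes are assumptions of this sketch; the abstract does not specify the actual weighting function.

```python
import math


def harmonic_weights(freqs_hz, f0_hz, width_hz=20.0):
    """Weights near 1 at multiples of F0, near 0 between them (Gaussian shape assumed)."""
    weights = []
    for f in freqs_hz:
        n = max(1, round(f / f0_hz))  # nearest harmonic number
        offset = f - n * f0_hz
        weights.append(math.exp(-0.5 * (offset / width_hz) ** 2))
    return weights


def weighted_distance(observed, template, weights):
    """Weighted mean squared difference, dominated by points near harmonics."""
    num = sum(w * (o - t) ** 2 for o, t, w in zip(observed, template, weights))
    return num / sum(weights)


def identify(observed, templates, freqs_hz, f0_hz):
    """Return the name of the template with the smallest weighted distance."""
    w = harmonic_weights(freqs_hz, f0_hz)
    return min(templates, key=lambda name: weighted_distance(observed, templates[name], w))


def envelope(formants_hz, freqs_hz, peak_db=30.0, bw_hz=150.0):
    """Toy envelope: one Gaussian bump per formant (illustrative only)."""
    return [sum(peak_db * math.exp(-0.5 * ((f - fm) / bw_hz) ** 2) for fm in formants_hz)
            for f in freqs_hz]


freqs = [100.0 * i for i in range(1, 31)]  # 100 to 3000 Hz
templates = {"a": envelope((700.0, 1200.0), freqs), "i": envelope((300.0, 2300.0), freqs)}
f0 = 200.0
# Observed spectrum: template "a" sampled at harmonics of F0, empty (0 dB) in between.
observed = [v if f % f0 == 0 else 0.0 for f, v in zip(freqs, templates["a"])]
best = identify(observed, templates, freqs, f0)
```

Because the valleys between harmonics receive almost no weight, the undersampled spectrum still matches its source template rather than being penalized for the gaps.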

12.
A hybrid PARAFAC and principal-component model of tongue configuration in vowel production is presented, using a corpus of German vowels in multiple consonant contexts (fleshpoint data for seven speakers at two speech rates from electromagnetic articulography). The PARAFAC approach is attractive for explicitly separating speaker-independent and speaker-dependent effects within a parsimonious linear model. However, it proved impossible to derive a PARAFAC solution of the complete dataset (estimated to require three factors) due to complexities introduced by the consonant contexts. Accordingly, the final model was derived in two stages. First, a two-factor PARAFAC model was extracted. This succeeded; the result was treated as the basic vowel model. Second, the PARAFAC model error was subjected to a separate principal-component analysis for each subject. This revealed a further articulatory component mainly involving tongue-blade activity associated with the flanking consonants. However, the subject-specific details of the mapping from raw fleshpoint coordinates to this component were too complex to be consistent with the PARAFAC framework. The final model explained over 90% of the variance and gave a succinct and physiologically plausible articulatory representation of the German vowel space.

13.
Vowel durations typically vary according to both intrinsic (segment-specific) and extrinsic (contextual) specifications. It can be argued that such variations are due to both predisposition and cognitive learning. The present report utilizes acoustic phonetic measurements from Swedish and American children aged 24 and 30 months to investigate the hypothesis that default behaviors may precede language-specific learning effects. The predicted pattern is the presence of final consonant voicing effects in both languages as a default, and subsequent learning of intrinsic effects most notably in the Swedish children. The data, from 443 monosyllabic tokens containing high-front vowels and final stop consonants, are analyzed in statistical frameworks at group and individual levels. The results confirm that Swedish children show an early tendency to vary vowel durations according to final consonant voicing, followed only six months later by a stage at which the intrinsic influence of vowel identity grows relatively more robust. Measures of vowel formant structure from selected 30-month-old children also revealed a tendency for children of this age to focus on particular acoustic contrasts. In conclusion, the results indicate that early acquisition of vowel specifications involves an interaction between language-specific features and articulatory predispositions associated with phonetic context.

14.
This paper presents a review of recent experiments in vowel perception done at the Pavlov Institute of Physiology in Leningrad. The data concern three topics: experimental procedures appropriate for the study of phonetic quality perception, processing of the auditory spectral shape of a vowel, and processing of the auditory dynamic spectrum of a vowel.

15.
This study was designed to examine the role of duration in vowel perception by testing listeners on the identification of CVC syllables generated at different durations. Test signals consisted of synthesized versions of 300 utterances selected from a large, multitalker database of /hVd/ syllables [Hillenbrand et al., J. Acoust. Soc. Am. 97, 3099-3111 (1995)]. Four versions of each utterance were synthesized: (1) an original duration set (vowel duration matched to the original utterance), (2) a neutral duration set (duration fixed at 272 ms, the grand mean across all vowels), (3) a short duration set (duration fixed at 144 ms, two standard deviations below the mean), and (4) a long duration set (duration fixed at 400 ms, two standard deviations above the mean). Experiment 1 used a formant synthesizer, while a second experiment was an exact replication using a sinusoidal synthesis method that represented the original vowel spectrum more precisely than the formant synthesizer. Findings included (1) duration had a small overall effect on vowel identity since the great majority of signals were identified correctly at their original durations and at all three altered durations; (2) despite the relatively small average effect of duration, some vowels, especially [see text] and [see text], were significantly affected by duration; (3) some vowel contrasts that differ systematically in duration, such as [see text], and [see text], were minimally affected by duration; (4) a simple pattern recognition model appears to be capable of accounting for several features of the listening test results, especially the greater influence of duration on some vowels than others; and (5) because a formant synthesizer does an imperfect job of representing the fine details of the original vowel spectrum, results using the formant-synthesized signals led to a slight overestimate of the role of duration in vowel recognition, especially for the shortened vowels.
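The three fixed durations in the abstract above imply the standard deviation of vowel duration in the source database, which the abstract does not state explicitly. The short calculation below derives it from the quoted figures; the variable and function names are illustrative.

```python
mean_ms = 272.0   # neutral duration: grand mean across all vowels
short_ms = 144.0  # stated as two standard deviations below the mean
long_ms = 400.0   # stated as two standard deviations above the mean

# The short and long conditions lie four standard deviations apart.
sd_ms = (long_ms - short_ms) / 4.0


def duration_z(duration_ms):
    """Duration expressed in standard deviations from the grand mean."""
    return (duration_ms - mean_ms) / sd_ms
```

The implied standard deviation is 64 ms, which is consistent with both quoted offsets: 272 - 2 × 64 = 144 and 272 + 2 × 64 = 400.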

16.
Four experiments explored the relative contributions of spectral content and phonetic labeling in effects of context on vowel perception. Two 10-step series of CVC syllables ([bVb] and [dVd]) varying acoustically in F2 midpoint frequency and varying perceptually in vowel height from [delta] to [epsilon] were synthesized. In a forced-choice identification task, listeners more often labeled vowels as [delta] in [dVd] context than in [bVb] context. To examine whether spectral content predicts this effect, nonspeech-speech hybrid series were created by appending 70-ms sine-wave glides following the trajectory of CVC F2's to 60-ms members of a steady-state vowel series varying in F2 frequency. In addition, a second hybrid series was created by appending constant-frequency sine-wave tones equivalent in frequency to CVC F2 onset/offset frequencies. Vowels flanked by frequency-modulated glides or steady-state tones modeling [dVd] were more often labeled as [delta] than were the same vowels surrounded by nonspeech modeling [bVb]. These results suggest that spectral content is important in understanding vowel context effects. A final experiment tested whether spectral content can modulate vowel perception when phonetic labeling remains intact. Voiceless consonants, with lower-amplitude more-diffuse spectra, were found to exert less of an influence on vowel perception than do their voiced counterparts. The data are discussed in terms of a general perceptual account of context effects in speech perception.

17.
A geometrical method for computing overlap between vowel distributions, the spectral overlap assessment metric (SOAM), is applied to an investigation of spectral (F1, F2) and temporal (duration) relations in three different types of systems: one claimed to exhibit primary quality (American English), one primary quantity (Jamaican Creole), and one about which no claims have been made (Jamaican English). Shapes, orientations, and proximities of pairs of vowel distributions involved in phonological oppositions are modeled using best-fit ellipses (in F1 x F2 space) and ellipsoids (F1 x F2 x duration). Overlap fractions computed for each pair suggest that spectral and temporal features interact differently in the three varieties and oppositions. Under a two-dimensional analysis, two of three American English oppositions show no overlap; the third shows partial overlap. All Jamaican Creole oppositions exhibit complete overlap when F1 and F2 alone are modeled, but no or partial overlap with incorporation of a factor for duration. Jamaican English three-dimensional overlap fractions resemble two-dimensional results for American English. A multidimensional analysis tool such as SOAM appears to provide a more objective basis for simultaneously investigating spectral and temporal relations within vowel systems. Normalization methods and the SOAM method are described in an extended appendix.

18.
A question concerning the status of the Peterson-Barney vowel formant data is raised. Two machine-readable copies of the data were located, compared, and found to contain minor discrepancies. These discrepancies were resolved by comparison with a listing of the original data.

19.
20.
The identification of front vowels was studied in normal-hearing listeners using stimuli whose spectra had been altered to approximate the spectrum of vowels processed by auditory filters similar to those that might accompany sensorineural hearing loss. In the first experiment, front vowels were identified with greater than 95% accuracy when the first formant was specified in a normal manner and the higher frequency formants were represented by a broad, flat spectral plateau ranging from approximately 1600 to 3500 Hz. In the second experiment, the bandwidth of the first formant was systematically widened for stimuli with already flattened higher frequency formants. Normal vowel identification was preserved until the first formant was widened to six times its normal bandwidth. These results may account for the coexistence of abnormal vowel masking patterns (indicating flattened auditory spectra) and normal vowel recognition.
