Similar Documents (20 results)
1.
An analysis is presented of regional variation patterns in the vowel system of Standard Dutch as spoken in the Netherlands (Northern Standard Dutch) and Flanders (Southern Standard Dutch). The speech material consisted of read monosyllabic utterances in a neutral consonantal context (i.e., /sVs/). The analyses were based on measurements of the duration and the frequencies of the first two formants of the vowel tokens. Recordings were made for 80 Dutch and 80 Flemish speakers, who were stratified for the social factors gender and region. These 160 speakers were distributed across four regions in the Netherlands and four regions in Flanders. Differences between regional varieties were found for duration, steady-state formant frequencies, and spectral change of formant frequencies. Variation patterns in the spectral characteristics of the long mid vowels /e o ø/ and the diphthongal vowels /ɛi œy ɔu/ were in accordance with a recent theory of pronunciation change in Standard Dutch. Finally, it was found that regional information was present in the steady-state formant frequency measurements of vowels produced by professional language users.

2.
Vowel perception strategies were assessed for two "average" and one "star" single-channel 3M/House and three "average" and one "star" Nucleus 22-channel cochlear implant patients and six normal-hearing control subjects. All subjects were tested by computer with real and synthetic speech versions of [symbol: see text], presented randomly. Duration, fundamental frequency, and first, second, and third formant frequency cues to the vowels were systematically manipulated. Results showed high accuracy for the normal-hearing subjects in all conditions but that of the first formant alone. "Average" single-channel patients classified only real speech [hVd] syllables differently from synthetic steady-state syllables. The "star" single-channel patient identified the vowels at much better than chance levels, with a results pattern suggesting effective use of first formant and duration information. Both "star" and "average" Nucleus users showed similar response patterns, performing better than chance in most conditions and identifying the vowels using duration and some frequency information from all three formants.

3.
4.
Formant discrimination for isolated vowels presented in noise was investigated for normal-hearing listeners. Discrimination thresholds for F1 and F2, for the seven American English vowels /i, ɪ, ɛ, æ, [symbol see text], a, u/, were measured under two types of noise, long-term speech-shaped noise (LTSS) and multitalker babble, and also under quiet listening conditions. Signal-to-noise ratios (SNR) varied from -4 to +4 dB in steps of 2 dB. All three factors, formant frequency, signal-to-noise ratio, and noise type, had significant effects on vowel formant discrimination. Significant interactions among the three factors showed that threshold-frequency functions depended on SNR and noise type. The thresholds at the lowest levels of SNR were elevated by a factor of about 3 compared to those in quiet. The masking functions (threshold vs SNR) were well described by a negative exponential over F1 and F2 for both LTSS and babble noise. Speech-shaped noise was a slightly more effective masker than multitalker babble, presumably reflecting small benefits (1.5 dB) due to the temporal variation of the babble.
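A negative-exponential masking function of the kind described above can be fit by log-linearization once the quiet-threshold asymptote is subtracted. A sketch on made-up numbers (the SNR grid matches the abstract; the threshold values, the asymptote `t_quiet`, and the model parameters are invented for illustration, not taken from the paper):

```python
import numpy as np

# Hypothetical formant-discrimination thresholds (Hz) on a -4..+4 dB SNR grid;
# t_quiet is an assumed quiet-listening asymptote.
snr = np.array([-4.0, -2.0, 0.0, 2.0, 4.0])
t_quiet = 20.0
thresh = t_quiet + 40.0 * np.exp(-0.5 * snr)   # model: T(SNR) = T_q + a*exp(-b*SNR)

# Subtract the asymptote, take logs, and fit a straight line:
#   log(T - T_q) = log(a) - b * SNR
slope, intercept = np.polyfit(snr, np.log(thresh - t_quiet), 1)
a_hat, b_hat = np.exp(intercept), -slope       # recovers a = 40, b = 0.5 here
```

On real data `t_quiet` is itself unknown and the fit would be done with nonlinear least squares (e.g., `scipy.optimize.curve_fit`); the log-linear trick only works once the asymptote is fixed.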

5.
This study was designed to examine the role of duration in vowel perception by testing listeners on the identification of CVC syllables generated at different durations. Test signals consisted of synthesized versions of 300 utterances selected from a large, multitalker database of /hVd/ syllables [Hillenbrand et al., J. Acoust. Soc. Am. 97, 3099-3111 (1995)]. Four versions of each utterance were synthesized: (1) an original duration set (vowel duration matched to the original utterance), (2) a neutral duration set (duration fixed at 272 ms, the grand mean across all vowels), (3) a short duration set (duration fixed at 144 ms, two standard deviations below the mean), and (4) a long duration set (duration fixed at 400 ms, two standard deviations above the mean). Experiment 1 used a formant synthesizer, while a second experiment was an exact replication using a sinusoidal synthesis method that represented the original vowel spectrum more precisely than the formant synthesizer. Findings included (1) duration had a small overall effect on vowel identity, since the great majority of signals were identified correctly at their original durations and at all three altered durations; (2) despite the relatively small average effect of duration, some vowels, especially [see text] and [see text], were significantly affected by duration; (3) some vowel contrasts that differ systematically in duration, such as [see text] and [see text], were minimally affected by duration; (4) a simple pattern recognition model appears to be capable of accounting for several features of the listening test results, especially the greater influence of duration on some vowels than others; and (5) because a formant synthesizer does an imperfect job of representing the fine details of the original vowel spectrum, results using the formant-synthesized signals led to a slight overestimate of the role of duration in vowel recognition, especially for the shortened vowels.

6.
A stratified random sample of 20 males and 20 females matched for physiologic factors and cultural-linguistic markers was examined to determine differences in formant frequencies during prolongation of three vowels: [a], [i], and [u]. The ethnic and gender breakdown comprised four sets of 5 male and 5 female subjects: Caucasian and African American speakers of Standard American English, native Hindi Indian speakers, and native Mandarin Chinese speakers. Acoustic measures were analyzed using the Computerized Speech Lab (4300B), from which formant histories were extracted from a 200-ms sample of each vowel token to obtain first formant (F1), second formant (F2), and third formant (F3) frequencies. Significant group differences for the main effect of culture and race were found. For the main effect of gender, sexual dimorphism in vowel formants was evidenced for all cultures and races across all three vowels. The acoustic differences found are attributed to cultural-linguistic factors.

7.
The goal of this study is to investigate coarticulatory resistance and aggressiveness for the jaw in Catalan consonants and vowels and, more specifically, for the alveolopalatal nasal /ɲ/ and for dark /l/, for which there are little or no data on jaw position and coarticulation. Jaw movement data for symmetrical vowel-consonant-vowel sequences with the consonants /p, n, l, s, ʃ, ɲ, k/ and the vowels /i, a, u/ were recorded from three Catalan speakers with a midsagittal magnetometer. Data reveal that jaw height is greater for /s, ʃ/ than for /p, ɲ/, which is greater than for /n, l, k/ during the consonant, and for /i, u/ than for /a/ during the vowel. Differences in coarticulatory variability among consonants and vowels are inversely related to differences in jaw height, i.e., fricatives and high vowels are most resistant, and /n, l, k/ and the low vowel are least resistant. Moreover, coarticulation-resistant phonetic segments exert more prominent effects and, thus, are more aggressive than segments specified for a lower degree of coarticulatory resistance. Data are discussed in the light of the degree of articulatory constraint model of coarticulation.

8.
An evaluation of vowel normalization procedures for the purpose of studying language variation is presented. The procedures were compared on how effectively they (a) preserve phonemic information, (b) preserve information about the talker's regional background (or sociolinguistic information), and (c) minimize anatomical/physiological variation in acoustic representations of vowels. Recordings were made for 80 female talkers and 80 male talkers of Dutch. These talkers were stratified according to their gender and regional background. The normalization procedures were applied to measurements of the fundamental frequency and the first three formant frequencies for a large set of vowel tokens. The normalization procedures were evaluated through statistical pattern analysis. The results show that normalization procedures that use information across multiple vowels ("vowel-extrinsic" information) to normalize a single vowel token performed better than those that include only information contained in the vowel token itself ("vowel-intrinsic" information). Furthermore, the results show that normalization procedures that operate on individual formants performed better than those that use information across multiple formants (e.g., "formant-extrinsic" F2-F1).
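The best-performing family identified here — vowel-extrinsic, formant-intrinsic normalization — is exemplified by Lobanov's classic z-score procedure, which standardizes each formant across all of a talker's vowel tokens. A minimal sketch (the F1/F2 values are invented for illustration):

```python
import numpy as np

def lobanov(formants):
    """Lobanov normalization: per formant, z-score across all of ONE talker's
    vowel tokens (vowel-extrinsic, formant-intrinsic).
    `formants` has shape (n_tokens, n_formants)."""
    f = np.asarray(formants, dtype=float)
    return (f - f.mean(axis=0)) / f.std(axis=0)

# Illustrative F1/F2 measurements (Hz) for one hypothetical talker.
tokens = [[300, 2300], [700, 1200], [500, 1500], [400, 1900]]
z = lobanov(tokens)   # each formant column now has mean 0 and s.d. 1
```

Because the mean and standard deviation are computed over many vowels of a single talker, the procedure removes that talker's anatomical scaling while leaving the relative positions of vowel categories (and hence phonemic and regional information) intact — which is the trade-off this paper evaluates.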

9.
Moderately to profoundly hearing-impaired (n = 30) and normal-hearing (n = 6) listeners identified [p, k, t, f, θ, s] in [symbol; see text], and [symbol; see text]s tokens extracted from spoken sentences. The [symbol; see text]s were also identified in the sentences. The hearing-impaired group distinguished stop/fricative manner more poorly for [symbol; see text] in sentences than when extracted. Further, the group's performance for extracted [symbol; see text] was poorer than for extracted [symbol; see text] and [symbol; see text]. For the normal-hearing group, consonant identification was similar among the syllable and sentence contexts.

10.
Vowel formants play an important role in speech theories and applications; however, the same formant values measured for the steady-state part of a vowel can correspond to different vowel categories. Experimental evidence indicates that dynamic information can also contribute to vowel characterization. Hence, dynamically modeling formant transitions may lead to quantitatively testable predictions in vowel categorization. Because the articulatory strategy used to manage different speaking rates and contrastive stress may depend on speaker and situation, the parameter values of a dynamic formant model may vary with speaking rate and stress. In most experiments speaking rate is rarely controlled, only two or three rates are tested, and most corpora contain just a few repetitions of each item. As a consequence, the dependence of dynamic models on those factors is difficult to gauge. This article presents a study of 2300 [iai] or [iɛi] stimuli produced by two speakers at nine or ten speaking rates in a carrier sentence for two contrastive stress patterns. The corpus was perceptually evaluated by naive listeners. Formant frequencies were measured during the steady-state parts of the stimuli, and the formant transitions were dynamically and kinematically modeled. The results indicate that (1) the corpus was characterized by a contextual assimilation instead of a centralization effect; (2) dynamic or kinematic modeling was equivalent as far as the analysis of the model parameters was concerned; (3) the dependence of the model parameter estimates on speaking rate and stress suggests that the formant transitions were sharper for high speaking rate, but no consistent trend was found for contrastive stress; (4) the formant frequencies measured in the steady-state parts of the vowels were sufficient to explain the perceptual results while the dynamic parameters of the models were not.

11.
There is extensive evidence that in the same phonetic environment the voice fundamental frequency (F0) of vowels varies directly with vowel "height." This F0 difference between vowels could be caused by acoustic interaction between the first vowel formant and the vibrating vocal folds. Since higher vowels have lower first formants than low vowels, the acoustic interaction should be greatest for high vowels, whose first formant frequencies are closer in frequency to F0. Ten speakers were recorded to see if acoustic interaction could cause the F0 differences. The consonant [m] was recorded in the utterances [umu] and [ama]. Although the formant structure of [m] in [umu] and [ama] should not differ significantly, the F0 of each [m] allophone was significantly different. However, the F0 of each [m] allophone did not differ significantly from the F0 of the following vowel. These results did not support acoustic interaction. However, it is quite reasonable to conclude that the F0 variation of [m] was caused by coarticulatory anticipation of the tongue and jaw for the following vowel. Another experiment is offered in order to help explain the physical causes of intrinsic vowel F0. In this experiment, F0 lowering was found at the beginning of vowels following Arabic pharyngeal approximants. This finding indicates that the F0 of pharyngeal constricting vowels, e.g., [æ] and [a], might be lowered as a result of similar articulatory movements, viz. tongue compression and active pharyngeal constriction.

12.
Three adult male baboons were trained on a psychophysical procedure to discriminate five synthetic, steady-state vowel sounds ([a], [æ], [ɔ], [ʊ], and [ɛ]) from one another. A pulsed train of one vowel served as the reference stimulus during a session. Animals were trained to press a lever and release it only when this reference vowel sound changed to one of the comparison vowels. All animals learned the vowel discriminations rapidly and, once learned, performed the discriminations at the 95%-100% correct level. During the initial acquisition of the discriminations, however, percent correct detections were higher for those vowels with greater spectral differences from the reference vowel. For some cases, the detection scores correlated closely with differences between first formant peaks, while for others the detection scores correlated more closely with differences between second formant peaks. Once the discriminations were acquired, no discriminability differences were apparent among the different vowels. Underlying discriminability differences were still present, however, and could be revealed by giving a minor tranquilizer (diazepam) that lowered discrimination performance. These drug-induced decrements in vowel discriminability were also correlated with spectral differences, with lower vowel discriminability scores found for those vowels with smaller spectral differences from the reference vowel.

13.
The current investigation studied whether adults, children with normally developing language aged 4-5 years, and children with specific language impairment aged 5-6 years identified vowels on the basis of steady-state or transitional formant frequencies. Four types of synthetic tokens, created with a female voice, served as stimuli: (1) steady-state centers for the vowels [i] and [æ]; (2) vowelless tokens with transitions appropriate for [bib] and [bæb]; (3) "congruent" tokens that combined the first two types of stimuli into [bib] and [bæb]; and (4) "conflicting" tokens that combined the transitions from [bib] with the vowel from [bæb] and vice versa. Results showed that children with language impairment identified the [i] vowel more poorly than other subjects for both the vowelless and congruent tokens. Overall, children identified vowels most accurately in steady-state centers and congruent stimuli (between 94% and 96% accuracy). They identified the vowels on the basis of transitions only from "vowelless" tokens with 89% and 83.5% accuracy for the normally developing and language-impaired groups, respectively. Children with normally developing language used steady-state cues to identify vowels in 87% of the conflicting stimuli, whereas children with language impairment did so for 79% of the stimuli. Adults were equally accurate for vowelless, steady-state, and congruent tokens (between 99% and 100% accuracy) and used both steady-state and transition cues for vowel identification. Results suggest that most listeners prefer the steady state for vowel identification but are capable of using the onglide/offglide transitions for vowel identification. Results were discussed with regard to Nittrouer's developmental weighting shift hypothesis and Strange and Jenkins's dynamic specification theory.

14.
The purpose of this study was to use vocal tract simulation and synthesis as a means to determine the acoustic and perceptual effects of changing both the cross-sectional area and location of vocal tract constrictions for six different vowels. Area functions at and near vocal tract constrictions are considered critical to the acoustic output and are also the central point of hypotheses concerning speech targets. Area functions for the six vowels [symbol: see text] were perturbed by changing the cross-sectional area of the constriction (Ac) and the location of the constriction (Xc). Perturbations of Ac were performed for different values of Xc, producing several series of acoustic continua for the different vowels. Acoustic simulations for the different area functions were made using a frequency-domain model of the vocal tract. Each simulated vowel was then synthesized as a 1-s duration steady-state segment. The phoneme boundaries of the perturbed synthesized vowels were determined by formal perception tests. Results of the perturbation analyses showed that formants for each of the vowels were more sensitive to changes in constriction cross-sectional area than to changes in constriction location. Vowel perception, however, was highly resistant to both types of changes. Results are discussed in terms of articulatory precision and constriction-related speech production strategies.

15.
An important problem in speech perception is to determine how humans extract the perceptually invariant place of articulation information in the speech wave across variable acoustic contexts. Although analyses have been developed that attempted to classify the voiced stops /b/ versus /d/ from stimulus onset information, most of the human perceptual research to date suggests that formant transition information is more important than onset information. The purpose of the present study was to determine if animal subjects, specifically Japanese macaque monkeys, are capable of categorizing /b/ versus /d/ in synthesized consonant-vowel (CV) syllables using only formant transition information. Three monkeys were trained to differentiate CV syllables with a "go-left" versus a "go-right" label. All monkeys first learned to differentiate a /za/ versus /da/ manner contrast and easily transferred to three new vowel contexts /[symbol: see text], ɛ, ɪ/. Next, two of the three monkeys learned to differentiate a /ba/ versus /da/ stop place contrast, but were unable to transfer it to the different vowel contexts. These results suggest that animals may not use the same mechanisms as humans do for classifying place contrasts, and call for further investigation of animal perception of formant transition information versus stimulus onset information in place contrasts.

16.
The goal of this study was to measure the ability of adult hearing-impaired listeners to discriminate formant frequency for vowels in isolation, syllables, and sentences. Vowel formant discrimination for F1 and F2 of the vowels /ɪ, ɛ, æ/ was measured. Four experimental factors were manipulated, including linguistic context (isolated vowels, syllables, and sentences), signal level (70 and 95 dB SPL), formant frequency, and cognitive load. A complex identification task was added to the formant discrimination task only for sentences, to assess effects of cognitive load. Results showed significant elevation in formant thresholds as formant frequency and linguistic context increased. Higher signal level also elevated formant thresholds, primarily for F2. However, no effect of the additional identification task on formant discrimination was observed. In comparable conditions, these hearing-impaired listeners had elevated thresholds for formant discrimination compared to young normal-hearing listeners, primarily for F2. Altogether, poorer formant discrimination by these adult hearing-impaired listeners was caused mainly by hearing loss rather than by cognitive difficulty for the tasks implemented in this study.

17.
Thresholds for formant frequency discrimination have been established using optimal listening conditions. In normal conversation, the ability to discriminate formant frequency is probably substantially degraded. The purpose of the present study was to change the listening procedures in several substantial ways from optimal towards more ordinary listening conditions, including a higher level of stimulus uncertainty, increased levels of phonetic context, and the addition of a sentence identification task. Four vowels synthesized from a female talker were presented in isolation, or in the phonetic context of /bVd/ syllables, three-word phrases, or nine-word sentences. In the first experiment, formant resolution was estimated under medium stimulus uncertainty for three levels of phonetic context. Some undesirable training effects were obtained and led to the design of a new protocol for the second experiment to reduce this problem and to manipulate both the length of phonetic context and the level of difficulty in the simultaneous sentence identification task. Similar results were obtained in both experiments. The effect of phonetic context on formant discrimination is reduced as context lengthens, such that no difference was found between vowels embedded in the phrase or sentence contexts. The addition of a challenging sentence identification task to the discrimination task did not degrade performance further, and a stable pattern for formant discrimination in sentences emerged. This norm for the resolution of vowel formants under these more ordinary listening conditions was shown to be nearly constant at 0.28 barks. Analysis of vowel spaces from 16 American English talkers determined that the closest vowels, on average, were 0.56 barks apart, that is, a factor of 2 larger than the norm obtained in these vowel formant discrimination tasks.
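The bark values quoted above come from a critical-band transform of formant frequency; Traunmüller's (1990) formula is a common choice for it. The sketch below converts Hz to barks and probes, by brute force, how many Hz a 0.28-bark step spans near a nominal F2 of 1500 Hz (the 1500 Hz anchor is our illustrative choice, not a value from the paper):

```python
def hz_to_bark(f):
    """Traunmüller's (1990) approximation to the bark (critical-band) scale."""
    return 26.81 * f / (1960.0 + f) - 0.53

# How wide is a 0.28-bark discrimination threshold in Hz near 1500 Hz?
base = hz_to_bark(1500.0)
delta_hz = 1.0
while hz_to_bark(1500.0 + delta_hz) - base < 0.28:
    delta_hz += 1.0
# delta_hz lands near 65 Hz; the same bark step spans fewer Hz at low
# frequencies, which is why thresholds are compared in barks, not Hz.
```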

18.
Recent studies have shown that time-varying changes in formant pattern contribute to the phonetic specification of vowels. This variation could be especially important in children's vowels, because children have higher fundamental frequencies (f0's) than adults, and formant-frequency estimation is generally less reliable when f0 is high. To investigate the contribution of time-varying changes in formant pattern to the identification of children's vowels, three experiments were carried out with natural and synthesized versions of 12 American English vowels spoken by children (ages 7, 5, and 3 years) as well as adult males and females. Experiment 1 showed that (i) vowels generated with a cascade formant synthesizer (with hand-tracked formants) were less accurately identified than natural versions; and (ii) vowels synthesized with steady-state formant frequencies were harder to identify than those which preserved the natural variation in formant pattern over time. The decline in intelligibility was similar across talker groups, and there was no evidence that formant movement plays a greater role in children's vowels compared to adults. Experiment 2 replicated these findings using a semi-automatic formant-tracking algorithm. Experiment 3 showed that the effects of formant movement were the same for vowels synthesized with noise excitation (as in whispered speech) and pulsed excitation (as in voiced speech), although, on average, the whispered vowels were less accurately identified than their voiced counterparts. Taken together, the results indicate that the cues provided by changes in the formant frequencies over time contribute materially to the intelligibility of vowels produced by children and adults, but these time-varying formant frequency cues do not interact with properties of the voicing source.

19.
The contribution of the nasal murmur and the vocalic formant transitions to perception of the [m]-[n] distinction in utterance-initial position preceding [i,a,u] was investigated, extending the recent work of Kurowski and Blumstein [J. Acoust. Soc. Am. 76, 383-390 (1984)]. A variety of waveform-editing procedures were applied to syllables produced by six different talkers. Listeners' judgments of the edited stimuli confirmed that the nasal murmur makes a significant contribution to place of articulation perception. Murmur and transition information appeared to be integrated at a genuinely perceptual, not an abstract cognitive, level. This was particularly evident in [-i] context, where only the simultaneous presence of murmur and transition components permitted accurate place of articulation identification. The perceptual information seemed to be purely relational in this case. It also seemed to be context specific, since the spectral change from the murmur to the vowel onset did not follow an invariant pattern across front and back vowels.

20.
The effects of age, sex, and vocal tract configuration on the glottal excitation signal in speech are only partially understood, yet understanding these effects is important for both recognition and synthesis of speech as well as for medical purposes. In this paper, three acoustic measures related to the voice source are analyzed for five vowels from 3145 CVC utterances spoken by 335 talkers (8-39 years old) from the CID database [Miller et al., Proceedings of ICASSP, 1996, Vol. 2, pp. 849-852]. The measures are: the fundamental frequency (F0), the difference between the "corrected" (denoted by an asterisk) first two spectral harmonic magnitudes, H1* - H2* (related to the open quotient), and the difference between the "corrected" magnitudes of the first spectral harmonic and that of the third formant peak, H1* - A3* (related to source spectral tilt). The correction refers to compensating for the influence of formant frequencies on spectral magnitude estimation. Experimental results show that the three acoustic measures are dependent to varying degrees on age and vowel. Age dependencies are more prominent for male talkers, while vowel dependencies are more prominent for female talkers, suggesting a greater vocal tract-source interaction. All talkers show a dependency of F0 on sex and on F3, and of H1* - A3* on vowel type. For low-pitched talkers (F0 ≤ 175 Hz), H1* - H2* is positively correlated with F0, while for high-pitched talkers, H1* - H2* is dependent on F1 or vowel height. For high-pitched talkers there were no significant sex dependencies of H1* - H2* and H1* - A3*. The statistical significance of these results is shown.
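The H1* - H2* measure above applies a formant-based correction to the raw harmonic magnitudes; that correction is specific to the paper, but the underlying raw measurement (the dB difference between the first two harmonic peaks of the spectrum) can be sketched directly. A numpy-only illustration on a synthetic two-harmonic signal, with no formant correction applied:

```python
import numpy as np

def harmonic_db(x, fs, f0, k):
    """Peak magnitude (dB) of the k-th harmonic: maximum of the windowed FFT
    magnitude within half an f0 of k*f0. Raw value, no formant correction."""
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    band = (freqs > (k - 0.5) * f0) & (freqs < (k + 0.5) * f0)
    return 20.0 * np.log10(spec[band].max())

# Synthetic source: H1 has twice the amplitude of H2, so H1 - H2 ≈ 6 dB.
fs, f0 = 16000, 125.0
t = np.arange(int(0.2 * fs)) / fs
x = 1.0 * np.sin(2 * np.pi * f0 * t) + 0.5 * np.sin(2 * np.pi * 2 * f0 * t)
h1_h2 = harmonic_db(x, fs, f0, 1) - harmonic_db(x, fs, f0, 2)   # ≈ 6.02 dB
```

On real vowels the harmonics ride on the formant envelope, which is exactly why the paper's "corrected" H1* - H2* compensates for formant frequencies before comparing talkers; this sketch shows only the uncorrected spectral measurement.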
