Similar literature
20 similar documents found (search time: 31 ms)
1.
The acoustic characteristics of sustained vowels have been widely investigated across various languages and ethnic groups. These acoustic measures, including fundamental frequency (F0), jitter (Jitt), relative average perturbation (RAP), five-point period perturbation quotient (PPQ5), shimmer (Shim), and 11-point amplitude perturbation quotient (APQ11), are not well established for Malaysian Malay young adults. This article studies the acoustic measures of Malaysian Malay adults using acoustical analysis. The study analyzed six sustained Malay vowels of 60 normal native Malaysian Malay adults with a mean age of 21.19 years. The F0 values of Malaysian Malay males and females were reported as 134.85 ± 18.54 and 238.27 ± 24.06 Hz, respectively. Malaysian Malay females had significantly higher F0 than males for all the vowels. However, no significant differences were observed between the genders for the perturbation measures in any of the vowels, except RAP in /e/. No significant F0 differences between the vowels were observed. Significant differences between the vowels were reported for all perturbation measures in Malaysian Malay males. As for Malaysian Malay females, significant differences between the vowels were reported for Shim and APQ11. Multiethnic comparisons indicate that F0 varies between Malaysian Malay and other ethnic groups. However, the perturbation measures cannot be directly compared, because the measures vary significantly across different speech analysis software.
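As a rough illustration of the perturbation measures named in this abstract, the sketch below computes local jitter, RAP, and local shimmer from per-cycle periods and peak amplitudes. The formulas follow commonly used textbook definitions and are an assumption here; the abstract does not say which definitions or software the authors used, and it notes that values differ across analysis packages. The function names and the sample values are hypothetical.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute difference between consecutive cycles, relative to the mean.

    Applied to periods this is local jitter; applied to peak amplitudes it is
    local shimmer (both as fractions; multiply by 100 for percent).
    """
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def rap(periods):
    """Relative average perturbation: mean deviation of each period from the
    average of itself and its two neighbours, relative to the mean period.
    Needs at least three periods."""
    devs = [
        abs(periods[i] - (periods[i - 1] + periods[i] + periods[i + 1]) / 3)
        for i in range(1, len(periods) - 1)
    ]
    return mean(devs) / mean(periods)


# Hypothetical cycle data for a sustained vowel (periods in seconds).
periods_s = [0.0074, 0.0075, 0.0073, 0.0076, 0.0074]
amplitudes = [0.81, 0.80, 0.83, 0.79, 0.82]

print("mean F0 (Hz):", 1 / mean(periods_s))
print("jitter (%):", 100 * local_perturbation(periods_s))
print("RAP (%):", 100 * rap(periods_s))
print("shimmer (%):", 100 * local_perturbation(amplitudes))
```

PPQ5 and APQ11 follow the same pattern as `rap`, with 5-point and 11-point averaging windows applied to periods and amplitudes respectively.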

2.
A stratified random sample of 20 males and 20 females matched for physiologic factors and cultural-linguistic markers was examined to determine differences in formant frequencies during prolongation of three vowels: [a], [i], and [u]. The ethnic and gender breakdown included four sets of 5 male and 5 female subjects, consisting of Caucasian and African American speakers of Standard American English, native Hindi Indian speakers, and native Mandarin Chinese speakers. Acoustic measures were analyzed using the Computerized Speech Lab (4300B), from which formant histories were extracted from a 200-ms sample of each vowel token to obtain first formant (F1), second formant (F2), and third formant (F3) frequencies. Significant group differences for the main effect of culture and race were found. For the main effect of gender, sexual dimorphism in vowel formants was evidenced for all cultures and races across all three vowels. The acoustic differences found are attributed to cultural-linguistic factors.

3.
The purpose of this study was to examine the acoustic characteristics of children's speech and voices that account for listeners' ability to identify gender. In Experiment I, vocal recordings and gross physical measurements of 4-, 8-, 12-, and 16-year-olds were taken (10 girls and 10 boys per age group). The speech sample consisted of seven nondiphthongal vowels of American English (/æ/ "had," /ɛ/ "head," /i/ "heed," /ɪ/ "hid," /a/ "hod," /ʌ/ "hud," and /u/ "who'd") produced in the carrier phrase, "Say /hVd/ again." Fundamental frequency (f0) and formant frequencies (F1, F2, F3) were measured from these syllables. In Experiment II, 20 adults rated the syllables produced by the children in Experiment I based on a six-point gender rating scale. The results from these experiments indicate (1) vowel formant frequencies differentiate gender for children as young as four years of age, while formant frequencies and f0 differentiate gender after 12 years of age, (2) the relationship between gross measures of physical size and vocal characteristics is apparent for at least 12- and 16-year-olds, and (3) listeners can identify gender from the speech and voice of children as young as four years of age, and with respect to young children, listeners appear to base their gender ratings on vowel formant frequencies. The findings are discussed in relation to the development of gender identity and its perceptual representation in speech and voice.

4.
Recent studies have shown that time-varying changes in formant pattern contribute to the phonetic specification of vowels. This variation could be especially important in children's vowels, because children have higher fundamental frequencies (f0's) than adults, and formant-frequency estimation is generally less reliable when f0 is high. To investigate the contribution of time-varying changes in formant pattern to the identification of children's vowels, three experiments were carried out with natural and synthesized versions of 12 American English vowels spoken by children (ages 7, 5, and 3 years) as well as adult males and females. Experiment 1 showed that (i) vowels generated with a cascade formant synthesizer (with hand-tracked formants) were less accurately identified than natural versions; and (ii) vowels synthesized with steady-state formant frequencies were harder to identify than those which preserved the natural variation in formant pattern over time. The decline in intelligibility was similar across talker groups, and there was no evidence that formant movement plays a greater role in children's vowels than in adults' vowels. Experiment 2 replicated these findings using a semi-automatic formant-tracking algorithm. Experiment 3 showed that the effects of formant movement were the same for vowels synthesized with noise excitation (as in whispered speech) and pulsed excitation (as in voiced speech), although, on average, the whispered vowels were less accurately identified than their voiced counterparts. Taken together, the results indicate that the cues provided by changes in the formant frequencies over time contribute materially to the intelligibility of vowels produced by children and adults, but these time-varying formant frequency cues do not interact with properties of the voicing source.

5.
A study was undertaken to explore the effects of fixing the mandible with a bite block on the formant frequencies of the vowels [i a u] produced by two groups of children, one aged 4 and 5 years and the other aged 7 and 8 years. Vowels produced in both normal and bite-block conditions were submitted to LPC analysis with windows placed over the first glottal pulse and at the vowel midpoint. For both groups of children, no differences were found in the frequencies of either the first or second formant between the normal and bite-block conditions. Results are discussed in relation to theories of the acquisition of speech motor control.

6.
"Throaty" voice quality has been regarded by voice pedagogues as undesired and even harmful. This study attempts to identify acoustic and physiological correlates of this quality. One male and one female subject read a text habitually and with a throaty voice quality. Oral pressure during p-occlusion was measured as an estimate of subglottal pressure. Long-term average spectrum analysis described the average spectrum characteristics. Sixteen syllables, perceptually evaluated with regard to throaty quality by five experts, were selected for analysis. Formant frequencies and voice source characteristics were measured by means of inverse filtering, and the vocal tract shapes of the throaty and normal versions of the vowels [a, u, i, æ] of the male subject were recorded by magnetic resonance imaging. From this material, area functions were derived and their resonance frequencies were determined. The throaty versions of these four vowels all showed a pharynx that was narrower than in the habitually produced versions. To test the relevance of formant frequencies to perceived throaty quality, experts rated degree of throatiness in synthetic vowel samples, in which the measured formant frequency values of the subject were used. The main acoustic correlates of throatiness seemed to be an increase of F1, a decrease of F4, and in front vowels a decrease of F2, which presumably results from a narrowing of the pharynx. In the male subject, voice source parameters suggested a more hyperfunctional voice in throaty samples.

7.
This study examines cross-linguistic variation in the location of shared vowels in the vowel space across five languages (Cantonese, American English, Greek, Japanese, and Korean) and three age groups (2-year-olds, 5-year-olds, and adults). The vowels /a/, /i/, and /u/ were elicited in familiar words using a word repetition task. The productions of target words were recorded and transcribed by native speakers of each language. For correctly produced vowels, first and second formant frequencies were measured. In order to remove the effect of vocal tract size on these measurements, a normalization approach that calculates distance and angular displacement from the speaker centroid was adopted. Language-specific differences in the location of shared vowels in the formant values as well as the shape of the vowel spaces were observed for both adults and children.
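The centroid-based description mentioned in this abstract can be sketched as a conversion of each vowel's (F1, F2) point into polar coordinates around the speaker's own mean point. This is only one possible reading: the abstract does not specify the frequency scale, how the centroid was defined, or any further scaling of the distance, so the choices below (raw Hz, centroid as the unweighted mean of the speaker's vowels) and all names and values are assumptions.

```python
import math


def centroid_polar(formants):
    """Map each vowel's (F1, F2) to (distance, angle) about the speaker centroid.

    formants: dict of vowel label -> (F1_hz, F2_hz) for one speaker.
    The centroid is taken as the unweighted mean of the speaker's points
    (an assumption). The angle is returned in radians.
    """
    n = len(formants)
    c1 = sum(f1 for f1, _ in formants.values()) / n
    c2 = sum(f2 for _, f2 in formants.values()) / n
    return {
        vowel: (math.hypot(f1 - c1, f2 - c2), math.atan2(f2 - c2, f1 - c1))
        for vowel, (f1, f2) in formants.items()
    }


# Hypothetical formant values for one speaker, in Hz.
speaker = {"a": (800.0, 1200.0), "i": (300.0, 2300.0), "u": (350.0, 900.0)}
for vowel, (dist, angle) in centroid_polar(speaker).items():
    print(vowel, round(dist, 1), round(math.degrees(angle), 1))
```

Under this sketch, multiplying all of a speaker's formants by a common factor leaves the angles unchanged but scales the distances by that factor, so a study aiming to remove vocal tract size effects would presumably also rescale the distances in some way.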

8.
Peta White. Journal of Voice, 1999, 13(4): 570-582
High-pitched productions present difficulties in formant frequency analysis due to wide harmonic spacing and poorly defined formants. As a consequence, there are few reliable data regarding children's spoken or sung vowel formants. Twenty-nine 11-year-old Swedish children were asked to produce 4 sustained spoken and sung vowels. In order to circumvent the problem of wide harmonic spacing, F1 and F2 measurements were taken from vowels produced with a sweeping F0. Experienced choir singers were selected as subjects in order to minimize the larynx height adjustments associated with pitch variation in less skilled subjects. Results showed significantly higher formant frequencies for speech than for singing. Formants were consistently higher in girls than in boys, suggesting longer vocal tracts in these preadolescent boys. Furthermore, formant scaling demonstrated vowel-dependent differences between boys and girls, suggesting non-uniform differences in male and female vocal tract dimensions. These vowel-dependent sex differences were not consistent with adult data.

9.
Imitations of ten synthesized vowels were recorded from 33 speakers including men, women, and children. The first three formant frequencies of the imitations were estimated from spectrograms and considered with respect to developmental patterns in vowel formant structure, uniform scale factors for vowel normalization, and formant variability. Strong linear effects were observed in the group data for imitations of most of the English vowels studied, and straight lines passing through the origin provided a satisfactory fit to linear F1-F2 plots of the English vowel data. Logarithmic transformations of the formant frequencies helped substantially to equalize the dispersion of the group data for different vowels, but formant scale factors were observed to vary somewhat with both formant number and vowel identity. Variability of formant frequency was least for F1 (s.d. of 60 Hz or less for English vowels of adult males) and about equal for F2 and F3 (s.d. of 100 Hz or less for English vowels of adult males).

10.
The sound level of the singer's formant in professional singing (total citations: 2; self-citations: 0; citations by others: 2)
The relative sound level of the "singer's formant," measured in a 1/3-oct band with a center frequency of 2.5 kHz for males and of 3.16 kHz for females, has been investigated for 14 professional singers, nine different modes of singing, nine different vowels, variations in overall sound-pressure level, and fundamental frequencies ranging from 98 up to 880 Hz. Variation in the sound level of the singer's formant due to differences among male singers was small (4 dB), the factors vowels (16 dB) and fundamental frequency (9-14 dB) had an intermediate effect, while the largest variation was found for differences among female singers (24 dB), between modes of singing (vocal effort) (23 dB), and in overall sound-pressure level (more than 30 dB). In spite of this great potential variability, for each mode of singing the sound level of the singer's formant was remarkably constant up to F0 = 392 Hz, due to adaptation of vocal effort. This may be explained as the result of the perceptual demand of a constant voice quality. The definition of the singer's formant is discussed.
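To make the 1/3-octave measurement bands in this abstract concrete, the sketch below computes the lower and upper edge frequencies of a third-octave band from its center frequency. It assumes the base-2 definition, where the edges lie one sixth of an octave below and above the center; measurement standards also allow a base-10 definition that gives slightly different edges, and the abstract does not say which was used. The function name is hypothetical.

```python
def third_octave_edges(center_hz):
    """Return (lower, upper) band edges in Hz for a base-2 third-octave band."""
    half_band = 2 ** (1 / 6)  # one sixth of an octave on either side of the center
    return center_hz / half_band, center_hz * half_band


# Center frequencies quoted in the abstract for male and female singers.
for label, center in (("male", 2500.0), ("female", 3160.0)):
    lower, upper = third_octave_edges(center)
    print(f"{label}: {lower:.0f} Hz to {upper:.0f} Hz around {center:.0f} Hz")
```

The two bands are adjacent third-octave bands, so under this definition the upper edge of the 2.5 kHz band sits very close to the lower edge of the 3.16 kHz band.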

11.
Recent studies have shown that synthesized versions of American English vowels are less accurately identified when the natural time-varying spectral changes are eliminated by holding the formant frequencies constant over the duration of the vowel. A limitation of these experiments has been that vowels produced by formant synthesis are generally less accurately identified than the natural vowels after which they are modeled. To overcome this limitation, a high-quality speech analysis-synthesis system (STRAIGHT) was used to synthesize versions of 12 American English vowels spoken by adults and children. Vowels synthesized with STRAIGHT were identified as accurately as the natural versions, in contrast with previous results from our laboratory showing identification rates 9%-12% lower for the same vowels synthesized using the cascade formant model. Consistent with earlier studies, identification accuracy was not reduced when the fundamental frequency was held constant across the vowel. However, elimination of time-varying changes in the spectral envelope using STRAIGHT led to a greater reduction in accuracy (23%) than was previously found with cascade formant synthesis (11%). A statistical pattern recognition model, applied to acoustic measurements of the natural and synthesized vowels, predicted both the higher identification accuracy for vowels synthesized using STRAIGHT compared to formant synthesis, and the greater effects of holding the formant frequencies constant over time with STRAIGHT synthesis. Taken together, the experiment and modeling results suggest that formant estimation errors and incorrect rendering of spectral and temporal cues by cascade formant synthesis contribute to lower identification accuracy and underestimation of the role of time-varying spectral change in vowels.

12.
13.
A database is presented of measurements of the fundamental frequency, the frequencies of the first three formants, and the duration of the 15 vowels of Standard Dutch as spoken in the Netherlands (Northern Standard Dutch) and in Belgium (Southern Standard Dutch). The speech material consisted of read monosyllabic utterances in a neutral consonantal context (i.e., /sVs/). Recordings were made for 20 female talkers and 20 male talkers, who were stratified for the factors age, gender, and region. Of the 40 talkers, 20 spoke Northern Standard Dutch and 20 spoke Southern Standard Dutch. The results indicated that the nine monophthongal Dutch vowels /a [see symbol in text] ɛ i ɪ [see symbol in text] u y ʏ/ can be separated fairly well given their steady-state characteristics, while the long mid vowels /e o ø/ and three diphthongal vowels /ɛɪ [see symbol in text]u œy/ also require information about their dynamic characteristics. The analysis of the formant values indicated that Northern Standard Dutch and Southern Standard Dutch differ little in the formant frequencies at steady-state for the nine monophthongal vowels. Larger differences between these two language varieties were found for the dynamic specifications of the three long mid vowels, and, to a lesser extent, of the three diphthongal vowels.

14.
Formant frequencies in an old Estonian folk song performed by two female voices were estimated for two back vowels /a/ and /u/, and for two front vowels /e/ and /i/. Comparison of these estimates with formant frequencies in spoken Estonian vowels indicates a trend of the vowels to be clustered into two sets of front and back ones in the F1/F2 plane. Similar clustering has previously been shown to occur in opera and choir singing, especially with increasing fundamental frequency. The clustering in the present song, however, may also be due to a tendency for a mid vowel to be realized as a higher-beginning diphthong, which is characteristic of the North-Estonian coastal dialect area where the singers come from. No evidence of a "singer's formant" was found.

15.
This study represents a first step toward understanding the contribution formant frequency makes to the perception of female voice categories. The effects of formant frequency and pitch on the perception of voice category were examined by constructing a perceptual study that used two sets of synthetic stimuli at various pitches throughout the female singing range. The first set was designed to test the effects of systematically varying formants 1 through 4. The second set was designed to test the relative effects of lower frequency formants (F1 and F2) versus higher frequency formants (F3 and F4) through construction of mixed stimuli. Generally, as the frequencies of all four formants decreased, perception of soprano voice category decreased at all but the highest pitch, A5. However, perception of soprano voice category also increased as a function of pitch. Listeners appeared to need agreement between all four formants to perceive voice categories. When upper and lower formants were inconsistent in frequency, listeners were unable to judge voice category, but they could use the inconsistent patterns to form perceptions about degree of jaw opening.

16.
Formant dynamics in vowel nuclei contribute to vowel classification in English. This study examined listeners' ability to discriminate dynamic second formant transitions in synthetic high front vowels. Acoustic measurements were made from the nuclei (steady state and 20% and 80% of vowel duration) for the vowels /i, ɪ, e, ɛ, æ/ spoken by a female in /bVd/ context. Three synthesis parameters were selected to yield twelve discrimination conditions: initial frequency value for F2 (2525, 2272, or 2068 Hz), slope direction (rising or falling), and duration (110 or 165 ms). F1 frequency was roved. In the standard stimuli, F0 and F1-F4 were steady state. In the comparison stimuli only F2 frequency varied linearly to reach a final frequency. Five listeners were tested under adaptive tracking to estimate the threshold for frequency extent, the minimal detectable difference in frequency between the initial and final F2 values, called deltaF extent. Analysis showed that initial F2 frequency and direction of movement for some F2 frequencies contributed to significant differences in deltaF extent. Results suggested that listeners attended to differences in the stimulus property of frequency extent (hertz), not formant slope (hertz/second). Formant extent thresholds were at least four times smaller than extents measured in the natural speech tokens, and 18 times smaller than for the diphthongized vowel /e/.
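The distinction this abstract draws between frequency extent (Hz) and formant slope (Hz/s) can be shown with a short calculation: the same linear F2 change gives one extent but different slopes at the two stimulus durations. The initial frequency and durations below are taken from the abstract, while the final frequency of 2625 Hz is a made-up value for illustration only, and the function names are hypothetical.

```python
def f2_extent_hz(initial_hz, final_hz):
    """Frequency extent: absolute difference between final and initial F2."""
    return abs(final_hz - initial_hz)


def f2_slope_hz_per_s(initial_hz, final_hz, duration_s):
    """Slope of a linear F2 transition; positive for rising, negative for falling."""
    return (final_hz - initial_hz) / duration_s


initial_f2 = 2525.0  # one of the initial F2 values listed in the abstract
final_f2 = 2625.0    # hypothetical final value, not from the source

for duration_s in (0.110, 0.165):  # the two stimulus durations in the abstract
    print(
        f"{duration_s * 1000:.0f} ms:",
        f"extent = {f2_extent_hz(initial_f2, final_f2):.0f} Hz,",
        f"slope = {f2_slope_hz_per_s(initial_f2, final_f2, duration_s):.0f} Hz/s",
    )
```

If listeners tracked slope, thresholds expressed as extent would be expected to grow with duration; the abstract reports that they instead appeared to attend to extent.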

17.
Changes in magnitude and variability of duration, fundamental frequency, formant frequencies, and spectral envelope of children's speech are investigated as a function of age and gender using data obtained from 436 children, ages 5 to 17 years, and 56 adults. The results confirm that the reduction in magnitude and within-subject variability of both temporal and spectral acoustic parameters with age is a major trend associated with speech development in normal children. Between ages 9 and 12, both magnitude and variability of segmental durations decrease significantly and rapidly, converging to adult levels around age 12. Within-subject fundamental frequency and formant-frequency variability, however, may reach adult range about 2 or 3 years later. Differentiation of male and female fundamental frequency and formant frequency patterns begins at around age 11, becoming fully established around age 15. During that time period, changes in vowel formant frequencies of male speakers are approximately linear with age, while such a linear trend is less obvious for female speakers. These results support the hypothesis of uniform axial growth of the vocal tract for male speakers. The study also shows evidence for an apparent overshoot in acoustic parameter values, somewhere between ages 13 and 15, before converging to the canonical levels for adults. For instance, teenagers around age 14 differ from adults in that, on average, they show shorter segmental durations and exhibit less within-subject variability in durations, fundamental frequency, and spectral envelope measures.

18.
The current investigation studied whether adults, children with normally developing language aged 4-5 years, and children with specific language impairment aged 5-6 years identified vowels on the basis of steady-state or transitional formant frequencies. Four types of synthetic tokens, created with a female voice, served as stimuli: (1) steady-state centers for the vowels [i] and [æ]; (2) vowelless tokens with transitions appropriate for [bib] and [bæb]; (3) "congruent" tokens that combined the first two types of stimuli into [bib] and [bæb]; and (4) "conflicting" tokens that combined the transitions from [bib] with the vowel from [bæb] and vice versa. Results showed that children with language impairment identified the [i] vowel more poorly than other subjects for both the vowelless and congruent tokens. Overall, children identified vowels most accurately in steady-state centers and congruent stimuli (ranging from 94% to 96%). They identified the vowels on the basis of transitions only from "vowelless" tokens with 89% and 83.5% accuracy for the normally developing and language-impaired groups, respectively. Children with normally developing language used steady-state cues to identify vowels in 87% of the conflicting stimuli, whereas children with language impairment did so for 79% of the stimuli. Adults were equally accurate for vowelless, steady-state, and congruent tokens (ranging from 99% to 100% accuracy) and used both steady-state and transition cues for vowel identification. Results suggest that most listeners prefer the steady state for vowel identification but are capable of using the onglide/offglide transitions for vowel identification. Results were discussed with regard to Nittrouer's developmental weighting shift hypothesis and Strange and Jenkins' dynamic specification theory.

19.
An analysis is presented of regional variation patterns in the vowel system of Standard Dutch as spoken in the Netherlands (Northern Standard Dutch) and Flanders (Southern Standard Dutch). The speech material consisted of read monosyllabic utterances in a neutral consonantal context (i.e., /sVs/). The analyses were based on measurements of the duration and the frequencies of the first two formants of the vowel tokens. Recordings were made for 80 Dutch and 80 Flemish speakers, who were stratified for the social factors gender and region. These 160 speakers were distributed across four regions in the Netherlands and four regions in Flanders. Differences between regional varieties were found for duration, steady-state formant frequencies, and spectral change of formant frequencies. Variation patterns in the spectral characteristics of the long mid vowels /e o ø/ and the diphthongal vowels /ei œy ɔu/ were in accordance with a recent theory of pronunciation change in Standard Dutch. Finally, it was found that regional information was present in the steady-state formant frequency measurements of vowels produced by professional language users.

20.
The effect of diminished auditory feedback on monophthong and diphthong production was examined in postlingually deafened Australian-English speaking adults. The participants were 4 female and 3 male speakers with severe to profound hearing loss, who were compared to 11 age- and accent-matched normally hearing speakers. The test materials were 5 repetitions of hVd words containing 18 vowels. Acoustic measures that were studied included F1, F2, discrete cosine transform coefficients (DCTs), and vowel duration information. The durational analyses revealed increased total vowel durations with a maintenance of the tense/lax vowel distinctions in the deafened speakers. The deafened speakers preserved a differentiated vowel space, although there were some gender-specific differences seen. For example, there was a retraction of F2 in the front vowels for the female speakers that did not occur in the males. However, all deafened speakers showed a close correspondence between the monophthong and diphthong formant movements that did occur. Gaussian classification highlighted vowel confusions resulting from changes in the deafened vowel space. The results support the view that postlingually deafened speakers maintain reasonably good speech intelligibility, in part by employing production strategies designed to bolster auditory feedback.
