Similar Documents
20 similar documents found (search time: 31 ms)
1.
Vowel and consonant confusion matrices were collected in the hearing alone (H), lipreading alone (L), and hearing plus lipreading (HL) conditions for 28 patients participating in the clinical trial of the multiple-channel cochlear implant. All patients were profound-to-totally deaf, and "hearing" refers to the presentation of auditory information via the implant. The average scores were 49% for vowels and 37% for consonants in the H condition, and the HL scores were significantly higher than the L scores. Information transmission and multidimensional scaling analyses showed that different speech features were conveyed at different levels in the H and L conditions. In the HL condition, the visual and auditory signals provided independent information sources for each feature. For vowels, the auditory signal was the major source of duration information, while the visual signal was the major source of first and second formant frequency information. The implant provided information about the amplitude envelope of the speech and the estimated frequency of the main spectral peak between 800 and 4000 Hz, which was useful for consonant recognition. A speech processor that coded the estimated frequency and amplitude of an additional peak between 300 and 1000 Hz was shown to increase the vowel and consonant recognition in the H condition by improving the transmission of first formant and voicing information.
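The information-transmission analysis used in studies like this one is conventionally the Miller-Nicely measure: the mutual information between stimulus and response for a feature, expressed relative to the stimulus entropy. A minimal sketch with hypothetical two-way voicing confusion matrices (the function name and toy counts are illustrative, not from the study):

```python
import numpy as np

def relative_transmitted_info(confusions):
    """Relative information transmitted for a feature, computed from a
    stimulus-by-response confusion matrix (after Miller & Nicely, 1955)."""
    p = confusions / confusions.sum()        # joint probabilities
    px = p.sum(axis=1)                       # stimulus marginals
    py = p.sum(axis=0)                       # response marginals
    t = sum(p[i, j] * np.log2(p[i, j] / (px[i] * py[j]))
            for i in range(p.shape[0])
            for j in range(p.shape[1]) if p[i, j] > 0)
    h = -np.sum(px[px > 0] * np.log2(px[px > 0]))  # stimulus entropy
    return t / h

# Hypothetical voicing confusions: perfect versus chance responding.
perfect = np.array([[10.0, 0.0], [0.0, 10.0]])
chance = np.array([[5.0, 5.0], [5.0, 5.0]])
```

Perfect responding yields a relative transmission of 1 and chance responding yields 0, which is what lets per-feature scores (voicing, manner, place) be compared across the H, L, and HL conditions.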

2.
This study investigates the effects of speaking condition and auditory feedback on vowel production by postlingually deafened adults. Thirteen cochlear implant users produced repetitions of nine American English vowels prior to implantation, and at one month and one year after implantation. There were three speaking conditions (clear, normal, and fast), and two feedback conditions after implantation (implant processor turned on and off). Ten normal-hearing controls were also recorded once. Vowel contrasts in the formant space (expressed in mels) were larger in the clear than in the fast condition, both for controls and for implant users at all three time samples. Implant users also produced differences in duration between clear and fast conditions that were in the range of those obtained from the controls. In agreement with prior work, the implant users had contrast values lower than did the controls. The implant users' contrasts were larger with hearing on than off and improved from one month to one year postimplant. Because the controls and implant users responded similarly to a change in speaking condition, it is inferred that auditory feedback, although demonstrably important for maintaining normative values of vowel contrasts, is not needed to maintain the distinctiveness of those contrasts in different speaking conditions.
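A vowel-contrast measure in mel-scaled formant space, as described above, can be sketched as follows. The formant means and the specific metric (mean pairwise distance between vowel centers in F1-F2 mel space) are illustrative assumptions, not the study's exact procedure:

```python
import math
from itertools import combinations

def hz_to_mel(f):
    """A common Hz-to-mel conversion (O'Shaughnessy's formula)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def average_vowel_contrast(vowels):
    """Mean pairwise distance between vowel (F1, F2) means, in mels."""
    pts = [(hz_to_mel(f1), hz_to_mel(f2)) for f1, f2 in vowels]
    dists = [math.dist(a, b) for a, b in combinations(pts, 2)]
    return sum(dists) / len(dists)

# Hypothetical formant means (Hz) for /i/, /a/, /u/ in two conditions;
# clear speech occupies more of the vowel space than fast speech:
clear = [(280, 2300), (750, 1200), (320, 900)]
fast = [(350, 2100), (680, 1250), (380, 1000)]
```

Because the mel scale compresses high frequencies, distances computed this way weight F1 differences more heavily than the same differences in raw Hz, which better matches perceptual spacing.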

3.
The identification of front vowels was studied in normal-hearing listeners using stimuli whose spectra had been altered to approximate the spectrum of vowels processed by auditory filters similar to those that might accompany sensorineural hearing loss. In the first experiment, front vowels were identified with greater than 95% accuracy when the first formant was specified in a normal manner and the higher frequency formants were represented by a broad, flat spectral plateau ranging from approximately 1600 to 3500 Hz. In the second experiment, the bandwidth of the first formant was systematically widened for stimuli with already flattened higher frequency formants. Normal vowel identification was preserved until the first formant was widened to six times its normal bandwidth. These results may account for the coexistence of abnormal vowel masking patterns (indicating flattened auditory spectra) and normal vowel recognition.

4.
Several experiments have found that changing the intrinsic f0 of a vowel can have an effect on perceived vowel quality. It has been suggested that these shifts may occur because f0 is involved in the specification of vowel quality in the same way as the formant frequencies. Another possibility is that f0 affects vowel quality indirectly, by changing a listener's assumptions about characteristics of a speaker who is likely to have uttered the vowel. In the experiment outlined here, participants were asked to listen to vowels differing in terms of f0 and their formant frequencies and report vowel quality and the apparent speaker's gender and size on a trial-by-trial basis. The results presented here suggest that f0 affects vowel quality mainly indirectly via its effects on the apparent-speaker characteristics; however, f0 may also have some residual direct effects on vowel quality. Furthermore, the formant frequencies were also found to have significant indirect effects on vowel quality by way of their strong influence on the apparent speaker.

5.
The purpose of this experiment was to study the effects of changes in speaking rate on both the attainment of acoustic vowel targets and the relative time and speed of movements toward these presumed targets. Four speakers produced a number of different CVC and CVCVC utterances at slow and fast speaking rates. Spectrographic measurements showed that the midpoint formant frequencies of the different vowels did not vary as a function of rate. However, for fast speech the onset frequencies of second formant transitions were closer to their target frequencies while CV transition rates remained essentially unchanged, indicating that movement toward the vowel simply began earlier for fast speech. Changes in speaking rate and lexical stress had different effects. For stressed vowels, an increase in speaking rate was accompanied primarily by a decrease in duration. However, destressed vowels, even if they were of the same duration as quickly produced stressed vowels, were reduced in overall amplitude, fundamental frequency, and to some extent, vowel color. These results suggest that speaking rate and lexical stress are controlled by two different mechanisms.

6.
Imitations of ten synthesized vowels were recorded from 33 speakers including men, women, and children. The first three formant frequencies of the imitations were estimated from spectrograms and considered with respect to developmental patterns in vowel formant structure, uniform scale factors for vowel normalization, and formant variability. Strong linear effects were observed in the group data for imitations of most of the English vowels studied, and straight lines passing through the origin provided a satisfactory fit to linear F1-F2 plots of the English vowel data. Logarithmic transformations of the formant frequencies helped substantially to equalize the dispersion of the group data for different vowels, but formant scale factors were observed to vary somewhat with both formant number and vowel identity. Variability of formant frequency was least for F1 (s.d. of 60 Hz or less for English vowels of adult males) and about equal for F2 and F3 (s.d. of 100 Hz or less for English vowels of adult males).
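Fitting a straight line through the origin to paired formant data, as described above, amounts to estimating a single uniform scale factor by least squares. A sketch with hypothetical adult-male and child F2 values (the numbers are illustrative, not the study's data):

```python
def origin_slope(x, y):
    """Least-squares slope of a line constrained through the origin,
    i.e., a single uniform scale factor relating two sets of formants."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

# Hypothetical F2 values (Hz) for the same four vowels produced by an
# adult male and by a child:
male = [2300, 1800, 1200, 900]
child = [2900, 2250, 1500, 1150]
scale = origin_slope(male, child)
```

A slope near 1.25 here would mean the child's formants are roughly 25% higher across the board; the abstract's caveat is that real scale factors drift with formant number and vowel identity, so one constant is only an approximation.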

7.
This study examines cross-linguistic variation in the location of shared vowels in the vowel space across five languages (Cantonese, American English, Greek, Japanese, and Korean) and three age groups (2-year-olds, 5-year-olds, and adults). The vowels /a/, /i/, and /u/ were elicited in familiar words using a word repetition task. The productions of target words were recorded and transcribed by native speakers of each language. For correctly produced vowels, first and second formant frequencies were measured. In order to remove the effect of vocal tract size on these measurements, a normalization approach that calculates distance and angular displacement from the speaker centroid was adopted. Language-specific differences in the location of shared vowels in the formant values as well as the shape of the vowel spaces were observed for both adults and children.
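The centroid normalization described above can be sketched as follows. The exact computation in the study may differ; this shows the core idea of re-expressing tokens as (distance, angular displacement) from the speaker's own centroid:

```python
import math

def centroid_normalize(vowels):
    """Re-express (F1, F2) tokens as (distance, angle) relative to the
    speaker's own centroid, removing overall vocal-tract-size effects."""
    f1c = sum(v[0] for v in vowels) / len(vowels)
    f2c = sum(v[1] for v in vowels) / len(vowels)
    return [(math.hypot(f1 - f1c, f2 - f2c),
             math.atan2(f2 - f2c, f1 - f1c)) for f1, f2 in vowels]

# A child's formants are roughly a scaled-up copy of an adult's, so the
# angular coordinates should align after normalization (hypothetical Hz):
adult = [(300, 2200), (700, 1200), (400, 800)]
child = [(450, 3300), (1050, 1800), (600, 1200)]
```

Because a uniform vocal-tract scaling multiplies all distances from the centroid by the same factor while leaving angles untouched, speakers of different sizes become directly comparable in the angular dimension.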

8.
Four multiple-channel cochlear implant patients were tested with synthesized versions of the words "hid, head, had, hud, hod, hood" containing 1, 2, or 3 formants, and with a natural 2-formant version of the same words. The formant frequencies were encoded in terms of the positions of electrical stimulation in the cochlea. Loudness, duration, and fundamental frequency were kept fixed within the synthetic stimulus sets. The average recognition scores were 47%, 61%, 62%, and 79% for the synthesized 1-, 2-, and 3-formant vowels and the natural vowels, respectively. These scores showed that the place coding of the first and second formant frequencies accounted for a large part of the vowel recognition of cochlear implant patients using these coding schemes. The recognition of the natural stimuli was significantly higher than recognition of the synthetic stimuli, indicating that extra cues such as loudness, duration, and fundamental frequency contributed to recognition of the spoken words.

9.
This study sought to compare formant frequencies estimated from natural phonation to those estimated using two methods of artificial laryngeal stimulation: (1) stimulation of the vocal tract using an artificial larynx placed on the neck and (2) stimulation of the vocal tract using an artificial larynx with an attached tube placed in the oral cavity. Twenty males between the ages of 18 and 45 performed the following three tasks on the vowels /a/ and /i/: (1) 4 seconds of sustained vowel, (2) 2 seconds of sustained vowel followed by 2 seconds of artificial phonation via a neck placement, and (3) 4 seconds of sustained vowel, the last two of which were accompanied by artificial phonation via an oral placement. Frequencies for formants 1-4 were measured for each task at second 1 and second 3 using linear predictive coding. These measures were compared across second 1 and second 3, as well as across all three tasks. Neither of the methods of artificial laryngeal stimulation tested in this study yielded formant frequency estimates that consistently agreed with those obtained from natural phonation for both vowels and all formants. However, when estimating mean formant frequency data for samples of large N, each of the methods agreed with mean estimations obtained from natural phonation for specific vowels and formants. The greatest agreement was found for a neck placement of the artificial larynx on the vowel /a/.
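Linear predictive coding, the formant-measurement method named above, fits an all-pole model to a windowed frame and reads resonance frequencies off the angles of the complex pole pairs. A rough, self-contained sketch of the autocorrelation method; the model order and the synthetic test signal are illustrative, not the study's analysis settings:

```python
import numpy as np

def lpc_formants(x, fs, order=10):
    """Estimate resonance frequencies by linear predictive coding:
    fit an all-pole model (autocorrelation method), then convert the
    angles of the complex pole pairs to frequencies in Hz."""
    x = x * np.hamming(len(x))                       # analysis window
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)]
                  for i in range(order)])            # autocorrelation matrix
    a = np.linalg.solve(R, r[1:order + 1])           # predictor coefficients
    poles = np.roots(np.concatenate(([1.0], -a)))    # roots of A(z)
    poles = poles[np.imag(poles) > 0]                # one per conjugate pair
    freqs = np.sort(np.angle(poles) * fs / (2 * np.pi))
    return freqs[freqs > 90]                         # drop near-DC poles

# Synthetic vowel-like frame with resonances near 700 and 1200 Hz:
fs = 8000
t = np.arange(0, 0.05, 1 / fs)
sig = (np.exp(-60 * t) * np.sin(2 * np.pi * 700 * t)
       + 0.8 * np.exp(-60 * t) * np.sin(2 * np.pi * 1200 * t))
```

In production tools the order is typically tied to the sampling rate (roughly one pole pair per kHz of bandwidth plus a couple extra), and candidate poles are also screened by bandwidth before being reported as formants.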

10.
Level and Center Frequency of the Singer's Formant
Johan Sundberg, Journal of Voice, 2001, 15(2): 176-186
The "singer's formant" is a prominent spectrum envelope peak near 3 kHz, typically found in voiced sounds produced by classical operatic singers. According to previous research, it is mainly a resonatory phenomenon produced by a clustering of formants 3, 4, and 5. Its level relative to the first formant peak varies depending on vowel, vocal loudness, and other factors. Its dependence on vowel formant frequencies is examined. Applying the acoustic theory of voice production, the level difference between the first and third formant is calulated for some standard vowels. The difference between observed and calculated levels is determined for various voices. It is found to vary considerably more between vowels sung by professional singers than by untrained voices. The center frequency of the singer's formant as determined from long-term spectrum analysis of commercial recordings is found to increase slightly with the pitch range of the voice classification.  相似文献   

11.
Two auditory feedback perturbation experiments were conducted to examine the nature of control of the first two formants in vowels. In the first experiment, talkers heard their auditory feedback with either F1 or F2 shifted in frequency. Talkers altered production of the perturbed formant by changing its frequency in the opposite direction to the perturbation but did not produce a correlated alteration of the unperturbed formant. Thus, the motor control system is capable of fine-grained independent control of F1 and F2. In the second experiment, a large meta-analysis was conducted on data from talkers who received feedback where both F1 and F2 had been perturbed. A moderate correlation was found between individual compensations in F1 and F2, suggesting that the control of F1 and F2 is processed in a common manner at some level. While a wide range of individual compensation magnitudes was observed, no significant correlations were found between individuals' compensations and vowel space differences. Similarly, no significant correlations were found between individuals' compensations and variability in normal vowel production. Further, when receiving normal auditory feedback, most of the population exhibited no significant correlation between the natural variation in production of F1 and F2.

12.
Recent studies have shown that synthesized versions of American English vowels are less accurately identified when the natural time-varying spectral changes are eliminated by holding the formant frequencies constant over the duration of the vowel. A limitation of these experiments has been that vowels produced by formant synthesis are generally less accurately identified than the natural vowels after which they are modeled. To overcome this limitation, a high-quality speech analysis-synthesis system (STRAIGHT) was used to synthesize versions of 12 American English vowels spoken by adults and children. Vowels synthesized with STRAIGHT were identified as accurately as the natural versions, in contrast with previous results from our laboratory showing identification rates 9%-12% lower for the same vowels synthesized using the cascade formant model. Consistent with earlier studies, identification accuracy was not reduced when the fundamental frequency was held constant across the vowel. However, elimination of time-varying changes in the spectral envelope using STRAIGHT led to a greater reduction in accuracy (23%) than was previously found with cascade formant synthesis (11%). A statistical pattern recognition model, applied to acoustic measurements of the natural and synthesized vowels, predicted both the higher identification accuracy for vowels synthesized using STRAIGHT compared to formant synthesis, and the greater effects of holding the formant frequencies constant over time with STRAIGHT synthesis. Taken together, the experiment and modeling results suggest that formant estimation errors and incorrect rendering of spectral and temporal cues by cascade formant synthesis contribute to lower identification accuracy and underestimation of the role of time-varying spectral change in vowels.

13.
In order to assess the limitations imposed on a cochlear implant system by a wearable speech processor, the parameters extracted from a set of 11 vowels and 24 consonants were examined. An estimate of the fundamental frequency EF0 was derived from the zero crossings of the low-pass filtered envelope of the waveform. Estimates of the first and second formant frequencies EF1 and EF2 were derived from the zero crossings of the waveform, which was filtered in the ranges 300-1000 and 800-4000 Hz. Estimates of the formant amplitudes EA1 and EA2 were derived by peak detectors operating on the outputs of the same filters. For vowels, these parameters corresponded well to the first and second formants and gave sufficient information to identify each vowel. For consonants, the relative levels and onset times of EA1 and EA2 and the EF0 values gave cues to voicing. The variation in time of EA1, EA2, EF1, and EF2 gave cues to the manner of articulation. Cues to the place of articulation were given by EF1 and EF2. When pink noise was added, the parameters were gradually degraded as the signal-to-noise ratio decreased. Consonants were affected more than vowels, and EF2 was affected more than EF1. Results for three good patients using a speech processor that coded EF0 as an electric pulse rate, EF1 and EF2 as electrode positions, and EA1 and EA2 as electric current levels confirmed that the parameters were useful for recognition of vowels and consonants. Average scores were 76% for recognition of 11 vowels and 71% for 12 consonants in the hearing-alone condition. The error rates were 4% for voicing, 12% for manner, and 25% for place.
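The EF1 and EF2 estimators described above derive a frequency from the zero-crossing count of a band-filtered waveform: each cycle of a dominant component crosses zero twice. A minimal sketch of that principle on an illustrative pure tone (the real processor operated on filtered speech, not a tone):

```python
import numpy as np

def zero_crossing_freq(x, fs):
    """Estimate the dominant frequency of a band-limited signal from its
    zero-crossing rate: each full cycle produces two sign changes."""
    crossings = np.sum(np.signbit(x[:-1]) != np.signbit(x[1:]))
    return crossings * fs / (2.0 * len(x))

# An 850 Hz tone standing in for the output of the 300-1000 Hz filter:
fs = 10000
t = np.arange(0, 0.1, 1 / fs)
ef1_estimate = zero_crossing_freq(np.sin(2 * np.pi * 850 * t), fs)
```

The appeal for a wearable processor is that this needs only a comparator and a counter per band, no spectral analysis; the cost, as the abstract notes, is graceful but real degradation as noise adds spurious crossings.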

14.
Previous work has demonstrated that normal-hearing individuals use fine-grained phonetic variation, such as formant movement and duration, when recognizing English vowels. The present study investigated whether these cues are used by adult postlingually deafened cochlear implant users, and normal-hearing individuals listening to noise-vocoder simulations of cochlear implant processing. In Experiment 1, subjects gave forced-choice identification judgments for recordings of vowels that were signal processed to remove formant movement and/or equate vowel duration. In Experiment 2, a goodness-optimization procedure was used to create perceptual vowel space maps (i.e., best exemplars within a vowel quadrilateral) that included F1, F2, formant movement, and duration. The results demonstrated that both cochlear implant users and normal-hearing individuals use formant movement and duration cues when recognizing English vowels. Moreover, both listener groups used these cues to the same extent, suggesting that postlingually deafened cochlear implant users have category representations for vowels that are similar to those of normal-hearing individuals.

15.
This study investigated the role of sensory feedback during the production of front vowels. A temporary aftereffect induced by tongue loading was employed to modify the somatosensory-based perception of tongue height. Following the removal of tongue loading, tongue height during vowel production was estimated by measuring the frequency of the first formant (F1) from the acoustic signal. In experiment 1, the production of front vowels following tongue loading was investigated either in the presence or absence of auditory feedback. With auditory feedback available, the tongue height of front vowels was not modified by the aftereffect of tongue loading. By contrast, speakers did not compensate for the aftereffect of tongue loading when they produced vowels in the absence of auditory feedback. In experiment 2, the characteristics of the masking noise were manipulated such that it masked energy either in the F1 region or in the region of the second and higher formants. The results showed that the adjustment of tongue height during the production of front vowels depended on information about F1 in the auditory feedback. These findings support the idea that speech goals include both auditory and somatosensory targets and that speakers are able to make use of information from both sensory modalities to maximize the accuracy of speech production.

16.
17.
Two studies were conducted to assess the sensitivity of perioral muscles to vowel-like auditory stimuli. In one study, normal young adults produced an isometric lip rounding gesture while listening to a frequency modulated tone (FMT). The fundamental of the FMT was modulated over time in a sinusoidal fashion near the frequency ranges of the first and second formants of the vowels /u/ and /i/ (rate of modulation = 4.5 or 7 Hz). In another study, normal young adults produced an isometric lip rounding gesture while listening to synthesized vowels whose formant frequencies were modulated over time in a sinusoidal fashion to simulate repetitive changes from the vowel /u/ to /i/ (rate of modulation = 2 or 4 Hz). The FMTs and synthesized vowels were presented binaurally via headphones at 75 and 60 dB SL, respectively. Muscle activity from the orbicularis oris superior and inferior and from lip retractors was recorded with surface electromyography (EMG). Signal averaging and spectral analysis of the rectified and smoothed EMG failed to show perioral muscle responses to the auditory stimuli. Implications for auditory feedback theories of speech control are discussed.

18.
For each of five vowels [i e a o u] following [t], a continuum from non-nasal to nasal was synthesized. Nasalization was introduced by inserting a pole-zero pair in the vicinity of the first formant in an all-pole transfer function. The frequencies and spacing of the pole and zero were systematically varied to change the degree of nasalization. The selection of stimulus parameters was determined from acoustic theory and the results of pilot experiments. The stimuli were presented for identification and discrimination to listeners whose language included a non-nasal/nasal vowel opposition (Gujarati, Hindi, and Bengali) and to American listeners. There were no significant differences between language groups in the 50% crossover points of the identification functions. Some vowels were more influenced by range and context effects than were others. The language groups showed some differences in the shape of the discrimination functions for some vowels. On the basis of the results, it is postulated that (1) there is a basic acoustic property of nasality, independent of the vowel, to which the auditory system responds in a distinctive way regardless of language background; and (2) there are one or more additional acoustic properties that may be used to various degrees in different languages to enhance the contrast between a nasal vowel and its non-nasal congener. A proposed candidate for the basic acoustic property is a measure of the degree of prominence of the spectral peak in the vicinity of the first formant. Additional secondary properties include shifts in the center of gravity of the low-frequency spectral prominence, leading to a change in perceived vowel height, and changes in overall spectral balance.
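The nasalization manipulation above (inserting a pole-zero pair near F1 in an all-pole transfer function) can be sketched by multiplying magnitude responses. The resonator frequencies and bandwidths below are illustrative, not the study's stimulus parameters:

```python
import numpy as np

def resonance(f, fc, bw, fs):
    """Magnitude response of a digital resonator (conjugate pole pair)."""
    z = np.exp(2j * np.pi * f / fs)
    p = np.exp(-np.pi * bw / fs) * np.exp(2j * np.pi * fc / fs)
    return np.abs(1.0 / ((1 - p / z) * (1 - np.conj(p) / z)))

def antiresonance(f, fc, bw, fs):
    """Magnitude response of a conjugate zero pair (inverse resonator)."""
    return 1.0 / resonance(f, fc, bw, fs)

fs = 10000
f = np.linspace(50, 3000, 500)
# All-pole oral vowel with F1 near 700 Hz and F2 near 1200 Hz:
oral = resonance(f, 700, 90, fs) * resonance(f, 1200, 110, fs)
# Nasalized version: an extra pole-zero pair flanking F1; widening the
# pole-zero spacing would increase the degree of nasalization:
nasal = oral * resonance(f, 500, 100, fs) * antiresonance(f, 650, 100, fs)
```

The effect visible in the two curves matches the proposed perceptual cue: the zero cuts into the F1 region, so the spectral peak near the first formant is less prominent in the nasalized spectrum than in the oral one.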

19.
Speech coding in the auditory nerve: V. Vowels in background noise
Responses of auditory-nerve fibers to steady-state, two-formant vowels in low-pass background noise (S/N = 10 dB) were obtained in anesthetized cats. For fibers over a wide range of characteristic frequencies (CFs), the peaks in discharge rate at the onset of the vowel stimuli were nearly eliminated in the presence of noise. In contrast, strong effects of noise on fine time patterns of discharge were limited to CF regions that are far from the formant frequencies. One effect is a reduction in the amplitude of the response component at the fundamental frequency in the high-CF regions and for CFs between F1 and F2 when the formants are widely separated. A reduction in the amplitude of the response components at the formant frequencies, with concomitant increase in components near CF or low-frequency components occurs in CF regions where the signal-to-noise ratio is particularly low. The processing schemes that were effective for estimating the formant frequencies and fundamental frequency of vowels in quiet generally remain adequate in moderate-level background noise. Overall, the discharge patterns contain many cues for distinctions among the vowel stimuli, so that the central processor should be able to identify the different vowels, consistent with psychophysical performance at moderate signal-to-noise ratios.

20.
A study was undertaken to explore the effects of fixing the mandible with a bite block on the formant frequencies of the vowels [i a u] produced by two groups of children, aged 4-5 and 7-8 years. Vowels produced in both normal and bite-block conditions were submitted to LPC analysis with windows placed over the first glottal pulse and at the vowel midpoint. For both groups of children, no differences were found in the frequencies of either the first or second formant between the normal and bite-block conditions. Results are discussed in relation to theories of the acquisition of speech motor control.


Copyright © Beijing Qinyun Technology Development Co., Ltd. 京ICP备09084417号