期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Neighboring spectral content influences vowel identification

Holt LL Lotto AJ Kluender KR 《The Journal of the Acoustical Society of America》2000,108(2):710-722

Four experiments explored the relative contributions of spectral content and phonetic labeling in effects of context on vowel perception. Two 10-step series of CVC syllables ([bVb] and [dVd]) varying acoustically in F2 midpoint frequency and varying perceptually in vowel height from [delta] to [epsilon] were synthesized. In a forced-choice identification task, listeners more often labeled vowels as [delta] in [dVd] context than in [bVb] context. To examine whether spectral content predicts this effect, nonspeech-speech hybrid series were created by appending 70-ms sine-wave glides following the trajectory of CVC F2's to 60-ms members of a steady-state vowel series varying in F2 frequency. In addition, a second hybrid series was created by appending constant-frequency sine-wave tones equivalent in frequency to CVC F2 onset/offset frequencies. Vowels flanked by frequency-modulated glides or steady-state tones modeling [dVd] were more often labeled as [delta] than were the same vowels surrounded by nonspeech modeling [bVb]. These results suggest that spectral content is important in understanding vowel context effects. A final experiment tested whether spectral content can modulate vowel perception when phonetic labeling remains intact. Voiceless consonants, with lower-amplitude more-diffuse spectra, were found to exert less of an influence on vowel perception than do their voiced counterparts. The data are discussed in terms of a general perceptual account of context effects in speech perception. 相似文献

2.

Effects of deafness on acoustic characteristics of American English tense/lax vowels in maternal speech to infants

MV Kondaurova TR Bergeson LC Dilley 《The Journal of the Acoustical Society of America》2012,132(2):1039-1049

Recent studies have demonstrated that mothers exaggerate phonetic properties of infant-directed (ID) speech. However, these studies focused on a single acoustic dimension (frequency), whereas speech sounds are composed of multiple acoustic cues. Moreover, little is known about how mothers adjust phonetic properties of speech to children with hearing loss. This study examined mothers' production of frequency and duration cues to the American English tense/lax vowel contrast in speech to profoundly deaf (N?=?14) and normal-hearing (N?=?14) infants, and to an adult experimenter. First and second formant frequencies and vowel duration of tense (/i/,?/u/) and lax (/I/,?/?/) vowels were measured. Results demonstrated that for both infant groups mothers hyperarticulated the acoustic vowel space and increased vowel duration in ID speech relative to adult-directed speech. Mean F2 values were decreased for the /u/ vowel and increased for the /I/ vowel, and vowel duration was longer for the /i/, /u/, and /I/ vowels in ID speech. However, neither acoustic cue differed in speech to hearing-impaired or normal-hearing infants. These results suggest that both formant frequencies and vowel duration that differentiate American English tense/lx vowel contrasts are modified in ID speech regardless of the hearing status of the addressee. 相似文献

3.

A narrow band pattern-matching model of vowel perception

Hillenbrand JM Houde RA 《The Journal of the Acoustical Society of America》2003,113(2):1044-1055

The purpose of this paper is to propose and evaluate a new model of vowel perception which assumes that vowel identity is recognized by a template-matching process involving the comparison of narrow band input spectra with a set of smoothed spectral-shape templates that are learned through ordinary exposure to speech. In the present simulation of this process, the input spectra are computed over a sufficiently long window to resolve individual harmonics of voiced speech. Prior to template creation and pattern matching, the narrow band spectra are amplitude equalized by a spectrum-level normalization process, and the information-bearing spectral peaks are enhanced by a "flooring" procedure that zeroes out spectral values below a threshold function consisting of a center-weighted running average of spectral amplitudes. Templates for each vowel category are created simply by averaging the narrow band spectra of like vowels spoken by a panel of talkers. In the present implementation, separate templates are used for men, women, and children. The pattern matching is implemented with a simple city-block distance measure given by the sum of the channel-by-channel differences between the narrow band input spectrum (level-equalized and floored) and each vowel template. Spectral movement is taken into account by computing the distance measure at several points throughout the course of the vowel. The input spectrum is assigned to the vowel template that results in the smallest difference accumulated over the sequence of spectral slices. The model was evaluated using a large database consisting of 12 vowels in /hVd/ context spoken by 45 men, 48 women, and 46 children. The narrow band model classified vowels in this database with a degree of accuracy (91.4%) approaching that of human listeners. 相似文献

4.

Integration of monaural and binaural evidence of vowel formants

Akeroyd MA Summerfield AQ 《The Journal of the Acoustical Society of America》2000,107(6):3394-3406

The intelligibility of speech is sustained at lower signal-to-noise ratios when the speech has a different interaural configuration from the noise. This paper argues that the advantage arises in part because listeners combine evidence of the spectrum of speech in the across-frequency profile of interaural decorrelation with evidence in the across-frequency profile of intensity. To support the argument, three experiments examined the ability of listeners to integrate and segregate evidence of vowel formants in these two profiles. In experiment 1, listeners achieved accurate identification of the members of a small set of vowels whose first formant was defined by a peak in one profile and whose second formant was defined by a peak in the other profile. This result demonstrates that integration is possible. Experiment 2 demonstrated that integration is not mandatory, insofar as listeners could report the identity of a vowel defined entirely in one profile despite the presence of a competing vowel in the other profile. The presence of the competing vowel reduced accuracy of identification, however, showing that segregation was incomplete. Experiment 3 demonstrated that segregation of the binaural vowel, in particular, can be increased by the introduction of an onset asynchrony between the competing vowels. The results of experiments 2 and 3 show that the intrinsic cues for segregation of the profiles are relatively weak. Overall, the results are compatible with the argument that listeners can integrate evidence of spectral peaks from the two profiles. 相似文献

5.

Evolving theories of vowel perception

W Strange 《The Journal of the Acoustical Society of America》1989,85(5):2081-2087

Research on the perception of vowels in the last several years has given rise to new conceptions of vowels as articulatory, acoustic, and perceptual events. Starting from a "simple" target model in which vowels were characterized articulatorily as static vocal tract shapes and acoustically as points in a first and second formant (F1/F2) vowel space, this paper briefly traces the evolution of vowel theory in the 1970s and 1980s in two directions. (1) Elaborated target models represent vowels as target zones in perceptual spaces whose dimensions are specified as formant ratios. These models have been developed primarily to account for perceivers' solution of the "speaker normalization" problem. (2) Dynamic specification models emphasize the importance of formant trajectory patterns in specifying vowel identity. These models deal primarily with the problem of "target undershoot" associated with the coarticulation of vowels with consonants in natural speech and with the issue of "vowel-inherent spectral change" or diphthongization of English vowels. Perceptual studies are summarized that motivate these theoretical developments. 相似文献

6.

Coding of vowellike signals in cochlear implant listeners

Laback B Deutsch WA Baumgartner WD 《The Journal of the Acoustical Society of America》2004,116(2):1208-1223

Neural-population interactions resulting from excitation overlap in multi-channel cochlear implants (CI) may cause blurring of the "internal" auditory representation of complex sounds such as vowels. In experiment I, confusion matrices for eight German steady-state vowellike signals were obtained from seven CI listeners. Identification performance ranged between 42% and 74% correct. On the basis of an information transmission analysis across all vowels, pairs of most and least frequently confused vowels were selected for each subject. In experiment II, vowel masking patterns (VMPs) were obtained using the previously selected vowels as maskers. The VMPs were found to resemble the "electrical" vowel spectra to a large extent, indicating a relatively weak effect of neural-population interactions. Correlation between vowel identification data and VMP spectral similarity, measured by means of several spectral distance metrics, showed that the CI listeners identified the vowels based on differences in the between-peak spectral information as well as the location of spectral peaks. The effect of nonlinear amplitude mapping of acoustic into "electrical" vowels, as performed in the implant processors, was evaluated separately and compared to the effect of neural-population interactions. Amplitude mapping was found to cause more blurring than neural-population interactions. Subjects exhibiting strong blurring effects yielded lower overall vowel identification scores. 相似文献

7.

Target spectral, dynamic spectral, and duration cues in infant perception of German vowels

Bohn OS Polka L 《The Journal of the Acoustical Society of America》2001,110(1):504-515

Previous studies of vowel perception have shown that adult speakers of American English and of North German identify native vowels by exploiting at least three types of acoustic information contained in consonant-vowel-consonant (CVC) syllables: target spectral information reflecting the articulatory target of the vowel, dynamic spectral information reflecting CV- and -VC coarticulation, and duration information. The present study examined the contribution of each of these three types of information to vowel perception in prelingual infants and adults using a discrimination task. Experiment 1 examined German adults' discrimination of four German vowel contrasts (see text), originally produced in /dVt/ syllables, in eight experimental conditions in which the type of vowel information was manipulated. Experiment 2 examined German-learning infants' discrimination of the same vowel contrasts using a comparable procedure. The results show that German adults and German-learning infants appear able to use either dynamic spectral information or target spectral information to discriminate contrasting vowels. With respect to duration information, the removal of this cue selectively affected the discriminability of two of the vowel contrasts for adults. However, for infants, removal of contrastive duration information had a larger effect on the discrimination of all contrasts tested. 相似文献

8.

Acoustic analysis and perception of vowels in stuttered speech

P Howell L Vause 《The Journal of the Acoustical Society of America》1986,79(5):1571-1579

In stuttered repetitions of a syllable, the vowel that occurs often sounds like schwa even when schwa is not intended. In this article, acoustic analyses are reported which show that the spectral properties of stuttered vowels are similar to the following fluent vowel, so it would appear that the stutterers are articulating the vowel appropriately. Though spectral properties of the stuttered vowels are normal, others are unusual: The stuttered vowels are low in amplitude and short in duration. In two experiments, the effects of amplitude and duration on perception of these vowels are examined. It is shown that, if the amplitude of stuttered vowels is made normal and their duration is lengthened, they sound more like the intended vowels. These experiments lead to the conclusion that low amplitude and short duration are the factors that cause stuttered vowels to sound like schwa. This differs from the view of certain clinicians and theorists who contend that stutterers actually articulate /schwa/'s when these are heard in stuttered speech. Implications for stuttering therapy are considered. 相似文献

9.

Coarticulatory influences on the perceived height of nasal vowels 总被引：1，自引：0，他引：1

R A Krakow P S Beddor L M Goldstein C A Fowler 《The Journal of the Acoustical Society of America》1988,83(3):1146-1158

Certain of the complex spectral effects of vowel nasalization bear a resemblance to the effects of modifying the tongue or jaw position with which the vowel is produced. Perceptual evidence suggests that listener misperceptions of nasal vowel height arise as a result of this resemblance. Whereas previous studies examined isolated nasal vowels, this research focused on the role of phonetic context in shaping listeners' judgments of nasal vowel height. Identification data obtained from native American English speakers indicated that nasal coupling does not necessarily lead to listener misperceptions of vowel quality when the vowel's nasality is coarticulatory in nature. The perceived height of contextually nasalized vowels (in a [bVnd] environment) did not differ from that of oral vowels (in a [bVd] environment) produced with the same tongue-jaw configuration. In contrast, corresponding noncontextually nasalized vowels (in a [bVd] environment) were perceived as lower in quality than vowels in the other two conditions. Presumably the listeners' lack of experience with distinctive vowel nasalization prompted them to resolve the spectral effects of noncontextual nasalization in terms of tongue or jaw height, rather than velic height. The implications of these findings with respect to sound changes affecting nasal vowel height are also discussed. 相似文献

10.

Native dialect influences second-language vowel perception: Peruvian versus Iberian Spanish learners of Dutch

Escudero P Williams D 《The Journal of the Acoustical Society of America》2012,131(5):EL406-EL412

Peruvian Spanish (PS) and Iberian Spanish (IS) learners were tested on their ability to categorically discriminate and identify Dutch vowels. It was predicted that the acoustic differences between the vowel productions of the two dialects, which compare differently to Dutch vowels, would manifest in differential L2 perception for listeners of these two dialects. The results show that although PS learners had higher general L2 proficiency, IS learners were more accurate at discriminating all five contrasts and at identifying six of the L2 Dutch vowels. These findings confirm that acoustic differences in native vowel production lead to differential L2 vowel perception. 相似文献

11.

The acquisition of vowel discriminations by nonhuman primates

R D Hienz J V Brady 《The Journal of the Acoustical Society of America》1988,84(1):186-194

Three adult male baboons were trained on a psychophysical procedure to discriminate five synthetic, steady-state vowel sounds [a), (ae), (c), (U), and (epsilon] from one another. A pulsed train of one vowel comprised the reference stimulus during a session. Animals were trained to press a lever and release the lever only when this reference vowel sound changed to one of the comparison vowels. All animals learned the vowel discriminations rapidly and, once learned, performed the discriminations at the 95%-100% correct level. During the initial acquisition of the discriminations, however, percent correct detections were higher for those vowels with greater spectral differences from the reference vowel. For some cases, the detection scores correlated closely with differences between first formant peaks, while for others the detection scores correlated more closely with differences between second formant peaks. Once the discriminations were acquired, no discriminability differences were apparent among the different vowels. Underlying discriminability differences were still present, however, and could be revealed by giving a minor tranquilizer (diazepam) that lowered discrimination performances. These drug-induced decrements in vowel discriminability were also correlated with spectral differences, with lower vowel discriminability scores found for those vowels with smaller spectral differences from the reference vowel. 相似文献

12.

Developmental and cross-linguistic variation in the infant vowel space: the case of Canadian English and Canadian French

Rvachew S Mattock K Polka L Ménard L 《The Journal of the Acoustical Society of America》2006,120(4):2250-2259

This article describes the results of two experiments. Experiment 1 was a cross-sectional study designed to explore developmental and cross-linguistic variation in the vowel space of 10- to 18-month-old infants, exposed to either Canadian English or Canadian French. Acoustic parameters of the infant vowel space were described (specifically the mean and standard deviation of the first and second formant frequencies) and then used to derive the grave, acute, compact, and diffuse features of the vowel space across age. A decline in mean F1 with age for French-learning infants and a decline in mean F2 with age for English-learning infants was observed. A developmental expansion of the vowel space into the high-front and high-back regions was also evident. In experiment 2, the Variable Linear Articulatory Model was used to model the infant vowel space taking into consideration vocal tract size and morphology. Two simulations were performed, one with full range of movement for all articulatory paramenters, and the other for movement of jaw and lip parameters only. These simulated vowel spaces were used to aid in the interpretation of the developmental changes and cross-linguistic influences on vowel production in experiment 1. 相似文献

13.

Vowel intelligibility in clear and conversational speech for normal-hearing and hearing-impaired listeners

Ferguson SH Kewley-Port D 《The Journal of the Acoustical Society of America》2002,112(1):259-271

Several studies have demonstrated that when talkers are instructed to speak clearly, the resulting speech is significantly more intelligible than speech produced in ordinary conversation. These speech intelligibility improvements are accompanied by a wide variety of acoustic changes. The current study explored the relationship between acoustic properties of vowels and their identification in clear and conversational speech, for young normal-hearing (YNH) and elderly hearing-impaired (EHI) listeners. Monosyllabic words excised from sentences spoken either clearly or conversationally by a male talker were presented in 12-talker babble for vowel identification. While vowel intelligibility was significantly higher in clear speech than in conversational speech for the YNH listeners, no clear speech advantage was found for the EHI group. Regression analyses were used to assess the relative importance of spectral target, dynamic formant movement, and duration information for perception of individual vowels. For both listener groups, all three types of information emerged as primary cues to vowel identity. However, the relative importance of the three cues for individual vowels differed greatly for the YNH and EHI listeners. This suggests that hearing loss alters the way acoustic cues are used for identifying vowels. 相似文献

14.

Common features of the amplitude-frequency characteristics of vowels in different forms of speech

N. G. Andreeva G. A. Kulikov A. P. Samokishchuk 《Acoustical Physics》2002,48(5):620-622

According to classical concepts, the relationship between the first two formants is the feature that determines the identification of long vowels in speech. However, the characteristics of vowels may considerably vary depending on the conditions of their production. Thus, the aforementioned features that are valid for adult speech cannot be extended to speech signals with high fundamental frequencies, such as infant speech or singing. On the basis of the studies of preverbal infant vocalizations, singing, and speech imitation by talkingbirds, it is shown that the stable features of vowel-like sounds are the positions and amplitude ratios of the most pronounced spectral maxima (including those corresponding to the fundamental frequency). The results of the studies suggest that precisely these features determine the categorical identification of vowels. The role of the relationship between the frequency and amplitude characteristics in the vowel identification irrespective of the way the vowel is produced and the age and state of the speaker, as well as in the case of speech imitation by talkingbirds, is discussed. 相似文献

15.

Modeling the perception of concurrent vowels: vowels with the same fundamental frequency

P F Assmann Q Summerfield 《The Journal of the Acoustical Society of America》1989,85(1):327-338

The ability of listeners to identify pairs of simultaneous synthetic vowels has been investigated in the first of a series of studies on the extraction of phonetic information from multiple-talker waveforms. Both members of the vowel pair had the same onset and offset times and a constant fundamental frequency of 100 Hz. Listeners identified both vowels with an accuracy significantly greater than chance. The pattern of correct responses and confusions was similar for vowels generated by (a) cascade formant synthesis and (b) additive harmonic synthesis that replaced each of the lowest three formants with a single pair of harmonics of equal amplitude. In order to choose an appropriate model for describing listeners' performance, four pattern-matching procedures were evaluated. Each predicted the probability that (i) any individual vowel would be selected as one of the two responses, and (ii) any pair of vowels would be selected. These probabilities were estimated from measures of the similarities of the auditory excitation patterns of the double vowels to those of single-vowel reference patterns. Up to 88% of the variance in individual responses and up to 67% of the variance in pairwise responses could be accounted for by procedures that highlighted spectral peaks and shoulders in the excitation pattern. Procedures that assigned uniform weight to all regions of the excitation pattern gave poorer predictions. These findings support the hypothesis that the auditory system pays particular attention to the frequencies of spectral peaks, and possibly also of shoulders, when identifying vowels. One virtue of this strategy is that the spectral peaks and shoulders can indicate the frequencies of formants when other aspects of spectral shape are obscured by competing sounds. 相似文献

16.

The effect of reduced vowel working space on speech intelligibility in Mandarin-speaking young adults with cerebral palsy

Liu HM Tsao FM Kuhl PK 《The Journal of the Acoustical Society of America》2005,117(6):3879-3889

The purpose of this study was to examine the effect of reduced vowel working space on dysarthric talkers' speech intelligibility using both acoustic and perceptual approaches. In experiment 1, the acoustic-perceptual relationship between vowel working space area and speech intelligibility was examined in Mandarin-speaking young adults with cerebral palsy. Subjects read aloud 18 bisyllabic words containing the vowels /i/, /a/, and /u/ using their normal speaking rate. Each talker's words were identified by three normal listeners. The percentage of correct vowel and word identification were calculated as vowel intelligibility and word intelligibility, respectively. Results revealed that talkers with cerebral palsy exhibited smaller vowel working space areas compared to ten age-matched controls. The vowel working space area was significantly correlated with vowel intelligibility (r=0.632, p<0.005) and with word intelligibility (r=0.684, p<0.005). Experiment 2 examined whether tokens of expanded vowel working spaces were perceived as better vowel exemplars and represented with greater perceptual spaces than tokens of reduced vowel working spaces. The results of the perceptual experiment support this prediction. The distorted vowels of talkers with cerebral palsy compose a smaller acoustic space that results in shrunken intervowel perceptual distances for listeners. 相似文献

17.

Spectral enhancement of Polish vowels to improve their identification by hearing impaired listeners

E Ozimek A S?k A Wicher E Skrodzka J Konieczny 《Applied Acoustics》2004,65(5):473-483

Abnormalities in the cochlear function usually cause broadening of the auditory filters which reduces the speech intelligibility. An attempt to apply a spectral enhancement algorithm has been undertaken to improve the identification of Polish vowels by subjects with cochlear-based hearing-impairment. The identification scores of natural (unprocessed) vowels and spectrally enhanced (processed) vowels has been measured for hearing-impaired subjects. It has been found that spectral enhancement improves vowel scores by about 10% for those subjects, however, a wide variation in individual performance among subjects has been observed. The overall vowels identification scores obtained were 85% for natural vowels and 96% for spectrally enhanced vowels. 相似文献

18.

Surrogate analysis for detecting nonlinear dynamics in normal vowels.

I Tokuda T Miyano K Aihara 《The Journal of the Acoustical Society of America》2001,110(6):3207-3217

Normal vowels are known to have irregularities in the pitch-to-pitch variation which is quite important for speech signals to be perceived as natural human sound. Such pitch-to-pitch variation of vowels is studied in the light of nonlinear dynamics. For the analysis, five normal vowels recorded from three male and two female subjects are exploited, where the vowel signals are shown to have normal levels of the pitch-to-pitch variation. First, by the false nearest-neighbor analysis, nonlinear dynamics of the vowels are shown to be well analyzed by using a relatively low-dimensional reconstructing dimension of 4 < or = d < or = 7. Then, we further studied nonlinear dynamics of the vowels by spike-and-wave surrogate analysis. The results imply that there exists nonlinear dynamical correlation between one pitch-waveform pattern to another in the vowel signals. On the basis of the analysis results, applicability of the nonlinear prediction technique to vowel synthesis is discussed. 相似文献

19.

Effects of roving level and spectral range on vowel formant discrimination

Liu C 《The Journal of the Acoustical Society of America》2011,130(4):EL264-EL270

Thresholds of vowel formant discrimination for F1 and F2 of isolated vowels with full and partial vowel spectra were measured for normal-hearing listeners at fixed and roving speech levels. Performance of formant discrimination was significantly better for fixed levels than for roving levels with both full and partial spectra. The effect of vowel spectral range was present only for roving levels, but not for fixed levels. These results, consistent with studies of profile analysis, indicated different perceptual mechanisms for listeners to discriminate vowel formant frequency at fixed and roving levels. 相似文献

20.

Recognition of spectrally degraded and frequency-shifted vowels in acoustic and electric hearing 总被引：5，自引：0，他引：5

Fu QJ Shannon RV 《The Journal of the Acoustical Society of America》1999,105(3):1889-1900

The present study measured the recognition of spectrally degraded and frequency-shifted vowels in both acoustic and electric hearing. Vowel stimuli were passed through 4, 8, or 16 bandpass filters and the temporal envelopes from each filter band were extracted by half-wave rectification and low-pass filtering. The temporal envelopes were used to modulate noise bands which were shifted in frequency relative to the corresponding analysis filters. This manipulation not only degraded the spectral information by discarding within-band spectral detail, but also shifted the tonotopic representation of spectral envelope information. Results from five normal-hearing subjects showed that vowel recognition was sensitive to both spectral resolution and frequency shifting. The effect of a frequency shift did not interact with spectral resolution, suggesting that spectral resolution and spectral shifting are orthogonal in terms of intelligibility. High vowel recognition scores were observed for as few as four bands. Regardless of the number of bands, no significant performance drop was observed for tonotopic shifts equivalent to 3 mm along the basilar membrane, that is, for frequency shifts of 40%-60%. Similar results were obtained from five cochlear implant listeners, when electrode locations were fixed and the spectral location of the analysis filters was shifted. Changes in recognition performance in electrical and acoustic hearing were similar in terms of the relative location of electrodes rather than the absolute location of electrodes, indicating that cochlear implant users may at least partly accommodate to the new patterns of speech sounds after long-time exposure to their normal speech processor. 相似文献