Similar literature
 Found 20 similar documents
1.
The purpose of this study was to reexamine the factors leading to stop-consonant perception for consonant-vowel (CV) stimuli with just two formants over a range of vowels, under both open- and closed-response conditions. Five two-formant CV stimulus continua were synthesized, each covering a range of second-formant (F2) starting frequencies, for vowels corresponding roughly to [i,I,ae,u,a]. In addition, for the [I] and [a] continua, the duration of the first-formant (F1) transition was systematically varied. Three main findings emerged. First, criterion-level labial and alveolar responses were obtained for those stimuli with substantial F2 transitions. Second, for some stimuli, increases in the duration of the F1 transition increased velar responses to criterion level. Third, the response paradigm had a substantial influence on stop-consonant perception across all vowel continua. The results support a model of stop-consonant perception that includes spectral and time-varying spectral properties as integral components of analysis.

2.
This study investigated changes in vowel production and perception among university students from the north of England, as individuals adapt their accent from regional to educated norms. Subjects were tested in their production and perception at regular intervals over a period of 2 years: before beginning university, 3 months later, and at the end of their first and second years at university. At each testing session, subjects were recorded reading a set of experimental words and a short passage. Subjects also completed two perceptual tasks; they chose best exemplar locations for vowels embedded in either northern or southern English accented carrier sentences and identified words in noise spoken with either a northern or southern English accent. The results demonstrated that subjects at a late stage in their language development, early adulthood, changed their spoken accent after attending university. There were no reliable changes in perception over time, but there was evidence for a between-subjects link between production and perception; subjects chose similar vowels to the ones they produced, and subjects who had a more southern English accent were better at identifying southern English speech in noise.

3.
Research on the perception of vowels in the last several years has given rise to new conceptions of vowels as articulatory, acoustic, and perceptual events. Starting from a "simple" target model in which vowels were characterized articulatorily as static vocal tract shapes and acoustically as points in a first and second formant (F1/F2) vowel space, this paper briefly traces the evolution of vowel theory in the 1970s and 1980s in two directions. (1) Elaborated target models represent vowels as target zones in perceptual spaces whose dimensions are specified as formant ratios. These models have been developed primarily to account for perceivers' solution of the "speaker normalization" problem. (2) Dynamic specification models emphasize the importance of formant trajectory patterns in specifying vowel identity. These models deal primarily with the problem of "target undershoot" associated with the coarticulation of vowels with consonants in natural speech and with the issue of "vowel-inherent spectral change" or diphthongization of English vowels. Perceptual studies are summarized that motivate these theoretical developments.

4.
5.
Vowel perception studies were conducted on a group of four adolescent children with congenital profound sensorineural hearing impairments in the three conditions of audition alone, vision alone, and audition plus vision. Data were analyzed using the ALSCAL multidimensional scaling procedure to identify the underlying dimensions and individual differences in dimension emphasis. The three dimensions obtained from the analysis of data for the audition alone condition were interpreted as the parameters of first and second formant frequencies, and vowel length. The one dimension for the vision alone condition was interpreted as the parameter of the width of the internal lip opening. The three dimensions for the audition plus vision condition were interpreted as the parameters of first formant frequency, vowel length, and the width of the internal lip opening. Subject variations in parameter preferences were observed for the audition alone and audition plus vision conditions but not for the vision alone condition.

6.
To provide a perceptual framework for the objective evaluation of durational rules in speech synthesis, two experiments were conducted to investigate the differences between vowel (V) onsets and V-offsets in their functions of marking the perceived temporal structure of speech. The first experiment measured the detectability of temporal modifications given in four-mora (CVCVCVCV) Japanese words. In the V-onset condition, the inter-onset intervals of vowels were uniformly changed (either expanded or reduced) while their inter-offset intervals were preserved. In the V-offset condition, this was reversed. These manipulations did not change the duration of the entire word. Each of the modified words was paired with its unmodified counterpart, and the pair was given to listeners, who were asked to rate the difference between the paired words. The results show that there were no significant differences in the listeners' abilities to detect the temporal modification between the V-onset and V-offset conditions. In the second experiment, the listeners were asked to estimate the differences they perceived in speaking rates for the same stimulus set as that of the first experiment. Interestingly, the results show a clear difference in the listeners' performance between the V-onset and V-offset conditions. Specifically, changing the V-onset intervals changed the perceived speaking rates, which showed a linear relation (r = -0.9) despite the fact that the duration of the entire word remained unchanged. In contrast, modifying the V-offset intervals produced no clear relation with the perceived speaking rates. The second experiment also showed that the listeners performed well in speaking rate discrimination (3.5%-5% in the change ratio). These results are discussed in relation to the differences in the listeners' temporal processing range (local or global) between the two experiments.

7.
Selective adaptation and anchoring effects in speech perception have generated several different hypotheses regarding the nature of contextual contrast, including auditory/phonetic feature detector fatigue, response bias, and auditory contrast. In the present study three different seven-step [hɪd]-[hɛd] continua were constructed to represent a low F0 (long vocal tract source), a high F0 (long vocal tract source), and a high F0 (short vocal tract source), respectively. Subjects identified the tokens from each of the stimulus continua under two conditions: an equiprobable control and an anchoring condition which included an endpoint stimulus from one of the three continua occurring at least three times more often than any other single stimulus. Differential contrast effects were found depending on whether the anchor differed from the test stimuli in terms of F0, absolute formant frequencies, or both. Results were inconsistent with both the feature detector fatigue and response bias hypotheses. Rather, the obtained data suggest that vowel contrast occurs on the basis of normalized formant values, thus supporting a version of the auditory-contrast theory.

8.
Static, dynamic, and relational properties in vowel perception   (Total citations: 2; self-citations: 0; citations by others: 2)
The present work reviews theories and empirical findings, including results from two new experiments, that bear on the perception of English vowels, with an emphasis on the comparison of data analytic "machine recognition" approaches with results from speech perception experiments. Two major sources of variability (viz., speaker differences and consonantal context effects) are addressed from the classical perspective of overlap between vowel categories in F1 × F2 space. Various approaches to the reduction of this overlap are evaluated. Two types of speaker normalization are considered. "Intrinsic" methods based on relationships among the steady-state properties (F0, F1, F2, and F3) within individual vowel tokens are contrasted with "extrinsic" methods, involving the relationships among the formant frequencies of the entire vowel system of a single speaker. Evidence from a new experiment supports Ainsworth's (1975) conclusion [W. Ainsworth, Auditory Analysis and Perception of Speech (Academic, London, 1975)] that both types of information have a role to play in perception. The effects of consonantal context on formant overlap are also considered. A new experiment is presented that extends Lindblom and Studdert-Kennedy's finding [B. Lindblom and M. Studdert-Kennedy, J. Acoust. Soc. Am. 43, 840-843 (1967)] of perceptual effects of consonantal context on vowel perception to /dVd/ and /bVb/ contexts. Finally, the role of vowel-inherent dynamic properties, including duration and diphthongization, is briefly reviewed. All of the above factors are shown to have reliable influences on vowel perception, although the relative weight of such effects and the circumstances that alter these weights remain far from clear. It is suggested that the design of more complex perceptual experiments, together with the development of quantitative pattern recognition models of human vowel perception, will be necessary to resolve these issues.
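
The contrast between "intrinsic" and "extrinsic" speaker normalization in the abstract above can be illustrated with a small sketch. The abstract does not say which formulas were evaluated, so both functions below are assumptions chosen for illustration: the intrinsic variant uses log ratios of adjacent steady-state properties within a single token, and the extrinsic variant uses a per-speaker z-score of each formant across that speaker's whole vowel system. The function names are hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def intrinsic_log_ratios(f0: float, f1: float, f2: float, f3: float) -> Tuple[float, float, float]:
    """Intrinsic sketch (assumed form): uses only properties of ONE vowel token.

    Returns log(F1/F0), log(F2/F1), log(F3/F2).
    """
    return (math.log(f1 / f0), math.log(f2 / f1), math.log(f3 / f2))


def extrinsic_zscores(tokens: Sequence[Sequence[float]]) -> List[Tuple[float, ...]]:
    """Extrinsic sketch (assumed form): uses ALL tokens of one speaker.

    Each token is a sequence of formant frequencies, e.g. (F1, F2).
    Every formant is z-scored against that speaker's own mean and
    population standard deviation.
    """
    columns = list(zip(*tokens))  # one column per formant
    stats = [(mean(col), pstdev(col)) for col in columns]
    return [
        tuple((value - m) / s for value, (m, s) in zip(token, stats))
        for token in tokens
    ]


if __name__ == "__main__":
    # Toy values only; not data from the study.
    speaker_a = [(300.0, 2200.0), (700.0, 1200.0)]
    # A second toy speaker whose formants are uniformly 20% higher.
    speaker_b = [(f1 * 1.2, f2 * 1.2) for f1, f2 in speaker_a]
    print(extrinsic_zscores(speaker_a))
    print(extrinsic_zscores(speaker_b))
    print(intrinsic_log_ratios(100.0, 200.0, 400.0, 800.0))
```

The intrinsic function needs no information beyond the token itself, whereas the extrinsic function cannot be applied until a sample of the speaker's vowel system is available, which is the distinction the abstract draws.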

9.
The formant hypothesis of vowel perception, according to which the lowest two or three formant frequencies are essential cues for vowel quality perception, is widely accepted. There has, however, been some controversy suggesting that formant frequencies are not sufficient and that the whole spectral shape is necessary for perception. Three psychophysical experiments were performed to study this question. In the first experiment, the first or second formant peak of stimuli was suppressed as much as possible while still maintaining the original spectral shape. The responses to these stimuli were not radically different from the ones for the unsuppressed control. In the second experiment, F2-suppressed stimuli, whose amplitude ratios of high- to low-frequency components were systematically changed, were used. The results indicate that the ratio changes can affect perceived vowel quality, especially its place of articulation. In the third experiment, the full-formant stimuli, whose amplitude ratios were changed from the original and whose F2's were kept constant, were used. The results suggest that the amplitude ratio is equal to or more effective than F2 as a cue for place of articulation. We conclude that formant frequencies are not exclusive cues and that the whole spectral shape can be crucial for vowel perception.

10.
This study explored how across-talker differences influence non-native vowel perception. American English (AE) and Korean listeners were presented with recordings of 10 AE vowels in /bVd/ context. The stimuli were mixed with noise and presented for identification in a 10-alternative forced-choice task. The two listener groups heard recordings of the vowels produced by 10 talkers at three signal-to-noise ratios. Overall the AE listeners identified the vowels 22% more accurately than the Korean listeners. There was a wide range of identification accuracy scores across talkers for both AE and Korean listeners. At each signal-to-noise ratio, the across-talker intelligibility scores were highly correlated for AE and Korean listeners. Acoustic analysis was conducted for two vowel pairs that exhibited variable accuracy across talkers for Korean listeners but high identification accuracy for AE listeners. Results demonstrated that Korean listeners' error patterns for these four vowels were strongly influenced by variability in vowel production that was within the normal range for AE talkers. These results suggest that non-native listeners are strongly influenced by across-talker variability, perhaps because of the difficulty they have forming native-like vowel categories.

11.
12.
13.
The purpose of this paper is to propose and evaluate a new model of vowel perception which assumes that vowel identity is recognized by a template-matching process involving the comparison of narrow band input spectra with a set of smoothed spectral-shape templates that are learned through ordinary exposure to speech. In the present simulation of this process, the input spectra are computed over a sufficiently long window to resolve individual harmonics of voiced speech. Prior to template creation and pattern matching, the narrow band spectra are amplitude equalized by a spectrum-level normalization process, and the information-bearing spectral peaks are enhanced by a "flooring" procedure that zeroes out spectral values below a threshold function consisting of a center-weighted running average of spectral amplitudes. Templates for each vowel category are created simply by averaging the narrow band spectra of like vowels spoken by a panel of talkers. In the present implementation, separate templates are used for men, women, and children. The pattern matching is implemented with a simple city-block distance measure given by the sum of the channel-by-channel differences between the narrow band input spectrum (level-equalized and floored) and each vowel template. Spectral movement is taken into account by computing the distance measure at several points throughout the course of the vowel. The input spectrum is assigned to the vowel template that results in the smallest difference accumulated over the sequence of spectral slices. The model was evaluated using a large database consisting of 12 vowels in /hVd/ context spoken by 45 men, 48 women, and 46 children. The narrow band model classified vowels in this database with a degree of accuracy (91.4%) approaching that of human listeners.
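
The pipeline described in the abstract above (flooring, template averaging, city-block matching accumulated over spectral slices) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the triangular weighting and the `half_width` parameter are guesses at what "center-weighted running average" might look like, the level-equalization step is omitted because the abstract does not specify it, and all function names are hypothetical.

```python
from typing import Dict, List, Sequence


def floor_spectrum(spectrum: Sequence[float], half_width: int = 2) -> List[float]:
    """Zero out channels that fall below a center-weighted running average.

    Assumption: triangular weights (largest at the center channel);
    the window is truncated at the edges of the spectrum.
    """
    n = len(spectrum)
    floored = []
    for i in range(n):
        weighted_sum = 0.0
        weight_total = 0.0
        for k in range(-half_width, half_width + 1):
            j = i + k
            if 0 <= j < n:
                w = half_width + 1 - abs(k)
                weighted_sum += w * spectrum[j]
                weight_total += w
        threshold = weighted_sum / weight_total
        floored.append(spectrum[i] if spectrum[i] >= threshold else 0.0)
    return floored


def make_template(spectra: Sequence[Sequence[float]]) -> List[float]:
    """Average like-vowel spectra channel by channel."""
    return [sum(channel) / len(channel) for channel in zip(*spectra)]


def city_block(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute channel-by-channel differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def classify(slices: Sequence[Sequence[float]],
             templates: Dict[str, Sequence[Sequence[float]]]) -> str:
    """Pick the vowel whose per-slice templates give the smallest
    distance accumulated over the sequence of spectral slices."""
    totals = {
        vowel: sum(city_block(floor_spectrum(s, 1), t)
                   for s, t in zip(slices, slice_templates))
        for vowel, slice_templates in templates.items()
    }
    return min(totals, key=totals.get)


if __name__ == "__main__":
    # Toy five-channel spectra, one slice per vowel; not data from the study.
    templates = {
        "a": [[0.0, 0.0, 10.0, 0.0, 0.0]],
        "i": [[10.0, 0.0, 0.0, 0.0, 0.0]],
    }
    print(classify([[1.0, 1.0, 9.0, 1.0, 1.0]], templates))
```

In a fuller version, one template set per talker group (men, women, children) would be kept, as the abstract describes, and each input spectrum would be level-equalized before flooring.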

14.
Several experiments have found that changing the intrinsic f0 of a vowel can have an effect on perceived vowel quality. It has been suggested that these shifts may occur because f0 is involved in the specification of vowel quality in the same way as the formant frequencies. Another possibility is that f0 affects vowel quality indirectly, by changing a listener's assumptions about characteristics of a speaker who is likely to have uttered the vowel. In the experiment outlined here, participants were asked to listen to vowels differing in terms of f0 and their formant frequencies and report vowel quality and the apparent speaker's gender and size on a trial-by-trial basis. The results presented here suggest that f0 affects vowel quality mainly indirectly via its effects on the apparent-speaker characteristics; however, f0 may also have some residual direct effects on vowel quality. Furthermore, the formant frequencies were also found to have significant indirect effects on vowel quality by way of their strong influence on the apparent speaker.

15.
The purpose of this study was to determine whether children give more perceptual weight than do adults to dynamic spectral cues versus static cues. Listeners were 10 children between the ages of 3;8 and 4;1 (mean 3;11) and 10 adults between the ages of 23;10 and 32;0 (mean 25;11). Three experimental stimulus conditions were presented, with each containing stimuli of 30 ms duration. The first experimental condition consisted of unchanging formant onset frequencies ranging in value from frequencies for [i] to those for [a], appropriate for a bilabial stop consonant context. The second two experimental conditions consisted of either an [i] or [a] onset frequency with a 25 ms portion of a formant transition whose trajectory was toward one of a series of target frequencies ranging from those for [i] to those for [a]. Results indicated that the children attended differently than the adults to both the [a] and [i] formant onset frequency cues when identifying the vowels. The adults gave more equal weight to the [i]-onset and [a]-onset dynamic cues, as reflected in category boundaries, than the children did. For the [i]-onset condition, children were less confident than adults in vowel perception, as reflected in slope analyses.

16.
Vowel formants play an important role in speech theories and applications; however, the same formant values measured for the steady-state part of a vowel can correspond to different vowel categories. Experimental evidence indicates that dynamic information can also contribute to vowel characterization. Hence, dynamically modeling formant transitions may lead to quantitatively testable predictions in vowel categorization. Because the articulatory strategy used to manage different speaking rates and contrastive stress may depend on speaker and situation, the parameter values of a dynamic formant model may vary with speaking rate and stress. In most experiments speaking rate is rarely controlled, only two or three rates are tested, and most corpora contain just a few repetitions of each item. As a consequence, the dependence of dynamic models on those factors is difficult to gauge. This article presents a study of 2300 [iai] or [iɛi] stimuli produced by two speakers at nine or ten speaking rates in a carrier sentence for two contrastive stress patterns. The corpus was perceptually evaluated by naive listeners. Formant frequencies were measured during the steady-state parts of the stimuli, and the formant transitions were dynamically and kinematically modeled. The results indicate that (1) the corpus was characterized by a contextual assimilation instead of a centralization effect; (2) dynamic or kinematic modeling was equivalent as far as the analysis of the model parameters was concerned; (3) the dependence of the model parameter estimates on speaking rate and stress suggests that the formant transitions were sharper for high speaking rate, but no consistent trend was found for contrastive stress; (4) the formant frequencies measured in the steady-state parts of the vowels were sufficient to explain the perceptual results while the dynamic parameters of the models were not.

17.
18.
19.
The present study was designed to assess the effects of age on the time course of backward masking. In experiment 1, thresholds for detecting a 10-ms, 500-Hz sinusoidal signal were measured as a function of the temporal separation between the signal and a 50-ms broadband masker. Subjects were younger (18-24) and older (over age 65) adults with normal hearing (thresholds less than 20 dB HL) for frequencies of 4 kHz and below. Younger subjects exhibited less overall masking and steeper recovery functions than did the older adults. Masked thresholds for younger participants approached unmasked thresholds for signal-masker delays greater than 6-8 ms. In contrast, older adults exhibited significant masking even at the longest delay tested (20 ms). In experiment 2, signal duration was decreased to 5 ms for a separate group of younger adults. Although overall thresholds were elevated for the shorter signal duration, the slope of the backward masking recovery function was not different from that observed for younger adults in experiment 1. The results suggest that age, independent of hearing loss, affects the temporal course of backward masking.

20.
The role of auditory feedback in speech motor control was explored in three related experiments. Experiment 1 investigated auditory sensorimotor adaptation: the process by which speakers alter their speech production to compensate for perturbations of auditory feedback. When the first formant frequency (F1) was shifted in the feedback heard by subjects as they produced vowels in consonant-vowel-consonant (CVC) words, the subjects' vowels demonstrated compensatory formant shifts that were maintained when auditory feedback was subsequently masked by noise, which is evidence of adaptation. Experiment 2 investigated auditory discrimination of synthetic vowel stimuli differing in F1 frequency, using the same subjects. Those with more acute F1 discrimination had compensated more to F1 perturbation. Experiment 3 consisted of simulations with the Directions Into Velocities of Articulators (DIVA) model of speech motor planning, which showed that the model can account for key aspects of compensation. In the model, movement goals for vowels are regions in auditory space; perturbation of auditory feedback invokes auditory feedback control mechanisms that correct for the perturbation, which in turn causes updating of feedforward commands to incorporate these corrections. The relation between speaker acuity and amount of compensation to auditory perturbation is mediated by the size of speakers' auditory goal regions, with more acute speakers having smaller goal regions.
