Similar documents
 20 similar documents found (search time: 108 ms)
1.
Seven listener groups, varying in terms of the nasal consonant inventory of their native language, orthographically labeled and rated a set of naturally produced non-native nasal consonants varying in place of articulation. The seven listener groups included speakers of Malayalam, Marathi, Punjabi, Tamil, Oriya, Bengali, and American English. The stimulus set included bilabial, dental, alveolar, and retroflex nasals from Malayalam, Marathi, and Oriya. The stimulus set and nasal consonant inventories of the seven listener groups were described by both phonemic and allophonic representations. The study was designed to determine the extent to which phonemic and allophonic representations of perceptual categories can be used to predict a listener group's identification of non-native sounds. The results of the experiment showed that allophonic representations were more successful in predicting the native category that listeners used to label a non-native sound in a majority of trials. However, both representations frequently failed to accurately predict the goodness of fit between a non-native sound and a perceptual category. The results demonstrate that the labeling and rating of non-native stimuli were conditioned by a degree of language-specific phonetic detail that corresponds to perceptually relevant cues to native language contrasts.

2.
A set of experiments was conducted to examine the loudness of sounds with temporally asymmetric amplitude envelopes. Envelopes were generated with fast-attack/slow-decay characteristics to produce F-S (or "fast-slow") stimuli, while temporally reversed versions of these same envelopes produced corresponding S-F ("slow-fast") stimuli. For sinusoidal (330-6000 Hz) and broadband noise carriers, S-F stimuli were louder than F-S stimuli of equal energy. The magnitude of this effect was sensitive to stimulus order, with the largest differences between F-S and S-F loudness occurring after exposure to a preceding F-S stimulus. These results are not compatible with automatic gain control, power-spectrum models of loudness, or predictions obtained using the auditory image model [Patterson et al., J. Acoust. Soc. Am. 98, 1890-1894 (1995)]. Rather, they are comparable to phenomena of perceptual constancy, and may be related to the parsing of auditory input into direct and reverberant sound.
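The key control in this design is that time-reversing an envelope leaves its total energy unchanged, so any F-S/S-F loudness difference cannot be an energy effect. A minimal sketch (the exponential attack/decay shape and its time constant are illustrative assumptions, not the study's actual parameters):

```python
import math

def exp_envelope(n_samples, tau_samples):
    # Fast attack, slow exponential decay: an "F-S" envelope.
    # tau_samples is a hypothetical decay constant for illustration.
    return [math.exp(-i / tau_samples) for i in range(n_samples)]

fs_env = exp_envelope(1000, 250)   # fast-attack / slow-decay (F-S)
sf_env = list(reversed(fs_env))    # time-reversed: slow-fast (S-F)

# Time reversal reorders samples without changing them, so the
# sum of squared amplitudes (energy) is identical for both.
energy = lambda env: sum(x * x for x in env)
print(abs(energy(fs_env) - energy(sf_env)) < 1e-9)  # True
```

This is why the reported S-F > F-S loudness difference points to temporal processing (e.g., perceptual constancy) rather than to any power-spectrum account.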

3.
In an investigation of contextual influences on sound categorization, 64 Peruvian Spanish listeners categorized vowels on an /i/ to /e/ continuum. First, to measure the influence of the stimulus range (broad acoustic context) and the preceding stimuli (local acoustic context), listeners were presented with different subsets of the Spanish /i/-/e/ continuum in separate blocks. Second, the influence of the number of response categories was measured by presenting half of the participants with /i/ and /e/ as responses, and the other half with /i/, /e/, /a/, /o/, and /u/. The results showed that the perceptual category boundary between /i/ and /e/ shifted depending on the stimulus range and that the formant values of locally preceding items had a contrastive influence. Categorization was less susceptible to broad and local acoustic context effects, however, when listeners were presented with five rather than two response options. Vowel categorization depends not only on the acoustic properties of the target stimulus, but also on its broad and local acoustic context. The influence of such context is in turn affected by the number of internal referents that are available to the listener in a task.

4.
Recent work [Iverson et al. (2003) Cognition, 87, B47-57] has suggested that Japanese adults have difficulty learning English /r/ and /l/ because they are overly sensitive to acoustic cues that are not reliable for /r/-/l/ categorization (e.g., F2 frequency). This study investigated whether cue weightings are altered by auditory training, and compared the effectiveness of different training techniques. Separate groups of subjects received High Variability Phonetic Training (natural words from multiple talkers), and 3 techniques in which the natural recordings were altered via signal processing (All Enhancement, with F3 contrast maximized and closure duration lengthened; Perceptual Fading, with F3 enhancement reduced during training; and Secondary Cue Variability, with variation in F2 and durations increased during training). The results demonstrated that all of the training techniques improved /r/-/l/ identification by Japanese listeners, but there were no differences between the techniques. Training also altered the use of secondary acoustic cues; listeners became biased to identify stimuli as English /l/ when the cues made them similar to the Japanese /r/ category, and reduced their use of secondary acoustic cues for stimuli that were dissimilar to Japanese /r/. The results suggest that both category assimilation and perceptual interference affect English /r/ and /l/ acquisition.

5.
The goal of this study was to examine the neural encoding of voice-onset time distinctions that indicate the phonetic categories /da/ and /ta/ for human listeners. Cortical Auditory Evoked Potentials (CAEP) were measured in conjunction with behavioral perception of a /da/-/ta/ continuum. Sixteen subjects participated in identification and discrimination experiments. A sharp category boundary was revealed between /da/ and /ta/ around the same location for all listeners. Subjects' discrimination of a VOT change of equal magnitude was significantly more accurate across the /da/-/ta/ categories than within the /ta/ category. Neurophysiologic correlates of VOT encoding were investigated using the N1 CAEP which reflects sensory encoding of stimulus features and the MMN CAEP which reflects sensory discrimination. The MMN elicited by the across-category pair was larger and more robust than the MMN which occurred in response to the within-category pair. Distinct changes in N1 morphology were related to VOT encoding. For stimuli that were behaviorally identified as /da/, a single negativity (N1) was apparent; however, for stimuli identified as /ta/, two distinct negativities (N1 and N1') were apparent. Thus the enhanced MMN responses and the morphological discontinuity in N1 morphology observed in the region of the /da/-/ta/ phonetic boundary appear to provide neurophysiologic correlates of categorical perception for VOT.
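In identification experiments like this one, the category boundary is conventionally estimated as the VOT at which responses cross 50%. A minimal sketch of that calculation, using linear interpolation between continuum steps (the VOT values and response proportions below are hypothetical, not the study's data):

```python
def category_boundary(vots, prop_ta):
    # Find the VOT (ms) where the proportion of /ta/ responses
    # first crosses 0.5, interpolating between adjacent steps.
    for (v0, p0), (v1, p1) in zip(zip(vots, prop_ta),
                                  zip(vots[1:], prop_ta[1:])):
        if p0 < 0.5 <= p1:
            return v0 + (0.5 - p0) / (p1 - p0) * (v1 - v0)
    return None  # no crossing found

# Hypothetical identification function for a /da/-/ta/ VOT continuum.
vots    = [0, 10, 20, 30, 40, 50, 60]
prop_ta = [0.02, 0.05, 0.10, 0.35, 0.80, 0.95, 0.98]
print(category_boundary(vots, prop_ta))  # ≈ 33.3 ms, between the 30 and 40 ms steps
```

Discrimination pairs can then be classified as across-category or within-category depending on whether they straddle this boundary.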

6.
This study investigated the extent to which adult Japanese listeners' perceived phonetic similarity of American English (AE) and Japanese (J) vowels varied with consonantal context. Four AE speakers produced multiple instances of the 11 AE vowels in six syllabic contexts /b-b, b-p, d-d, d-t, g-g, g-k/ embedded in a short carrier sentence. Twenty-four native speakers of Japanese were asked to categorize each vowel utterance as most similar to one of 18 Japanese categories [five one-mora vowels, five two-mora vowels, plus /ei, ou/ and one-mora and two-mora vowels in palatalized consonant CV syllables, C(j)a(a), C(j)u(u), C(j)o(o)]. They then rated the "category goodness" of the AE vowel to the selected Japanese category on a seven-point scale. None of the 11 AE vowels was assimilated unanimously to a single J response category in all context/speaker conditions; consistency in selecting a single response category ranged from 77% for /eI/ to only 32% for /ae/. Median ratings of category goodness for modal response categories were somewhat restricted overall, ranging from 5 to 3. Results indicated that temporal assimilation patterns (judged similarity to one-mora versus two-mora Japanese categories) differed as a function of the voicing of the final consonant, especially for the AE vowels, /see text/. Patterns of spectral assimilation (judged similarity to the five J vowel qualities) of /see text/ also varied systematically with consonantal context and speakers. On the basis of these results, it was predicted that relative difficulty in the identification and discrimination of AE vowels by Japanese speakers would vary significantly as a function of the contexts in which they were produced and presented.

7.
Researchers long have searched for invariant acoustic features that can be used to identify singing voice categories or even individual singers. Few researchers have examined how listeners perceive singing voice categories or individual voices. Timbre, the most studied perceptual dimension of the singing voice, is generally believed to vary systematically between singing voice categories but is often assumed to be invariant within an individual singer. To test this assumption, 2 mezzo-sopranos and 2 sopranos were recorded singing the vowel /a/ on the pitches A3, C4, G4, B4, F5, and A5. Trials of three stimuli were constructed. Two of the three stimuli in each trial were produced by the same singer at two different pitches (X1 and X2), while the third stimulus was produced by a different singer (Y). Three X1X2 conditions were created: (1) G4, B4; (2) C4, F5; and (3) A3, A5. For each singer and each condition, Y was varied across the three remaining singers and across all six pitches. Experienced and inexperienced listeners were asked to identify which stimulus was produced by the "odd" person. The ability to correctly choose the odd person varied greatly depending on pitch factors, suggesting that the traditional concept of an invariant timbre associated with a singer is inaccurate and that vocal timbre must be conceptualized in terms of transformations in perceived quality that occur across an individual singer's range and/or registers.

8.
Behavioral experiments with infants, adults, and nonhuman animals converge with neurophysiological findings to suggest that there is a discontinuity in auditory processing of stimulus components differing in onset time by about 20 ms. This discontinuity has been implicated as a basis for boundaries between speech categories distinguished by voice onset time (VOT). Here, it is investigated how this discontinuity interacts with the learning of novel perceptual categories. Adult listeners were trained to categorize nonspeech stimuli that mimicked certain temporal properties of VOT stimuli. One group of listeners learned categories with a boundary coincident with the perceptual discontinuity. Another group learned categories defined such that the perceptual discontinuity fell within a category. Listeners in the latter group required significantly more experience to reach criterion categorization performance. Evidence of interactions between the perceptual discontinuity and the learned categories extended to generalization tests as well. It has been hypothesized that languages make use of perceptual discontinuities to promote distinctiveness among sounds within a language inventory. The present data suggest that discontinuities interact with category learning. As such, "learnability" may play a predictive role in selection of language sound inventories.

9.
Acoustic and perceptual similarities between Japanese and American English (AE) vowels were investigated in two studies. In study 1, a series of discriminant analyses were performed to determine acoustic similarities between Japanese and AE vowels, each spoken by four native male speakers using F1, F2, and vocalic duration as input parameters. In study 2, the Japanese vowels were presented to native AE listeners in a perceptual assimilation task, in which the listeners categorized each Japanese vowel token as most similar to an AE category and rated its goodness as an exemplar of the chosen AE category. Results showed that the majority of AE listeners assimilated all Japanese vowels into long AE categories, apparently ignoring temporal differences between 1- and 2-mora Japanese vowels. In addition, not all perceptual assimilation patterns reflected context-specific spectral similarity patterns established by discriminant analysis. It was hypothesized that this incongruity between acoustic and perceptual similarity may be due to differences in distributional characteristics of native and non-native vowel categories that affect the listeners' perceptual judgments.
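The discriminant analysis here assigns each vowel token to a category based on its position in (F1, F2, duration) space. A simplified stand-in for that idea is nearest-centroid classification; the centroid values below are illustrative assumptions, and a real discriminant analysis would additionally normalize by the within-class covariance, which this sketch omits:

```python
import math

# Hypothetical category centroids in (F1 Hz, F2 Hz, duration ms) space.
# These numbers are illustrative only, not measured values from the study.
centroids = {
    "i": (280.0, 2250.0, 240.0),
    "e": (400.0, 2000.0, 230.0),
}

def classify(token, centroids):
    # Assign a token to the category with the nearest centroid
    # (Euclidean distance; a covariance-normalized distance would
    # be closer to true linear discriminant analysis).
    return min(centroids, key=lambda c: math.dist(token, centroids[c]))

# A short, high, front token: spectrally closest to "i" even though
# its duration is shorter than either centroid's.
print(classify((300.0, 2200.0, 120.0), centroids))  # "i"
```

Comparing such acoustic classifications against listeners' perceptual assimilation choices is what exposes the acoustic/perceptual incongruity the abstract reports.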

10.
This study examined perceptual learning of spectrally complex nonspeech auditory categories in an interactive multi-modal training paradigm. Participants played a computer game in which they navigated through a three-dimensional space while responding to animated characters encountered along the way. Characters' appearances in the game correlated with distinctive sound category distributions, exemplars of which repeated each time the characters were encountered. As the game progressed, the speed and difficulty of required tasks increased and characters became harder to identify visually, so quick identification of approaching characters by sound patterns was, although never required or encouraged, of gradually increasing benefit. After 30 min of play, participants performed a categorization task, matching sounds to characters. Despite not being informed of audio-visual correlations, participants exhibited reliable learning of these patterns at posttest. Categorization accuracy was related to several measures of game performance and category learning was sensitive to category distribution differences modeling acoustic structures of speech categories. Category knowledge resulting from the game was qualitatively different from that gained from an explicit unsupervised categorization task involving the same stimuli. Results are discussed with respect to information sources and mechanisms involved in acquiring complex, context-dependent auditory categories, including phonetic categories, and to multi-modal statistical learning.

11.
Studies with adults have demonstrated that acoustic cues cohere in speech perception such that two stimuli cannot be discriminated if separate cues bias responses equally, but oppositely, in each. This study examined whether this kind of coherence exists for children's perception of speech signals, a test that first required that a contrast be found for which adults and children show similar cue weightings. Accordingly, experiment 1 demonstrated that adults, 7-, and 5-year-olds weight F2-onset frequency and gap duration similarly in "spa" versus "sa" decisions. In experiment 2, listeners of these same ages made "same" or "not-the-same" judgments for pairs of stimuli in an AX paradigm when only one cue differed, when the two cues were set within a stimulus to bias the phonetic percept towards the same category (relative to the other stimulus in the pair), and when the two cues were set within a stimulus to bias the phonetic percept towards different categories. Unexpectedly, adults' results contradicted earlier studies: They were able to discriminate stimuli when the two cues conflicted in how they biased phonetic percepts. Results for 7-year-olds replicated those of adults, but were not as strong. Only the results of 5-year-olds revealed the kind of perceptual coherence reported by earlier studies for adults. Thus, it is concluded that perceptual coherence for speech signals is present from an early age, and in fact listeners learn to overcome it under certain conditions.

12.
Training Japanese listeners to identify English /r/ and /l/: a first report
Native speakers of Japanese learning English generally have difficulty differentiating the phonemes /r/ and /l/, even after years of experience with English. Previous research that attempted to train Japanese listeners to distinguish this contrast using synthetic stimuli reported little success, especially when transfer to natural tokens containing /r/ and /l/ was tested. In the present study, a different training procedure that emphasized variability among stimulus tokens was used. Japanese subjects were trained in a minimal pair identification paradigm using multiple natural exemplars contrasting /r/ and /l/ from a variety of phonetic environments as stimuli. A pretest-posttest design containing natural tokens was used to assess the effects of training. Results from six subjects showed that the new procedure was more robust than earlier training techniques. Small but reliable differences in performance were obtained between pretest and posttest scores. The results demonstrate the importance of stimulus variability and task-related factors in training nonnative speakers to perceive novel phonetic contrasts that are not distinctive in their native language.

13.
We examined effects of contrast and character size upon legibility of Japanese text stimuli presented on a visual display terminal (VDT). In the experiment, three different character sizes were employed and the text stimulus was presented under a variety of conditions in which the contrast between the text and the background changed. Reading speed and the rate of readable characters were measured. Subjective ratings of legibility were also obtained. Results showed that legibility increases with luminance contrast for all character sizes examined here. A strong correlation was found between the subjective rating index and reading speed.

14.
The ability to discriminate complex temporal envelope patterns submitted to temporal compression or expansion was assessed in normal-hearing listeners. An XAB matching-to-sample procedure was used. X, the reference stimulus, is obtained by applying the sum of two, inharmonically related, sinusoids to a broadband noise carrier. A and B are obtained by multiplying the frequency of each modulation component of X by the same time expansion/compression factor, alpha (alpha ∈ [0.35, 2.83]). For each trial, A or B is a time-reversed rendering of X, and the listeners' task is to choose which of the two is matched by X. Overall, the results indicate that discrimination performance degrades for increasing amounts of time expansion/compression (i.e., when alpha departs from 1), regardless of the frequency spacing of modulation components and the peak-to-trough ratio of the complex envelopes. An auditory model based on envelope extraction followed by a memory-limited, template-matching process accounted for results obtained without time scaling of stimuli, but generally underestimated discrimination ability with either time expansion or compression, especially with the longer stimulus durations. This result is consistent with partial or incomplete perceptual normalization of envelope patterns.
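The stimulus construction can be sketched directly: an envelope built from two inharmonic modulators, and a comparison whose modulation frequencies are all scaled by the same factor alpha. The specific modulation frequencies, sample rate, and DC offset below are illustrative assumptions:

```python
import math

def complex_envelope(f1, f2, duration_s, sr=8000):
    # Sum of two inharmonic sinusoidal modulators, offset so the
    # result stays non-negative (as an amplitude envelope must).
    # f1, f2, sr are hypothetical values chosen for illustration.
    n = int(duration_s * sr)
    return [2.0 + math.sin(2 * math.pi * f1 * i / sr)
                + math.sin(2 * math.pi * f2 * i / sr) for i in range(n)]

alpha = 2.0                                          # compression factor
x = complex_envelope(4.0, 9.0, 1.0)                  # reference X (4 and 9 Hz)
a = complex_envelope(4.0 * alpha, 9.0 * alpha, 1.0)  # all modulators scaled by alpha
b = list(reversed(a))                                # time-reversed rendering

# Scaling every modulation frequency by alpha compresses the envelope's
# temporal pattern uniformly: sample i of the scaled envelope equals
# sample alpha*i of the reference.
print(abs(a[100] - x[200]) < 1e-9)  # True
```

Because the pattern is compressed rather than reshaped, above-chance matching of X to A or B requires some perceptual normalization of envelope time scale, which the abstract concludes is only partial.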

15.
This study examined imitation of a voice onset time (VOT) continuum ranging from /da/ to /ta/ by subjects differing in age and/or linguistic experience. The subjects did not reproduce the incremental increases in VOT linearly, but instead showed abrupt shifts in VOT between two or three VOT response "modes." The location of the response shifts occurred at the same location as phoneme boundaries obtained in a previous identification experiment. This supports the view that the stimuli were categorized before being imitated. Children and adults who spoke just Spanish generally produced only lead and short-lag VOT responses. English monolinguals tended to produce stops with only short-lag and long-lag VOT values. The native Spanish adults and children who spoke English, on the other hand, produced stops with VOT values falling into all three modal VOT ranges. This was interpreted to mean that they had established a phonetic category [th] with which to implement the voiceless aspirated realizations of /t/ in English. Their inability to produce English /p,t,k/ with the same values as native speakers of English must therefore be attributed to the information specified in their new English phonetic categories (which might be incorrect as the result of exposure to Spanish-accented English), to partially formed phonetic realization rules, or both.

16.
The role of language-specific factors in phonetically based trading relations was examined by assessing the ability of 20 native Japanese speakers to identify and discriminate stimuli of two synthetic /r/-/l/ series that varied temporal and spectral parameters independently. Results of forced-choice identification and oddity discrimination tasks showed that the nine Japanese subjects who were able to identify /r/ and /l/ reliably demonstrated a trading relation similar to that of Americans. Discrimination results reflected the perceptual equivalence of temporal and spectral parameters. Discrimination by the 11 Japanese subjects who were unable to identify the /r/-/l/ series differed significantly from the skilled Japanese subjects and native English speakers. However, their performance could not be predicted on the basis of acoustic dissimilarity alone. These results provide evidence that the trading relation between temporal and spectral cues for the /r/-/l/ contrast is not solely attributable to general auditory or language-universal phonetic processing constraints, but rather is also a function of phonemic processes that can be modified in the course of learning a second language.

17.
Auditory evoked potential (AEP) correlates of the neural representation of stimuli along a /ga/-/ka/ and a /ba/-/pa/ continuum were examined to determine whether the voice-onset time (VOT)-related change in the N1 onset response from a single to double-peaked component is a reliable indicator of the perception of voiced and voiceless sounds. Behavioral identification results from ten subjects revealed a mean category boundary at a VOT of 46 ms for the /ga/-/ka/ continuum and at a VOT of 27.5 ms for the /ba/-/pa/ continuum. In the same subjects, electrophysiologic recordings revealed that a single N1 component was seen for stimuli with VOTs of 30 ms and less, and two components (N1' and N1) were seen for stimuli with VOTs of 40 ms and more for both continua. That is, the change in N1 morphology (from single to double-peaked) coincided with the change in perception from voiced to voiceless for stimuli from the /ba/-/pa/ continuum, but not for stimuli from the /ga/-/ka/ continuum. The results of this study show that N1 morphology does not reliably predict phonetic identification of stimuli varying in VOT. These findings also suggest that the previously reported appearance of a "double-peak" onset response in aggregate recordings from the auditory cortex does not indicate a cortical correlate of the perception of voicelessness.

18.
Effects of sound level on auditory cortical activation are seen in neuroimaging data. However, factors such as the cortical response to the intense ambient scanner noise and to the bandwidth of the acoustic stimuli will both confound precise quantification and interpretation of such sound-level effects. The present study used temporally "sparse" imaging to reduce effects of scanner noise. To achieve control for stimulus bandwidth, three schemes were compared for sound-level matching across bandwidth: component level, root-mean-square power and loudness. The calculation of the loudness match was based on the model reported by Moore and Glasberg [Acta Acust. 82, 335-345 (1996)]. Ten normally hearing volunteers were scanned using functional magnetic resonance imaging (fMRI) while listening to a 300-Hz tone presented at six different sound levels between 66 and 91 dB SPL and a harmonic-complex tone (F0 = 186 Hz) presented at 65 and 85 dB SPL. This range of sound levels encompassed all three bases of sound-level matching. Activation in the superior temporal gyrus, induced by each of the eight tone conditions relative to a quiet baseline condition, was quantified as to extent and magnitude. Sound level had a small, but significant, effect on the extent of activation for the pure tone, but not for the harmonic-complex tone, while it had a significant effect on the response magnitude for both types of stimulus. Response magnitude increased linearly as a function of sound level for the full range of levels for the pure tone. The harmonic-complex tone produced greater activation than the pure tone, irrespective of the matching scheme for sound level, indicating that bandwidth had a greater effect on the pattern of auditory activation than sound level. Nevertheless, when the data were collapsed across stimulus class, extent and magnitude were significantly correlated with the loudness scale (measured in phons), but not with the intensity scale (measured in SPL). 
We therefore recommend the loudness formula as the most appropriate basis of matching sound level to control for loudness effects when cortical responses to other stimulus attributes, such as stimulus class, are the principal concern.
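Of the three matching schemes compared above, the root-mean-square power match is the simplest to state concretely. A minimal sketch (the tone frequency and target RMS are arbitrary; a loudness match would additionally require a loudness model such as Moore and Glasberg's, which is far beyond this snippet):

```python
import math

def rms(signal):
    # Root-mean-square amplitude of a sampled signal.
    return math.sqrt(sum(x * x for x in signal) / len(signal))

def match_rms(signal, target_rms):
    # Scale a signal so its RMS equals target_rms. Matching two
    # stimuli of different bandwidth this way equates their total
    # power, but not necessarily their loudness.
    return [x * target_rms / rms(signal) for x in signal]

# Illustrative 300-Hz tone at an assumed 16-kHz sample rate.
tone = [math.sin(2 * math.pi * 300 * i / 16000) for i in range(16000)]
matched = match_rms(tone, 0.5)
print(abs(rms(matched) - 0.5) < 1e-9)  # True
```

The abstract's point is precisely that such power-based matches diverge from loudness matches for wideband stimuli, which is why the loudness scheme is recommended.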

19.
This study assessed the extent to which second-language learners are sensitive to phonetic information contained in visual cues when identifying a non-native phonemic contrast. In experiment 1, Spanish and Japanese learners of English were tested on their perception of a labial/labiodental consonant contrast in audio (A), visual (V), and audio-visual (AV) modalities. Spanish students showed better performance overall, and much greater sensitivity to visual cues than Japanese students. Both learner groups achieved higher scores in the AV than in the A test condition, thus showing evidence of audio-visual benefit. Experiment 2 examined the perception of the less visually-salient /l/-/r/ contrast in Japanese and Korean learners of English. Korean learners obtained much higher scores in auditory and audio-visual conditions than in the visual condition, while Japanese learners generally performed poorly in both modalities. Neither group showed evidence of audio-visual benefit. These results show the impact of the language background of the learner and visual salience of the contrast on the use of visual cues for a non-native contrast. Significant correlations between scores in the auditory and visual conditions suggest that increasing auditory proficiency in identifying a non-native contrast is linked with an increasing proficiency in using visual cues to the contrast.

20.
This paper is concerned with the representation of the spectra of synthesized steady-state vowels in the temporal aspects of the discharges of auditory-nerve fibers. The results are based on a study of the responses of large numbers of single auditory-nerve fibers in anesthetized cats. By presenting the same set of stimuli to all the fibers encountered in each cat, we can directly estimate the population response to those stimuli. Period histograms of the responses of each unit to the vowels were constructed. The temporal response of a fiber to each harmonic component of the stimulus is taken to be the amplitude of the corresponding component in the Fourier transform of the unit's period histogram. At low sound levels, the temporal response to each stimulus component is maximal among units with CFs near the frequency of the component (i.e., near its place). Responses to formant components are larger than responses to other stimulus components. As sound level is increased, the responses to the formants, particularly the first formant, increase near their places and spread to adjacent regions, particularly toward higher CFs. Responses to nonformant components, except for harmonics and intermodulation products of the formants (2F1, 2F2, F1 + F2, etc.), are suppressed; at the highest sound levels used (approximately 80 dB SPL), temporal responses occur almost exclusively at the first two or three formants and their harmonics and intermodulation products. We describe a simple calculation which combines rate, place, and temporal information to provide a good representation of the vowels' spectra, including a clear indication of at least the first two formant frequencies. This representation is stable with changes in sound level at least up to 80 dB SPL; its stability is in sharp contrast to the behavior of the representation of the vowels' spectra in terms of discharge rate which degenerates at stimulus levels within the conversational range.
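The core measure here, the amplitude of a harmonic component in the Fourier transform of a period histogram, is a single DFT bin. A minimal sketch (the histogram below is a synthetic sinusoidal rate profile, not recorded data):

```python
import math

def harmonic_amplitude(histogram, k):
    # Amplitude of the k-th harmonic of a period histogram:
    # one bin of the DFT, converted from raw magnitude to the
    # amplitude of the underlying sinusoidal modulation.
    n = len(histogram)
    re = sum(h * math.cos(2 * math.pi * k * i / n)
             for i, h in enumerate(histogram))
    im = sum(h * math.sin(2 * math.pi * k * i / n)
             for i, h in enumerate(histogram))
    return 2.0 * math.hypot(re, im) / n

# Synthetic histogram: mean rate 10 with a sinusoidal modulation of
# amplitude 5 at the fundamental. All temporal response should appear
# at k = 1 and none at k = 2.
hist = [10.0 + 5.0 * math.sin(2 * math.pi * i / 64) for i in range(64)]
print(round(harmonic_amplitude(hist, 1), 3))  # 5.0
print(round(harmonic_amplitude(hist, 2), 3))  # 0.0
```

Evaluating this quantity at each stimulus harmonic, across fibers ordered by CF, yields the place-by-component temporal response profiles the paper analyzes.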

