Similar Articles
20 similar articles found
1.
The role of language-specific factors in phonetically based trading relations was examined by assessing the ability of 20 native Japanese speakers to identify and discriminate stimuli of two synthetic /r/-/l/ series that varied temporal and spectral parameters independently. Results of forced-choice identification and oddity discrimination tasks showed that the nine Japanese subjects who were able to identify /r/ and /l/ reliably demonstrated a trading relation similar to that of Americans. Discrimination results reflected the perceptual equivalence of temporal and spectral parameters. Discrimination by the 11 Japanese subjects who were unable to identify the /r/-/l/ series differed significantly from the skilled Japanese subjects and native English speakers. However, their performance could not be predicted on the basis of acoustic dissimilarity alone. These results provide evidence that the trading relation between temporal and spectral cues for the /r/-/l/ contrast is not solely attributable to general auditory or language-universal phonetic processing constraints, but rather is also a function of phonemic processes that can be modified in the course of learning a second language.

2.
Studies with adults have demonstrated that acoustic cues cohere in speech perception such that two stimuli cannot be discriminated if separate cues bias responses equally, but oppositely, in each. This study examined whether this kind of coherence exists for children's perception of speech signals, a test that first required that a contrast be found for which adults and children show similar cue weightings. Accordingly, experiment 1 demonstrated that adults, 7-, and 5-year-olds weight F2-onset frequency and gap duration similarly in "spa" versus "sa" decisions. In experiment 2, listeners of these same ages made "same" or "not-the-same" judgments for pairs of stimuli in an AX paradigm when only one cue differed, when the two cues were set within a stimulus to bias the phonetic percept towards the same category (relative to the other stimulus in the pair), and when the two cues were set within a stimulus to bias the phonetic percept towards different categories. Unexpectedly, adults' results contradicted earlier studies: They were able to discriminate stimuli when the two cues conflicted in how they biased phonetic percepts. Results for 7-year-olds replicated those of adults, but were not as strong. Only the results of 5-year-olds revealed the kind of perceptual coherence reported by earlier studies for adults. Thus, it is concluded that perceptual coherence for speech signals is present from an early age, and in fact listeners learn to overcome it under certain conditions.

3.
Current speech perception models propose that relative perceptual difficulties with non-native segmental contrasts can be predicted from cross-language phonetic similarities. Japanese (J) listeners performed a categorical discrimination task in which nine contrasts (six adjacent height pairs, three front/back pairs) involving eight American English (AE) vowels [iː, ɪ, ɛ, æ, ɑː, ʌ, ʊ, uː] in /hVbə/ disyllables were tested. The listeners also completed a perceptual assimilation task (categorization as J vowels with category goodness ratings). Perceptual assimilation patterns (quantified as categorization overlap scores) were highly predictive of discrimination accuracy (rs = 0.93). Results suggested that J listeners used both spectral and temporal information in discriminating vowel contrasts.
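As a rough illustration of the analysis in abstract 3, the sketch below computes a categorization-overlap score for each vowel pair from a perceptual-assimilation table and correlates it with discrimination accuracy using Spearman's rho. All proportions and accuracies here are invented placeholders, not data from the study.

```python
import numpy as np
from scipy.stats import spearmanr

def overlap(p1, p2):
    """Shared probability mass of two vowels' assimilation distributions."""
    return np.minimum(p1, p2).sum()

# Rows: proportion of trials each AE vowel was labeled as each Japanese category
assim = {"i": np.array([0.9, 0.1, 0.0]),
         "I": np.array([0.6, 0.4, 0.0]),
         "u": np.array([0.0, 0.1, 0.9])}
pairs = [("i", "I"), ("i", "u"), ("I", "u")]
overlaps = [overlap(assim[a], assim[b]) for a, b in pairs]
accuracy = [0.62, 0.98, 0.95]  # hypothetical discrimination accuracy per pair

rho, _ = spearmanr(overlaps, accuracy)
print(round(rho, 2))  # more overlap -> lower accuracy, so rho is negative here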

4.
The primary aim of this study was to determine if adults whose native language permits neither voiced nor voiceless stops to occur in word-final position can master the English word-final /t/-/d/ contrast. Native English-speaking listeners identified the voicing feature in word-final stops produced by talkers in five groups: native speakers of English, experienced and inexperienced native Spanish speakers of English, and experienced and inexperienced native Mandarin speakers of English. Contrary to hypothesis, the experienced second language (L2) learners' stops were not identified significantly better than stops produced by the inexperienced L2 learners; and their stops were correctly identified significantly less often than stops produced by the native English speakers. Acoustic analyses revealed that the native English speakers made vowels significantly longer before /d/ than /t/, produced /t/-final words with a higher F1 offset frequency than /d/-final words, produced more closure voicing in /d/ than /t/, and sustained closure longer for /t/ than /d/. The L2 learners produced the same kinds of acoustic differences between /t/ and /d/, but theirs were usually of significantly smaller magnitude. Taken together, the results suggest that only a few of the 40 L2 learners examined in the present study had mastered the English word-final /t/-/d/ contrast. Several possible explanations for this negative finding are presented. Multiple regression analyses revealed that the native English listeners made perceptual use of the small, albeit significant, vowel duration differences produced in minimal pairs by the nonnative speakers. A significantly stronger correlation existed between vowel duration differences and the listeners' identifications of final stops in minimal pairs when the perceptual judgments were obtained in an "edited" condition (where post-vocalic cues were removed) than in a "full cue" condition. This suggested that listeners may modify their identification of stops based on the availability of acoustic cues.

5.
Different patterns of performance across vowels and consonants in tests of categorization and discrimination indicate that vowels tend to be perceived more continuously, or less categorically, than consonants. The present experiments examined whether analogous differences in perception would arise in nonspeech sounds that share critical transient acoustic cues of consonants and steady-state spectral cues of simplified synthetic vowels. Listeners were trained to categorize novel nonspeech sounds varying along a continuum defined by a steady-state cue, a rapidly-changing cue, or both cues. Listeners' categorization of stimuli varying on the rapidly-changing cue showed a sharp category boundary and posttraining discrimination was well predicted from the assumption of categorical perception. Listeners more accurately discriminated but less accurately categorized steady-state nonspeech stimuli. When listeners categorized stimuli defined by both rapidly-changing and steady-state cues, discrimination performance was accurate and the categorization function exhibited a sharp boundary. These data are similar to those found in experiments with dynamic vowels, which are defined by both steady-state and rapidly-changing acoustic cues. A general account for the speech and nonspeech patterns is proposed based on the supposition that the perceptual trace of rapidly-changing sounds decays faster than the trace of steady-state sounds.

6.
It has been suggested [e.g., Strange et al., J. Acoust. Soc. Am. 74, 695-705 (1983); Verbrugge and Rakerd, Language Speech 29, 39-57 (1986)] that the temporal margins of vowels in consonantal contexts, consisting mainly of the rapid CV and VC transitions of CVCs, contain dynamic cues to vowel identity that are not available in isolated vowels and that may be perceptually superior in some circumstances to cues which are inherent to the vowels proper. However, this study shows that vowel-inherent formant targets and cues to vowel-inherent spectral change (measured from nucleus to offglide sections of the vowel itself) persist in the margins of /bVb/ syllables, confirming a hypothesis of Nearey and Assmann [J. Acoust. Soc. Am. 80, 1297-1308 (1986)]. Experiments were conducted to test whether listeners might be using such vowel-inherent, rather than coarticulatory information to identify the vowels. In the first experiment, perceptual tests using "hybrid silent center" syllables (i.e., syllables which contain only brief initial and final portions of the original syllable, and in which speaker identity changes from the initial to the final portion) show that listeners' error rates and confusion matrices for vowels in /bVb/ syllables are very similar to those for isolated vowels. These results suggest that listeners are using essentially the same type of information in essentially the same way to identify both kinds of stimuli. Statistical pattern recognition models confirm the relative robustness of nucleus and vocalic offglide cues and can predict reasonably well listeners' error patterns in all experimental conditions, though performance for /bVb/ syllables is somewhat worse than for isolated vowels. The second experiment involves the use of simplified synthetic stimuli, lacking consonantal transitions, which are shown to provide information that is nearly equivalent phonetically to that of the natural silent center /bVb/ syllables (from which the target measurements were extracted). Although no conclusions are drawn about other contexts, for speakers of Western Canadian English coarticulatory cues appear to play at best a minor role in the perception of vowels in /bVb/ context, while vowel-inherent factors dominate listeners' perception.
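A minimal sketch of the kind of statistical pattern recognition model mentioned in abstract 6: a linear discriminant classifier over formant measurements taken at both the vowel nucleus and the offglide, so that vowel-inherent spectral change is available as a cue. The prototype formant values and noise level below are assumptions for illustration, not measurements from the study.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(2)
# Features: [F1_nucleus, F2_nucleus, F1_offglide, F2_offglide] in Hz
prototypes = {"i": [300, 2300, 320, 2250],
              "e": [450, 2100, 380, 2200],
              "a": [750, 1300, 700, 1350]}
# Simulate 50 noisy tokens per vowel around each prototype
X = np.vstack([np.array(p) + rng.normal(0, 40, (50, 4))
               for p in prototypes.values()])
y = np.repeat(list(prototypes), 50)

clf = LinearDiscriminantAnalysis().fit(X, y)
token = [[460, 2080, 390, 2190]]  # an /e/-like token with an offglide toward /i/
print(clf.predict(token), clf.predict_proba(token).round(2))
```

Confusion matrices predicted by such a model can then be compared against listeners' error patterns, as the abstract describes.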

7.
Although some cochlear implant (CI) listeners can show good word recognition accuracy, it is not clear how they perceive and use the various acoustic cues that contribute to phonetic perceptions. In this study, the use of acoustic cues was assessed for normal-hearing (NH) listeners in optimal and spectrally degraded conditions, and also for CI listeners. Two experiments tested the tense/lax vowel contrast (varying in formant structure, vowel-inherent spectral change, and vowel duration) and the word-final fricative voicing contrast (varying in F1 transition, vowel duration, consonant duration, and consonant voicing). Identification results were modeled using mixed-effects logistic regression. These experiments suggested that under spectrally-degraded conditions, NH listeners decrease their use of formant cues and increase their use of durational cues. Compared to NH listeners, CI listeners showed decreased use of spectral cues like formant structure and formant change and consonant voicing, and showed greater use of durational cues (especially for the fricative contrast). The results suggest that although NH and CI listeners may show similar accuracy on basic tests of word, phoneme or feature recognition, they may be using different perceptual strategies in the process.
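The identification modeling in abstract 7 used mixed-effects logistic regression; the sketch below fits a plain (fixed-effects-only) logistic regression to simulated responses to show how standardized coefficients index relative cue weighting. The data, cue ranges, and generating coefficients are fabricated, and the per-listener random effects of the full mixed model are omitted for brevity.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 400
vowel_duration = rng.uniform(80, 300, n)   # ms, durational cue
f1_transition = rng.uniform(200, 600, n)   # Hz, spectral cue
# Simulate "voiced" responses driven mostly by duration, as reported for CI listeners
z = 0.03 * (vowel_duration - 190) - 0.002 * (f1_transition - 400)
voiced = rng.random(n) < 1.0 / (1.0 + np.exp(-z))

# Standardize predictors so the fitted coefficients are comparable cue weights
X = np.column_stack([
    (vowel_duration - vowel_duration.mean()) / vowel_duration.std(),
    (f1_transition - f1_transition.mean()) / f1_transition.std(),
])
model = LogisticRegression().fit(X, voiced)
print(dict(zip(["duration", "F1 transition"], model.coef_[0].round(2))))
```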

8.
In order to determine the effects of hearing loss and spectral shaping on a dynamic spectral speech cue, behavioral identification and neural response patterns of stop-consonant stimuli varying along the /b-d-g/ place-of-articulation continuum were measured from 11 young adults (mean age = 27 years) and 10 older adults (mean age = 55.2 years) with normal hearing, and compared to those from 10 older adults (mean age = 61.3 years) with mild-to-moderate hearing impairment. Psychometric functions and N1-P2 cortical evoked responses were obtained using consonant-vowel (CV) stimuli with frequency-independent (unshaped) amplification as well as with frequency-dependent (shaped) amplification that enhanced F2 relative to the rest of the stimulus. Results indicated that behavioral identification and neural response patterns of stop-consonant CVs were affected primarily by aging and secondarily by age-related hearing loss. Further, enhancing the audibility of the F2 transition cue with spectrally shaped amplification partially reduced the effects of age-related hearing loss on categorization ability but not neural response patterns of stop-consonant CVs. These findings suggest that aging affects excitatory and inhibitory processes and may contribute to the perceptual differences of dynamic spectral cues seen in older versus young adults. Additionally, age and age-related hearing loss may have separate influences on neural function.

9.
The effects of mild-to-moderate hearing impairment on the perceptual importance of three acoustic correlates of stop consonant place of articulation were examined. Normal-hearing and hearing-impaired adults identified a stimulus set comprising all possible combinations of the levels of three factors: formant transition type (three levels), spectral tilt type (three levels), and abruptness of frequency change (two levels). The levels of these factors correspond to those appropriate for /b/, /d/, and /g/ in the /æ/ environment. Normal-hearing subjects responded primarily in accord with the place of articulation specified by the formant transitions. Hearing-impaired subjects showed less-than-normal reliance on formant transitions and greater-than-normal reliance on spectral tilt and abruptness of frequency change. These results suggest that hearing impairment affects the perceptual importance of cues to stop consonant identity, increasing the importance of information provided by both temporal characteristics and gross spectral shape and decreasing the importance of information provided by the formant transitions.

10.
Spectral-ripple discrimination has been used widely for psychoacoustical studies in normal-hearing, hearing-impaired, and cochlear implant listeners. The present study investigated the perceptual mechanism for spectral-ripple discrimination in cochlear implant listeners. The main goal of this study was to determine whether cochlear implant listeners use a local intensity cue or global spectral shape for spectral-ripple discrimination. The effect of electrode separation on spectral-ripple discrimination was also evaluated. Results showed that it is highly unlikely that cochlear implant listeners depend on a local intensity cue for spectral-ripple discrimination. A phenomenological model of spectral-ripple discrimination, as an "ideal observer," showed that a perceptual mechanism based on discrimination of a single intensity difference cannot account for performance of cochlear implant listeners. Spectral modulation depth and electrode separation were found to significantly affect spectral-ripple discrimination. The evidence supports the hypothesis that spectral-ripple discrimination involves integrating information from multiple channels.
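A toy version of the "ideal observer" comparison in abstract 10: decide between a standard and a phase-inverted ripple using either a single channel's intensity or a template that integrates all channels. The channel count, ripple depth, and noise level are invented; the point is only that a single-intensity statistic yields a much lower d' than cross-channel integration, mirroring the paper's argument.

```python
import numpy as np

rng = np.random.default_rng(0)
n_channels, depth_db, noise_sd = 16, 2.0, 3.0
k = np.arange(n_channels)
phase0 = depth_db * np.sin(2 * np.pi * k / 8.0)  # standard ripple
phase1 = -phase0                                 # phase-inverted ripple

def dprime(stat, n_trials=5000):
    """Monte Carlo d' for a scalar decision statistic applied to noisy spectra."""
    a = np.array([stat(phase0 + rng.normal(0, noise_sd, n_channels))
                  for _ in range(n_trials)])
    b = np.array([stat(phase1 + rng.normal(0, noise_sd, n_channels))
                  for _ in range(n_trials)])
    return (a.mean() - b.mean()) / np.sqrt(0.5 * (a.var() + b.var()))

single_intensity = lambda spec: spec[2]           # one channel at a ripple peak
template = lambda spec: spec @ (phase0 - phase1)  # integrates all channels
print("d' single intensity:", round(dprime(single_intensity), 2))
print("d' integrated      :", round(dprime(template), 2))
```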

11.
This study presents various acoustic measures used to examine the sequence /a # C/, where "#" represents different prosodic boundaries in French. The 6 consonants studied are /b d g f s ʃ/ (3 stops and 3 fricatives). The prosodic units investigated are the utterance, the intonational phrase, the accentual phrase, and the word. It is found that vowel target values, formant transitions into the stop consonant, and the rate of change in spectral tilt into the fricative, are affected by the strength of the prosodic boundary. F1 becomes higher for /a/ the stronger the prosodic boundary, with the exception of one speaker's utterance data, which show the effects of articulatory declension at the utterance level. Various effects of the stop consonant context are observed, the most notable being a tendency for the vowel /a/ to be displaced in the direction of the F2 consonant "locus" for /d/ (the F2 consonant values for which remain relatively stable across prosodic boundaries) and for /g/ (the F2 consonant values for which are displaced in the direction of the velar locus in weaker prosodic boundaries, together with those of the vowel). Velocity of formant transition may be affected by prosodic boundary (with greater velocity at weaker boundaries), though results are not consistent across speakers. There is also a tendency for the rate of change in spectral tilt moving from the vowel to the fricative to be affected by the presence of a prosodic boundary, with a greater rate of change at the weaker prosodic boundaries. It is suggested that spectral cues, in addition to duration, amplitude, and F0 cues, may alert listeners to the presence of a prosodic boundary.

12.
The contribution of temporal asynchrony, spatial separation, and frequency separation to the cross-spectral fusion of temporally contiguous brief narrow-band noise bursts was studied using the Rhythmic Masking Release paradigm (RMR). RMR involves the discrimination of one of two possible rhythms, despite perceptual masking of the rhythm by an irregular sequence of sounds identical to the rhythmic bursts, interleaved among them. The release of the rhythm from masking can be induced by causing the fusion of the irregular interfering sounds with concurrent "flanking" sounds situated in different frequency regions. The accuracy and the rated clarity of the identified rhythm in a 2-AFC procedure were employed to estimate the degree of fusion of the interfering sounds with flanking sounds. The results suggest that while synchrony fully fuses short-duration noise bursts across frequency and across space (i.e., across ears and loudspeakers), an asynchrony of 20-40 ms produces no fusion. Intermediate asynchronies of 10-20 ms produce partial fusion, where the presence of other cues is critical for unambiguous grouping. Though frequency and spatial separation reduced fusion, neither of these manipulations was sufficient to abolish it. For the parameters varied in this study, stimulus onset asynchrony was the dominant cue determining fusion, but there were additive effects of the other cues. Temporal synchrony appears to be critical in determining whether brief sounds with abrupt onsets and offsets are heard as one event or more than one.

13.
The effect of the filter bank on fundamental frequency (F0) discrimination was examined in four Nucleus CI24 cochlear implant subjects for synthetic stylized vowel-like stimuli. The four tested filter banks differed in cutoff frequencies, amount of overlap between filters, and shape of the filters. To assess the effects of temporal pitch cues on F0 discrimination, temporal fluctuations were removed above 10 Hz in one condition and above 200 Hz in another. Results indicate that F0 discrimination based upon place pitch cues is possible, but just-noticeable differences can exceed one octave, depending on the filter bank used. Increasing the frequency resolution in the F0 range improves the F0 discrimination based upon place pitch cues. The results of F0 discrimination based upon place pitch agree with a model that compares the centroids of the electrical excitation pattern. The addition of temporal fluctuations up to 200 Hz significantly improves F0 discrimination. Just-noticeable differences using both place and temporal pitch cues range from 6% to 60%. Filter banks that do not resolve the higher harmonics provided the best temporal pitch cues, because temporal pitch cues are clearest when the fluctuation on all channels is at F0 and preferably in phase.
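The place-pitch model in abstract 13 compares centroids of electrical excitation patterns; here is a minimal sketch of that comparison with made-up excitation weights over eight electrodes (real patterns would come from the implant's filter-bank outputs).

```python
import numpy as np

def centroid(excitation):
    """Center of gravity of an excitation pattern over electrode positions."""
    electrodes = np.arange(len(excitation))
    return np.sum(electrodes * excitation) / np.sum(excitation)

low_f0 = np.array([0.1, 0.8, 1.0, 0.6, 0.2, 0.05, 0.0, 0.0])   # apical-weighted
high_f0 = np.array([0.0, 0.1, 0.4, 0.9, 1.0, 0.5, 0.2, 0.05])  # shifted basally
# A larger centroid separation predicts an easier place-pitch discrimination
print(round(centroid(low_f0), 2), round(centroid(high_f0), 2))
```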

14.
Recent work [Iverson et al. (2003) Cognition, 87, B47-57] has suggested that Japanese adults have difficulty learning English /r/ and /l/ because they are overly sensitive to acoustic cues that are not reliable for /r/-/l/ categorization (e.g., F2 frequency). This study investigated whether cue weightings are altered by auditory training, and compared the effectiveness of different training techniques. Separate groups of subjects received High Variability Phonetic Training (natural words from multiple talkers), and 3 techniques in which the natural recordings were altered via signal processing (All Enhancement, with F3 contrast maximized and closure duration lengthened; Perceptual Fading, with F3 enhancement reduced during training; and Secondary Cue Variability, with variation in F2 and durations increased during training). The results demonstrated that all of the training techniques improved /r/-/l/ identification by Japanese listeners, but there were no differences between the techniques. Training also altered the use of secondary acoustic cues; listeners became biased to identify stimuli as English /l/ when the cues made them similar to the Japanese /r/ category, and reduced their use of secondary acoustic cues for stimuli that were dissimilar to Japanese /r/. The results suggest that both category assimilation and perceptual interference affect English /r/ and /l/ acquisition.

15.
A variety of dolphin sonar discrimination experiments have been conducted, yet little is known about the cues utilized by dolphins in making fine target discriminations. In order to gain insights on cues available to echolocating dolphins, sonar discrimination experiments were conducted with human subjects using the same targets employed in dolphin experiments. When digital recordings of echoes from targets ensonified with a dolphinlike signal were played back at a slower rate to human subjects, they could also make fine target discriminations under controlled laboratory conditions about as well as dolphins under less controlled conditions. Subjects reported that time-separation-pitch and duration cues were important. They also reported that low-amplitude echo components 32 dB below the maximum echo component were usable. The signal-to-noise ratio had to be greater than 10 dB above the detection threshold for simple discrimination and 30 dB for difficult discrimination. Except for two cases in which spectral cues in the form of "click pitch" were important, subjects indicated that time-domain rather than frequency-domain processing seemed to be more relevant in analyzing the echoes.

16.
In English, voiced and voiceless syllable-initial stop consonants differ in both fundamental frequency at the onset of voicing (onset F0) and voice onset time (VOT). Although both correlates, alone, can cue the voicing contrast, listeners weight VOT more heavily when both are available. Such differential weighting may arise from differences in the perceptual distance between voicing categories along the VOT versus onset F0 dimensions, or it may arise from a bias to pay more attention to VOT than to onset F0. The present experiment examines listeners' use of these two cues when classifying stimuli in which perceptual distance was artificially equated along the two dimensions. Listeners were also trained to categorize stimuli based on one cue at the expense of another. Equating perceptual distance eliminated the expected bias toward VOT before training, but successfully learning to base decisions more on VOT and less on onset F0 was easier than vice versa. Perceptual distance along both dimensions increased for both groups after training, but only VOT-trained listeners showed a decrease in Garner interference. Results lend qualified support to an attentional model of phonetic learning in which learning involves strategic redeployment of selective attention across integral acoustic cues.

17.
Two experiments determined the just noticeable difference (jnd) in onset frequency for speech formant transitions followed by a 1800-Hz steady state. Influences of transition duration (30, 45, 60, and 120 ms), transition-onset region (above or below 1800 Hz), and the rate of transition were examined. An overall improvement in discrimination with duration was observed suggesting better frequency resolution and, consequently, better use of pitch/timbre cues with longer transitions. In addition, falling transitions (with onsets above 1800 Hz) were better discriminated than rising, and changing onset to produce increments in transition rate-of-change in frequency yielded smaller jnds than changing onset to produce decrements. The shortest transitions displayed additional rate-related effects. This last observation may be due to differences in the degree of dispersion of activity in the cochlea when high-rate transitions are effectively treated as non-time-varying, wideband events. The other results may reflect mechanisms that extract the temporal envelopes of signals: Envelope slope and magnitude differences are proposed to provide discriminative cues that supplement or supplant weaker spectrally based pitch/timbre cues for transitions in the short-to-moderate duration range. It is speculated that these cues may also support some speech perceptual decisions.

18.
Normal-hearing listeners' ability to "hear out" the pitch of a target harmonic complex tone (HCT) was tested with simultaneous HCT or noise maskers, all bandpass-filtered into the same spectral region (1200-3600 Hz). Target-to-masker ratios (TMRs) necessary to discriminate fixed fundamental-frequency (F0) differences were measured for target F0s between 100 and 400 Hz. At high F0s (400 Hz), asynchronous gating of masker and signal, presenting the masker in a different F0 range, and reducing the F0 rove of the masker, all resulted in improved performance. At the low F0s (100 Hz), none of these manipulations improved performance significantly. The findings are generally consistent with the idea that the ability to segregate sounds based on cues such as F0 differences and onset/offset asynchronies can be strongly limited by peripheral harmonic resolvability. However, some cases were observed where perceptual segregation appeared possible, even when no peripherally resolved harmonics were present in the mixture of target and masker. A final experiment, comparing TMRs necessary for detection and F0 discrimination, showed that F0 discrimination of the target was possible with noise maskers at only a few decibels above detection threshold, whereas similar performance with HCT maskers was only possible 15-25 dB above detection threshold.

19.
Previous psychophysical studies have shown that the perceptual distinction between voiceless fricatives and affricates in consonant-vowel syllables depends primarily on frication duration, whereas amplitude rise slope was suggested as the cue in automatic classification experiments. The effects of both cues on the manner-of-articulation distinction between /ʃ/ and /tʃ/ were investigated. Subjects performed a forced-choice task (/ʃ/ or /tʃ/) in response to edited waveforms of the Japanese fricatives /ʃi/, /ʃu/, and /ʃa/. We found that frication duration, onset slope, and the interaction between duration and onset slope influenced the perceptual distinction. That is, the percent of /ʃ/ responses increased with an increase in frication duration (experiments 1-3). The percent of /ʃ/ responses also increased with a decrease in slope steepness (experiment 3), and the relative importance between slope portions was not even but weighted at onset (experiments 1 and 2). There was an interaction between the two cues of frication duration and steepness. The relative importance of the slope cue was maximum at a frication duration of 150 ms (experiment 3). It is concluded that frication duration and amplitude rise slope at frication onset are acoustic cues that discriminate between /ʃ/ and /tʃ/, and that the two cues interact with each other.
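The duration-by-slope interaction in abstract 19 can be summarized as a two-cue psychometric surface. In the toy below, P(/ʃ/) rises with frication duration, falls with onset-slope steepness, and the slope cue is weighted most heavily near 150 ms, mimicking the reported interaction; every coefficient is invented for illustration.

```python
import numpy as np

def p_sh(duration_ms, onset_slope):
    """P(/ʃ/ response) given frication duration (ms) and onset rise slope."""
    # Slope cue weight peaks near 150 ms, per the reported interaction
    slope_weight = 0.8 * np.exp(-((duration_ms - 150.0) / 60.0) ** 2)
    z = 0.04 * (duration_ms - 120.0) - slope_weight * onset_slope
    return 1.0 / (1.0 + np.exp(-z))

for dur in (80, 150, 220):
    print(dur, [round(p_sh(dur, s), 2) for s in (0.5, 2.0, 4.0)])
```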

20.
A computational model of the dolphin auditory system was developed to describe how multiple discrimination cues may be represented and employed during echolocation discrimination tasks. The model consisted of a bank of gammatone filters followed by half-wave rectification and low-pass filtering. The output of the model resembles a spectrogram; however, the model reflects the temporal and spectral resolving properties of the dolphin auditory system. Model outputs were organized to represent discrimination cues related to spectral, temporal, and intensity information. Two empirical experiments, a phase discrimination experiment [Johnson et al., Animal Sonar Processes and Performance (Plenum, New York, 1988)] and a cylinder wall thickness discrimination task [Au and Pawloski, J. Comp. Physiol. A 170, 41-47 (1992)], were then simulated. Model performance was compared to dolphin performance. Although multiple discrimination cues were potentially available to the dolphin, simulation results suggest temporal information was used in the former experiment and spectral information in the latter. This model's representation of sound provides a more accurate approximation of what the dolphin may be hearing than conventional spectrograms, time-amplitude, or spectral representations.
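A compact sketch of the pipeline in abstract 20 (gammatone filter bank, half-wave rectification, low-pass filtering). The ERB bandwidths here follow the human Glasberg-Moore formula, and the sampling rate, channel spacing, and cutoff are assumptions; the published model's dolphin-specific parameters would differ.

```python
import numpy as np
from scipy.signal import butter, lfilter

def erb(fc):
    """Equivalent rectangular bandwidth (Glasberg-Moore approximation)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_ir(fc, fs, dur=0.002, order=4):
    """Impulse response of a gammatone filter centered at fc (Hz)."""
    t = np.arange(0, dur, 1.0 / fs)
    b = 1.019 * erb(fc)
    return t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)

def auditory_rep(x, fs, centers, lp_cutoff=1000.0):
    """Gammatone channels -> half-wave rectification -> low-pass envelopes."""
    b, a = butter(2, lp_cutoff / (fs / 2.0))
    rows = []
    for fc in centers:
        y = np.convolve(x, gammatone_ir(fc, fs), mode="same")  # band-pass channel
        y = np.maximum(y, 0.0)                                 # half-wave rectify
        rows.append(lfilter(b, a, y))                          # smooth envelope
    return np.array(rows)  # channels x time, a spectrogram-like representation

# Example: a brief echo-like click through the model
fs = 200_000                                  # assumed sampling rate (Hz)
t = np.arange(0, 0.002, 1.0 / fs)
click = np.sin(2 * np.pi * 60_000 * t) * np.exp(-t / 2e-4)
centers = np.geomspace(10_000, 90_000, 32)    # assumed channel spacing
print(auditory_rep(click, fs, centers).shape)
```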
