Similar literature
1.
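Research on the auditory perception of vowels and consonants in speech
This paper reviews research on the auditory perception of vowels and consonants in speech. More than 80 years ago, authoritative experiments based on nonsense syllables indicated that consonants are more important to human auditory perception. Owing to the experimenters' academic achievements and authority, this conclusion became common knowledge, until experiments based on natural sentences challenged it nearly 20 years ago and triggered a new round of research. The paper gives a fairly systematic account of the relative importance of vowels and consonants to speech perception, the influence of the steady-state information and boundary dynamic information of vowels and consonants on speech perception, and the potential applications of related research, and closes with a summary and outlook.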

2.
The perception of voicing in final velar stop consonants was investigated by systematically varying vowel duration, change in offset frequency of the final first formant (F1) transition, and rate of frequency change in the final F1 transition for several vowel contexts. Consonant-vowel-consonant (CVC) continua were synthesized for each of three vowels, [i,I,ae], which represent a range of relatively low to relatively high-F1 steady-state values. Subjects responded to the stimuli under both an open- and closed-response condition. Results of the study show that both vowel duration and F1 offset properties influence perception of final consonant voicing, with the salience of the F1 offset property higher for vowels with high-F1 steady-state frequencies than low-F1 steady-state frequencies, and the opposite occurring for the vowel duration property. When F1 onset and offset frequencies were controlled, rate of the F1 transition change had inconsistent and minimal effects on perception of final consonant voicing. Thus the findings suggest that it is the termination value of the F1 offset transition rather than rate and/or duration of frequency change, which cues voicing in final velar stop consonants during the transition period preceding closure.

3.
This study investigated age-related differences in sensitivity to temporal cues in modified natural speech sounds. Listeners included young noise-masked subjects, elderly normal-hearing subjects, and elderly hearing-impaired subjects. Four speech continua were presented to listeners, with stimuli from each continuum varying in a single temporal dimension. The acoustic cues varied in separate continua were voice-onset time, vowel duration, silence duration, and transition duration. In separate conditions, the listeners identified the word stimuli, discriminated two stimuli in a same-different paradigm, and discriminated two stimuli in a 3-interval, 2-alternative forced-choice procedure. Results showed age-related differences in the identification function crossover points for the continua that varied in silence duration and transition duration. All listeners demonstrated shorter difference limens (DLs) for the three-interval paradigm than the two-interval paradigm, with older hearing-impaired listeners showing larger DLs than the other listener groups for the silence duration cue. The findings support the general hypothesis that aging can influence the processing of specific temporal cues that are related to consonant manner distinctions.

4.
The extent to which context influences speech categorization can inform theories of pre-lexical speech perception. Across three conditions, listeners categorized speech targets preceded by speech context syllables. These syllables were presented as the sole context or paired with nonspeech tone contexts previously shown to affect speech categorization. Listeners' context-dependent categorization across these conditions provides evidence that speech and nonspeech context stimuli jointly influence speech processing. Specifically, when the spectral characteristics of speech and nonspeech context stimuli are mismatched such that they are expected to produce opposing effects on speech categorization, the influence of nonspeech contexts may undermine, or even reverse, the expected effect of adjacent speech context. Likewise, when spectrally matched, the cross-class contexts may collaborate to increase effects of context. Similar effects are observed even when natural speech syllables, matched in source to the speech categorization targets, serve as the speech contexts. Results are well-predicted by spectral characteristics of the context stimuli.

5.
Icelandic has a phonologic contrast of quantity, distinguishing long and short vowels and consonants. Perceptual studies have shown that a major cue for quantity in perception is relational, involving the vowel-to-rhyme ratio. This cue is approximately invariant under transformations of rate, thus yielding a higher-order invariant for the perception of quantity in Icelandic. Recently it has, however, been shown that vowel spectra can also influence the perception of quantity. This holds for vowels which have different spectra in their long and short varieties. This finding raises the question of whether the durational contrast is less well articulated in those cases where vowel spectra provide another cue for quantity. To test this possibility, production measurements were carried out on vowels and consonants in words which were spoken by a number of speakers at different utterance rates in two experiments. A simple neural network was then trained on the production measurements. Using the network to classify the training stimuli shows that the durational distinctions between long and short phonemes are as clearly articulated whether or not there is a secondary, spectral, cue to quantity.
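To illustrate why a relational cue such as the vowel-to-rhyme ratio can stay stable across speaking rates, here is a minimal sketch. It assumes that a change of rate scales vowel and consonant durations by the same factor, which is an idealization (the abstract only says the cue is approximately invariant). The function names and all durations are hypothetical and are not taken from the study.

```python
def vowel_to_rhyme_ratio(vowel_ms, consonant_ms):
    """Vowel duration divided by rhyme duration (vowel + following consonant)."""
    rhyme_ms = vowel_ms + consonant_ms
    if rhyme_ms <= 0:
        raise ValueError("rhyme duration must be positive")
    return vowel_ms / rhyme_ms


def scale_rate(vowel_ms, consonant_ms, factor):
    """Idealized rate change: both segments are scaled by the same factor."""
    return vowel_ms * factor, consonant_ms * factor


# Hypothetical durations in milliseconds (illustrative, not measured data).
TOKENS = {
    "long vowel + short consonant": (180.0, 90.0),
    "short vowel + long consonant": (90.0, 180.0),
}

if __name__ == "__main__":
    for label, (vowel_ms, consonant_ms) in TOKENS.items():
        for factor in (1.0, 0.7, 1.4):  # normal, faster, slower speech
            v, c = scale_rate(vowel_ms, consonant_ms, factor)
            ratio = vowel_to_rhyme_ratio(v, c)
            print(f"{label}, rate factor {factor}: "
                  f"vowel {v:.0f} ms, ratio {ratio:.3f}")
```

Under this assumption the absolute vowel duration changes with rate while the ratio does not, so a listener (or classifier) relying on the ratio would not need to know the speaking rate. Real productions scale less uniformly, which is why the invariance is only approximate.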

6.
The experiments reported employed nonspeech analogs of speech stimuli to examine the perceptual interaction between first-formant onset frequency and voice-onset time, acoustic cues to the voicing distinction in English initial stop consonants. The nonspeech stimuli comprised two pure tones varying in relative onset time, and listeners were asked to judge the simultaneity of tone onsets. These judgments were affected by the frequency of the lower tone in a manner that parallels the influence of first-formant onset frequency on voicing judgments. This effect was shown to occur regardless of prior learning and to be systematic over a wide range of lower tone frequencies including frequencies beyond the range of possible first-formant frequencies of speech, suggesting that the effect in speech is not attributable to (tacit) knowledge of production constraints, as some current theories suggest.

7.
Four experiments explored the relative contributions of spectral content and phonetic labeling in effects of context on vowel perception. Two 10-step series of CVC syllables ([bVb] and [dVd]) varying acoustically in F2 midpoint frequency and varying perceptually in vowel height from [delta] to [epsilon] were synthesized. In a forced-choice identification task, listeners more often labeled vowels as [delta] in [dVd] context than in [bVb] context. To examine whether spectral content predicts this effect, nonspeech-speech hybrid series were created by appending 70-ms sine-wave glides following the trajectory of CVC F2's to 60-ms members of a steady-state vowel series varying in F2 frequency. In addition, a second hybrid series was created by appending constant-frequency sine-wave tones equivalent in frequency to CVC F2 onset/offset frequencies. Vowels flanked by frequency-modulated glides or steady-state tones modeling [dVd] were more often labeled as [delta] than were the same vowels surrounded by nonspeech modeling [bVb]. These results suggest that spectral content is important in understanding vowel context effects. A final experiment tested whether spectral content can modulate vowel perception when phonetic labeling remains intact. Voiceless consonants, with lower-amplitude more-diffuse spectra, were found to exert less of an influence on vowel perception than do their voiced counterparts. The data are discussed in terms of a general perceptual account of context effects in speech perception.

8.
Speech-understanding difficulties observed in elderly hearing-impaired listeners are predominantly errors in the recognition of consonants, particularly within consonants that share the same manner of articulation. Spectral shape is an important acoustic cue that serves to distinguish such consonants. The present study examined whether individual differences in speech understanding among elderly hearing-impaired listeners could be explained by individual differences in spectral-shape discrimination ability. This study included a group of 20 elderly hearing-impaired listeners, as well as a group of young normal-hearing adults for comparison purposes. All subjects were tested on speech-identification tasks, with natural and computer-synthesized speech stimuli, and on a series of spectral-shape discrimination tasks. As expected, the young normal-hearing adults performed better than the elderly listeners on many of the identification tasks and on all but two discrimination tasks. Regression analyses of the data from the elderly listeners revealed moderate predictive relationships between some of the spectral-shape discrimination thresholds and speech-identification performance. The results indicated that when all stimuli were at least minimally audible, some of the individual differences in the identification of natural and synthetic speech tokens by elderly hearing-impaired listeners were associated with corresponding differences in their spectral-shape discrimination abilities for similar sounds.

9.
There is size information in natural sounds. For example, as humans grow in height, their vocal tracts increase in length, producing a predictable decrease in the formant frequencies of speech sounds. Recent studies have shown that listeners can make fine discriminations about which of two speakers has the longer vocal tract, supporting the view that the auditory system discriminates changes on the acoustic-scale dimension. Listeners can also recognize vowels scaled well beyond the range of vocal tracts normally experienced, indicating that perception is robust to changes in acoustic scale. This paper reports two perceptual experiments designed to extend research on acoustic scale and size perception to the domain of musical sounds: The first study shows that listeners can discriminate the scale of musical instrument sounds reliably, although not quite as well as for voices. The second experiment shows that listeners can recognize the family of an instrument sound which has been modified in pitch and scale beyond the range of normal experience. We conclude that processing of acoustic scale in music perception is very similar to processing of acoustic scale in speech perception.

10.
Behavioral experiments with infants, adults, and nonhuman animals converge with neurophysiological findings to suggest that there is a discontinuity in auditory processing of stimulus components differing in onset time by about 20 ms. This discontinuity has been implicated as a basis for boundaries between speech categories distinguished by voice onset time (VOT). Here, it is investigated how this discontinuity interacts with the learning of novel perceptual categories. Adult listeners were trained to categorize nonspeech stimuli that mimicked certain temporal properties of VOT stimuli. One group of listeners learned categories with a boundary coincident with the perceptual discontinuity. Another group learned categories defined such that the perceptual discontinuity fell within a category. Listeners in the latter group required significantly more experience to reach criterion categorization performance. Evidence of interactions between the perceptual discontinuity and the learned categories extended to generalization tests as well. It has been hypothesized that languages make use of perceptual discontinuities to promote distinctiveness among sounds within a language inventory. The present data suggest that discontinuities interact with category learning. As such, "learnability" may play a predictive role in selection of language sound inventories.

11.
This study examined the perceptual specialization for native-language speech sounds, by comparing native Hindi and English speakers in their perception of a graded set of English /w/-/v/ stimuli that varied in similarity to natural speech. The results demonstrated that language experience does not affect general auditory processes for these types of sounds; there were strong cross-language differences for speech stimuli, and none for stimuli that were nonspeech. However, the cross-language differences extended into a gray area of speech-like stimuli that were difficult to classify, suggesting that the specialization occurred in phonetic processing prior to categorization.

12.
Experiments were conducted to determine the underlying resolving power of the auditory system for temporal changes at the onset of speech and nonspeech stimuli. Stimulus sets included a bilabial VOT continuum and an analogous nonspeech continuum similar to the "noise-buzz" stimuli used by Miller et al. [J. Acoust. Soc. Am. 60, 410-417 (1976)]. The main difference between these and earlier experiments was that efforts were made to minimize both the trial-to-trial stimulus uncertainty and the cognitive load inherent in some of the testing procedures. Under conditions of minimal psychophysical uncertainty, not only does discrimination performance improve overall, but the local maximum, usually interpreted as evidence of categorical perception, is eliminated. Instead, discrimination performance for voice onset time (VOT) or noise lead time (NLT) is very accurate for short onset times and generally decreases with increasing onset time. This result suggests that "categorization" of familiar sounds is not the result of a psychoacoustic threshold (as Miller et al. have suggested) but rather of processing at a more central level of the auditory system.

13.
Cochlear implants provide users with limited spectral and temporal information. In this study, the amount of spectral and temporal information was systematically varied through simulations of cochlear implant processors using a noise-excited vocoder. Spectral information was controlled by varying the number of channels between 1 and 16, and temporal information was controlled by varying the lowpass cutoff frequencies of the envelope extractors from 1 to 512 Hz. Consonants and vowels processed using those conditions were presented to seven normal-hearing native-English-speaking listeners for identification. The results demonstrated that both spectral and temporal cues were important for consonant and vowel recognition with the spectral cues having a greater effect than the temporal cues for the ranges of numbers of channels and lowpass cutoff frequencies tested. The lowpass cutoff for asymptotic performance in consonant and vowel recognition was 16 and 4 Hz, respectively. The number of channels at which performance plateaued for consonants and vowels was 8 and 12, respectively. Within the above-mentioned ranges of lowpass cutoff frequency and number of channels, the temporal and spectral cues showed a tradeoff for phoneme recognition. Information transfer analyses showed different relative contributions of spectral and temporal cues in the perception of various phonetic/acoustic features.

14.
In English, voiced and voiceless syllable-initial stop consonants differ in both fundamental frequency at the onset of voicing (onset F0) and voice onset time (VOT). Although both correlates, alone, can cue the voicing contrast, listeners weight VOT more heavily when both are available. Such differential weighting may arise from differences in the perceptual distance between voicing categories along the VOT versus onset F0 dimensions, or it may arise from a bias to pay more attention to VOT than to onset F0. The present experiment examines listeners' use of these two cues when classifying stimuli in which perceptual distance was artificially equated along the two dimensions. Listeners were also trained to categorize stimuli based on one cue at the expense of another. Equating perceptual distance eliminated the expected bias toward VOT before training, but successfully learning to base decisions more on VOT and less on onset F0 was easier than vice versa. Perceptual distance along both dimensions increased for both groups after training, but only VOT-trained listeners showed a decrease in Garner interference. Results lend qualified support to an attentional model of phonetic learning in which learning involves strategic redeployment of selective attention across integral acoustic cues.

15.
For stimuli modeling stop consonants varying in the acoustic correlates of voice onset time (VOT), human listeners are more likely to perceive stimuli with lower f0's as voiced consonants--a pattern of perception that follows regularities in English speech production. The present study examines the basis of this observation. One hypothesis is that lower f0's enhance perception of voiced stops by virtue of perceptual interactions that arise from the operating characteristics of the auditory system. A second hypothesis is that this perceptual pattern develops as a result of experience with f0-voicing covariation. In a test of these hypotheses, Japanese quail learned to respond to stimuli drawn from a series varying in VOT through training with one of three patterns of f0-voicing covariation. Voicing and f0 varied in the natural pattern (shorter VOT, lower f0), in an inverse pattern (shorter VOT, higher f0), or in a random pattern (no f0-voicing covariation). Birds trained with stimuli that had no f0-voicing covariation exhibited no effect of f0 on response to novel stimuli varying in VOT. For the other groups, birds' responses followed the experienced pattern of covariation. These results suggest f0 does not exert an obligatory influence on categorization of consonants as [VOICE] and emphasize the learnability of covariation among acoustic characteristics of speech.

16.
To investigate possible auditory factors in the perception of stops and glides (e.g., /b/ vs /w/), two-category labeling performance was compared on several series of /ba/-/wa/ stimuli and on corresponding nonspeech stimulus series that modeled the first-formant trajectories and amplitude rise times of the speech items. In most respects, performance on the speech and nonspeech stimuli was closely parallel. Transition duration proved to be an effective cue for both the stop/glide distinction and the nonspeech distinction between abrupt and gradual onsets, and the category boundaries along the transition-duration dimension did not differ significantly in the two cases. When the stop/glide distinction was signaled by variation in transition duration, there was a reliable stimulus-length effect: A longer vowel shifted the category boundary toward greater transition durations. A similar effect was observed for the corresponding nonspeech stimuli. Variation in rise time had only a small effect in signaling both the stop/glide distinction and the nonspeech distinction between abrupt and gradual onsets. There was, however, one discrepancy between the speech and nonspeech performance. When the stop/glide distinction was cued by rise-time variation, there was a stimulus-length effect, but no such effect occurred for the corresponding nonspeech stimuli. On balance, the results suggest that there are significant auditory commonalities between the perception of stops and glides and the perception of acoustically analogous nonspeech stimuli.

17.
Whether or not categorical perception results from the operation of a special, language-specific, speech mode remains controversial. In this cross-language (Mandarin Chinese, English) study of the categorical nature of tone perception, we compared native Mandarin and English speakers' perception of a physical continuum of fundamental frequency contours ranging from a level to rising tone in both Mandarin speech and a homologous (nonspeech) harmonic tone. This design permits us to evaluate the effect of language experience by comparing Chinese and English groups; to determine whether categorical perception is speech-specific or domain-general by comparing speech to nonspeech stimuli for both groups; and to examine whether categorical perception involves a separate categorical process, distinct from regions of sensory discontinuity, by comparing speech to nonspeech stimuli for English listeners. Results show evidence of strong categorical perception of speech stimuli for Chinese but not English listeners. Categorical perception of nonspeech stimuli was comparable to that for speech stimuli for Chinese but weaker for English listeners, and perception of nonspeech stimuli was more categorical for English listeners than was perception of speech stimuli. These findings lead us to adopt a memory-based, multistore model of perception in which categorization is domain-general but influenced by long-term categorical representations.

18.
Synthesis (carrier) signals in acoustic models embody assumptions about perception of auditory electric stimulation. This study compared speech intelligibility of consonants and vowels processed through a set of nine acoustic models that used Spectral Peak (SPEAK) and Advanced Combination Encoder (ACE)-like speech processing, using synthesis signals which were representative of signals used previously in acoustic models as well as two new ones. Performance of the synthesis signals was determined in terms of correspondence with cochlear implant (CI) listener results for 12 attributes of phoneme perception (consonant and vowel recognition; F1, F2, and duration information transmission for vowels; voicing, manner, place of articulation, affrication, burst, nasality, and amplitude envelope information transmission for consonants) using four measures of performance. Modulated synthesis signals produced the best correspondence with CI consonant intelligibility, while sinusoids, narrow noise bands, and varying noise bands produced the best correspondence with CI vowel intelligibility. The signals that performed best overall (in terms of correspondence with both vowel and consonant attributes) were modulated and unmodulated noise bands of varying bandwidth that corresponded to a linearly varying excitation width of 0.4 mm at the apical to 8 mm at the basal channels.

19.
Two recent accounts of the acoustic cues which specify place of articulation in syllable-initial stop consonants claim that they are located in the initial portions of the CV waveform and are context-free. Stevens and Blumstein [J. Acoust. Soc. Am. 64, 1358-1368 (1978)] have described the perceptually relevant spectral properties of these cues as static, while Kewley-Port [J. Acoust. Soc. Am. 73, 322-335 (1983)] describes these cues as dynamic. Three perceptual experiments were conducted to test predictions derived from these accounts. Experiment 1 confirmed that acoustic cues for place of articulation are located in the initial 20-40 ms of natural stop-vowel syllables. Next, short synthetic CV's modeled after natural syllables were generated using either a digital, parallel-resonance synthesizer in experiment 2 or linear prediction synthesis in experiment 3. One set of synthetic stimuli preserved the static spectral properties proposed by Stevens and Blumstein. Another set of synthetic stimuli preserved the dynamic properties suggested by Kewley-Port. Listeners in both experiments identified place of articulation significantly better from stimuli which preserved dynamic acoustic properties than from those based on static onset spectra. Evidently, the dynamic structure of the initial stop-vowel articulatory gesture can be preserved in context-free acoustic cues which listeners use to identify place of articulation.

20.
Three experiments tested the hypothesis that vowels play a disproportionate role in hearing talker identity, while consonants are more important in perceiving word meaning. In each study, listeners heard 128 stimuli consisting of two different words. Stimuli were balanced for same/different meaning, same/different talker, and male/female talker. The first word in each was intact, while the second was either intact (Experiment 1), or had vowels ("Consonants-Only") or consonants ("Vowels-Only") replaced by silence (Experiments 2, 3). Different listeners performed a same/different judgment of either talker identity (Talker) or word meaning (Meaning). Baseline testing in Experiment 1 showed above-chance performance in both, with greater accuracy for Meaning. In Experiment 2, Talker identity was more accurately judged from Vowels-Only stimuli, with modestly better overall Meaning performance with Consonants-Only stimuli. However, performance with vowel-initial Vowels-Only stimuli in particular was most accurate of all. Editing Vowels-Only stimuli further in Experiment 3 had no effect on Talker discrimination, while dramatically reducing accuracy in the Meaning condition, including both vowel-initial and consonant-initial Vowels-Only stimuli. Overall, results confirmed a priori predictions, but are largely inconsistent with recent tests of vowels and consonants in sentence comprehension. These discrepancies and possible implications for the evolutionary origins of speech are discussed.
