Similar Documents
 20 similar documents found.
1.
Perception of sine-wave analogs of voice onset time stimuli   (Total citations: 1; self-citations: 0; cited by others: 1)
It has been argued that perception of stop consonant voicing contrasts is based on auditory mechanisms responsible for the resolution of temporal order. As one source of evidence, category boundaries for nonspeech stimuli whose components vary in relative onset time are reasonably close to the labeling boundary for a labial stop voiced-voiceless continuum. However, voicing boundaries change considerably when the onset frequency of the first formant (F1) is varied, either directly or as a side effect of a change in F1 transition duration. The stimuli in the present study consisted of a midfrequency sinusoid that was initiated 0-50 ms prior to the onset of a low-frequency sinusoid. Results showed that the labeling boundary for relative onset time increased for longer durations of a low-frequency tone sweep. This effect is analogous to the F1 transition duration effect with synthetic speech. Further, the discrimination of differences in relative onset time was poorer for stimuli with longer frequency sweeps. However, unlike synthetic speech, there were no systematic effects when the frequency of a transitionless lower sinusoid was varied. These findings are discussed in relation to the potential contributions of auditory mechanisms and speech-specific processes in the perception of the voicing contrast.
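A minimal sketch of how such two-tone onset-asynchrony stimuli could be generated: a mid-frequency sinusoid leading a low-frequency sinusoid by a variable relative onset time, with the low tone optionally beginning as a frequency sweep. The sample rate, tone frequencies, and durations below are illustrative assumptions, not values from the study.

```python
# Sketch of a two-tone onset-asynchrony stimulus generator.
# Frequencies, durations, and the sample rate are illustrative assumptions.
import numpy as np

FS = 44100  # sample rate (Hz), assumed

def tot_stimulus(lead_ms, sweep_ms=0, f_high=1800.0,
                 f_low_start=230.0, f_low_steady=750.0, total_ms=300):
    """Mid-frequency tone begins lead_ms before a low tone; the low tone may
    start with a rising frequency sweep lasting sweep_ms."""
    n = int(FS * total_ms / 1000)
    t = np.arange(n) / FS
    high = np.sin(2 * np.pi * f_high * t)

    lead = int(FS * lead_ms / 1000)
    sweep = int(FS * sweep_ms / 1000)
    freq = np.full(n - lead, f_low_steady)          # instantaneous frequency
    if sweep > 0:
        freq[:sweep] = np.linspace(f_low_start, f_low_steady, sweep)
    low = np.zeros(n)
    low[lead:] = np.sin(2 * np.pi * np.cumsum(freq) / FS)
    return 0.5 * (high + low)

# A relative-onset-time continuum, 0-50 ms in 10-ms steps:
continuum = [tot_stimulus(v, sweep_ms=30) for v in range(0, 60, 10)]
```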

2.
Whether or not categorical perception results from the operation of a special, language-specific, speech mode remains controversial. In this cross-language (Mandarin Chinese, English) study of the categorical nature of tone perception, we compared native Mandarin and English speakers' perception of a physical continuum of fundamental frequency contours ranging from a level to a rising tone in both Mandarin speech and a homologous (nonspeech) harmonic tone. This design permits us to evaluate the effect of language experience by comparing Chinese and English groups; to determine whether categorical perception is speech-specific or domain-general by comparing speech to nonspeech stimuli for both groups; and to examine whether categorical perception involves a separate categorical process, distinct from regions of sensory discontinuity, by comparing speech to nonspeech stimuli for English listeners. Results show evidence of strong categorical perception of speech stimuli for Chinese but not English listeners. Categorical perception of nonspeech stimuli was comparable to that for speech stimuli for Chinese listeners but weaker for English listeners, and perception of nonspeech stimuli was more categorical for English listeners than was perception of speech stimuli. These findings lead us to adopt a memory-based, multistore model of perception in which categorization is domain-general but influenced by long-term categorical representations.

3.
To investigate possible auditory factors in the perception of stops and glides (e.g., /b/ vs /w/), two-category labeling performance was compared on several series of /ba/-/wa/ stimuli and on corresponding nonspeech stimulus series that modeled the first-formant trajectories and amplitude rise times of the speech items. In most respects, performance on the speech and nonspeech stimuli was closely parallel. Transition duration proved to be an effective cue for both the stop/glide distinction and the nonspeech distinction between abrupt and gradual onsets, and the category boundaries along the transition-duration dimension did not differ significantly in the two cases. When the stop/glide distinction was signaled by variation in transition duration, there was a reliable stimulus-length effect: a longer vowel shifted the category boundary toward greater transition durations. A similar effect was observed for the corresponding nonspeech stimuli. Variation in rise time had only a small effect in signaling both the stop/glide distinction and the nonspeech distinction between abrupt and gradual onsets. There was, however, one discrepancy between the speech and nonspeech performance. When the stop/glide distinction was cued by rise-time variation, there was a stimulus-length effect, but no such effect occurred for the corresponding nonspeech stimuli. On balance, the results suggest that there are significant auditory commonalities between the perception of stops and glides and the perception of acoustically analogous nonspeech stimuli.

4.
Context is important for recovering linguistic information from talker-induced variability in acoustic signals. In tone perception, previous studies reported similar effects of speech and nonspeech contexts in Mandarin, supporting a general perceptual mechanism underlying tone normalization. However, no supportive evidence was obtained in Cantonese, also a tone language. Moreover, no study has compared speech and nonspeech contexts in the multi-talker condition, which is essential for exploring the normalization mechanism of inter-talker variability in speaking F0. The other question is whether a talker's full F0 range and mean F0 equally facilitate normalization. To answer these questions, this study examines the effects of four context conditions (speech/nonspeech × F0 contour/mean F0) in the multi-talker condition in Cantonese. Results show that raising and lowering the F0 of speech contexts change the perception of identical stimuli from mid level tone to low and high level tone, whereas nonspeech contexts only mildly increase the identification preference. This supports a speech-specific mechanism of tone normalization. Moreover, speech context with a flattened F0 trajectory, which neutralizes cues to a talker's full F0 range, fails to facilitate normalization in some conditions, implying that a talker's mean F0 is less efficient for minimizing talker-induced lexical ambiguity in tone perception.

5.
F1 structure provides information for final-consonant voicing   (Total citations: 1; self-citations: 0; cited by others: 1)
Previous research has shown that F1 offset frequencies are generally lower for vowels preceding voiced consonants than for vowels preceding voiceless consonants. Furthermore, it has been shown that listeners use these differences in offset frequency in making judgments about final-consonant voicing. A recent production study [W. Summers, J. Acoust. Soc. Am. 82, 847-863 (1987)] reported that F1 frequency differences due to postvocalic voicing are not limited to the final transition or offset region of the preceding vowel. Vowels preceding voiced consonants showed lower F1 onset frequencies and lower F1 steady-state frequencies than vowels preceding voiceless consonants. The present study examined whether F1 frequency differences in the initial transition and steady-state regions of preceding vowels affect final-consonant voicing judgments in perception. The results suggest that F1 frequency differences in these early portions of preceding vowels do, in fact, influence listeners' judgments of postvocalic consonantal voicing.

6.
Listeners' auditory discrimination of vowel sounds depends in part on the order in which stimuli are presented. Such presentation order effects have been argued to be language independent, and to result from psychophysical (not speech- or language-specific) factors such as the decay of memory traces over time or increased weighting of later-occurring stimuli. In the present study, native Cantonese speakers' discrimination of a linguistic tone continuum is shown to exhibit order of presentation effects similar to those shown for vowels in previous studies. When presented with two successive syllables differing in fundamental frequency by approximately 4 Hz, listeners were significantly more sensitive to this difference when the first syllable was higher in frequency than the second. However, American English-speaking listeners with no experience listening to Cantonese showed no such contrast effect when tested in the same manner using the same stimuli. Neither English nor Cantonese listeners showed any order of presentation effects in the discrimination of a nonspeech continuum in which tokens had the same fundamental frequencies as the Cantonese speech tokens but had a qualitatively non-speech-like timbre. These results suggest that tone presentation order effects, unlike vowel effects, may be language specific, possibly resulting from the need to compensate for utterance-related pitch declination when evaluating fundamental frequency for tone identification.

7.
The extent to which context influences speech categorization can inform theories of pre-lexical speech perception. Across three conditions, listeners categorized speech targets preceded by speech context syllables. These syllables were presented as the sole context or paired with nonspeech tone contexts previously shown to affect speech categorization. Listeners' context-dependent categorization across these conditions provides evidence that speech and nonspeech context stimuli jointly influence speech processing. Specifically, when the spectral characteristics of speech and nonspeech context stimuli are mismatched such that they are expected to produce opposing effects on speech categorization, the influence of nonspeech contexts may undermine, or even reverse, the expected effect of the adjacent speech context. Likewise, when spectrally matched, the cross-class contexts may collaborate to increase the effects of context. Similar effects are observed even when natural speech syllables, matched in source to the speech categorization targets, serve as the speech contexts. Results are well predicted by the spectral characteristics of the context stimuli.
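Since the abstract attributes the direction of the context effect to the spectral make-up of the context, one simple way to summarize a context stimulus is by its relative energy in two frequency bands. The sketch below assumes hypothetical band edges and a log-ratio statistic; it is not the analysis used in the study, only an illustration of a spectral-contrast prediction.

```python
# Sketch: summarize a context stimulus by its long-term average energy in two
# bands and use the ratio to predict the direction of a spectral-contrast
# effect on the following target. Band edges and sample rate are assumptions.
import numpy as np

FS = 16000  # assumed sample rate (Hz)

def band_energy(x, lo, hi):
    """Mean power of signal x between lo and hi Hz."""
    spec = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1 / FS)
    sel = (freqs >= lo) & (freqs < hi)
    return spec[sel].mean()

def predicted_shift(context, low_band=(800, 1400), high_band=(1400, 2200)):
    """Positive -> context emphasizes the high band, so (contrastively) the
    target should be heard as relatively lower in that region; negative -> the
    opposite. Contexts with opposite signs would work against each other."""
    return np.log(band_energy(context, *high_band) /
                  band_energy(context, *low_band))
```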

8.
Several types of measurements were made to determine the acoustic characteristics that distinguish between voiced and voiceless fricatives in various phonetic environments. The selection of measurements was based on a theoretical analysis that indicated the acoustic and aerodynamic attributes at the boundaries between fricatives and vowels. As expected, glottal vibration extended over a longer time in the obstruent interval for voiced fricatives than for voiceless fricatives, and there were more extensive transitions of the first formant adjacent to voiced fricatives than for the voiceless cognates. When two fricatives with different voicing were adjacent, there were substantial modifications of these acoustic attributes, particularly for the syllable-final fricative. In some cases, these modifications led to complete assimilation of the voicing feature. Several perceptual studies with synthetic vowel-consonant-vowel stimuli and with edited natural stimuli examined the role of consonant duration, extent and location of glottal vibration, and extent of formant transitions on the identification of the voicing characteristics of fricatives. The perceptual results were in general consistent with the acoustic observations and with expectations based on the theoretical model. The results suggest that listeners base their voicing judgments of intervocalic fricatives on an assessment of the time interval in the fricative during which there is no glottal vibration. This time interval must exceed about 60 ms if the fricative is to be judged as voiceless, except that a small correction to this threshold is applied depending on the extent to which the first-formant transitions are truncated at the consonant boundaries.
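The proposed listener strategy amounts to a threshold on the unvoiced interval within the fricative, adjusted slightly when the F1 transitions are truncated. A toy decision rule in that spirit is sketched below; the size and direction of the correction are placeholders, not values from the study.

```python
# Toy decision rule for intervocalic fricative voicing, following the abstract:
# judge "voiceless" when the interval without glottal vibration exceeds ~60 ms,
# with a small correction when the F1 transitions are truncated.
# The 10-ms correction (and its direction) is a placeholder assumption.

def judge_fricative_voicing(unvoiced_interval_ms,
                            f1_transitions_truncated=False,
                            base_threshold_ms=60.0,
                            truncation_correction_ms=10.0):
    threshold = base_threshold_ms
    if f1_transitions_truncated:
        threshold -= truncation_correction_ms
    return "voiceless" if unvoiced_interval_ms > threshold else "voiced"

print(judge_fricative_voicing(75))                                  # voiceless
print(judge_fricative_voicing(55))                                  # voiced
print(judge_fricative_voicing(55, f1_transitions_truncated=True))   # voiceless
```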

9.
Previous studies [Lisker, J. Acoust. Soc. Am. 57, 1547-1551 (1975); Summerfield and Haggard, J. Acoust. Soc. Am. 62, 435-448 (1977)] have shown that voice onset time (VOT) and the onset frequency of the first formant are important perceptual cues of voicing in syllable-initial plosives. Most prior work, however, has focused on speech perception in quiet environments. The present study seeks to determine which cues are important for the perception of voicing in syllable-initial plosives in the presence of noise. Perceptual experiments were conducted using stimuli consisting of consonant-vowel syllables naturally spoken by four talkers and presented in various levels of additive white Gaussian noise. Plosives sharing the same place of articulation and vowel context (e.g., /pa, ba/) were presented to subjects in two-alternative forced-choice identification tasks, and a threshold signal-to-noise ratio (SNR) value (corresponding to the 79% correct classification score) was estimated for each voiced/voiceless pair. The threshold SNR values were then correlated with several acoustic measurements of the speech tokens. Results indicate that the onset frequency of the first formant is critical for perceiving voicing in syllable-initial plosives in additive white Gaussian noise, while the VOT duration is not.
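The threshold SNR described here is the signal-to-noise ratio at which identification reaches 79% correct. A sketch of estimating such a threshold by fitting a logistic psychometric function to percent-correct scores is shown below; the data points are fabricated for illustration and the fitting procedure is an assumption, not the study's method.

```python
# Sketch: estimate the SNR at which a two-alternative identification task
# reaches 79% correct by fitting a logistic psychometric function.
# The percent-correct scores below are fabricated examples.
import numpy as np
from scipy.optimize import curve_fit

def psychometric(snr, midpoint, slope):
    # Two-alternative task: performance runs from 50% (chance) to 100%.
    return 0.5 + 0.5 / (1.0 + np.exp(-(snr - midpoint) / slope))

snr_db = np.array([-15, -10, -5, 0, 5, 10])
p_correct = np.array([0.52, 0.58, 0.71, 0.86, 0.95, 0.99])   # example data

(midpoint, slope), _ = curve_fit(psychometric, snr_db, p_correct, p0=[0.0, 3.0])

# Invert the fitted function at 79% correct to get the threshold SNR.
target = 0.79
threshold_snr = midpoint - slope * np.log(0.5 / (target - 0.5) - 1.0)
print(f"threshold SNR = {threshold_snr:.1f} dB")
```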

10.
Previous studies have shown that infants discriminate voice onset time (VOT) differences for certain speech contrasts categorically. In addition, investigations of nonspeech processing by infants also yield evidence of categorical discrimination of temporal-order differences. These findings have led some researchers to argue that common auditory mechanisms underlie the infant's discrimination of timing differences in speech and nonspeech contrasts [e.g., Jusczyk et al., J. Acoust. Soc. Am. 67, 262-270 (1980)]. Nevertheless, some discrepancies in the location of the infant's category boundaries for different kinds of contrasts have been noted [e.g., Eilers et al. (1980)]. Because different procedures were used to study the different kinds of contrasts, interpretation of the discrepancies between the studies has been difficult. In the present study, three different continua were examined: [ba]-[pa] stimuli, which differed in VOT; [du]-[tu] stimuli, which differed in VOT but which lacked formant transitions; and nonspeech formant onset time (FOT) stimuli that varied in the time at which lower harmonics increased in amplitude. An experiment with adults indicated a close match between the perceptual boundaries for the three series. Similarly, tests with 2-month-old infants using the high-amplitude sucking procedure yielded estimates of perceptual category boundaries between 20 and 40 ms for all three stimulus series.

11.
The perception of voicing in final velar stop consonants was investigated by systematically varying vowel duration, change in offset frequency of the final first-formant (F1) transition, and rate of frequency change in the final F1 transition for several vowel contexts. Consonant-vowel-consonant (CVC) continua were synthesized for each of three vowels, [i, ɪ, æ], which represent a range of relatively low to relatively high F1 steady-state values. Subjects responded to the stimuli under both an open- and a closed-response condition. Results of the study show that both vowel duration and F1 offset properties influence perception of final-consonant voicing, with the salience of the F1 offset property higher for vowels with high F1 steady-state frequencies than for vowels with low F1 steady-state frequencies, and the opposite holding for the vowel duration property. When F1 onset and offset frequencies were controlled, the rate of F1 transition change had inconsistent and minimal effects on perception of final-consonant voicing. Thus the findings suggest that it is the termination value of the F1 offset transition, rather than the rate and/or duration of frequency change, that cues voicing in final velar stop consonants during the transition period preceding closure.

12.
Gap detection thresholds for speech and analogous nonspeech stimuli were determined in younger and older adults with clinically normal hearing in the speech range. Gap detection thresholds were larger for older than for younger listeners in all conditions, with the size of the age difference increasing with stimulus complexity. For both ages, gap detection thresholds were far smaller when the markers before and after the gap were the same (spectrally symmetrical) compared to when they were different (spectrally asymmetrical) for both speech and nonspeech stimuli. Moreover, gap detection thresholds were smaller for nonspeech than for speech stimuli when the markers were spectrally symmetrical but the opposite was observed when the markers were spectrally asymmetrical. This pattern of results may reflect the benefit of activating well-learned gap-dependent phonemic contrasts. The stimulus-dependent age effects were interpreted as reflecting the differential effects of age-dependent losses in temporal processing ability on within- and between-channel gap detection.

13.
This study investigates cross-speaker differences in the factors that predict voicing thresholds during abduction-adduction gestures in six normal women. Measures of baseline airflow, pulse amplitude, subglottal pressure, and fundamental frequency were made at voicing offset and onset during intervocalic /h/, produced in varying vowel environments and at different loudness levels, and subjected to relational analyses to determine which factors were most strongly related to the timing of voicing cessation or initiation. The data indicate that (a) all speakers showed differences between voicing offsets and onsets, but the degree of this effect varied across speakers; (b) loudness and vowel environment have speaker-specific effects on the likelihood of devoicing during /h/; and (c) baseline flow measures significantly predicted times of voicing offset and onset in all participants, but other variables contributing to voice timing differed across speakers. Overall, the results suggest that individual speakers have unique methods of achieving phonatory goals during running speech. These data contribute to the literature on individual differences in laryngeal function, and serve as a means of evaluating how well laryngeal models can reproduce the range of voicing behavior used by speakers during running speech tasks.

14.
What's in a whisper?   (Total citations: 4; self-citations: 0; cited by others: 4)
Whispering is a common, natural way of reducing speech perceptibility, but whether and how whispering affects consonant identification and the acoustic features presumed important for it in normal speech perception are unknown. In this experiment, untrained listeners identified 18 different whispered initial consonants significantly better than chance in nonsense syllables. The phonetic features of place and manner of articulation and, to a lesser extent, voicing, were correctly identified. Confusion matrix and acoustic analyses indicated preservation of resonance characteristics for place and manner of articulation and suggested the use of burst, aspiration, or frication duration and intensity, and/or first-formant cutback for voicing decisions.

15.
A series of experiments was carried out to investigate how fundamental frequency declination is perceived by speakers of English. Using linear-predictive-coded speech, nonsense sentences were constructed in which the fundamental frequency on the last stressed syllable had been systematically varied. Listeners were asked to judge which stressed syllable was higher in pitch. Their judgments were found to reflect normalization for expected declination; in general, when two stressed syllables sounded equal in pitch, the second was actually lower. The pattern of normalization reflected certain major features of production patterns: A greater correction for declination was made for wide pitch range stimuli than for narrow pitch range stimuli. The slope of expected declination was less for longer stimuli than for shorter ones. Lastly, amplitude was found to have a significant effect on judgments, suggesting that the amplitude downdrift which normally accompanies fundamental frequency declination may have an important role in the perception of phrasing.

16.
Responses of chinchilla auditory nerve fibers to synthesized stop consonant syllables differing in voice-onset time (VOT) were obtained. The syllables, heard as /ga/-/ka/ or /da/-/ta/, were similar to those previously used by others in psychophysical experiments with human and chinchilla subjects. Synchronized discharge rates of neurons tuned to frequencies near the first formant increased at the onset of voicing for VOTs longer than 20 ms. Stimulus components near the formant or the neuron's characteristic frequency accounted for the increase. In these neurons, synchronized response changes were closely related to the same neuron's average discharge rates [D. G. Sinex and L. P. McDonald, J. Acoust. Soc. Am. 83, 1817-1827 (1988)]. Neurons tuned to frequency regions near the second and third formants usually responded to components near the second formant prior to the onset of voicing. These neurons' synchronized discharges could be captured by the first formant at the onset of voicing or with a latency of 50-60 ms, whichever was later. Since these neurons' average rate responses were unaffected by the onset of voicing, the latency of the synchronized response did provide an additional neural cue to VOT. Overall, however, discharge synchrony did not provide as much information about VOT as was provided by the best average rate responses. The results are compared to other measurements of the peripheral encoding of speech sounds and to aspects of VOT perception.

17.
In normal speech, coordinated activities of the intrinsic laryngeal muscles suspend glottal sound during the utterance of voiceless consonants, automatically realizing voicing control. In electrolaryngeal speech, however, the lack of voicing control is one of the causes of unclear voice, with voiceless consonants tending to be misheard as the corresponding voiced consonants. In the present work, we developed an intra-oral vibrator with an intra-oral pressure sensor that detected the utterance of voiceless phonemes during intra-oral electrolaryngeal speech, and demonstrated that intra-oral pressure-based voicing control could improve the intelligibility of the speech. The test voices were obtained from one electrolaryngeal speaker and one normal speaker. We first investigated, using speech analysis software, how the voice onset time (VOT) and first-formant (F1) transition of the test consonant-vowel syllables contributed to voiceless/voiced contrasts, and developed an adequate voicing control strategy. We then compared the intelligibility of consonant-vowel syllables in intra-oral electrolaryngeal speech with and without online voicing control. An increase of intra-oral pressure, typically with a peak ranging from 10 to 50 gf/cm², could reliably identify the utterance of voiceless consonants. The speech analysis and intelligibility tests then demonstrated that a short VOT caused the consonants to be misidentified as voiced, owing to a clear F1 transition. Finally, taking these results together, the online voicing control, which suspended the prosthetic tone while the intra-oral pressure exceeded 2.5 gf/cm² and during the 35 milliseconds that followed, proved effective in improving the voiceless/voiced contrast.
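The control rule at the end of the abstract can be stated directly: suspend the prosthetic tone whenever intra-oral pressure exceeds 2.5 gf/cm², and keep it suspended for 35 ms after the pressure falls back below threshold. A sketch of that gating logic follows; the sensor sampling rate and function names are assumptions, not the authors' implementation.

```python
# Sketch of the pressure-based voicing gate described in the abstract:
# suspend the electrolarynx tone while intra-oral pressure exceeds 2.5 gf/cm^2
# and for 35 ms afterwards. The sensor sampling rate is an assumed value.

SENSOR_HZ = 1000                          # assumed pressure-sensor sampling rate
THRESHOLD = 2.5                           # gf/cm^2, from the abstract
HOLD_SAMPLES = int(0.035 * SENSOR_HZ)     # 35-ms hold-off, from the abstract

def voicing_gate(pressure_samples):
    """Yield True when the prosthetic tone should sound, False when suspended."""
    hold = 0
    for p in pressure_samples:
        if p > THRESHOLD:
            hold = HOLD_SAMPLES           # restart the 35-ms hold-off
            yield False
        elif hold > 0:
            hold -= 1
            yield False
        else:
            yield True

# Example: a brief pressure rise (a voiceless consonant) mutes the tone and
# keeps it muted for 35 ms after the pressure drops back below threshold.
trace = [0.3] * 50 + [8.0] * 30 + [0.3] * 100
gate = list(voicing_gate(trace))
```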

18.
Experiments were conducted to determine the underlying resolving power of the auditory system for temporal changes at the onset of speech and nonspeech stimuli. Stimulus sets included a bilabial VOT continuum and an analogous nonspeech continuum similar to the "noise-buzz" stimuli used by Miller et al. [J. Acoust. Soc. Am. 60, 410-417 (1976)]. The main difference between these and earlier experiments was that efforts were made to minimize both the trial-to-trial stimulus uncertainty and the cognitive load inherent in some of the testing procedures. Under conditions of minimal psychophysical uncertainty, not only does discrimination performance improve overall, but the local maximum, usually interpreted as evidence of categorical perception, is eliminated. Instead, discrimination performance for voice onset time (VOT) or noise lead time (NLT) is very accurate for short onset times and generally decreases with increasing onset time. This result suggests that "categorization" of familiar sounds is not the result of a psychoacoustic threshold (as Miller et al. have suggested) but rather of processing at a more central level of the auditory system.

19.
While a large portion of the variance among listeners in speech recognition is associated with the audibility of components of the speech waveform, it is not possible to predict individual differences in the accuracy of speech processing strictly from the audiogram. This has suggested that some of the variance may be associated with individual differences in spectral or temporal resolving power, or acuity. Psychoacoustic measures of spectral-temporal acuity with nonspeech stimuli have been shown, however, to correlate only weakly (or not at all) with speech processing. In a replication and extension of an earlier study [Watson et al., J. Acoust. Soc. Am. Suppl. 1 71, S73 (1982)], 93 normal-hearing college students were tested on speech perception tasks (nonsense syllables, words, and sentences in a noise background) and on six spectral-temporal discrimination tasks using simple and complex nonspeech sounds. Factor analysis showed that the abilities that explain performance on the nonspeech tasks are quite distinct from those that account for performance on the speech tasks. Performance was significantly correlated among speech tasks and among nonspeech tasks. Either (a) auditory spectral-temporal acuity for nonspeech sounds is orthogonal to speech processing abilities, or (b) the appropriate tasks or types of nonspeech stimuli that challenge the abilities required for speech recognition have yet to be identified.
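A sketch of the kind of analysis described: factor-analyzing a listeners-by-tasks score matrix and inspecting whether speech and nonspeech tasks load on separate factors. The task labels and scores below are placeholders, not data from the study.

```python
# Sketch: factor analysis of a listeners-by-tasks score matrix to see whether
# speech and nonspeech tasks load on distinct factors. Data are random
# placeholders, not values from the study.
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

tasks = ["syllables_in_noise", "words_in_noise", "sentences_in_noise",
         "freq_discrim", "duration_discrim", "tone_pattern_discrim"]
rng = np.random.default_rng(0)
scores = rng.normal(size=(93, len(tasks)))   # 93 listeners x 6 tasks (placeholder)

fa = FactorAnalysis(n_components=2, rotation="varimax")
fa.fit(StandardScaler().fit_transform(scores))

# Loadings: one row per task, one column per factor. A speech/nonspeech split
# would show the speech tasks loading on one factor and the nonspeech tasks
# on the other.
for task, loading in zip(tasks, fa.components_.T):
    print(f"{task:24s} {loading[0]: .2f} {loading[1]: .2f}")
```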

20.
The influence of vocalic context on various temporal and spectral properties of preceding acoustic segments was investigated in utterances containing [ə # CV] sequences produced by two girls aged 4;8 and 9;5 years and by their father. The younger (but not the older) child's speech showed a systematic lowering of [s] noise and [tʰ] release burst spectra before [u] as compared to [i] and [æ]. The older child's speech, on the other hand, showed an orderly relationship of the second-formant frequency in [ə] to the transconsonantal vowel. Both children tended to produce longer [s] noises and voice onset times as well as higher second-formant peaks at constriction noise offset before [i] than before [u] and [æ]. All effects except the first were shown by the adult who, in addition, produced first-formant frequencies in [ə] that anticipated the transconsonantal vowel. These observations suggest that different forms of anticipatory coarticulation may have different causes and may follow different developmental patterns. A strategy for future research is suggested.
