1.
This paper reports acoustic measurements and results from a series of perceptual experiments on the voiced-voiceless distinction for syllable-final stop consonants in absolute final position and in the context of a following syllable beginning with a different stop consonant. The focus is on temporal cues to the distinction, with vowel duration and silent closure duration as the primary and secondary dimensions, respectively. The main results are that adding a second syllable to a monosyllable increases the number of voiced stop consonant responses, as does shortening of the closure duration in disyllables. Both of these effects are consistent with temporal regularities in speech production: Vowel durations are shorter in the first syllable of disyllables than in monosyllables, and closure durations are shorter for voiced than for voiceless stops in disyllabic utterances of this type. While the perceptual effects thus may derive from two separate sources of tacit phonetic knowledge available to listeners, the data are also consistent with an interpretation in terms of a single effect: one of temporal proximity of following context.
2.
R N Ohde, The Journal of the Acoustical Society of America, 1985, 78(5):1554-1561
Fundamental frequency (F0) and voice onset time (VOT) were measured in utterances containing voiceless aspirated [pʰ, tʰ, kʰ], voiceless unaspirated [sp, st, sk], and voiced [b, d, g] stop consonants produced in the context of [i, e, u, o, a] by 8- to 9-year-old subjects. The results revealed that VOT reliably differentiated voiceless aspirated from voiceless unaspirated and voiced stops, whereas F0 significantly contrasted voiced with voiceless aspirated and unaspirated stops, except for the first glottal period, where voiceless unaspirated stops contrasted with the other two categories. Fundamental frequency consistently differentiated vowel height in alveolar and velar stop consonant environments only. In comparing the results of these children and of adults, it was observed that the acoustic correlates of stop consonant voicing and vowel quality were different not only in absolute values, but also in terms of variability. Further analyses suggested that children were more variable in production due to inconsistency in achieving specific targets. The findings also suggest that, of the acoustic correlates of the voicing feature, the primary distinction of VOT is strongly developed by 8-9 years of age, whereas the secondary distinction of F0 is still in an emerging state.
3.
This study investigated the binaural temporal window in adults and children 5-10.5 years of age. Detection thresholds were estimated for a brief, interaurally out-of-phase (Spi) 500 Hz pure tone signal masked by bandpass, 100-2000 Hz Gaussian noise. In one set of conditions, the masker was consistently either in phase (No) or out of phase (Npi). In another set of conditions, the masker changed abruptly in interaural phase (NoNpi or NpiNo), and threshold was estimated at a range of delays with respect to the phase transition. Masked thresholds were also obtained in further conditions where the masker interaural phase was steady and the signal was of long duration. Age effects obtained with dynamic maskers could be accounted for by positing that children have a binaural temporal window with a relatively prolonged leading edge or that the children position the binaural temporal window relatively late with respect to the signal. Modeling of the reduced masking-level difference shown by children for a brief Spi signal presented in a steady No or Npi masker was more consistent with late placement of a symmetrical binaural temporal window than a binaural temporal window having a relatively prolonged leading edge.
4.
Despite a lack of traditional speech features, novel sentences restricted to a narrow spectral slit can retain nearly perfect intelligibility [R. M. Warren et al., Percept. Psychophys. 57, 175-182 (1995)]. The current study employed 514 listeners to elucidate the cues allowing this high intelligibility, and to examine generally the use of narrow-band temporal speech patterns. When 1/3-octave sentences were processed to preserve the overall temporal pattern of amplitude fluctuation, but eliminate contrasting amplitude patterns within the band, sentence intelligibility dropped from values near 100% to values near zero (experiment 1). However, when a 1/3-octave speech band was partitioned to create a contrasting pair of independently amplitude-modulated 1/6-octave patterns, some intelligibility was restored (experiment 2). An additional experiment (3) showed that temporal patterns can also be integrated across wide frequency separations, or across the two ears. Despite the linguistic content of single temporal patterns, open-set intelligibility does not occur. Instead, a contrast between at least two temporal patterns is required for the comprehension of novel sentences and their component words. These contrasting patterns can reside together within a narrow range of frequencies, or they can be integrated across frequencies or ears. This view of speech perception, in which across-frequency changes in energy are seen as systematic changes in the temporal fluctuation patterns at two or more fixed loci, is more in line with the physiological encoding of complex signals.
5.
Traditional accounts of speech perception generally hold that listeners use isolable acoustic "cues" to label phonemes. For syllable-final stops, duration of the preceding vocalic portion and formant transitions at syllable's end have been considered the primary cues to voicing decisions. The current experiment tried to extend traditional accounts by asking two questions concerning voicing decisions by adults and children: (1) What weight is given to vocalic duration versus spectral structure, both at syllable's end and across the syllable? (2) Does the naturalness of stimuli affect labeling? Adults and children (4, 6, and 8 years old) labeled synthetic stimuli that varied in vocalic duration and spectral structure, either at syllable's end or earlier in the syllable. Results showed that all listeners weighted dynamic spectral structure, both at syllable's end and earlier in the syllable, more than vocalic duration, and listeners performed with these synthetic stimuli as listeners had performed previously with natural stimuli. The conclusion for accounts of human speech perception is that rather than simply gathering acoustic cues and summing them to derive strings of phonemic segments, listeners are able to attend to global spectral structure, and use it to help recover explicitly phonetic structure.
6.
7.
Previous studies [Lisker, J. Acoust. Soc. Am. 57, 1547-1551 (1975); Summerfield and Haggard, J. Acoust. Soc. Am. 62, 435-448 (1977)] have shown that voice onset time (VOT) and the onset frequency of the first formant are important perceptual cues of voicing in syllable-initial plosives. Most prior work, however, has focused on speech perception in quiet environments. The present study seeks to determine which cues are important for the perception of voicing in syllable-initial plosives in the presence of noise. Perceptual experiments were conducted using stimuli consisting of naturally spoken consonant-vowel syllables by four talkers in various levels of additive white Gaussian noise. Plosives sharing the same place of articulation and vowel context (e.g., /pa,ba/) were presented to subjects in two alternate forced choice identification tasks, and a threshold signal-to-noise-ratio (SNR) value (corresponding to the 79% correct classification score) was estimated for each voiced/voiceless pair. The threshold SNR values were then correlated with several acoustic measurements of the speech tokens. Results indicate that the onset frequency of the first formant is critical in perceiving voicing in syllable-initial plosives in additive white Gaussian noise, while the VOT duration is not.
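The two analysis steps named in this abstract (finding the SNR at the 79% correct point, then correlating thresholds with an acoustic measurement) can be illustrated with a minimal sketch. The abstract does not say how the threshold was estimated, so the linear interpolation below is an assumption, and every number in the example is hypothetical rather than taken from the study.

```python
import math

def threshold_snr(snrs_db, prop_correct, target=0.79):
    """Interpolate the SNR (dB) at which proportion correct reaches `target`.

    Assumes `snrs_db` is in ascending order. Returns None if the
    target level is never crossed on a rising segment.
    """
    points = list(zip(snrs_db, prop_correct))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y1 > y0 and y0 <= target <= y1:
            return x0 + (target - y0) * (x1 - x0) / (y1 - y0)
    return None

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical identification scores for one voiced/voiceless pair.
snrs = [-10, -5, 0, 5]
scores = [0.55, 0.70, 0.85, 0.95]
print("threshold SNR (dB):", threshold_snr(snrs, scores))

# Hypothetical per-pair thresholds and a hypothetical acoustic measure
# (e.g. an F1 onset difference in Hz); values are invented for illustration.
thresholds_db = [-2.0, -6.5, -4.0, -8.0]
f1_onset_diff_hz = [120.0, 310.0, 200.0, 380.0]
print("r:", pearson_r(thresholds_db, f1_onset_diff_hz))
```

A fitted psychometric function or an adaptive track would be reasonable alternatives to interpolation; the sketch only shows the shape of the computation.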
8.
J R Westbury, The Journal of the Acoustical Society of America, 1983, 73(4):1322-1336
Measurements were made of sagittal plane movements of the larynx, soft palate, and portions of the tongue, from a high-speed cinefluorographic film of utterances produced by one adult male speaker of American English. These measures were then used to approximate the temporal variations in supraglottal cavity volume during the closures of voiced and voiceless stop consonants. All data were subsequently related to a synchronous acoustic recording of the utterances. Instances of /p,t,k/ were always accompanied by silent closures, and sometimes accompanied by decreases in supraglottal volume. In contrast, instances of /b,d,g/ were always accompanied both by significant intervals of vocal fold vibration during closure, and relatively large increases in supraglottal volume. However, the magnitudes of volume increments during the voiced stops, and the means by which those increments were achieved, differed considerably across place of articulation and phonetic environment. These results are discussed in the context of a well-known model of the breath-stream control mechanism, and their relevance for a general theory of speech motor control is considered.
9.
AA Shultz, AL Francis, F Llanos, The Journal of the Acoustical Society of America, 2012, 132(2):EL95-EL101
This study examines English speakers' relative weighting of two voicing cues in production and perception. Participants repeated words differing in initial consonant voicing ([b] or [p]) and labeled synthesized tokens ranging between [ba] and [pa] orthogonally according to voice onset time (VOT) and onset f0. Discriminant function analysis and logistic regression were used to calculate individuals' relative weighting of each cue. Production results showed a significant negative correlation of VOT and onset f0, while perception results showed a trend toward a positive correlation. No significant correlations were found across perception and production, suggesting a complex relationship between the two domains.
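As a rough illustration of how logistic regression can yield relative cue weights of the kind described in this abstract, the sketch below fits a two-predictor model to labeling proportions and normalizes the absolute coefficients. It covers only the logistic-regression half of the analysis, not the discriminant function analysis. The stimulus grid, the simulated listener, and the normalization rule are all assumptions made for the example, not details from the study.

```python
import math

def zscore(values):
    """Standardize values to zero mean and unit (population) SD."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd for v in values]

def fit_logistic(X, y, lr=0.5, epochs=3000):
    """Fit logistic regression by batch gradient descent.

    `y` may hold proportions of [p] responses per token (values in 0..1).
    Returns (bias, coefficients).
    """
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return b, w

def relative_weights(coefs):
    """Normalize absolute coefficients so they sum to 1."""
    total = sum(abs(c) for c in coefs)
    return [abs(c) / total for c in coefs]

# Hypothetical orthogonal stimulus grid: VOT (ms) crossed with onset f0 (Hz).
tokens = [(vot, f0) for vot in (0, 10, 20, 30, 40, 50)
          for f0 in (90, 100, 110, 120, 130)]
X = list(zip(zscore([t[0] for t in tokens]), zscore([t[1] for t in tokens])))

# Simulated listener (an assumption): relies four times more on VOT than on f0.
TRUE_COEFS = (2.0, 0.5)
y = [1.0 / (1.0 + math.exp(-(TRUE_COEFS[0] * v + TRUE_COEFS[1] * f)))
     for v, f in X]

bias, coefs = fit_logistic(X, y)
weights = relative_weights(coefs)
print("relative weight of VOT:", round(weights[0], 3))
print("relative weight of onset f0:", round(weights[1], 3))
```

Standardizing both cues before fitting is what makes the coefficients comparable; with raw milliseconds and hertz the magnitudes would reflect the units rather than the listener's reliance on each cue.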
10.
11.
Auditory evoked potential (AEP) correlates of the neural representation of stimuli along a /ga/-/ka/ and a /ba/-/pa/ continuum were examined to determine whether the voice-onset time (VOT)-related change in the N1 onset response from a single to double-peaked component is a reliable indicator of the perception of voiced and voiceless sounds. Behavioral identification results from ten subjects revealed a mean category boundary at a VOT of 46 ms for the /ga/-/ka/ continuum and at a VOT of 27.5 ms for the /ba/-/pa/ continuum. In the same subjects, electrophysiologic recordings revealed that a single N1 component was seen for stimuli with VOTs of 30 ms and less, and two components (N1' and N1) were seen for stimuli with VOTs of 40 ms and more for both continua. That is, the change in N1 morphology (from single to double-peaked) coincided with the change in perception from voiced to voiceless for stimuli from the /ba/-/pa/ continuum, but not for stimuli from the /ga/-/ka/ continuum. The results of this study show that N1 morphology does not reliably predict phonetic identification of stimuli varying in VOT. These findings also suggest that the previously reported appearance of a "double-peak" onset response in aggregate recordings from the auditory cortex does not indicate a cortical correlate of the perception of voicelessness.
12.
Nittrouer S, The Journal of the Acoustical Society of America, 2005, 118(2):1072-1088
Because laboratory studies are conducted in optimal listening conditions, often with highly stylized stimuli that attenuate or eliminate some naturally occurring cues, results may have constrained applicability to the "real world." Such studies show that English-speaking adults weight vocalic duration greatly and formant offsets slightly in voicing decisions for word-final obstruents. Using more natural stimuli, Nittrouer [J. Acoust. Soc. Am. 115, 1777-1790 (2004)] found different results, raising questions about what would happen if experimental conditions were even more like the real world. In this study noise was used to simulate the real world. Edited natural words with voiced and voiceless final stops were presented in quiet and noise to adults and children (4 to 8 years) for labeling. Hypotheses tested were (1) Adults (and perhaps older children) would weight vocalic duration more in noise than in quiet; (2) Previously reported age-related differences in cue weighting might not be found in this real-world simulation; and (3) Children would experience greater masking than adults. Results showed: (1) no increase for any age listeners in the weighting of vocalic duration in noise; (2) age-related differences in the weighting of cues in both quiet and noise; and (3) masking effects for all listeners, but more so for children than adults.
13.
Gick B, Ikegami Y, Derrick D, The Journal of the Acoustical Society of America, 2010, 128(5):EL342-EL346
Asynchronous cross-modal information is integrated asymmetrically in audio-visual perception. To test whether this asymmetry generalizes across modalities, auditory (aspirated "pa" and unaspirated "ba" stops) and tactile (slight, inaudible, cutaneous air puffs) signals were presented synchronously and asynchronously. Results were similar to previous AV studies: the temporal window of integration for the enhancement effect (but not the interference effect) was asymmetrical, allowing up to 200 ms of asynchrony when the puff followed the audio signal, but only up to 50 ms when the puff preceded the audio signal. These findings suggest that perceivers accommodate differences in physical transmission speed of different multimodal signals.
14.
15.
J M Lindholm, M Dorman, B E Taylor, M T Hannley, The Journal of the Acoustical Society of America, 1988, 83(4):1608-1614
The effects of mild-to-moderate hearing impairment on the perceptual importance of three acoustic correlates of stop consonant place of articulation were examined. Normal-hearing and hearing-impaired adults identified a stimulus set comprising all possible combinations of the levels of three factors: formant transition type (three levels), spectral tilt type (three levels), and abruptness of frequency change (two levels). The levels of these factors correspond to those appropriate for /b/, /d/, and /g/ in the /ae/ environment. Normal-hearing subjects responded primarily in accord with the place of articulation specified by the formant transitions. Hearing-impaired subjects showed less-than-normal reliance on formant transitions and greater-than-normal reliance on spectral tilt and abruptness of frequency change. These results suggest that hearing impairment affects the perceptual importance of cues to stop consonant identity, increasing the importance of information provided by both temporal characteristics and gross spectral shape and decreasing the importance of information provided by the formant transitions.
16.
Krumbholz K, Bleeck S, Patterson RD, Senokozlieva M, Seither-Preisler A, Lütkenhöner B, The Journal of the Acoustical Society of America, 2005, 118(2):946-954
Temporal models of pitch are based on the assumption that the auditory system measures the time intervals between neural events, and that pitch corresponds to the most common time interval. The current experiments were designed to test whether time intervals are analyzed independently in each peripheral channel, or whether the time-interval analysis in one channel is affected by synchronous activity in other channels. Regular and irregular click trains were filtered into narrow frequency bands to produce target and flanker stimuli. The threshold for discriminating a regular target from an irregular distracter click train was measured in the presence of an irregular masker click train in the target band, as a function of the frequency separation between the target band and a flanker band. The flanker click train was either regular or irregular. The threshold for detecting the regular target was 5-7 dB lower when the flanker was regular. The data indicate that the detection of temporal regularity (and thus, pitch) involves cross-channel processes that can operate over widely separated channels. Model simulations suggest that these cross-channel processes occur after the time-interval extraction stage and that they depend on the similarity, or consistency, of the time-interval patterns in the relevant channels.
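The core assumption of temporal pitch models stated in this abstract (pitch corresponds to the most common time interval between events) can be sketched with an all-order interval histogram. This is a toy single-channel illustration, not the model simulated in the study; the bin width, the maximum interval, and the click times are assumptions chosen for the example.

```python
from collections import Counter

def most_common_interval(event_times_ms, max_interval_ms=30.0, bin_ms=0.5):
    """Return the most frequent inter-event interval (ms), or None.

    Builds an all-order interval histogram: every pair of events closer
    than `max_interval_ms` contributes one interval, binned to `bin_ms`.
    Ties are broken in favor of the shorter interval.
    """
    times = sorted(event_times_ms)
    counts = Counter()
    for i, t0 in enumerate(times):
        for t1 in times[i + 1:]:
            dt = t1 - t0
            if dt > max_interval_ms:
                break  # times are sorted, so later events are farther away
            counts[round(dt / bin_ms)] += 1
    if not counts:
        return None
    best_bin = max(counts, key=lambda k: (counts[k], -k))
    return best_bin * bin_ms

# Hypothetical regular click train: one click every 10 ms.
regular = [i * 10.0 for i in range(10)]
period_ms = most_common_interval(regular)
print("dominant interval (ms):", period_ms)
print("implied pitch (Hz):", 1000.0 / period_ms)

# Hypothetical irregular click train with the same mean rate.
irregular = [0.0, 7.0, 21.5, 26.0, 43.0, 49.5, 62.0, 68.5, 84.0, 90.0]
print("irregular train:", most_common_interval(irregular))
```

In the regular train the 10 ms interval dominates the histogram, which is the sense in which a regular click train carries a pitch and an irregular one largely does not.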
17.
For normally hearing subjects, shortening the silence duration of an intervocalic voiceless plosive induces a misperception of voicing. The time boundary for this effect is about 60 ms, which corresponds to a possible forward masking effect at the frequency of voicing. If recovery from masking is indeed involved, hearing-impaired subjects, who may have prolonged forward masking, can be expected to show an abnormally long time boundary for voicing misperception. This study investigated the perception of voicing of an intervocalic plosive for a natural speech sample "aka" as a function of occlusive silence duration for normally hearing and hearing-impaired subjects. To investigate a correlation with forward masking, a second test was performed on the subjects. The same first "a" of the "aka" was selected, a voiced murmur taken from an "aga" utterance by the same speaker was concatenated at its end, and the minimum duration of the voiced murmur necessary for it to be perceived was measured. About half of the hearing-impaired subjects needed an abnormally long silence duration to avoid voicing misperception. The data indicate a significant correlation between the results of the two tests, with a regression slope close to unity, and thus support the hypothesis that voicing perception is governed by recovery from forward masking. Increasing the silence duration of voiceless plosives might then be a beneficial form of acoustic processing for some hearing-impaired subjects.
18.
19.
P G Stelmachowicz, A L Pittman, B M Hoover, D E Lewis, The Journal of the Acoustical Society of America, 2001, 110(4):2183-2190
Recent studies with adults have suggested that amplification at 4 kHz and above fails to improve speech recognition and may even degrade performance when high-frequency thresholds exceed 50-60 dB HL. This study examined the extent to which high frequencies can provide useful information for fricative perception for normal-hearing and hearing-impaired children and adults. Eighty subjects (20 per group) participated. Nonsense syllables containing the phonemes /s/, /f/, and /θ/, produced by a male, female, and child talker, were low-pass filtered at 2, 3, 4, 5, 6, and 9 kHz. Frequency shaping was provided for the hearing-impaired subjects only. Results revealed significant differences in recognition between the four groups of subjects. Specifically, both groups of children performed more poorly than their adult counterparts at similar bandwidths. Likewise, both hearing-impaired groups performed more poorly than their normal-hearing counterparts. In addition, significant talker effects for /s/ were observed. For the male talker, optimum performance was reached at a bandwidth of approximately 4-5 kHz, whereas optimum performance for the female and child talkers did not occur until a bandwidth of 9 kHz.