Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
Responses of chinchilla auditory-nerve fibers to synthesized stop consonant syllables differing in voice onset time (VOT) were obtained. The syllables, heard as /ga/-/ka/ or /da/-/ta/, were similar to those previously used by others in psychophysical experiments with human and chinchilla subjects. Synchronized discharge rates of neurons tuned to frequencies near the first formant increased at the onset of voicing for VOTs longer than 20 ms. Stimulus components near the formant or the neuron's characteristic frequency accounted for the increase. In these neurons, synchronized response changes were closely related to the same neuron's average discharge rates [D. G. Sinex and L. P. McDonald, J. Acoust. Soc. Am. 83, 1817-1827 (1988)]. Neurons tuned to frequency regions near the second and third formants usually responded to components near the second formant prior to the onset of voicing. These neurons' synchronized discharges could be captured by the first formant at the onset of voicing or with a latency of 50-60 ms, whichever was later. Since these neurons' average rate responses were unaffected by the onset of voicing, the latency of the synchronized response did provide an additional neural cue to VOT. Overall, however, discharge synchrony did not provide as much information about VOT as was provided by the best average rate responses. The results are compared to other measurements of the peripheral encoding of speech sounds and to aspects of VOT perception.

2.
Voice onset time (VOT) is a temporal cue that can distinguish consonants such as /d/ from /t/. It has previously been shown that neurons' responses to the onset of voicing are strongly dependent on their static spectral sensitivity. This study examined the relation between temporal resolution, determined from responses to sinusoidally amplitude-modulated (SAM) tones, and responses to syllables with different VOTs. Responses to syllables and SAM tones were obtained from low-frequency neurons in the inferior colliculus (IC) of the chinchilla. VOT and modulation period varied from 10 to 70 ms in 10-ms steps, and discharge rates elicited by stimuli whose amplitude envelopes were modulated over the same temporal interval were compared. Neurons that respond preferentially to syllables with particular VOTs might be expected to respond best to the SAM tones with comparable modulation periods. However, no consistent agreement between responses to VOT syllables and to SAM tones was obtained. These results confirm the previous suggestion that IC neurons' selectivity for VOT is determined by spectral rather than temporal sensitivity.  相似文献   
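As an illustration of the SAM stimuli described in entry 2, the following is a minimal sketch of how a sinusoidally amplitude-modulated tone could be synthesized for modulation periods of 10 to 70 ms in 10-ms steps. The function name `sam_tone`, the 500-Hz carrier, the 20-kHz sample rate, and the 100% modulation depth are assumptions made for the example; the abstract does not report these values.

```python
import math

def sam_tone(carrier_hz, mod_period_ms, duration_ms, fs=20000, depth=1.0):
    """Return samples of a sinusoidally amplitude-modulated (SAM) tone.

    carrier_hz, fs and depth are illustrative assumptions, not values
    taken from the study.
    """
    mod_hz = 1000.0 / mod_period_ms            # modulation rate from its period
    n_samples = int(round(fs * duration_ms / 1000.0))
    samples = []
    for i in range(n_samples):
        t = i / fs
        envelope = 1.0 + depth * math.sin(2.0 * math.pi * mod_hz * t)
        samples.append(envelope * math.sin(2.0 * math.pi * carrier_hz * t))
    return samples

# Modulation periods matching the VOT range quoted in the abstract.
periods_ms = list(range(10, 80, 10))           # 10, 20, ..., 70 ms
mod_rates_hz = [1000.0 / p for p in periods_ms]

# One hypothetical stimulus: 500-Hz carrier, 10-ms modulation period, 100 ms long.
tone = sam_tone(500.0, 10, 100)
```

A 10-ms modulation period corresponds to a 100-Hz modulation rate and a 70-ms period to roughly 14 Hz, which is the range over which the SAM and VOT responses were compared.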

3.
Human and chinchilla listeners exhibit nonmonotonic temporal acuity for speech sounds differing in voice onset time (VOT). Characteristics of the neural discharge pattern or of the stimuli themselves that might account for the pattern of temporal acuity have not been described. Responses of chinchilla auditory-nerve fibers to syllables from an alveolar VOT continuum were measured. Peak discharge rates and peak response latencies elicited by the syllables with the shortest and longest VOTs were highly variable across groups of neurons with similar characteristic frequencies. For VOTs from the middle of the continuum, peak responses were larger, and response latencies were nearly constant across the same group of neurons. Overall, the magnitude and temporal variability of the responses of populations of primary auditory neurons varied nonmonotonically with VOT, consistent with the pattern of psychophysical temporal acuity for these syllables exhibited by humans and chinchillas. Spectral analyses suggested by the pattern of neural responses indicated that synchronous or correlated spectral cues were available over a wider bandwidth for those syllables from the middle of the continuum for which the neural representation was least variable.  相似文献   

4.
Responses of chinchilla auditory-nerve fibers to synthesized stop consonants differing in voice onset time (VOT) were obtained. The syllables, heard as /ga/-/ka/ or /da/-/ta/, were similar to those previously used by others in psychophysical experiments with human and with chinchilla subjects. Average discharge rates of neurons tuned to the frequency region near the first formant generally increased at the onset of voicing, for VOTs longer than 20 ms. These rate increases were closely related to spectral amplitude changes associated with the onset of voicing and with the activation of the first formant; as a result, they provided accurate information about VOT. Neurons tuned to frequency regions near the second and third formants did not encode VOT in their average discharge rates. Modulations in the average rates of these neurons reflected spectral variations that were independent of VOT. The results are compared to other measurements of the peripheral encoding of speech sounds and to psychophysical observations suggesting that syllables with large variations in VOT are heard as belonging to one of only two phonemic categories.  相似文献   

5.
Measurements of the temporal characteristics of word-initial stressed syllables in CV CV-type words in Modern Greek showed that the timing of the initial consonant in terms of its closure duration and voice onset time (VOT) is dependent on place and manner of articulation. This is contrary to recent accounts of word-initial voiceless consonants in English which propose that closure and VOT together comprise a voiceless interval independent of place and manner of articulation. The results also contribute to the development of a timing model for Modern Greek which generates closure, VOT, and vowel durations for word-initial, stressed CV syllables. The model is made up of a series of rules operating in an ordered fashion on a given word duration to derive first a stressed syllable duration and then all intrasyllabic acoustic intervals.  相似文献   

6.
Natural speech consonant-vowel (CV) syllables ([f, s, θ, ʃ, v, z, ð] followed by [i, u, a]) were computer edited to include 20-70 ms of their frication noise in 10-ms steps as measured from their onset, as well as the entire frication noise. These stimuli, and the entire syllables, were presented to 12 subjects for consonant identification. Results show that the listener does not require the entire fricative-vowel syllable in order to correctly perceive a fricative. The required frication duration depends on the particular fricative, ranging from approximately 30 ms for [ʃ, z] to 50 ms for [f, s, v], while [θ, ð] are identified with reasonable accuracy in only the full frication and syllable conditions. Analysis in terms of the linguistic features of voicing, place, and manner of articulation revealed that fricative identification in terms of place of articulation is much more affected by a decrease in frication duration than identification in terms of voicing and manner of articulation.

7.
The responses of four high-spontaneous fibers from a damaged cat cochlea responding to naturally uttered consonant-vowel (CV) syllables [m], [p], and [t], each with [a], [i], and [u] in four different levels of noise were simulated using a two-stage computer model. At the lowest noise level [+30 dB signal-to-noise (S/N) ratio], the responses of the models of the three fibers from a heavily damaged portion of the cochlea [characteristic frequencies (CFs) from 1.6 to 2.14 kHz] showed quite different response patterns from those of fibers in normal cochleas: There was little response to the noise alone, the consonant portions of the syllables evoked small-amplitude wide-bandwidth complexes, and the vowel-segment response synchrony was often masked by low-frequency components, especially the first formant. At the next level of noise (S/N = 20 dB), spectral information regarding the murmur segments of the [m] syllables was essentially lost. At the highest noise levels used (S/N = +10 and 0 dB), the noise was almost totally disruptive of coding of the spectral peaks of the consonant portions of the stop CVs. Possible implications of the results with regard to the understanding of speech by hearing-impaired listeners are discussed.  相似文献   

8.
Auditory evoked potential (AEP) correlates of the neural representation of stimuli along a /ga/-/ka/ and a /ba/-/pa/ continuum were examined to determine whether the voice-onset time (VOT)-related change in the N1 onset response from a single to double-peaked component is a reliable indicator of the perception of voiced and voiceless sounds. Behavioral identification results from ten subjects revealed a mean category boundary at a VOT of 46 ms for the /ga/-/ka/ continuum and at a VOT of 27.5 ms for the /ba/-/pa/ continuum. In the same subjects, electrophysiologic recordings revealed that a single N1 component was seen for stimuli with VOTs of 30 ms and less, and two components (N1' and N1) were seen for stimuli with VOTs of 40 ms and more for both continua. That is, the change in N1 morphology (from single to double-peaked) coincided with the change in perception from voiced to voiceless for stimuli from the /ba/-/pa/ continuum, but not for stimuli from the /ga/-/ka/ continuum. The results of this study show that N1 morphology does not reliably predict phonetic identification of stimuli varying in VOT. These findings also suggest that the previously reported appearance of a "double-peak" onset response in aggregate recordings from the auditory cortex does not indicate a cortical correlate of the perception of voicelessness.  相似文献   
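Entry 8 reports category boundaries in milliseconds of VOT. One common way to obtain such a value from identification data is to find the 50% crossover of the identification function by linear interpolation; the sketch below assumes that definition, which the abstract does not state. The function name `category_boundary` and the response proportions are made up for the example and are not data from the study.

```python
def category_boundary(vots_ms, prop_voiceless, criterion=0.5):
    """Interpolate the VOT at which 'voiceless' responses reach `criterion`.

    Assumes vots_ms is sorted in ascending order. Returns None if the
    identification function never crosses the criterion.
    """
    points = list(zip(vots_ms, prop_voiceless))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 < criterion <= y1:
            # Linear interpolation between the two bracketing stimuli.
            return x0 + (criterion - y0) * (x1 - x0) / (y1 - y0)
    return None

# Hypothetical identification data for a seven-step continuum.
vots = [0, 10, 20, 30, 40, 50, 60]
voiceless = [0.0, 0.0, 0.1, 0.3, 0.7, 1.0, 1.0]
boundary = category_boundary(vots, voiceless)  # about 35 ms for this made-up data
```

Studies often fit a logistic or probit function instead of interpolating; the interpolation here is only the simplest option.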

9.
Previous studies [Lisker, J. Acoust. Soc. Am. 57, 1547-1551 (1975); Summerfield and Haggard, J. Acoust. Soc. Am. 62, 435-448 (1977)] have shown that voice onset time (VOT) and the onset frequency of the first formant are important perceptual cues of voicing in syllable-initial plosives. Most prior work, however, has focused on speech perception in quiet environments. The present study seeks to determine which cues are important for the perception of voicing in syllable-initial plosives in the presence of noise. Perceptual experiments were conducted using stimuli consisting of naturally spoken consonant-vowel syllables by four talkers in various levels of additive white Gaussian noise. Plosives sharing the same place of articulation and vowel context (e.g., /pa,ba/) were presented to subjects in two-alternative forced-choice identification tasks, and a threshold signal-to-noise ratio (SNR) value (corresponding to the 79% correct classification score) was estimated for each voiced/voiceless pair. The threshold SNR values were then correlated with several acoustic measurements of the speech tokens. Results indicate that the onset frequency of the first formant is critical in perceiving voicing in syllable-initial plosives in additive white Gaussian noise, while the VOT duration is not.
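To make the noise manipulation in entry 9 concrete, here is a minimal sketch of mixing a signal with white Gaussian noise at a chosen SNR. It assumes SNR is defined as the ratio of mean signal power to mean noise power over the whole token; the abstract does not say how SNR was computed. The names `add_white_noise` and `measured_snr_db`, the fixed random seed, and the sine-wave stand-in for a speech token are all hypothetical.

```python
import math
import random

def mean_power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)

def add_white_noise(signal, snr_db, seed=0):
    """Return (noisy_signal, noise) with the noise scaled to the target SNR.

    The SNR definition (whole-token mean power ratio) is an assumption.
    """
    rng = random.Random(seed)
    raw = [rng.gauss(0.0, 1.0) for _ in signal]
    target_noise_power = mean_power(signal) / (10.0 ** (snr_db / 10.0))
    scale = math.sqrt(target_noise_power / mean_power(raw))
    noise = [scale * n for n in raw]
    noisy = [s + n for s, n in zip(signal, noise)]
    return noisy, noise

def measured_snr_db(signal, noise):
    """SNR in dB computed from the separate signal and noise sequences."""
    return 10.0 * math.log10(mean_power(signal) / mean_power(noise))

# Stand-in for a speech token: 50 ms of a 200-Hz sine sampled at 16 kHz.
token = [math.sin(2.0 * math.pi * 200.0 * i / 16000.0) for i in range(800)]
noisy_token, noise = add_white_noise(token, snr_db=10.0)
```

Scaling the realized noise, rather than only setting its nominal variance, makes the SNR of each token match the target to within floating-point error.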

10.
The noninvasive imaging of the monkey auditory system with functional magnetic resonance imaging (fMRI) can bridge the gap between electrophysiological studies in monkeys and imaging studies in humans. Some of the recent imaging of monkey auditory cortical and subcortical structures relies on a technique of “sparse imaging,” which was developed in human studies to sidestep the negative influence of scanner noise by adding periods of silence in between volume acquisition. Among the various aspects that have gone into the ongoing optimization of fMRI of the monkey auditory cortex, replacing the more common continuous-imaging paradigm with sparse imaging seemed to us to make the most obvious difference in the amount of activity that we could reliably obtain from awake or anesthetized animals. Here, we directly compare the sparse- and continuous-imaging paradigms in anesthetized animals. We document a strikingly greater auditory response with sparse imaging, both quantitatively and qualitatively, which includes a more expansive and robust tonotopic organization. There were instances where continuous imaging could better reveal organizational properties that sparse imaging missed, such as aspects of the hierarchical organization of auditory cortex. We consider the choice of imaging paradigm as a key component in optimizing the fMRI of the monkey auditory cortex.  相似文献   

11.
In contrast to the availability of consonant confusion studies with adults, to date, no investigators have compared children's consonant confusion patterns in noise to those of adults in a single study. To examine whether children's error patterns are similar to those of adults, three groups of children (24 each aged 4-5, 6-7, and 8-9 years) and 24 adult native speakers of American English (AE) performed a recognition task for 15 AE consonants in /ɑ/-consonant-/ɑ/ nonsense syllables presented in a background of speech-shaped noise. Three signal-to-noise ratios (SNR: 0, +5, and +10 dB) were used. Although performance improved as a function of age, the overall consonant recognition accuracy as a function of SNR improved at a similar rate for all groups. Detailed analyses using phonetic features (manner, place, and voicing) revealed that stop consonants were the most problematic for all groups. In addition, for the younger children, front consonants presented in the 0 dB SNR condition were more error prone than others. These results suggested that children's use of phonetic cues does not develop at the same rate for all phonetic features.

12.
The voice onset time (VOT) of a stop consonant is the interval between its burst onset and voicing onset. Among a variety of research topics on VOT, one that has been studied for years is how VOTs are efficiently measured. Manual annotation is a feasible way, but it becomes a time-consuming task when the corpus size is large. This paper proposes an automatic VOT estimation method based on an onset detection algorithm. At first, a forced alignment is applied to identify the locations of stop consonants. Then a random forest based onset detector searches each stop segment for its burst and voicing onsets to estimate a VOT. The proposed onset detection can detect the onsets in an efficient and accurate manner with only a small amount of training data. The evaluation data extracted from the TIMIT corpus were 2344 words with a word-initial stop. The experimental results showed that 83.4% of the estimations deviate less than 10 ms from their manually labeled values, and 96.5% of the estimations deviate by less than 20 ms. Some factors that influence the proposed estimation method, such as place of articulation, voicing of a stop consonant, and quality of succeeding vowel, were also investigated.  相似文献   
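The definition of VOT and the tolerance-based evaluation in entry 12 can be sketched as follows. This shows only the scoring step, not the paper's forced alignment or random-forest onset detector. The function names and the four example values are hypothetical; they are not TIMIT data.

```python
def vot_ms(burst_onset_ms, voicing_onset_ms):
    """VOT is the interval from burst onset to voicing onset."""
    return voicing_onset_ms - burst_onset_ms

def fraction_within(estimated_ms, manual_ms, tolerance_ms):
    """Fraction of estimates deviating by less than tolerance_ms from the manual labels."""
    if len(estimated_ms) != len(manual_ms) or not manual_ms:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(1 for e, m in zip(estimated_ms, manual_ms)
               if abs(e - m) < tolerance_ms)
    return hits / len(manual_ms)

# Made-up VOT values in milliseconds (automatic estimates vs. manual labels).
estimated = [12.0, 55.0, 80.0, 31.0]
manual = [10.0, 70.0, 75.0, 60.0]

acc_10 = fraction_within(estimated, manual, 10.0)   # deviations 2, 15, 5, 29 -> 2 of 4
acc_20 = fraction_within(estimated, manual, 20.0)   # 3 of 4
```

Applied to the 2344 evaluation words, this kind of score is what the reported 83.4% (10-ms tolerance) and 96.5% (20-ms tolerance) figures summarize.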

13.
The purpose of this report is to present new data that provide a novel perspective on temporal masking, different from that found in the classical auditory literature on this topic. Specifically, measurement conditions are presented that minimize rather than maximize temporal spread of masking for a gated (200-ms) narrow-band (405-Hz-wide) noise masker logarithmically centered at 2500 Hz. Masked detection thresholds were measured for brief sinusoids in a two-interval, forced-choice (2IFC) task. Detection was measured at each of 43 temporal positions within the signal observation interval for the sinusoidal signal presented either preceding, during, or following the gating of the masker, which was centered temporally within each 500-ms observation interval. Results are presented for three listeners; first, for detection of a 1900-Hz signal across a range of masker component levels (0-70 dB SPL) and, second, for masked detection as a function of signal frequency (fs = 500-5000 Hz) for a fixed masker component level (40 dB SPL). For signals presented off-frequency from the masker, and at low-to-moderate masker levels, the resulting temporal masking functions are characterized by sharp temporal edges. The sharpness of the edges is accentuated by complex patterns of temporal overshoot and undershoot, corresponding with diminished and enhanced detection, respectively, at both masker onset and offset. This information about the onset and offset timing of the gated masker is faithfully represented in the temporal masking functions over the full decade range of signal frequencies (except for fs = 2500 Hz presented at the center frequency of the masker). The precise representation of the timing information is remarkable considering that the temporal envelope characteristics of the gated masker are evident in the remote masking response at least two octaves below the frequencies of the masker at a cochlear place where little or no masker activity would be expected. This general enhancement of the temporal edges of the masking response is reminiscent of spectral edge enhancement by lateral suppression/inhibition.

14.
Confusion patterns among English consonants were examined using log-linear modeling techniques to assess the influence of low-pass filtering, shaped noise, presentation level, and consonant position. Ten normal-hearing listeners were presented consonant-vowel (CV) and vowel-consonant (VC) syllables containing the vowel /a/. Stimuli were presented in quiet and in noise, and were either filtered or broadband. The noise was shaped such that the effective signal level in each 1/3 octave band was equivalent in quiet and noise listening conditions. Three presentation levels were analyzed corresponding to the overall rms level of the combined speech stimuli. Error patterns were affected significantly by presentation level, filtering, and consonant position as a complex interaction. The effect of filtering was dependent on presentation level and consonant position. The effects stemming from the noise were less pronounced. Specific confusions responsible for these effects were isolated, and an acoustical interaction is suggested, stressing the spectral characteristics of the signals and their modification by presentation level and filtering.  相似文献   

15.
Five commonly used methods for determining the onset of voicing of syllable-initial stop consonants were compared. The speech and glottal activity of 16 native speakers of Cantonese with normal voice quality were investigated during the production of consonant vowel (CV) syllables in Cantonese. Syllables consisted of the initial consonants /ph/, /th/, /kh/, /p/, /t/, and /k/ followed by the vowel /a/. All syllables had a high level tone, and were all real words in Cantonese. Measurements of voicing onset were made based on the onset of periodicity in the acoustic waveform, and on spectrographic measures of the onset of a voicing bar (f0), the onset of the first formant (F1), second formant (F2), and third formant (F3). These measurements were then compared against the onset of glottal opening as determined by electroglottography. Both accuracy and variability of each measure were calculated. Results suggest that the presence of aspiration in a syllable decreased the accuracy and increased the variability of spectrogram-based measurements, but did not strongly affect measurements made from the acoustic waveform. Overall, the acoustic waveform provided the most accurate estimate of voicing onset; measurements made from the amplitude waveform were also the least variable of the five measures. These results can be explained as a consequence of differences in spectral tilt of the voicing source in breathy versus modal phonation.  相似文献   

16.
Experiments were conducted to determine the underlying resolving power of the auditory system for temporal changes at the onset of speech and nonspeech stimuli. Stimulus sets included a bilabial VOT continuum and an analogous nonspeech continuum similar to the "noise-buzz" stimuli used by Miller et al. [J. Acoust. Soc. Am. 60, 410-417 (1976)]. The main difference between these and earlier experiments was that efforts were made to minimize both the trial-to-trial stimulus uncertainty and the cognitive load inherent in some of the testing procedures. Under conditions of minimal psychophysical uncertainty, not only does discrimination performance improve overall, but the local maximum, usually interpreted as evidence of categorical perception, is eliminated. Instead, discrimination performance for voice onset time (VOT) or noise lead time (NLT) is very accurate for short onset times and generally decreases with increasing onset time. This result suggests that "categorization" of familiar sounds is not the result of a psychoacoustic threshold (as Miller et al. have suggested) but rather of processing at a more central level of the auditory system.  相似文献   

17.

Background  

Primary auditory cortex (AI) neurons show qualitatively distinct response features to successive acoustic signals depending on the inter-stimulus intervals (ISI). Such ISI-dependent AI responses are believed to underlie, at least partially, categorical perception of click trains (elemental vs. fused quality) and stop consonant-vowel syllables (e.g., the /da/-/ta/ continuum).

18.
SUMMARY: The present study investigated the effect of tonal changes on voice onset time (VOT) between normal laryngeal (NL) and superior esophageal (SE) speakers of Mandarin Chinese. VOT values were measured from the syllables /pha/, /tha/, and /kha/ produced at four tone levels by eight NL and seven SE speakers who were native speakers of Mandarin. Results indicated that Mandarin tones were associated with significantly different VOT values for NL speakers, in which high-falling tone was associated with significantly shorter VOT values than mid-rising tone and falling-rising tone. Regarding speaker group, SE speakers showed significantly shorter VOT values than NL speakers across all tone levels. This may be related to their use of pharyngoesophageal (PE) segment as another sound source. SE speakers appear to take a shorter time to start PE segment vibration compared to NL speakers using the vocal folds for vibration.  相似文献   

19.
On the role of spectral transition for speech perception   Cited by: 2 (self-citations: 0, citations by others: 2)
This paper examines the relationship between dynamic spectral features and the identification of Japanese syllables modified by initial and/or final truncation. The experiments confirm several main points. "Perceptual critical points," where the percent correct identification of the truncated syllable as a function of the truncation position changes abruptly, are related to maximum spectral transition positions. A speech wave of approximately 10 ms in duration that includes the maximum spectral transition position bears the most important information for consonant and syllable perception. Consonant and vowel identification scores simultaneously change as a function of the truncation position in the short period, including the 10-ms period for final truncation. This suggests that crucial information for both vowel and consonant identification is contained across the same initial part of each syllable. The spectral transition is more crucial than unvoiced and buzz bar periods for consonant (syllable) perception, although the latter features are of some perceptual importance. Also, vowel nuclei are not necessary for either vowel or syllable perception.  相似文献   

20.
On the basis of theoretical considerations and the results of experiments with synthetic consonant-vowel syllables, it has been hypothesized that the short-time spectrum sampled at the onset of a stop consonant should exhibit gross properties that uniquely specify the consonantal place of articulation independent of the following vowel. The aim of this paper is to test this hypothesis by measuring the spectrum sampled at the onsets and offsets of a large number of consonant-vowel (CV) and vowel-consonant (VC) syllables containing both voiced and voiceless stops produced by several speakers. Templates were devised in an attempt to capture three classes of spectral shapes: diffuse-rising, diffuse-falling, and compact, corresponding to alveolar, labial, and velar consonants, respectively. Spectra were derived from the utterances by sampling at the consonantal release of CV syllables and at the implosion and burst release of VC syllables, and these spectra (smoothed by a linear prediction algorithm) were matched against the templates. It was found that about 85% of the spectra at initial consonant release and at final burst release were correctly classified by the templates, although there was some variability across vowel contexts. The spectra sampled at the implosion were not consistently classified. A preliminary examination of spectra sampled at the release of nasal consonants in CV syllables showed a somewhat lower accuracy of classification by the same templates. Overall, the results support an hypothesis that, in natural speech, the acoustic characteristics of stop consonants, specified in terms of the gross spectral shape sampled at the discontinuity in the acoustic signal, show invariant properties independent of the adjacent vowel or of the voicing characteristics of the consonant. 
The implication is that the auditory system is endowed with detectors that are sensitive to these kinds of gross spectral shapes, and that the existence of these detectors helps the infant to organize the sounds of speech into their natural classes.  相似文献   


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号