Similar Documents
Found 20 similar documents.
1.
Responses of auditory-nerve fibers in anesthetized cats to nine different spoken stop- and nasal-consonant/vowel syllables presented at 70 dB SPL in various levels of speech-shaped noise [signal-to-noise (S/N) ratios of 30, 20, 10, and 0 dB] are reported. The temporal aspects of speech encoding were analyzed using spectrograms. The responses of the "lower-spontaneous-rate" fibers (less than 20/s) were found to be more limited than those of the high-spontaneous-rate fibers. The lower-spontaneous-rate fibers did not encode noise-only portions of the stimulus at the lowest noise level (S/N = 30 dB) and only responded to the consonant if there was a formant or major spectral peak near its characteristic frequency. The fibers' responses at the higher noise levels were compared to those obtained at the lowest noise level using the covariance as a quantitative measure of signal degradation. The lower-spontaneous-rate fibers were found to preserve more of their initial temporal encoding than high-spontaneous-rate fibers of the same characteristic frequency. The auditory-nerve fibers' responses were also analyzed for rate-place encoding of the stimuli. The results are similar to those found for temporal encoding.
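The covariance comparison described above can be sketched as follows. This is a minimal illustration, not the study's code: the PSTH arrays, their binning, and the toy responses are hypothetical.

```python
import numpy as np

def temporal_degradation(psth_clean, psth_noisy):
    """Covariance between a fiber's PSTH at the lowest noise level and at a
    higher noise level; smaller values indicate more degraded temporal
    encoding. (Hypothetical sketch; bin width and units are assumed.)"""
    a = psth_clean - psth_clean.mean()
    b = psth_noisy - psth_noisy.mean()
    return float(np.mean(a * b))

# Toy PSTHs: a phase-locked response, the same response half-buried in noise,
# and a flat (fully degraded) response.
t = np.linspace(0, 0.05, 500)
clean = 100 + 80 * np.sin(2 * np.pi * 200 * t)
degraded = 100 + 20 * np.sin(2 * np.pi * 200 * t)
flat = np.full_like(t, 100.0)
```

Under this measure, `temporal_degradation(clean, degraded)` exceeds `temporal_degradation(clean, flat)`, mirroring the idea that a response preserving its initial temporal pattern covaries more strongly with the low-noise reference.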

2.
Responses of "high-spontaneous" single auditory-nerve fibers in anesthetized cat to nine different spoken stop and nasal consonant-vowel syllables presented in four different levels of speech-shaped noise are reported. The temporal information contained in the responses was analyzed using "composite" spectrograms and pseudo-3D spatial-frequency plots. Spectral characteristics of both consonant and vowel segments of the CV syllables were strongly encoded at S/N ratios of 30 and 20 dB. At S/N = 10 dB, formant information during the vowel segments was all that was reliably detectable in most cases. Even at S/N = 0 dB, most vowel formants were detectable, but only with relatively long analysis windows (40 ms). The increases (and decreases) in discharge rate during various phases of the responses were also determined. The rate responses to the "release" and to the voicing of the stop-consonant syllables were quite robust, being detectable at least half of the time, even at the highest noise level. Comparisons with psychoacoustic studies using similar stimuli are made.  相似文献   

3.
This study complements earlier experiments on the perception of the [m]-[n] distinction in CV syllables [B. H. Repp, J. Acoust. Soc. Am. 79, 1987-1999 (1986); B. H. Repp, J. Acoust. Soc. Am. 82, 1525-1538 (1987)]. Six talkers produced VC syllables consisting of [m] or [n] preceded by [i, a, u]. In listening experiments, these syllables were truncated from the beginning and/or from the end, or waveform portions surrounding the point of closure were replaced with noise, so as to map out the distribution of the place of articulation information for consonant perception. These manipulations revealed that the vocalic formant transitions alone conveyed about as much place of articulation information as did the nasal murmur alone, and both signal portions were about as informative in VC as in CV syllables. Nevertheless, full VC syllables were less accurately identified than full CV syllables, especially in female speech. The reason for this was hypothesized to be the relative absence of a salient spectral change between the vowel and the murmur in VC syllables. This hypothesis was supported by the relative ineffectiveness of two additional manipulations meant to disrupt the perception of relational spectral information (channel separation or temporal separation of vowel and murmur) and by subjects' poor identification scores for brief excerpts including the point of maximal spectral change. While, in CV syllables, the abrupt spectral change from the murmur to the vowel provides important additional place of articulation information, for VC syllables it seems as if the formant transitions in the vowel and the murmur spectrum functioned as independent cues.

4.
The responses of single cat auditory-nerve fibers to naturally spoken voiced sounds (the vowels [a, i, u] and the murmur [m]) presented at normal intensity (70 dB SPL) in various levels of speech-shaped noise were analyzed for the encoding of the glottal-pulse (fundamental) period. To quantify the strength of this fundamental-period encoding, selected segments of the response histograms were autocorrelated, rectified, and fitted with the best-fitting sinusoid of the fundamental frequency. The magnitude of this best-fitting sinusoid was taken as the magnitude of synchronization. In most cases, it was found that the "lower-SR" fibers (those with spontaneous discharge rates less than 20/s) encoded the fundamental periodicity more strongly and more robustly than did the "high-SR" fibers (those with spontaneous discharge rates greater than 20/s). When either a single strong spectral peak or a relatively "flat" spectrum excited a fiber, it showed poor synchronization to the fundamental period, regardless of its spontaneous-rate class. Judging from a few examples, the glottal-pulse synchronization appears to be intensity dependent, with the relative performance of the high-SR fibers improving at lower intensities. A conceptual model is given which accounts for the general characteristics of the data.
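The synchronization measure described above (autocorrelate the histogram, rectify, fit a sinusoid at the fundamental) can be sketched as follows. The histogram length, sampling rate, and the least-squares form of the fit are assumptions, since the abstract does not specify them:

```python
import numpy as np

def f0_sync_magnitude(psth, f0, fs):
    """Strength of glottal-pulse-period encoding: autocorrelate the response
    histogram, rectify, and least-squares fit a sinusoid at the fundamental
    frequency f0; return the fitted sinusoid's magnitude."""
    x = psth - psth.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]   # autocorrelation, lags >= 0
    ac = np.clip(ac, 0.0, None)                         # rectify
    t = np.arange(len(ac)) / fs
    # best-fitting sinusoid at f0 via least squares on sin/cos basis
    X = np.column_stack([np.sin(2 * np.pi * f0 * t),
                         np.cos(2 * np.pi * f0 * t)])
    coef, *_ = np.linalg.lstsq(X, ac - ac.mean(), rcond=None)
    return float(np.hypot(coef[0], coef[1]))

fs = 10000
t = np.arange(1000) / fs
locked = 1.0 + np.sin(2 * np.pi * 100 * t)   # strongly synchronized toy PSTH
flat = np.ones(1000)                         # no periodicity
```

A periodic histogram yields a large magnitude at its fundamental, while a flat histogram yields essentially zero, which is the contrast the study's measure is designed to capture.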

5.
This study was designed to characterize the effect of background noise on the identification of syllables using behavioral and electrophysiological measures. Twenty normal-hearing adults (18-30 years) performed an identification task in a two-alternative forced-choice paradigm. Stimuli consisted of naturally produced syllables [da] and [ga] embedded in white noise. The noise was initiated 1000 ms before the onset of the speech stimuli in order to separate the auditory event related potentials (AERP) response to noise onset from that to the speech. Syllables were presented in quiet and in five SNRs: +15, +3, 0, -3, and -6 dB. Results show that (1) performance accuracy, d', and reaction time were affected by the noise, more so for reaction time; (2) both N1 and P3 latency were prolonged as noise levels increased, more so for P3; (3) [ga] was better identified than [da], in all noise conditions; and (4) P3 latency was longer for [da] than for [ga] for SNR 0 through -6 dB, while N1 latency was longer for [ga] than for [da] in most listening conditions. In conclusion, the unique stimuli structure utilized in this study demonstrated the effects of noise on speech recognition at both the physical and the perceptual processing levels.
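The sensitivity index d' mentioned above can be computed in the standard signal-detection form, z(hit rate) minus z(false-alarm rate). Whether the study used this exact yes/no formula rather than a 2AFC correction is an assumption; the clipping floor is also an illustrative choice:

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate, floor=0.01):
    """d' = z(H) - z(FA). Rates are clipped away from 0 and 1 so the
    inverse normal CDF stays finite (floor value is an assumption)."""
    z = NormalDist().inv_cdf
    h = min(max(hit_rate, floor), 1 - floor)
    fa = min(max(false_alarm_rate, floor), 1 - floor)
    return z(h) - z(fa)
```

For example, a hit rate of 0.84 against a false-alarm rate of 0.16 gives d' close to 2, while equal rates give d' of 0 (no sensitivity).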

6.
The present study investigated anticipatory labial coarticulation in the speech of adults and children. CV syllables, composed of [s], [t], and [d] before [i] and [u], were produced by four adult speakers and eight child speakers aged 3-7 years. Each stimulus was computer edited to include only the aperiodic portion of fricative-vowel and stop-vowel syllables. LPC spectra were then computed for each excised segment. Analyses of the effect of the following vowel on the spectral peak associated with the second formant frequency and on the characteristic spectral prominence for each consonant were performed. Perceptual data were obtained by presenting the aperiodic consonantal segments to subjects who were instructed to identify the following vowel as [i] or [u]. Both the acoustic and the perceptual data show strong coarticulatory effects for the adults and comparable, although less consistent, coarticulation in the speech stimuli of the children. The results are discussed in terms of the articulatory and perceptual aspects of coarticulation in language learning.

7.
Voice onset time (VOT) signifies the interval between consonant onset and the start of rhythmic vocal-cord vibrations. Differential perception of consonants such as /d/ and /t/ is categorical in American English, with the boundary generally lying at a VOT of 20-40 ms. This study tests whether previously identified response patterns that differentially reflect VOT are maintained in large-scale population activity within primary auditory cortex (A1) of the awake monkey. Multiunit activity and current source density patterns evoked by the syllables /da/ and /ta/ with variable VOTs are examined. Neural representation is determined by the tonotopic organization. Differential response patterns are restricted to lower best-frequency regions. Response peaks time-locked to both consonant and voicing onsets are observed for syllables with a 40- and 60-ms VOT, whereas syllables with a 0- and 20-ms VOT evoke a single response time-locked only to consonant onset. Duration of aspiration noise is represented in higher best-frequency regions. Representation of VOT and aspiration noise in discrete tonotopic areas of A1 suggests that integration of these phonetic cues occurs in secondary areas of auditory cortex. Findings are consistent with the evolving concept that complex stimuli are encoded by synchronized activity in large-scale neuronal ensembles.

8.
Natural speech consonant-vowel (CV) syllables ([f, θ, s, ʃ, v, ð, z, ʒ] followed by [i, u, a]) were computer edited to include 20-70 ms of their frication noise in 10-ms steps as measured from their onset, as well as the entire frication noise. These stimuli, and the entire syllables, were presented to 12 subjects for consonant identification. Results show that the listener does not require the entire fricative-vowel syllable in order to correctly perceive a fricative. The required frication duration depends on the particular fricative, ranging from approximately 30 ms for [s, z] to 50 ms for [f, ʃ, v], while [θ, ð] are identified with reasonable accuracy in only the full frication and syllable conditions. Analysis in terms of the linguistic features of voicing, place, and manner of articulation revealed that fricative identification in terms of place of articulation is much more affected by a decrease in frication duration than identification in terms of voicing and manner of articulation.

9.
This paper presents the results of a closed-set recognition task for 64 consonant-vowel sounds (16 C x 4 V, spoken by 18 talkers) in speech-weighted noise (-22, -20, -16, -10, -2 [dB]) and in quiet. The confusion matrices were generated using responses of a homogeneous set of ten listeners and the confusions were analyzed using a graphical method. In speech-weighted noise the consonants separate into three sets: a low-scoring set C1 (/f/, /θ/, /v/, /ð/, /b/, /m/), a high-scoring set C2 (/t/, /s/, /z/, /ʃ/, /ʒ/), and a set C3 (/n/, /p/, /g/, /k/, /d/) with intermediate scores. The perceptual consonant groups are C1: {/f/-/θ/, /b/-/v/-/ð/, /θ/-/ð/}, C2: {/s/-/z/, /ʃ/-/ʒ/}, and C3: /m/-/n/, while the perceptual vowel groups are /ɑ/-/æ/ and /ɛ/-/ɪ/. The exponential articulation index (AI) model for consonant score works for 12 of the 16 consonants, using a refined expression of the AI. Finally, a comparison with past work shows that white noise masks the consonants more uniformly than speech-weighted noise, and shows that the AI, because it can account for the differences in noise spectra, is a better measure than the wideband signal-to-noise ratio for modeling and comparing the scores with different noise maskers.
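The exponential AI model referred to above can be sketched as a log-linear interpolation between chance error at AI = 0 and a small residual error at AI = 1. The functional form is one commonly used in AI modeling of consonant scores, and the constants below are illustrative placeholders, not this paper's fitted values:

```python
def consonant_error(ai, e_chance=15 / 16, e_min=0.02):
    """Exponential AI model (assumed form): log error falls linearly with AI,
    from chance error (15/16 for a 16-alternative task) at AI = 0 to a small
    residual error floor at AI = 1. Constants are hypothetical."""
    return e_chance * (e_min / e_chance) ** ai
```

Plotted on a log-error axis, this model is a straight line in AI, which is why an exponential fit can summarize most consonants' scores with just two constants.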

10.
The classic [MN55] confusion matrix experiment (16 consonants, white noise masker) was repeated by using computerized procedures, similar to those of Phatak and Allen (2007) ["Consonant and vowel confusions in speech-weighted noise," J. Acoust. Soc. Am. 121, 2312-2316]. The consonant scores in white noise can be categorized in three sets: a low-error set [/m/, /n/], an average-error set [/p/, /t/, /k/, /s/, /ʃ/, /d/, /g/, /z/, /ʒ/], and a high-error set [/f/, /θ/, /b/, /v/, /ð/]. The consonant confusions match those from MN55, except for the highly asymmetric voicing confusions of fricatives, biased in favor of voiced consonants. Masking noise can not only reduce the recognition of a consonant, but also perceptually morph it into another consonant. There is a significant and systematic variability in the scores and confusion patterns of different utterances of the same consonant, which can be characterized as (a) confusion heterogeneity, where the competitors in the confusion groups of a consonant vary, and (b) threshold variability, where the confusion threshold [i.e., the signal-to-noise ratio (SNR) and score at which the confusion group is formed] varies. The average consonant error and the errors for most of the individual consonants and consonant sets can be approximated as exponential functions of the articulation index (AI). An AI that is based on the peak-to-rms ratios of speech can explain the SNR differences across experiments.

11.
This study assessed the acoustic and perceptual effect of noise on vowel and stop-consonant spectra. Multi-talker babble and speech-shaped noise were added to vowel and stop stimuli at -5 to +10 dB S/N, and the effect of noise was quantified in terms of (a) spectral envelope differences between the noisy and clean spectra in three frequency bands, (b) presence of reliable F1 and F2 information in noise, and (c) changes in burst frequency and slope. Acoustic analysis indicated that F1 was detected more reliably than F2 and the largest spectral envelope differences between the noisy and clean vowel spectra occurred in the mid-frequency band. This finding suggests that in extremely noisy conditions listeners must be relying on relatively accurate F1 frequency information along with partial F2 information to identify vowels. Stop consonant recognition remained high even at -5 dB despite the disruption of burst cues due to additive noise, suggesting that listeners must be relying on other cues, perhaps formant transitions, to identify stops.
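Adding noise to speech at a prescribed S/N ratio, as done for the stimuli above, amounts to scaling the noise so the speech-to-noise power ratio hits the target before mixing. A minimal sketch, with a hypothetical function name and a tone/noise example standing in for the real stimuli:

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech power / noise power equals the target
    SNR in dB, then add it to the speech."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + gain * noise

# example: mix a 440-Hz tone with flat noise at -5 dB S/N
rng = np.random.default_rng(1)
speech = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)
noise = rng.standard_normal(8000)
mixed = mix_at_snr(speech, noise, -5)
```

The same routine covers the whole -5 to +10 dB range simply by changing `snr_db`; speech-shaped or babble maskers would be passed in as `noise` in place of the white noise used here.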

12.
Two effects of reverberation on the identification of consonants were evaluated for ten normal-hearing subjects: (1) the overlap of energy of a preceding consonant on the following consonant, called "overlap-masking"; and (2) the internal temporal smearing of energy within each consonant, called "self-masking." The stimuli were eight consonants /p, t, k, f, m, n, l, w/. The consonants were spoken in /s-at/ context (experiment 1) and generated by a speech synthesizer in /s-at/ and /-at/ contexts (experiment 2). In both experiments, identification of consonants was tested in four conditions: (1) quiet, without degradations; (2) with a babble of voices; (3) with noise that was shaped like either natural or synthetic /s/ for the two experiments, respectively; and (4) with room reverberation. The results for the natural and synthetic syllables indicated that the effect of reverberation on identification of consonants following /s/ was not comparable to masking by either the /s/-spectrum-shaped noise or the babble. In addition, the results for the synthetic syllables indicated that most of the errors in reverberation for the /s-at/ context were similar to a sum of errors in two conditions: (1) with /s/-shaped noise causing overlap masking; and (2) with reverberation causing self-masking within each consonant.

13.
Confusion patterns among English consonants were examined using log-linear modeling techniques to assess the influence of low-pass filtering, shaped noise, presentation level, and consonant position. Ten normal-hearing listeners were presented consonant-vowel (CV) and vowel-consonant (VC) syllables containing the vowel /a/. Stimuli were presented in quiet and in noise, and were either filtered or broadband. The noise was shaped such that the effective signal level in each 1/3 octave band was equivalent in quiet and noise listening conditions. Three presentation levels were analyzed corresponding to the overall rms level of the combined speech stimuli. Error patterns were affected significantly by presentation level, filtering, and consonant position as a complex interaction. The effect of filtering was dependent on presentation level and consonant position. The effects stemming from the noise were less pronounced. Specific confusions responsible for these effects were isolated, and an acoustical interaction is suggested, stressing the spectral characteristics of the signals and their modification by presentation level and filtering.

14.
The contribution of the nasal murmur and vocalic formant transition to the perception of the [m]-[n] distinction by adult listeners was investigated for speakers of different ages in both consonant-vowel (CV) and vowel-consonant (VC) syllables. Three children in each of the speaker groups 3, 5, and 7 years old, and three adult females and three adult males produced CV and VC syllables consisting of either [m] or [n] and followed or preceded, respectively, by [i, æ, u, ɑ]. Two productions of each syllable were edited into seven murmur and transition segments. Across speaker groups, a segment including the last 25 ms of the murmur and the first 25 ms of the vowel yielded higher perceptual identification of place of articulation than any other segment edited from the CV syllable. In contrast, the corresponding vowel+murmur segment in the VC syllable position improved nasal identification relative to other segment types for only the adult talkers. Overall, the CV syllable was perceptually more distinctive than the VC syllable, but this distinctiveness interacted with speaker group and stimulus duration. As predicted by previous studies and the current results of perceptual testing, acoustic analyses of adult syllable productions showed systematic differences between labial and alveolar places of articulation, but these differences were only marginally observed in the youngest children's speech. Also as predicted by the current perceptual results, these acoustic properties differentiating place of articulation of nasal consonants were reliably different for CV syllables compared to VC syllables. A series of comparisons of perceptual data across speaker groups, segment types, and syllable shape provided strong support, in adult speakers, for the "discontinuity hypothesis" [K. N. Stevens, in Phonetic Linguistics: Essays in Honor of Peter Ladefoged, edited by V. A. Fromkin (Academic, London, 1985), pp. 243-255], according to which spectral discontinuities at acoustic boundaries provide critical cues to the perception of place of articulation. In child speakers, the perceptual support for the "discontinuity hypothesis" was weaker and the results were indicative of developmental changes in speech production.

15.
In contrast to the availability of consonant confusion studies with adults, to date, no investigators have compared children's consonant confusion patterns in noise to those of adults in a single study. To examine whether children's error patterns are similar to those of adults, three groups of children (24 each aged 4-5, 6-7, and 8-9 years) and 24 adult native speakers of American English (AE) performed a recognition task for 15 AE consonants in /ɑ/-consonant-/ɑ/ nonsense syllables presented in a background of speech-shaped noise. Three signal-to-noise ratios (SNR: 0, +5, and +10 dB) were used. Although performance improved as a function of age, the overall consonant recognition accuracy as a function of SNR improved at a similar rate for all groups. Detailed analyses using phonetic features (manner, place, and voicing) revealed that stop consonants were the most problematic for all groups. In addition, for the younger children, front consonants presented in the 0 dB SNR condition were more error prone than others. These results suggested that children's use of phonetic cues does not develop at the same rate for all phonetic features.

16.
Auditory-nerve fiber spike trains were recorded in response to spoken English stop consonant-vowel syllables, both voiced (/b,d,g/) and unvoiced (/p,t,k/), in the initial position of syllables with the vowels /i,a,u/. Temporal properties of the neural responses and stimulus spectra are displayed in a spectrographic format. The responses were categorized in terms of the fibers' characteristic frequencies (CF) and spontaneous rates (SR). High-CF, high-SR fibers generally synchronize to formants throughout the syllables. High-CF, low/medium-SR fibers may also synchronize to formants; however, during the voicing, there may be sufficient low-frequency energy present to suppress a fiber's synchronized response to a formant near its CF. Low-CF fibers, from both SR groups, synchronize to energy associated with voicing. Several proposed acoustic correlates to perceptual features of stop consonant-vowel syllables, including the initial spectrum, formant transitions, and voice-onset time, are represented in the temporal properties of auditory-nerve fiber responses. Nonlinear suppression affects the temporal features of the responses, particularly those of low/medium-spontaneous-rate fibers.

17.
Frequency resolution was evaluated for two normal-hearing and seven hearing-impaired subjects with moderate, flat sensorineural hearing loss by measuring percent correct detection of a 2000-Hz tone as the width of a notch in band-reject noise increased. The level of the tone was fixed for each subject at a criterion performance level in broadband noise. Discrimination of synthetic speech syllables that differed in spectral content in the 2000-Hz region was evaluated as a function of the notch width in the same band-reject noise. Recognition of natural speech consonant/vowel syllables in quiet was also tested; results were analyzed for percent correct performance and relative information transmitted for voicing and place features. In the hearing-impaired subjects, frequency resolution at 2000 Hz was significantly correlated with the discrimination of synthetic speech information in the 2000-Hz region and was not related to the recognition of natural speech nonsense syllables unless (a) the speech stimuli contained the vowel /i/ rather than /a/, and (b) the score reflected information transmitted for place of articulation rather than percent correct.

18.
Speech waveform envelope cues for consonant recognition
This study investigated the cues for consonant recognition that are available in the time-intensity envelope of speech. Twelve normal-hearing subjects listened to three sets of spectrally identical noise stimuli created by multiplying noise with the speech envelopes of 19 /aCa/ natural-speech nonsense syllables. The speech envelope for each of the three noise conditions was derived using a different low-pass filter cutoff (20, 200, and 2000 Hz). Average consonant identification performance was above chance for the three noise conditions and improved significantly with the increase in envelope bandwidth from 20 to 200 Hz. SINDSCAL multidimensional scaling analysis of the consonant confusion data identified three speech envelope features that divided the 19 consonants into four envelope feature groups ("envemes"). The enveme groups in combination with visually distinctive speech feature groupings ("visemes") can distinguish most of the 19 consonants. These results suggest that near-perfect consonant identification performance could be attained by subjects who receive only enveme and viseme information and no spectral information.
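The stimulus construction above (multiply noise by a low-pass-filtered speech envelope) can be sketched as follows. The filter order, the rectification step used for envelope extraction, and the toy signal are assumptions not stated in the abstract:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def envelope_noise(speech, fs, cutoff_hz, seed=0):
    """Spectrally flat noise that carries only the speech's time-intensity
    envelope: rectify the waveform, low-pass it at cutoff_hz, and multiply
    the smoothed envelope into white noise."""
    b, a = butter(4, cutoff_hz, btype="low", fs=fs)
    env = np.clip(filtfilt(b, a, np.abs(speech)), 0.0, None)
    noise = np.random.default_rng(seed).standard_normal(len(speech))
    return env * noise

# toy "syllable": 250 ms of silence followed by 250 ms of a 220-Hz tone
fs = 16000
speech = np.concatenate([np.zeros(4000),
                         np.sin(2 * np.pi * 220 * np.arange(4000) / fs)])
stim = envelope_noise(speech, fs, cutoff_hz=200)
```

Raising the cutoff from 20 to 200 to 2000 Hz preserves progressively faster envelope fluctuations, which is the bandwidth manipulation the study found to matter for identification.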

19.
Native American English and non-native (Dutch) listeners identified either the consonant or the vowel in all possible American English CV and VC syllables. The syllables were embedded in multispeaker babble at three signal-to-noise ratios (0, 8, and 16 dB). The phoneme identification performance of the non-native listeners was less accurate than that of the native listeners. All listeners were adversely affected by noise. With these isolated syllables, initial segments were harder to identify than final segments. Crucially, the effects of language background and noise did not interact; the performance asymmetry between the native and non-native groups was not significantly different across signal-to-noise ratios. It is concluded that the frequently reported disproportionate difficulty of non-native listening under disadvantageous conditions is not due to a disproportionate increase in phoneme misidentifications.

20.
Maturation of the traveling-wave delay in the human cochlea
The maturation of the traveling-wave delay in the human cochlea was investigated in 227 subjects ranging in age from 29 weeks conceptional age to 49 years by using frequency-specific auditory brain-stem responses (ABRs). The derived-response technique was applied to ABRs obtained with click stimuli (presented at a fixed level equal to 60-dB sensation level in normal-hearing adults) in the presence of high-pass masking noise (slope 96 dB/oct) to obtain frequency-specific responses from octave-wide bands. The estimate of traveling-wave delay was obtained by taking the difference between wave I latencies from adjacent derived bands. It was found that the traveling-wave delay between the octave band with center frequency (CF) of 11.3 kHz and that with CF of 5.7 kHz decreased (by about 0.4 ms on average) in exponential fashion with age to reach adult values at 3-6 months of age. This decrease was in agreement with reported data from kitten auditory-nerve fibers. The traveling-wave delays between adjacent octave bands with successively lower CFs did not change with age.
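The delay estimate and its exponential maturation can be sketched as follows. Only the roughly 0.4-ms decrease and the 3-6-month convergence are taken from the abstract; the adult delay value and the time constant below are illustrative assumptions:

```python
import numpy as np

def traveling_wave_delay(wave1_latency_low_cf_ms, wave1_latency_high_cf_ms):
    """Traveling-wave delay between adjacent derived octave bands: the
    difference of their ABR wave I latencies (in ms)."""
    return wave1_latency_low_cf_ms - wave1_latency_high_cf_ms

def delay_vs_age(age_months, adult_delay_ms=1.0, extra_ms=0.4, tau_months=1.5):
    """Exponential maturation toward the adult delay. adult_delay_ms and
    tau_months are hypothetical; extra_ms reflects the ~0.4-ms decrease
    reported in the abstract."""
    return adult_delay_ms + extra_ms * np.exp(-np.asarray(age_months, float) / tau_months)
```

With these placeholder constants the modeled delay is within about 0.01 ms of the adult value by 6 months of age, consistent with the reported 3-6-month convergence.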


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号