Similar Documents
20 similar documents retrieved.
1.
There exists no clear understanding of the importance of spectral tilt for perception of stop consonants. It is hypothesized that spectral tilt may be particularly salient when formant patterns are ambiguous or degraded. Here, it is demonstrated that relative change in spectral tilt over time, not absolute tilt, significantly influences perception of /b/ vs /d/. Experiments consisted of burstless synthesized stimuli that varied in spectral tilt and onset frequency of the second formant. In Experiment 1, tilt of the consonant at voice onset was varied. In Experiment 2, tilt of the vowel steady state was varied. Results of these experiments were complementary and revealed a significant contribution of relative spectral tilt change only when formant information was ambiguous. Experiments 3 and 4 replicated Experiments 1 and 2 in an /aba/-/ada/ context. The additional tilt contrast provided by the initial vowel modestly enhanced effects. In Experiment 5, there was no effect for absolute tilt when consonant and vowel tilts were identical. Consistent with earlier studies demonstrating contrast between successive local spectral features, perceptual effects of gross spectral characteristics are likewise relative. These findings have implications for perception in nonlaboratory environments and for listeners with hearing impairment.
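Since the stimulus manipulations center on spectral tilt, a concrete measure of the quantity helps fix ideas. The sketch below, which assumes NumPy and is not the authors' analysis procedure, estimates the spectral tilt of a short windowed segment as the slope of a straight line fit to its log-magnitude spectrum, expressed in dB per octave; the 100 Hz-8 kHz analysis band is an illustrative choice.

```python
# Illustrative sketch only: spectral tilt as the dB-per-octave slope of a
# regression line fit to the log-magnitude spectrum of a windowed segment.
import numpy as np

def spectral_tilt_db_per_octave(segment, fs, fmin=100.0, fmax=8000.0):
    """Fit a line to level (dB) vs. log2 frequency and return its slope."""
    segment = np.asarray(segment, dtype=float)
    windowed = segment * np.hanning(len(segment))
    magnitude = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / fs)
    keep = (freqs >= fmin) & (freqs <= fmax)          # restrict to the analysis band
    level_db = 20.0 * np.log10(magnitude[keep] + 1e-12)
    slope, _ = np.polyfit(np.log2(freqs[keep]), level_db, 1)  # dB per octave
    return slope
```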

2.
The phonetic identification ability of an individual (SS) who exhibits the best, or equal to the best, speech understanding of patients using the Symbion four-channel cochlear implant is described. It has been found that SS: (1) can use aspects of signal duration to form categories that are isomorphic with the phonetic categories established by listeners with normal auditory function; (2) can combine temporal and spectral cues in a normal fashion to form categories; (3) can use aspects of fricative noises to form categories that correspond to normal phonetic categories; (4) uses information from both F1 and higher formants in vowel identification; and (5) appears to identify stop consonant place of articulation on the basis of information provided by the center frequency of the burst and by the abruptness of frequency change following signal onset. SS has difficulty identifying stop consonants from the information provided by formant transitions and cannot differentially identify signals that have identical F1's and relatively low-frequency F2's. SS's performance suggests that simple speech processing strategies (filtering of the signal into four bands) and monopolar electrode design are viable options in the design of cochlear prostheses.

3.
This study presents various acoustic measures used to examine the sequence /a # C/, where "#" represents different prosodic boundaries in French. The 6 consonants studied are /b d g f s ʃ/ (3 stops and 3 fricatives). The prosodic units investigated are the utterance, the intonational phrase, the accentual phrase, and the word. It is found that vowel target values, formant transitions into the stop consonant, and the rate of change in spectral tilt into the fricative, are affected by the strength of the prosodic boundary. F1 becomes higher for /a/ the stronger the prosodic boundary, with the exception of one speaker's utterance data, which show the effects of articulatory declension at the utterance level. Various effects of the stop consonant context are observed, the most notable being a tendency for the vowel /a/ to be displaced in the direction of the F2 consonant "locus" for /d/ (the F2 consonant values for which remain relatively stable across prosodic boundaries) and for /g/ (the F2 consonant values for which are displaced in the direction of the velar locus in weaker prosodic boundaries, together with those of the vowel). Velocity of formant transition may be affected by prosodic boundary (with greater velocity at weaker boundaries), though results are not consistent across speakers. There is also a tendency for the rate of change in spectral tilt moving from the vowel to the fricative to be affected by the presence of a prosodic boundary, with a greater rate of change at the weaker prosodic boundaries. It is suggested that spectral cues, in addition to duration, amplitude, and F0 cues, may alert listeners to the presence of a prosodic boundary.

4.
5.
This study reassessed the role of the nasal murmur and formant transitions as perceptual cues for place of articulation in nasal consonants across a number of vowel environments. Five types of computer-edited stimuli were generated from natural utterances consisting of [m n] followed by [i e a o u]: (1) full murmurs; (2) transitions plus vowel segments; (3) the last six pulses of the murmur; (4) the six pulses starting from the beginning of the formant transitions; and (5) the six pulses surrounding the nasal release (three pulses before and three pulses after). Results showed that the murmur provided as much information for the perception of place of articulation as did the transitions. Moreover, the highest performance scores for place of articulation were obtained in the six-pulse condition containing both murmur and transition information. The data support the view that it is the combination of nasal murmur plus formant transitions which forms an integrated property for the perception of place of articulation.

6.
Five commonly used methods for determining the onset of voicing of syllable-initial stop consonants were compared. The speech and glottal activity of 16 native speakers of Cantonese with normal voice quality were investigated during the production of consonant-vowel (CV) syllables in Cantonese. Syllables consisted of the initial consonants /ph/, /th/, /kh/, /p/, /t/, and /k/ followed by the vowel /a/. All syllables had a high level tone, and were all real words in Cantonese. Measurements of voicing onset were made based on the onset of periodicity in the acoustic waveform, and on spectrographic measures of the onset of a voicing bar (f0), the onset of the first formant (F1), second formant (F2), and third formant (F3). These measurements were then compared against the onset of glottal opening as determined by electroglottography. Both accuracy and variability of each measure were calculated. Results suggest that the presence of aspiration in a syllable decreased the accuracy and increased the variability of spectrogram-based measurements, but did not strongly affect measurements made from the acoustic waveform. Overall, the acoustic waveform provided the most accurate estimate of voicing onset; measurements made from the amplitude waveform were also the least variable of the five measures. These results can be explained as a consequence of differences in spectral tilt of the voicing source in breathy versus modal phonation.
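One of the five measures, the onset of periodicity in the acoustic waveform, is straightforward to approximate in software. The following sketch, which assumes NumPy and is not the measurement procedure used in the study, marks voicing onset at the first analysis frame whose normalized autocorrelation shows a strong peak within a plausible F0 lag range; the frame length, hop size, F0 range, and the 0.3 threshold are illustrative assumptions.

```python
# Illustrative sketch only: locate the onset of periodicity by scanning frames
# for a strong normalized autocorrelation peak in the voicing-period lag range.
import numpy as np

def voicing_onset_time(x, fs, frame_ms=25.0, hop_ms=5.0,
                       f0_range=(70.0, 400.0), threshold=0.3):
    """Return the time (s) of the first periodic frame, or None if none found."""
    frame = int(fs * frame_ms / 1000.0)
    hop = int(fs * hop_ms / 1000.0)
    lag_lo = int(fs / f0_range[1])            # shortest voicing period considered
    lag_hi = int(fs / f0_range[0])            # longest voicing period considered
    for start in range(0, len(x) - frame, hop):
        seg = np.asarray(x[start:start + frame], dtype=float)
        seg = seg - seg.mean()
        energy = np.dot(seg, seg)
        if energy < 1e-12:                    # skip silent frames
            continue
        ac = np.correlate(seg, seg, mode="full")[frame - 1:] / energy
        if ac[lag_lo:lag_hi].max() > threshold:
            return start / fs                 # onset of periodicity, in seconds
    return None
```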

7.
According to recent theoretical accounts of place of articulation perception, global, invariant properties of the stop CV syllable onset spectrum serve as primary, innate cues to place of articulation, whereas contextually variable formant transitions constitute secondary, learned cues. By this view, one might expect that young infants would find the discrimination of place of articulation contrasts signaled by formant transition differences more difficult than those cued by gross spectral differences. Using an operant head-turning paradigm, we found that 6-month-old infants were able to discriminate two-formant stimuli contrasting in place of articulation as well as they did five-formant + burst stimuli. Apparently, neither the global properties of the onset spectrum nor simply the additional acoustic information contained in the five-formant + burst stimuli afford the infant any advantage in the discrimination task. Rather, formant transition information provides a sufficient basis for discriminating place of articulation differences.

8.
A digital processing method is described for altering spectral contrast (the difference in amplitude between spectral peaks and valleys) in natural utterances. Speech processed with programs implementing the contrast alteration procedure was presented to listeners with moderate to severe sensorineural hearing loss. The task was a three-alternative (/b/, /d/, or /g/) stop consonant identification task for consonants at a fixed location in short nonsense utterances. Overall, tokens with enhanced contrast showed moderate gains in percentage correct stop consonant identification when compared to unaltered tokens. Conversely, reducing spectral contrast generally reduced percent correct stop consonant identification. Contrast alteration effects were inconsistent for utterances containing /d/. The observed contrast effects also interacted with token intelligibility.
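The abstract does not spell out the processing, but the general idea of expanding or compressing peak-to-valley differences can be illustrated. The sketch below, assuming NumPy and SciPy and not reproducing the authors' method, splits each short-time log-magnitude spectrum into a smooth envelope and a peak/valley residual, scales the residual to raise or lower spectral contrast, and resynthesizes with the original phase; the frame length, smoothing width, and scale factor are illustrative.

```python
# Illustrative sketch only: scale the peak/valley structure of each short-time
# log-magnitude spectrum by `contrast` (>1 enhances, <1 reduces) and resynthesize.
import numpy as np
from scipy.signal import stft, istft
from scipy.ndimage import uniform_filter1d

def alter_spectral_contrast(x, fs, contrast=1.5, nperseg=512, smooth_bins=9):
    f, t, Z = stft(x, fs, nperseg=nperseg)
    log_mag = np.log(np.abs(Z) + 1e-10)
    envelope = uniform_filter1d(log_mag, size=smooth_bins, axis=0)  # smooth across frequency
    residual = log_mag - envelope                # spectral peaks (+) and valleys (-)
    new_mag = np.exp(envelope + contrast * residual)
    _, y = istft(new_mag * np.exp(1j * np.angle(Z)), fs, nperseg=nperseg)
    return y[:len(x)]
```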

9.
Previous research with speechlike signals has suggested that upward spread of masking from the first formant (F1) may interfere with the identification of place of articulation information signaled by changes in the upper formants. This suggestion was tested by presenting two-formant stop consonant-vowel syllables varying along a /ba/-/da/-/ga/ continuum to hearing-impaired listeners grouped according to etiological basis of the disorder. The syllables were presented monaurally at 80 dB and 100 dB SPL when formant amplitudes were equal and when F1 amplitude was reduced by 6, 12, and 18 dB. Noise-on-tone masking patterns were also generated using narrow bands of noise at 80 and 100 dB SPL to assess the extent of upward spread of masking. Upward spread of masking could be demonstrated in both speech and nonspeech tasks, irrespective of the subject's age, audiometric configuration, or etiology of hearing impairment. Attenuation of F1 had different effects on phonetic identification in different subject groups: While listeners with noise-induced hearing loss showed substantial improvement in identifying place of articulation, upward spread of masking did not consistently account for poor place identification in other types of sensorineural hearing impairment.

10.
In VCV nonsense forms (such as /ɛdɛ/), while both the CV transition and the VC transition are perceptible in isolation, the CV transition dominates identification of the stop consonant. Thus, the question arises, what role, if any, do VC transitions play in word perception? Stimuli were two-syllable English words in which the medial consonant was either a stop or a fricative (e.g., "feeding" and "gravy"). Each word was constructed in three ways: (1) the VC transition was incompatible with the CV in either place, manner of articulation, or both; (2) the VC transition was eliminated and the steady-state portion of the first vowel was substituted in its place; and (3) the original word. All versions of a particular word were identical with respect to duration, pitch contour, and amplitude envelope. While an intelligibility test revealed no differences among the three conditions, data from a paired comparison preference task and an unspeeded lexical decision task indicated that incompatible VC transitions hindered word perception, but lack of VC transitions did not. However, there were clear differences among the three conditions in the speeded lexical decision task for word stimuli, but not for nonword stimuli that were constructed in an analogous fashion. We discuss the use of lexical tasks for speech quality assessment and possible processes by which listeners recognize spoken words.

11.
Vowel and consonant confusion matrices were collected in the hearing alone (H), lipreading alone (L), and hearing plus lipreading (HL) conditions for 28 patients participating in the clinical trial of the multiple-channel cochlear implant. All patients were profound-to-totally deaf and "hearing" refers to the presentation of auditory information via the implant. The average scores were 49% for vowels and 37% for consonants in the H condition and the HL scores were significantly higher than the L scores. Information transmission and multidimensional scaling analyses showed that different speech features were conveyed at different levels in the H and L conditions. In the HL condition, the visual and auditory signals provided independent information sources for each feature. For vowels, the auditory signal was the major source of duration information, while the visual signal was the major source of first and second formant frequency information. The implant provided information about the amplitude envelope of the speech and the estimated frequency of the main spectral peak between 800 and 4000 Hz, which was useful for consonant recognition. A speech processor that coded the estimated frequency and amplitude of an additional peak between 300 and 1000 Hz was shown to increase the vowel and consonant recognition in the H condition by improving the transmission of first formant and voicing information.

12.
The addition of low-passed (LP) speech or even a tone following the fundamental frequency (F0) of speech has been shown to benefit speech recognition for cochlear implant (CI) users with residual acoustic hearing. The mechanisms underlying this benefit are still unclear. In this study, eight bimodal subjects (CI users with acoustic hearing in the non-implanted ear) and eight simulated bimodal subjects (using vocoded and LP speech) were tested on vowel and consonant recognition to determine the relative contributions of acoustic and phonetic cues, including F0, to the bimodal benefit. Several listening conditions were tested (CI/Vocoder, LP, T(F0-env), CI/Vocoder + LP, CI/Vocoder + T(F0-env)). Compared with CI/Vocoder performance, LP significantly enhanced both consonant and vowel perception, whereas a tone following the F0 contour of target speech and modulated with an amplitude envelope of the maximum frequency of the F0 contour (T(F0-env)) enhanced only consonant perception. Information transfer analysis revealed a dual mechanism in the bimodal benefit: The tone representing F0 provided voicing and manner information, whereas LP provided additional manner, place, and vowel formant information. The data in actual bimodal subjects also showed that the degree of the bimodal benefit depended on the cutoff and slope of residual acoustic hearing.
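As a rough illustration of the T(F0-env) condition, the sketch below synthesizes a sinusoid that tracks a previously extracted F0 contour and is amplitude-modulated by a separately supplied envelope. It assumes NumPy, a per-sample F0 contour with zeros marking unvoiced regions, and an envelope array of matching length; these interfaces are illustrative, not the study's actual processing chain.

```python
# Illustrative sketch only: a tone that follows a given F0 contour and is
# amplitude-modulated by a given envelope (both assumed precomputed, per sample).
import numpy as np

def f0_following_tone(f0_hz, amp_env, fs):
    f0_hz = np.asarray(f0_hz, dtype=float)
    phase = 2.0 * np.pi * np.cumsum(f0_hz) / fs   # integrate instantaneous frequency
    tone = np.sin(phase) * np.asarray(amp_env, dtype=float)
    tone[f0_hz <= 0.0] = 0.0                      # silence the unvoiced regions
    return tone
```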

13.
This study investigated whether any perceptually useful coarticulatory information is carried by the release burst of the first of two successive, nonhomorganic stop consonants. The CV portions of natural VCCV utterances were replaced with matched synthetic stimuli from a continuum spanning the three places of stop articulation. There was a sizable effect of coarticulatory cues in the natural-speech portion on the perception of the second stop consonant. Moreover, when the natural VC portions including the final release burst were presented in isolation, listeners were significantly better than chance at guessing the identity of the following, "missing" syllable-initial stop. The hypothesis that the release burst of a syllable-final stop contains significant coarticulatory information about the place of articulation of a following, nonhomorganic stop was further confirmed in acoustic analyses which revealed significant effects of CV context on the spectral properties of the release bursts. The relationship between acoustic stimulus properties and listeners' perceptual responses was not straightforward, however.

14.
Effect of spectral envelope smearing on speech reception. I.
The effect of reduced spectral contrast on the speech-reception threshold (SRT) for sentences in noise and on phoneme identification was investigated with 16 normal-hearing subjects. Signal processing was performed by smoothing the envelope of the squared short-time fast Fourier transform (FFT) by convolving it with a Gaussian-shaped filter, and by using overlapping additions to reconstruct a continuous signal. Spectral energy in the frequency region from 100 to 8000 Hz was smeared over bandwidths of 1/8, 1/4, 1/3, 1/2, 1, 2, and 4 oct for the SRT experiment. Vowel and consonant identification was studied for smearing bandwidths of 1/8, 1/2, and 2 oct. Results showed the SRT in noise to increase as the spectral energy was smeared over bandwidths exceeding the ear's critical bandwidth. Vowel identification suffered more from this type of processing than consonant identification. Vowels were primarily confused with the back vowels /ɔ, u/, and consonants were confused mainly with respect to place of articulation.
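A rough sketch of this kind of spectral envelope smearing is shown below. It assumes NumPy and SciPy, performs the Gaussian smoothing on a log2-frequency axis so that the smearing bandwidth is expressed in octaves, and resynthesizes by overlap-add with the original phase; the frame length, the 100 Hz lower edge of the log-frequency grid, and the FWHM interpretation of the bandwidth are illustrative assumptions rather than the authors' exact implementation.

```python
# Illustrative sketch only: smear the short-time power spectrum over roughly
# `bandwidth_oct` octaves by Gaussian smoothing on a log2-frequency axis.
import numpy as np
from scipy.signal import stft, istft

def smear_spectral_envelope(x, fs, bandwidth_oct=0.5, nperseg=512):
    f, t, Z = stft(x, fs, nperseg=nperseg)
    power = np.abs(Z) ** 2
    log_f_bins = np.log2(np.maximum(f, 1.0))            # log2 frequency of each FFT bin
    log_grid = np.linspace(np.log2(100.0), np.log2(fs / 2.0), 512)
    sigma = bandwidth_oct / 2.355                        # treat bandwidth as a FWHM
    kernel = np.exp(-0.5 * ((log_grid - log_grid[len(log_grid) // 2]) / sigma) ** 2)
    kernel /= kernel.sum()
    smeared = np.empty_like(power)
    for i in range(power.shape[1]):                      # smooth each time frame
        p_log = np.interp(log_grid, log_f_bins, power[:, i])
        p_log = np.convolve(p_log, kernel, mode="same")
        smeared[:, i] = np.interp(log_f_bins, log_grid, p_log)
    _, y = istft(np.sqrt(smeared) * np.exp(1j * np.angle(Z)), fs, nperseg=nperseg)
    return y[:len(x)]
```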

15.
An important problem in speech perception is to determine how humans extract the perceptually invariant place of articulation information in the speech wave across variable acoustic contexts. Although analyses have been developed that attempted to classify the voiced stops /b/ versus /d/ from stimulus onset information, most of the human perceptual research to date suggests that formant transition information is more important than onset information. The purpose of the present study was to determine if animal subjects, specifically Japanese macaque monkeys, are capable of categorizing /b/ versus /d/ in synthesized consonant-vowel (CV) syllables using only formant transition information. Three monkeys were trained to differentiate CV syllables with a "go-left" versus a "go-right" label. All monkeys first learned to differentiate a /za/ versus /da/ manner contrast and easily transferred to three new vowel contexts /[symbol: see text], ɛ, ɪ/. Next, two of the three monkeys learned to differentiate a /ba/ versus /da/ stop place contrast, but were unable to transfer it to the different vowel contexts. These results suggest that animals may not use the same mechanisms as humans do for classifying place contrasts, and call for further investigation of animal perception of formant transition information versus stimulus onset information in place contrasts.

16.
The purpose of this study was to determine the role of static, dynamic, and integrated cues for perception in three adult age groups, and to determine whether age has an effect on both consonant and vowel perception, as predicted by the "age-related deficit hypothesis." Eight adult subjects in each of the age ranges of young (ages 20-26), middle-aged (ages 52-59), and old (ages 70-76) listened to synthesized syllables composed of combinations of [b d g] and [i u a]. The synthesis parameters included manipulations of the following stimulus variables: formant transition (moving or straight), noise burst (present or absent), and voicing duration (10, 30, or 46 ms). Vowel perception was high across all conditions and there were no significant differences among age groups. Consonant identification showed a definite effect of age. Young and middle-aged adults were significantly better than older adults at identifying consonants from secondary cues only. Older adults relied on the integration of static and dynamic cues to a greater extent than younger and middle-aged listeners for identification of place of articulation of stop consonants. Duration facilitated correct stop-consonant identification in the young and middle-aged groups for the no-burst conditions, but not in the old group. These findings for the duration of stop-consonant transitions indicate reductions in processing speed with age. In general, the results did not support the age-related deficit hypothesis for adult identification of vowels and consonants from dynamic spectral cues.

17.
This study focuses on the initial component of the stop consonant release burst, the release transient. In theory, the transient, because of its impulselike source, should contain much information about the vocal tract configuration at release, but it is usually weak in intensity and difficult to isolate from the accompanying frication in natural speech. For this investigation, a human talker produced isolated release transients of /b,d,g/ in nine vocalic contexts by whispering these syllables very quietly. He also produced the corresponding CV syllables with regular phonation for comparison. Spectral analyses showed the isolated transients to have a clearly defined formant structure, which was not seen in natural release bursts, whose spectra were dominated by the frication noise. The formant frequencies varied systematically with both consonant place of articulation and vocalic context. Perceptual experiments showed that listeners can identify both consonants and vowels from isolated transients, though not very accurately. Knowing one of the two segments in advance did not help, but when the transients were followed by a compatible synthetic, steady-state vowel, consonant identification improved somewhat. On the whole, isolated transients, despite their clear formant structure, provided only partial information for consonant identification, but no less so, it seems, than excerpted natural release bursts. The information conveyed by artificially isolated transients and by natural (frication-dominated) release bursts appears to be perceptually equivalent.

18.
Responses of chinchilla auditory nerve fibers to synthesized stop consonant syllables differing in voice-onset time (VOT) were obtained. The syllables, heard as /ga/-/ka/ or /da/-/ta/, were similar to those previously used by others in psychophysical experiments with human and chinchilla subjects. Synchronized discharge rates of neurons tuned to frequencies near the first formant increased at the onset of voicing for VOTs longer than 20 ms. Stimulus components near the formant or the neuron's characteristic frequency accounted for the increase. In these neurons, synchronized response changes were closely related to the same neuron's average discharge rates [D. G. Sinex and L. P. McDonald, J. Acoust. Soc. Am. 83, 1817-1827 (1988)]. Neurons tuned to frequency regions near the second and third formants usually responded to components near the second formant prior to the onset of voicing. These neurons' synchronized discharges could be captured by the first formant at the onset of voicing or with a latency of 50-60 ms, whichever was later. Since these neurons' average rate responses were unaffected by the onset of voicing, the latency of the synchronized response did provide an additional neural cue to VOT. Overall, however, discharge synchrony did not provide as much information about VOT as was provided by the best average rate responses. The results are compared to other measurements of the peripheral encoding of speech sounds and to aspects of VOT perception.

19.
Responses of chinchilla auditory-nerve fibers to synthesized stop consonants differing in voice onset time (VOT) were obtained. The syllables, heard as /ga/-/ka/ or /da/-/ta/, were similar to those previously used by others in psychophysical experiments with human and with chinchilla subjects. Average discharge rates of neurons tuned to the frequency region near the first formant generally increased at the onset of voicing, for VOTs longer than 20 ms. These rate increases were closely related to spectral amplitude changes associated with the onset of voicing and with the activation of the first formant; as a result, they provided accurate information about VOT. Neurons tuned to frequency regions near the second and third formants did not encode VOT in their average discharge rates. Modulations in the average rates of these neurons reflected spectral variations that were independent of VOT. The results are compared to other measurements of the peripheral encoding of speech sounds and to psychophysical observations suggesting that syllables with large variations in VOT are heard as belonging to one of only two phonemic categories.

20.
The third formant and the second formant were found on average to cue the place of articulation of intervocalic stop consonants equally well when the stop consonants occurred before the vowel /i/. This result and others provide some support for the notion that the fundamental resonance of the front cavity plays an important role in the perception of the phonetic dimension of place of articulation.

