Similar Documents
20 similar documents retrieved.
1.
There exists no clear understanding of the importance of spectral tilt for perception of stop consonants. It is hypothesized that spectral tilt may be particularly salient when formant patterns are ambiguous or degraded. Here, it is demonstrated that relative change in spectral tilt over time, not absolute tilt, significantly influences perception of /b/ vs /d/. Experiments consisted of burstless synthesized stimuli that varied in spectral tilt and onset frequency of the second formant. In Experiment 1, tilt of the consonant at voice onset was varied. In Experiment 2, tilt of the vowel steady state was varied. Results of these experiments were complementary and revealed a significant contribution of relative spectral tilt change only when formant information was ambiguous. Experiments 3 and 4 replicated Experiments 1 and 2 in an /aba/-/ada/ context. The additional tilt contrast provided by the initial vowel modestly enhanced effects. In Experiment 5, there was no effect for absolute tilt when consonant and vowel tilts were identical. Consistent with earlier studies demonstrating contrast between successive local spectral features, perceptual effects of gross spectral characteristics are likewise relative. These findings have implications for perception in nonlaboratory environments and for listeners with hearing impairment.
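Spectral tilt, as used in the abstract above, is the overall slope of the spectrum. A minimal sketch of how such a measure could be computed — the `spectral_tilt` helper, the 100–5000 Hz band, and the windowing are illustrative assumptions, not the study's stimulus-generation method:

```python
import numpy as np

def spectral_tilt(signal, sample_rate):
    """Estimate spectral tilt in dB/octave as the slope of a line fit to
    the log-magnitude spectrum against log2 frequency (assumed measure)."""
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    freqs = np.fft.rfftfreq(len(signal), 1.0 / sample_rate)
    keep = (freqs >= 100) & (freqs <= 5000)        # speech-relevant band
    octaves = np.log2(freqs[keep])
    mag_db = 20 * np.log10(spectrum[keep] + 1e-12)
    slope, _intercept = np.polyfit(octaves, mag_db, 1)
    return slope

# A harmonic complex whose component amplitudes fall as 1/f should show
# a clearly negative tilt (its peaks fall roughly 6 dB per octave).
sr = 16000
t = np.arange(2048) / sr
falling = sum((1.0 / k) * np.sin(2 * np.pi * 200 * k * t) for k in range(1, 20))
print(spectral_tilt(falling, sr) < 0)
```

Comparing two stimuli then reduces to comparing their slopes; a "relative tilt change" in the study's sense would be the difference in this slope between consonant onset and vowel steady state.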

2.
Among the most influential publications in speech perception is Liberman, Delattre, and Cooper's [Am. J. Psychol. 65, 497-516 (1952)] report on the identification of synthetic, voiceless stops generated by the Pattern Playback. Their map of stop consonant identification shows a highly complex relationship between acoustics and perception. This complex mapping poses a challenge to many classes of relatively simple pattern recognition models which are unable to capture the original finding of Liberman et al. that identification of /k/ was bimodal for bursts preceding front vowels but otherwise unimodal. A replication of this experiment was conducted in an attempt to reproduce these identification patterns using a simulation of the Pattern Playback device. Examination of spectrographic data from stimuli generated by the Pattern Playback revealed additional spectral peaks that are consistent with harmonic distortion characteristic of tube amplifiers of that era. Only when harmonic distortion was introduced did bimodal /k/ responses in front-vowel context emerge. The acoustic consequence of this distortion is to add, e.g., a high-frequency peak to midfrequency bursts or a midfrequency peak to a low-frequency burst. This likely resulted in additional /k/ responses when the second peak approximated the second formant of front vowels. Although these results do not challenge the main observations made by Liberman et al. that perception of stop bursts is context dependent, they do show that the mapping from acoustics to perception is much less complex without these additional distortion products.

3.
We have examined the effects of the relative amplitude of the release burst on perception of the place of articulation of utterance-initial voiceless and voiced stop consonants. The amplitude of the burst, which occurs within the first 10-15 ms following consonant release, was systematically varied in 5-dB steps from -10 to +10 dB relative to a "normal" burst amplitude for two labial-to-alveolar synthetic speech continua--one comprising voiceless stops and the other, voiced stops. The distribution of spectral energy in the bursts for the labial and alveolar stops at the ends of the continuum was consistent with the spectrum shapes observed in natural utterances, and intermediate shapes were used for intermediate stimuli on the continuum. The results of identification tests with these stimuli showed that the relative amplitude of the burst significantly affected the perception of the place of articulation of both voiceless and voiced stops, but the effect was greater for the former than the latter. The results are consistent with a view that two basic properties contribute to the labial-alveolar distinction in English. One of these is determined by the time course of the change in amplitude in the high-frequency range (above 2500 Hz) in the few tens of ms following consonantal release, and the other is determined by the frequencies of spectral peaks associated with the second and third formants in relation to the first formant.
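The burst-amplitude manipulation described above — varying relative level in 5-dB steps from -10 to +10 dB — amounts to scaling the burst waveform by 10**(dB/20). A toy sketch; the noise burst here is a stand-in, not the study's synthetic stimuli:

```python
import numpy as np

rng = np.random.default_rng(3)
sr = 16000
burst = rng.normal(0.0, 1.0, int(0.010 * sr))   # 10 ms noise "burst" (toy)

steps_db = range(-10, 11, 5)                    # -10 to +10 dB in 5-dB steps
variants = {db: burst * 10 ** (db / 20) for db in steps_db}

# Each 5-dB step multiplies the waveform amplitude by 10**(5/20), about 1.78.
ratio = variants[-5].std() / variants[-10].std()
print(round(ratio, 2))  # 1.78
```

The same scaling applies whether the reference is a "normal" burst level or any other anchor; only the relative level matters for the manipulation.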

4.
5.
Previous work has shown that the lips are moving at a high velocity when the oral closure occurs for bilabial stop consonants, resulting in tissue compression and mechanical interactions between the lips. The present experiment recorded tongue movements in four subjects during the production of velar and alveolar stop consonants to examine kinematic events before, during, and after the stop closure. The results show that, similar to the lips, the tongue is often moving at a high velocity at the onset of closure. The tongue movements were more complex, with both horizontal and vertical components. Movement velocity at closure and release were influenced by both the preceding and the following vowel. During the period of oral closure, the tongue moved through a trajectory of usually less than 1 cm; again, the magnitude of the movement was context dependent. Overall, the tongue moved in forward-backward curved paths. The results are compatible with the idea that the tongue is free to move during the closure as long as an airtight seal is maintained. A new interpretation of the curved movement paths of the tongue in speech is also proposed. This interpretation is based on the principle of cost minimization that has been successfully applied in the study of hand movements in reaching.

6.
The speech production skills of 12 dysphasic children and of 12 normal children were compared. The dysphasic children were found to have significantly greater difficulty than the normal children in producing stop consonants. In addition, it was found that seven of the dysphasic children, who had difficulty in perceiving initial stop consonants, had greater difficulty in producing stop consonants than the remaining five dysphasic children who showed no such perceptual difficulty. A detailed phonetic analysis indicated that the dysphasic children seldom omitted stops or substituted nonstop for stop consonants. Instead, their errors were predominantly of voicing or place of articulation. Acoustic analyses suggested that the voicing errors were related to lack of precise control over the timing of speech events, specifically, voice onset time for initial stops and vowel duration preceding final stops. The number of voicing errors on final stops, however, was greater than expected on the basis of lack of differentiation of vowel duration alone. They appeared also to be related to a tendency in the dysphasic children to produce final stops with exaggerated aspiration. The possible relationship of poor timing control in speech production in these children and auditory temporal processing deficits in speech perception is discussed.

7.
8.
Measurements were made of sagittal plane movements of the larynx, soft palate, and portions of the tongue, from a high-speed cinefluorographic film of utterances produced by one adult male speaker of American English. These measures were then used to approximate the temporal variations in supraglottal cavity volume during the closures of voiced and voiceless stop consonants. All data were subsequently related to a synchronous acoustic recording of the utterances. Instances of /p,t,k/ were always accompanied by silent closures, and sometimes accompanied by decreases in supraglottal volume. In contrast, instances of /b,d,g/ were always accompanied both by significant intervals of vocal fold vibration during closure, and relatively large increases in supraglottal volume. However, the magnitudes of volume increments during the voiced stops, and the means by which those increments were achieved, differed considerably across place of articulation and phonetic environment. These results are discussed in the context of a well-known model of the breath-stream control mechanism, and their relevance for a general theory of speech motor control is considered.

9.
Comodulation masking release (CMR) refers to an improvement in the detection threshold of a signal masked by noise with coherent amplitude fluctuation across frequency, as compared to noise without the envelope coherence. The present study tested whether such an advantage for signal detection would facilitate the identification of speech phonemes. Consonant identification of bandpass speech was measured under the following three masker conditions: (1) a single band of noise in the speech band ("on-frequency" masker); (2) two bands of noise, one in the on-frequency band and the other in the "flanking band," with coherence of temporal envelope fluctuation between the two bands (comodulation); and (3) two bands of noise (on-frequency band and flanking band), without the coherence of the envelopes (noncomodulation). A pilot experiment with a small number of consonant tokens was followed by the main experiment with 12 consonants and the following masking conditions: three frequency locations of the flanking band and two masker levels. Results showed that in all conditions, the comodulation condition provided higher identification scores than the noncomodulation condition, and the difference in score was 3.5% on average. No significant difference was observed between the on-frequency only condition and the comodulation condition, i.e., an "unmasking" effect by the addition of a comodulated flanking band was not observed. The positive effect of CMR on consonant recognition found in the present study endorses a "cued-listening" theory, rather than an envelope correlation theory, as a basis of CMR in a suprathreshold task.
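The comodulated versus noncomodulated masker distinction above can be sketched by giving two bands either a shared or an independent slow envelope. All parameters here are illustrative assumptions (tonal carriers stand in for noise bands), not the study's actual maskers:

```python
import numpy as np

rng = np.random.default_rng(0)
sr, dur = 16000, 0.5
n = int(sr * dur)
t = np.arange(n) / sr

def slow_envelope():
    """A random low-frequency amplitude envelope (sum of slow sinusoids)."""
    env = np.zeros(n)
    for f in rng.uniform(2, 16, size=5):
        env += np.sin(2 * np.pi * f * t + rng.uniform(0, 2 * np.pi))
    return 1.5 + env / env.std()        # keep the fluctuation mostly positive

def modulated_band(center_hz, envelope):
    """A carrier scaled by a slow envelope -- a toy narrowband masker."""
    return envelope * np.sin(2 * np.pi * center_hz * t)

shared = slow_envelope()
on_band = modulated_band(1000, shared)                 # "on-frequency" masker
flank_como = modulated_band(2000, shared)              # comodulated flanking band
flank_noncomo = modulated_band(2000, slow_envelope())  # independent envelope

def envelope_correlation(a, b):
    """Correlation of rectified signals: a crude envelope-coherence measure."""
    return np.corrcoef(np.abs(a), np.abs(b))[0, 1]

# The comodulated flanking band tracks the on-frequency band's envelope,
# the noncomodulated one does not.
print(envelope_correlation(on_band, flank_como) >
      envelope_correlation(on_band, flank_noncomo))
```

In a CMR paradigm the listener could exploit exactly this cross-band envelope coherence: dips in the comodulated flanking band signal dips in the on-frequency masker.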

10.
This study examines English speakers' relative weighting of two voicing cues in production and perception. Participants repeated words differing in initial consonant voicing ([b] or [p]) and labeled synthesized tokens ranging between [ba] and [pa] orthogonally according to voice onset time (VOT) and onset f0. Discriminant function analysis and logistic regression were used to calculate individuals' relative weighting of each cue. Production results showed a significant negative correlation of VOT and onset f0, while perception results showed a trend toward a positive correlation. No significant correlations were found across perception and production, suggesting a complex relationship between the two domains.
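Deriving relative cue weights from identification data with logistic regression, as described above, can be sketched as follows. The data here are simulated (not the study's), VOT is built in as the dominant cue, and the fit is a plain gradient-ascent logistic regression rather than any particular statistics package:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated identification data (hypothetical): VOT in ms, onset f0 in Hz;
# /p/ responses (label 1) are favored by long VOT and high onset f0.
n = 500
vot = rng.uniform(0, 60, n)
f0 = rng.uniform(90, 130, n)
true_logit = 0.15 * (vot - 30) + 0.03 * (f0 - 110)
labels = (rng.uniform(size=n) < 1 / (1 + np.exp(-true_logit))).astype(float)

# Standardize the cues so the fitted coefficients are directly comparable.
X = np.column_stack([(vot - vot.mean()) / vot.std(),
                     (f0 - f0.mean()) / f0.std(),
                     np.ones(n)])

# Plain gradient-ascent logistic regression (no external ML library).
w = np.zeros(3)
for _ in range(5000):
    p = 1 / (1 + np.exp(-X @ w))
    w += 0.5 * X.T @ (labels - p) / n

relative_vot_weight = abs(w[0]) / (abs(w[0]) + abs(w[1]))
print(relative_vot_weight > 0.5)  # VOT carries most of the weight, as built in
```

Comparing each listener's `relative_vot_weight` across the perception and production tasks is what allows the kind of cross-domain correlation analysis the abstract reports.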

11.
In VCV nonsense forms (such as /ɛdɛ/), while both the CV transition and the VC transition are perceptible in isolation, the CV transition dominates identification of the stop consonant. Thus, the question arises, what role, if any, do VC transitions play in word perception? Stimuli were two-syllable English words in which the medial consonant was either a stop or a fricative (e.g., "feeding" and "gravy"). Each word was constructed in three ways: (1) the VC transition was incompatible with the CV in either place, manner of articulation, or both; (2) the VC transition was eliminated and the steady-state portion of the first vowel was substituted in its place; and (3) the original word. All versions of a particular word were identical with respect to duration, pitch contour, and amplitude envelope. While an intelligibility test revealed no differences among the three conditions, data from a paired comparison preference task and an unspeeded lexical decision task indicated that incompatible VC transitions hindered word perception, but lack of VC transitions did not. However, there were clear differences among the three conditions in the speeded lexical decision task for word stimuli, but not for nonword stimuli that were constructed in an analogous fashion. We discuss the use of lexical tasks for speech quality assessment and possible processes by which listeners recognize spoken words.

12.
Spoken communication in a non-native language is especially difficult in the presence of noise. This study compared English and Spanish listeners' perceptions of English intervocalic consonants as a function of masker type. Three maskers (stationary noise, multitalker babble, and competing speech) provided varying amounts of energetic and informational masking. Competing English and Spanish speech maskers were used to examine the effect of masker language. Non-native performance fell short of that of native listeners in quiet, but a larger performance differential was found for all masking conditions. Both groups performed better in competing speech than in stationary noise, and both suffered most in babble. Since babble is a less effective energetic masker than stationary noise, these results suggest that non-native listeners are more adversely affected by both energetic and informational masking. A strong correlation was found between non-native performance in quiet and degree of deterioration in noise, suggesting that non-native phonetic category learning can be fragile. A small effect of language background was evident: English listeners performed better when the competing speech was Spanish.

13.
Signal design in CF bats is hypothesized to be commensurate with the evaluation of time-variant echo parameters, imposed by changes in the sound channel occurring as the bat flies by a target. Two such parameters, the proportional changes in Doppler frequency and sound pressure amplitude, are surveyed, employing a simplified acoustic model in order to assess their fitness for target localization given a translational movement within a plane. This is accomplished by considering the properties of the scalar fields given by the value of these putative sensory variables as a function of position in a plane. The considered criteria are: existence and extent of ambiguity areas (i.e., multiple solutions for target position), magnitude of the variables (relevant with regard to perceptual thresholds), as well as magnitude and orthogonality of the gradients (relevant to localization accuracy). It is concluded that these properties render the considered variables compatible with gross judgements of target position. This may be sufficient for behavioral contexts like obstacle avoidance, where adoption of suitable safety margins could compensate for the variance and bias associated with estimates of target location.
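The two time-variant echo parameters surveyed above can be written down for a simple planar fly-by geometry. A sketch under stated assumptions — constant flight speed along one axis, two-way 1/r² amplitude spreading, and a first-order Doppler approximation — not the paper's full acoustic model:

```python
import numpy as np

C = 343.0     # speed of sound (m/s)
SPEED = 5.0   # assumed flight speed (m/s), along +x

def echo_variables(x, y):
    """Two putative sensory variables for a bat at (x, y) flying in +x,
    echolocating a target at the origin: the two-way Doppler factor and
    the proportional amplitude change rate (echo amplitude ~ 1/r**2)."""
    r = np.hypot(x, y)
    r_dot = SPEED * x / r            # radial velocity, positive = receding
    doppler = 1.0 - 2.0 * r_dot / C  # > 1 while approaching the target
    amp_rate = -2.0 * r_dot / r      # d(ln A)/dt for A proportional to 1/r**2
    return doppler, amp_rate

# Ambiguity area: positions mirrored across the flight axis (y -> -y) give
# identical values of both variables, so the side of the target cannot be
# recovered from these two cues alone.
print(echo_variables(-2.0, 1.0) == echo_variables(-2.0, -1.0))
```

Evaluating `echo_variables` over a grid of (x, y) positions yields exactly the scalar fields whose ambiguity areas and gradients the paper analyzes.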

14.
The purpose of this experiment was to evaluate the utilization of short-term spectral cues for recognition of initial plosive consonants (/b,d,g/) by normal-hearing and by hearing-impaired listeners differing in audiometric configuration. Recognition scores were obtained for these consonants paired with three vowels (/a,i,u/) while systematically reducing the duration (300 to 10 ms) of the synthetic consonant-vowel syllables. Results from 10 normal-hearing and 15 hearing-impaired listeners suggest that audiometric configuration interacts in a complex manner with the identification of short-duration stimuli. For consonants paired with the vowels /a/ and /u/, performance deteriorated as the slope of the audiometric configuration increased. The one exception to this result was a subject who had significantly elevated pure-tone thresholds relative to the other hearing-impaired subjects. Despite the changes in the shape of the onset spectral cues imposed by hearing loss, with increasing duration, consonant recognition in the /a/ and /u/ context for most hearing-impaired subjects eventually approached that of the normal-hearing listeners. In contrast, scores for consonants paired with /i/ were poor for a majority of hearing-impaired listeners for stimuli of all durations.

15.
Fundamental frequency (F0) and voice onset time (VOT) were measured in utterances containing voiceless aspirated [pʰ, tʰ, kʰ], voiceless unaspirated [sp, st, sk], and voiced [b, d, g] stop consonants produced in the context of [i, e, u, o, a] by 8- to 9-year-old subjects. The results revealed that VOT reliably differentiated voiceless aspirated from voiceless unaspirated and voiced stops, whereas F0 significantly contrasted voiced with voiceless aspirated and unaspirated stops, except for the first glottal period, where voiceless unaspirated stops contrasted with the other two categories. Fundamental frequency consistently differentiated vowel height in alveolar and velar stop consonant environments only. In comparing the results of these children and of adults, it was observed that the acoustic correlates of stop consonant voicing and vowel quality were different not only in absolute values, but also in terms of variability. Further analyses suggested that children were more variable in production due to inconsistency in achieving specific targets. The findings also suggest that, of the acoustic correlates of the voicing feature, the primary distinction of VOT is strongly developed by 8-9 years of age, whereas the secondary distinction of F0 is still in an emerging state.
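VOT, the measure at the center of the abstract above, is the interval from the release burst to the onset of voicing. A crude landmark-detection sketch on a toy waveform; the signal construction, frame size, and thresholds are all assumptions for illustration, not the study's measurement procedure:

```python
import numpy as np

SR = 16000
rng = np.random.default_rng(2)

# Toy utterance (assumed construction): silence, a 5 ms release burst at
# 20 ms, then voicing (a 120 Hz sine) starting at 65 ms -> true VOT 45 ms.
n = int(0.200 * SR)
sig = np.zeros(n)
b0 = int(0.020 * SR)
sig[b0:b0 + int(0.005 * SR)] = rng.normal(0, 0.5, int(0.005 * SR))
v0 = int(0.065 * SR)
sig[v0:] = 0.8 * np.sin(2 * np.pi * 120 * np.arange(n - v0) / SR)

def measure_vot(signal, sr, frame_s=0.010):
    """Crude VOT estimate: time from the first energetic frame (the burst)
    to the first frame dominated by low-frequency (voiced) energy."""
    hop = int(frame_s * sr)
    burst_t = None
    for i in range(0, len(signal) - hop, hop):
        w = signal[i:i + hop]
        if w.std() < 1e-3:
            continue                                 # silence, skip
        power = np.abs(np.fft.rfft(w)) ** 2
        freqs = np.fft.rfftfreq(hop, 1 / sr)
        low_frac = power[freqs < 500].sum() / (power.sum() + 1e-12)
        if burst_t is None:
            burst_t = i / sr                         # first energy = burst
        if low_frac > 0.7:
            return i / sr - burst_t                  # voicing onset reached
    return None

vot_ms = 1000 * measure_vot(sig, SR)
print(round(vot_ms))  # within a frame or so of the 45 ms built in
```

Real VOT measurement is usually done from waveforms and spectrograms by hand or with forced alignment; this sketch only shows why the quantity is well defined as a difference of two acoustic landmarks.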

16.
Previous research in cross-language perception has shown that non-native listeners often assimilate both single phonemes and phonotactic sequences to native language categories. This study examined whether associating meaning with words containing non-native phonotactics assists listeners in distinguishing the non-native sequences from native ones. In the first experiment, American English listeners learned word-picture pairings including words that contained a phonological contrast between CC and CVC sequences, but which were not minimal pairs (e.g., [ftake], [ftalu]). In the second experiment, the word-picture pairings specifically consisted of CC-CVC minimal pairs (e.g., [ftake] and its CVC counterpart). Results showed that the ability to learn non-native CC was significantly improved when listeners learned minimal pairs as opposed to phonological contrast alone. Subsequent investigation of individual listeners revealed that there are both high and low performing participants, where the high performers were much more capable of learning the contrast between native and non-native words. Implications of these findings for second language lexical representations and loanword adaptation are discussed.

17.
This study assessed the extent to which second-language learners are sensitive to phonetic information contained in visual cues when identifying a non-native phonemic contrast. In experiment 1, Spanish and Japanese learners of English were tested on their perception of a labial/labiodental consonant contrast in audio (A), visual (V), and audio-visual (AV) modalities. Spanish students showed better performance overall, and much greater sensitivity to visual cues than Japanese students. Both learner groups achieved higher scores in the AV than in the A test condition, thus showing evidence of audio-visual benefit. Experiment 2 examined the perception of the less visually-salient /l/-/r/ contrast in Japanese and Korean learners of English. Korean learners obtained much higher scores in auditory and audio-visual conditions than in the visual condition, while Japanese learners generally performed poorly in both modalities. Neither group showed evidence of audio-visual benefit. These results show the impact of the language background of the learner and visual salience of the contrast on the use of visual cues for a non-native contrast. Significant correlations between scores in the auditory and visual conditions suggest that increasing auditory proficiency in identifying a non-native contrast is linked with an increasing proficiency in using visual cues to the contrast.

18.
This study examined the ability of six-month-old infants to recognize the perceptual similarity of syllables sharing a phonetic segment when variations were introduced in phonetic environment and talker. Infants in a "phonetic" group were visually reinforced for head turns when a change occurred from a background category of labial nasals to a comparison category of alveolar nasals. The infants were initially trained on a [ma]-[na] contrast produced by a male talker. Novel tokens differing in vowel environment and talker were introduced over several stages of increasing complexity. In the most complex stage infants were required to make a head turn when a change occurred from [ma,mi,mu] to [na,ni,nu], with the tokens in each category produced by both male and female talkers. A "nonphonetic" control group was tested using the same pool of stimuli as the phonetic condition. The only difference was that the stimuli in the background and comparison categories were chosen in such a way that the sounds could not be organized by acoustic or phonetic characteristics. Infants in the phonetic group transferred training to novel tokens produced by different talkers and in different vowel contexts. However, infants in the nonphonetic control group had difficulty learning the phonetically unrelated tokens that were introduced as the experiment progressed. These findings suggest that infants recognize the similarity of nasal consonants sharing place of articulation independent of variation in talker and vowel context.

19.
Ten patients who use the Ineraid cochlear implant were tested on a consonant identification task. The stimuli were 16 consonants in the "aCa" environment. The patients who scored greater than 60 percent correct were found to have high feature information scores for amplitude envelope features and for features requiring the detection of high-frequency energy. The patients who scored less than 60 percent correct exhibited lower scores for all features of the signal. The difference in performance between the two groups of patients may be due, at least in part, to differences in the detection or resolution of high-frequency components in the speech signal.

20.
Interaction of Korean and English stop systems in Korean-English bilinguals as a function of age of acquisition (AOA) of English was investigated. It was hypothesized that early bilinguals (mean AOA=3.8 years) would more likely be native-like in production of English and Korean stops and maintain greater independence between Korean and English stop systems than late bilinguals (mean AOA=21.4 years). Production of Korean and English stops was analyzed in terms of three acoustic-phonetic properties: voice-onset time, amplitude difference between the first two harmonics, and fundamental frequency. Late bilinguals were different from English monolinguals for English voiceless and voiced stops in all three properties. As for Korean stops, late bilinguals were different from Korean monolinguals for fortis stops in voice-onset time. Early bilinguals were not different from the monolinguals of either language. Considering the independence of the two stop systems, late bilinguals seem to have merged English voiceless and Korean aspirated stops and produced English voiced stops with similarities to both Korean fortis and lenis stops, whereas early bilinguals produced five distinct stop types. Thus, the early bilinguals seem to have two independent stop systems, whereas the late bilinguals likely have a merged Korean-English system.
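One of the three acoustic-phonetic properties above — the amplitude difference between the first two harmonics, often written H1-H2 — can be read off a magnitude spectrum. A sketch with a toy two-harmonic waveform; the `h1_h2` helper, its search bands, and the stimulus are assumptions, not the study's analysis pipeline:

```python
import numpy as np

SR = 16000

def h1_h2(signal, f0, sr=SR):
    """H1-H2 in dB: level difference between the spectral peaks nearest
    f0 and 2*f0 (assumed measure; requires a known or estimated f0)."""
    spec = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    freqs = np.fft.rfftfreq(len(signal), 1 / sr)

    def peak_db(f):
        band = (freqs > 0.8 * f) & (freqs < 1.2 * f)
        return 20 * np.log10(spec[band].max() + 1e-12)

    return peak_db(f0) - peak_db(2 * f0)

# Toy glottal-like waveform with H1 at twice the amplitude of H2,
# i.e. a built-in H1-H2 of 20*log10(2), about 6 dB.
t = np.arange(4096) / SR
voiced = 1.0 * np.sin(2 * np.pi * 125 * t) + 0.5 * np.sin(2 * np.pi * 250 * t)
print(round(h1_h2(voiced, 125), 1))  # 6.0
```

Larger H1-H2 values are conventionally associated with breathier phonation, which is why the measure can separate Korean lenis from fortis stops in the following vowel.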
