Similar Literature
20 similar documents found.
1.
The phonetic identification ability of an individual (SS) who exhibits the best, or equal to the best, speech understanding among patients using the Symbion four-channel cochlear implant is described. It has been found that SS: (1) can use aspects of signal duration to form categories that are isomorphic with the phonetic categories established by listeners with normal auditory function; (2) can combine temporal and spectral cues in a normal fashion to form categories; (3) can use aspects of fricative noises to form categories that correspond to normal phonetic categories; (4) uses information from both F1 and higher formants in vowel identification; and (5) appears to identify stop consonant place of articulation on the basis of information provided by the center frequency of the burst and by the abruptness of frequency change following signal onset. SS has difficulty identifying stop consonants from the information provided by formant transitions and cannot differentially identify signals that have identical F1's and relatively low-frequency F2's. SS's performance suggests that simple speech processing strategies (filtering of the signal into four bands) and monopolar electrode design are viable options in the design of cochlear prostheses.
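As a rough illustration of the "filtering of the signal into four bands" strategy mentioned above, the Python sketch below splits a signal into four band-limited channels. The band edges, filter order, and function name are illustrative assumptions, not the Symbion processor's actual specification.

```python
# Hedged sketch: split a signal into four frequency bands, loosely analogous to
# a four-band processing strategy. Band edges and filter order are assumptions.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def four_band_filterbank(signal, fs, edges=(200, 800, 1600, 3200, 6400)):
    """Return four band-limited versions of `signal` (1-D array, sample rate `fs`)."""
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        bands.append(sosfiltfilt(sos, signal))
    return bands

# Example: filter one second of white noise sampled at 16 kHz.
fs = 16000
bands = four_band_filterbank(np.random.randn(fs), fs)
```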

2.
Visual information from a speaker's face profoundly influences auditory perception of speech. However, relatively little is known about the extent to which visual influences may depend on experience, and the extent to which new sources of visual speech information can be incorporated in speech perception. In the current study, participants were trained on completely novel visual cues for phonetic categories. Participants learned to accurately identify phonetic categories based on novel visual cues. These newly-learned visual cues influenced identification responses to auditory speech stimuli, but not to the same extent as visual cues from a speaker's face. The novel methods and results of the current study raise theoretical questions about the nature of information integration in speech perception, and open up possibilities for further research on learning in multimodal perception, which may have applications in improving speech comprehension among the hearing-impaired.

3.
4.
Although some cochlear implant (CI) listeners can show good word recognition accuracy, it is not clear how they perceive and use the various acoustic cues that contribute to phonetic perceptions. In this study, the use of acoustic cues was assessed for normal-hearing (NH) listeners in optimal and spectrally degraded conditions, and also for CI listeners. Two experiments tested the tense/lax vowel contrast (varying in formant structure, vowel-inherent spectral change, and vowel duration) and the word-final fricative voicing contrast (varying in F1 transition, vowel duration, consonant duration, and consonant voicing). Identification results were modeled using mixed-effects logistic regression. These experiments suggested that under spectrally degraded conditions, NH listeners decrease their use of formant cues and increase their use of durational cues. Compared to NH listeners, CI listeners showed decreased use of spectral cues such as formant structure, formant change, and consonant voicing, and showed greater use of durational cues (especially for the fricative contrast). The results suggest that although NH and CI listeners may show similar accuracy on basic tests of word, phoneme, or feature recognition, they may be using different perceptual strategies in the process.

5.
This study investigates the use of constraints upon articulatory parameters in the context of acoustic-to-articulatory inversion. These speaker-independent constraints, referred to as phonetic constraints, were derived from standard phonetic knowledge for French vowels and express authorized domains for one or several articulatory parameters. They were evaluated within an existing inversion framework that utilizes Maeda's articulatory model and a hypercubic articulatory-acoustic table. Phonetic constraints give rise to a phonetic score reflecting the phonetic consistency of vocal tract shapes recovered by inversion. Inversion has been applied to vowels articulated by a speaker whose corresponding x-ray images are also available. Constraints were evaluated by measuring the distance between vocal tract shapes recovered through inversion and real vocal tract shapes obtained from x-ray images, by investigating the spreading of inverse solutions in terms of place of articulation and constriction degree, and finally by studying the articulatory variability. Results show that these constraints capture interdependencies and synergies between speech articulators and favor vocal tract shapes close to those realized by the human speaker. In addition, this study shows how acoustic-to-articulatory inversion can be used to explore acoustical and compensatory articulatory properties of an articulatory model.

6.
Acoustic analyses were undertaken to explore the durational characteristics of the fricatives [f, θ, s, v, ð, z] as cues to initial consonant voicing in English. Based on reports on the perception of voiced-voiceless fricatives, it was expected that there would be clear-cut duration differences distinguishing voiced and voiceless fricatives. Preliminary results for three speakers indicate that, although differences emerged in the overall mean duration of voiced and voiceless fricatives, contrary to expectations, there was a great deal of overlap in the duration distribution of voiced and voiceless fricative tokens. Further research is needed to examine the role of duration as a cue to syllable-initial fricative consonant voicing in English.

7.
Phonemic and phonetic factors in adult cross-language speech perception
Previous research has indicated that young infants can discriminate speech sounds across phonetic boundaries regardless of specific relevant experience, and that there is a modification in this ability during ontogeny such that adults often have difficulty discriminating phonetic contrasts which are not used contrastively in their native language. This pattern of findings has often been interpreted as suggesting that humans are endowed with innate auditory sensitivities which enable them to discriminate speech sounds according to universal phonetic boundaries, and that there is a decline or loss in this ability after being exposed to a language which contrasts only a subset of those distinctions. The present experiments were designed to determine whether this modification represents a loss of sensorineural response capabilities or whether it reflects a shift in attentional focus and/or processing strategies. In experiment 1, adult English-speaking subjects were tested on their ability to discriminate two non-English speech contrasts in a category-change discrimination task after first being predisposed to adopt one of four perceptual sets. In experiments 2, 3, and 4, subjects were tested in an AX (same/different) procedure, and the effects of both limited training and duration of the interstimulus interval were assessed. Results suggest that the previously observed ontogenetic modification in the perception of non-native phonetic contrasts involves a change in processing strategies rather than a sensorineural loss. Adult listeners can discriminate sounds across non-native phonetic categories in some testing conditions, but are not able to use that ability in testing conditions which have demands similar to those required in natural language processing.

8.
The perceptual mechanisms of assimilation and contrast in the phonetic perception of vowels were investigated. In experiment 1, 14 stimulus continua were generated using an /i/-/e/-/a/ vowel continuum. They ranged from a continuum with both ends belonging to the same phonemic category in Japanese, to a continuum with both ends belonging to different phonemic categories. The AXB method was employed and the temporal position of X was changed under three conditions. In each condition ten subjects were required to judge whether X was similar to A or to B. The results demonstrated that assimilation to the temporally closer sound occurs if the phonemic categories of A and B are the same and that contrast to the temporally closer sound occurs if A and B belong to different phonemic categories. It was observed that the transition from assimilation to contrast is continuous except in the /i'/-X-/e/ condition. In experiment 2, the total duration of t1 (between A and X) and t2 (between X and B) was changed under five conditions. One stimulus continuum consisted of the same phonemic category in Japanese and the other consisted of different phonemic categories. Six subjects were required to make similarity judgements of X. The results demonstrated that the occurrence of assimilation and contrast to the temporally closer sound seemed to be constant under each of the five conditions. The present findings suggest that assimilation and contrast are determined by three factors: the temporal position of the three stimuli, the acoustic distance between the three stimuli on the stimulus continuum, and the phonemic categories of the three stimuli.

9.
The perception of subphonemic differences between vowels was investigated using multidimensional scaling techniques. Three experiments were conducted with natural-sounding synthetic stimuli generated by linear predictive coding (LPC) formant synthesizers. In the first experiment, vowel sets near the pairs /i/-/ɪ/, /ɛ/-/æ/, or /u/-/ʊ/ were synthesized containing 11 vowels each. Listeners judged the dissimilarities between all pairs of vowels within a set several times. These perceptual differences were mapped into distances between the vowels in an n-dimensional space using two-way multidimensional scaling. Results for each vowel set showed that the physical stimulus space, which was specified by the two parameters F1 and F2, was always mapped into a two-dimensional perceptual space. The best metric for modeling the perceptual distances was the Euclidean distance between F1 and F2 in barks. The second experiment investigated the perception of the same vowels from the first experiment, but embedded in a consonantal context. Following the same procedures as experiment 1, listeners' perception of the /bV/ dissimilarities was not different from their perception of the isolated vowel dissimilarities. The third experiment investigated dissimilarity judgments for the three vowels /æ/-/ɑ/-/ʌ/ located symmetrically in the F1 × F2 vowel space. While the perceptual space was again two dimensional, the influence of phonetic identity on vowel difference judgments was observed. Implications for determining metrics for subphonemic vowel differences using multidimensional scaling are discussed.
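The modeling idea described above can be illustrated with a short Python sketch: convert F1/F2 to the bark scale, take Euclidean distances between vowels, and recover a two-dimensional space with multidimensional scaling. The bark approximation (Traunmüller's formula) and the formant values below are illustrative assumptions, not the study's stimuli or exact procedure.

```python
# Hedged sketch of bark-scaled F1/F2 distances analyzed with MDS.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

def hz_to_bark(f_hz):
    # Traunmüller's approximation of the bark scale.
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

# Hypothetical F1/F2 values (Hz) for a few vowel-like stimuli.
formants_hz = np.array([[300.0, 2300.0],
                        [430.0, 2000.0],
                        [580.0, 1800.0]])
formants_bark = hz_to_bark(formants_hz)

# Pairwise Euclidean distances in barks stand in for dissimilarity judgments.
dissimilarities = squareform(pdist(formants_bark, metric="euclidean"))

# Two-way multidimensional scaling on the precomputed dissimilarity matrix.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
perceptual_space = mds.fit_transform(dissimilarities)
print(perceptual_space)
```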

10.
The results of several experiments demonstrate that silence is an important cue for the perception of stop-consonant and affricate manner. In some circumstances, silence is necessary; in others, it is sufficient. But silence is not the only cue to these manners. There are other cues that are more or less equivalent in their perceptual effects, though they are quite different acoustically. Finally, silence is effective as a cue when it separates utterances produced by male and female speakers. These findings are taken to imply that, in these instances, perception is constrained as if by some abstract conception of what vocal tracts do when they make linguistically significant gestures.

11.
Accented speech recognition is more challenging than standard speech recognition due to the effects of phonetic and acoustic confusions. Phonetic confusion in accented speech occurs when an expected phone is pronounced as a different one, which leads to erroneous recognition. Acoustic confusion occurs when the pronounced phone is found to lie acoustically between two baseform models and can be equally recognized as either one. We propose that it is necessary to analyze and model these confusions separately in order to improve accented speech recognition without degrading standard speech recognition. Since low phonetic confusion units in accented speech do not give rise to automatic speech recognition errors, we focus on analyzing and reducing phonetic and acoustic confusability under high phonetic confusion conditions. We propose using a likelihood ratio test to measure phonetic confusion and an asymmetric acoustic distance to measure acoustic confusion. Only accent-specific phonetic units with low acoustic confusion are used in an augmented pronunciation dictionary, while phonetic units with high acoustic confusion are reconstructed using decision tree merging. Experimental results show that our approach is effective and superior to methods modeling phonetic confusion or acoustic confusion alone in accented speech, with a significant 5.7% absolute WER reduction, without degrading standard speech recognition.
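A hedged sketch of what a likelihood-ratio measure of phonetic confusion could look like is given below: frames of an accented realization are scored under the expected baseform phone model and under an alternative phone model, and the mean log-likelihood difference indicates how confusable the realization is. The diagonal-Gaussian phone models and all values are illustrative assumptions, not the paper's actual acoustic models or test.

```python
# Hedged sketch: likelihood-ratio style score for phonetic confusion.
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood_ratio(frames, baseform_model, alternative_model):
    """Mean per-frame log-likelihood of the alternative minus the baseform;
    large positive values mean the frames resemble the alternative phone."""
    return alternative_model.logpdf(frames).mean() - baseform_model.logpdf(frames).mean()

# Hypothetical 2-D acoustic-feature models for two phones.
baseform = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, 0.0], [0.0, 1.0]])
alternative = multivariate_normal(mean=[2.0, 0.5], cov=[[1.0, 0.0], [0.0, 1.0]])

# Simulated frames that drift toward the alternative phone.
frames = np.random.default_rng(0).normal(loc=[1.8, 0.4], scale=1.0, size=(30, 2))
print(log_likelihood_ratio(frames, baseform, alternative))
```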

12.
This study examines English speakers' relative weighting of two voicing cues in production and perception. Participants repeated words differing in initial consonant voicing ([b] or [p]) and labeled synthesized tokens ranging between [ba] and [pa] that varied orthogonally in voice onset time (VOT) and onset f0. Discriminant function analysis and logistic regression were used to calculate individuals' relative weighting of each cue. Production results showed a significant negative correlation of VOT and onset f0, while perception results showed a trend toward a positive correlation. No significant correlations were found across perception and production, suggesting a complex relationship between the two domains.
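The perception analysis described above can be sketched as follows: fit a logistic regression predicting [p] responses from standardized VOT and onset f0, and compare coefficient magnitudes as a simple index of relative cue weighting. The simulated listener data and this particular weighting index are assumptions for illustration, not the study's exact procedure.

```python
# Hedged sketch: logistic-regression estimate of relative cue weights.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200
vot = rng.uniform(0, 60, n)          # voice onset time (ms)
onset_f0 = rng.uniform(90, 130, n)   # onset f0 (Hz)

# Simulated listener: "p" responses driven mostly by VOT, slightly by onset f0.
logit = 0.15 * (vot - 30) + 0.05 * (onset_f0 - 110)
resp_p = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

# Standardize cues so the fitted coefficients are comparable across cues.
X = np.column_stack([(vot - vot.mean()) / vot.std(),
                     (onset_f0 - onset_f0.mean()) / onset_f0.std()])
fit = sm.Logit(resp_p, sm.add_constant(X)).fit(disp=0)
print(fit.params)  # [intercept, VOT weight, onset f0 weight]
```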

13.
Discrimination of speech-sound pairs drawn from a computer-generated continuum in which syllables varied along the place of articulation phonetic feature (/b, d, g/) was tested with macaques. The acoustic feature that was varied along the two-formant 15-step continuum was the starting frequency of the second-formant transition. Discrimination of stimulus pairs separated by two steps was tested along the entire continuum in a same-different task. Results demonstrated that peaks in the discrimination functions occur for macaques at the "phonetic boundaries" which separate the /b-d/ and /d-g/ categories for human listeners. The data support two conclusions. First, although current theoretical accounts of place perception by human adults suggest that isolated second-formant transitions are "secondary" cues, learned by association with primary cues, the animal data are more compatible with the notion that second-formant transitions are sufficient to allow the appropriate partitioning of a place continuum in the absence of associative pairing with other more complex cues. Second, we discuss two potential roles played by audition in the evolution of the acoustics of language. One is that audition provided a set of "natural psychophysical boundaries," based on rather simple acoustic properties, which guided the selection of the phonetic repertoire but did not solely determine it; the other is that audition provided a set of rules for the formation of "natural classes" of sound and that phonetic units met those criteria. The data from this experiment support the former. Experiments that could more clearly differentiate the two hypotheses are described.

14.
15.
An acoustic cue for voicing is proposed based on the underlying processes associated with the production of the voicing contrast. This cue is based on the time asynchrony between the onsets of two amplitude-envelope signals derived from different bands of speech (i.e., envelopes derived from a lowpass-filtered band at 350 Hz and from a highpass-filtered band at 3000 Hz). Acoustic measurements made on the envelope signals of a set of 16 initial consonants represented through multiple tokens of C1VC2 syllables indicate that the onset-timing difference between the low- and high-frequency envelopes (Envelope-Onset Asynchrony or EOA) provides a reliable and robust cue for distinguishing voiced from voiceless consonants. This cue, which is simply derived in real time, has applications to the design of sensory aids for persons with profound hearing impairments (e.g., as a supplement to lipreading), as well as to automatic speech recognition.
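A minimal sketch of deriving an EOA-style measure is shown below: band-filter the signal, extract amplitude envelopes, detect each envelope's onset with a threshold, and take the timing difference. The filter orders, the 50-Hz envelope smoother, and the 10% onset threshold are assumptions not given in the abstract.

```python
# Hedged sketch of an Envelope-Onset Asynchrony (EOA) style measurement.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def band_envelope(signal, fs, sos_band, smooth_hz=50.0):
    band = sosfiltfilt(sos_band, signal)
    env = np.abs(hilbert(band))                      # instantaneous amplitude
    sos_smooth = butter(2, smooth_hz, btype="lowpass", fs=fs, output="sos")
    return sosfiltfilt(sos_smooth, env)

def envelope_onset_asynchrony(signal, fs, threshold=0.1):
    sos_low = butter(4, 350.0, btype="lowpass", fs=fs, output="sos")
    sos_high = butter(4, 3000.0, btype="highpass", fs=fs, output="sos")
    env_low = band_envelope(signal, fs, sos_low)
    env_high = band_envelope(signal, fs, sos_high)
    onset_low = np.argmax(env_low > threshold * env_low.max())
    onset_high = np.argmax(env_high > threshold * env_high.max())
    return (onset_low - onset_high) / fs  # seconds; sign shows which band leads
```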

16.
Voice onset time (VOT) is a perceptual cue to the voicing contrast of stops in word-initial position. The current study aims to acoustically and perceptually characterize VOT in one of the major South Indian languages, Tulu. Stimuli consisted of 2 pairs of meaningful words with bilabial [/p/-/b/] and velar [/k/-/g/] stops in the initial position. These words were uttered by 8 normal native speakers of Tulu and recorded using Praat software. Both spectrogram and waveform views were used to identify the VOT. For the perceptual experiment, 4 adult native speakers of Tulu were asked to identify stimuli in which voicing was truncated in steps of 5 to 7 ms until the lead VOT was 0, and in which silence was added after the burst in 5-ms steps until the lag VOT was 50 ms. The reaction time and the accuracy of identification were measured. Results of the acoustic measurements showed no significant mean difference between the lead VOTs of the two voiced consonants. However, there was a significant difference between the means of the lag VOTs of the voiceless consonants. Results of the perceptual measurements showed that as lead VOT is reduced, the probability of identifying /g/ responses decreases, whereas changing VOT had little effect on reaction time and the identification of /b/ responses. These results probably indicate that VOT is not necessary for perceiving voiceless consonants in Tulu but is necessary for the perception of voiced consonants. Thus VOT is a consonant-specific cue in Tulu.
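The stimulus manipulation described above can be sketched roughly as follows: shorten the voicing lead in fixed steps, then insert silence after the burst to create increasing voicing lags. The burst index, the assumed 5-ms burst duration, and the function name are illustrative assumptions; the actual stimuli were edited from recorded Tulu words.

```python
# Hedged sketch: build a lead-to-lag VOT continuum from a single recorded token.
import numpy as np

def vot_continuum(signal, fs, burst_index, lead_step_ms=5, lag_step_ms=5, max_lag_ms=50):
    """Return stimuli ranging from the original lead VOT to a 50-ms lag VOT."""
    stimuli = []
    lead_step = int(fs * lead_step_ms / 1000)
    # Truncate prevoicing from the start until the lead VOT reaches 0 (the burst).
    cuts = list(range(0, burst_index, lead_step)) + [burst_index]
    for cut in cuts:
        stimuli.append(signal[cut:])
    # Insert silence after the (assumed 5-ms) burst in steps up to the maximum lag.
    after_burst = signal[burst_index:]
    burst_len = int(fs * 0.005)
    lag_step = int(fs * lag_step_ms / 1000)
    for lag in range(lag_step, int(fs * max_lag_ms / 1000) + 1, lag_step):
        stimuli.append(np.concatenate([after_burst[:burst_len],
                                       np.zeros(lag),
                                       after_burst[burst_len:]]))
    return stimuli
```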

17.
The extent to which context influences speech categorization can inform theories of pre-lexical speech perception. Across three conditions, listeners categorized speech targets preceded by speech context syllables. These syllables were presented as the sole context or paired with nonspeech tone contexts previously shown to affect speech categorization. Listeners' context-dependent categorization across these conditions provides evidence that speech and nonspeech context stimuli jointly influence speech processing. Specifically, when the spectral characteristics of speech and nonspeech context stimuli are mismatched such that they are expected to produce opposing effects on speech categorization, the influence of nonspeech contexts may undermine, or even reverse, the expected effect of the adjacent speech context. Likewise, when spectrally matched, the cross-class contexts may collaborate to increase effects of context. Similar effects are observed even when natural speech syllables, matched in source to the speech categorization targets, serve as the speech contexts. Results are well predicted by spectral characteristics of the context stimuli.

18.
Voice onset time (VOT) data for the plosives /p b t d k g/ in two vowel contexts (/i a/) for 5 groups of 46 boys and girls aged 5;8 (5 years, 8 months) to 13;2 years were investigated to examine patterns of sex differences. Results indicated that there was some evidence of females displaying longer VOT values than males. In addition, these differences were found to be most marked in the data of the 13;2-year-olds. Furthermore, the sex differences in the VOT values displayed phonetic context effects. For example, the greatest sex differences were observed for the voiceless plosives, and within the context of the vowel /i/.

19.
Chinchillas were trained to discriminate a cosine-phase harmonic tone complex (COS) from wideband noise (WBN) and tested in a stimulus generalization paradigm with tone complexes in which phase differed between frequency regions. In this split-phase condition, responses to complexes made of random-phase low frequencies and cosine-phase high frequencies were similar to responses to the COS training stimulus. However, responses to complexes made of cosine-phase low frequencies and random-phase high frequencies were generally lower than their responses to the COS training stimulus. When tested with sine-phase (SIN) and random-phase (RND) tone complexes, responses were large for SIN, but were small for RND. Chinchillas were then trained to discriminate infinitely-iterated rippled noise (IIRN) from WBN and tested with noises in which the spectral ripple differed between frequency regions. In this split-spectrum condition, responses were large to noises made of rippled-spectrum low frequencies and flat-spectrum high frequencies, whereas responses were generally lower to noises made of flat-spectrum low frequencies and rippled-spectrum high frequencies. The results suggest that chinchillas listen across all frequencies, but attend to high frequencies when discriminating COS from WBN and attend to low frequencies when discriminating IIRN from WBN.
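To illustrate the stimulus classes referred to above, the sketch below generates harmonic tone complexes whose component phases are cosine or random, optionally split at a boundary frequency so the two phase schemes occupy different regions. The fundamental, upper frequency limit, and split frequency are illustrative assumptions, not the study's stimulus parameters.

```python
# Hedged sketch: cosine-phase, random-phase, and split-phase harmonic complexes.
import numpy as np

def harmonic_complex(fs=44100, dur=0.5, f0=100, f_max=10000,
                     split_hz=None, low_phase="cos", high_phase="random", seed=0):
    rng = np.random.default_rng(seed)
    t = np.arange(int(fs * dur)) / fs
    signal = np.zeros_like(t)
    for k in range(1, int(f_max // f0) + 1):
        f = k * f0
        scheme = low_phase if (split_hz is None or f <= split_hz) else high_phase
        phase = 0.0 if scheme == "cos" else rng.uniform(0, 2 * np.pi)
        signal += np.cos(2 * np.pi * f * t + phase)
    return signal / np.abs(signal).max()

cos_complex = harmonic_complex()                                      # COS-like stimulus
split_probe = harmonic_complex(split_hz=2500,
                               low_phase="random", high_phase="cos")  # split-phase probe
```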

20.
There is limited documentation available on how sensorineurally hearing-impaired listeners use the various sources of phonemic information that are known to be distributed across time in the speech waveform. In this investigation, a group of normally hearing listeners and a group of sensorineurally hearing-impaired listeners (with and without the benefit of amplification) identified various consonant and vowel productions that had been systematically varied in duration. The consonants (presented in a /haCa/ environment) and the vowels (presented in a /bVd/ environment) were truncated in steps to eliminate various segments from the end of the stimulus. The results indicated that normally hearing listeners could extract more phonemic information, especially cues to consonant place, from the earlier occurring portions of the stimulus waveforms than could the hearing-impaired listeners. The use of amplification partially decreased the performance differences between the normally hearing listeners and the unaided hearing-impaired listeners. The results are relevant to current models of normal speech perception that emphasize the need for the listener to make phonemic identifications as quickly as possible.
