Similar Documents (20 results)
1.
The addition of low-passed (LP) speech or even a tone following the fundamental frequency (F0) of speech has been shown to benefit speech recognition for cochlear implant (CI) users with residual acoustic hearing. The mechanisms underlying this benefit are still unclear. In this study, eight bimodal subjects (CI users with acoustic hearing in the non-implanted ear) and eight simulated bimodal subjects (using vocoded and LP speech) were tested on vowel and consonant recognition to determine the relative contributions of acoustic and phonetic cues, including F0, to the bimodal benefit. Several listening conditions were tested (CI/Vocoder, LP, T(F0-env), CI/Vocoder + LP, CI/Vocoder + T(F0-env)). Compared with CI/Vocoder performance, LP significantly enhanced both consonant and vowel perception, whereas a tone following the F0 contour of target speech and modulated with an amplitude envelope of the maximum frequency of the F0 contour (T(F0-env)) enhanced only consonant perception. Information transfer analysis revealed a dual mechanism in the bimodal benefit: The tone representing F0 provided voicing and manner information, whereas LP provided additional manner, place, and vowel formant information. The data in actual bimodal subjects also showed that the degree of the bimodal benefit depended on the cutoff and slope of residual acoustic hearing.
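A minimal sketch of how the two acoustic cues tested here could be generated from a target waveform. The 500-Hz cutoff, the frame settings, and the crude autocorrelation F0 tracker are illustrative assumptions, not the paper's actual processing.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def lowpass_speech(x, fs, cutoff=500.0):
    """Zero-phase low-pass filter standing in for residual acoustic hearing."""
    b, a = butter(4, cutoff / (fs / 2), btype="low")
    return filtfilt(b, a, x)

def autocorr_f0(frame, fs, fmin=75.0, fmax=400.0):
    """Crude autocorrelation F0 estimate for one (assumed voiced) frame."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + np.argmax(ac[lo:hi])
    return fs / lag

def t_f0_env(x, fs, frame_len=0.025, hop=0.010):
    """Tone tracking the frame-wise F0, scaled by the frame RMS envelope."""
    n, h = int(frame_len * fs), int(hop * fs)
    f0, env = [], []
    for start in range(0, len(x) - n, h):
        fr = x[start:start + n]
        f0.append(autocorr_f0(fr, fs))
        env.append(np.sqrt(np.mean(fr ** 2)))
    # Interpolate the frame-rate tracks to sample rate, then integrate
    # the instantaneous frequency to get the tone's phase.
    t_frames = (np.arange(len(f0)) * h + n // 2) / fs
    t = np.arange(len(x)) / fs
    f0_t = np.interp(t, t_frames, f0)
    env_t = np.interp(t, t_frames, env)
    phase = 2 * np.pi * np.cumsum(f0_t) / fs
    return env_t * np.sin(phase)
```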

2.
This study examined the perceptual specialization for native-language speech sounds, by comparing native Hindi and English speakers in their perception of a graded set of English /w/-/v/ stimuli that varied in similarity to natural speech. The results demonstrated that language experience does not affect general auditory processes for these types of sounds; there were strong cross-language differences for speech stimuli, and none for stimuli that were nonspeech. However, the cross-language differences extended into a gray area of speech-like stimuli that were difficult to classify, suggesting that the specialization occurred in phonetic processing prior to categorization.

3.
Phonemic and phonetic factors in adult cross-language speech perception
Previous research has indicated that young infants can discriminate speech sounds across phonetic boundaries regardless of specific relevant experience, and that there is a modification in this ability during ontogeny such that adults often have difficulty discriminating phonetic contrasts which are not used contrastively in their native language. This pattern of findings has often been interpreted as suggesting that humans are endowed with innate auditory sensitivities which enable them to discriminate speech sounds according to universal phonetic boundaries and that there is a decline or loss in this ability after being exposed to a language which contrasts only a subset of those distinctions. The present experiments were designed to determine whether this modification represents a loss of sensorineural response capabilities or whether it shows a shift in attentional focus and/or processing strategies. In experiment 1, adult English-speaking subjects were tested on their ability to discriminate two non-English speech contrasts in a category-change discrimination task after first being predisposed to adopt one of four perceptual sets. In experiments 2, 3, and 4 subjects were tested in an AX (same/different) procedure, and the effects of both limited training and duration of the interstimulus interval were assessed. Results suggest that the previously observed ontogenetic modification in the perception of non-native phonetic contrasts involves a change in processing strategies rather than a sensorineural loss. Adult listeners can discriminate sounds across non-native phonetic categories in some testing conditions, but are not able to use that ability in testing conditions which have demands similar to those required in natural language processing.

4.
Accented speech recognition is more challenging than standard speech recognition due to the effects of phonetic and acoustic confusions. Phonetic confusion in accented speech occurs when an expected phone is pronounced as a different one, which leads to erroneous recognition. Acoustic confusion occurs when the pronounced phone is found to lie acoustically between two baseform models and can be equally recognized as either one. We propose that it is necessary to analyze and model these confusions separately in order to improve accented speech recognition without degrading standard speech recognition. Since low phonetic confusion units in accented speech do not give rise to automatic speech recognition errors, we focus on analyzing and reducing phonetic and acoustic confusability under high phonetic confusion conditions. We propose using a likelihood ratio test to measure phonetic confusion, and an asymmetric acoustic distance to measure acoustic confusion. Only accent-specific phonetic units with low acoustic confusion are used in an augmented pronunciation dictionary, while phonetic units with high acoustic confusion are reconstructed using decision tree merging. Experimental results show that our approach is effective and superior to methods modeling phonetic confusion or acoustic confusion alone in accented speech, with a significant 5.7% absolute WER reduction, without degrading standard speech recognition.
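As a rough illustration of the two measures, the sketch below scores phonetic confusion with a log-likelihood ratio between an expected (baseform) and an observed (accented) phone model, and uses the KL divergence between diagonal Gaussians as one possible asymmetric acoustic distance. The single-Gaussian models and all values are stand-ins, not the paper's system.

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood_ratio(frames, model_expected, model_observed):
    """Phonetic confusion score: how much better the observed (accented)
    phone model explains the frames than the expected baseform model."""
    return model_observed.logpdf(frames).sum() - model_expected.logpdf(frames).sum()

def kl_gaussian(mu_p, var_p, mu_q, var_q):
    """KL(p || q) between diagonal Gaussians: note it is asymmetric in p, q."""
    return 0.5 * np.sum(np.log(var_q / var_p)
                        + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

# Example with hypothetical 13-dim MFCC-like frames and two phone models.
rng = np.random.default_rng(0)
mu_e, mu_o, var = rng.normal(size=13), rng.normal(size=13), np.ones(13)
expected = multivariate_normal(mean=mu_e, cov=np.diag(var))
observed = multivariate_normal(mean=mu_o, cov=np.diag(var))
frames = rng.normal(loc=mu_o, scale=1.0, size=(50, 13))
print(log_likelihood_ratio(frames, expected, observed))  # > 0: likely confused
print(kl_gaussian(mu_o, var, mu_e, var))                 # acoustic distance
```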

5.
This paper examines whether correlations between speech perception and speech production exist, and, if so, whether they might provide a way of evaluating different acoustic metrics. The cues listeners use for many phonemic distinctions are not known, often because many different acoustic cues are highly correlated with one another, making it difficult to distinguish among them. Perception-production correlations may provide a new means of doing so. In the present paper, correlations were examined between acoustic measures taken on listeners' perceptual prototypes for a given speech category and on their average production of members of that category. Significant correlations were found for VOT among stop consonants, and for spectral peaks (but not centroids or skewness) for voiceless fricatives. These results suggest that correlations between speech perception and production may provide a methodology for evaluating different proposed acoustic metrics.
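The correlational method itself is simple to state in code. The sketch below pairs a hypothetical per-listener VOT measured on the perceptual prototype with the same listener's mean produced VOT, then correlates across listeners; the numbers are placeholders, not the study's data.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical per-listener VOT (ms): prototype vs. mean produced value.
prototype_vot = np.array([45.0, 52.0, 38.0, 60.0, 48.0, 55.0, 41.0, 57.0])
produced_vot  = np.array([48.0, 55.0, 40.0, 63.0, 46.0, 58.0, 44.0, 60.0])

r, p = pearsonr(prototype_vot, produced_vot)
print(f"r = {r:.2f}, p = {p:.3f}")  # a significant r supports the metric
```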

6.
How are laminar circuits of neocortex organized to generate conscious speech and language percepts? How does the brain restore information that is occluded by noise, or absent from an acoustic signal, by integrating contextual information over many milliseconds to disambiguate noise-occluded acoustical signals? How are speech and language heard in the correct temporal order, despite the influence of contexts that may occur many milliseconds before or after each perceived word? A neural model describes key mechanisms in forming conscious speech percepts, and quantitatively simulates a critical example of contextual disambiguation of speech and language; namely, phonemic restoration. Here, a phoneme deleted from a speech stream is perceptually restored when it is replaced by broadband noise, even when the disambiguating context occurs after the phoneme was presented. The model describes how the laminar circuits within a hierarchy of cortical processing stages may interact to generate a conscious speech percept that is embodied by a resonant wave of activation that occurs between acoustic features, acoustic item chunks, and list chunks. Chunk-mediated gating allows speech to be heard in the correct temporal order, even when what is heard depends upon future context.
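A phonemic-restoration stimulus of the kind simulated here is straightforward to construct: excise a phoneme-length span from the waveform and replace it with level-matched broadband noise. The sketch below assumes known segment boundaries, which in practice come from hand labeling.

```python
import numpy as np

def replace_with_noise(x, fs, t_start, t_end, rng=None):
    """Replace x[t_start:t_end] (seconds) with RMS-matched white noise."""
    rng = rng or np.random.default_rng()
    i0, i1 = int(t_start * fs), int(t_end * fs)
    seg = x[i0:i1]
    noise = rng.standard_normal(len(seg))
    noise *= np.sqrt(np.mean(seg ** 2)) / np.sqrt(np.mean(noise ** 2))
    y = x.copy()
    y[i0:i1] = noise  # silence instead of noise abolishes restoration
    return y
```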

7.
Cochlear implant (CI) users have been shown to benefit from residual low-frequency hearing, specifically in pitch-related tasks. It remains unclear whether this benefit is dependent on fundamental frequency (F0) or other acoustic cues. Three experiments were conducted to determine the role of F0, as well as its frequency modulated (FM) and amplitude modulated (AM) components, in speech recognition with a competing voice. In simulated CI listeners, the signal-to-noise ratio was varied to estimate the 50% correct response. Simulation results showed that the F0 cue contributes to a significant proportion of the benefit seen with combined acoustic and electric hearing, and additionally that this benefit is due to the FM rather than the AM component. In actual CI users, sentence recognition scores were collected with either the full F0 cue containing both the FM and AM components or the 500-Hz low-pass speech cue containing the F0 and additional harmonics. The F0 cue provided a benefit similar to the low-pass cue for speech in noise, but not in quiet. Poorer CI users benefited more from the F0 cue than better users. These findings suggest that F0 is critical to improving speech perception in noise in combined acoustic and electric hearing.
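Given a sample-rate F0 track and amplitude envelope (for example, the interpolated f0_t and env_t inside the sketch under entry 1), the FM and AM components tested here can be isolated roughly as follows; treating AM as the envelope on a fixed mean-F0 carrier is an illustrative simplification, not the paper's exact construction.

```python
import numpy as np

def fm_only_tone(f0_t, fs):
    """Constant-amplitude tone whose instantaneous frequency follows F0."""
    phase = 2 * np.pi * np.cumsum(f0_t) / fs
    return np.sin(phase)

def am_only_tone(f0_t, env_t, fs):
    """Fixed-frequency tone at the mean F0, carrying the amplitude envelope."""
    t = np.arange(len(env_t)) / fs
    return env_t * np.sin(2 * np.pi * np.mean(f0_t) * t)
```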

8.
Acoustic and kinematic analyses, as well as perceptual evaluation, were conducted on the speech of Parkinsonian and normal geriatric adults. As a group, the Parkinsonian speakers had very limited jaw movement compared to the normal geriatrics. For opening gestures, jaw displacements and velocities produced by the Parkinsonian subjects were about half those produced by the normal geriatrics. Lower lip movement amplitude and velocity also were reduced for the Parkinsonian speakers relative to the normal geriatrics, but the magnitude of the reduction was not as great as that seen in the jaw. Lower lip closing velocities expressed as a function of movement amplitude were greater for the Parkinsonian speakers than for the normal geriatrics. This increased velocity of lower lip movement may reflect a difference in the control of lip elevation for the Parkinsonian speakers, an effect that increased with the severity of dysarthria. Acoustically, the Parkinsonian subjects had reduced durations of vocalic segments, reduced formant transitions, and increased voice onset time compared to the normal geriatrics. These effects were greater for the more severe, compared to the milder, dysarthrics and were most apparent in the more complex, vocalic gestures.
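The displacement and velocity measures reported here reduce to simple operations on a sampled articulator trace; the sketch below assumes one pre-segmented opening gesture.

```python
import numpy as np

def opening_gesture_measures(displacement, fs):
    """Return movement amplitude and peak speed for one gesture.
    Units follow the input: mm in gives mm and mm/s out."""
    velocity = np.gradient(displacement, 1.0 / fs)
    amplitude = displacement.max() - displacement.min()
    return amplitude, np.max(np.abs(velocity))
```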

9.
This study examined the impact on speech processing of regional phonetic/phonological variation in the listener's native language. The perception of the /e/-/ɛ/ and /o/-/ɔ/ contrasts, produced by standard but not southern French native speakers, was investigated in these two populations. A repetition priming experiment showed that the latter but not the former perceived words such as /epe/ and /epɛ/ as homophones. In contrast, both groups perceived the two words of /o/-/ɔ/ minimal pairs (/pom/-/pɔm/) as being distinct. Thus, standard-French words can be perceived differently depending on the listener's regional accent.

10.
The perceptual mechanisms of assimilation and contrast in the phonetic perception of vowels were investigated. In experiment 1, 14 stimulus continua were generated using an /i/-/e/-/a/ vowel continuum. They ranged from a continuum with both ends belonging to the same phonemic category in Japanese, to a continuum with both ends belonging to different phonemic categories. The AXB method was employed and the temporal position of X was changed under three conditions. In each condition ten subjects were required to judge whether X was similar to A or to B. The results demonstrated that assimilation to the temporally closer sound occurs if the phonemic categories of A and B are the same and that contrast to the temporally closer sound occurs if A and B belong to different phonemic categories. It was observed that the transition from assimilation to contrast is continuous except in the /i'/-X-/e/ condition. In experiment 2, the total duration of t1 (between A and X) and t2 (between X and B) was changed under five conditions. One stimulus continuum consisted of the same phonemic category in Japanese and the other consisted of different phonemic categories. Six subjects were required to make similarity judgements of X. The results demonstrated that the occurrence of assimilation and contrast to the temporally closer sound seemed to be constant under each of the five conditions. The present findings suggest that assimilation and contrast are determined by three factors: the temporal position of the three stimuli, the acoustic distance between the three stimuli on the stimulus continuum, and the phonemic categories of the three stimuli.

11.
Reiterant speech, or nonsense syllable mimicry, has been proposed as a way to study prosody, particularly syllable and word durations, unconfounded by segmental influences. Researchers have shown that segmental influences on durations can be neutralized in reiterant speech. If it is to be a useful tool in the study of prosody, it must also be shown that reiterant speech preserves the suprasegmental duration and intonation differences relevant to perception. In the present study, syllable durations for nonreiterant and reiterant ambiguous sentences were measured to seek evidence of the duration differences which can enable listeners to resolve surface structure ambiguities in nonreiterant speech. These duration patterns were found in both nonreiterant and reiterant speech. A perceptual study tested listeners' perception of these ambiguous sentences as spoken by four "good" speakers--speakers who neutralized intrinsic duration differences and whose sentences were independently rated by skilled listeners as good imitations of normal speech. The listeners were able to choose the correct interpretation when the ambiguous sentences were in reiterant form as well as they did when the sentences were spoken normally. These results support the notion that reiterant speech is like nonreiterant speech in aspects which are important in the study of prosody.

12.
The present experiments examine the effects of listener age and hearing sensitivity on the ability to understand temporally altered speech in quiet when the proportion of a sentence processed by time compression is varied. Additional conditions in noise investigate whether or not listeners are affected by alterations in the presentation rate of background speech babble, relative to the presentation rate of the target speech signal. Younger and older adults with normal hearing and with mild-to-moderate sensorineural hearing losses served as listeners. Speech stimuli included sentences, syntactic sets, and random-order words. Presentation rate was altered via time compression applied to the entire stimulus or to selected phrases within the stimulus. Older listeners performed more poorly than younger listeners in most conditions involving time compression, and their performance decreased progressively with the proportion of the stimulus that was processed with time compression. Older listeners also performed more poorly than younger listeners in all noise conditions, but both age groups demonstrated better performance in conditions incorporating a mismatch in the presentation rate between target signal and background babble compared to conditions with matched rates. The age effects in quiet are consistent with the generalized slowing hypothesis of aging. Performance patterns in noise tentatively support the notion that altered rates of speech signal and background babble may provide a cue to enhance auditory figure-ground perception by both younger and older listeners.
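Applying time compression to selected phrases rather than the whole stimulus can be sketched with an off-the-shelf phase vocoder; the 2x rate and the phrase boundaries below are illustrative, and librosa's time_stretch stands in for whatever processor the study actually used.

```python
import numpy as np
import librosa

def compress_phrase(y, sr, t_start, t_end, rate=2.0):
    """Time-compress y[t_start:t_end] (seconds) by `rate` (>1 = faster),
    leaving the rest of the stimulus at the original rate."""
    i0, i1 = int(t_start * sr), int(t_end * sr)
    middle = librosa.effects.time_stretch(y[i0:i1], rate=rate)
    return np.concatenate([y[:i0], middle, y[i1:]])
```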

13.
In a series of experiments, a variant of duplex perception was investigated. In its original form, duplex perception is created by presenting an isolated transition to one ear and the remainder of the syllable, the standard base, to the other ear. Listeners hear a chirp at the ear receiving the isolated transition, and a full syllable at the ear receiving the base. The new version of duplex perception was created by presenting a third-formant transition in isolation to one ear and the same transition electronically mixed with the base to the other ear; the modified base now has all the information necessary for syllabic perception. With the new procedure, listeners reported hearing a chirp centered in the middle of their head and a syllable in the ear presented with the modified base that was clearer than that produced by the isolated transition and standard base. They could also reliably choose the patterns that contained the additional transition in the base when attending to either the phonetic or nonphonetic sides of the duplex percept. In addition, when the fundamental frequency, onset time, and intensity of the isolated third-formant transition were varied relative to the base, the phonetic and nonphonetic (lateralization) percepts were differentially affected, although not always reliably. In general, nonphonetic fusion was more affected by large differences in these variables than was phonetic fusion. However, when two isolated third-formant transitions were presented dichotically, fusion and the resulting central location of the chirp failed markedly with relatively small differences in each variable. The results were discussed in terms of the role of fusion in the new version of duplex perception and the nature of the information that undergoes both phonetic and nonphonetic fusion.
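The two dichotic configurations compared here amount to different channel assignments of the same components. A minimal sketch, assuming the isolated third-formant transition and the base are mono waveforms at the same sample rate (channel assignment is arbitrary):

```python
import numpy as np

def standard_duplex(transition, base):
    """Original paradigm: transition to one ear, base to the other."""
    n = max(len(transition), len(base))
    left = np.pad(transition, (0, n - len(transition)))
    right = np.pad(base, (0, n - len(base)))
    return np.stack([left, right], axis=1)  # (n, 2) stereo array

def modified_duplex(transition, base):
    """New paradigm: transition to one ear, transition mixed into the
    base (the modified base) to the other."""
    n = max(len(transition), len(base))
    tr = np.pad(transition, (0, n - len(transition)))
    bs = np.pad(base, (0, n - len(base)))
    return np.stack([tr, tr + bs], axis=1)
```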

14.
Herein investigated are computationally simple microphone-array beamformers that are independent of the frequency-spectra of all signals, all interference, and all noises. These beamformers allow the listener to tune the desired azimuth-elevation "look direction." No prior information is needed of the interference. These beamformers deploy a physically compact triad of three collocated but orthogonally oriented velocity sensors. These proposed schemes' efficacy is verified by a jury test, using simulated data constructed with Mandarin Chinese (a.k.a. Putonghua) speech samples. For example, a desired speech signal, originally at a very adverse signal-to-interference-and-noise power ratio (SINR) of -30 dB, may be processed to become fully intelligible to the jury.
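The frequency-invariance follows from the physics of a collocated velocity-sensor triad: a plane wave projects onto the three orthogonal channels by its direction cosines, so an inner product with the look-direction unit vector steers the beam with no frequency-dependent weights. The sketch below illustrates that generic idea, not the paper's exact scheme.

```python
import numpy as np

def look_vector(az, el):
    """Unit vector toward azimuth az / elevation el (radians)."""
    return np.array([np.cos(el) * np.cos(az),
                     np.cos(el) * np.sin(az),
                     np.sin(el)])

def triad_beamform(x_triad, az, el):
    """x_triad: (n_samples, 3) velocity-sensor signals -> (n_samples,)."""
    return x_triad @ look_vector(az, el)

# Toy check: an on-axis source passes at full gain; an orthogonal one nulls.
s = np.random.default_rng(1).standard_normal(1000)
on_axis = np.outer(s, look_vector(0.0, 0.0))
ortho = np.outer(s, look_vector(np.pi / 2, 0.0))
print(np.allclose(triad_beamform(on_axis, 0.0, 0.0), s))   # True
print(np.allclose(triad_beamform(ortho, 0.0, 0.0), 0.0))   # True
```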

15.
This paper investigates the mechanisms controlling the phonemic quantity contrast and speech rate in nonsense p1Np2a words read by five Slovak speakers in normal and fast speech rate. N represents a syllable nucleus, which in Slovak corresponds to long and short vowels and liquid consonants. The movements of the lips and the tongue were recorded with an electromagnetometry system. Together with the acoustic durations of p1, N, and p2, gestural characteristics of three core movements were extracted: p1 lip opening, tongue movement for the nucleus (N), and p2 lip closing. The results show that, although consonantal and vocalic nuclei are predictably different on many kinematic measures, their common phonological behavior as syllabic nuclei may be linked to a stable temporal coordination of the consonantal gestures flanking the nucleus. The functional contrast between phonemic duration and speech rate was reflected in the bias in the control mechanisms they employed: the strategies robustly used for signaling phonemic duration, such as the degree of coproduction of the two lip movements, showed a minimal effect of speech rate, while measures greatly affected by speech rate, such as p2 acoustic duration, or the degree of p1-N gestural coproduction, tended to be minimally influenced by phonemic quantity.

16.
Studies with adults have demonstrated that acoustic cues cohere in speech perception such that two stimuli cannot be discriminated if separate cues bias responses equally, but oppositely, in each. This study examined whether this kind of coherence exists for children's perception of speech signals, a test that first required that a contrast be found for which adults and children show similar cue weightings. Accordingly, experiment 1 demonstrated that adults, 7-, and 5-year-olds weight F2-onset frequency and gap duration similarly in "spa" versus "sa" decisions. In experiment 2, listeners of these same ages made "same" or "not-the-same" judgments for pairs of stimuli in an AX paradigm when only one cue differed, when the two cues were set within a stimulus to bias the phonetic percept towards the same category (relative to the other stimulus in the pair), and when the two cues were set within a stimulus to bias the phonetic percept towards different categories. Unexpectedly, adults' results contradicted earlier studies: They were able to discriminate stimuli when the two cues conflicted in how they biased phonetic percepts. Results for 7-year-olds replicated those of adults, but were not as strong. Only the results of 5-year-olds revealed the kind of perceptual coherence reported by earlier studies for adults. Thus, it is concluded that perceptual coherence for speech signals is present from an early age, and in fact listeners learn to overcome it under certain conditions.

17.
Two experiments were conducted to investigate whether or not anchoring and selective adaptation induce basically the same psychological effects. The purpose of the first experiment is to show how an audiovisual anchor modifies the perception of consonant-vowel (CV) syllables. The anchors were two purely acoustical, two purely optical, and three audiovisual CV syllables. The results were compared with those of audiovisual speech selective-adaptation experiments conducted by Roberts and Summerfield [Percept. Psychophys. 30, 309-314 (1981)] and Saldaña and Rosenblum [J. Acoust. Soc. Am. 95, 3658-3661 (1994)]. The audiovisual anchoring effects were found to be very similar to the audiovisual selective-adaptation effects, but the incompatible audiovisual anchor produced more auditory-based contrast than the purely acoustical anchor or the compatible audiovisual anchor. This difference in contrast had not been found in the previous selective-adaptation experiments. The second experiment was conducted to directly compare audiovisual anchoring and selective-adaptation effects under the same stimuli and with the same subjects. It was found that the compatible audiovisual syllable (AbVb) caused more contrast in selective adaptation than in anchoring, although the discrepant audiovisual syllable (AbVg) caused no difference between anchoring and selective adaptation. It was also found that the anchor AbVg caused more auditory-based contrast than the anchor AbVb. It is suggested that the mechanisms behind these results are different.

18.
Integral processing of phonemes: evidence for a phonetic mode of perception
To investigate the extent and locus of integral processing in speech perception, a speeded classification task was utilized with a set of noise-tone analogs of the fricative-vowel syllables /fæ/, /ʃæ/, /fu/, and /ʃu/. Unlike the stimuli used in previous studies of selective perception of syllables, these stimuli did not contain consonant-vowel transitions. Subjects were asked to classify on the basis of one of the two syllable components. Some subjects were told that the stimuli were computer generated noise-tone sequences. These subjects processed the noise and tone separably. Irrelevant variation of the noise did not affect reaction times (RTs) for the classification of the tone, and vice versa. Other subjects were instructed to treat the stimuli as speech. For these subjects, irrelevant variation of the fricative increased RTs for the classification of the vowel, and vice versa. A second experiment employed naturally spoken fricative-vowel syllables with the same task. Classification RTs showed a pattern of integrality in that irrelevant variation of either component increased RTs to the other. These results indicate that knowledge of coarticulation (or its acoustic consequences) is a basic element of speech perception. Furthermore, the use of this knowledge in phonetic coding is mandatory, even in situations where the stimuli do not contain coarticulatory information.

19.
Three experiments were conducted to study relative contributions of speaking rate, temporal envelope, and temporal fine structure to clear speech perception. Experiment I used uniform time scaling to match the speaking rate between clear and conversational speech. Experiment II decreased the speaking rate in conversational speech without processing artifacts by increasing silent gaps between phonetic segments. Experiment III created "auditory chimeras" by mixing the temporal envelope of clear speech with the fine structure of conversational speech, and vice versa. Speech intelligibility in normal-hearing listeners was measured over a wide range of signal-to-noise ratios to derive speech reception thresholds (SRT). The results showed that processing artifacts in uniform time scaling, particularly time compression, reduced speech intelligibility. Inserting gaps in conversational speech improved the SRT by 1.3 dB, but this improvement might be a result of increased short-term signal-to-noise ratios during level normalization. Data from auditory chimeras indicated that the temporal envelope cue contributed more to the clear speech advantage at high signal-to-noise ratios, whereas the temporal fine structure cue contributed more at low signal-to-noise ratios. Taken together, these results suggest that acoustic cues for the clear speech advantage are multiple and distributed.
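The chimera construction can be sketched with the Hilbert transform: take the envelope of one signal and the fine structure of the other. Published chimeras do this within multiple frequency bands; the single-band version below is a simplification for illustration.

```python
import numpy as np
from scipy.signal import hilbert

def chimera(env_source, fine_source):
    """Envelope of env_source imposed on the fine structure of fine_source."""
    n = min(len(env_source), len(fine_source))
    env = np.abs(hilbert(env_source[:n]))             # temporal envelope
    fine = np.cos(np.angle(hilbert(fine_source[:n]))) # temporal fine structure
    return env * fine

# The two chimeras contrasted in Experiment III, assuming equal-rate waveforms:
# clear_env_conv_fine = chimera(clear_speech, conversational_speech)
# conv_env_clear_fine = chimera(conversational_speech, clear_speech)
```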

20.
Monolingual Peruvian Spanish listeners identified natural tokens of the Canadian French (CF) and Canadian English (CE) /ɛ/ and /æ/, produced in five consonantal contexts. The results demonstrate that while the CF vowels were mapped to two different native vowels, /e/ and /a/, in all consonantal contexts, the CE contrast was mapped to the single native vowel /a/ in four out of five contexts. Linear discriminant analysis revealed that acoustic similarity between native and target language vowels was a very good predictor of context-specific perceptual mappings. Predictions are made for Spanish learners of the /ɛ/-/æ/ contrast in CF and CE.
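The discriminant-analysis step can be sketched as follows: fit LDA on native-vowel acoustic measurements, then classify target-language tokens to predict which native category each maps to. The F1/F2 values below are rough illustrative figures, not the study's measurements.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Hypothetical native Spanish /e/ and /a/ clusters in F1/F2 space (Hz).
native_e = rng.normal([450, 2000], [40, 120], size=(30, 2))
native_a = rng.normal([750, 1300], [40, 120], size=(30, 2))
X = np.vstack([native_e, native_a])
y = ["e"] * 30 + ["a"] * 30

lda = LinearDiscriminantAnalysis().fit(X, y)
# A hypothetical target-language token: high F1 pulls it toward native /a/.
print(lda.predict([[700, 1500]]), lda.predict_proba([[700, 1500]]))
```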
