Similar Documents
20 similar documents found (search time: 0 ms)
1.
A speech enhancement method based on auditory modeling   (Cited by 2: 0 self-citations, 2 by others)
By analyzing how the auditory system extracts signals, this paper proposes the concept of dynamic multiple thresholds suited to signal extraction, and on that basis presents a speech enhancement method. Experimental results show that, compared with conventional speech enhancement methods, the auditory-modeling approach achieves a better enhancement effect.

2.
3.
The phonetic identification ability of an individual (SS) who exhibits the best, or equal to the best, speech understanding of patients using the Symbion four-channel cochlear implant is described. It has been found that SS: (1) can use aspects of signal duration to form categories that are isomorphic with the phonetic categories established by listeners with normal auditory function; (2) can combine temporal and spectral cues in a normal fashion to form categories; (3) can use aspects of fricative noises to form categories that correspond to normal phonetic categories; (4) uses information from both F1 and higher formants in vowel identification; and (5) appears to identify stop consonant place of articulation on the basis of information provided by the center frequency of the burst and by the abruptness of frequency change following signal onset. SS has difficulty identifying stop consonants from the information provided by formant transitions and cannot differentially identify signals that have identical F1's and relatively low-frequency F2's. SS's performance suggests that simple speech processing strategies (filtering of the signal into four bands) and monopolar electrode design are viable options in the design of cochlear prostheses.

4.
Cochlear implant (CI) users in tone language environments report great difficulty in perceiving lexical tone. This study investigated the augmentation of simulated cochlear implant audio by visual (facial) speech information for tone. Native speakers of Mandarin and Australian English were asked to discriminate between minimal pairs of Mandarin tones in five conditions: Auditory-Only, Auditory-Visual, CI-simulated Auditory-Only, CI-simulated Auditory-Visual, and Visual-Only (silent video). Discrimination in CI-simulated audio conditions was poor compared with normal audio, and varied according to tone pair, with tone pairs with strong non-F0 cues discriminated the most easily. The availability of visual speech information also improved discrimination in the CI-simulated audio conditions, particularly on tone pairs with strong durational cues. In the silent Visual-Only condition, both Mandarin and Australian English speakers discriminated tones above chance levels. Interestingly, tone-naïve listeners outperformed native listeners in the Visual-Only condition, suggesting firstly that visual speech information for tone is available, and may in fact be under-used by normal-hearing tone language perceivers, and secondly that the perception of such information may be language-general, rather than the product of language-specific learning. This may find application in the development of methods to improve tone perception in CI users in tone language environments.

5.
Previous research has identified a "synchrony window" of several hundred milliseconds over which auditory-visual (AV) asynchronies are not reliably perceived. Individual variability in the size of this AV synchrony window has been linked with variability in AV speech perception measures, but it was not clear whether AV speech perception measures are related to synchrony detection for speech only or for both speech and nonspeech signals. An experiment was conducted to investigate the relationship between measures of AV speech perception and AV synchrony detection for speech and nonspeech signals. Variability in AV synchrony detection for both speech and nonspeech signals was found to be related to variability in measures of auditory-only (A-only) and AV speech perception, suggesting that temporal processing for both speech and nonspeech signals must be taken into account in explaining variability in A-only and multisensory speech perception.

6.
The hypothesis was investigated that selectively increasing the discrimination of low-frequency information (below 2600 Hz) by altering the frequency-to-electrode allocation would improve speech perception by cochlear implantees. Two experimental conditions were compared, both utilizing ten electrode positions selected based on maximal discrimination. A fixed frequency range (200-10513 Hz) was allocated either relatively evenly across the ten electrodes, or so that nine of the ten positions were allocated to the frequencies up to 2600 Hz. Two additional conditions utilizing all available electrode positions (15-18 electrodes) were assessed: one with each subject's usual frequency-to-electrode allocation; and the other using the same analysis filters as the other experimental conditions. Seven users of the Nucleus CI22 implant wore processors mapped with each experimental condition for 2-week periods away from the laboratory, followed by assessment of perception of words in quiet and sentences in noise. Performance with both ten-electrode maps was significantly poorer than with both full-electrode maps on at least one measure. Performance with the map allocating nine out of ten electrodes to low frequencies was equivalent to that with the full-electrode maps for vowel perception and sentences in noise, but was worse for consonant perception. Performance with the evenly allocated ten-electrode map was equivalent to that with the full-electrode maps for consonant perception, but worse for vowel perception and sentences in noise. Comparison of the two full-electrode maps showed that subjects could fully adapt to frequency shifts up to ratio changes of 1.3, given 2 weeks' experience. Future research is needed to investigate whether speech perception may be improved by the manipulation of frequency-to-electrode allocation in maps which have a full complement of electrodes in Nucleus implants.
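The two ten-electrode allocations in this entry can be sketched numerically. The band edges below are a hypothetical reconstruction assuming logarithmic spacing (the abstract does not state the spacing rule); only the overall range (200-10513 Hz), the electrode count, and the 2600-Hz boundary come from the text.

```python
import numpy as np

def log_edges(f_lo, f_hi, n_bands):
    # n_bands + 1 logarithmically spaced band edges between f_lo and f_hi
    return np.logspace(np.log10(f_lo), np.log10(f_hi), n_bands + 1)

# Even allocation: 200-10513 Hz spread across all ten electrodes.
even = log_edges(200.0, 10513.0, 10)

# Low-frequency-weighted allocation: nine electrodes cover 200-2600 Hz,
# and the remaining electrode takes everything up to 10513 Hz.
weighted = np.concatenate([log_edges(200.0, 2600.0, 9), [10513.0]])
```

Printing `np.round(even)` and `np.round(weighted)` makes the trade-off visible: the weighted map buys much narrower low-frequency bands at the cost of one very wide high-frequency band.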

7.
The extent to which context influences speech categorization can inform theories of pre-lexical speech perception. Across three conditions, listeners categorized speech targets preceded by speech context syllables. These syllables were presented as the sole context or paired with nonspeech tone contexts previously shown to affect speech categorization. Listeners' context-dependent categorization across these conditions provides evidence that speech and nonspeech context stimuli jointly influence speech processing. Specifically, when the spectral characteristics of speech and nonspeech context stimuli are mismatched such that they are expected to produce opposing effects on speech categorization the influence of nonspeech contexts may undermine, or even reverse, the expected effect of adjacent speech context. Likewise, when spectrally matched, the cross-class contexts may collaborate to increase effects of context. Similar effects are observed even when natural speech syllables, matched in source to the speech categorization targets, serve as the speech contexts. Results are well-predicted by spectral characteristics of the context stimuli.

8.
On the role of spectral transition for speech perception   (Cited by 2: 0 self-citations, 2 by others)
This paper examines the relationship between dynamic spectral features and the identification of Japanese syllables modified by initial and/or final truncation. The experiments confirm several main points. "Perceptual critical points," where the percent correct identification of the truncated syllable as a function of the truncation position changes abruptly, are related to maximum spectral transition positions. A speech wave of approximately 10 ms in duration that includes the maximum spectral transition position bears the most important information for consonant and syllable perception. Consonant and vowel identification scores simultaneously change as a function of the truncation position in the short period, including the 10-ms period for final truncation. This suggests that crucial information for both vowel and consonant identification is contained across the same initial part of each syllable. The spectral transition is more crucial than unvoiced and buzz bar periods for consonant (syllable) perception, although the latter features are of some perceptual importance. Also, vowel nuclei are not necessary for either vowel or syllable perception.

9.
This study measures the ability of observers to compare the intensities of two stimuli occupying different frequency regions. It includes three experiments, each experiment having two conditions. In one condition, the two stimuli to be compared were presented simultaneously within each interval; this condition has been called profile analysis. In the other condition, the two stimuli were presented successively within each interval. Because the overall level of the stimuli was randomized between intervals, the observers were encouraged to compare the intensities of the two stimuli within each observation interval rather than between intervals. The stimuli were two simple tones in experiment 1 and two tonal complexes in both experiments 2 and 3. The stimuli used in experiments 2 and 3 differed in frequency. The results show that simultaneous comparisons are superior to successive comparisons. For simple tones, the difference in threshold is about 8 dB; for complexes with 10 to 11 components, the difference in threshold is about 15 dB. These differences can be explained by assuming that internal noises in different channels were partially correlated when stimuli in those channels were presented simultaneously and were independent when the stimuli were presented successively. Cancellation of the correlated noise is therefore possible with simultaneous comparisons, making such discrimination better than that achievable with successive comparisons.

10.
This study examined the ability of six-month-old infants to recognize the perceptual similarity of syllables sharing a phonetic segment when variations were introduced in phonetic environment and talker. Infants in a "phonetic" group were visually reinforced for head turns when a change occurred from a background category of labial nasals to a comparison category of alveolar nasals. The infants were initially trained on a [ma]-[na] contrast produced by a male talker. Novel tokens differing in vowel environment and talker were introduced over several stages of increasing complexity. In the most complex stage infants were required to make a head turn when a change occurred from [ma,mi,mu] to [na,ni,nu], with the tokens in each category produced by both male and female talkers. A "nonphonetic" control group was tested using the same pool of stimuli as the phonetic condition. The only difference was that the stimuli in the background and comparison categories were chosen in such a way that the sounds could not be organized by acoustic or phonetic characteristics. Infants in the phonetic group transferred training to novel tokens produced by different talkers and in different vowel contexts. However, infants in the nonphonetic control group had difficulty learning the phonetically unrelated tokens that were introduced as the experiment progressed. These findings suggest that infants recognize the similarity of nasal consonants sharing place of articulation independent of variation in talker and vowel context.

11.
Tone languages differ from English in that the pitch pattern of a single-syllable word conveys lexical meaning. In the present study, dependence of tonal-speech perception on features of the stimulation was examined using an acoustic simulation of a CIS-type speech-processing strategy for cochlear prostheses. Contributions of spectral features of the speech signals were assessed by varying the number of filter bands, while contributions of temporal envelope features were assessed by varying the low-pass cutoff frequency used for extracting the amplitude envelopes. Ten normal-hearing native Mandarin Chinese speakers were tested. When the low-pass cutoff frequency was fixed at 512 Hz, consonant, vowel, and sentence recognition improved as a function of the number of channels and reached a plateau at 4 to 6 channels. Subjective judgments of sound quality continued to improve as the number of channels increased to 12, the highest number tested. Tone recognition, i.e., recognition of the four Mandarin tone patterns, depended on both the number of channels and the low-pass cutoff frequency. The trade-off between the temporal and spectral cues for tone recognition indicates that temporal cues can compensate for diminished spectral cues for tone recognition and vice versa. An additional tone recognition experiment using syllables of equal duration showed a marked decrease in performance, indicating that duration cues contribute to tone recognition. A third experiment showed that recognition of processed FM patterns that mimic Mandarin tone patterns was poor when temporal envelope and duration cues were removed.
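The acoustic simulation described in this entry is a noise-excited vocoder: filter the speech into N bands, extract each band's amplitude envelope with a low-pass filter, and use the envelopes to modulate band-limited noise. A minimal sketch in Python follows; the function name, filter orders, and logarithmic band spacing are illustrative assumptions, not details from the study.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def cis_simulation(x, fs, n_channels=4, env_cutoff=512.0,
                   f_lo=200.0, f_hi=7000.0):
    """Noise-excited vocoder simulation of a CIS-type strategy."""
    edges = np.logspace(np.log10(f_lo), np.log10(f_hi), n_channels + 1)
    rng = np.random.default_rng(0)
    out = np.zeros_like(x)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_sos = butter(4, [lo, hi], btype='bandpass', fs=fs, output='sos')
        band = sosfiltfilt(band_sos, x)
        # Envelope: full-wave rectification followed by low-pass filtering;
        # env_cutoff controls how much temporal (e.g., F0) detail survives.
        env_sos = butter(4, env_cutoff, btype='lowpass', fs=fs, output='sos')
        env = np.maximum(sosfiltfilt(env_sos, np.abs(band)), 0.0)
        # Modulate noise with the envelope and restrict it to the band.
        out += sosfiltfilt(band_sos, env * rng.standard_normal(len(x)))
    return out
```

Varying `n_channels` probes the spectral cue and varying `env_cutoff` probes the temporal envelope cue, mirroring the two manipulations in the study.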

12.
The application of the ideal binary mask to an auditory mixture has been shown to yield substantial improvements in intelligibility. This mask is commonly applied to the time-frequency (T-F) representation of a mixture signal and eliminates portions of a signal below a signal-to-noise-ratio (SNR) threshold while allowing others to pass through intact. The factors influencing intelligibility of ideal binary-masked speech are not well understood and are examined in the present study. Specifically, the effects of the local SNR threshold, input SNR level, masker type, and errors introduced in estimating the ideal mask are examined. Consistent with previous studies, intelligibility of binary-masked stimuli is quite high even at -10 dB SNR for all maskers tested. Performance was affected the most when the masker dominated T-F units were wrongly labeled as target-dominated T-F units. Performance plateaued near 100% correct for SNR thresholds ranging from -20 to 5 dB. The existence of the plateau region suggests that it is the pattern of the ideal binary mask that matters the most rather than the local SNR of each T-F unit. This pattern directs the listener's attention to where the target is and enables them to segregate speech effectively in multitalker environments.
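The ideal binary mask in this entry is straightforward to state in code: compute the local SNR in each time-frequency unit from the separately known target and masker, keep units above the threshold, and zero the rest. A sketch assuming an STFT front end (the function name and STFT parameters are illustrative):

```python
import numpy as np
from scipy.signal import stft, istft

def ideal_binary_mask(target, masker, fs, snr_thresh_db=0.0, nperseg=256):
    """Apply the ideal binary mask to the target + masker mixture."""
    _, _, T = stft(target, fs=fs, nperseg=nperseg)
    _, _, M = stft(masker, fs=fs, nperseg=nperseg)
    eps = 1e-12  # avoid log of zero in silent T-F units
    local_snr = 10.0 * np.log10((np.abs(T) ** 2 + eps) /
                                (np.abs(M) ** 2 + eps))
    mask = (local_snr > snr_thresh_db).astype(float)  # 1 keeps the unit
    _, _, Y = stft(target + masker, fs=fs, nperseg=nperseg)
    _, y = istft(mask * Y, fs=fs, nperseg=nperseg)
    return y, mask
```

The mask-estimation error the study found most damaging corresponds here to spurious 1s: masker-dominated units wrongly left in `mask`.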

13.
14.
This paper examines whether correlations between speech perception and speech production exist, and, if so, whether they might provide a way of evaluating different acoustic metrics. The cues listeners use for many phonemic distinctions are not known, often because many different acoustic cues are highly correlated with one another, making it difficult to distinguish among them. Perception-production correlations may provide a new means of doing so. In the present paper, correlations were examined between acoustic measures taken on listeners' perceptual prototypes for a given speech category and on their average production of members of that category. Significant correlations were found for VOT among stop consonants, and for spectral peaks (but not centroids or skewness) for voiceless fricatives. These results suggest that correlations between speech perception and production may provide a methodology for evaluating different proposed acoustic metrics.

15.
Many studies have noted great variability in speech perception ability among postlingually deafened adults with cochlear implants. This study examined phoneme misperceptions for 30 cochlear implant listeners using either the Nucleus-22 or Clarion version 1.2 device to examine whether listeners with better overall speech perception differed qualitatively from poorer listeners in their perception of vowel and consonant features. In the first analysis, simple regressions were used to predict the mean percent-correct scores for consonants and vowels for the better group of listeners from those of the poorer group. A strong relationship between the two groups was found for consonant identification, and a weak, nonsignificant relationship was found for vowel identification. In the second analysis, it was found that less information was transmitted for consonant and vowel features to the poorer listeners than to the better listeners; however, the pattern of information transmission was similar across groups. Taken together, results suggest that the performance difference between the two groups is primarily quantitative. The results underscore the importance of examining individuals' perception of individual phoneme features when attempting to relate speech perception to other predictor variables.
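The feature-information analysis in this entry is conventionally computed as Miller-Nicely relative information transmitted: the mutual information between stimulus and response feature categories in a confusion matrix, normalized by the stimulus entropy. A sketch, with an illustrative function name:

```python
import numpy as np

def relative_info_transmitted(confusions):
    """confusions[i, j]: count of feature category i heard as category j."""
    p = confusions / confusions.sum()  # joint probabilities
    pi = p.sum(axis=1)                 # stimulus marginals
    pj = p.sum(axis=0)                 # response marginals
    nz = p > 0
    # Mutual information between stimulus and response, in bits.
    mi = np.sum(p[nz] * np.log2(p[nz] / np.outer(pi, pj)[nz]))
    # Stimulus entropy, in bits.
    hx = -np.sum(pi[pi > 0] * np.log2(pi[pi > 0]))
    return mi / hx                     # 1 = perfect transmission, 0 = chance
```

As a sanity check, an identity confusion matrix gives 1.0 and a uniform one gives 0.0; applying this per feature (e.g., place, manner, voicing) yields the kind of transmission pattern the study compares across listener groups.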

16.
Addressing sparse representation and compressed sensing of speech signals, this paper introduces auditory perception into the coefficient-selection process, using the masking threshold to select significant coefficients and obtain a sparse representation of speech that better matches auditory perception. A comparison experiment on a frame of voiced speech, selecting coefficients by masking threshold versus energy threshold, shows that the masking-threshold method yields a better sparse representation. To verify the effect of auditory perception on speech compressed-sensing performance, test speech was measured and reconstructed by compressed sensing, with the energy-threshold method as a control, and performance was evaluated by objective and subjective measures including compression ratio, SNR, and mean opinion score. The results show that the masking-threshold method effectively increases the compression ratio while preserving high subjective quality in the reconstructed speech.
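This entry compares masking-threshold and energy-threshold rules for selecting significant transform coefficients. The psychoacoustic masking-threshold computation is too involved for a short sketch, but the energy-threshold baseline can be: keep the smallest set of DCT coefficients that captures a fixed fraction of the frame energy. The DCT basis, the energy fraction, and the function name are illustrative assumptions.

```python
import numpy as np
from scipy.fft import dct, idct

def sparsify_by_energy(frame, keep_fraction=0.99):
    """Keep the fewest DCT coefficients holding keep_fraction of the energy."""
    c = dct(frame, norm='ortho')
    order = np.argsort(np.abs(c))[::-1]  # largest-magnitude first
    cum_energy = np.cumsum(c[order] ** 2)
    k = int(np.searchsorted(cum_energy, keep_fraction * cum_energy[-1])) + 1
    c_sparse = np.zeros_like(c)
    c_sparse[order[:k]] = c[order[:k]]
    rec = idct(c_sparse, norm='ortho')
    snr_db = 10.0 * np.log10(np.sum(frame ** 2) /
                             (np.sum((frame - rec) ** 2) + 1e-12))
    return rec, k, snr_db
```

For a voiced frame the spectrum is concentrated on a few harmonics, so `k` is small relative to the frame length — the sparsity that compressed-sensing reconstruction relies on. The masking-threshold rule in the paper replaces this energy ranking with a perceptual one.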

17.
Twenty-one sensorineurally hearing-impaired adolescents were studied with an extensive battery of tone-perception, phoneme-perception, and speech-perception tests. Tests on loudness perception, frequency selectivity, and temporal resolution at the test frequencies of 500, 1000, and 2000 Hz were included. The mean values and the gradient across frequencies were used in further analysis. Phoneme-perception data were gathered by means of similarity judgments and phonemic confusions. Speech-reception thresholds were determined in quiet and in noise for unfiltered speech material, and with additional low-pass and high-pass filtering in noise. The results show that hearing loss for speech is related to both the frequency resolving power and temporal processing by the ear. Phoneme-perception parameters proved to be more related to the filtered-speech thresholds than to the thresholds for unfiltered speech. This finding may indicate that phoneme-perception parameters play only a secondary role, and for that reason their bridging function between tone perception and speech perception is only limited.

18.
19.
20.
Complex tone bursts were bandpass filtered, 22nd-30th harmonic, to produce waveforms with five regularly occurring envelope peaks ("pitch pulses") that evoked pitches associated with their repetition period. Two such tone bursts were presented sequentially and separated by an interpulse interval (IPI). When the IPI was varied, the pitch of the whole sequence was shifted by between +2% and -5%. When the IPI was greater than one period, little effect was seen. This is consistent with a pitch mechanism employing a long integration time for continuous stimuli that resets in response to temporal discontinuities of greater than about one period of the waveform. Similar pitch shifts were observed for fundamental frequencies from 100 to 250 Hz. The pitch shifts depended on the IPI duration relative to the period of the complex, not on the absolute IPI duration. The pitch shifts are inconsistent with the autocorrelation model of Meddis and O'Mard [J. Acoust. Soc. Am. 102, 1811-1820 (1997)], although a modified version of the weighted mean-interval model of Carlyon et al. [J. Acoust. Soc. Am. 112, 621-633 (2002)] was successful. The pitch shifts suggest that, when two pulses occur close together, one of the pulses is ignored on a probabilistic basis.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号