Similar documents
20 similar documents retrieved.
1.
Mathematical treatment of context effects in phoneme and word recognition (total citations: 2; self-citations: 0; citations by others: 2)
Percent recognition of phonemes and whole syllables, measured in both consonant-vowel-consonant (CVC) words and CVC nonsense syllables, is reported for normal young adults listening at four signal-to-noise (S/N) ratios. Similar data are reported for the recognition of words and whole sentences in three types of sentence: high predictability (HP) sentences, with both semantic and syntactic constraints; low predictability (LP) sentences, with primarily syntactic constraints; and zero predictability (ZP) sentences, with neither semantic nor syntactic constraints. The probability of recognition of speech units in context (pc) is shown to be related to the probability of recognition without context (pi) by the equation pc = 1 - (1 - pi)^k, where k is a constant. The factor k is interpreted as the amount by which the channels of statistically independent information are effectively multiplied when contextual constraints are added. Empirical values of k are approximately 1.3 and 2.7 for word and sentence context, respectively. In a second analysis, the probability of recognition of wholes (pw) is shown to be related to the probability of recognition of the constituent parts (pp) by the equation pw = pp^j, where j represents the effective number of statistically independent parts within a whole. The empirically determined mean values of j for nonsense materials are not significantly different from the number of parts in a whole, as predicted by the underlying theory. In CVC words, the value of j is constant at approximately 2.5. In the four-word HP sentences, it falls from approximately 2.5 to approximately 1.6 as the inherent recognition probability for words falls from 100% to 0%, demonstrating an increasing tendency to perceive HP sentences either as wholes, or not at all, as the S/N ratio deteriorates.
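The two equations in the abstract above are straightforward to evaluate numerically. The following is a minimal sketch; the function names and the sample probabilities are illustrative assumptions, not values from the study:

```python
def p_context(p_i: float, k: float) -> float:
    """Recognition probability in context: pc = 1 - (1 - pi)**k."""
    return 1.0 - (1.0 - p_i) ** k

def p_whole(p_p: float, j: float) -> float:
    """Whole-unit recognition from part recognition: pw = pp**j."""
    return p_p ** j

# With k ~ 1.3 (word context), a 60% out-of-context phoneme score
# predicts a higher in-context score; with j ~ 2.5 (CVC words), an 80%
# phoneme score predicts a lower whole-word score.
in_context = p_context(0.60, 1.3)   # hypothetical scores
whole_word = p_whole(0.80, 2.5)
```

Note that k > 1 always predicts a context benefit (in_context > 0.60 here), and j > 1 always predicts that wholes are harder than their parts (whole_word < 0.80).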

2.
The ability to integrate information across sensory channels is critical for both within- and between-modality speech processing. The present study evaluated the hypothesis that inter- and intramodal integration abilities are related, in young and older adults. Further, the investigation asked if intramodal integration (auditory+auditory) and intermodal integration (auditory+visual) resist changes as a function of either aging or the presence of hearing loss. Three groups of adults (young with normal hearing, older with normal hearing, and older with hearing loss) were asked to identify words in sentence context. Intramodal integration ability was assessed by presenting disjoint passbands of speech (550-750 and 1650-2250 Hz) to either ear. Integration was indexed by factoring monotic from dichotic scores to control for potential hearing- or age-related influences on absolute performance. Intermodal integration ability was assessed by presenting the auditory and visual signals. Integration was indexed by a measure based on probabilistic models of auditory-visual integration, termed integration enhancement. Results suggested that both types of integration ability are largely resistant to changes with age and hearing loss. In addition, intra- and intermodal integration were shown to be uncorrelated. As measured here, these findings suggest that there is not a common mechanism that accounts for both inter- and intramodal integration performance.
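The abstract above does not specify its probabilistic model of auditory-visual integration, but a common benchmark of this kind assumes independent auditory and visual channels and compares observed audio-visual scores against the independence prediction. A sketch under that assumption (the function names and the comparison measure are mine, not necessarily the study's exact "integration enhancement" formula):

```python
def predicted_av(p_a: float, p_v: float) -> float:
    """Independence prediction: a word is correct if either channel
    succeeds, so p_av = 1 - (1 - p_a) * (1 - p_v)."""
    return 1.0 - (1.0 - p_a) * (1.0 - p_v)

def integration_enhancement(observed_av: float, p_a: float, p_v: float) -> float:
    """Observed audio-visual score minus the independence prediction;
    positive values indicate integration beyond probability summation."""
    return observed_av - predicted_av(p_a, p_v)

# Hypothetical scores: 50% auditory-only, 50% visual-only, 90% combined
gain = integration_enhancement(0.90, 0.50, 0.50)
```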

3.
Cochlear implant users receive limited spectral and temporal information. Their speech recognition deteriorates dramatically in noise. The aim of the present study was to determine the relative contributions of spectral and temporal cues to speech recognition in noise. Spectral information was manipulated by varying the number of channels from 2 to 32 in a noise-excited vocoder. Temporal information was manipulated by varying the low-pass cutoff frequency of the envelope extractor from 1 to 512 Hz. Ten normal-hearing, native speakers of English participated in tests of phoneme recognition using vocoder-processed consonants and vowels under three conditions (quiet, and +6 and 0 dB signal-to-noise ratios). The number of channels required for vowel-recognition performance to plateau increased from 12 in quiet to 16-24 in the two noise conditions. However, for consonant recognition, no further improvement in performance was evident when the number of channels was 12 or greater in any of the three conditions. The contribution of temporal cues to phoneme recognition showed a similar pattern in both quiet and noise conditions. As in the quiet conditions, there was a trade-off between temporal and spectral cues for phoneme recognition in noise.
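The noise-excited vocoder manipulation described above (band splitting, envelope extraction with a low-pass cutoff, modulation of band-limited noise) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's processor: it uses FFT-based band splitting with log-spaced band edges, and rectification plus moving-average smoothing in place of the study's actual filters and envelope extractor.

```python
import numpy as np

def noise_vocoder(x, fs, n_channels=8, env_cutoff=16.0,
                  f_lo=100.0, f_hi=6000.0):
    """Noise-excited vocoder sketch: split the input into log-spaced
    bands, extract each band's envelope, and use it to modulate
    noise limited to the same band."""
    n = len(x)
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    X = np.fft.rfft(x)
    rng = np.random.default_rng(0)
    N = np.fft.rfft(rng.standard_normal(n))        # noise carrier spectrum
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)
    # Moving-average smoother roughly matching the low-pass cutoff
    win = max(1, int(fs / (2.0 * env_cutoff)))
    kernel = np.ones(win) / win
    out = np.zeros(n)
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (freqs >= lo) & (freqs < hi)
        band = np.fft.irfft(X * mask, n)           # band-limited speech
        env = np.convolve(np.abs(band), kernel, mode="same")  # rectify + smooth
        carrier = np.fft.irfft(N * mask, n)        # band-limited noise
        out += env * carrier
    return out
```

Raising `n_channels` adds spectral detail; raising `env_cutoff` preserves faster temporal fluctuations, which is the trade-off the study manipulates.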

4.
The purpose of this study was to examine temporal resolution in normal-hearing preschool children. Word recognition was evaluated in quiet and in spectrally identical continuous and interrupted noise at signal-to-noise ratios (S/Ns) of 10, 0, and -10 dB. Sixteen children 4 to 5 years of age and eight adults participated. Performance decreased with decreasing S/N. At poorer S/Ns, participants demonstrated superior performance or a release from masking in the interrupted noise. Adults performed better than children, yet the release from masking was equivalent. Collectively these findings are consistent with the notion that preschool children suffer from poorer processing efficiency rather than temporal resolution per se.

5.
Age-related declines in temporal processing may reduce the ability to process concurrent vowels. For this study, listeners categorized vowel pairs varying in temporal asynchrony as one sound, two overlapping sounds, or two sounds separated by a gap. Two boundaries separating the three response categories, multiplicity and gap-identification, were measured. Compared to young and middle-aged listeners, older listeners required longer temporal offsets for multiplicity. Middle-aged and older listeners also required longer offsets for gap-identification. For older listeners, correlations with various temporal processing tasks indicated that vowel temporal-order thresholds were related to multiplicity, while age and non-speech gap-detection thresholds were related to gap-identification.

6.
Some effects of talker variability on spoken word recognition (total citations: 2; self-citations: 0; citations by others: 2)
The perceptual consequences of trial-to-trial changes in the voice of the talker on spoken word recognition were examined. The results from a series of experiments using perceptual identification and naming tasks demonstrated that perceptual performance decreases when the voice of the talker changes from trial to trial compared to performance when the voice on each trial remains the same. In addition, the effects of talker variability on word recognition appeared to be more robust and less dependent on task than the effects of word frequency and lexical structure. Possible hypotheses regarding the nature of the processes giving rise to these effects are discussed, with particular attention to the idea that the processing of information about the talker's voice is intimately related to early perceptual processes that extract acoustic-phonetic information from the speech signal.

7.
To assess age-related differences in benefit from masker modulation, younger and older adults with normal hearing but not identical audiograms listened to nonsense syllables in each of two maskers: (1) a steady-state noise shaped to match the long-term spectrum of the speech, and (2) this same noise modulated by a 10-Hz square wave, resulting in an interrupted noise. An additional low-level broadband noise was always present which was shaped to produce equivalent masked thresholds for all subjects. This minimized differences in speech audibility due to differences in quiet thresholds among subjects. An additional goal was to determine if age-related differences in benefit from modulation could be explained by differences in thresholds measured in simultaneous and forward maskers. Accordingly, thresholds for 350-ms pure tones were measured in quiet and in each masker; thresholds for 20-ms signals in forward and simultaneous masking were also measured at selected signal frequencies. To determine if benefit from modulated maskers varied with masker spectrum and to provide a comparison with previous studies, a subgroup of younger subjects also listened in steady-state and interrupted noise that was not spectrally shaped. Articulation index (AI) values were computed and speech-recognition scores were predicted for steady-state and interrupted noise; predicted benefit from modulation was also determined. Masked thresholds of older subjects were slightly higher than those of younger subjects; larger age-related threshold differences were observed for short-duration than for long-duration signals. In steady-state noise, speech recognition for older subjects was poorer than for younger subjects, which was partially attributable to older subjects' slightly higher thresholds in these maskers. In interrupted noise, although predicted benefit was larger for older than younger subjects, scores improved more for younger than for older subjects, particularly at the higher noise level. This may be related to age-related increases in thresholds in steady-state noise and in forward masking, especially at higher frequencies. Benefit of interrupted maskers was larger for unshaped than for speech-shaped noise, consistent with AI predictions.

8.
Cochlear implants provide users with limited spectral and temporal information. In this study, the amount of spectral and temporal information was systematically varied through simulations of cochlear implant processors using a noise-excited vocoder. Spectral information was controlled by varying the number of channels between 1 and 16, and temporal information was controlled by varying the lowpass cutoff frequencies of the envelope extractors from 1 to 512 Hz. Consonants and vowels processed using those conditions were presented to seven normal-hearing native-English-speaking listeners for identification. The results demonstrated that both spectral and temporal cues were important for consonant and vowel recognition, with the spectral cues having a greater effect than the temporal cues for the ranges of numbers of channels and lowpass cutoff frequencies tested. The lowpass cutoff for asymptotic performance in consonant and vowel recognition was 16 and 4 Hz, respectively. The number of channels at which performance plateaued for consonants and vowels was 8 and 12, respectively. Within the above-mentioned ranges of lowpass cutoff frequency and number of channels, the temporal and spectral cues showed a tradeoff for phoneme recognition. Information transfer analyses showed different relative contributions of spectral and temporal cues in the perception of various phonetic/acoustic features.

9.
This study investigated the effect of pulsatile stimulation rate on medial vowel and consonant recognition in cochlear implant listeners. Experiment 1 measured phoneme recognition as a function of stimulation rate in six Nucleus-22 cochlear implant listeners using an experimental four-channel continuous interleaved sampler (CIS) speech processing strategy. Results showed that all stimulation rates from 150 to 500 pulses/s/electrode produced equally good performance, while stimulation rates lower than 150 pulses/s/electrode produced significantly poorer performance. Experiment 2 measured phoneme recognition by implant listeners and normal-hearing listeners as a function of the low-pass cutoff frequency for envelope information. Results from both acoustic and electric hearing showed no significant difference in performance for all cutoff frequencies higher than 20 Hz. Both vowel and consonant scores dropped significantly when the cutoff frequency was reduced from 20 Hz to 2 Hz. The results of these two experiments suggest that temporal envelope information can be conveyed by relatively low stimulation rates. The pattern of results for both electrical and acoustic hearing is consistent with a simple model of temporal integration with an equivalent rectangular duration (ERD) of the temporal integrator of about 7 ms.

10.
Listeners identified a phonetically balanced set of consonant-vowel-consonant (CVC) words and nonsense syllables in noise at four signal-to-noise ratios. The identification scores for phonemes and syllables were analyzed using the j-factor model [Boothroyd and Nittrouer, J. Acoust. Soc. Am. 84, 101-114 (1988)], which measures the perceptual independence of the parts of a whole. Results indicate that nonsense CVC syllables are perceived as having three independent phonemes, while words show j = 2.34 independent units. Among the words, high-frequency words are perceived as having significantly fewer independent units than low-frequency words. Words with dense phonetic neighborhoods are perceived as having 0.5 more independent units than words with sparse neighborhoods. The neighborhood effect in these data is due almost entirely to density as determined by the initial consonant and vowel, demonstrated in analyses by subjects and items, and correlation analyses of syllable recognition with the neighborhood activation model [Luce and Pisoni, Ear Hear. 19, 1-36 (1998)]. The j factors are interpreted as measuring increased efficiency of the perception of word-final consonants of words in sparse neighborhoods during spoken word recognition.
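Given whole-item and part (phoneme) recognition probabilities, the j factor used in the abstract above can be recovered by inverting pw = pp^j. A minimal sketch; the sample scores are illustrative, not the study's data:

```python
import math

def j_factor(p_whole: float, p_part: float) -> float:
    """Effective number of independent units: from pw = pp**j,
    j = ln(pw) / ln(pp). Both scores must lie strictly in (0, 1)."""
    return math.log(p_whole) / math.log(p_part)

# Three fully independent phonemes (pw = pp**3) recover j = 3,
# the value reported here for nonsense CVC syllables.
j_nonsense = j_factor(0.8 ** 3, 0.8)
```

A word score above the independence prediction (pw > pp^3 for a CVC) yields j < 3, which is how lexical constraint shows up in this model.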

11.
12.
Isolated-word recognition of Chinese whispered speech (total citations: 6; self-citations: 0; citations by others: 6)
杨莉莉 (Yang Lili), 林玮 (Lin Wei), 徐柏龄 (Xu Bailing), 《应用声学》 (Applied Acoustics), 2006, 25(3): 187-192
Whispered speech recognition has broad application prospects but is a comparatively new research topic. The characteristics of whispered speech itself, such as its low sound level and the absence of a fundamental frequency, make whispered speech recognition difficult. Based on a production model of the whispered speech signal, combined with the acoustic properties of whispered speech, this paper builds an isolated-word recognition system for Chinese whispered speech. Because whispered speech has a low signal-to-noise ratio, speech enhancement must be applied, and tone information is used in the recognition system to improve performance. Experimental results show that MFCCs combined with the amplitude envelope can serve as feature parameters for automatic recognition of Chinese whispered speech; on a small vocabulary, an HMM-based recognizer achieved a recognition rate of 90.4%.

13.
This study investigated the relationship between audibility and predictions of speech recognition for children and adults with normal hearing. The Speech Intelligibility Index (SII) is used to quantify the audibility of speech signals and can be applied to transfer functions to predict speech recognition scores. Although the SII is used clinically with children, relatively few studies have evaluated SII predictions of children's speech recognition directly. Children have required more audibility than adults to reach maximum levels of speech understanding in previous studies. Furthermore, children may require greater bandwidth than adults for optimal speech understanding, which could influence frequency-importance functions used to calculate the SII. Speech recognition was measured for 116 children and 19 adults with normal hearing. Stimulus bandwidth and background noise level were varied systematically in order to evaluate speech recognition as predicted by the SII and derive frequency-importance functions for children and adults. Results suggested that children required greater audibility to reach the same level of speech understanding as adults. However, differences in performance between adults and children did not vary across frequency bands.
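At its core, the SII computation mentioned above is a band-importance-weighted sum of audibility. A toy sketch under strong assumptions: the uniform weights in the example are placeholders, whereas real SII procedures use standardized frequency-importance functions and band-audibility rules.

```python
def sii_index(audibility, importance):
    """SII-style index: sum of per-band audibility (each in [0, 1])
    weighted by band importance (weights summing to 1)."""
    if abs(sum(importance) - 1.0) > 1e-6:
        raise ValueError("importance weights must sum to 1")
    return sum(a * w for a, w in zip(audibility, importance))

# Hypothetical four-band case: speech fully audible in the two lower
# bands and inaudible in the two upper bands, uniform importance.
example = sii_index([1.0, 1.0, 0.0, 0.0], [0.25, 0.25, 0.25, 0.25])
```

Deriving child-specific frequency-importance functions, as the study does, amounts to re-estimating the `importance` weights from children's recognition data.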

14.
The relationships among age-related differences in gap detection and word recognition in subjects with normal hearing or mild sensorineural hearing loss were explored in two studies. In the first study, gap thresholds were obtained for 40 younger and 40 older subjects. The gaps were carried by 150-ms, modulated, low-pass noise bursts with cutoff frequencies of 1 or 6 kHz. The noise bursts were presented at an overall level of 80 dB SPL in three background conditions. Mean gap thresholds ranged between 2.6 and 7.8 ms for the younger age group and between 3.4 and 10.0 ms for the older group. Mean gap thresholds were significantly larger for the older group in all six conditions. Gap thresholds were not significantly correlated with audiometric thresholds in either age group but the 1-kHz gap thresholds increased with age in the younger group. In the second study, the relationships among gap thresholds, spondee-in-babble thresholds, and audiometric thresholds of 66 subjects were examined. Compared with the older subjects, the younger group recognized the spondees at significantly lower (more difficult) spondee-to-babble ratios. In the younger group, spondee-in-babble thresholds were significantly correlated with gap thresholds in conditions of high-frequency masking. In the older group, spondee-in-babble thresholds, gap thresholds, and audiometric thresholds were not significantly correlated, but the spondee-in-babble thresholds and two audiometric thresholds increased significantly with age. These results demonstrate that significant age-related changes in auditory processing occur throughout adulthood. Specifically, age-related changes in temporal acuity may begin decades earlier than age-related changes in word recognition.

15.
This study examined the effects of age and hearing loss on short-term adaptation to accented speech. Data from younger and older listeners in a prior investigation [Gordon-Salant et al. (2010). J. Acoust. Soc. Am. 128, 444-455] were re-analyzed to examine changes in recognition over four administrations of equivalent lists of English stimuli recorded by native speakers of Spanish and English. Results showed improvement in recognition scores over four list administrations for the accented stimuli but not for the native English stimuli. Group effects emerged but were not involved in any interactions, suggesting that short-term adaptation to accented speech is preserved with aging and with hearing loss.

16.
Audio-visual identification of sentences was measured as a function of audio delay in untrained observers with normal hearing; the soundtrack was replaced by rectangular pulses originally synchronized to the closing of the talker's vocal folds and then subjected to delay. When the soundtrack was delayed by 160 ms, identification scores were no better than when no acoustical information at all was provided. Delays of up to 80 ms had little effect on group-mean performance, but a separate analysis of a subgroup of better lipreaders showed a significant trend of reduced scores with increased delay in the range from 0 to 80 ms. A second experiment tested the interpretation that, although the main disruptive effect of the delay occurred on a syllabic time scale, better lipreaders might be attempting to use intermodal timing cues at a phonemic level. Normal-hearing observers determined whether a 120-Hz complex tone started before or after the opening of a pair of lip-like Lissajous figures. Group-mean difference limens (70.7% correct DLs) were -79 ms (sound leading) and +138 ms (sound lagging), with no significant correlation between DLs and sentence lipreading scores. It was concluded that most observers, whether good lipreaders or not, possess insufficient sensitivity to intermodal timing cues in audio-visual speech for them to be used analogously to voice onset time in auditory speech perception. The results of both experiments imply that delays of up to about 40 ms introduced by signal-processing algorithms in aids to lipreading should not materially affect audio-visual speech understanding.

17.
The effects of intensity on monosyllabic word recognition were studied in adults with normal hearing and mild-to-moderate sensorineural hearing loss. The stimuli were bandlimited NU#6 word lists presented in quiet and talker-spectrum-matched noise. Speech levels ranged from 64 to 99 dB SPL and S/N ratios from 28 to -4 dB. In quiet, the performance of normal-hearing subjects remained essentially constant; in noise, at a fixed S/N ratio, it decreased as a linear function of speech level. Hearing-impaired subjects performed like normal-hearing subjects tested in noise when the data were corrected for the effects of audibility loss. From these and other results, it was concluded that: (1) speech intelligibility in noise decreases when speech levels exceed 69 dB SPL and the S/N ratio remains constant; (2) the effects of speech and noise level are synergistic; (3) the deterioration in intelligibility can be modeled as a relative increase in the effective masking level; (4) normal-hearing and hearing-impaired subjects are affected similarly by increased signal level when differences in speech audibility are considered; (5) the negative effects of increasing speech and noise levels on speech recognition are similar for all adult subjects, at least up to 80 years; and (6) the effective dynamic range of speech may be larger than the commonly assumed value of 30 dB.

18.
This study examined vowel perception by young normal-hearing (YNH) adults, in various listening conditions designed to simulate mild-to-moderate sloping sensorineural hearing loss. YNH listeners were individually age- and gender-matched to young hearing-impaired (YHI) listeners tested in a previous study [Richie et al., J. Acoust. Soc. Am. 114, 2923-2933 (2003)]. YNH listeners were tested in three conditions designed to create equal audibility with the YHI listeners: a low signal level with and without a simulated hearing loss, and a high signal level with a simulated hearing loss. Listeners discriminated changes in synthetic vowel tokens /ɪ e ɛ ɑ æ/ when F1 or F2 varied in frequency. Comparison of YNH with YHI results failed to reveal significant differences between groups in terms of performance on vowel discrimination, in conditions of similar audibility achieved by using noise masking to elevate the hearing thresholds of the YNH and applying frequency-specific gain to the YHI listeners. Further, analysis of learning curves suggests that while the YHI listeners completed an average of 46% more test blocks than YNH listeners, the YHI achieved a level of discrimination similar to that of the YNH within the same number of blocks. Apparently, when age and gender are closely matched between young hearing-impaired and normal-hearing adults, performance on vowel tasks may be explained by audibility alone.

19.
A relatively new management strategy for the treatment of voice disorders is the use of laryngeal (LB) and velopharyngeal biofeedback (VB). The main purpose of the present pilot study is to document the outcome of vocal and velopharyngeal performances after a well-defined LB and VB treatment. Four subjects were studied pretreatment (1 week before LB or VB treatment) and posttreatment (1 week after the LB or VB treatment). To measure and compare the effect of LB and VB, objective and subjective assessment techniques were used. Perceptual voice assessment included a perceptual rating of the voice using the GRBAS scale. Furthermore, the vocal quality in this population is modeled by means of the Dysphonia Severity Index. For the objective assessment of nasal resonance, the Nasometer and the Glatzel test were used. A perceptual evaluation of speech, the Gutzmann test, and the tests from Bzoch were used as subjective assessment techniques. Both patients selected for LB and VB treatment showed improvement of their performances. The resulting improvement, as measured by means of an objective approach, is in agreement with the perceived (auditory) improvement of voice and resonance. The use of LB and VB treatment in patients, especially in some subjects who are not responding to traditional voice or velopharyngeal therapy, must be encouraged.

20.
The relative abilities of word frequency, contextual diversity, and semantic distinctiveness to predict accuracy of spoken word recognition in noise were compared using two data sets. Word frequency is the number of times a word appears in a corpus of text. Contextual diversity is the number of different documents in which the word appears in that corpus. Semantic distinctiveness takes into account the number of different semantic contexts in which the word appears. Semantic distinctiveness and contextual diversity were both able to explain variance above and beyond that explained by word frequency, which by itself explained little unique variance.


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.). 京ICP备09084417号