Similar Articles
20 similar articles found (search time: 31 ms)
1.
A frequency importance function for continuous discourse
Normal hearing subjects estimated the intelligibility of continuous discourse (CD) passages spoken by three talkers (two male and one female) under 135 conditions of filtering and signal-to-noise ratio. The relationship between the intelligibility of CD and the articulation index (the transfer function) was different from any found in ANSI S3.5-1969. Also, the lower frequencies were found to be relatively more important for the intelligibility of CD than for identification of nonsense syllables and other types of speech for which data are available except for synthetic sentences [Speaks, J. Speech Hear. Res. 10, 289-298 (1967)]. The frequency which divides the auditory spectrum into two equally important halves (the crossover frequency) was found to be about 0.5 oct lower for the CD used in this study than the crossover frequency for male talkers of nonsense syllables found in ANSI S3.5-1969 and about 0.7 oct lower than the one for combined male and female talkers of nonsense syllables reported by French and Steinberg [J. Acoust. Soc. Am. 19, 90-119 (1947)].

2.
Listeners' ability to understand speech in adverse listening conditions is partially due to the redundant nature of speech. Natural redundancies are often lost or altered when speech is filtered, such as done in AI/SII experiments. It is important to study how listeners recognize speech when the speech signal is unfiltered and the entire broadband spectrum is present. A correlational method [R. A. Lutfi, J. Acoust. Soc. Am. 97, 1333-1334 (1995); V. M. Richards and S. Zhu, J. Acoust. Soc. Am. 95, 423-424 (1994)] has been used to determine how listeners use spectral cues to perceive nonsense syllables when the full speech spectrum is present [K. A. Doherty and C. W. Turner, J. Acoust. Soc. Am. 100, 3769-3773 (1996); C. W. Turner et al., J. Acoust. Soc. Am. 104, 1580-1585 (1998)]. The experiments in this study measured spectral-weighting strategies for more naturally occurring speech stimuli, specifically sentences, using a correlational method for normal-hearing listeners. Results indicate that listeners placed the greatest weight on spectral information within bands 2 and 5 (562-1113 and 2807-11,000 Hz), respectively. Spectral-weighting strategies for sentences were also compared to weighting strategies for nonsense syllables measured in a previous study (C. W. Turner et al., 1998). Spectral-weighting strategies for sentences were different from those reported for nonsense syllables.

3.
For all but the most profoundly hearing-impaired (HI) individuals, auditory-visual (AV) speech has been shown consistently to afford more accurate recognition than auditory (A) or visual (V) speech. However, the amount of AV benefit achieved (i.e., the superiority of AV performance in relation to unimodal performance) can differ widely across HI individuals. To begin to explain these individual differences, several factors need to be considered. The most obvious of these are deficient A and V speech recognition skills. However, large differences in individuals' AV recognition scores persist even when unimodal skill levels are taken into account. These remaining differences might be attributable to differing efficiency in the operation of a perceptual process that integrates A and V speech information. There is at present no accepted measure of the putative integration process. In this study, several possible integration measures are compared using both congruent and discrepant AV nonsense syllable and sentence recognition tasks. Correlations were tested among the integration measures, and between each integration measure and independent measures of AV benefit for nonsense syllables and sentences in noise. Integration measures derived from tests using nonsense syllables were significantly correlated with each other; on these measures, HI subjects show generally high levels of integration ability. Integration measures derived from sentence recognition tests were also significantly correlated with each other, but were not significantly correlated with the measures derived from nonsense syllable tests. Similarly, the measures of AV benefit based on nonsense syllable recognition tests were found not to be significantly correlated with the benefit measures based on tests involving sentence materials. 
Finally, there were significant correlations between AV integration and benefit measures derived from the same class of speech materials, but nonsignificant correlations between integration and benefit measures derived from different classes of materials. These results suggest that the perceptual processes underlying AV benefit and the integration of A and V speech information might not operate in the same way on nonsense syllable and sentence input.

4.
Listeners identified a phonetically balanced set of consonant-vowel-consonant (CVC) words and nonsense syllables in noise at four signal-to-noise ratios. The identification scores for phonemes and syllables were analyzed using the j-factor model [Boothroyd and Nittrouer, J. Acoust. Soc. Am. 84, 101-114 (1988)], which measures the perceptual independence of the parts of a whole. Results indicate that nonsense CVC syllables are perceived as having three independent phonemes, while words show j = 2.34 independent units. Among the words, high-frequency words are perceived as having significantly fewer independent units than low-frequency words. Words with dense phonetic neighborhoods are perceived as having 0.5 more independent units than words with sparse neighborhoods. The neighborhood effect in these data is due almost entirely to density as determined by the initial consonant and vowel, demonstrated in analyses by subjects and items, and correlation analyses of syllable recognition with the neighborhood activation model [Luce and Pisoni, Ear Hear. 19, 1-36 (1998)]. The j factors are interpreted as measuring increased efficiency of the perception of word-final consonants of words in sparse neighborhoods during spoken word recognition.
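The j-factor model in this abstract posits that the probability of recognizing a whole unit equals the part-recognition probability raised to the power j, so j can be recovered from two measured scores as j = ln(p_whole) / ln(p_part). A minimal sketch of that computation, using illustrative probabilities rather than values from the study:

```python
import math

def j_factor(p_whole: float, p_part: float) -> float:
    """Perceptual independence j from the j-factor model:
    p_whole = p_part ** j  =>  j = ln(p_whole) / ln(p_part).
    Both probabilities must lie strictly between 0 and 1."""
    return math.log(p_whole) / math.log(p_part)

# Nonsense CVCs behave as ~3 independent phonemes:
# if each phoneme is heard with p = 0.8, the syllable score is 0.8**3 = 0.512.
print(round(j_factor(0.512, 0.8), 2))  # → 3.0

# Real words show fewer effective units, since lexical context binds phonemes:
print(round(j_factor(0.59, 0.8), 2))
```

A j near the number of parts indicates independent perception; j below it (as with the words above) indicates that context lets listeners recover some parts from others.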

5.
The effects of six-channel compression and expansion amplification on the intelligibility of nonsense syllables embedded in speech spectrum noise were examined for four hearing-impaired subjects. For one condition (linear) the stimulus was given six-channel amplification with frequency shaping to suit the subject's hearing loss. The other condition (nonlinear) was the same except that low level inputs, to any given channel, received expansion amplification and high level inputs received compression. For each condition, each subject received the nonsense syllables at three different input levels, representing low, average, and high intensity speech. The results of this study, like those of most other studies of multichannel compression, are mainly negative. Nonlinear processing (mainly expansion) of low intensity speech resulted in a significant degradation of speech intelligibility for two subjects and in no improvement for the others. One subject showed a significant improvement in intelligibility for the nonlinearly processed average intensity speech and another subject showed significant improvement for the high intensity input (mainly compression). Clearly, nonlinear processing is beneficial for some subjects, under some listening conditions, but further research is needed to identify the relevant characteristics of such subjects. An acoustic analysis of selected items revealed that the failure of expansion to improve intelligibility was primarily due to the very low intensity consonants /e/ and /k/, in final position, being presented at an even lower intensity in the expansion condition than in the linear condition. Expansion may be worth further investigation with different parameters. Several other problems caused by the multichannel processing were also revealed. These included alteration of spectral shapes and band interaction effects.
Ways of overcoming these problems, and of capitalizing on the likely advantages of multichannel amplification, are currently being investigated.

6.
Speech waveform envelope cues for consonant recognition
This study investigated the cues for consonant recognition that are available in the time-intensity envelope of speech. Twelve normal-hearing subjects listened to three sets of spectrally identical noise stimuli created by multiplying noise with the speech envelopes of 19 /aCa/ natural-speech nonsense syllables. The speech envelope for each of the three noise conditions was derived using a different low-pass filter cutoff (20, 200, and 2000 Hz). Average consonant identification performance was above chance for the three noise conditions and improved significantly with the increase in envelope bandwidth from 20 to 200 Hz. SINDSCAL multidimensional scaling analysis of the consonant confusions data identified three speech envelope features that divided the 19 consonants into four envelope feature groups ("envemes"). The enveme groups in combination with visually distinctive speech feature groupings ("visemes") can distinguish most of the 19 consonants. These results suggest that near-perfect consonant identification performance could be attained by subjects who receive only enveme and viseme information and no spectral information.

7.
Mathematical treatment of context effects in phoneme and word recognition
Percent recognition of phonemes and whole syllables, measured in both consonant-vowel-consonant (CVC) words and CVC nonsense syllables, is reported for normal young adults listening at four signal-to-noise (S/N) ratios. Similar data are reported for the recognition of words and whole sentences in three types of sentence: high predictability (HP) sentences, with both semantic and syntactic constraints; low predictability (LP) sentences, with primarily syntactic constraints; and zero predictability (ZP) sentences, with neither semantic nor syntactic constraints. The probability of recognition of speech units in context (pc) is shown to be related to the probability of recognition without context (pi) by the equation pc = 1 - (1 - pi)^k, where k is a constant. The factor k is interpreted as the amount by which the channels of statistically independent information are effectively multiplied when contextual constraints are added. Empirical values of k are approximately 1.3 and 2.7 for word and sentence context, respectively. In a second analysis, the probability of recognition of wholes (pw) is shown to be related to the probability of recognition of the constituent parts (pp) by the equation pw = pp^j, where j represents the effective number of statistically independent parts within a whole. The empirically determined mean values of j for nonsense materials are not significantly different from the number of parts in a whole, as predicted by the underlying theory. In CVC words, the value of j is constant at approximately 2.5. In the four-word HP sentences, it falls from approximately 2.5 to approximately 1.6 as the inherent recognition probability for words falls from 100% to 0%, demonstrating an increasing tendency to perceive HP sentences either as wholes, or not at all, as S/N ratio deteriorates.
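The two relations in this abstract are simple enough to evaluate numerically. A sketch using the word-context value k = 1.3 and the CVC value j = 2.5 reported above; the input probabilities are illustrative, not data from the study:

```python
def p_context(p_isolated: float, k: float) -> float:
    """Recognition probability with context: pc = 1 - (1 - pi)**k.
    k > 1 models the effective multiplication of independent channels."""
    return 1.0 - (1.0 - p_isolated) ** k

def p_whole(p_part: float, j: float) -> float:
    """Whole-unit recognition from part recognition: pw = pp**j,
    where j is the effective number of independent parts."""
    return p_part ** j

# Word context (k = 1.3) lifts a 70% isolated-phoneme score:
print(round(p_context(0.70, 1.3), 3))  # → 0.791

# A CVC word behaving as ~2.5 independent parts, each heard at 90%:
print(round(p_whole(0.90, 2.5), 3))  # → 0.768
```

Note that both formulas reduce to the no-context / fully-independent case when k = 1 and j equals the literal number of parts, which is why the nonsense-syllable data (no context) yield j near 3.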

8.
Regions in the cochlea with very few functioning inner hair cells and/or neurons are called "dead regions" (DRs). Previously, we measured the recognition of highpass-filtered nonsense syllables as a function of filter cutoff frequency for hearing-impaired people with and without low-frequency (apical) DRs [J. Acoust. Soc. Am. 122, 542-553 (2007)]. DRs were diagnosed using the TEN(HL) test, and psychophysical tuning curves were used to define the edge frequency (fe) more precisely. Stimuli were amplified differently for each ear, using the "Cambridge formula." The present study was similar, but the speech was presented in speech-shaped noise at a signal-to-noise ratio of 3 dB. For subjects with low-frequency hearing loss but without DRs, scores were high (65-80%) for low cutoff frequencies and worsened with increasing cutoff frequency above about 430 Hz. For subjects with low-frequency DRs, performance was poor (20-40%) for the lowest cutoff frequency, improved with increasing cutoff frequency up to about 0.56fe, and then worsened. As for speech in quiet, these results indicate that people with low-frequency DRs are able to make effective use of frequency components that fall in the range 0.56fe to fe, but that frequency components below 0.56fe have deleterious effects.

9.
Frequency resolution was evaluated for two normal-hearing and seven hearing-impaired subjects with moderate, flat sensorineural hearing loss by measuring percent correct detection of a 2000-Hz tone as the width of a notch in band-reject noise increased. The level of the tone was fixed for each subject at a criterion performance level in broadband noise. Discrimination of synthetic speech syllables that differed in spectral content in the 2000-Hz region was evaluated as a function of the notch width in the same band-reject noise. Recognition of natural speech consonant/vowel syllables in quiet was also tested; results were analyzed for percent correct performance and relative information transmitted for voicing and place features. In the hearing-impaired subjects, frequency resolution at 2000 Hz was significantly correlated with the discrimination of synthetic speech information in the 2000-Hz region and was not related to the recognition of natural speech nonsense syllables unless (a) the speech stimuli contained the vowel /i/ rather than /a/, and (b) the score reflected information transmitted for place of articulation rather than percent correct.

10.
11.
The role of different modulation frequencies in the speech envelope was studied by means of the manipulation of vowel-consonant-vowel (VCV) syllables. The envelope of the signal was extracted from the speech and the fine-structure was replaced by speech-shaped noise. The temporal envelopes in every critical band of the speech signal were notch filtered in order to assess the relative importance of different modulation frequency regions between 0 and 20 Hz. For this purpose notch filters around three center frequencies (8, 12, and 16 Hz) with three different notch widths (4, 8, and 12 Hz wide) were used. These stimuli were used in a consonant-recognition task in which ten normal-hearing subjects participated, and their results were analyzed in terms of recognition scores. More qualitative information was obtained with a multidimensional scaling method (INDSCAL) and sequential information analysis (SINFA). Consonant recognition is very robust to the removal of certain modulation frequency regions. Only when a wide notch around 8 Hz is applied does the speech signal become heavily degraded. As expected, the voicing information is lost, while there are different effects on plosiveness and nasality. Even the smallest filtering has a substantial effect on the transfer of the plosiveness feature, while on the other hand, filtering out only the low modulation frequencies has a substantial effect on the transfer of nasality cues.

12.
The speech perception of two multiple-channel cochlear implant patients was compared with that of three normally hearing listeners using an acoustic model of the implant for 22 different speech tests. The tests used included a minimal auditory capabilities battery, both closed-set and open-set word and sentence tests, speech tracking and a 12-consonant confusion study using nonsense syllables. The acoustic model represented electrical current pulses by bursts of noise and the effects of different electrodes were represented by using bandpass filters with different center frequencies. All subjects used a speech processor that coded the fundamental voicing frequency of speech as a pulse rate and the second formant frequency of speech as the electrode position in the cochlea, or the center frequency of the bandpass filter. Very good agreement was found for the two groups of subjects, indicating that the acoustic model is a useful tool for the development and evaluation of alternative cochlear implant speech processing strategies.

13.
Frequency response characteristics were selected for 14 hearing-impaired ears, according to six procedures. Three procedures were based on MCL measurements with speech bands of three bandwidths (1/3 octave, 1 octave, and 1 2/3 octaves). The other procedures were based on hearing thresholds, pure-tone MCLs, and pure-tone LDLs. The procedures were evaluated by speech discrimination testing, using nonsense syllables in noise, and by paired comparison judgments of the intelligibility and pleasantness of running speech. Speech discrimination testing showed significant differences between pairs of responses for only seven test ears. Nasals and glides were most affected by frequency response variations. Both intelligibility and pleasantness judgments showed significant differences for all test ears. Intelligibility in noise was less affected by frequency response differences than was intelligibility in quiet or pleasantness in quiet or in noise. For some ears, the ranking of responses depended on whether intelligibility or pleasantness was being judged and on whether the speech was in quiet or in noise. Overall, the three speech band MCL procedures were far superior to the others. Thus the studies strongly support the frequency response selection rationale of amplifying all frequency bands of speech to MCL. They also highlight some of the complications involved in achieving this aim.

14.
Regions in the cochlea with no (or very few) functioning inner hair cells and/or neurons are called "dead regions" (DRs). The recognition of high-pass filtered nonsense syllables was measured as a function of filter cutoff frequency for hearing-impaired people with and without low-frequency (apical) cochlear DRs. The diagnosis of any DR was made using the TEN(HL) test, and psychophysical tuning curves were used to define the edge frequency (f(e)) more precisely. Stimuli were amplified differently for each ear, using the "Cambridge formula." For subjects with low-frequency hearing loss but without DRs, scores were high (about 78%) for low cutoff frequencies, remained approximately constant for cutoff frequencies up to 862 Hz, and then worsened with increasing cutoff frequency. For subjects with low-frequency DRs, performance was typically poor for the lowest cutoff frequency (100 Hz), improved as the cutoff frequency was increased to about 0.57f(e), and worsened with further increases. These results indicate that people with low-frequency DRs are able to make effective use of frequency components that fall in the range 0.57f(e) to f(e), but that frequency components below 0.57f(e) have deleterious effects. The results have implications for the fitting of hearing aids to people with low-frequency DRs.

15.
The psychophysical method of magnitude production was used to obtain suprathreshold vibratory sensation magnitude functions from a group of ten young adult subjects. The test frequency was 250 Hz, and the body sites tested were the anterior midline section of the dorsum of the tongue, the thenar eminence of the right hand, and the distal pad of the middle finger of the right hand. Results showed that the mechanoreceptive mechanisms located within these three body locations can produce suprathreshold magnitude functions that are compatible with each other as well as with those described in the literature.

16.
Fifty-four of the better cochlear-implant patients from Europe and the United States were tested on two consonant recognition tests using nonsense syllables. One was produced in an accent appropriate for their own language by a male and a female talker. Recorded tokens of /ibi, idi, igi, ipi, iti, iki, ifi, ivi, ifi, isi, izi, imi, ini/ were presented. With the French syllables, six patients with the Chorimac device averaged 18% correct (6%-29%). With the German syllables, nine patients with the 3M/Vienna device averaged 34% correct (17%-44%), ten patients with the Nucleus device (tested in Hannover) averaged 31% correct (19%-42%), and ten patients with the Duren/Cologne device averaged 27% correct (10%-56%). With the English syllables, ten patients with the Nucleus device (tested in the United States) averaged 42% correct (29%-62%), and nine patients with the Symbion device averaged 46% correct (31%-69%). An information-transmission analysis and sequential information-transfer analysis of the confusions suggested that different implants provided differing amounts of feature information. The place of articulation feature was typically the most difficult to code for all implants. In the second test a male and a female talker recorded the stimuli /ibi, idi, igi, imi, ini, ifi, isi, izi/ in a single manner that was appropriate for all three languages. Six patients with the Chorimac device averaged 27% (13%-48%), ten patients with the Duren/Cologne implant averaged 29% (15%-75%), ten patients with the Nucleus device (tested in Hannover) averaged 40% (25%-58%), ten patients with the Nucleus device (tested in the United States) averaged 49% (40%-60%), nine patients with the Symbion device averaged 61% (40%-75%), and nine patients with the 3M/Vienna device averaged 41% (29%-52%) correct.

17.
In contrast to the availability of consonant confusion studies with adults, to date, no investigators have compared children's consonant confusion patterns in noise to those of adults in a single study. To examine whether children's error patterns are similar to those of adults, three groups of children (24 in each of the age ranges 4-5, 6-7, and 8-9 years old) and 24 adult native speakers of American English (AE) performed a recognition task for 15 AE consonants in /ɑ/-consonant-/ɑ/ nonsense syllables presented in a background of speech-shaped noise. Three signal-to-noise ratios (SNR: 0, +5, and +10 dB) were used. Although the performance improved as a function of age, the overall consonant recognition accuracy as a function of SNR improved at a similar rate for all groups. Detailed analyses using phonetic features (manner, place, and voicing) revealed that stop consonants were the most problematic for all groups. In addition, for the younger children, front consonants presented in the 0 dB SNR condition were more error prone than others. These results suggested that children's use of phonetic cues does not develop at the same rate for all phonetic features.

18.
The dichotic listening performance of 40 listeners was assessed for consonant-vowel (CV) nonsense syllables with two procedures. One was a conventional two-ear monitoring task in which listeners attended to both ears and provided two responses for each pair of syllables. The ear advantage was described by % RE-% LE. The second was target monitoring, a yes/no task in which listeners attended to only one ear and listened for the presence of a target syllable. That procedure provided both hit and false alarm rates for each ear, and the ear advantage was described by P(C)maxRE-P(C)maxLE, which is insensitive to decision variables. Although both procedures yielded mean right-ear advantages (REA), the mean REA of +7.5% with two-ear monitoring was significantly different from the mean REA of +2.6% with target monitoring. In addition, although 62% of the listeners had a significant REA with the conventional procedure, only 40% had a significant REA with target monitoring. Decision variables, which are not controlled with conventional dichotic testing methods, may contribute to the ear advantage as it is described frequently in the literature.

19.
A general-purpose real-time speech recognition system — RTSRS(01)
俞铁城, 物理学报 (Acta Physica Sinica) 1978, 27(5), 508-515
This paper describes a general-purpose real-time speech recognition system, RTSRS(01). Building on earlier work, the parameters of each spoken command are normalized in the time domain and a binary spectrum is used, greatly compressing the storage required for the reference-template parameters; at the same time a new distance measure shortens the time needed for recognition, so that a vocabulary of 200 items can be recognized in real time. Speaker-dependent recognition results: spoken digits, 99.7%; 20 sentences (7 characters each), 99.7%; 100 four-character idioms, 99.5%; 150 four-character idioms, 99.3%; 200 four-character idioms, 98.8%; 400 four-character idioms, 99.7%. Informal experiments showed high recognition accuracy for vocabularies with different numbers of syllables, and even for spoken English digits or the names of BASIC statements.

20.
The purpose of this experiment was to determine the applicability of the Articulation Index (AI) model for characterizing the speech recognition performance of listeners with mild-to-moderate hearing loss. Performance-intensity functions were obtained from five normal-hearing listeners and 11 hearing-impaired listeners using a closed-set nonsense syllable test for two frequency responses (uniform and high-frequency emphasis). For each listener, the fitting constant Q of the nonlinear transfer function relating AI and speech recognition was estimated. Results indicated that the function mapping AI onto performance was approximately the same for normal and hearing-impaired listeners with mild-to-moderate hearing loss and high speech recognition scores. For a hearing-impaired listener with poor speech recognition ability, the AI procedure was a poor predictor of performance. The AI procedure as presently used is inadequate for predicting performance of individuals with reduced speech recognition ability and should be used conservatively in applications predicting optimal or acceptable frequency response characteristics for hearing-aid amplification systems.


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号