Similar articles
1.
Twelve normal-hearing subjects rated the intelligibility of 35-s, hearing-aid-processed continuous discourse (CD) passages. Three talkers (two male, one female), four hearing aids, and two signal-to-babble (S/B) ratios were used in a completely crossed design. Research questions concerned: (1) ability of listeners to rate intelligibility, (2) sensitivity of hearing aid rankings based on intelligibility ratings for three CD passages per instrument, and (3) dependence of hearing aid rankings on (a) S/B ratio, and (b) talker characteristics. Results were: (1) listeners were able to rate intelligibility, (2) rankings based on intelligibility ratings of three CD passages per hearing aid were capable of identifying two superior instruments within a group of four hearing aids that were similar in frequency/gain function, (3) listening in a more difficult S/B ratio substantially decreased the sensitivity of the hearing aid rankings for the female talker but had only minor effects on the rankings for the male talkers, and (4) hearing aid intelligibility rankings were found to be different for different talkers. Applications to hearing aid selection are discussed.

2.
Evaluating the articulation index for auditory-visual input (cited by: 4; self-citations: 0; cited by others: 4)
An investigation of the auditory-visual (AV) articulation index (AI) correction procedure outlined in the ANSI standard [ANSI S3.5-1969 (R1986)] was made by evaluating auditory (A), visual (V), and auditory-visual sentence identification for both wideband speech degraded by additive noise and a variety of bandpass-filtered speech conditions presented in quiet and in noise. When the data for each of the different listening conditions were averaged across talkers and subjects, the procedure outlined in the standard was fairly well supported, although deviations from the predicted AV score were noted for individual subjects as well as individual talkers. For filtered speech signals with AI_A less than 0.25, there was a tendency for the standard to underpredict AV scores. Conversely, for signals with AI_A greater than 0.25, the standard consistently overpredicted AV scores. Additionally, synergistic effects, where the AI_A obtained from the combination of different bandpass-filtered conditions was greater than the sum of the individual AI_A values, were observed for all nonadjacent filter-band combinations (e.g., the addition of a low-pass band with a 630-Hz cutoff and a high-pass band with a 3150-Hz cutoff). These latter deviations from the standard violate the basic assumption of additivity stated by Articulation Theory, but are consistent with earlier reports by Pollack [I. Pollack, J. Acoust. Soc. Am. 20, 259-266 (1948)], Licklider [J. C. R. Licklider, Psychology: A Study of a Science, Vol. 1, edited by S. Koch (McGraw-Hill, New York, 1959), pp. 41-144], and Kryter [K. D. Kryter, J. Acoust. Soc. Am. 32, 547-556 (1960)].
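
The additivity assumption mentioned in this abstract can be illustrated with a minimal sketch. It assumes that additivity means the articulation index of a combination of non-overlapping bands equals the sum of the indices of the individual bands, so a "synergistic" combination is one whose measured index exceeds that sum. The function names and all numeric values are hypothetical and are not taken from the study.

```python
def additivity_excess(ai_individual, ai_combined):
    """Return the measured combined AI minus the additive prediction.

    ai_individual: AI values measured for each band alone (hypothetical data).
    ai_combined:   AI value measured for the bands presented together.
    """
    return ai_combined - sum(ai_individual)


def is_synergistic(ai_individual, ai_combined, tol=1e-9):
    """True when the combined AI exceeds the sum of the parts by more than tol."""
    return additivity_excess(ai_individual, ai_combined) > tol


# Illustrative numbers only: a low-pass band and a high-pass band.
bands = [0.10, 0.12]
print(is_synergistic(bands, 0.30))  # combined AI above the sum of the parts
print(is_synergistic(bands, 0.22))  # combined AI equal to the sum of the parts
```

A positive excess would count against strict additivity; an excess near zero is what the additive assumption predicts.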

3.
Sentences spoken "clearly" are significantly more intelligible than those spoken "conversationally" for hearing-impaired listeners in a variety of backgrounds [Picheny et al., J. Speech Hear. Res. 28, 96-103 (1985); Uchanski et al., ibid. 39, 494-509 (1996); Payton et al., J. Acoust. Soc. Am. 95, 1581-1592 (1994)]. While producing clear speech, however, talkers often reduce their speaking rate significantly [Picheny et al., J. Speech Hear. Res. 29, 434-446 (1986); Uchanski et al., ibid. 39, 494-509 (1996)]. Yet speaking slowly is not solely responsible for the intelligibility benefit of clear speech (over conversational speech), since a recent study [Krause and Braida, J. Acoust. Soc. Am. 112, 2165-2172 (2002)] showed that talkers can produce clear speech at normal rates with training. This finding suggests that clear speech has inherent acoustic properties, independent of rate, that contribute to improved intelligibility. Identifying these acoustic properties could lead to improved signal processing schemes for hearing aids. To gain insight into these acoustical properties, conversational and clear speech produced at normal speaking rates were analyzed at three levels of detail (global, phonological, and phonetic). Although results suggest that talkers may have employed different strategies to achieve clear speech at normal rates, two global-level properties were identified that appear likely to be linked to the improvements in intelligibility provided by clear/normal speech: increased energy in the 1000-3000-Hz range of long-term spectra and increased modulation depth of low frequency modulations of the intensity envelope. Other phonological and phonetic differences associated with clear/normal speech include changes in (1) frequency of stop burst releases, (2) VOT of word-initial voiceless stop consonants, and (3) short-term vowel spectra.

4.
Intelligibility of average talkers in typical listening environments (cited by: 1; self-citations: 0; cited by others: 1)
Intelligibility of conversationally produced speech for normal hearing listeners was studied for three male and three female talkers. Four typical listening environments were used. These simulated a quiet living room, a classroom, and social events in two settings with different reverberation characteristics. For each talker, overall intelligibility and intelligibility for vowels, consonant voicing, consonant continuance, and consonant place were quantified using the speech pattern contrast (SPAC) test. Results indicated that significant intelligibility differences are observed among normal talkers even in listening environments that permit essentially full intelligibility for everyday conversations. On the whole, talkers maintained their relative intelligibility across the four environments, although there was one exception which suggested that some voices may be particularly susceptible to degradation due to reverberation. Consonant place was the most poorly perceived feature, followed by continuance, voicing, and vowel intelligibility. However, there were numerous significant interactions between talkers and speech features, indicating that a talker of average overall intelligibility may produce certain speech features with intelligibility that is considerably higher or lower than average. Neither long-term rms speech spectrum nor articulation rate was found to be an adequate single criterion for selecting a talker of average intelligibility. Ultimately, an average talker was chosen on the basis of four speech contrasts: initial consonant place, and final consonant place, voicing, and continuance.

5.
People vary in the intelligibility of their speech. This study investigated whether across-talker intelligibility differences observed in normally-hearing listeners are also found in cochlear implant (CI) users. Speech perception for male, female, and child pairs of talkers differing in intelligibility was assessed with actual and simulated CI processing and in normal hearing. While overall speech recognition was, as expected, poorer for CI users, differences in intelligibility across talkers were consistent across all listener groups. This suggests that the primary determinants of intelligibility differences are preserved in the CI-processed signal, though no single critical acoustic property could be identified.

6.
The word recognition ability of 4 normal-hearing and 13 cochlearly hearing-impaired listeners was evaluated. Filtered and unfiltered speech in quiet and in noise were presented monaurally through headphones. The noise varied over listening situations with regard to spectrum, level, and temporal envelope. Articulation index theory was applied to predict the results. Two calculation methods were used, both based on the ANSI S3.5-1969 20-band method [S3.5-1969 (American National Standards Institute, New York)]. Method I was almost identical to the ANSI method. Method II included a level- and hearing-loss-dependent calculation of masking of stationary and on-off gated noise signals and of self-masking of speech. Method II provided the best prediction capability, and it is concluded that speech intelligibility of cochlearly hearing-impaired listeners may also, to a first approximation, be predicted from articulation index theory.

7.
8.
This study complements earlier experiments on the perception of the [m]-[n] distinction in CV syllables [B. H. Repp, J. Acoust. Soc. Am. 79, 1987-1999 (1986); B. H. Repp, J. Acoust. Soc. Am. 82, 1525-1538 (1987)]. Six talkers produced VC syllables consisting of [m] or [n] preceded by [i, a, u]. In listening experiments, these syllables were truncated from the beginning and/or from the end, or waveform portions surrounding the point of closure were replaced with noise, so as to map out the distribution of the place of articulation information for consonant perception. These manipulations revealed that the vocalic formant transitions alone conveyed about as much place of articulation information as did the nasal murmur alone, and both signal portions were about as informative in VC as in CV syllables. Nevertheless, full VC syllables were less accurately identified than full CV syllables, especially in female speech. The reason for this was hypothesized to be the relative absence of a salient spectral change between the vowel and the murmur in VC syllables. This hypothesis was supported by the relative ineffectiveness of two additional manipulations meant to disrupt the perception of relational spectral information (channel separation or temporal separation of vowel and murmur) and by subjects' poor identification scores for brief excerpts including the point of maximal spectral change. While, in CV syllables, the abrupt spectral change from the murmur to the vowel provides important additional place of articulation information, for VC syllables it seems as if the formant transitions in the vowel and the murmur spectrum functioned as independent cues.

9.
This study investigated acoustic-phonetic correlates of intelligibility for adult and child talkers, and whether the relative intelligibility of different talkers was dependent on listener characteristics. In experiment 1, word intelligibility was measured for 45 talkers (18 women, 15 men, 6 boys, 6 girls) from a homogeneous accent group. The material consisted of 124 words familiar to 7-year-olds that adequately covered all frequent consonant confusions; stimuli were presented to 135 adult and child listeners in low-level background noise. Seven-to-eight-year-old listeners made significantly more errors than 12-year-olds or adults, but the relative intelligibility of individual talkers was highly consistent across groups. In experiment 2, listener ratings on a number of voice dimensions were obtained for the adult talkers identified in experiment 1 as having the highest and lowest intelligibility. Intelligibility was significantly correlated with subjective dimensions reflecting articulation, voice dynamics, and general quality. Finally, in experiment 3, measures of fundamental frequency, long-term average spectrum, word duration, consonant-vowel intensity ratio, and vowel space size were obtained for all talkers. Overall, word intelligibility was significantly correlated with the total energy in the 1- to 3-kHz region and word duration; these measures predicted 61% of the variability in intelligibility. The fact that the relative intelligibility of individual talkers was remarkably consistent across listener age groups suggests that the acoustic-phonetic characteristics of a talker's utterance are the primary factor in determining talker intelligibility. Although some acoustic-phonetic correlates of intelligibility were identified, variability in the profiles of the "best" talkers suggests that high intelligibility can be achieved through a combination of different acoustic-phonetic characteristics.

10.
Listeners identified a phonetically balanced set of consonant-vowel-consonant (CVC) words and nonsense syllables in noise at four signal-to-noise ratios. The identification scores for phonemes and syllables were analyzed using the j-factor model [Boothroyd and Nittrouer, J. Acoust. Soc. Am. 84, 101-114 (1988)], which measures the perceptual independence of the parts of a whole. Results indicate that nonsense CVC syllables are perceived as having three independent phonemes, while words show j = 2.34 independent units. Among the words, high-frequency words are perceived as having significantly fewer independent units than low-frequency words. Words with dense phonetic neighborhoods are perceived as having 0.5 more independent units than words with sparse neighborhoods. The neighborhood effect in these data is due almost entirely to density as determined by the initial consonant and vowel, demonstrated in analyses by subjects and items, and correlation analyses of syllable recognition with the neighborhood activation model [Luce and Pisoni, Ear Hear. 19, 1-36 (1998)]. The j factors are interpreted as measuring increased efficiency of the perception of word-final consonants of words in sparse neighborhoods during spoken word recognition.
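
As a minimal sketch of the j-factor model named in this abstract: the model relates the probability of recognizing a whole, p_w, to the probability of recognizing one of its parts, p_p, by p_w = p_p^j, so that j = log(p_w) / log(p_p). The function name and the recognition scores below are hypothetical and are not data from the study.

```python
import math


def j_factor(p_part: float, p_whole: float) -> float:
    """Solve p_whole = p_part ** j for j (the number of independent units)."""
    if not (0.0 < p_part < 1.0 and 0.0 < p_whole < 1.0):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    return math.log(p_whole) / math.log(p_part)


# Hypothetical phoneme recognition probability.
p_phoneme = 0.80

# If the three phonemes of a syllable are recognized independently,
# the syllable probability is p_phoneme ** 3 and j comes out near 3.
print(j_factor(p_phoneme, p_phoneme ** 3))

# A whole recognized more often than independence predicts gives j < 3.
print(j_factor(p_phoneme, p_phoneme ** 2.34))
```

A j close to the number of phonemes suggests the parts are perceived independently; a smaller j suggests that context (for example, lexical knowledge) makes some parts predictable from others.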

11.
Listeners' ability to understand speech in adverse listening conditions is partially due to the redundant nature of speech. Natural redundancies are often lost or altered when speech is filtered, as is done in AI/SII experiments. It is important to study how listeners recognize speech when the speech signal is unfiltered and the entire broadband spectrum is present. A correlational method [R. A. Lutfi, J. Acoust. Soc. Am. 97, 1333-1334 (1995); V. M. Richards and S. Zhu, J. Acoust. Soc. Am. 95, 423-424 (1994)] has been used to determine how listeners use spectral cues to perceive nonsense syllables when the full speech spectrum is present [K. A. Doherty and C. W. Turner, J. Acoust. Soc. Am. 100, 3769-3773 (1996); C. W. Turner et al., J. Acoust. Soc. Am. 104, 1580-1585 (1998)]. The experiments in this study measured spectral-weighting strategies for more naturally occurring speech stimuli, specifically sentences, using a correlational method for normal-hearing listeners. Results indicate that listeners placed the greatest weight on spectral information within bands 2 and 5 (562-1113 and 2807-11,000 Hz, respectively). Spectral-weighting strategies for sentences were also compared to weighting strategies for nonsense syllables measured in a previous study (C. W. Turner et al., 1998). Spectral-weighting strategies for sentences were different from those reported for nonsense syllables.

12.
The relative importance of different parts of the auditory spectrum to recognition of the Diagnostic Rhyme Test (DRT) and its six speech feature subtests was determined. Three normal hearing subjects were tested twice in each of 70 experimental conditions. The analytical procedures of French and Steinberg [J. Acoust. Soc. Am. 19, 90-119 (1947)] were applied to the data to derive frequency importance functions for each of the DRT subtests and the test as a whole over the frequency range 178-8912 Hz. For the DRT as a whole, the low frequencies were found to be more important than is the case for nonsense syllables. Importance functions for the feature subtests also differed from those for nonsense syllables and from each other as well. These results suggest that test materials loaded with different proportions of particular phonemes have different frequency importance functions. Comparison of the results with those from other studies suggests that importance functions depend to a degree on the available response options as well.

13.
Two signal-processing algorithms, derived from those described by Stubbs and Summerfield [R.J. Stubbs and Q. Summerfield, J. Acoust. Soc. Am. 84, 1236-1249 (1988)], were used to separate the voiced speech of two talkers speaking simultaneously, at similar intensities, in a single channel. Both algorithms use fundamental frequency (F0) as the basis for segregation. One attenuates the interfering voice by filtering the cepstrum of the signal. The other is a hybrid algorithm that combines cepstral filtering with the technique of harmonic selection [T.W. Parsons, J. Acoust. Soc. Am. 60, 911-918 (1976)]. The algorithms were evaluated and compared in perceptual experiments involving listeners with normal hearing and listeners with cochlear hearing impairments. In experiment 1 the processing was used to separate voiced sentences spoken on a monotone. Both algorithms gave significant increases in intelligibility to both groups of listeners. The improvements were equivalent to an increase of 3-4 dB in the effective signal-to-noise ratio (SNR). In experiment 2 the processing was used to separate voiced sentences spoken with time-varying intonation. For normal-hearing listeners, cepstral filtering gave a significant increase in intelligibility, while the hybrid algorithm gave an increase that was on the margins of significance (p = 0.06). The improvements were equivalent to an increase of 2-3 dB in the effective SNR. For impaired listeners, no intelligibility improvements were demonstrated with intoned sentences. The decrease in performance for intoned material is attributed to limitations of the algorithms when F0 is nonstationary.

14.
The effects of six-channel compression and expansion amplification on the intelligibility of nonsense syllables embedded in speech spectrum noise were examined for four hearing-impaired subjects. For one condition (linear) the stimulus was given six-channel amplification with frequency shaping to suit the subject's hearing loss. The other condition (nonlinear) was the same except that low level inputs, to any given channel, received expansion amplification and high level inputs received compression. For each condition, each subject received the nonsense syllables at three different input levels, representing low, average, and high intensity speech. The results of this study, like those of most other studies of multichannel compression, are mainly negative. Nonlinear processing (mainly expansion) of low intensity speech resulted in a significant degradation of speech intelligibility for two subjects and in no improvement for the others. One subject showed a significant improvement in intelligibility for the nonlinearly processed average intensity speech and another subject showed significant improvement for the high intensity input (mainly compression). Clearly, nonlinear processing is beneficial for some subjects, under some listening conditions, but further research is needed to identify the relevant characteristics of such subjects. An acoustic analysis of selected items revealed that the failure of expansion to improve intelligibility was primarily due to the very low intensity consonants /e/ and /k/, in final position, being presented at an even lower intensity in the expansion condition than in the linear condition. Expansion may be worth further investigation with different parameters. Several other problems caused by the multichannel processing were also revealed. These included alteration of spectral shapes and band interaction effects. Ways of overcoming these problems, and of capitalizing on the likely advantages of multichannel amplification, are currently being investigated.

15.
The extension to the speech intelligibility index (SII; ANSI S3.5-1997 (1997)) proposed by Rhebergen and Versfeld [Rhebergen, K.S., and Versfeld, N.J. (2005). J. Acoust. Soc. Am. 117(4), 2181-2192] is able to predict for normal-hearing listeners the speech intelligibility in both stationary and fluctuating noise maskers with reasonable accuracy. The extended SII model was validated with speech reception threshold (SRT) data from the literature. However, further validation is required and the present paper describes SRT experiments with nonstationary noise conditions that are critical to the extended model. From these data, it can be concluded that the extended SII model is able to predict the SRTs for the majority of conditions, but that predictions are better when the extended SII model includes a function to account for forward masking.

16.
The intelligibility of speech pronounced by non-native talkers is generally lower than that of speech pronounced by native talkers, especially under adverse conditions, such as high levels of background noise. The effect of foreign accent on speech intelligibility was investigated quantitatively through a series of experiments involving voices of 15 talkers, differing in language background, age of second-language (L2) acquisition and experience with the target language (Dutch). Overall speech intelligibility of L2 talkers in noise is predicted with reasonable accuracy from accent ratings by native listeners, as well as from the self-ratings for proficiency of L2 talkers. For non-native speech, unlike native speech, the intelligibility of short messages (sentences) cannot be fully predicted by phoneme-based intelligibility tests. Although incorrect recognition of specific phonemes certainly occurs as a result of foreign accent, the effect of reduced phoneme recognition on the intelligibility of sentences may range from severe to virtually absent, depending on (for instance) the speech-to-noise ratio. Objective acoustic-phonetic analyses of accented speech were also carried out, but satisfactory overall predictions of speech intelligibility could not be obtained with relatively simple acoustic-phonetic measures.

17.
In a follow-up study to that of Bent and Bradlow (2003), carrier sentences containing familiar keywords were read aloud by five talkers (Korean high proficiency; Korean low proficiency; Saudi Arabian high proficiency; Saudi Arabian low proficiency; native English). The intelligibility of these keywords to 50 listeners in four first language groups (Korean, n = 10; Saudi Arabian, n = 10; native English, n = 10; other mixed first languages, n = 20) was measured in a word recognition test. In each case, the non-native listeners found the non-native low-proficiency talkers who did not share the same first language as the listeners the least intelligible, at statistically significant levels, while not finding the low-proficiency talker who shared their own first language similarly unintelligible. These findings indicate a mismatched interlanguage speech intelligibility detriment for low-proficiency non-native speakers and a potential intelligibility problem between mismatched first language low-proficiency speakers unfamiliar with each other's accents in English. There was no strong evidence to support either an intelligibility benefit for the high-proficiency non-native talkers to the listeners from a different first language background or to indicate that the native talkers were more intelligible than the high-proficiency non-native talkers to any of the listeners.

18.
Frequency response characteristics were selected for 14 hearing-impaired ears, according to six procedures. Three procedures were based on MCL measurements with speech bands of three bandwidths (1/3 octave, 1 octave, and 1 2/3 octaves). The other procedures were based on hearing thresholds, pure-tone MCLs, and pure-tone LDLs. The procedures were evaluated by speech discrimination testing, using nonsense syllables in noise, and by paired comparison judgments of the intelligibility and pleasantness of running speech. Speech discrimination testing showed significant differences between pairs of responses for only seven test ears. Nasals and glides were most affected by frequency response variations. Both intelligibility and pleasantness judgments showed significant differences for all test ears. Intelligibility in noise was less affected by frequency response differences than was intelligibility in quiet or pleasantness in quiet or in noise. For some ears, the ranking of responses depended on whether intelligibility or pleasantness was being judged and on whether the speech was in quiet or in noise. Overall, the three speech band MCL procedures were far superior to the others. Thus the studies strongly support the frequency response selection rationale of amplifying all frequency bands of speech to MCL. They also highlight some of the complications involved in achieving this aim.

19.
These experiments are concerned with the intelligibility of target speech in the presence of a background talker. Using a noise vocoder, Stone and Moore [J. Acoust. Soc. Am. 114, 1023-1034 (2003)] showed that single-channel fast-acting compression degraded intelligibility, but slow compression did not. Stone and Moore [J. Acoust. Soc. Am. 116, 2311-2323 (2004)] showed that intelligibility was lower when fast single-channel compression was applied to the target and background after mixing rather than before, and suggested that this was partly due to compression after mixing introducing "comodulation" between the target and background talkers. Experiment 1 here showed a similar effect for multi-channel compression. In experiment 2, intelligibility was measured as a function of the speed of multi-channel compression applied after mixing. For both eight- and 12-channel vocoders with one compressor per channel, intelligibility decreased as compression speed increased. For the eight-channel vocoder, a compressor that only affected modulation depth for rates below 2 Hz still reduced intelligibility. Experiment 3 used 12- or 18-channel vocoders. There were between 1 and 12 compression channels, and four speeds of compression. Intelligibility decreased as the number and speed of compression channels increased. The results are interpreted using several measures of the effects of compression, especially "across-source modulation correlation."

20.
Talkers adjust their vocal effort to communicate at different distances, aiming to compensate for the sound propagation losses. The present paper studies the influence of four acoustically different rooms on the speech produced by 13 male talkers addressing a listener at four distances. Talkers raised their vocal intensity by between 1.3 and 2.2 dB per doubling of distance to the listener and lowered it as a linear function of the quantity "room gain" at a rate of -3.6 dB/dB. There were also significant variations in the mean fundamental frequency, both across distance (3.8 Hz per doubling of distance) and among environments (4.3 Hz), and in the long-term standard deviation of the fundamental frequency among rooms (4 Hz). In the most uncomfortable rooms to speak in, talkers prolonged the voiced segments of the speech they produced, either as a side-effect of increased vocal intensity or in order to compensate for a decrease in speech intelligibility.
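
The two slopes reported in this abstract can be combined into a rough worked example. The sketch below assumes that the distance effect and the room-gain effect simply add, which the abstract does not state; the function name, the reference distance, and the example inputs are hypothetical. Only the slopes (1.3 to 2.2 dB per doubling of distance, -3.6 dB per dB of room gain) come from the text.

```python
import math


def vocal_level_change_db(distance_m, ref_distance_m, room_gain_db,
                          ref_room_gain_db=0.0,
                          db_per_doubling=1.3,
                          db_per_db_room_gain=-3.6):
    """Estimate the change in vocal intensity (dB) relative to a reference.

    Assumes the distance term and the room-gain term are additive
    (an assumption of this sketch, not a result from the paper).
    """
    doublings = math.log2(distance_m / ref_distance_m)
    distance_term = db_per_doubling * doublings
    room_term = db_per_db_room_gain * (room_gain_db - ref_room_gain_db)
    return distance_term + room_term


# Moving the listener from 1 m to 4 m (two doublings), same room,
# using the upper slope of 2.2 dB per doubling.
print(vocal_level_change_db(4.0, 1.0, 0.0, db_per_doubling=2.2))

# Same distance, but a room with 0.5 dB more room gain.
print(vocal_level_change_db(1.0, 1.0, 0.5))
```

Under these assumptions the first call gives a rise of about 4.4 dB and the second a drop of about 1.8 dB.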
