期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Speech recognition and the Articulation Index for normal and hearing-impaired listeners

C A Kamm D D Dirks T S Bell 《The Journal of the Acoustical Society of America》1985,77(1):281-288

The purpose of this experiment was to determine the applicability of the Articulation Index (AI) model for characterizing the speech recognition performance of listeners with mild-to-moderate hearing loss. Performance-intensity functions were obtained from five normal-hearing listeners and 11 hearing-impaired listeners using a closed-set nonsense syllable test for two frequency responses (uniform and high-frequency emphasis). For each listener, the fitting constant Q of the nonlinear transfer function relating AI and speech recognition was estimated. Results indicated that the function mapping AI onto performance was approximately the same for normal and hearing-impaired listeners with mild-to-moderate hearing loss and high speech recognition scores. For a hearing-impaired listener with poor speech recognition ability, the AI procedure was a poor predictor of performance. The AI procedure as presently used is inadequate for predicting performance of individuals with reduced speech recognition ability and should be used conservatively in applications predicting optimal or acceptable frequency response characteristics for hearing-aid amplification systems. 相似文献

2.

Stop-consonant recognition for normal-hearing listeners and listeners with high-frequency hearing loss. I: The contribution of selected frequency regions

J R Dubno D D Dirks D E Ellison 《The Journal of the Acoustical Society of America》1989,85(1):347-354

The purpose of this study is to specify the contribution of certain frequency regions to consonant place perception for normal-hearing listeners and listeners with high-frequency hearing loss, and to characterize the differences in stop-consonant place perception among these listeners. Stop-consonant recognition and error patterns were examined at various speech-presentation levels and under conditions of low- and high-pass filtering. Subjects included 18 normal-hearing listeners and a homogeneous group of 10 young, hearing-impaired individuals with high-frequency sensorineural hearing loss. Differential filtering effects on consonant place perception were consistent with the spectral composition of acoustic cues. Differences in consonant recognition and error patterns between normal-hearing and hearing-impaired listeners were observed when the stimulus bandwidth included regions of threshold elevation for the hearing-impaired listeners. Thus place-perception differences among listeners are, for the most part, associated with stimulus bandwidths corresponding to regions of hearing loss. 相似文献

3.

Evaluation of orthogonal polynomial compression

H Levitt A C Neuman 《The Journal of the Acoustical Society of America》1991,90(1):241-252

In orthogonal polynomial compression, the short-term speech spectrum is first approximated by a family of orthogonal polynomials. The coefficients of each polynomial, which vary over time, are then adjusted in terms of their average value and range of variation. These adjustments can be used to compress (or expand) temporal variations in the average level, slope, and various forms of curvature of the short-term speech spectrum. The analysis and reconstruction of the short-term speech spectrum using orthogonal polynomials was implemented using a digital master hearing aid. This method of compression was evaluated on eight sensorineurally hearing-impaired listeners. Speech recognition scores were obtained for a range of compression conditions and input levels with and without frequency shaping. The results showed significant advantages over conventional linear amplification when temporal variations in the average level of the short-term spectrum were compressed, a result comparable to that obtained with conventional amplitude compression. A subset of the subjects showed further improvement when temporal variations in spectrum slope were compressed, but these subjects also showed similar improvements when frequency shaping was combined with level-only compression. None of the subjects showed improved speech recognition scores when variations in quadratic curvature were compressed in addition to level and slope compression. 相似文献

4.

Intelligibility of speech in noise at high presentation levels: effects of hearing loss and frequency region

Summers V Cord MT 《The Journal of the Acoustical Society of America》2007,122(2):1130-1137

These experiments examined how high presentation levels influence speech recognition for high- and low-frequency stimuli in noise. Normally hearing (NH) and hearing-impaired (HI) listeners were tested. In Experiment 1, high- and low-frequency bandwidths yielding 70%-correct word recognition in quiet were determined at levels associated with broadband speech at 75 dB SPL. In Experiment 2, broadband and band-limited sentences (based on passbands measured in Experiment 1) were presented at this level in speech-shaped noise filtered to the same frequency bandwidths as targets. Noise levels were adjusted to produce approximately 30%-correct word recognition. Frequency bandwidths and signal-to-noise ratios supporting criterion performance in Experiment 2 were tested at 75, 87.5, and 100 dB SPL in Experiment 3. Performance tended to decrease as levels increased. For NH listeners, this "rollover" effect was greater for high-frequency and broadband materials than for low-frequency stimuli. For HI listeners, the 75- to 87.5-dB increase improved signal audibility for high-frequency stimuli and rollover was not observed. However, the 87.5- to 100-dB increase produced qualitatively similar results for both groups: scores decreased most for high-frequency stimuli and least for low-frequency materials. Predictions of speech intelligibility by quantitative methods such as the Speech Intelligibility Index may be improved if rollover effects are modeled as frequency dependent. 相似文献

5.

Benefits of amplification for speech recognition in background noise

Turner CW Henry BA 《The Journal of the Acoustical Society of America》2002,112(4):1675-1680

The purpose of the present study was to examine the benefits of providing audible speech to listeners with sensorineural hearing loss when the speech is presented in a background noise. Previous studies have shown that when listeners have a severe hearing loss in the higher frequencies, providing audible speech (in a quiet background) to these higher frequencies usually results in no improvement in speech recognition. In the present experiments, speech was presented in a background of multitalker babble to listeners with various severities of hearing loss. The signal was low-pass filtered at numerous cutoff frequencies and speech recognition was measured as additional high-frequency speech information was provided to the hearing-impaired listeners. It was found in all cases, regardless of hearing loss or frequency range, that providing audible speech resulted in an increase in recognition score. The change in recognition as the cutoff frequency was increased, along with the amount of audible speech information in each condition (articulation index), was used to calculate the "efficiency" of providing audible speech. Efficiencies were positive for all degrees of hearing loss. However, the gains in recognition were small, and the maximum score obtained by an listener was low, due to the noise background. An analysis of error patterns showed that due to the limited speech audibility in a noise background, even severely impaired listeners used additional speech audibility in the high frequencies to improve their perception of the "easier" features of speech including voicing. 相似文献

6.

Vowel intelligibility in clear and conversational speech for normal-hearing and hearing-impaired listeners

Ferguson SH Kewley-Port D 《The Journal of the Acoustical Society of America》2002,112(1):259-271

Several studies have demonstrated that when talkers are instructed to speak clearly, the resulting speech is significantly more intelligible than speech produced in ordinary conversation. These speech intelligibility improvements are accompanied by a wide variety of acoustic changes. The current study explored the relationship between acoustic properties of vowels and their identification in clear and conversational speech, for young normal-hearing (YNH) and elderly hearing-impaired (EHI) listeners. Monosyllabic words excised from sentences spoken either clearly or conversationally by a male talker were presented in 12-talker babble for vowel identification. While vowel intelligibility was significantly higher in clear speech than in conversational speech for the YNH listeners, no clear speech advantage was found for the EHI group. Regression analyses were used to assess the relative importance of spectral target, dynamic formant movement, and duration information for perception of individual vowels. For both listener groups, all three types of information emerged as primary cues to vowel identity. However, the relative importance of the three cues for individual vowels differed greatly for the YNH and EHI listeners. This suggests that hearing loss alters the way acoustic cues are used for identifying vowels. 相似文献

7.

Stop-consonant recognition for normal-hearing listeners and listeners with high-frequency hearing loss. II: Articulation index predictions

J R Dubno D D Dirks A B Schaefer 《The Journal of the Acoustical Society of America》1989,85(1):355-364

Articulation index (AI) theory was used to evaluate stop-consonant recognition of normal-hearing listeners and listeners with high-frequency hearing loss. From results reported in a companion article [Dubno et al., J. Acoust. Soc. Am. 85, 347-354 (1989)], a transfer function relating the AI to stop-consonant recognition was established, and a frequency importance function was determined for the nine stop-consonant-vowel syllables used as test stimuli. The calculations included the rms and peak levels of the speech that had been measured in 1/3 octave bands; the internal noise was estimated from the thresholds for each subject. The AI model was then used to predict performance for the hearing-impaired listeners. A majority of the AI predictions for the hearing-impaired subjects fell within +/- 2 standard deviations of the normal-hearing listeners' results. However, as observed in previous data, the AI tended to overestimate performance of the hearing-impaired listeners. The accuracy of the predictions decreased with the magnitude of high-frequency hearing loss. Thus, with the exception of performance for listeners with severe high-frequency hearing loss, the results suggest that poorer speech recognition among hearing-impaired listeners results from reduced audibility within critical spectral regions of the speech stimuli. 相似文献

8.

Algorithms for separating the speech of interfering talkers: evaluations with voiced sentences, and normal-hearing and hearing-impaired listeners

R J Stubbs Q Summerfield 《The Journal of the Acoustical Society of America》1990,87(1):359-372

Two signal-processing algorithms, derived from those described by Stubbs and Summerfield [R.J. Stubbs and Q. Summerfield, J. Acoust. Soc. Am. 84, 1236-1249 (1988)], were used to separate the voiced speech of two talkers speaking simultaneously, at similar intensities, in a single channel. Both algorithms use fundamental frequency (FO) as the basis for segregation. One attenuates the interfering voice by filtering the cepstrum of the signal. The other is a hybrid algorithm that combines cepstral filtering with the technique of harmonic selection [T.W. Parsons, J. Acoust. Soc. Am. 60, 911-918 (1976)]. The algorithms were evaluated and compared in perceptual experiments involving listeners with normal hearing and listeners with cochlear hearing impairments. In experiment 1 the processing was used to separate voiced sentences spoken on a monotone. Both algorithms gave significant increases in intelligibility to both groups of listeners. The improvements were equivalent to an increase of 3-4 dB in the effective signal-to-noise ratio (SNR). In experiment 2 the processing was used to separate voiced sentences spoken with time-varying intonation. For normal-hearing listeners, cepstral filtering gave a significant increase in intelligibility, while the hybrid algorithm gave an increase that was on the margins of significance (p = 0.06). The improvements were equivalent to an increase of 2-3 dB in the effective SNR. For impaired listeners, no intelligibility improvements were demonstrated with intoned sentences. The decrease in performance for intoned material is attributed to limitations of the algorithms when FO is nonstationary. 相似文献

9.

Effects of acoustic modification on consonant recognition by elderly hearing-impaired subjects

S Gordon-Salant 《The Journal of the Acoustical Society of America》1987,81(4):1199-1202

In a recent study [S. Gordon-Salant, J. Acoust. Soc. Am. 80, 1599-1607 (1986)], young and elderly normal-hearing listeners demonstrated significant improvements in consonant-vowel (CV) recognition with acoustic modification of the speech signal incorporating increments in the consonant-vowel ratio (CVR). Acoustic modification of consonant duration failed to enhance performance. The present study investigated whether consonant recognition deficits of elderly hearing-impaired listeners would be reduced by these acoustic modifications, as well as by increases in speech level. Performance of elderly hearing-impaired listeners with gradually sloping and sharply sloping sensorineural hearing losses was compared to performance of elderly normal-threshold listeners (reported previously) for recognition of a variety of nonsense syllable stimuli. These stimuli included unmodified CVs, CVs with increases in CVR, CVs with increases in consonant duration, and CVs with increases in both CVR and consonant duration. Stimuli were presented at each of two speech levels with a background of noise. Results obtained from the hearing-impaired listeners agreed with those observed previously from normal-hearing listeners. Differences in performance between the three subject groups as a function of level were observed also. 相似文献

10.

Listener sensitivity to individual talker differences in voice-onset-time

Allen JS Miller JL 《The Journal of the Acoustical Society of America》2004,115(6):3171-3183

Recent findings in the domains of word and talker recognition reveal that listeners use previous experience with an individual talker's voice to facilitate subsequent perceptual processing of that talker's speech. These findings raise the possibility that listeners are sensitive to talker-specific acoustic-phonetic properties. The present study tested this possibility directly by examining listeners' sensitivity to talker differences in the voice-onset-time (VOT) associated with a word-initial voiceless stop consonant. Listeners were trained on the speech of two talkers. Speech synthesis was used to manipulate the VOTs of these talkers so that one had short VOTs and the other had long VOTs (counterbalanced across listeners). The results of two experiments using a paired-comparison task revealed that, when presented with a short- versus long-VOT variant of a given talker's speech, listeners could select the variant consistent with their experience of that talker's speech during training. This was true when listeners were tested on the same word heard during training and when they were tested on a different word spoken by the same talker, indicating that listeners generalized talker-specific VOT information to a novel word. Such sensitivity to talker-specific acoustic-phonetic properties may subserve at least in part listeners' capacity to benefit from talker-specific experience. 相似文献

11.

Cochlear implant speech recognition with speech maskers

Stickney GS Zeng FG Litovsky R Assmann P 《The Journal of the Acoustical Society of America》2004,116(2):1081-1091

Speech recognition performance was measured in normal-hearing and cochlear-implant listeners with maskers consisting of either steady-state speech-spectrum-shaped noise or a competing sentence. Target sentences from a male talker were presented in the presence of one of three competing talkers (same male, different male, or female) or speech-spectrum-shaped noise generated from this talker at several target-to-masker ratios. For the normal-hearing listeners, target-masker combinations were processed through a noise-excited vocoder designed to simulate a cochlear implant. With unprocessed stimuli, a normal-hearing control group maintained high levels of intelligibility down to target-to-masker ratios as low as 0 dB and showed a release from masking, producing better performance with single-talker maskers than with steady-state noise. In contrast, no masking release was observed in either implant or normal-hearing subjects listening through an implant simulation. The performance of the simulation and implant groups did not improve when the single-talker masker was a different talker compared to the same talker as the target speech, as was found in the normal-hearing control. These results are interpreted as evidence for a significant role of informational masking and modulation interference in cochlear implant speech recognition with fluctuating maskers. This informational masking may originate from increased target-masker similarity when spectral resolution is reduced. 相似文献

12.

Recognition of accented English in quiet and noise by younger and older listeners

Gordon-Salant S Yeni-Komshian GH Fitzgibbons PJ 《The Journal of the Acoustical Society of America》2010,128(5):3152-3160

This study investigated the effects of age and hearing loss on perception of accented speech presented in quiet and noise. The relative importance of alterations in phonetic segments vs. temporal patterns in a carrier phrase with accented speech also was examined. English sentences recorded by a native English speaker and a native Spanish speaker, together with hybrid sentences that varied the native language of the speaker of the carrier phrase and the final target word of the sentence were presented to younger and older listeners with normal hearing and older listeners with hearing loss in quiet and noise. Effects of age and hearing loss were observed in both listening environments, but varied with speaker accent. All groups exhibited lower recognition performance for the final target word spoken by the accented speaker compared to that spoken by the native speaker, indicating that alterations in segmental cues due to accent play a prominent role in intelligibility. Effects of the carrier phrase were minimal. The findings indicate that recognition of accented speech, especially in noise, is a particularly challenging communication task for older people. 相似文献

13.

Effect of stimulus bandwidth on the perception of /s/ in normal- and hearing-impaired children and adults.

P G Stelmachowicz A L Pittman B M Hoover D E Lewis 《The Journal of the Acoustical Society of America》2001,110(4):2183-2190

Recent studies with adults have suggested that amplification at 4 kHz and above fails to improve speech recognition and may even degrade performance when high-frequency thresholds exceed 50-60 dB HL. This study examined the extent to which high frequencies can provide useful information for fricative perception for normal-hearing and hearing-impaired children and adults. Eighty subjects (20 per group) participated. Nonsense syllables containing the phonemes /s/, /f/, and /O/, produced by a male, female, and child talker, were low-pass filtered at 2, 3, 4, 5, 6, and 9 kHz. Frequency shaping was provided for the hearing-impaired subjects only. Results revealed significant differences in recognition between the four groups of subjects. Specifically, both groups of children performed more poorly than their adult counterparts at similar bandwidths. Likewise, both hearing-impaired groups performed more poorly than their normal-hearing counterparts. In addition, significant talker effects for /s/ were observed. For the male talker, optimum performance was reached at a bandwidth of approximately 4-5 kHz, whereas optimum performance for the female and child talkers did not occur until a bandwidth of 9 kHz. 相似文献

14.

Compression and expansion of the temporal envelope: evaluation of speech intelligibility and sound quality.

R A van Buuren J M Festen T Houtgast 《The Journal of the Acoustical Society of America》1999,105(5):2903-2913

Sensorineural hearing loss is accompanied by loudness recruitment, a steeper-than-normal rise of perceived loudness with presentation level. To compensate for this abnormality, amplitude compression is often applied (e.g., in a hearing aid). Alternatively, since speech intelligibility has been modeled as the perception of fast energy fluctuations, enlarging these (by means of expansion) may improve speech intelligibility. Still, even if these signal-processing techniques prove useful in terms of speech intelligibility, practical application might be hindered by unacceptably low sound quality. Therefore, both speech intelligibility and sound quality were evaluated for syllabic compression and expansion of the temporal envelope. Speech intelligibility was evaluated with an adaptive procedure, based on short everyday sentences either in noise or with a competing speaker. Sound quality was measured by means of a rating-scale procedure, for both speech and music. In a systematic setup, both the ratio of compression or expansion and the number of independent processing bands were varied. Individual hearing thresholds were compensated for by a listener-specific filter and amplification. Both listeners with normal hearing and listeners with sensorineural hearing impairment participated as paid volunteers. The results show that, on average, both compression and expansion fail to show better speech intelligibility or sound quality than linear amplification. 相似文献

15.

Within-ear and across-ear interference in a cocktail-party listening task

Brungart DS Simpson BD 《The Journal of the Acoustical Society of America》2002,112(6):2985-2995

Although many researchers have shown that listeners are able to selectively attend to a target speech signal when a masking talker is present in the same ear as the target speech or when a masking talker is present in a different ear than the target speech, little is known about selective auditory attention in tasks with a target talker in one ear and independent masking talkers in both ears at the same time. In this series of experiments, listeners were asked to respond to a target speech signal spoken by one of two competing talkers in their right (target) ear while ignoring a simultaneous masking sound in their left (unattended) ear. When the masking sound in the unattended ear was noise, listeners were able to segregate the competing talkers in the target ear nearly as well as they could with no sound in the unattended ear. When the masking sound in the unattended ear was speech, however, speech segregation in the target ear was substantially worse than with no sound in the unattended ear. When the masking sound in the unattended ear was time-reversed speech, speech segregation was degraded only when the target speech was presented at a lower level than the masking speech in the target ear. These results show that within-ear and across-ear speech segregation are closely related processes that cannot be performed simultaneously when the interfering sound in the unattended ear is qualitatively similar to speech. 相似文献

16.

Effects of fundamental frequency and vocal-tract length changes on attention to one of two simultaneous talkers

Darwin CJ Brungart DS Simpson BD 《The Journal of the Acoustical Society of America》2003,114(5):2913-2922

Three experiments used the Coordinated Response Measure task to examine the roles that differences in F0 and differences in vocal-tract length have on the ability to attend to one of two simultaneous speech signals. The first experiment asked how increases in the natural F0 difference between two sentences (originally spoken by the same talker) affected listeners' ability to attend to one of the sentences. The second experiment used differences in vocal-tract length, and the third used both F0 and vocal-tract length differences. Differences in F0 greater than 2 semitones produced systematic improvements in performance. Differences in vocal-tract length produced systematic improvements in performance when the ratio of lengths was 1.08 or greater, particularly when the shorter vocal tract belonged to the target talker. Neither of these manipulations produced improvements in performance as great as those produced by a different-sex talker. Systematic changes in both F0 and vocal-tract length that simulated an incremental shift in gender produced substantially larger improvements in performance than did differences in F0 or vocal-tract length alone. In general, shifting one of two utterances spoken by a female voice towards a male voice produces a greater improvement in performance than shifting male towards female. The increase in performance varied with the intonation patterns of individual talkers, being smallest for those talkers who showed most variability in their intonation patterns between different utterances. 相似文献

17.

The role of perceived spatial separation in the unmasking of speech 总被引：12，自引：0，他引：12

Freyman RL Helfer KS McCall DD Clifton RK 《The Journal of the Acoustical Society of America》1999,106(6):3578-3588

Spatial separation of speech and noise in an anechoic space creates a release from masking that often improves speech intelligibility. However, the masking release is severely reduced in reverberant spaces. This study investigated whether the distinct and separate localization of speech and interference provides any perceptual advantage that, due to the precedence effect, is not degraded by reflections. Listeners' identification of nonsense sentences spoken by a female talker was measured in the presence of either speech-spectrum noise or other sentences spoken by a second female talker. Target and interference stimuli were presented in an anechoic chamber from loudspeakers directly in front and 60 degrees to the right in single-source and precedence-effect (lead-lag) conditions. For speech-spectrum noise, the spatial separation advantage for speech recognition (8 dB) was predictable from articulation index computations based on measured release from masking for narrow-band stimuli. The spatial separation advantage was only 1 dB in the lead-lag condition, despite the fact that a large perceptual separation was produced by the precedence effect. For the female talker interference, a much larger advantage occurred, apparently because informational masking was reduced by differences in perceived locations of target and interference. 相似文献

18.

Intelligibility of average talkers in typical listening environments 总被引：1，自引：0，他引：1

R M Cox G C Alexander C Gilmore 《The Journal of the Acoustical Society of America》1987,81(5):1598-1608

Intelligibility of conversationally produced speech for normal hearing listeners was studied for three male and three female talkers. Four typical listening environments were used. These simulated a quiet living room, a classroom, and social events in two settings with different reverberation characteristics. For each talker, overall intelligibility and intelligibility for vowels, consonant voicing, consonant continuance, and consonant place were quantified using the speech pattern contrast (SPAC) test. Results indicated that significant intelligibility differences are observed among normal talkers even in listening environments that permit essentially full intelligibility for everyday conversations. On the whole, talkers maintained their relative intelligibility across the four environments, although there was one exception which suggested that some voices may be particularly susceptible to degradation due to reverberation. Consonant place was the most poorly perceived feature, followed by continuance, voicing, and vowel intelligibility. However, there were numerous significant interactions between talkers and speech features, indicating that a talker of average overall intelligibility may produce certain speech features with intelligibility that is considerably higher or lower than average. Neither long-term rms speech spectrum nor articulation rate was found to be an adequate single criterion for selecting a talker of average intelligibility. Ultimately, an average talker was chosen on the basis of four speech contrasts: initial consonant place, and final consonant place, voicing, and continuance. 相似文献

19.

Vongpaisal T Trehub SE Glenn Schellenberg E van Lieshout P 《The Journal of the Acoustical Society of America》2012,131(1):501-508

Temporal information provided by cochlear implants enables successful speech perception in quiet, but limited spectral information precludes comparable success in voice perception. Talker identification and speech decoding by young hearing children (5-7 yr), older hearing children (10-12 yr), and hearing adults were examined by means of vocoder simulations of cochlear implant processing. In Experiment 1, listeners heard vocoder simulations of sentences from a man, woman, and girl and were required to identify the talker from a closed set. Younger children identified talkers more poorly than older listeners, but all age groups showed similar benefit from increased spectral information. In Experiment 2, children and adults provided verbatim repetition of vocoded sentences from the same talkers. The youngest children had more difficulty than older listeners, but all age groups showed comparable benefit from increasing spectral resolution. At comparable levels of spectral degradation, performance on the open-set task of speech decoding was considerably more accurate than on the closed-set task of talker identification. Hearing children's ability to identify talkers and decode speech from spectrally degraded material sheds light on the difficulty of these domains for child implant users. 相似文献

20.

Effects of fundamental frequency and vocal-tract length cues on sentence segregation by listeners with hearing loss

Mackersie CL Dewey J Guthrie LA 《The Journal of the Acoustical Society of America》2011,130(2):1006-1019

相似文献