1.

Companding to improve cochlear-implant speech recognition in speech-shaped noise

Bhattacharya A Zeng FG 《The Journal of the Acoustical Society of America》2007,122(2):1079-1089

Nonlinear sensory and neural processing mechanisms have been exploited to enhance spectral contrast for improvement of speech understanding in noise. The "companding" algorithm employs both two-tone suppression and adaptive gain mechanisms to achieve spectral enhancement. This study implemented a 50-channel companding strategy and evaluated its efficiency as a front-end noise suppression technique in cochlear implants. The key parameters were identified and evaluated to optimize the companding performance. Both normal-hearing (NH) listeners and cochlear-implant (CI) users performed phoneme and sentence recognition tests in quiet and in steady-state speech-shaped noise. Data from the NH listeners showed that for noise conditions, the implemented strategy improved vowel perception but not consonant and sentence perception. However, the CI users showed significant improvements in both phoneme and sentence perception in noise. Maximum average improvement for vowel recognition was 21.3 percentage points (p<0.05) at 0 dB signal-to-noise ratio (SNR), followed by 17.7 percentage points (p<0.05) at 5 dB SNR for sentence recognition and 12.1 percentage points (p<0.05) at 5 dB SNR for consonant recognition. While the observed results could be attributed to the enhanced spectral contrast, it is likely that the corresponding temporal changes caused by companding also played a significant role and should be addressed by future studies.

2.

Spectral peak resolution and speech recognition in quiet: normal hearing, hearing impaired, and cochlear implant listeners

Henry BA Turner CW Behrens A 《The Journal of the Acoustical Society of America》2005,118(2):1111-1121

Spectral peak resolution was investigated in normal hearing (NH), hearing impaired (HI), and cochlear implant (CI) listeners. The task involved discriminating between two rippled noise stimuli in which the frequency positions of the log-spaced peaks and valleys were interchanged. The ripple spacing was varied adaptively from 0.13 to 11.31 ripples/octave, and the minimum ripple spacing at which a reversal in peak and trough positions could be detected was determined as the spectral peak resolution threshold for each listener. Spectral peak resolution was best, on average, in NH listeners, poorest in CI listeners, and intermediate for HI listeners. There was a significant relationship between spectral peak resolution and both vowel and consonant recognition in quiet across the three listener groups. The results indicate that the degree of spectral peak resolution required for accurate vowel and consonant recognition in quiet backgrounds is around 4 ripples/octave, and that spectral peak resolution poorer than around 1-2 ripples/octave may result in highly degraded speech recognition. These results suggest that efforts to improve spectral peak resolution for HI and CI users may lead to improved speech recognition.
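The rippled-noise pair described in this abstract can be illustrated with a small sketch. It is not the authors' stimulus code: the sinusoidal ripple shape, the 30 dB peak-to-valley depth, the 100 Hz reference frequency, and the function name `ripple_gain_db` are all assumptions chosen for illustration. The sketch only shows how a ripple defined on a log-frequency axis places its peaks, and how a half-cycle phase shift interchanges peaks and valleys.

```python
import math

def ripple_gain_db(freq_hz, ripples_per_octave, depth_db=30.0,
                   ref_hz=100.0, inverted=False):
    """Spectral gain (dB) of a log-spaced ripple at freq_hz.

    Assumptions (not from the abstract): sinusoidal ripple shape,
    30 dB peak-to-valley depth, 100 Hz reference frequency.
    """
    # Position on a log2 (octave) axis relative to the reference.
    octaves = math.log2(freq_hz / ref_hz)
    # A phase shift of pi swaps every peak with the neighbouring valley.
    phase = math.pi if inverted else 0.0
    return 0.5 * depth_db * math.sin(
        2.0 * math.pi * ripples_per_octave * octaves + phase)

# At 1 ripple/octave the first peak sits a quarter octave above ref_hz.
f_peak = 100.0 * 2 ** 0.25
standard = ripple_gain_db(f_peak, 1.0)
reversed_ = ripple_gain_db(f_peak, 1.0, inverted=True)
```

In this sketch a peak of the standard stimulus becomes a valley of the inverted one at the same frequency, which is the cue a listener would need to resolve.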

3.

Talker-identification training using simulations of binaurally combined electric and acoustic hearing: generalization to speech and emotion recognition

Krull V Luo X Iler Kirk K 《The Journal of the Acoustical Society of America》2012,131(4):3069-3078

Understanding speech in background noise, talker identification, and vocal emotion recognition are challenging for cochlear implant (CI) users due to poor spectral resolution and limited pitch cues with the CI. Recent studies have shown that bimodal CI users, that is, those CI users who wear a hearing aid (HA) in their non-implanted ear, receive benefit for understanding speech both in quiet and in noise. This study compared the efficacy of talker-identification training in two groups of young normal-hearing adults, listening to either acoustic simulations of unilateral CI or bimodal (CI+HA) hearing. Training resulted in improved identification of talkers for both groups with better overall performance for simulated bimodal hearing. Generalization of learning to sentence and emotion recognition also was assessed in both subject groups. Sentence recognition in quiet and in noise improved for both groups, no matter if the talkers had been heard during training or not. Generalization to improvements in emotion recognition for two unfamiliar talkers also was noted for both groups with the simulated bimodal-hearing group showing better overall emotion-recognition performance. Improvements in sentence recognition were retained a month after training in both groups. These results have potential implications for aural rehabilitation of conventional and bimodal CI users.

4.

Spectral and temporal cues for phoneme recognition in noise

Xu L Zheng Y 《The Journal of the Acoustical Society of America》2007,122(3):1758

Cochlear implant users receive limited spectral and temporal information. Their speech recognition deteriorates dramatically in noise. The aim of the present study was to determine the relative contributions of spectral and temporal cues to speech recognition in noise. Spectral information was manipulated by varying the number of channels from 2 to 32 in a noise-excited vocoder. Temporal information was manipulated by varying the low-pass cutoff frequency of the envelope extractor from 1 to 512 Hz. Ten normal-hearing, native speakers of English participated in tests of phoneme recognition using vocoder processed consonants and vowels under three conditions (quiet, and +6 and 0 dB signal-to-noise ratios). The number of channels required for vowel-recognition performance to plateau increased from 12 in quiet to 16-24 in the two noise conditions. However, for consonant recognition, no further improvement in performance was evident when the number of channels was 12 or more in any of the three conditions. The contribution of temporal cues for phoneme recognition showed a similar pattern in both quiet and noise conditions. Similar to the quiet conditions, there was a trade-off between temporal and spectral cues for phoneme recognition in noise.

5.

Speech recognition in noise as a function of the number of spectral channels: comparison of acoustic hearing and cochlear implants (Cited by: 18 total; self-citations: 0; citations by others: 18)

Friesen LM Shannon RV Baskent D Wang X 《The Journal of the Acoustical Society of America》2001,110(2):1150-1163

Speech recognition was measured as a function of spectral resolution (number of spectral channels) and speech-to-noise ratio in normal-hearing (NH) and cochlear-implant (CI) listeners. Vowel, consonant, word, and sentence recognition were measured in five normal-hearing listeners, ten listeners with the Nucleus-22 cochlear implant, and nine listeners with the Advanced Bionics Clarion cochlear implant. Recognition was measured as a function of the number of spectral channels (noise bands or electrodes) at signal-to-noise ratios of +15, +10, +5, and 0 dB, and in quiet. Performance with three different speech processing strategies (SPEAK, CIS, and SAS) was similar across all conditions, and improved as the number of electrodes increased (up to seven or eight) for all conditions. For all noise levels, vowel and consonant recognition with the SPEAK speech processor did not improve with more than seven electrodes, while for normal-hearing listeners, performance continued to increase up to at least 20 channels. Speech recognition on more difficult speech materials (word and sentence recognition) showed a marginally significant increase in Nucleus-22 listeners from seven to ten electrodes. The average implant score on all processing strategies was poorer than scores of NH listeners with similar processing. However, the best CI scores were similar to the normal-hearing scores for that condition (up to seven channels). CI listeners with the highest performance level increased in performance as the number of electrodes increased up to seven, while CI listeners with low levels of speech recognition did not increase in performance as the number of electrodes was increased beyond four. These results quantify the effect of number of spectral channels on speech recognition in noise and demonstrate that most CI subjects are not able to fully utilize the spectral information provided by the number of electrodes used in their implant.

6.

Effects of directional microphone and adaptive multichannel noise reduction algorithm on cochlear implant performance

Chung K Zeng FG Acker KN 《The Journal of the Acoustical Society of America》2006,120(4):2216-2227

Although cochlear implant (CI) users have enjoyed good speech recognition in quiet, they still have difficulties understanding speech in noise. We conducted three experiments to determine whether a directional microphone and an adaptive multichannel noise reduction algorithm could enhance CI performance in noise and whether Speech Transmission Index (STI) can be used to predict CI performance in various acoustic and signal processing conditions. In Experiment I, CI users listened to speech in noise processed by 4 hearing aid settings: omni-directional microphone, omni-directional microphone plus noise reduction, directional microphone, and directional microphone plus noise reduction. The directional microphone significantly improved speech recognition in noise. Both directional microphone and noise reduction algorithm improved overall preference. In Experiment II, normal hearing individuals listened to the recorded speech produced by 4- or 8-channel CI simulations. The 8-channel simulation yielded similar speech recognition results as in Experiment I, whereas the 4-channel simulation produced no significant difference among the 4 settings. In Experiment III, we examined the relationship between STIs and speech recognition. The results suggested that STI could predict actual and simulated CI speech intelligibility with acoustic degradation and the directional microphone, but not the noise reduction algorithm. Implications for intelligibility enhancement are discussed.

7.

Fundamental frequency is critical to speech perception in noise in combined acoustic and electric hearing

Carroll J Tiaden S Zeng FG 《The Journal of the Acoustical Society of America》2011,130(4):2054-2062

Cochlear implant (CI) users have been shown to benefit from residual low-frequency hearing, specifically in pitch related tasks. It remains unclear whether this benefit is dependent on fundamental frequency (F0) or other acoustic cues. Three experiments were conducted to determine the role of F0, as well as its frequency modulated (FM) and amplitude modulated (AM) components, in speech recognition with a competing voice. In simulated CI listeners, the signal-to-noise ratio was varied to estimate the 50% correct response. Simulation results showed that the F0 cue contributes to a significant proportion of the benefit seen with combined acoustic and electric hearing, and additionally that this benefit is due to the FM rather than the AM component. In actual CI users, sentence recognition scores were collected with either the full F0 cue containing both the FM and AM components or the 500-Hz low-pass speech cue containing the F0 and additional harmonics. The F0 cue provided a benefit similar to the low-pass cue for speech in noise, but not in quiet. Poorer CI users benefited more from the F0 cue than better users. These findings suggest that F0 is critical to improving speech perception in noise in combined acoustic and electric hearing.

8.

Spectral and threshold effects on recognition of speech at higher-than-normal levels

Dubno JR Horwitz AR Ahlstrom JB 《The Journal of the Acoustical Society of America》2006,120(1):310-320

To examine spectral and threshold effects for speech and noise at high levels, recognition of nonsense syllables was assessed for low-pass-filtered speech and speech-shaped maskers and high-pass-filtered speech and speech-shaped maskers at three speech levels, with signal-to-noise ratio held constant. Subjects were younger adults with normal hearing and older adults with normal hearing but significantly higher average quiet thresholds. A broadband masker was always present to minimize audibility differences between subject groups and across presentation levels. For subjects with lower thresholds, the declines in recognition of low-frequency syllables in low-frequency maskers were attributed to nonlinear growth of masking which reduced "effective" signal-to-noise ratio at high levels, whereas the decline for subjects with higher thresholds was not fully explained by nonlinear masking growth. For all subjects, masking growth did not entirely account for declines in recognition of high-frequency syllables in high-frequency maskers at high levels. Relative to younger subjects with normal hearing and lower quiet thresholds, older subjects with normal hearing and higher quiet thresholds had poorer consonant recognition in noise, especially for high-frequency speech in high-frequency maskers. Age-related effects on thresholds and task proficiency may be determining factors in the recognition of speech in noise at high levels.

9.

Relative contribution of off- and on-frequency spectral components of background noise to the masking of unprocessed and vocoded speech

Apoux F Healy EW 《The Journal of the Acoustical Society of America》2010,128(4):2075-2084

The present study examined the relative influence of the off- and on-frequency spectral components of modulated and unmodulated maskers on consonant recognition. Stimuli were divided into 30 contiguous equivalent rectangular bandwidths. The temporal fine structure (TFS) in each "target" band was either left intact or replaced with tones using vocoder processing. Recognition scores for 10, 15 and 20 target bands randomly located in frequency were obtained in quiet and in the presence of all 30 masker bands, only the off-frequency masker bands, or only the on-frequency masker bands. The amount of masking produced by the on-frequency bands was generally comparable to that produced by the broadband masker. However, the difference between these two conditions was often significant, indicating an influence of the off-frequency masker bands, likely through modulation interference or spectral restoration. Although vocoder processing systematically led to poorer consonant recognition scores, the deficit observed in noise could often be attributed to that observed in quiet. These data indicate that (i) speech recognition is affected by the off-frequency components of the background and (ii) the nature of the target TFS does not systematically affect speech recognition in noise, especially when energetic masking and/or the number of target bands is limited.

10.

Contribution of low-frequency acoustic information to Chinese speech recognition in cochlear implant simulations

Luo X Fu QJ 《The Journal of the Acoustical Society of America》2006,120(4):2260-2266

Chinese sentence recognition strongly relates to the reception of tonal information. For cochlear implant (CI) users with residual acoustic hearing, tonal information may be enhanced by restoring low-frequency acoustic cues in the nonimplanted ear. The present study investigated the contribution of low-frequency acoustic information to Chinese speech recognition in Mandarin-speaking normal-hearing subjects listening to acoustic simulations of bilaterally combined electric and acoustic hearing. Subjects listened to a 6-channel CI simulation in one ear and low-pass filtered speech in the other ear. Chinese tone, phoneme, and sentence recognition were measured in steady-state, speech-shaped noise, as a function of the cutoff frequency for low-pass filtered speech. Results showed that low-frequency acoustic information below 500 Hz contributed most strongly to tone recognition, while low-frequency acoustic information above 500 Hz contributed most strongly to phoneme recognition. For Chinese sentences, speech reception thresholds (SRTs) improved with increasing amounts of low-frequency acoustic information, and significantly improved when low-frequency acoustic information above 500 Hz was preserved. SRTs were not significantly affected by the degree of spectral overlap between the CI simulation and low-pass filtered speech. These results suggest that, for CI patients with residual acoustic hearing, preserving low-frequency acoustic information can improve Chinese speech recognition in noise.

11.

Recognition of time-distorted sentences by normal-hearing and cochlear-implant listeners

Fu QJ Galvin JJ Wang X 《The Journal of the Acoustical Society of America》2001,109(1):379-384

This study evaluated the effects of time compression and expansion on sentence recognition by normal-hearing (NH) listeners and cochlear-implant (CI) recipients of the Nucleus-22 device. Sentence recognition was measured in five CI users using custom 4-channel continuous interleaved sampler (CIS) processors and five NH listeners using either 4-channel or 32-channel noise-band processors. For NH listeners, recognition was largely unaffected by time expansion, regardless of spectral resolution. However, recognition of time-compressed speech varied significantly with spectral resolution. When fine spectral resolution (32 channels) was available, speech recognition was unaffected even when the duration of sentences was shortened to 40% of their original length (equivalent to a mean duration of 40 ms/phoneme). However, a mean duration of 60 ms/phoneme was required to achieve the same level of recognition when only coarse spectral resolution (4 channels) was available. Recognition patterns were highly variable across CI listeners. The best CI listener performed as well as NH subjects listening to corresponding spectral conditions; however, three out of five CI listeners performed significantly poorer in recognizing time-compressed speech. Further investigation revealed that these three poorer-performing CI users also had more difficulty with simple temporal gap-detection tasks. The results indicate that limited spectral resolution reduces the ability to recognize time-compressed speech. Some CI listeners have more difficulty with time-compressed speech, as produced by rapid speakers, because of reduced spectral resolution and deficits in auditory temporal processing.

12.

The relative phonetic contributions of a cochlear implant and residual acoustic hearing to bimodal speech perception

Sheffield BM Zeng FG 《The Journal of the Acoustical Society of America》2012,131(1):518-530

The addition of low-passed (LP) speech or even a tone following the fundamental frequency (F0) of speech has been shown to benefit speech recognition for cochlear implant (CI) users with residual acoustic hearing. The mechanisms underlying this benefit are still unclear. In this study, eight bimodal subjects (CI users with acoustic hearing in the non-implanted ear) and eight simulated bimodal subjects (using vocoded and LP speech) were tested on vowel and consonant recognition to determine the relative contributions of acoustic and phonetic cues, including F0, to the bimodal benefit. Several listening conditions were tested (CI/Vocoder, LP, T(F0-env), CI/Vocoder + LP, CI/Vocoder + T(F0-env)). Compared with CI/Vocoder performance, LP significantly enhanced both consonant and vowel perception, whereas a tone following the F0 contour of target speech and modulated with an amplitude envelope of the maximum frequency of the F0 contour (T(F0-env)) enhanced only consonant perception. Information transfer analysis revealed a dual mechanism in the bimodal benefit: The tone representing F0 provided voicing and manner information, whereas LP provided additional manner, place, and vowel formant information. The data in actual bimodal subjects also showed that the degree of the bimodal benefit depended on the cutoff and slope of residual acoustic hearing.
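The abstract does not spell out how the information transfer analysis was computed. A common formulation, assumed here, collapses a consonant confusion matrix onto one phonetic feature and reports the mutual information between stimulus and response categories as a fraction of the stimulus entropy. The sketch below follows that formulation; the function name `relative_transmitted_info`, the four-consonant set, and the confusion counts are hypothetical illustration data, not results from the study.

```python
import math
from collections import defaultdict

def relative_transmitted_info(confusions, feature):
    """Fraction of stimulus feature information recovered in responses.

    confusions: {(stimulus, response): count}
    feature:    {consonant label: feature category}
    """
    # Collapse the confusion matrix onto the feature categories.
    joint = defaultdict(float)
    for (stim, resp), count in confusions.items():
        joint[(feature[stim], feature[resp])] += count
    total = sum(joint.values())

    # Marginal probabilities of stimulus (x) and response (y) categories.
    p_x, p_y = defaultdict(float), defaultdict(float)
    for (x, y), count in joint.items():
        p_x[x] += count / total
        p_y[y] += count / total

    # Mutual information in bits, then normalise by stimulus entropy.
    mutual = sum((c / total) * math.log2((c / total) / (p_x[x] * p_y[y]))
                 for (x, y), c in joint.items() if c > 0)
    h_x = -sum(p * math.log2(p) for p in p_x.values() if p > 0)
    return mutual / h_x if h_x > 0 else 0.0

# Hypothetical listener: voicing is always heard correctly, place is guessed.
voicing = {"p": "voiceless", "t": "voiceless", "b": "voiced", "d": "voiced"}
place = {"p": "labial", "b": "labial", "t": "alveolar", "d": "alveolar"}
counts = {("p", "p"): 5, ("p", "t"): 5, ("t", "p"): 5, ("t", "t"): 5,
          ("b", "b"): 5, ("b", "d"): 5, ("d", "b"): 5, ("d", "d"): 5}
voicing_rti = relative_transmitted_info(counts, voicing)
place_rti = relative_transmitted_info(counts, place)
```

For this made-up listener the voicing score is 1 and the place score is 0, the kind of feature-specific split that lets an analysis attribute voicing information to one cue and place information to another.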

13.

Channel weights for speech recognition in cochlear implant users

Mehr MA Turner CW Parkinson A 《The Journal of the Acoustical Society of America》2001,109(1):359-366

The purpose of this study was to develop and validate a method of estimating the relative "weight" that a multichannel cochlear implant user places on individual channels, indicating its contribution to overall speech recognition. The correlational method as applied to speech recognition was used both with normal-hearing listeners and with cochlear implant users fitted with six-channel speech processors. Speech was divided into frequency bands corresponding to the bands of the processor and a randomly chosen level of corresponding filtered noise was added to each channel on each trial. Channels in which the signal-to-noise ratio was more highly correlated with performance have higher weights, and conversely, channels in which the correlations were smaller have lower weights. Normal-hearing listeners showed approximately equal weights across frequency bands. In contrast, cochlear implant users showed unequal weighting across bands, and varied from individual to individual with some channels apparently not contributing significantly to speech recognition. To validate these channel weights, individual channels were removed and speech recognition in quiet was tested. A strong correlation was found between the relative weight of the channel removed and the decrease in speech recognition, thus providing support for use of the correlational method for cochlear implant users.
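A minimal sketch of the correlational idea described in this abstract: on each trial every channel gets its own randomly chosen signal-to-noise ratio, and a channel's weight is taken from the correlation between its SNR and whether the response was correct. The Pearson correlation, the normalisation of weights to sum to one, the function names, and the two-channel toy data are assumptions for illustration; the abstract does not specify these details.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def channel_weights(snr_by_trial, correct_by_trial):
    """Relative channel weights from per-trial channel SNRs and scores.

    snr_by_trial:     list of trials, each a list of per-channel SNRs (dB)
    correct_by_trial: list of 0/1 scores, one per trial
    Normalising by the sum of correlations is an assumption of this sketch.
    """
    n_channels = len(snr_by_trial[0])
    corrs = [pearson([trial[ch] for trial in snr_by_trial], correct_by_trial)
             for ch in range(n_channels)]
    total = sum(corrs)
    return [c / total for c in corrs] if total != 0 else corrs

# Toy data: correctness tracks channel 0 only, so it should carry all weight.
snrs = [[-5, 5], [5, -5], [-5, -5], [5, 5]]
correct = [0, 1, 0, 1]
weights = channel_weights(snrs, correct)
```

With real data the trial count would need to be large enough for the correlations to be stable, and negative correlations would need a deliberate treatment rather than the plain normalisation used here.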

14.

Relationship between perception of spectral ripple and speech recognition in cochlear implant and vocoder listeners

Litvak LM Spahr AJ Saoji AA Fridman GY 《The Journal of the Acoustical Society of America》2007,122(2):982-991

Spectral resolution has been reported to be closely related to vowel and consonant recognition in cochlear implant (CI) listeners. One measure of spectral resolution is spectral modulation threshold (SMT), which is defined as the smallest detectable spectral contrast in the spectral ripple stimulus. SMT may be determined by the activation pattern associated with electrical stimulation. In the present study, broad activation patterns were simulated using a multi-band vocoder to determine if similar impairments in speech understanding scores could be produced in normal-hearing listeners. Tokens were first decomposed into 15 logarithmically spaced bands and then re-synthesized by multiplying the envelope of each band by matched filtered noise. Various amounts of current spread were simulated by adjusting the drop-off of the noise spectrum away from the peak (40-5 dB/octave). The average SMT (0.25 and 0.5 cycles/octave) increased from 6.3 to 22.5 dB, while average vowel identification scores dropped from 86% to 19% and consonant identification scores dropped from 93% to 59%. In each condition, the impairments in speech understanding were generally similar to those found in CI listeners with similar SMTs, suggesting that variability in spread of neural activation largely accounts for the variability in speech perception of CI listeners.

15.

Interactive factors in consonant confusion patterns

Bell TS Dirks DD Carterette EC 《The Journal of the Acoustical Society of America》1989,85(1):339-346

Confusion patterns among English consonants were examined using log-linear modeling techniques to assess the influence of low-pass filtering, shaped noise, presentation level, and consonant position. Ten normal-hearing listeners were presented consonant-vowel (CV) and vowel-consonant (VC) syllables containing the vowel /a/. Stimuli were presented in quiet and in noise, and were either filtered or broadband. The noise was shaped such that the effective signal level in each 1/3 octave band was equivalent in quiet and noise listening conditions. Three presentation levels were analyzed corresponding to the overall rms level of the combined speech stimuli. Error patterns were affected significantly by presentation level, filtering, and consonant position as a complex interaction. The effect of filtering was dependent on presentation level and consonant position. The effects stemming from the noise were less pronounced. Specific confusions responsible for these effects were isolated, and an acoustical interaction is suggested, stressing the spectral characteristics of the signals and their modification by presentation level and filtering.

16.

Frequency-to-electrode allocation and speech perception with cochlear implants

McKay CM Henshall KR 《The Journal of the Acoustical Society of America》2002,111(2):1036-1044

The hypothesis was investigated that selectively increasing the discrimination of low-frequency information (below 2600 Hz) by altering the frequency-to-electrode allocation would improve speech perception by cochlear implantees. Two experimental conditions were compared, both utilizing ten electrode positions selected based on maximal discrimination. A fixed frequency range (200-10513 Hz) was allocated either relatively evenly across the ten electrodes, or so that nine of the ten positions were allocated to the frequencies up to 2600 Hz. Two additional conditions utilizing all available electrode positions (15-18 electrodes) were assessed: one with each subject's usual frequency-to-electrode allocation; and the other using the same analysis filters as the other experimental conditions. Seven users of the Nucleus CI22 implant wore processors mapped with each experimental condition for 2-week periods away from the laboratory, followed by assessment of perception of words in quiet and sentences in noise. Performance with both ten-electrode maps was significantly poorer than with both full-electrode maps on at least one measure. Performance with the map allocating nine out of ten electrodes to low frequencies was equivalent to that with the full-electrode maps for vowel perception and sentences in noise, but was worse for consonant perception. Performance with the evenly allocated ten-electrode map was equivalent to that with the full-electrode maps for consonant perception, but worse for vowel perception and sentences in noise. Comparison of the two full-electrode maps showed that subjects could fully adapt to frequency shifts up to ratio changes of 1.3, given 2 weeks' experience. Future research is needed to investigate whether speech perception may be improved by the manipulation of frequency-to-electrode allocation in maps which have a full complement of electrodes in Nucleus implants.

17.

Place-pitch sensitivity and its relation to consonant recognition by cochlear implant listeners using the MPEAK and SPEAK speech processing strategies

Donaldson GS Nelson DA 《The Journal of the Acoustical Society of America》2000,107(3):1645-1658

Two related studies investigated the relationship between place-pitch sensitivity and consonant recognition in cochlear implant listeners using the Nucleus MPEAK and SPEAK speech processing strategies. Average place-pitch sensitivity across the electrode array was evaluated as a function of electrode separation, using a psychophysical electrode pitch-ranking task. Consonant recognition was assessed by analyzing error matrices obtained with a standard consonant confusion procedure to obtain relative transmitted information (RTI) measures for three features: stimulus (RTI stim), envelope (RTI env[plc]), and place-of-articulation (RTI plc[env]). The first experiment evaluated consonant recognition performance with MPEAK and SPEAK in the same subjects. Subjects were experienced users of the MPEAK strategy who used the SPEAK strategy on a daily basis for one month and were tested with both processors. It was hypothesized that subjects with good place-pitch sensitivity would demonstrate better consonant place-cue perception with SPEAK than with MPEAK, by virtue of their ability to make use of SPEAK's enhanced representation of spectral speech cues. Surprisingly, all but one subject demonstrated poor consonant place-cue performance with both MPEAK and SPEAK even though most subjects demonstrated good or excellent place-pitch sensitivity. Consistent with this, no systematic relationship between place-pitch sensitivity and consonant place-cue performance was observed. Subjects' poor place-cue perception with SPEAK was subsequently attributed to the relatively short period of experience that they were given with the SPEAK strategy. The second study reexamined the relationship between place-pitch sensitivity and consonant recognition in a group of experienced SPEAK users. For these subjects, a positive relationship was observed between place-pitch sensitivity and consonant place-cue performance, supporting the hypothesis that good place-pitch sensitivity facilitates subjects' use of spectral cues to consonant identity. A strong, linear relationship was also observed between measures of envelope- and place-cue extraction, with place-cue performance increasing as a constant proportion (approximately 0.8) of envelope-cue performance. To the extent that the envelope-cue measure reflects subjects' abilities to resolve amplitude fluctuations in the speech envelope, this finding suggests that both envelope- and place-cue perception depend strongly on subjects' envelope-processing abilities. Related to this, the data suggest that good place-cue perception depends both on envelope-processing abilities and place-pitch sensitivity, and that either factor may limit place-cue perception in a given cochlear implant listener. Data from both experiments indicate that subjects with small electric dynamic ranges (< 8 dB for 125-Hz, 205-microsecond/ph pulse trains) are more likely to demonstrate poor electrode pitch-ranking skills and poor consonant recognition performance than subjects with larger electric dynamic ranges.

18.

A comparative intelligibility study of single-microphone noise reduction algorithms (Cited by: 1 total; self-citations: 0; citations by others: 1)

Hu Y Loizou PC 《The Journal of the Acoustical Society of America》2007,122(3):1777

The evaluation of intelligibility of noise reduction algorithms is reported. IEEE sentences and consonants were corrupted by four types of noise including babble, car, street and train at two signal-to-noise ratio levels (0 and 5 dB), and then processed by eight speech enhancement methods encompassing four classes of algorithms: spectral subtractive, sub-space, statistical model based and Wiener-type algorithms. The enhanced speech was presented to normal-hearing listeners for identification. With the exception of a single noise condition, no algorithm produced significant improvements in speech intelligibility. Information transmission analysis of the consonant confusion matrices indicated that no algorithm significantly improved the place feature score, which is critically important for speech recognition. The algorithms which were found in previous studies to perform the best in terms of overall quality were not the same algorithms that performed the best in terms of speech intelligibility. The subspace algorithm, for instance, was previously found to perform the worst in terms of overall quality, but performed well in the present study in terms of preserving speech intelligibility. Overall, the analysis of consonant confusion matrices suggests that in order for noise reduction algorithms to improve speech intelligibility, they need to improve the place and manner feature scores.
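Corrupting speech at a fixed signal-to-noise ratio, as in the 0 and 5 dB conditions above, amounts to scaling the noise before adding it to the speech. The sketch below assumes the SNR is defined on overall RMS levels, which the abstract does not state; the function names and the toy sample values are hypothetical.

```python
import math

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))

def noise_gain_for_snr(speech, noise, snr_db):
    """Gain to apply to the noise so that 20*log10(rms_speech/rms_noise) == snr_db."""
    return rms(speech) / (rms(noise) * 10 ** (snr_db / 20.0))

def mix_at_snr(speech, noise, snr_db):
    """Add scaled noise to speech (equal-length sample lists) at snr_db."""
    gain = noise_gain_for_snr(speech, noise, snr_db)
    return [s + gain * n for s, n in zip(speech, noise)]

# Toy signals: speech RMS is 1.0, noise RMS is 0.5.
speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]
gain_0db = noise_gain_for_snr(speech, noise, 0.0)   # noise raised to the speech level
gain_5db = noise_gain_for_snr(speech, noise, 5.0)   # noise 5 dB below the speech
noisy = mix_at_snr(speech, noise, 5.0)
```

Other level definitions, such as active speech level or per-band levels, would change the gain, so the RMS choice here should be read as one possible convention.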

19.

Relative importance of temporal information in various frequency regions for consonant identification in quiet and in noise

Apoux F Bacon SP 《The Journal of the Acoustical Society of America》2004,116(3):1671-1680

The relative importance of temporal information in broad spectral regions for consonant identification was assessed in normal-hearing listeners. For the purpose of forcing listeners to use primarily temporal-envelope cues, speech sounds were spectrally degraded using four-noise-band vocoder processing. Frequency-weighting functions were determined using two methods. The first method consisted of measuring the intelligibility of speech with a hole in the spectrum either in quiet or in noise. The second method consisted of correlating performance with the randomly and independently varied signal-to-noise ratio within each band. Results demonstrated that all bands contributed equally to consonant identification when presented in quiet. In noise, however, both methods indicated that listeners consistently placed relatively more weight upon the highest frequency band. It is proposed that the explanation for the difference in results between quiet and noise relates to the shape of the modulation spectra in adjacent frequency bands. Overall, the results suggest that normal-hearing listeners use a common listening strategy in a given condition. However, this strategy may be influenced by the competing sounds, and thus may vary according to the context. Some implications of the results for cochlear implantees and hearing-impaired listeners are discussed.

20.

Speech intelligibility in cochlear implant simulations: Effects of carrier type, interfering noise, and subject experience

Whitmal NA Poissant SF Freyman RL Helfer KS 《The Journal of the Acoustical Society of America》2007,122(4):2376-2388

Channel vocoders using either tone or band-limited noise carriers have been used in experiments to simulate cochlear implant processing in normal-hearing listeners. Previous results from these experiments have suggested that the two vocoder types produce speech of nearly equal intelligibility in quiet conditions. The purpose of this study was to further compare the performance of tone and noise-band vocoders in both quiet and noisy listening conditions. In each of four experiments, normal-hearing subjects were better able to identify tone-vocoded sentences and vowel-consonant-vowel syllables than noise-vocoded sentences and syllables, both in quiet and in the presence of either speech-spectrum noise or two-talker babble. An analysis of consonant confusions for listening in both quiet and speech-spectrum noise revealed significantly different error patterns that were related to each vocoder's ability to produce tone or noise output that accurately reflected the consonant's manner of articulation. Subject experience was also shown to influence intelligibility. Simulations using a computational model of modulation detection suggest that the noise vocoder's disadvantage is in part due to the intrinsic temporal fluctuations of its carriers, which can interfere with temporal fluctuations that convey speech recognition cues.