期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Effect of spectral normalization on different talker speech recognition by cochlear implant users

Liu C Galvin J Fu QJ Narayanan SS 《The Journal of the Acoustical Society of America》2008,123(5):2836-2847

In cochlear implants (CIs), different talkers often produce different levels of speech understanding because of the spectrally distorted speech patterns provided by the implant device. A spectral normalization approach was used to transform the spectral characteristics of one talker to those of another talker. In Experiment 1, speech recognition with two talkers was measured in CI users, with and without spectral normalization. Results showed that the spectral normalization algorithm had small but significant effect on performance. In Experiment 2, the effects of spectral normalization were measured in CI users and normal-hearing (NH) subjects; a pitch-stretching technique was used to simulate six talkers with different fundamental frequencies and vocal tract configurations. NH baseline performance was nearly perfect with these pitch-shift transformations. For CI subjects, while there was considerable intersubject variability in performance with the different pitch-shift transformations, spectral normalization significantly improved the intelligibility of these simulated talkers. The results from Experiments 1 and 2 demonstrate that spectral normalization toward more-intelligible talkers significantly improved CI users' speech understanding with less-intelligible talkers. The results suggest that spectral normalization using optimal reference patterns for individual CI patients may compensate for some of the acoustic variability across talkers. 相似文献

2.

The relative phonetic contributions of a cochlear implant and residual acoustic hearing to bimodal speech perception

Sheffield BM Zeng FG 《The Journal of the Acoustical Society of America》2012,131(1):518-530

The addition of low-passed (LP) speech or even a tone following the fundamental frequency (F0) of speech has been shown to benefit speech recognition for cochlear implant (CI) users with residual acoustic hearing. The mechanisms underlying this benefit are still unclear. In this study, eight bimodal subjects (CI users with acoustic hearing in the non-implanted ear) and eight simulated bimodal subjects (using vocoded and LP speech) were tested on vowel and consonant recognition to determine the relative contributions of acoustic and phonetic cues, including F0, to the bimodal benefit. Several listening conditions were tested (CI/Vocoder, LP, T(F0-env), CI/Vocoder + LP, CI/Vocoder + T(F0-env)). Compared with CI/Vocoder performance, LP significantly enhanced both consonant and vowel perception, whereas a tone following the F0 contour of target speech and modulated with an amplitude envelope of the maximum frequency of the F0 contour (T(F0-env)) enhanced only consonant perception. Information transfer analysis revealed a dual mechanism in the bimodal benefit: The tone representing F0 provided voicing and manner information, whereas LP provided additional manner, place, and vowel formant information. The data in actual bimodal subjects also showed that the degree of the bimodal benefit depended on the cutoff and slope of residual acoustic hearing. 相似文献

3.

Effects of short-term auditory deprivation on speech production in adult cochlear implant users.

M A Svirsky H Lane J S Perkell J Wozniak 《The Journal of the Acoustical Society of America》1992,92(3):1284-1300

Speech production parameters of three postlingually deafened adults who use cochlear implants were measured: after 24 h of auditory deprivation (which was achieved by turning the subject's speech processor off); after turning the speech processor back on; and after turning the speech processor off again. The measured parameters included vowel acoustics [F1, F2, F0, sound-pressure level (SPL), duration and H1-H2, the amplitude difference between the first two spectral harmonics, a correlate of breathiness] while reading word lists, and average airflow during the reading of passages. Changes in speech processor state (on-to-off or vice versa) were accompanied by numerous changes in speech production parameters. Many changes were in the direction of normalcy, and most were consistent with long-term speech production changes in the same subjects following activation of the processors of their cochlear implants [Perkell et al., J. Acoust. Soc. Am. 91, 2961-2978 (1992)]. Changes in mean airflow were always accompanied by H1-H2 (breathiness) changes in the same direction, probably due to underlying changes in laryngeal posture. Some parameters (different combinations of SPL, F0, H1-H2 and formants for different subjects) showed very rapid changes when turning the speech processor on or off. Parameter changes were faster and more pronounced, however, when the speech processor was turned on than when it was turned off. The picture that emerges from the present study is consistent with a dual role for auditory feedback in speech production: long-term calibration of articulatory parameters as well as feedback mechanisms with relatively short time constants. 相似文献

4.

Relationship between perception of spectral ripple and speech recognition in cochlear implant and vocoder listeners

Litvak LM Spahr AJ Saoji AA Fridman GY 《The Journal of the Acoustical Society of America》2007,122(2):982-991

Spectral resolution has been reported to be closely related to vowel and consonant recognition in cochlear implant (CI) listeners. One measure of spectral resolution is spectral modulation threshold (SMT), which is defined as the smallest detectable spectral contrast in the spectral ripple stimulus. SMT may be determined by the activation pattern associated with electrical stimulation. In the present study, broad activation patterns were simulated using a multi-band vocoder to determine if similar impairments in speech understanding scores could be produced in normal-hearing listeners. Tokens were first decomposed into 15 logarithmically spaced bands and then re-synthesized by multiplying the envelope of each band by matched filtered noise. Various amounts of current spread were simulated by adjusting the drop-off of the noise spectrum away from the peak (40-5 dBoctave). The average SMT (0.25 and 0.5 cyclesoctave) increased from 6.3 to 22.5 dB, while average vowel identification scores dropped from 86% to 19% and consonant identification scores dropped from 93% to 59%. In each condition, the impairments in speech understanding were generally similar to those found in CI listeners with similar SMTs, suggesting that variability in spread of neural activation largely accounts for the variability in speech perception of CI listeners. 相似文献

5.

Phoneme recognition by cochlear implant users as a function of signal-to-noise ratio and nonlinear amplitude mapping. 总被引：2，自引：0，他引：2

Q J Fu R V Shannon 《The Journal of the Acoustical Society of America》1999,106(2):L18-L23

相似文献

6.

Recognition of spectrally asynchronous speech by normal-hearing listeners and Nucleus-22 cochlear implant users

Fu QJ Galvin JJ 《The Journal of the Acoustical Society of America》2001,109(3):1166-1172

This experiment examined the effects of spectral resolution and fine spectral structure on recognition of spectrally asynchronous sentences by normal-hearing and cochlear implant listeners. Sentence recognition was measured in six normal-hearing subjects listening to either full-spectrum or noise-band processors and five Nucleus-22 cochlear implant listeners fitted with 4-channel continuous interleaved sampling (CIS) processors. For the full-spectrum processor, the speech signals were divided into either 4 or 16 channels. For the noise-band processor, after band-pass filtering into 4 or 16 channels, the envelope of each channel was extracted and used to modulate noise of the same bandwidth as the analysis band, thus eliminating the fine spectral structure available in the full-spectrum processor. For the 4-channel CIS processor, the amplitude envelopes extracted from four bands were transformed to electric currents by a power function and the resulting electric currents were used to modulate pulse trains delivered to four electrode pairs. For all processors, the output of each channel was time-shifted relative to other channels, varying the channel delay across channels from 0 to 240 ms (in 40-ms steps). Within each delay condition, all channels were desynchronized such that the cross-channel delays between adjacent channels were maximized, thereby avoiding local pockets of channel synchrony. Results show no significant difference between the 4- and 16-channel full-spectrum speech processor for normal-hearing listeners. Recognition scores dropped significantly only when the maximum delay reached 200 ms for the 4-channel processor and 240 ms for the 16-channel processor. When fine spectral structures were removed in the noise-band processor, sentence recognition dropped significantly when the maximum delay was 160 ms for the 16-channel noise-band processor and 40 ms for the 4-channel noise-band processor. There was no significant difference between implant listeners using the 4-channel CIS processor and normal-hearing listeners using the 4-channel noise-band processor. The results imply that when fine spectral structures are not available, as in the implant listener's case, increased spectral resolution is important for overcoming cross-channel asynchrony in speech signals. 相似文献

7.

Use of intonation contours for speech recognition in noise by cochlear implant recipients

Meister H Landwehr M Pyschny V Grugel L Walger M 《The Journal of the Acoustical Society of America》2011,129(5):EL204-EL209

The corruption of intonation contours has detrimental effects on sentence-based speech recognition in normal-hearing listeners Binns and Culling [(2007). J. Acoust. Soc. Am. 122, 1765-1776]. This paper examines whether this finding also applies to cochlear implant (CI) recipients. The subjects' F0-discrimination and speech perception in the presence of noise were measured, using sentences with regular and inverted F0-contours. The results revealed that speech recognition for regular contours was significantly better than for inverted contours. This difference was related to the subjects' F0-discrimination providing further evidence that the perception of intonation patterns is important for the CI-mediated speech recognition in noise. 相似文献

8.

Effect of stimulus level and place of stimulation on temporal pitch perception by cochlear implant users 总被引：1，自引：0，他引：1

Carlyon RP Lynch C Deeks JM 《The Journal of the Acoustical Society of America》2010,127(5):2997-3008

Three experiments studied the effect of pulse rate on temporal pitch perception by cochlear implant users. Experiment 1 measured rate discrimination for pulse trains presented in bipolar mode to either an apical, middle, or basal electrode and for standard rates of 100 and 200 pps. In each block of trials the signals could have a level of -0.35, 0, or +0.35 dB re the standard, and performance for each signal level was recorded separately. Signal level affected performance for just over half of the combinations of subject, electrode, and standard rate studied. Performance was usually, but not always, better at the higher signal level. Experiment 2 showed that, for a given subject and condition, the direction of the effect was similar in monopolar and bipolar mode. Experiment 3 employed a pitch comparison procedure without feedback, and showed that the signal levels in experiment 1 that produced the best performance for a given subject and condition also led to the signal having a higher pitch. It is concluded that small level differences can have a robust and substantial effect on pitch judgments and argue that these effects are not entirely due to response biases or to co-variation of place-of-excitation with level. 相似文献

9.

The effect of channel interactions on speech recognition in cochlear implant subjects: predictions from an acoustic model 总被引：2，自引：0，他引：2

Throckmorton CS Collins LM 《The Journal of the Acoustical Society of America》2002,112(1):285-296

Acoustic models that produce speech signals with information content similar to that provided to cochlear implant users provide a mechanism by which to investigate the effect of various implant-specific processing or hardware parameters independent of other complicating factors. This study compares speech recognition of normal-hearing subjects listening through normal and impaired acoustic models of cochlear implant speech processors. The channel interactions that were simulated to impair the model were based on psychophysical data measured from cochlear implant subjects and include pitch reversals, indiscriminable electrodes, and forward masking effects. In general, spectral interactions degraded speech recognition more than temporal interactions. These effects were frequency dependent with spectral interactions that affect lower-frequency information causing the greatest decrease in speech recognition, and interactions that affect higher-frequency information having the least impact. The results of this study indicate that channel interactions, quantified psychophysically, affect speech recognition to different degrees. Investigation of the effects that channel interactions have on speech recognition may guide future research whose goal is compensating for psychophysically measured channel interactions in cochlear implant subjects. 相似文献

10.

Cross-language specialization in phonetic processing: English and Hindi perception of /w/-/v/ speech and nonspeech

Iverson P Wagner A Pinet M Rosen S 《The Journal of the Acoustical Society of America》2011,130(5):EL297-EL303

This study examined the perceptual specialization for native-language speech sounds, by comparing native Hindi and English speakers in their perception of a graded set of English /w/-/v/ stimuli that varied in similarity to natural speech. The results demonstrated that language experience does not affect general auditory processes for these types of sounds; there were strong cross-language differences for speech stimuli, and none for stimuli that were nonspeech. However, the cross-language differences extended into a gray area of speech-like stimuli that were difficult to classify, suggesting that the specialization occurred in phonetic processing prior to categorization. 相似文献

11.

Effect of acoustic dynamic range on phoneme recognition in quiet and noise by cochlear implant users

Fu QJ Shannon RV 《The Journal of the Acoustical Society of America》1999,106(6):L65-L70

相似文献

12.

Patterns of phoneme perception errors by listeners with cochlear implants as a function of overall speech perception ability

Munson B Donaldson GS Allen SL Collison EA Nelson DA 《The Journal of the Acoustical Society of America》2003,113(2):925-935

Many studies have noted great variability in speech perception ability among postlingually deafened adults with cochlear implants. This study examined phoneme misperceptions for 30 cochlear implant listeners using either the Nucleus-22 or Clarion version 1.2 device to examine whether listeners with better overall speech perception differed qualitatively from poorer listeners in their perception of vowel and consonant features. In the first analysis, simple regressions were used to predict the mean percent-correct scores for consonants and vowels for the better group of listeners from those of the poorer group. A strong relationship between the two groups was found for consonant identification, and a weak, nonsignificant relationship was found for vowel identification. In the second analysis, it was found that less information was transmitted for consonant and vowel features to the poorer listeners than to the better listeners; however, the pattern of information transmission was similar across groups. Taken together, results suggest that the performance difference between the two groups is primarily quantitative. The results underscore the importance of examining individuals' perception of individual phoneme features when attempting to relate speech perception to other predictor variables. 相似文献

13.

Chinese subjects' perception of the word-final English /t/-/d/ contrast: performance before and after training

J E Flege 《The Journal of the Acoustical Society of America》1989,86(5):1684-1697

Chinese words may begin with /t/ and /d/, but a /t/-/d/ contrast does not exist in word-final position. The question addressed by experiment 1 was whether Chinese speakers of English could identify the final stop in words like beat and bead. The Chinese subjects examined approached the near-perfect identification rates of native English adults and children for words that were unedited, but performed poorly for words from which final release bursts had been removed. Removing closure voicing had a small effect on the Chinese but not the English listeners' sensitivity. A regression analysis indicated that the Chinese subjects' native language (Mandarin, Taiwanese, Shanghainese) and their scores on an English comprehension test accounted for a significant amount of variance in sensitivity to the (burstless) /t/-/d/ contrast. In experiment 2, a small amount of feedback training administered to Chinese subjects led to a small, nonsignificant increase in sensitivity to the English /t/-/d/ contrast. In experiment 3, more training trials were presented for a smaller number of words. A slightly larger and significant effect of training was obtained. The Chinese subjects who were native speakers of a language that permits obstruents in word-final position seemed to benefit more from the training than those whose native language (L1) has no word-final obstruents. This was interpreted to mean that syllable-processing strategies established during L1 acquisition may influence later L2 learning. 相似文献

14.

Place-pitch sensitivity and its relation to consonant recognition by cochlear implant listeners using the MPEAK and SPEAK speech processing strategies

Donaldson GS Nelson DA 《The Journal of the Acoustical Society of America》2000,107(3):1645-1658

Two related studies investigated the relationship between place-pitch sensitivity and consonant recognition in cochlear implant listeners using the Nucleus MPEAK and SPEAK speech processing strategies. Average place-pitch sensitivity across the electrode array was evaluated as a function of electrode separation, using a psychophysical electrode pitch-ranking task. Consonant recognition was assessed by analyzing error matrices obtained with a standard consonant confusion procedure to obtain relative transmitted information (RTI) measures for three features: stimulus (RTI stim), envelope (RTI env[plc]), and place-of-articulation (RTI plc[env]). The first experiment evaluated consonant recognition performance with MPEAK and SPEAK in the same subjects. Subjects were experienced users of the MPEAK strategy who used the SPEAK strategy on a daily basis for one month and were tested with both processors. It was hypothesized that subjects with good place-pitch sensitivity would demonstrate better consonant place-cue perception with SPEAK than with MPEAK, by virtue of their ability to make use of SPEAK's enhanced representation of spectral speech cues. Surprisingly, all but one subject demonstrated poor consonant place-cue performance with both MPEAK and SPEAK even though most subjects demonstrated good or excellent place-pitch sensitivity. Consistent with this, no systematic relationship between place-pitch sensitivity and consonant place-cue performance was observed. Subjects' poor place-cue perception with SPEAK was subsequently attributed to the relatively short period of experience that they were given with the SPEAK strategy. The second study reexamined the relationship between place-pitch sensitivity and consonant recognition in a group of experienced SPEAK users. For these subjects, a positive relationship was observed between place-pitch sensitivity and consonant place-cue performance, supporting the hypothesis that good place-pitch sensitivity facilitates subjects' use of spectral cues to consonant identity. A strong, linear relationship was also observed between measures of envelope- and place-cue extraction, with place-cue performance increasing as a constant proportion (approximately 0.8) of envelope-cue performance. To the extent that the envelope-cue measure reflects subjects' abilities to resolve amplitude fluctuations in the speech envelope, this finding suggests that both envelope- and place-cue perception depend strongly on subjects' envelope-processing abilities. Related to this, the data suggest that good place-cue perception depends both on envelope-processing abilities and place-pitch sensitivity, and that either factor may limit place-cue perception in a given cochlear implant listener. Data from both experiments indicate that subjects with small electric dynamic ranges (< 8 dB for 125-Hz, 205-microsecond/ph pulse trains) are more likely to demonstrate poor electrode pitch-ranking skills and poor consonant recognition performance than subjects with larger electric dynamic ranges. 相似文献

15.

Threshold and channel interaction in cochlear implant users: evaluation of the tripolar electrode configuration

Bierer JA 《The Journal of the Acoustical Society of America》2007,121(3):1642-1653

The efficacy of cochlear implants is limited by spatial and temporal interactions among channels. This study explores the spatially restricted tripolar electrode configuration and compares it to bipolar and monopolar stimulation. Measures of threshold and channel interaction were obtained from nine subjects implanted with the Clarion HiFocus-I electrode array. Stimuli were biphasic pulses delivered at 1020 pulses/s. Threshold increased from monopolar to bipolar to tripolar stimulation and was most variable across channels with the tripolar configuration. Channel interaction, quantified by the shift in threshold between single- and two-channel stimulation, occurred for all three configurations but was largest for the monopolar and simultaneous conditions. The threshold shifts with simultaneous tripolar stimulation were slightly smaller than with bipolar and were not as strongly affected by the timing of the two channel stimulation as was monopolar. The subjects' performances on clinical speech tests were correlated with channel-to-channel variability in tripolar threshold, such that greater variability was related to poorer performance. The data suggest that tripolar channels with high thresholds may reveal cochlear regions of low neuron survival or poor electrode placement. 相似文献

16.

Mathematical modeling of vowel perception by users of analog multichannel cochlear implants: temporal and channel-amplitude cues

Svirsky MA 《The Journal of the Acoustical Society of America》2000,107(3):1521-1529

相似文献

17.

Faciliation of Mandarin tone perception by visual speech in clear and degraded audio: implications for cochlear implants

Smith D Burnham D 《The Journal of the Acoustical Society of America》2012,131(2):1480-1489

Cochlear implant (CI) users in tone language environments report great difficulty in perceiving lexical tone. This study investigated the augmentation of simulated cochlear implant audio by visual (facial) speech information for tone. Native speakers of Mandarin and Australian English were asked to discriminate between minimal pairs of Mandarin tones in five conditions: Auditory-Only, Auditory-Visual, CI-simulated Auditory-Only, CI-simulated Auditory-Visual, and Visual-Only (silent video). Discrimination in CI-simulated audio conditions was poor compared with normal audio, and varied according to tone pair, with tone pairs with strong non-F0 cues discriminated the most easily. The availability of visual speech information also improved discrimination in the CI-simulated audio conditions, particularly on tone pairs with strong durational cues. In the silent Visual-Only condition, both Mandarin and Australian English speakers discriminated tones above chance levels. Interestingly, tone-nai?ve listeners outperformed native listeners in the Visual-Only condition, suggesting firstly that visual speech information for tone is available, and may in fact be under-used by normal-hearing tone language perceivers, and secondly that the perception of such information may be language-general, rather than the product of language-specific learning. This may find application in the development of methods to improve tone perception in CI users in tone language environments. 相似文献

18.

Production of the word-final English /t/-/d/ contrast by native speakers of English, Mandarin, and Spanish.

J E Flege M J Munro L Skelton 《The Journal of the Acoustical Society of America》1992,92(1):128-143

The primary aim of this study was to determine if adults whose native language permits neither voiced nor voiceless stops to occur in word-final position can master the English word-final /t/-/d/ contrast. Native English-speaking listeners identified the voicing feature in word-final stops produced by talkers in five groups: native speakers of English, experienced and inexperienced native Spanish speakers of English, and experienced and inexperienced native Mandarin speakers of English. Contrary to hypothesis, the experienced second language (L2) learners' stops were not identified significantly better than stops produced by the inexperienced L2 learners; and their stops were correctly identified significantly less often than stops produced by the native English speakers. Acoustic analyses revealed that the native English speakers made vowels significantly longer before /d/ than /t/, produced /t/-final words with a higher F1 offset frequency than /d/-final words, produced more closure voicing in /d/ than /t/, and sustained closure longer for /t/ than /d/. The L2 learners produced the same kinds of acoustic differences between /t/ and /d/, but theirs were usually of significantly smaller magnitude. Taken together, the results suggest that only a few of the 40 L2 learners examined in the present study had mastered the English word-final /t/-/d/ contrast. Several possible explanations for this negative finding are presented. Multiple regression analyses revealed that the native English listeners made perceptual use of the small, albeit significant, vowel duration differences produced in minimal pairs by the nonnative speakers. A significantly stronger correlation existed between vowel duration differences and the listeners' identifications of final stops in minimal pairs when the perceptual judgments were obtained in an "edited" condition (where post-vocalic cues were removed) than in a "full cue" condition. This suggested that listeners may modify their identification of stops based on the availability of acoustic cues. 相似文献

19.

Trading relations in the perception of /r/-/l/ by Japanese learners of English

M Underbakke L Polka T L Gottfried W Strange 《The Journal of the Acoustical Society of America》1988,84(1):90-100

The role of language-specific factors in phonetically based trading relations was examined by assessing the ability of 20 native Japanese speakers to identify and discriminate stimuli of two synthetic /r/-/l/ series that varied temporal and spectral parameters independently. Results of forced-choice identification and oddity discrimination tasks showed that the nine Japanese subjects who were able to identify /r/ and /l/ reliably demonstrated a trading relation similar to that of Americans. Discrimination results reflected the perceptual equivalence of temporal and spectral parameters. Discrimination by the 11 Japanese subjects who were unable to identify the /r/-/l/ series differed significantly from the skilled Japanese subjects and native English speakers. However, their performance could not be predicted on the basis of acoustic dissimilarity alone. These results provide evidence that the trading relation between temporal and spectral cues for the /r/-/l/ contrast is not solely attributable to general auditory or language-universal phonetic processing constraints, but rather is also a function of phonemic processes that can be modified in the course of learning a second language. 相似文献

20.

Speech recognition in noise as a function of the number of spectral channels: comparison of acoustic hearing and cochlear implants 总被引：18，自引：0，他引：18

Friesen LM Shannon RV Baskent D Wang X 《The Journal of the Acoustical Society of America》2001,110(2):1150-1163

Speech recognition was measured as a function of spectral resolution (number of spectral channels) and speech-to-noise ratio in normal-hearing (NH) and cochlear-implant (CI) listeners. Vowel, consonant, word, and sentence recognition were measured in five normal-hearing listeners, ten listeners with the Nucleus-22 cochlear implant, and nine listeners with the Advanced Bionics Clarion cochlear implant. Recognition was measured as a function of the number of spectral channels (noise bands or electrodes) at signal-to-noise ratios of + 15, + 10, +5, 0 dB, and in quiet. Performance with three different speech processing strategies (SPEAK, CIS, and SAS) was similar across all conditions, and improved as the number of electrodes increased (up to seven or eight) for all conditions. For all noise levels, vowel and consonant recognition with the SPEAK speech processor did not improve with more than seven electrodes, while for normal-hearing listeners, performance continued to increase up to at least 20 channels. Speech recognition on more difficult speech materials (word and sentence recognition) showed a marginally significant increase in Nucleus-22 listeners from seven to ten electrodes. The average implant score on all processing strategies was poorer than scores of NH listeners with similar processing. However, the best CI scores were similar to the normal-hearing scores for that condition (up to seven channels). CI listeners with the highest performance level increased in performance as the number of electrodes increased up to seven, while CI listeners with low levels of speech recognition did not increase in performance as the number of electrodes was increased beyond four. These results quantify the effect of number of spectral channels on speech recognition in noise and demonstrate that most CI subjects are not able to fully utilize the spectral information provided by the number of electrodes used in their implant. 相似文献