Similar Literature (20 results)
1.
Speech recognition performance was measured in normal-hearing and cochlear-implant listeners with maskers consisting of either steady-state speech-spectrum-shaped noise or a competing sentence. Target sentences from a male talker were presented in the presence of one of three competing talkers (same male, different male, or female) or speech-spectrum-shaped noise generated from this talker at several target-to-masker ratios. For the normal-hearing listeners, target-masker combinations were processed through a noise-excited vocoder designed to simulate a cochlear implant. With unprocessed stimuli, a normal-hearing control group maintained high levels of intelligibility down to target-to-masker ratios as low as 0 dB and showed a release from masking, producing better performance with single-talker maskers than with steady-state noise. In contrast, no masking release was observed in either implant or normal-hearing subjects listening through an implant simulation. The performance of the simulation and implant groups did not improve when the single-talker masker was a different talker compared to the same talker as the target speech, as was found in the normal-hearing control. These results are interpreted as evidence for a significant role of informational masking and modulation interference in cochlear implant speech recognition with fluctuating maskers. This informational masking may originate from increased target-masker similarity when spectral resolution is reduced.
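For readers unfamiliar with the technique, the sketch below shows a minimal noise-excited vocoder of the kind used to simulate cochlear-implant processing: the signal is split into a small number of frequency bands, each band's temporal envelope is extracted, and the envelopes are used to modulate band-limited noise carriers. The channel count, filter orders, and envelope cutoff here are illustrative assumptions, not the parameters used in the study.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(x, fs, n_channels=8, f_lo=100.0, f_hi=6000.0, env_cutoff=160.0):
    """Minimal noise-excited vocoder: analyze the input in log-spaced bands,
    extract each band's envelope, and remodulate band-limited noise carriers."""
    f_hi = min(f_hi, 0.45 * fs)                                # keep band edges below Nyquist
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)           # log-spaced band edges
    env_sos = butter(2, env_cutoff / (fs / 2), output='sos')   # envelope smoothing lowpass
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_sos = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype='band', output='sos')
        band = sosfiltfilt(band_sos, x)                        # analysis band
        env = sosfiltfilt(env_sos, np.abs(hilbert(band)))      # smoothed Hilbert envelope
        carrier = sosfiltfilt(band_sos, np.random.randn(len(x)))  # noise carrier, same band
        out += np.clip(env, 0.0, None) * carrier
    rms = lambda s: np.sqrt(np.mean(s ** 2))
    return out * rms(x) / (rms(out) + 1e-12)                   # match overall level
```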

2.
Similarity between the target and masking voices is known to have a strong influence on performance in monaural and binaural selective attention tasks, but little is known about the role it might play in dichotic listening tasks with a target signal and one masking voice in one ear and a second independent masking voice in the opposite ear. This experiment examined performance in a dichotic listening task with a target talker in one ear and same-talker, same-sex, or different-sex maskers in both the target and the unattended ears. The results indicate that listeners were most susceptible to across-ear interference with a different-sex within-ear masker and least susceptible with a same-talker within-ear masker, suggesting that the amount of across-ear interference cannot be predicted from the difficulty of selectively attending to the within-ear masking voice. The results also show that the amount of across-ear interference consistently increases when the across-ear masking voice is more similar to the target speech than the within-ear masking voice is, but that no corresponding decline in across-ear interference occurs when the across-ear voice is less similar to the target than the within-ear voice. These results are consistent with an "integrated strategy" model of speech perception in which the listener chooses a segregation strategy based on the characteristics of the masker present in the target ear and the amount of across-ear interference is determined by the extent to which this strategy can also effectively be used to suppress the masker in the unattended ear.

3.
Although many studies have shown that intelligibility improves when a speech signal and an interfering sound source are spatially separated in azimuth, little is known about the effect that spatial separation in distance has on the perception of competing sound sources near the head. In this experiment, head-related transfer functions (HRTFs) were used to process stimuli in order to simulate a target talker and a masking sound located at different distances along the listener's interaural axis. One of the signals was always presented at a distance of 1 m, and the other signal was presented 1 m, 25 cm, or 12 cm from the center of the listener's head. The results show that distance separation has very different effects on speech segregation for different types of maskers. When speech-shaped noise was used as the masker, most of the intelligibility advantages of spatial separation could be accounted for by spectral differences in the target and masking signals at the ear with the higher signal-to-noise ratio (SNR). When a same-sex talker was used as the masker, the intelligibility advantages of spatial separation in distance were dominated by binaural effects that produced the same performance improvements as a 4-5-dB increase in the SNR of a diotic stimulus. These results suggest that distance-dependent changes in the interaural difference cues of nearby sources play a much larger role in the reduction of the informational masking produced by an interfering speech signal than in the reduction of the energetic masking produced by an interfering noise source.
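As an illustration of the HRTF-based processing described above, the sketch below renders a monaural signal at a virtual position by convolving it with left- and right-ear head-related impulse responses (HRIRs). The HRIR arrays and the 1 m / 12 cm positions named in the usage comment are placeholders, not the measured filters used in the study.

```python
import numpy as np
from scipy.signal import fftconvolve

def spatialize(mono, hrir_left, hrir_right):
    """Render a monaural signal at a virtual location by convolving it with the
    left- and right-ear head-related impulse responses measured for that location."""
    left = fftconvolve(mono, hrir_left)[:len(mono)]
    right = fftconvolve(mono, hrir_right)[:len(mono)]
    return np.stack([left, right])   # 2 x N binaural signal

# Hypothetical usage: target fixed at 1 m, masker moved in to 12 cm on the interaural axis.
# mixture = spatialize(target, hrir_1m_L, hrir_1m_R) + spatialize(masker, hrir_12cm_L, hrir_12cm_R)
```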

4.
Listeners with sensorineural hearing loss are poorer than listeners with normal hearing at understanding one talker in the presence of another. This deficit is more pronounced when competing talkers are spatially separated, implying a reduced "spatial benefit" in hearing-impaired listeners. This study tested the hypothesis that this deficit is due to increased masking specifically during the simultaneous portions of competing speech signals. Monosyllabic words were compressed to a uniform duration and concatenated to create target and masker sentences with three levels of temporal overlap: 0% (non-overlapping in time), 50% (partially overlapping), or 100% (completely overlapping). Listeners with hearing loss performed particularly poorly in the 100% overlap condition, consistent with the idea that simultaneous speech sounds are most problematic for these listeners. However, spatial release from masking was reduced in all overlap conditions, suggesting that increased masking during periods of temporal overlap is only one factor limiting spatial unmasking in hearing-impaired listeners.

5.
This study examined spatial release from masking (SRM) when a target talker was masked by competing talkers or by other types of sounds. The focus was on the role of interaural time differences (ITDs) and time-varying interaural level differences (ILDs) under conditions varying in the strength of informational masking (IM). In the first experiment, a target talker was masked by two other talkers that were either colocated with the target or were symmetrically spatially separated from the target with the stimuli presented through loudspeakers. The sounds were filtered into different frequency regions to restrict the available interaural cues. The largest SRM occurred for the broadband condition followed by a low-pass condition. However, even the highest frequency bandpass-filtered condition (3-6 kHz) yielded a significant SRM. In the second experiment the stimuli were presented via earphones. The listeners identified the speech of a target talker masked by one or two other talkers or noises when the maskers were colocated with the target or were perceptually separated by ITDs. The results revealed a complex pattern of masking in which the factors affecting performance in colocated and spatially separated conditions are to a large degree independent.
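The earphone manipulation in the second experiment relies on imposing an interaural time difference on one signal so that it is perceptually separated from colocated maskers. The sketch below shows one straightforward way to apply an ITD by delaying one ear's copy of the signal; the 600-µs default is an illustrative assumption, not the ITD used in the study.

```python
import numpy as np

def apply_itd(x, fs, itd_s=600e-6, lead='left'):
    """Impose an interaural time difference by delaying one ear's copy of the signal;
    the image lateralizes toward the leading (undelayed) ear."""
    n = int(round(itd_s * fs))                           # delay in samples
    delayed = np.concatenate([np.zeros(n), x])[:len(x)]
    left, right = (x, delayed) if lead == 'left' else (delayed, x)
    return np.stack([left, right])                       # 2 x N earphone signal

# e.g., give the target a left-leading ITD and leave both maskers diotic,
# so the target is perceptually pulled away from the colocated maskers.
```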

6.
Increases in masker variability have been shown to increase the effects of informational masking in non-speech listening tasks, but relatively little is known about the influence that masker uncertainty has on the informational components of speech-on-speech masking. In this experiment, listeners were asked to extract information from a target phrase that was presented in their right ear while ignoring masking phrases that were presented in the same ear as the target phrase and in the ear opposite the target phrase. The level of masker uncertainty was varied by holding constant or "freezing" the talkers speaking the masking phrases, the semantic content used in the masking phrases, or both the talkers and the semantic content in the masking phrases within each block of 120 trials. The results showed that freezing the semantic content of the masking phrase in the target ear was the only reduction in masker uncertainty that ever resulted in a significant improvement in performance. Providing feedback after each trial improved performance overall, but did not prevent the listeners from making incorrect responses that matched the content of the frozen target-ear masking phrase. However, removing the target-ear contents corresponding to the masking phrase from the response set resulted in a dramatic improvement in performance. This suggests that the listeners were generally able to understand both of the phrases presented to the target ear, and that their incorrect responses in the task were almost entirely a result of their inability to determine which words were spoken by the target talker.

7.
Using a closed-set speech recognition paradigm thought to be heavily influenced by informational masking, auditory selective attention was measured in 38 children (ages 4-16 years) and 8 adults (ages 20-30 years). The task required attention to a monaural target speech message that was presented with a time-synchronized distracter message in the same ear. In some conditions a second distracter message or a speech-shaped noise was presented to the other ear. Compared to adults, children required higher target/distracter ratios to reach comparable performance levels, reflecting more informational masking in these listeners. Informational masking in most conditions was confirmed by the fact that a large proportion of the errors made by the listeners were contained in the distracter message(s). There was a monotonic age effect, such that even the children in the oldest age group (13.6-16 years) demonstrated poorer performance than adults. For both children and adults, presentation of an additional distracter in the contralateral ear significantly reduced performance, even when the distracter messages were produced by a talker of different sex than the target talker. The results are consistent with earlier reports from pure-tone masking studies that informational masking effects are much larger in children than in adults.

8.
Older individuals often report difficulty coping in situations with multiple conversations in which they at times need to "tune out" the background speech and at other times seek to monitor competing messages. The present study was designed to simulate this type of interaction by examining the cost of requiring listeners to perform a secondary task in conjunction with understanding a target talker in the presence of competing speech. The ability of younger and older adults to understand a target utterance was measured with and without requiring the listener to also determine how many masking voices were presented time-reversed. Also of interest was how spatial separation affected the ability to perform these two tasks. Older adults demonstrated slightly reduced overall speech recognition and obtained less spatial release from masking, as compared to younger listeners. For both younger and older listeners, spatial separation increased the costs associated with performing both tasks together. The meaningfulness of the masker had a greater detrimental effect on speech understanding for older participants than for younger participants. However, the results suggest that the problems experienced by older adults in complex listening situations are not necessarily due to a deficit in the ability to switch and/or divide attention among talkers.

9.
Speech intelligibility and localization in a multi-source environment.
Natural environments typically contain sound sources other than the source of interest that may interfere with the ability of listeners to extract information about the primary source. Studies of speech intelligibility and localization by normal-hearing listeners in the presence of competing speech are reported in this work. One, two, or three competing sentences [IEEE Trans. Audio Electroacoust. 17(3), 225-246 (1969)] were presented from various locations in the horizontal plane in several spatial configurations relative to a target sentence. Target and competing sentences were spoken by the same male talker and at the same level. All experiments were conducted both in an actual sound field and in a virtual sound field. In the virtual sound field, both binaural and monaural conditions were tested. In the speech intelligibility experiment, there were significant improvements in performance when the target and competing sentences were spatially separated. Performance was similar in the actual sound-field and virtual sound-field binaural listening conditions for speech intelligibility. Although most of these improvements are evident monaurally when using the better ear, binaural listening was necessary for large improvements in some situations. In the localization experiment, target source identification was measured in a seven-alternative absolute identification paradigm with the same competing sentence configurations as for the speech study. Performance in the localization experiment was significantly better in the actual sound-field than in the virtual sound-field binaural listening conditions. Under binaural conditions, localization performance was very good, even in the presence of three competing sentences. Under monaural conditions, performance was much worse. For the localization experiment, there was no significant effect of the number or configuration of the competing sentences tested. For these experiments, the performance in the speech intelligibility experiment was not limited by localization ability.

10.
Although most recent multitalker research has emphasized the importance of binaural cues, monaural cues can play an equally important role in the perception of multiple simultaneous speech signals. In this experiment, the intelligibility of a target phrase masked by a single competing masker phrase was measured as a function of signal-to-noise ratio (SNR) with same-talker, same-sex, and different-sex target and masker voices. The results indicate that informational masking, rather than energetic masking, dominated performance in this experiment. The amount of masking was highly dependent on the similarity of the target and masker voices: performance was best when different-sex talkers were used and worst when the same talker was used for target and masker. Performance did not, however, improve monotonically with increasing SNR. Intelligibility generally plateaued at SNRs below 0 dB and, in some cases, intensity differences between the target and masking voices produced substantial improvements in performance with decreasing SNR. The results indicate that informational and energetic masking play substantially different roles in the perception of competing speech messages.
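The SNR manipulation described above amounts to fixing the broadband level ratio between the target and the single masking phrase before summing them. The sketch below is a minimal version of that step; RMS-based scaling is an assumption about how levels were equated, not a detail taken from the paper.

```python
import numpy as np

def mix_at_snr(target, masker, snr_db):
    """Scale the masker so the target-to-masker RMS ratio equals snr_db, then sum."""
    rms = lambda s: np.sqrt(np.mean(s ** 2))
    masker_gain = rms(target) / (rms(masker) * 10 ** (snr_db / 20.0))  # gain giving desired SNR
    return target + masker_gain * masker

# e.g., a same-talker masker mixed at -8 dB SNR: mix_at_snr(target, masker, snr_db=-8.0)
```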

11.
When listening selectively to one talker in a two-talker environment, performance generally improves with spatial separation of the sources. The current study explores the role of spatial separation in divided listening, when listeners reported both of two simultaneous messages processed to have little spectral overlap (limiting "energetic masking" between the messages). One message was presented at a fixed level, while the level of the other message varied from equal to the fixed-level message to 40 dB below it. Results demonstrate that spatial separation of the competing messages improved divided-listening performance. Most errors occurred because listeners failed to report the content of the less-intense talker. Moreover, performance generally improved as the broadband energy ratio of the variable-level to the fixed-level talker increased. The error patterns suggest that spatial separation improves the intelligibility of the less-intense talker by improving the ability to (1) hear portions of the signal that would otherwise be masked, (2) segregate the two talkers properly into separate perceptual streams, and (3) selectively focus attention on the less-intense talker. Spatial configuration did not noticeably affect the ability to report the more-intense talker, suggesting that it was processed differently than the less-intense talker, which was actively attended.

12.
The focus of this study was the release from informational masking that could be obtained in a speech task by viewing a video of the target talker. A closed-set speech recognition paradigm was used to measure informational masking in 23 children (ages 6-16 years) and 10 adults. An audio-only condition required attention to a monaural target speech message that was presented to the same ear with a time-synchronized distracter message. In an audiovisual condition, a synchronized video of the target talker was also presented to assess the release from informational masking that could be achieved by speechreading. Children required higher target/distracter ratios than adults to reach comparable performance levels in the audio-only condition, reflecting a greater extent of informational masking in these listeners. There was a monotonic age effect, such that even the children in the oldest age group (12-16.9 years) demonstrated performance somewhat poorer than adults. Older children and adults improved significantly in the audiovisual condition, producing a release from informational masking of 15 dB or more in some adult listeners. Audiovisual presentation produced no informational masking release for the youngest children. Across all ages, the benefit of a synchronized video was strongly associated with speechreading ability.

13.
Previous research has shown that familiarity with a talker's voice can improve linguistic processing (herein, "Familiar Talker Advantage"), but this benefit is constrained by the context in which the talker's voice is familiar. The current study examined how familiarity affects intelligibility by manipulating the type of talker information available to listeners. One group of listeners learned to identify bilingual talkers' voices from English words, where they learned language-specific talker information. A second group of listeners learned the same talkers from German words, and thus only learned language-independent talker information. After voice training, both groups of listeners completed a word recognition task with English words produced by both familiar and unfamiliar talkers. Results revealed that English-trained listeners perceived more phonemes correct for familiar than unfamiliar talkers, while German-trained listeners did not show improved intelligibility for familiar talkers. The absence of a processing advantage in speech intelligibility for the German-trained listeners demonstrates limitations on the Familiar Talker Advantage, which crucially depends on the language context in which the talkers' voices were learned; knowledge of how a talker produces linguistically relevant contrasts in a particular language is necessary to increase speech intelligibility for words produced by familiar talkers.

14.
The interlanguage speech intelligibility benefit
This study investigated how native language background influences the intelligibility of speech by non-native talkers for non-native listeners from either the same or a different native language background as the talker. Native talkers of Chinese (n = 2), Korean (n = 2), and English (n = 1) were recorded reading simple English sentences. Native listeners of English (n = 21), Chinese (n = 21), Korean (n = 10), and a mixed group from various native language backgrounds (n = 12) then performed a sentence recognition task with the recordings from the five talkers. Results showed that for native English listeners, the native English talker was most intelligible. However, for non-native listeners, speech from a relatively high proficiency non-native talker from the same native language background was as intelligible as speech from a native talker, giving rise to the "matched interlanguage speech intelligibility benefit." Furthermore, this interlanguage intelligibility benefit extended to the situation where the non-native talker and listeners came from different language backgrounds, giving rise to the "mismatched interlanguage speech intelligibility benefit." These findings shed light on the nature of the talker-listener interaction during speech communication.

15.
In a 3D auditory display, sounds are presented over headphones such that they seem to originate from virtual sources in a space around the listener. This paper describes a study on the possible merits of such a display for bandlimited speech with respect to intelligibility and talker recognition against a background of competing voices. Different conditions were investigated: speech material (words/sentences), presentation mode (monaural/binaural/3D), number of competing talkers (1-4), and virtual position of the talkers (in 45-degree steps around the front horizontal plane). Average results for 12 listeners show an increase of speech intelligibility for 3D presentation with two or more competing talkers compared to conventional binaural presentation. The ability to recognize a talker is slightly better and the time required for recognition is significantly shorter for 3D presentation in the presence of two or three competing talkers. Although absolute localization of a talker is rather poor, spatial separation appears to have a significant effect on communication. For either speech intelligibility, talker recognition, or localization, no difference is found between the use of an individualized 3D auditory display and a general display.

16.
When listening to natural speech, listeners are fairly adept at using cues such as pitch, vocal tract length, prosody, and level differences to extract a target speech signal from an interfering speech masker. However, little is known about the cues that listeners might use to segregate synthetic speech signals that retain the intelligibility characteristics of speech but lack many of the features that listeners normally use to segregate competing talkers. In this experiment, intelligibility was measured in a diotic listening task that required the segregation of two simultaneously presented synthetic sentences. Three types of synthetic signals were created: (1) sine-wave speech (SWS); (2) modulated noise-band speech (MNB); and (3) modulated sine-band speech (MSB). The listeners performed worse for all three types of synthetic signals than they did with natural speech signals, particularly at low signal-to-noise ratio (SNR) values. Of the three synthetic signals, the results indicate that SWS signals preserve more of the voice characteristics used for speech segregation than MNB and MSB signals. These findings have implications for cochlear implant users, who rely on signals very similar to MNB speech and thus are likely to have difficulty understanding speech in cocktail-party listening environments.
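Of the three synthetic signals, sine-wave speech is the easiest to illustrate compactly: each formant is replaced by a single sinusoid that follows the formant's frequency and amplitude contour. The sketch below assumes those contours have already been estimated (e.g., by LPC analysis, which is not shown) and upsampled to the audio rate; it is not the synthesis code used in the study.

```python
import numpy as np

def sinewave_speech(formant_freqs, formant_amps, fs):
    """Synthesize sine-wave speech from per-sample formant tracks.

    formant_freqs, formant_amps: arrays of shape (n_formants, n_samples),
    giving each formant's frequency (Hz) and linear amplitude over time."""
    out = np.zeros(formant_freqs.shape[1])
    for freq, amp in zip(formant_freqs, formant_amps):
        phase = 2.0 * np.pi * np.cumsum(freq) / fs   # integrate frequency to get phase
        out += amp * np.sin(phase)
    return out
```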

17.
Recent findings in the domains of word and talker recognition reveal that listeners use previous experience with an individual talker's voice to facilitate subsequent perceptual processing of that talker's speech. These findings raise the possibility that listeners are sensitive to talker-specific acoustic-phonetic properties. The present study tested this possibility directly by examining listeners' sensitivity to talker differences in the voice-onset-time (VOT) associated with a word-initial voiceless stop consonant. Listeners were trained on the speech of two talkers. Speech synthesis was used to manipulate the VOTs of these talkers so that one had short VOTs and the other had long VOTs (counterbalanced across listeners). The results of two experiments using a paired-comparison task revealed that, when presented with a short- versus long-VOT variant of a given talker's speech, listeners could select the variant consistent with their experience of that talker's speech during training. This was true when listeners were tested on the same word heard during training and when they were tested on a different word spoken by the same talker, indicating that listeners generalized talker-specific VOT information to a novel word. Such sensitivity to talker-specific acoustic-phonetic properties may subserve at least in part listeners' capacity to benefit from talker-specific experience.
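One simple way to produce short- and long-VOT variants of the same token, offered here only as a hedged sketch and not as the synthesis procedure used in the study, is to splice out the aspiration interval between the burst onset and the voicing onset and stretch or compress it to the desired duration. The segment boundaries are assumed to be known in advance.

```python
import numpy as np

def set_vot(x, fs, burst_onset_s, voicing_onset_s, new_vot_s):
    """Change the VOT of a word-initial voiceless stop by resampling the aspiration
    interval (burst onset to voicing onset) to a new duration, leaving the rest intact."""
    b = int(round(burst_onset_s * fs))
    v = int(round(voicing_onset_s * fs))
    aspiration = x[b:v]
    n_new = max(int(round(new_vot_s * fs)), 1)
    idx = np.linspace(0, len(aspiration) - 1, n_new)
    stretched = np.interp(idx, np.arange(len(aspiration)), aspiration)  # stretch or compress
    return np.concatenate([x[:b], stretched, x[v:]])
```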

18.
Perceptual representations of phonemes are flexible and adapt rapidly to accommodate idiosyncratic articulation in the speech of a particular talker. This letter addresses whether such adjustments remain stable over time and under exposure to other talkers. During exposure to a story, listeners learned to interpret an ambiguous sound as [f] or [s]. Perceptual adjustments measured after 12 h were as robust as those measured immediately after learning. Equivalent effects were found when listeners heard speech from other talkers in the 12 h interval, and when they had the opportunity to consolidate learning during sleep.

19.
Recent results have shown that listeners attending to the quieter of two speech signals in one ear (the target ear) are highly susceptible to interference from normal or time-reversed speech signals presented in the unattended ear. However, speech-shaped noise signals have little impact on the segregation of speech in the opposite ear. This suggests that there is a fundamental difference between the across-ear interference effects of speech and nonspeech signals. In this experiment, the intelligibility and contralateral-ear masking characteristics of three synthetic speech signals with parametrically adjustable speech-like properties were examined: (1) a modulated noise-band (MNB) speech signal composed of fixed-frequency bands of envelope-modulated noise; (2) a modulated sine-band (MSB) speech signal composed of fixed-frequency amplitude-modulated sinewaves; and (3) a "sinewave speech" signal composed of sine waves tracking the first four formants of speech. In all three cases, a systematic decrease in performance in the two-talker target-ear listening task was found as the number of bands in the contralateral speech-like masker increased. These results suggest that speech-like fluctuations in the spectral envelope of a signal play an important role in determining the amount of across-ear interference that a signal will produce in a dichotic cocktail-party listening task.

20.
Speech recognition in noisy environments improves when the speech signal is spatially separated from the interfering sound. This effect, known as spatial release from masking (SRM), was recently shown in young children. The present study compared SRM in children of ages 5-7 with adults for interferers introducing energetic, informational, and/or linguistic components. Three types of interferers were used: speech, reversed speech, and modulated white noise. Two female voices with different long-term spectra were also used. Speech reception thresholds (SRTs) were compared for: Quiet (target 0 degrees front, no interferer), Front (target and interferer both 0 degrees front), and Right (interferer 90 degrees right, target 0 degrees front). Children had higher SRTs and greater masking than adults. When spatial cues were not available, adults, but not children, were able to use differences in interferer type to separate the target from the interferer. Both children and adults showed SRM. Children, unlike adults, demonstrated large amounts of SRM for a time-reversed speech interferer. In conclusion, masking and SRM vary with the type of interfering sound, and this variation interacts with age; SRM may not depend on the spectral peculiarities of a particular type of voice when the target speech and interfering speech are produced by different-sex talkers.
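Spatial release from masking in studies like this one is typically quantified as the difference between the speech reception threshold measured with the interferer colocated with the target (Front) and with the interferer displaced to the side (Right). The short example below shows that computation with made-up thresholds; the numbers are illustrative only, not the study's data.

```python
# Illustrative SRM computation (hypothetical thresholds, not data from the study)
srt_front = -2.0   # SRT in dB, interferer colocated with the target at 0 degrees
srt_right = -8.0   # SRT in dB, interferer moved to 90 degrees right
srm = srt_front - srt_right
print(f"SRM = {srm:.1f} dB")   # a positive SRM means spatial separation helped
```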
