Similar articles
20 similar articles found (search time: 46 ms)
1.
Speech intelligibility and localization in a multi-source environment.   Cited by: 1 (self-citations: 0; citations by others: 1)
Natural environments typically contain sound sources other than the source of interest that may interfere with the ability of listeners to extract information about the primary source. Studies of speech intelligibility and localization by normal-hearing listeners in the presence of competing speech are reported on in this work. One, two or three competing sentences [IEEE Trans. Audio Electroacoust. 17(3), 225-246 (1969)] were presented from various locations in the horizontal plane in several spatial configurations relative to a target sentence. Target and competing sentences were spoken by the same male talker and at the same level. All experiments were conducted both in an actual sound field and in a virtual sound field. In the virtual sound field, both binaural and monaural conditions were tested. In the speech intelligibility experiment, there were significant improvements in performance when the target and competing sentences were spatially separated. Performance was similar in the actual sound-field and virtual sound-field binaural listening conditions for speech intelligibility. Although most of these improvements are evident monaurally when using the better ear, binaural listening was necessary for large improvements in some situations. In the localization experiment, target source identification was measured in a seven-alternative absolute identification paradigm with the same competing sentence configurations as for the speech study. Performance in the localization experiment was significantly better in the actual sound-field than in the virtual sound-field binaural listening conditions. Under binaural conditions, localization performance was very good, even in the presence of three competing sentences. Under monaural conditions, performance was much worse. For the localization experiment, there was no significant effect of the number or configuration of the competing sentences tested. 
For these experiments, the performance in the speech intelligibility experiment was not limited by localization ability.

2.
Previous research has shown that familiarity with a talker's voice can improve linguistic processing (herein, "Familiar Talker Advantage"), but this benefit is constrained by the context in which the talker's voice is familiar. The current study examined how familiarity affects intelligibility by manipulating the type of talker information available to listeners. One group of listeners learned to identify bilingual talkers' voices from English words, where they learned language-specific talker information. A second group of listeners learned the same talkers from German words, and thus only learned language-independent talker information. After voice training, both groups of listeners completed a word recognition task with English words produced by both familiar and unfamiliar talkers. Results revealed that English-trained listeners perceived more phonemes correctly for familiar than unfamiliar talkers, while German-trained listeners did not show improved intelligibility for familiar talkers. The absence of a processing advantage in speech intelligibility for the German-trained listeners demonstrates limitations on the Familiar Talker Advantage, which crucially depends on the language context in which the talkers' voices were learned; knowledge of how a talker produces linguistically relevant contrasts in a particular language is necessary to increase speech intelligibility for words produced by familiar talkers.

3.
The interlanguage speech intelligibility benefit   Cited by: 1 (self-citations: 0; citations by others: 1)
This study investigated how native language background influences the intelligibility of speech by non-native talkers for non-native listeners from either the same or a different native language background as the talker. Native talkers of Chinese (n = 2), Korean (n = 2), and English (n = 1) were recorded reading simple English sentences. Native listeners of English (n = 21), Chinese (n = 21), Korean (n = 10), and a mixed group from various native language backgrounds (n = 12) then performed a sentence recognition task with the recordings from the five talkers. Results showed that for native English listeners, the native English talker was most intelligible. However, for non-native listeners, speech from a relatively high proficiency non-native talker from the same native language background was as intelligible as speech from a native talker, giving rise to the "matched interlanguage speech intelligibility benefit." Furthermore, this interlanguage intelligibility benefit extended to the situation where the non-native talker and listeners came from different language backgrounds, giving rise to the "mismatched interlanguage speech intelligibility benefit." These findings shed light on the nature of the talker-listener interaction during speech communication.

4.
Intelligibility of average talkers in typical listening environments   Cited by: 1 (self-citations: 0; citations by others: 1)
Intelligibility of conversationally produced speech for normal hearing listeners was studied for three male and three female talkers. Four typical listening environments were used. These simulated a quiet living room, a classroom, and social events in two settings with different reverberation characteristics. For each talker, overall intelligibility and intelligibility for vowels, consonant voicing, consonant continuance, and consonant place were quantified using the speech pattern contrast (SPAC) test. Results indicated that significant intelligibility differences are observed among normal talkers even in listening environments that permit essentially full intelligibility for everyday conversations. On the whole, talkers maintained their relative intelligibility across the four environments, although there was one exception which suggested that some voices may be particularly susceptible to degradation due to reverberation. Consonant place was the most poorly perceived feature, followed by continuance, voicing, and vowel intelligibility. However, there were numerous significant interactions between talkers and speech features, indicating that a talker of average overall intelligibility may produce certain speech features with intelligibility that is considerably higher or lower than average. Neither long-term rms speech spectrum nor articulation rate was found to be an adequate single criterion for selecting a talker of average intelligibility. Ultimately, an average talker was chosen on the basis of four speech contrasts: initial consonant place, and final consonant place, voicing, and continuance.

5.
This study aimed to clarify the basic auditory and cognitive processes that affect listeners' performance on two spatial listening tasks: sound localization and speech recognition in spatially complex, multi-talker situations. Twenty-three elderly listeners with mild-to-moderate sensorineural hearing impairments were tested on the two spatial listening tasks, a measure of monaural spectral ripple discrimination, a measure of binaural temporal fine structure (TFS) sensitivity, and two (visual) cognitive measures indexing working memory and attention. All auditory test stimuli were spectrally shaped to restore (partial) audibility for each listener on each listening task. Eight younger normal-hearing listeners served as a control group. Data analyses revealed that the chosen auditory and cognitive measures could predict neither sound localization accuracy nor speech recognition when the target and maskers were separated along the front-back dimension. When the competing talkers were separated along the left-right dimension, however, speech recognition performance was significantly correlated with the attentional measure. Furthermore, supplementary analyses indicated additional effects of binaural TFS sensitivity and average low-frequency hearing thresholds. Altogether, these results are in support of the notion that both bottom-up and top-down deficits are responsible for the impaired functioning of elderly hearing-impaired listeners in cocktail party-like situations.

6.
In cochlear implants (CIs), different talkers often produce different levels of speech understanding because of the spectrally distorted speech patterns provided by the implant device. A spectral normalization approach was used to transform the spectral characteristics of one talker to those of another talker. In Experiment 1, speech recognition with two talkers was measured in CI users, with and without spectral normalization. Results showed that the spectral normalization algorithm had a small but significant effect on performance. In Experiment 2, the effects of spectral normalization were measured in CI users and normal-hearing (NH) subjects; a pitch-stretching technique was used to simulate six talkers with different fundamental frequencies and vocal tract configurations. NH baseline performance was nearly perfect with these pitch-shift transformations. For CI subjects, while there was considerable intersubject variability in performance with the different pitch-shift transformations, spectral normalization significantly improved the intelligibility of these simulated talkers. The results from Experiments 1 and 2 demonstrate that spectral normalization toward more-intelligible talkers significantly improved CI users' speech understanding with less-intelligible talkers. The results suggest that spectral normalization using optimal reference patterns for individual CI patients may compensate for some of the acoustic variability across talkers.

7.
Speech recognition performance was measured in normal-hearing and cochlear-implant listeners with maskers consisting of either steady-state speech-spectrum-shaped noise or a competing sentence. Target sentences from a male talker were presented in the presence of one of three competing talkers (same male, different male, or female) or speech-spectrum-shaped noise generated from this talker at several target-to-masker ratios. For the normal-hearing listeners, target-masker combinations were processed through a noise-excited vocoder designed to simulate a cochlear implant. With unprocessed stimuli, a normal-hearing control group maintained high levels of intelligibility down to target-to-masker ratios as low as 0 dB and showed a release from masking, producing better performance with single-talker maskers than with steady-state noise. In contrast, no masking release was observed in either implant or normal-hearing subjects listening through an implant simulation. The performance of the simulation and implant groups did not improve when the single-talker masker was a different talker compared to the same talker as the target speech, as was found in the normal-hearing control. These results are interpreted as evidence for a significant role of informational masking and modulation interference in cochlear implant speech recognition with fluctuating maskers. This informational masking may originate from increased target-masker similarity when spectral resolution is reduced.

8.
Although most recent multitalker research has emphasized the importance of binaural cues, monaural cues can play an equally important role in the perception of multiple simultaneous speech signals. In this experiment, the intelligibility of a target phrase masked by a single competing masker phrase was measured as a function of signal-to-noise ratio (SNR) with same-talker, same-sex, and different-sex target and masker voices. The results indicate that informational masking, rather than energetic masking, dominated performance in this experiment. The amount of masking was highly dependent on the similarity of the target and masker voices: performance was best when different-sex talkers were used and worst when the same talker was used for target and masker. Performance did not, however, improve monotonically with increasing SNR. Intelligibility generally plateaued at SNRs below 0 dB and, in some cases, intensity differences between the target and masking voices produced substantial improvements in performance with decreasing SNR. The results indicate that informational and energetic masking play substantially different roles in the perception of competing speech messages.

9.
Although many researchers have shown that listeners are able to selectively attend to a target speech signal when a masking talker is present in the same ear as the target speech or when a masking talker is present in a different ear than the target speech, little is known about selective auditory attention in tasks with a target talker in one ear and independent masking talkers in both ears at the same time. In this series of experiments, listeners were asked to respond to a target speech signal spoken by one of two competing talkers in their right (target) ear while ignoring a simultaneous masking sound in their left (unattended) ear. When the masking sound in the unattended ear was noise, listeners were able to segregate the competing talkers in the target ear nearly as well as they could with no sound in the unattended ear. When the masking sound in the unattended ear was speech, however, speech segregation in the target ear was substantially worse than with no sound in the unattended ear. When the masking sound in the unattended ear was time-reversed speech, speech segregation was degraded only when the target speech was presented at a lower level than the masking speech in the target ear. These results show that within-ear and across-ear speech segregation are closely related processes that cannot be performed simultaneously when the interfering sound in the unattended ear is qualitatively similar to speech.

10.
Speech intelligibility was investigated by varying the number of interfering talkers, level, and mean pitch differences between target and interfering speech, and the presence of tactile support. In a first experiment the speech-reception threshold (SRT) for sentences was measured for a male talker against a background of one to eight interfering male talkers or speech noise. Speech was presented diotically and vibro-tactile support was given by presenting the low-pass-filtered signal (0-200 Hz) to the index finger. The benefit in the SRT resulting from tactile support ranged from 0 to 2.4 dB and was largest for one or two interfering talkers. A second experiment focused on masking effects of one interfering talker. The interference was the target talker's own voice with its mean pitch increased by 2, 4, 8, or 12 semitones. Level differences between target and interfering speech ranged from -16 to +4 dB. Results from measurements of correctly perceived words in sentences show an intelligibility increase of up to 27% due to tactile support. Performance gradually improves with increasing pitch difference. Louder target speech generally helps perception, but results for level differences are considerably dependent on pitch differences. Differences in performance between noise and speech maskers and between speech maskers with various mean pitches are explained by the effect of informational masking.

11.
The auditory system takes advantage of early reflections (ERs) in a room by integrating them with the direct sound (DS) and thereby increasing the effective speech level. In the present paper the benefit from realistic ERs on speech intelligibility in diffuse speech-shaped noise was investigated for normal-hearing and hearing-impaired listeners. Monaural and binaural speech intelligibility tests were performed in a virtual auditory environment where the spectral characteristics of ERs from a simulated room could be preserved. The useful ER energy was derived from the speech intelligibility results and the efficiency of the ERs was determined as the ratio of the useful ER energy to the total ER energy. Even though ER energy contributed to speech intelligibility, DS energy was always more efficient, leading to better speech intelligibility for both groups of listeners. The efficiency loss for the ERs was mainly ascribed to their altered spectrum compared to the DS and to the filtering by the torso, head, and pinna. No binaural processing other than a binaural summation effect could be observed.

12.
In a follow-up study to that of Bent and Bradlow (2003), carrier sentences containing familiar keywords were read aloud by five talkers (Korean high proficiency; Korean low proficiency; Saudi Arabian high proficiency; Saudi Arabian low proficiency; native English). The intelligibility of these keywords to 50 listeners in four first language groups (Korean, n = 10; Saudi Arabian, n = 10; native English, n = 10; other mixed first languages, n = 20) was measured in a word recognition test. In each case, the non-native listeners found the non-native low-proficiency talkers who did not share the same first language as the listeners the least intelligible, at statistically significant levels, while not finding the low-proficiency talker who shared their own first language similarly unintelligible. These findings indicate a mismatched interlanguage speech intelligibility detriment for low-proficiency non-native speakers and a potential intelligibility problem between mismatched first language low-proficiency speakers unfamiliar with each other's accents in English. There was no strong evidence to support either an intelligibility benefit for the high-proficiency non-native talkers to the listeners from a different first language background or to indicate that the native talkers were more intelligible than the high-proficiency non-native talkers to any of the listeners.

13.
Speech reception thresholds were measured in virtual rooms to investigate the influence of reverberation on speech intelligibility for spatially separated targets and interferers. The measurements were realized under headphones, using target sentences and noise or two-voice interferers. The room simulation allowed variation of the absorption coefficient of the room surfaces independently for target and interferer. The direct-to-reverberant ratio and interaural coherence of sources were also varied independently by considering binaural and diotic listening. The main effect of reverberation on the interferer was binaural and mediated by the coherence, in agreement with binaural unmasking theories. It appeared at lower reverberation levels than the effect of reverberation on the target, which was mainly monaural and associated with the direct-to-reverberant ratio, and could be explained by the loss of amplitude modulation in the reverberant speech signals. This effect was slightly smaller when listening binaurally. Reverberation might also be responsible for a disruption of the mechanism by which the auditory system exploits fundamental frequency differences to segregate competing voices, and a disruption of the "listening in the gaps" associated with speech interferers. These disruptions may explain an interaction observed between the effects of reverberation on the targets and two-voice interferers.

14.
The intelligibility of speech pronounced by non-native talkers is generally lower than that of speech pronounced by native talkers, especially under adverse conditions, such as high levels of background noise. The effect of foreign accent on speech intelligibility was investigated quantitatively through a series of experiments involving voices of 15 talkers, differing in language background, age of second-language (L2) acquisition and experience with the target language (Dutch). Overall speech intelligibility of L2 talkers in noise is predicted with reasonable accuracy from accent ratings by native listeners, as well as from the self-ratings for proficiency of L2 talkers. For non-native speech, unlike native speech, the intelligibility of short messages (sentences) cannot be fully predicted by phoneme-based intelligibility tests. Although incorrect recognition of specific phonemes certainly occurs as a result of foreign accent, the effect of reduced phoneme recognition on the intelligibility of sentences may range from severe to virtually absent, depending on (for instance) the speech-to-noise ratio. Objective acoustic-phonetic analyses of accented speech were also carried out, but satisfactory overall predictions of speech intelligibility could not be obtained with relatively simple acoustic-phonetic measures.

15.
Twelve normal-hearing subjects rated the intelligibility of 35-s, hearing-aid-processed continuous discourse (CD) passages. Three talkers (two male, one female), four hearing aids, and two signal-to-babble (S/B) ratios were used in a completely crossed design. Research questions concerned: (1) ability of listeners to rate intelligibility, (2) sensitivity of hearing aid rankings based on intelligibility ratings for three CD passages per instrument, and (3) dependence of hearing aid rankings on (a) S/B ratio, and (b) talker characteristics. Results were: (1) listeners were able to rate intelligibility, (2) rankings based on intelligibility ratings of three CD passages per hearing aid were capable of identifying two superior instruments within a group of four hearing aids that were similar in frequency/gain function, (3) listening in a more difficult S/B ratio substantially decreased the sensitivity of the hearing aid rankings for the female talker but had only minor effects on the rankings for the male talkers, and (4) hearing aid intelligibility rankings were found to be different for different talkers. Applications to hearing aid selection are discussed.

16.
The "cocktail party problem" was studied using virtual stimuli whose spatial locations were generated using anechoic head-related impulse responses from the AUDIS database [Blauert et al., J. Acoust. Soc. Am. 103, 3082 (1998)]. Speech reception thresholds (SRTs) were measured for Harvard IEEE sentences presented from the front in the presence of one, two, or three interfering sources. Four types of interferer were used: (1) other sentences spoken by the same talker, (2) time-reversed sentences of the same talker, (3) speech-spectrum shaped noise, and (4) speech-spectrum shaped noise, modulated by the temporal envelope of the sentences. Each interferer was matched to the spectrum of the target talker. Interferers were placed in several spatial configurations, either coincident with or separated from the target. Binaural advantage was derived by subtracting SRTs for listening with the "better monaural ear" from those for binaural listening. For a single interferer, there was a binaural advantage of 2-4 dB for all interferer types. For two or three interferers, the advantage was 2-4 dB for noise and speech-modulated noise, and 6-7 dB for speech and time-reversed speech. These data suggest that the benefit of binaural hearing for speech intelligibility is especially pronounced when there are multiple voiced interferers at different locations from the target, regardless of spatial configuration; measurements with fewer or with other types of interferers can underestimate this benefit.

17.
Individual talkers differ in the acoustic properties of their speech, and at least some of these differences are in acoustic properties relevant for phonetic perception. Recent findings from studies of speech perception have shown that listeners can exploit such differences to facilitate both the recognition of talkers' voices and the recognition of words spoken by familiar talkers. These findings motivate the current study, whose aim is to examine individual talker variation in a particular phonetically-relevant acoustic property, voice-onset-time (VOT). VOT is a temporal property that robustly specifies voicing in stop consonants. From the broad literature involving VOT, it appears that individual talkers differ from one another in their VOT productions. The current study confirmed this finding for eight talkers producing monosyllabic words beginning with voiceless stop consonants. Moreover, when differences in VOT due to variability in speaking rate across the talkers were factored out using hierarchical linear modeling, individual talkers still differed from one another in VOT, though these differences were attenuated. These findings provide evidence that VOT varies systematically from talker to talker and may therefore be one phonetically-relevant acoustic property underlying listeners' capacity to benefit from talker-specific experience.

18.
Although both perceived vocal effort and intensity are known to influence the perceived distance of speech, little is known about the processes listeners use to integrate these two parameters into a single estimate of talker distance. In this series of experiments, listeners judged the distances of prerecorded speech samples presented over headphones in a large open field. In the first experiment, virtual synthesis techniques were used to simulate speech signals produced by a live talker at distances ranging from 0.25 to 64 m. In the second experiment, listeners judged the apparent distances of speech stimuli produced over a 60-dB range of different vocal effort levels (production levels) and presented over a 34-dB range of different intensities (presentation levels). In the third experiment, the listeners judged the distances of time-reversed speech samples. The results indicate that production level and presentation level influence distance perception differently for each of three distinct categories of speech. When the stimulus was high-level voiced speech (produced above 66 dB SPL 1 m from the talker's mouth), the distance judgments doubled with each 8-dB increase in production level and each 12-dB decrease in presentation level. When the stimulus was low-level voiced speech (produced at or below 66 dB SPL at 1 m), the distance judgments doubled with each 15-dB increase in production level but were relatively insensitive to changes in presentation level at all but the highest intensity levels tested. When the stimulus was whispered speech, the distance judgments were unaffected by changes in production level and only decreased with increasing presentation level when the intensity of the stimulus exceeded 66 dB SPL. 
The distance judgments obtained in these experiments were consistent across a range of different talkers, listeners, and utterances, suggesting that voice-based distance cueing could provide a robust way to control the apparent distances of speech sounds in virtual audio displays.

19.
Although many studies have shown that intelligibility improves when a speech signal and an interfering sound source are spatially separated in azimuth, little is known about the effect that spatial separation in distance has on the perception of competing sound sources near the head. In this experiment, head-related transfer functions (HRTFs) were used to process stimuli in order to simulate a target talker and a masking sound located at different distances along the listener's interaural axis. One of the signals was always presented at a distance of 1 m, and the other signal was presented 1 m, 25 cm, or 12 cm from the center of the listener's head. The results show that distance separation has very different effects on speech segregation for different types of maskers. When speech-shaped noise was used as the masker, most of the intelligibility advantages of spatial separation could be accounted for by spectral differences in the target and masking signals at the ear with the higher signal-to-noise ratio (SNR). When a same-sex talker was used as the masker, the intelligibility advantages of spatial separation in distance were dominated by binaural effects that produced the same performance improvements as a 4-5-dB increase in the SNR of a diotic stimulus. These results suggest that distance-dependent changes in the interaural difference cues of nearby sources play a much larger role in the reduction of the informational masking produced by an interfering speech signal than in the reduction of the energetic masking produced by an interfering noise source.

20.
This study investigated whether speech produced in spontaneous interactions when addressing a talker experiencing actual challenging conditions differs in acoustic-phonetic characteristics from speech produced (a) with communicative intent under more ideal conditions and (b) without communicative intent under imaginary challenging conditions (read, clear speech). It also investigated whether acoustic-phonetic modifications made to counteract the effects of a challenging listening condition are tailored to the condition under which communication occurs. Forty talkers were recorded in pairs while engaged in "spot the difference" picture tasks in good and challenging conditions. In the challenging conditions, one talker heard the other either (1) via a three-channel noise vocoder (VOC) or (2) with simultaneous babble noise (BABBLE). Read, clear speech showed more extreme changes in median F0, F0 range, and speaking rate than speech produced to counter the effects of a challenging listening condition. In the VOC condition, where F0 and intensity enhancements are unlikely to aid intelligibility, talkers did not change their F0 median and range; mean energy and vowel F1 increased less than in the BABBLE condition. This suggests that speech production is listener-focused, and that talkers modulate their speech according to their interlocutors' needs, even when not directly experiencing the challenging listening condition.
