Similar Documents
20 similar documents found (search time: 46 ms)
1.
A glimpsing model of speech perception in noise (total citations: 5; self-citations: 0; citations by others: 5)
Do listeners process noisy speech by taking advantage of "glimpses," spectrotemporal regions in which the target signal is least affected by the background? This study used an automatic speech recognition system, adapted for use with partially specified inputs, to identify consonants in noise. Twelve masking conditions were chosen to create a range of glimpse sizes. Several different glimpsing models were employed, differing in the local signal-to-noise ratio (SNR) used for detection, the minimum glimpse size, and the use of information in the masked regions. Recognition results were compared with behavioral data. A quantitative analysis demonstrated that the proportion of the time-frequency plane glimpsed is a good predictor of intelligibility. Recognition scores in each noise condition confirmed that glimpses contain sufficient information to support consonant identification. Close fits to listeners' performance were obtained at two local SNR thresholds: one at around 8 dB and another in the range -5 to -2 dB. A transmitted-information analysis revealed that cues to voicing are degraded more in the model than in human auditory processing.
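For readers who want to experiment with the glimpse metric, the sketch below computes the proportion of the time-frequency plane glimpsed, assuming separate access to the clean speech and noise signals. It omits the study's recognition back-end and minimum-glimpse-size constraint, and the window length is an arbitrary choice.

```python
# A minimal sketch (not the study's actual model) of the glimpse
# proportion: the fraction of time-frequency cells whose local SNR
# exceeds a detection threshold. `threshold_db` is a free parameter.
import numpy as np
from scipy.signal import stft

def glimpse_proportion(speech, noise, fs, threshold_db=-5.0):
    """Fraction of spectrogram cells where speech locally dominates noise."""
    _, _, S = stft(speech, fs, nperseg=512)
    _, _, N = stft(noise, fs, nperseg=512)
    eps = 1e-12  # guard against log of zero
    local_snr_db = 10 * np.log10((np.abs(S) ** 2 + eps) /
                                 (np.abs(N) ** 2 + eps))
    return np.mean(local_snr_db > threshold_db)
```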

2.
This study assessed the acoustic and perceptual effect of noise on vowel and stop-consonant spectra. Multi-talker babble and speech-shaped noise were added to vowel and stop stimuli at -5 to +10 dB S/N, and the effect of noise was quantified in terms of (a) spectral envelope differences between the noisy and clean spectra in three frequency bands, (b) presence of reliable F1 and F2 information in noise, and (c) changes in burst frequency and slope. Acoustic analysis indicated that F1 was detected more reliably than F2 and the largest spectral envelope differences between the noisy and clean vowel spectra occurred in the mid-frequency band. This finding suggests that in extremely noisy conditions listeners must be relying on relatively accurate F1 frequency information along with partial F2 information to identify vowels. Stop consonant recognition remained high even at -5 dB despite the disruption of burst cues due to additive noise, suggesting that listeners must be relying on other cues, perhaps formant transitions, to identify stops.
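The stimulus construction above hinges on adding maskers at controlled S/N levels. A minimal sketch of such mixing, using broadband RMS levels (the study's exact calibration procedure is not specified), could look like this:

```python
# Scale a noise signal so the speech-to-noise RMS ratio equals a
# target S/N in dB, then add it to the clean token. Illustrative only.
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    noise = noise[:len(speech)]          # assume noise is at least as long
    rms_s = np.sqrt(np.mean(speech ** 2))
    rms_n = np.sqrt(np.mean(noise ** 2))
    gain = rms_s / (rms_n * 10 ** (snr_db / 20))
    return speech + gain * noise
```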

3.
The purpose of this study is to determine the relative impact of reverberant self-masking and overlap-masking effects on speech intelligibility for cochlear implant listeners. Sentences were presented in two conditions: in one, reverberant consonant segments were replaced with clean consonants; in the other, reverberant vowel segments were replaced with clean vowels. The underlying assumption is that self-masking effects would dominate in the first condition, whereas overlap-masking effects would dominate in the second condition. Results indicated that the degradation of speech intelligibility in reverberant conditions is caused primarily by self-masking effects that give rise to flattened formant transitions.

4.
This paper reports an evaluation of the intelligibility of speech processed by noise reduction algorithms. IEEE sentences and consonants were corrupted by four types of noise (babble, car, street, and train) at two signal-to-noise ratio levels (0 and 5 dB), and then processed by eight speech enhancement methods encompassing four classes of algorithms: spectral subtractive, subspace, statistical model based, and Wiener-type algorithms. The enhanced speech was presented to normal-hearing listeners for identification. With the exception of a single noise condition, no algorithm produced significant improvements in speech intelligibility. Information transmission analysis of the consonant confusion matrices indicated that no algorithm significantly improved the place feature score, which is critically important for speech recognition. The algorithms found in previous studies to perform best in terms of overall quality were not the same algorithms that performed best in terms of speech intelligibility. The subspace algorithm, for instance, was previously found to perform the worst in terms of overall quality, but performed well in the present study in terms of preserving speech intelligibility. Overall, the analysis of consonant confusion matrices suggests that in order for noise reduction algorithms to improve speech intelligibility, they need to improve the place and manner feature scores.
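The information transmission analysis referred to above follows Miller and Nicely's measure: the mutual information between stimulus and response, normalized by the stimulus entropy. A generic sketch (not the authors' analysis code) follows; feature-level scores such as place are obtained by first collapsing the confusion matrix over feature categories.

```python
# Relative information transmitted, computed from a consonant
# confusion matrix of response counts. Generic implementation.
import numpy as np

def relative_info_transmitted(confusions):
    """confusions[i, j] = count of stimulus i heard as response j."""
    p = confusions / confusions.sum()
    px = p.sum(axis=1)               # stimulus probabilities
    py = p.sum(axis=0)               # response probabilities
    mask = p > 0
    mi = np.sum(p[mask] * np.log2(p[mask] / np.outer(px, py)[mask]))
    hx = -np.sum(px[px > 0] * np.log2(px[px > 0]))
    return mi / hx                   # fraction of stimulus entropy transmitted
```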

5.
This study investigated the relative contributions of consonants and vowels to the perceptual intelligibility of monosyllabic consonant-vowel-consonant (CVC) words. A noise replacement paradigm presented CVCs with only consonants or only vowels preserved. Results demonstrated no difference in overall word accuracy between these conditions; however, different error patterns were observed. A significant effect of lexical difficulty was demonstrated for both types of replacement, whereas the noise level used during replacement did not influence results. The contribution of consonant and vowel transitional information present at the consonant-vowel boundary was also explored. The proportion of speech presented, regardless of the segmental condition, overwhelmingly predicted performance. Comparisons were made with previous segment replacement results using sentences [Fogerty and Kewley-Port (2009), J. Acoust. Soc. Am. 126, 847-857]. Results demonstrated that consonants contribute to intelligibility equally in both isolated CVC words and sentences. However, vowel contributions were mediated by context, with greater contributions to intelligibility in sentence contexts. Therefore, it appears that vowels in sentences carry unique speech cues that greatly facilitate intelligibility and that are absent or uninformative in isolated word contexts. Consonants appear to provide speech cues that are equally available and informative during sentence and isolated word presentations.
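As a rough illustration of the noise replacement paradigm, the sketch below overwrites all segments except the preserved class with noise. The boundary list is a hypothetical annotation format (start sample, end sample, segment kind), not the authors' materials.

```python
# Replace all segments except the preserved class with noise.
# `boundaries` is a hypothetical list of (start, end, kind) tuples.
import numpy as np

def replace_segments(signal, boundaries, keep_kind, noise):
    out = noise[:len(signal)].copy()     # assume noise covers the signal
    for start, end, kind in boundaries:
        if kind == keep_kind:            # e.g., "vowel" or "consonant"
            out[start:end] = signal[start:end]
    return out
```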

6.
This study compared how normal-hearing (NH) listeners and listeners with moderate to moderately severe cochlear hearing loss (HI) use and combine information within and across frequency regions in the perceptual separation of competing vowels with fundamental frequency differences (ΔF0) ranging from 0 to 9 semitones. Following the procedure of Culling and Darwin [J. Acoust. Soc. Am. 93, 3454-3467 (1993)], eight NH listeners and eight HI listeners identified competing vowels with either a consistent or inconsistent harmonic structure. Vowels were amplified to assure audibility for HI listeners. The contribution of frequency region depended on the value of ΔF0 between the competing vowels. When ΔF0 was small, both groups of listeners effectively utilized ΔF0 cues in the low-frequency region. In contrast, HI listeners derived significantly less benefit than NH listeners from ΔF0 cues conveyed by the high-frequency region at small ΔF0s. At larger ΔF0s, both groups combined ΔF0 cues from the low and high formant-frequency regions. Cochlear impairment appears to negatively impact the ability to use F0 cues for within-formant grouping in the high-frequency region. However, cochlear loss does not appear to disrupt the ability to use within-formant F0 cues in the low-frequency region or to group F0 cues across formant regions.
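For reference, the semitone separations quoted above follow the standard conversion between two fundamental frequencies:

$$ \Delta F0 = 12 \, \log_2\!\left(\frac{F0_2}{F0_1}\right) \ \text{semitones} $$

For example, F0s of 100 and 106 Hz are separated by about 1 semitone.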

7.
Hearing-aid wearers have reported sound source locations as being perceptually internalized (i.e., inside their head). The contribution of hearing-aid design to internalization has, however, received little attention. This experiment compared the sensitivity of hearing-impaired (HI) and normal-hearing listeners to externalization cues when listening with their own ears and with simulated behind-the-ear hearing aids, in increasingly complex listening situations with reduced pinna cues. Participants rated the degree of externalization using a multiple-stimulus listening test for mixes of internalized and externalized speech stimuli presented over headphones. The results showed that HI listeners had a contracted perception of externalization correlated with high-frequency hearing loss.

8.
There is limited documentation available on how sensorineurally hearing-impaired listeners use the various sources of phonemic information that are known to be distributed across time in the speech waveform. In this investigation, a group of normally hearing listeners and a group of sensorineurally hearing-impaired listeners (with and without the benefit of amplification) identified various consonant and vowel productions that had been systematically varied in duration. The consonants (presented in a /haCa/ environment) and the vowels (presented in a /bVd/ environment) were truncated in steps to eliminate various segments from the end of the stimulus. The results indicated that normally hearing listeners could extract more phonemic information, especially cues to consonant place, from the earlier occurring portions of the stimulus waveforms than could the hearing-impaired listeners. The use of amplification partially decreased the performance differences between the normally hearing listeners and the unaided hearing-impaired listeners. The results are relevant to current models of normal speech perception that emphasize the need for the listener to make phonemic identifications as quickly as possible.

9.
Although consonant confusion studies with adults are widely available, to date no single study has compared children's consonant confusion patterns in noise with those of adults. To examine whether children's error patterns are similar to those of adults, three groups of children (24 each at ages 4-5, 6-7, and 8-9 years) and 24 adult native speakers of American English (AE) performed a recognition task for 15 AE consonants in /ɑ/-consonant-/ɑ/ nonsense syllables presented in a background of speech-shaped noise. Three signal-to-noise ratios (SNR: 0, +5, and +10 dB) were used. Although performance improved as a function of age, overall consonant recognition accuracy as a function of SNR improved at a similar rate for all groups. Detailed analyses using phonetic features (manner, place, and voicing) revealed that stop consonants were the most problematic for all groups. In addition, for the younger children, front consonants presented in the 0 dB SNR condition were more error prone than others. These results suggest that children's use of phonetic cues does not develop at the same rate for all phonetic features.

10.
The relative importance of temporal information in broad spectral regions for consonant identification was assessed in normal-hearing listeners. For the purpose of forcing listeners to use primarily temporal-envelope cues, speech sounds were spectrally degraded using four-noise-band vocoder processing. Frequency-weighting functions were determined using two methods. The first method consisted of measuring the intelligibility of speech with a hole in the spectrum either in quiet or in noise. The second method consisted of correlating performance with the randomly and independently varied signal-to-noise ratio within each band. Results demonstrated that all bands contributed equally to consonant identification when presented in quiet. In noise, however, both methods indicated that listeners consistently placed relatively more weight upon the highest frequency band. It is proposed that the explanation for the difference in results between quiet and noise relates to the shape of the modulation spectra in adjacent frequency bands. Overall, the results suggest that normal-hearing listeners use a common listening strategy in a given condition. However, this strategy may be influenced by the competing sounds, and thus may vary according to the context. Some implications of the results for cochlear implantees and hearing-impaired listeners are discussed.
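A minimal sketch of four-noise-band vocoder processing of the kind described above: speech is filtered into bands, each band's temporal envelope is extracted and used to modulate band-limited noise, and the modulated bands are summed. The corner frequencies and filter order here are illustrative assumptions, not the study's parameters.

```python
# Four-band noise vocoder sketch. Band edges and filter order are
# illustrative; real studies match bands to specific frequency maps.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(speech, fs, edges=(100, 800, 1500, 2500, 4000)):
    out = np.zeros(len(speech))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, speech)
        env = np.abs(hilbert(band))                  # temporal envelope
        carrier = sosfiltfilt(sos, np.random.randn(len(speech)))
        carrier /= np.sqrt(np.mean(carrier ** 2))    # unit-RMS carrier
        out += env * carrier
    return out
```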

11.
Channel vocoders using either tone or band-limited noise carriers have been used in experiments to simulate cochlear implant processing in normal-hearing listeners. Previous results from these experiments have suggested that the two vocoder types produce speech of nearly equal intelligibility in quiet conditions. The purpose of this study was to further compare the performance of tone and noise-band vocoders in both quiet and noisy listening conditions. In each of four experiments, normal-hearing subjects were better able to identify tone-vocoded sentences and vowel-consonant-vowel syllables than noise-vocoded sentences and syllables, both in quiet and in the presence of either speech-spectrum noise or two-talker babble. An analysis of consonant confusions for listening in both quiet and speech-spectrum noise revealed significantly different error patterns that were related to each vocoder's ability to produce tone or noise output that accurately reflected the consonant's manner of articulation. Subject experience was also shown to influence intelligibility. Simulations using a computational model of modulation detection suggest that the noise vocoder's disadvantage is in part due to the intrinsic temporal fluctuations of its carriers, which can interfere with temporal fluctuations that convey speech recognition cues.
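The role of intrinsic carrier fluctuations suggested above can be illustrated directly: the envelope of a narrow noise band fluctuates even before any speech modulation is applied, whereas a tone's envelope is flat. A small sketch with illustrative parameters only:

```python
# Compare intrinsic envelope fluctuations of a tone carrier and a
# band-limited noise carrier. All parameters are illustrative.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

fs = 16000
t = np.arange(fs) / fs                   # one second of signal
tone = np.sin(2 * np.pi * 1000 * t)
sos = butter(4, [900, 1100], btype="bandpass", fs=fs, output="sos")
noise_band = sosfiltfilt(sos, np.random.randn(fs))

for name, carrier in [("tone", tone), ("noise band", noise_band)]:
    env = np.abs(hilbert(carrier))
    depth = np.std(env) / np.mean(env)   # normalized fluctuation
    print(f"{name}: envelope fluctuation index = {depth:.2f}")
```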

12.
A large number of single-channel noise-reduction algorithms have been proposed based largely on mathematical principles. Most of these algorithms, however, have been evaluated with English speech. Given the different perceptual cues used by native listeners of different languages, including tonal languages, it is of interest to examine whether there are any language effects when the same noise-reduction algorithm is used to process noisy speech in different languages. This study presents a comparative evaluation of various single-channel noise-reduction algorithms applied to noisy speech in three languages: Chinese, Japanese, and English. Clean speech signals (Chinese words and Japanese words) were first corrupted by three types of noise at two signal-to-noise ratios and then processed by five single-channel noise-reduction algorithms. The processed signals were finally presented to normal-hearing listeners for recognition. Intelligibility evaluation showed that the majority of noise-reduction algorithms did not improve speech intelligibility. Consistent with a previous study with the English language, the Wiener filtering algorithm produced small, but statistically significant, improvements in intelligibility for car and white noise conditions. Significant differences between the performances of noise-reduction algorithms across the three languages were observed.

13.
Synthesis (carrier) signals in acoustic models embody assumptions about perception of auditory electric stimulation. This study compared speech intelligibility of consonants and vowels processed through a set of nine acoustic models that used Spectral Peak (SPEAK) and Advanced Combination Encoder (ACE)-like speech processing, using synthesis signals which were representative of signals used previously in acoustic models as well as two new ones. Performance of the synthesis signals was determined in terms of correspondence with cochlear implant (CI) listener results for 12 attributes of phoneme perception (consonant and vowel recognition; F1, F2, and duration information transmission for vowels; voicing, manner, place of articulation, affrication, burst, nasality, and amplitude envelope information transmission for consonants) using four measures of performance. Modulated synthesis signals produced the best correspondence with CI consonant intelligibility, while sinusoids, narrow noise bands, and varying noise bands produced the best correspondence with CI vowel intelligibility. The signals that performed best overall (in terms of correspondence with both vowel and consonant attributes) were modulated and unmodulated noise bands of varying bandwidth that corresponded to a linearly varying excitation width of 0.4 mm at the apical to 8 mm at the basal channels.

14.
The present study examined the benefits of providing amplified speech to the low- and mid-frequency regions of listeners with various degrees of sensorineural hearing loss. Nonsense syllables were low-pass filtered at various cutoff frequencies and consonant recognition was measured as the bandwidth of the signal was increased. In addition, error patterns were analyzed to determine the types of speech cues that were, or were not, transmitted to the listeners. For speech frequencies of 2800 Hz and below, a positive benefit of amplified speech was observed in every case, although the benefit provided was very often less than that observed in normal-hearing listeners who received the same increase in speech audibility. There was no dependence of this benefit upon the degree of hearing loss. Error patterns suggested that the primary difficulty that hearing-impaired individuals have in using amplified speech is due to their poor ability to perceive the place of articulation of consonants, followed by a reduced ability to perceive manner information.
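A sketch of the bandwidth manipulation described above: low-pass filtering a stimulus at a series of cutoff frequencies before presentation. The cutoff values and filter order are illustrative assumptions.

```python
# Produce one low-pass-filtered copy of a stimulus per cutoff (Hz).
import numpy as np
from scipy.signal import butter, sosfiltfilt

def lowpass_series(signal, fs, cutoffs=(700, 1400, 2100, 2800)):
    versions = []
    for fc in cutoffs:
        sos = butter(6, fc, btype="lowpass", fs=fs, output="sos")
        versions.append(sosfiltfilt(sos, signal))
    return versions
```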

15.
Most binary-mask studies assume a fine time-frequency representation of the signal that may not be available in some applications (e.g., cochlear implants). This study assesses the effect of spectral resolution on the intelligibility of ideal-binary-masked speech. In Experiment 1, speech corrupted by noise at -5 to 5 dB signal-to-noise ratio (SNR) was filtered into 6-32 channels and synthesized using the ideal binary mask. Results with normal-hearing listeners indicated substantial improvements in intelligibility with 24-32 channels, particularly at -5 dB SNR. Results from Experiment 2 indicated that having access to the ideal binary mask in the F1/F2 region is sufficient for good performance.
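A minimal STFT-based sketch of ideal binary masking, assuming access to the premixed speech and noise. The study instead used a channel vocoder with 6-32 channels, which this sketch does not model; `lc_db` is the local SNR criterion.

```python
# Keep time-frequency cells where the premixed local SNR exceeds a
# criterion, zero the rest, and resynthesize the mixture.
import numpy as np
from scipy.signal import stft, istft

def apply_ideal_binary_mask(speech, noise, fs, lc_db=0.0):
    _, _, S = stft(speech, fs, nperseg=512)
    _, _, N = stft(noise, fs, nperseg=512)
    mask = np.abs(S) ** 2 > np.abs(N) ** 2 * 10 ** (lc_db / 10)
    _, _, Y = stft(speech + noise, fs, nperseg=512)
    _, masked = istft(Y * mask, fs, nperseg=512)
    return masked
```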

16.
The classic [MN55] confusion matrix experiment (16 consonants, white-noise masker) was repeated using computerized procedures similar to those of Phatak and Allen (2007) ["Consonant and vowel confusions in speech-weighted noise," J. Acoust. Soc. Am. 121, 2312-2316]. The consonant scores in white noise can be categorized in three sets: a low-error set [/m/, /n/], an average-error set [/p/, /t/, /k/, /s/, /ʃ/, /d/, /g/, /z/, /ʒ/], and a high-error set [/f/, /θ/, /b/, /v/, /ð/]. The consonant confusions match those from MN55, except for the highly asymmetric voicing confusions of fricatives, biased in favor of voiced consonants. Masking noise can not only reduce the recognition of a consonant, but also perceptually morph it into another consonant. There is a significant and systematic variability in the scores and confusion patterns of different utterances of the same consonant, which can be characterized as (a) confusion heterogeneity, where the competitors in the confusion groups of a consonant vary, and (b) threshold variability, where the confusion threshold [i.e., the signal-to-noise ratio (SNR) and score at which the confusion group is formed] varies. The average consonant error, and the errors for most of the individual consonants and consonant sets, can be approximated as exponential functions of the articulation index (AI). An AI that is based on the peak-to-rms ratios of speech can explain the SNR differences across experiments.
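The exponential error-versus-AI relation noted above can be fitted by linear regression on the log of the error. The sketch below uses hypothetical placeholder values, not data from the study.

```python
# Fit e(AI) ~= a * b**AI by regressing log(error) on AI.
# The AI values and error rates are hypothetical placeholders.
import numpy as np

ai = np.array([0.1, 0.2, 0.3, 0.4, 0.5])        # articulation index
err = np.array([0.55, 0.35, 0.22, 0.14, 0.09])  # consonant error rate

slope, intercept = np.polyfit(ai, np.log(err), 1)
a, b = np.exp(intercept), np.exp(slope)
print(f"e(AI) ~= {a:.2f} * {b:.3f}**AI")
```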

17.
These experiments examined how high presentation levels influence speech recognition for high- and low-frequency stimuli in noise. Normally hearing (NH) and hearing-impaired (HI) listeners were tested. In Experiment 1, high- and low-frequency bandwidths yielding 70%-correct word recognition in quiet were determined at levels associated with broadband speech at 75 dB SPL. In Experiment 2, broadband and band-limited sentences (based on passbands measured in Experiment 1) were presented at this level in speech-shaped noise filtered to the same frequency bandwidths as targets. Noise levels were adjusted to produce approximately 30%-correct word recognition. Frequency bandwidths and signal-to-noise ratios supporting criterion performance in Experiment 2 were tested at 75, 87.5, and 100 dB SPL in Experiment 3. Performance tended to decrease as levels increased. For NH listeners, this "rollover" effect was greater for high-frequency and broadband materials than for low-frequency stimuli. For HI listeners, the 75- to 87.5-dB increase improved signal audibility for high-frequency stimuli and rollover was not observed. However, the 87.5- to 100-dB increase produced qualitatively similar results for both groups: scores decreased most for high-frequency stimuli and least for low-frequency materials. Predictions of speech intelligibility by quantitative methods such as the Speech Intelligibility Index may be improved if rollover effects are modeled as frequency dependent.

18.
Reverberation interferes with the ability to understand speech in rooms. Overlap-masking explains this degradation by assuming that reverberant phonemes endure in time and mask subsequent reverberant phonemes. Most listeners benefit from binaural listening when reverberation exists, indicating that the listener's binaural system processes the two channels to reduce the reverberation. This paper investigates the hypothesis that the binaural word intelligibility advantage found in reverberation is a result of binaural overlap-masking release, with the reverberation acting as masking noise. The tests utilize phonetically balanced word lists (ANSI S3.2-1989), which are presented diotically and binaurally with recorded reverberation and reverberation-like noise. A small room, 62 m³, reverberates the words. These are recorded using two microphones without additional noise sources. The reverberation-like noise is a modified form of these recordings and has a similar spectral content. It does not contain binaural localization cues due to a phase randomization procedure. Listening to the reverberant words binaurally improves intelligibility by 6.0% over diotic listening. The binaural intelligibility advantage for reverberation-like noise is only 2.6%. This indicates that binaural overlap-masking release is insufficient to explain the entire binaural word intelligibility advantage in reverberation.
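A minimal sketch of one common phase-randomization technique of the kind mentioned above: retain the magnitude spectrum of a recording, assign random phases, and invert. Applied independently to each ear's channel, this removes interaural phase cues while roughly preserving spectral content; the authors' exact procedure is not described in the abstract.

```python
# Keep the magnitude spectrum, randomize the phase spectrum, and
# invert back to the time domain. Output length matches the input.
import numpy as np

def phase_randomize(x, rng=None):
    rng = rng or np.random.default_rng()
    spectrum = np.fft.rfft(x)
    phases = rng.uniform(0, 2 * np.pi, len(spectrum))
    randomized = np.abs(spectrum) * np.exp(1j * phases)
    return np.fft.irfft(randomized, n=len(x))
```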

19.
The purpose of this experiment was to evaluate the utilization of short-term spectral cues for recognition of initial plosive consonants (/b,d,g/) by normal-hearing and by hearing-impaired listeners differing in audiometric configuration. Recognition scores were obtained for these consonants paired with three vowels (/a,i,u/) while systematically reducing the duration (300 to 10 ms) of the synthetic consonant-vowel syllables. Results from 10 normal-hearing and 15 hearing-impaired listeners suggest that audiometric configuration interacts in a complex manner with the identification of short-duration stimuli. For consonants paired with the vowels /a/ and /u/, performance deteriorated as the slope of the audiometric configuration increased. The one exception to this result was a subject who had significantly elevated pure-tone thresholds relative to the other hearing-impaired subjects. Despite the changes in the shape of the onset spectral cues imposed by hearing loss, with increasing duration, consonant recognition in the /a/ and /u/ context for most hearing-impaired subjects eventually approached that of the normal-hearing listeners. In contrast, scores for consonants paired with /i/ were poor for a majority of hearing-impaired listeners for stimuli of all durations.

20.
The goal of this study was to determine the extent to which the difficulty experienced by impaired listeners in understanding noisy speech can be explained on the basis of elevated tone-detection thresholds. Twenty-one impaired ears of 15 subjects, spanning a variety of audiometric configurations with average hearing losses of up to 75 dB, were tested for reception of consonants in a speech-spectrum noise. Speech level, noise level, and frequency-gain characteristic were varied to generate a range of listening conditions. Results for impaired listeners were compared to those of normal-hearing listeners tested under the same conditions with extra noise added to approximate the impaired listeners' detection thresholds. Results for impaired and normal listeners were also compared on the basis of articulation indices. Consonant recognition by this sample of impaired listeners was generally comparable to that of normal-hearing listeners with similar threshold shifts listening under the same conditions. When listening conditions were equated for articulation index, there was no clear dependence of consonant recognition on average hearing loss. Assuming that the primary consequence of the threshold simulation in normals is loss of audibility (as opposed to suprathreshold discrimination or resolution deficits), it is concluded that the primary source of difficulty in listening in noise for listeners with moderate or milder hearing impairments, aside from the noise itself, is the loss of audibility.
