Similar Documents
20 similar documents found (search time: 46 ms)
1.
Procedures for enhancing the intelligibility of a target talker in the presence of a co-channel competing talker were evaluated in tests involving (i) continuously voiced sentences spoken on a monotone, (ii) continuously voiced sentences with time-varying intonation, and (iii) noncontinuously voiced sentences produced with natural intonation. The procedures were based on the methods of harmonic selection and cepstral filtering [R.J. Stubbs and Q. Summerfield, J. Acoust. Soc. Am. 87, 359-372 (1990)]. Target and competing voices were combined at signal-to-noise ratios (SNRs) between -10 dB and +10 dB. Subjects were a group with normal hearing and a heterogeneous group with mild-moderate cochlear hearing impairments. Processing enhanced the target voice over a range of SNRs for each type of sentence and for most listeners. Enhancement was greatest at negative SNRs. Among the impaired listeners, benefit was generally greater for those with milder losses. These results consolidate and extend previous demonstrations that voice-separation algorithms that exploit the harmonic structure of the voiced portions of speech can enhance intelligibility. However, practical application of such algorithms depends on a solution to the problem of tracking the fundamental-frequency contour of one voice in the presence of a competing voice.

2.
Spectral weighting strategies using a correlational method [R. A. Lutfi, J. Acoust. Soc. Am. 97, 1333-1334 (1995); V. M. Richards and S. Zhu, J. Acoust. Soc. Am. 95, 423-424 (1994)] were measured in ten listeners with sensorineural hearing loss on a sentence recognition task. Sentences and a spectrally matched noise were filtered into five separate adjacent spectral bands and presented to listeners at various signal-to-noise ratios (SNRs). Five point-biserial correlations were computed between the listeners' response (correct or incorrect) on the task and the SNR in each band. The stronger the correlation between performance and SNR in a band, the more heavily the listener weighted that band. Listeners were tested both with and without their hearing aids; all were experienced hearing aid users. Results indicated that the highest spectral band (approximately 2800-11 000 Hz) received the greatest weight in both listening conditions. However, the weight on the highest spectral band was smaller when listeners performed the task with their hearing aids on than when listening without them. No direct relationship was observed between the listeners' weights and the sensation level within a given band.
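The correlational method above reduces to computing one point-biserial correlation per band. A minimal sketch in Python, assuming per-trial records of correctness and per-band SNRs (the toy data and all names are hypothetical, not from the study):

```python
import numpy as np

def spectral_weights(correct, band_snrs):
    """Point-biserial correlation between trial correctness (0/1) and the
    per-trial SNR in each band; a larger correlation means the listener
    weighted that band more heavily."""
    correct = np.asarray(correct, dtype=float)
    band_snrs = np.asarray(band_snrs, dtype=float)
    # Point-biserial r is simply Pearson r with a dichotomous variable.
    return np.array([np.corrcoef(correct, band_snrs[:, b])[0, 1]
                     for b in range(band_snrs.shape[1])])

# Toy run: 1000 trials, 5 bands; only band 5 drives performance.
rng = np.random.default_rng(0)
snrs = rng.uniform(-12.0, 0.0, size=(1000, 5))
p_correct = 1.0 / (1.0 + np.exp(-(snrs[:, 4] + 6.0)))  # logistic in band-5 SNR
correct = rng.random(1000) < p_correct
print(spectral_weights(correct, snrs).round(2))        # band 5 dominates
```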

3.
Two signal-processing algorithms, designed to separate the voiced speech of two talkers speaking simultaneously at similar intensities in a single channel, were compared and evaluated. Both algorithms exploit the harmonic structure of voiced speech and require a difference in fundamental frequency (F0) between the voices to operate successfully. One attenuates the interfering voice by filtering the cepstrum of the combined signal. The other uses the method of harmonic selection [T. W. Parsons, J. Acoust. Soc. Am. 60, 911-918 (1976)] to resynthesize the target voice from fragmentary spectral information. Two perceptual evaluations were carried out. One involved the separation of pairs of vowels synthesized on static F0's; the other involved the recovery of consonant-vowel (CV) words masked by a synthesized vowel. Normal-hearing listeners and four listeners with moderate-to-severe, bilateral, symmetrical, sensorineural hearing impairments were tested. All listeners showed increased accuracy of identification when the target voice was enhanced by processing. The vowel-identification data show that intelligibility enhancement is possible over a range of F0 separations between the target and interfering voice. The recovery of CV words demonstrates that the processing is valid not only for spectrally static vowels but also for less intense time-varying voiced consonants. The results for the impaired listeners suggest that the algorithms may be applicable as components of a noise-reduction system in future digital signal-processing hearing aids. The vowel-separation test, and subjective listening, suggest that harmonic selection, which is the more computationally expensive method, produces the more effective voice separation.
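The cepstral-filtering idea lends itself to a compact sketch: notch the rahmonics of the interfering voice's F0 in the real cepstrum of a frame's log-magnitude spectrum, then resynthesize with the original phase. This is a single-frame illustration under the assumption that the interferer's F0 is known; frame-by-frame overlap-add processing, F0 tracking, and the harmonic-selection alternative are outside its scope:

```python
import numpy as np

def cepstral_attenuate(frame, fs, f0_interf, notch_halfwidth=4):
    """Attenuate an interfering voice of known F0 by zeroing its
    rahmonics in the real cepstrum of one analysis frame (a sketch,
    not the published algorithm's exact liftering rule)."""
    n = len(frame)
    spec = np.fft.fft(frame)
    log_mag = np.log(np.abs(spec) + 1e-12)
    ceps = np.fft.ifft(log_mag).real          # real cepstrum (log_mag is even)
    q0 = int(round(fs / f0_interf))           # interferer quefrency in samples
    for k in range(q0, n // 2, q0):           # zero each rahmonic + its mirror
        lo, hi = max(k - notch_halfwidth, 1), k + notch_halfwidth + 1
        ceps[lo:hi] = 0.0
        ceps[n - hi + 1:n - lo + 1] = 0.0
    new_mag = np.exp(np.fft.fft(ceps).real)   # back to a liftered magnitude
    return np.fft.ifft(new_mag * np.exp(1j * np.angle(spec))).real
```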

4.
Previous studies [Lisker, J. Acoust. Soc. Am. 57, 1547-1551 (1975); Summerfield and Haggard, J. Acoust. Soc. Am. 62, 435-448 (1977)] have shown that voice onset time (VOT) and the onset frequency of the first formant are important perceptual cues of voicing in syllable-initial plosives. Most prior work, however, has focused on speech perception in quiet environments. The present study seeks to determine which cues are important for the perception of voicing in syllable-initial plosives in the presence of noise. Perceptual experiments were conducted using consonant-vowel syllables spoken naturally by four talkers and presented in various levels of additive white Gaussian noise. Plosives sharing the same place of articulation and vowel context (e.g., /pa,ba/) were presented to subjects in two-alternative forced-choice identification tasks, and a threshold signal-to-noise ratio (SNR) value (corresponding to the 79% correct classification score) was estimated for each voiced/voiceless pair. The threshold SNR values were then correlated with several acoustic measurements of the speech tokens. Results indicate that the onset frequency of the first formant is critical for perceiving voicing in syllable-initial plosives in additive white Gaussian noise, whereas VOT duration is not.
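A hedged sketch of how such a threshold SNR might be estimated: fit a logistic psychometric function to 2AFC percent-correct scores (chance = 50%) and invert it at the target score. The fitting details and toy data are assumptions, not taken from the paper:

```python
import numpy as np
from scipy.optimize import curve_fit

def threshold_snr(snrs, pct_correct, target=79.0):
    """Fit a 50%-chance logistic psychometric function and return the
    SNR at the target percent-correct score."""
    def psy(snr, mid, slope):
        return 50.0 + 50.0 / (1.0 + np.exp(-(snr - mid) / slope))
    (mid, slope), _ = curve_fit(psy, snrs, pct_correct, p0=(-5.0, 2.0))
    # Invert the fitted function at the target score.
    return mid - slope * np.log(50.0 / (target - 50.0) - 1.0)

snrs = np.array([-15.0, -10.0, -5.0, 0.0, 5.0])
scores = np.array([52.0, 60.0, 75.0, 92.0, 99.0])
print(f"SNR at 79% correct ~ {threshold_snr(snrs, scores):.1f} dB")
```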

5.
A frequency importance function for continuous discourse
Normal-hearing subjects estimated the intelligibility of continuous discourse (CD) passages spoken by three talkers (two male and one female) under 135 conditions of filtering and signal-to-noise ratio. The relationship between the intelligibility of CD and the articulation index (the transfer function) was different from any found in ANSI S3.5-1969. Also, the lower frequencies were found to be relatively more important for the intelligibility of CD than for identification of nonsense syllables and other types of speech for which data are available, except for synthetic sentences [Speaks, J. Speech Hear. Res. 10, 289-298 (1967)]. The frequency which divides the auditory spectrum into two equally important halves (the crossover frequency) was found to be about 0.5 oct lower for the CD used in this study than the crossover frequency for male talkers of nonsense syllables found in ANSI S3.5-1969 and about 0.7 oct lower than the one for combined male and female talkers of nonsense syllables reported by French and Steinberg [J. Acoust. Soc. Am. 19, 90-119 (1947)].
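The crossover frequency mentioned above is simply the point where the cumulative frequency-importance function reaches 0.5. A small worked sketch with hypothetical band importances (not the study's values):

```python
import numpy as np

# Hypothetical band-importance table: band center frequencies (Hz) and weights.
centers = np.array([250.0, 500.0, 1000.0, 2000.0, 4000.0, 8000.0])
weights = np.array([0.10, 0.20, 0.25, 0.22, 0.15, 0.08])  # sums to 1

cum = np.cumsum(weights)
crossover = np.interp(0.5, cum, centers)  # frequency splitting importance 50/50
print(f"crossover ~ {crossover:.0f} Hz")

# A 0.5-octave downward shift, as reported above, divides it by 2**0.5.
print(f"0.5 oct lower: {crossover / 2**0.5:.0f} Hz")
```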

6.
Sentences spoken "clearly" are significantly more intelligible than those spoken "conversationally" for hearing-impaired listeners in a variety of backgrounds [Picheny et al., J. Speech Hear. Res. 28, 96-103 (1985); Uchanski et al., ibid. 39, 494-509 (1996); Payton et al., J. Acoust. Soc. Am. 95, 1581-1592 (1994)]. While producing clear speech, however, talkers often reduce their speaking rate significantly [Picheny et al., J. Speech Hear. Res. 29, 434-446 (1986); Uchanski et al., ibid. 39, 494-509 (1996)]. Yet speaking slowly is not solely responsible for the intelligibility benefit of clear speech (over conversational speech), since a recent study [Krause and Braida, J. Acoust. Soc. Am. 112, 2165-2172 (2002)] showed that talkers can produce clear speech at normal rates with training. This finding suggests that clear speech has inherent acoustic properties, independent of rate, that contribute to improved intelligibility. Identifying these acoustic properties could lead to improved signal processing schemes for hearing aids. To gain insight into these acoustical properties, conversational and clear speech produced at normal speaking rates were analyzed at three levels of detail (global, phonological, and phonetic). Although results suggest that talkers may have employed different strategies to achieve clear speech at normal rates, two global-level properties were identified that appear likely to be linked to the improvements in intelligibility provided by clear/normal speech: increased energy in the 1000-3000-Hz range of long-term spectra and increased modulation depth of low-frequency modulations of the intensity envelope. Other phonological and phonetic differences associated with clear/normal speech include changes in (1) frequency of stop burst releases, (2) VOT of word-initial voiceless stop consonants, and (3) short-term vowel spectra.
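The two global-level properties can be measured directly from a recording. A rough sketch, assuming a single-channel waveform; the 8-Hz envelope cutoff and the max/min depth measure are simplifying assumptions rather than the paper's exact analysis:

```python
import numpy as np
from scipy.signal import welch, hilbert, butter, filtfilt

def clear_speech_metrics(x, fs):
    """(1) Share of long-term spectral energy in 1-3 kHz; (2) depth of
    low-frequency modulations of the intensity envelope."""
    f, psd = welch(x, fs, nperseg=2048)
    band = (f >= 1000.0) & (f <= 3000.0)
    energy_share = psd[band].sum() / psd.sum()

    env = np.abs(hilbert(x))                        # intensity envelope
    b, a = butter(4, 8.0 / (fs / 2), btype="low")   # keep slow modulations
    env_lf = filtfilt(b, a, env)
    depth = (env_lf.max() - env_lf.min()) / (env_lf.max() + env_lf.min())
    return energy_share, depth

# Demo on a synthetic tone with a 4-Hz amplitude modulation.
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 2000 * t) * (1 + 0.5 * np.sin(2 * np.pi * 4 * t))
print(clear_speech_metrics(x, fs))
```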

7.
Noise and distortion reduce speech intelligibility and quality in audio devices such as hearing aids. This study investigates the perception and prediction of sound quality by both normal-hearing and hearing-impaired subjects for conditions of noise and distortion related to those found in hearing aids. Stimuli were sentences subjected to three kinds of distortion (additive noise, peak clipping, and center clipping), with eight levels of degradation for each distortion type. The subjects performed paired comparisons for all possible pairs of 24 conditions. A one-dimensional coherence-based metric was used to analyze the quality judgments. This metric was an extension of a speech intelligibility metric presented in Kates and Arehart [J. Acoust. Soc. Am. 117, 2224-2237 (2005)] and is based on dividing the speech signal into three amplitude regions, computing the coherence for each region, and then combining the three coherence values across frequency in a calculation based on the speech intelligibility index. The one-dimensional metric accurately predicted the quality judgments of normal-hearing listeners and listeners with mild-to-moderate hearing loss, although some systematic errors were present. A multidimensional analysis indicates that several dimensions are needed to describe the factors used by subjects to judge the effects of the three distortion types.
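A sketch of the three-region idea: sort frames into low/mid/high thirds by clean-signal level and compute magnitude-squared coherence between clean and degraded signals within each region. The actual metric weights frequencies per the speech intelligibility index; equal weighting here is a deliberate simplification:

```python
import numpy as np
from scipy.signal import coherence

def three_level_coherence(clean, degraded, fs, frame=512):
    """Mean magnitude-squared coherence in each of three amplitude
    regions (frames grouped by clean-frame RMS), then averaged."""
    n = (len(clean) // frame) * frame
    c = clean[:n].reshape(-1, frame)
    d = degraded[:n].reshape(-1, frame)
    rms = np.sqrt((c ** 2).mean(axis=1))
    regions = np.array_split(np.argsort(rms), 3)   # low, mid, high thirds
    vals = []
    for idx in regions:
        _, msc = coherence(c[idx].ravel(), d[idx].ravel(), fs, nperseg=frame)
        vals.append(msc.mean())
    return float(np.mean(vals))
```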

8.
Listeners' ability to understand speech in adverse listening conditions is partially due to the redundant nature of speech. Natural redundancies are often lost or altered when speech is filtered, as is done in AI/SII experiments. It is important to study how listeners recognize speech when the speech signal is unfiltered and the entire broadband spectrum is present. A correlational method [R. A. Lutfi, J. Acoust. Soc. Am. 97, 1333-1334 (1995); V. M. Richards and S. Zhu, J. Acoust. Soc. Am. 95, 423-424 (1994)] has been used to determine how listeners use spectral cues to perceive nonsense syllables when the full speech spectrum is present [K. A. Doherty and C. W. Turner, J. Acoust. Soc. Am. 100, 3769-3773 (1996); C. W. Turner et al., J. Acoust. Soc. Am. 104, 1580-1585 (1998)]. The experiments in this study measured spectral-weighting strategies of normal-hearing listeners for more naturally occurring speech stimuli, specifically sentences, using a correlational method. Results indicate that listeners placed the greatest weight on spectral information within bands 2 and 5 (562-1113 and 2807-11 000 Hz, respectively). Spectral-weighting strategies for sentences were also compared to weighting strategies for nonsense syllables measured in a previous study (C. W. Turner et al., 1998). Spectral-weighting strategies for sentences were different from those reported for nonsense syllables.

9.
The "cocktail party problem" was studied using virtual stimuli whose spatial locations were generated using anechoic head-related impulse responses from the AUDIS database [Blauert et al., J. Acoust. Soc. Am. 103, 3082 (1998)]. Speech reception thresholds (SRTs) were measured for Harvard IEEE sentences presented from the front in the presence of one, two, or three interfering sources. Four types of interferer were used: (1) other sentences spoken by the same talker, (2) time-reversed sentences of the same talker, (3) speech-spectrum shaped noise, and (4) speech-spectrum shaped noise, modulated by the temporal envelope of the sentences. Each interferer was matched to the spectrum of the target talker. Interferers were placed in several spatial configurations, either coincident with or separated from the target. Binaural advantage was derived by subtracting SRTs from listening with the "better monaural ear" from those for binaural listening. For a single interferer, there was a binaural advantage of 2-4 dB for all interferer types. For two or three interferers, the advantage was 2-4 dB for noise and speech-modulated noise, and 6-7 dB for speech and time-reversed speech. These data suggest that the benefit of binaural hearing for speech intelligibility is especially pronounced when there are multiple voiced interferers at different locations from the target, regardless of spatial configuration; measurements with fewer or with other types of interferers can underestimate this benefit.  相似文献   

10.
A Speech Intelligibility Index (SII) for the sentences in the Cantonese version of the Hearing In Noise Test (CHINT) was derived using conventional procedures described previously in studies such as Studebaker and Sherbecoe [J. Speech Hear. Res. 34, 427-438 (1991)]. Two studies were conducted to determine the signal-to-noise ratios and high- and low-pass filtering conditions that should be used and to measure speech intelligibility in these conditions. Normal-hearing subjects listened to the sentences presented in speech-spectrum shaped noise. Compared to other English speech assessment materials such as the English Hearing In Noise Test [Nilsson et al., J. Acoust. Soc. Am. 95, 1085-1099 (1994)], the frequency importance function of the CHINT suggests that low-frequency information is more important for Cantonese speech understanding. The difference in frequency importance weight in Chinese, compared to English, was attributed to the redundancy of the test material, the tonal nature of the Cantonese language, or a combination of these factors.

11.
Speech recognition in noise improves with combined acoustic and electric stimulation compared to electric stimulation alone [Kong et al., J. Acoust. Soc. Am. 117, 1351-1361 (2005)]. Here the contribution of fundamental frequency (F0) and low-frequency phonetic cues to speech recognition in combined hearing was investigated. Normal-hearing listeners heard vocoded speech in one ear and low-pass (LP) filtered speech in the other. Three listening conditions (vocode-alone, LP-alone, combined) were tested. Target speech (average F0=120 Hz) was mixed with a time-reversed masker (average F0=172 Hz) at three signal-to-noise ratios (SNRs). LP speech aided performance at all SNRs. Low-frequency phonetic cues were then removed by replacing the LP speech with an LP equal-amplitude harmonic complex, frequency and amplitude modulated by the F0 and temporal envelope of voiced segments of the target. The combined hearing advantage disappeared at 10 and 15 dB SNR, but persisted at 5 dB SNR. A similar finding occurred when, additionally, F0 contour cues were removed. These results are consistent with a role for low-frequency phonetic cues, but not with a combination of F0 information between the two ears. The enhanced performance at 5 dB SNR with F0 contour cues absent suggests that voicing or glimpsing cues may be responsible for the combined hearing benefit.
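The control stimulus described above (an equal-amplitude harmonic complex carrying only F0 and envelope information) can be sketched as follows; the 500-Hz upper limit and the use of the mean F0 to fix the harmonic count are assumptions:

```python
import numpy as np

def modulated_harmonic_complex(f0_track, env, fs, fmax=500.0):
    """Equal-amplitude harmonic complex whose frequency follows f0_track
    and whose amplitude follows env (both per-sample arrays)."""
    phase = 2 * np.pi * np.cumsum(f0_track) / fs        # running F0 phase
    n_harm = max(int(fmax / max(f0_track.mean(), 1.0)), 1)
    x = sum(np.sin(k * phase) for k in range(1, n_harm + 1))
    return env * x / n_harm

# Toy F0 track and envelope (the study used values extracted from speech).
fs = 16000
t = np.arange(fs) / fs
f0 = 120.0 + 10.0 * np.sin(2 * np.pi * t)           # F0 around 120 Hz
env = 0.5 * (1 + np.sin(2 * np.pi * 3.0 * t))       # 3-Hz toy envelope
y = modulated_harmonic_complex(f0, env, fs)
```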

12.
This study considered consequences of sensorineural hearing loss in ten listeners. The characterization of individual hearing loss was based on psychoacoustic data addressing audiometric pure-tone sensitivity, cochlear compression, frequency selectivity, temporal resolution, and intensity discrimination. In the experiments it was found that listeners with comparable audiograms can show very different results in the supra-threshold measures. In an attempt to account for the observed individual data, a model of auditory signal processing and perception [Jepsen et al., J. Acoust. Soc. Am. 124, 422-438 (2008)] was used as a framework. The parameters of the cochlear processing stage of the model were adjusted to account for behaviorally estimated individual basilar-membrane input-output functions and the audiogram, from which the amounts of inner hair-cell and outer hair-cell losses were estimated as a function of frequency. All other model parameters were left unchanged. The predictions showed a reasonably good agreement with the measured individual data in the frequency selectivity and forward masking conditions while the variation of intensity discrimination thresholds across listeners was underestimated by the model. The model and the associated parameters for individual hearing-impaired listeners might be useful for investigating effects of individual hearing impairment in more complex conditions, such as speech intelligibility in noise.
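At its core, the parameter-fitting step apportions each frequency's audiometric loss between outer and inner hair cells. A deliberately crude, illustrative sketch (the cited model derives this per frequency from basilar-membrane input-output fits; the 55-dB cap is an assumed maximum cochlear-gain loss, not a value from the paper):

```python
def split_hearing_loss(total_hl_db, gain_loss_db, max_ohc_db=55.0):
    """Attribute to OHCs the part of the audiometric loss explained by
    the estimated loss of cochlear gain (capped), and the remainder to IHCs."""
    ohc = min(gain_loss_db, max_ohc_db, total_hl_db)
    return ohc, total_hl_db - ohc

print(split_hearing_loss(60.0, 40.0))  # (40.0, 20.0): 40 dB OHC + 20 dB IHC
```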

13.
Wojtczak and Viemeister [J. Acoust. Soc. Am. 106, 1917-1924 (1999)] demonstrated a close relationship between intensity difference limens (DLs) and 4-Hz amplitude modulation (AM) detection thresholds in normal-hearing acoustic listeners. The present study demonstrates a similar relationship between intensity DLs and AM detection thresholds in cochlear-implant listeners, for gated stimuli. This suggests that acoustic and cochlear-implant listeners make use of a similar decision variable to perform intensity discrimination and modulation detection tasks. It can be shown that the absence of compression in electric hearing does not preclude this possibility.

14.
The Articulation Index and Speech Intelligibility Index predict intelligibility scores from measurements of speech and hearing parameters. One component in the prediction is the frequency-importance function, a weighting function that characterizes contributions of particular spectral regions of speech to speech intelligibility. The purpose of this study was to determine whether such importance functions could similarly characterize contributions of electrode channels in cochlear implant systems. Thirty-eight subjects with normal hearing listened to vowel-consonant-vowel tokens, either as recorded or as output from vocoders that simulated aspects of cochlear implant processing. Importance functions were measured using the method of Whitmal and DeRoy [J. Acoust. Soc. Am. 130, 4032-4043 (2011)], in which signal bandwidths were varied adaptively to produce specified token recognition scores in accordance with the transformed up-down rules of Levitt [J. Acoust. Soc. Am. 49, 467-477 (1971)]. Psychometric functions constructed from recognition scores were subsequently converted into importance functions. Comparisons of the resulting importance functions indicate that vocoder processing causes peak importance regions to shift downward in frequency. This shift is attributed to changes in strategy and capability for detecting voicing in speech, and is consistent with previously measured data.
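One of Levitt's transformed up-down rules, sketched as a reusable staircase; here `run_trial` would present a token at the given signal bandwidth and score the response (all names and the simulated listener are illustrative, not the study's procedure):

```python
import numpy as np

def two_down_one_up(run_trial, start, step, n_reversals=8):
    """2-down/1-up track, converging near the 70.7%-correct point."""
    level, streak, last_dir = start, 0, 0
    reversals = []
    while len(reversals) < n_reversals:
        if run_trial(level):
            streak += 1
            if streak < 2:
                continue                    # need two correct to step down
            streak, direction = 0, -1
        else:
            streak, direction = 0, +1       # one error steps up
        if last_dir and direction != last_dir:
            reversals.append(level)
        last_dir = direction
        level += direction * step
    return np.mean(reversals[2:])           # discard the first two reversals

# Simulated listener whose performance improves with level.
rng = np.random.default_rng(2)
listener = lambda level: rng.random() < 1 / (1 + np.exp(-(level - 5) / 2))
print(two_down_one_up(listener, start=15.0, step=1.0))
```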

15.
Simultaneous-masked psychophysical tuning curves (PTCs) were obtained from normal-hearing and sensorineural hearing-impaired listeners. The 20-ms signal was presented at the onset or at the temporal center of the 400-ms masker. For the normal-hearing listeners, as shown previously [S. P. Bacon and B. C. J. Moore, J. Acoust. Soc. Am. 80, 1638-1645 (1986)], the PTCs were sharper on the high-frequency side for a signal in the temporal center of the masker. For the hearing-impaired listeners, however, the shape of the PTC was virtually independent of the temporal position of the signal. These data suggest that the mechanisms responsible for sharpening the PTC with time in normal-hearing listeners are ineffective in listeners with moderate-to-severe sensorineural hearing loss.

16.
Overlap-masking degrades speech intelligibility in reverberation [R. H. Bolt and A. D. MacDonald, J. Acoust. Soc. Am. 21(6), 577-580 (1949)]. To reduce the effect of this degradation, steady-state suppression has been proposed as a preprocessing technique [Arai et al., Proc. Autumn Meet. Acoust. Soc. Jpn., 2001; Acoust. Sci. Tech. 23(8), 229-232 (2002)]. This technique automatically suppresses steady-state portions of speech, which carry more energy but are less crucial for speech perception. The present paper explores the effect of steady-state suppression on identification of syllables preceded by /a/ under various reverberant conditions. In each of two perception experiments, stimuli were presented to 22 subjects with normal hearing. The stimuli consisted of monosyllables in a carrier phrase, with and without steady-state suppression, and were presented under different reverberant conditions using artificial impulse responses. The results indicate that steady-state suppression statistically improves consonant identification for reverberation times of 0.7 to 1.2 s. Analysis of confusion matrices shows that identification of voiced consonants, stop and nasal consonants, and bilabial, alveolar, and velar consonants was especially improved by steady-state suppression. Steady-state suppression is thus demonstrated to be an effective preprocessing method for improving syllable identification by reducing the effect of overlap-masking under specific reverberant conditions.
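A rough sketch of steady-state suppression: frames whose short-time log-spectrum changes little from the previous frame are treated as steady state and attenuated. The change threshold and the -10 dB attenuation are assumptions; the cited work defines its own detection rule:

```python
import numpy as np

def steady_state_suppress(x, frame=512, hop=256, atten_db=-10.0, thresh=0.2):
    """Attenuate spectrally stable (steady-state) frames; 50%-overlap
    Hann windows are overlap-added back together."""
    win = np.hanning(frame)
    y = np.zeros(len(x))
    prev, g_low = None, 10 ** (atten_db / 20)
    for s in range(0, len(x) - frame, hop):
        seg = x[s:s + frame] * win
        spec = np.log(np.abs(np.fft.rfft(seg)) + 1e-12)
        g = 1.0
        if prev is not None and np.mean(np.abs(spec - prev)) < thresh:
            g = g_low                       # steady state -> suppress
        prev = spec
        y[s:s + frame] += g * seg           # overlap-add
    return y
```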

17.
Reverberation usually degrades speech intelligibility for spatially separated speech and noise sources since spatial unmasking is reduced and late reflections decrease the fidelity of the received speech signal. The latter effect could not satisfactorily be predicted by a recently presented binaural speech intelligibility model [Beutelmann et al., J. Acoust. Soc. Am. 127, 2479-2497 (2010)]. This study therefore evaluated three extensions of the model to improve its predictions: (1) an extension of the speech intelligibility index based on modulation transfer functions, (2) a correction factor based on the room acoustical quantity "definition," and (3) a separation of the speech signal into useful and detrimental parts. The predictions were compared to results of two experiments in which speech reception thresholds were measured in a reverberant room in quiet and in the presence of a noise source for listeners with normal hearing. All extensions yielded better predictions than the original model when the influence of reverberation was strong, while predictions were similar for conditions with less reverberation. Although model (3) differed substantially in the assumed interaction of binaural processing and early reflections, its predictions were very similar to those of model (2), which achieved the best fit to the data.
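The room-acoustical quantity "definition" (D50) used in extension (2) is the fraction of impulse-response energy arriving within the first 50 ms. A small sketch with a synthetic impulse response:

```python
import numpy as np

def definition_d50(ir, fs):
    """D50: early (first 50 ms) energy over total energy of an impulse response."""
    e = np.asarray(ir, dtype=float) ** 2
    return e[:int(0.05 * fs)].sum() / e.sum()

# Synthetic exponentially decaying impulse response at 16 kHz.
fs = 16000
t = np.arange(fs) / fs
rng = np.random.default_rng(1)
ir = rng.standard_normal(len(t)) * np.exp(-3.0 * t)
print(f"D50 = {definition_d50(ir, fs):.2f}")
```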

18.
The extension to the speech intelligibility index (SII; ANSI S3.5-1997) proposed by Rhebergen and Versfeld [J. Acoust. Soc. Am. 117(4), 2181-2192 (2005)] is able to predict speech intelligibility for normal-hearing listeners in both stationary and fluctuating noise maskers with reasonable accuracy. The extended SII model was validated with speech reception threshold (SRT) data from the literature. However, further validation is required, and the present paper describes SRT experiments with nonstationary noise conditions that are critical tests of the extended model. From these data, it can be concluded that the extended SII model is able to predict the SRTs for the majority of conditions, but that predictions are better when the extended SII model includes a function to account for forward masking.

19.
The benefit of supplementing speechreading with information about the frequencies of the first and second formants from the voiced sections of the speech signal was studied by presenting short sentences to 18 normal-hearing listeners under three conditions: (a) speechreading combined with listening to the formant-frequency information, (b) speechreading only, and (c) formant-frequency information only. The formant frequencies were presented either as pure tones or as a complex speechlike signal, obtained by filtering a periodic pulse sequence of 250 Hz through a cascade of four second-order bandpass filters (with constant bandwidth); the center frequencies of two of these filters followed the frequencies of the first and second formants, whereas the frequencies of the others remained constant. The percentage of correctly identified syllables increased from 22.8% for speechreading only to 82.0% for speechreading while listening to the complex speechlike signal. Listening to the formant information only yielded 33.2% correct. However, comparison with the best-scoring condition of our previous study [Breeuwer and Plomp, J. Acoust. Soc. Am. 76, 686-691 (1984)] indicates that information about the sound-pressure levels in two one-octave filter bands with center frequencies of 500 and 3160 Hz is a more effective supplement to speechreading than the formant-frequency information.
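The speechlike carrier can be approximated as below: a 250-Hz pulse train passed through four second-order bandpass sections, two following the F1/F2 tracks and two fixed. Framewise coefficient updates, the 100-Hz bandwidth, and the fixed center frequencies are all assumptions standing in for the original continuous tracking:

```python
import numpy as np
from scipy.signal import butter, sosfilt

def formant_carrier(f1_track, f2_track, fs, bw=100.0, fixed=(3000.0, 4500.0)):
    """Filter a 250-Hz pulse train through a cascade of four second-order
    bandpass filters; two track F1/F2 (updated every 10 ms), two are fixed."""
    n = len(f1_track)
    pulses = np.zeros(n)
    pulses[::int(fs / 250)] = 1.0                 # 250-Hz pulse train
    out = np.zeros(n)
    hop = int(0.01 * fs)                          # 10-ms update frames
    for s in range(0, n, hop):
        seg = pulses[s:s + hop]
        for fc in (f1_track[s], f2_track[s], *fixed):
            band = [max(fc - bw / 2, 10.0), fc + bw / 2]
            sos = butter(1, band, btype="band", fs=fs, output="sos")
            seg = sosfilt(sos, seg)
        out[s:s + hop] = seg
    return out
```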

20.
In a recent study [S. Gordon-Salant, J. Acoust. Soc. Am. 80, 1599-1607 (1986)], young and elderly normal-hearing listeners demonstrated significant improvements in consonant-vowel (CV) recognition with acoustic modification of the speech signal incorporating increments in the consonant-vowel ratio (CVR). Acoustic modification of consonant duration failed to enhance performance. The present study investigated whether consonant recognition deficits of elderly hearing-impaired listeners would be reduced by these acoustic modifications, as well as by increases in speech level. Performance of elderly hearing-impaired listeners with gradually sloping and sharply sloping sensorineural hearing losses was compared to the previously reported performance of elderly normal-threshold listeners for recognition of a variety of nonsense syllable stimuli. These stimuli included unmodified CVs, CVs with increases in CVR, CVs with increases in consonant duration, and CVs with increases in both CVR and consonant duration. Stimuli were presented at each of two speech levels in a background of noise. Results obtained from the hearing-impaired listeners agreed with those observed previously for normal-hearing listeners. Differences in performance among the three subject groups as a function of level were also observed.
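The CVR manipulation amounts to rescaling the consonant segment relative to the vowel. A minimal sketch, assuming a hand-marked consonant/vowel boundary (as in such studies) and an example 10-dB increment:

```python
import numpy as np

def boost_cvr(cv, fs, boundary_s, cvr_increase_db=10.0):
    """Amplify the consonant portion of a CV token, then renormalize the
    overall RMS so that only the consonant-vowel ratio changes."""
    b = int(boundary_s * fs)
    out = np.asarray(cv, dtype=float).copy()
    out[:b] *= 10 ** (cvr_increase_db / 20)       # boost consonant segment
    out *= np.sqrt((np.asarray(cv, dtype=float) ** 2).mean() / (out ** 2).mean())
    return out
```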
