Similar Articles (20 found)
1.
Sentences spoken "clearly" are significantly more intelligible than those spoken "conversationally" for hearing-impaired listeners in a variety of backgrounds [Picheny et al., J. Speech Hear. Res. 28, 96-103 (1985); Uchanski et al., ibid. 39, 494-509 (1996); Payton et al., J. Acoust. Soc. Am. 95, 1581-1592 (1994)]. While producing clear speech, however, talkers often reduce their speaking rate significantly [Picheny et al., J. Speech Hear. Res. 29, 434-446 (1986); Uchanski et al., ibid. 39, 494-509 (1996)]. Yet speaking slowly is not solely responsible for the intelligibility benefit of clear speech (over conversational speech), since a recent study [Krause and Braida, J. Acoust. Soc. Am. 112, 2165-2172 (2002)] showed that talkers can produce clear speech at normal rates with training. This finding suggests that clear speech has inherent acoustic properties, independent of rate, that contribute to improved intelligibility. Identifying these acoustic properties could lead to improved signal processing schemes for hearing aids. To gain insight into these acoustical properties, conversational and clear speech produced at normal speaking rates were analyzed at three levels of detail (global, phonological, and phonetic). Although results suggest that talkers may have employed different strategies to achieve clear speech at normal rates, two global-level properties were identified that appear likely to be linked to the improvements in intelligibility provided by clear/normal speech: increased energy in the 1000-3000-Hz range of long-term spectra and increased modulation depth of low frequency modulations of the intensity envelope. Other phonological and phonetic differences associated with clear/normal speech include changes in (1) frequency of stop burst releases, (2) VOT of word-initial voiceless stop consonants, and (3) short-term vowel spectra.
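The two global-level properties named above are both simple to compute. The sketch below, a minimal illustration rather than the authors' analysis code, measures (a) the fraction of long-term spectral energy in the 1000-3000 Hz band and (b) the peak-to-trough modulation depth of an intensity envelope; the function names and the (max-min)/(max+min) depth definition are illustrative assumptions.

```python
import numpy as np

def band_energy_ratio(signal, fs, lo=1000.0, hi=3000.0):
    """Fraction of long-term spectral energy falling in [lo, hi] Hz."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    band = (freqs >= lo) & (freqs <= hi)
    return power[band].sum() / power.sum()

def modulation_depth(envelope):
    """Peak-to-trough modulation depth m = (max - min) / (max + min)."""
    e_max, e_min = float(np.max(envelope)), float(np.min(envelope))
    return (e_max - e_min) / (e_max + e_min)
```

A clear/normal talker would show a larger `band_energy_ratio` and a larger low-rate `modulation_depth` than the same talker speaking conversationally, on this abstract's account.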

2.
A frequency importance function for continuous discourse
Normal hearing subjects estimated the intelligibility of continuous discourse (CD) passages spoken by three talkers (two male and one female) under 135 conditions of filtering and signal-to-noise ratio. The relationship between the intelligibility of CD and the articulation index (the transfer function) was different from any found in ANSI S3.5-1969. Also, the lower frequencies were found to be relatively more important for the intelligibility of CD than for identification of nonsense syllables and other types of speech for which data are available except for synthetic sentences [Speaks, J. Speech Hear. Res. 10, 289-298 (1967)]. The frequency which divides the auditory spectrum into two equally important halves (the crossover frequency) was found to be about 0.5 oct lower for the CD used in this study than the crossover frequency for male talkers of nonsense syllables found in ANSI S3.5-1969 and about 0.7 oct lower than the one for combined male and female talkers of nonsense syllables reported by French and Steinberg [J. Acoust. Soc. Am. 19, 90-119 (1947)].
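The crossover frequency defined above is just the 50% point of the cumulative band-importance function. A minimal sketch, with made-up band edges and importance values (not the ANSI S3.5 tables), of how it can be located by linear interpolation:

```python
def crossover_frequency(band_edges, importances):
    """Return the frequency (Hz) that splits the band-importance function
    into two equally important halves, interpolating linearly within the
    band containing the 50% point."""
    total = sum(importances)
    cum = 0.0
    for (lo, hi), imp in zip(band_edges, importances):
        if cum + imp >= total / 2.0:
            frac = (total / 2.0 - cum) / imp  # fraction of this band needed
            return lo + frac * (hi - lo)
        cum += imp
    return band_edges[-1][1]
```

A downward shift of this 50% point by 0.5 oct is what the abstract reports for continuous discourse relative to nonsense syllables.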

3.
Listeners' ability to understand speech in adverse listening conditions is partially due to the redundant nature of speech. Natural redundancies are often lost or altered when speech is filtered, such as done in AI/SII experiments. It is important to study how listeners recognize speech when the speech signal is unfiltered and the entire broadband spectrum is present. A correlational method [R. A. Lutfi, J. Acoust. Soc. Am. 97, 1333-1334 (1995); V. M. Richards and S. Zhu, J. Acoust. Soc. Am. 95, 423-424 (1994)] has been used to determine how listeners use spectral cues to perceive nonsense syllables when the full speech spectrum is present [K. A. Doherty and C. W. Turner, J. Acoust. Soc. Am. 100, 3769-3773 (1996); C. W. Turner et al., J. Acoust. Soc. Am. 104, 1580-1585 (1998)]. The experiments in this study measured spectral-weighting strategies for more naturally occurring speech stimuli, specifically sentences, using a correlational method for normal-hearing listeners. Results indicate that listeners placed the greatest weight on spectral information within bands 2 and 5 (562-1113 and 2807-11,000 Hz), respectively. Spectral-weighting strategies for sentences were also compared to weighting strategies for nonsense syllables measured in a previous study (C. W. Turner et al., 1998). Spectral-weighting strategies for sentences were different from those reported for nonsense syllables.
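The correlational method cited above estimates a band's perceptual weight from the trial-by-trial correlation between that band's random level perturbation and the listener's correctness. A minimal sketch (the data layout and function names are illustrative, not taken from the cited studies):

```python
def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spectral_weights(band_perturbations, correct):
    """One correlation per band: per-trial level perturbation vs. 0/1
    correctness. Bands whose perturbations track performance carry the
    greatest perceptual weight."""
    return [pearson(p, correct) for p in band_perturbations]
```

In this scheme a band the listener relies on (like bands 2 and 5 above) shows a large correlation magnitude, while an ignored band's correlation hovers near zero.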

4.
On the interpretability of speech/nonspeech comparisons: a reply to Fowler
Fowler [J. Acoust. Soc. Am. 88, 1236-1249 (1990)] makes a set of claims on the basis of which she denies the general interpretability of experiments that compare the perception of speech sounds to the perception of acoustically analogous nonspeech sounds. She also challenges a specific auditory hypothesis offered by Diehl and Walsh [J. Acoust. Soc. Am. 85, 2154-2164 (1989)] to explain the stimulus-length effect in the perception of stops and glides. It will be argued that her conclusions are unwarranted.

5.
Previous studies [Lisker, J. Acoust. Soc. Am. 57, 1547-1551 (1975); Summerfield and Haggard, J. Acoust. Soc. Am. 62, 435-448 (1977)] have shown that voice onset time (VOT) and the onset frequency of the first formant are important perceptual cues of voicing in syllable-initial plosives. Most prior work, however, has focused on speech perception in quiet environments. The present study seeks to determine which cues are important for the perception of voicing in syllable-initial plosives in the presence of noise. Perceptual experiments were conducted using stimuli consisting of naturally spoken consonant-vowel syllables by four talkers in various levels of additive white Gaussian noise. Plosives sharing the same place of articulation and vowel context (e.g., /pa,ba/) were presented to subjects in two alternate forced choice identification tasks, and a threshold signal-to-noise-ratio (SNR) value (corresponding to the 79% correct classification score) was estimated for each voiced/voiceless pair. The threshold SNR values were then correlated with several acoustic measurements of the speech tokens. Results indicate that the onset frequency of the first formant is critical in perceiving voicing in syllable-initial plosives in additive white Gaussian noise, while the VOT duration is not.
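Estimating the threshold SNR at 79% correct amounts to locating a point on the 2AFC psychometric function. The cited study likely used an adaptive procedure or a parametric fit; the sketch below is a deliberately simple stand-in that interpolates linearly between measured points.

```python
def threshold_snr(snrs_db, p_correct, target=0.79):
    """Linearly interpolate the SNR (dB) at which a 2AFC psychometric
    function crosses `target` proportion correct (79% per the abstract).
    Assumes snrs_db is sorted and p_correct is monotonically increasing."""
    pairs = list(zip(snrs_db, p_correct))
    for (s0, p0), (s1, p1) in zip(pairs, pairs[1:]):
        if p0 <= target <= p1:
            return s0 + (target - p0) * (s1 - s0) / (p1 - p0)
    raise ValueError("target proportion not bracketed by the measurements")
```

One such threshold per voiced/voiceless pair is then correlated against acoustic measurements (F1 onset frequency, VOT duration) across tokens.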

6.
The authors respond to Fitch's comments [H. Fitch, J. Acoust. Soc. Am. 86, 2017-2019 (1989)] on an earlier paper. New analyses are presented to address the question of whether F1 differences observed in the original report are an artifact of linear predictive coding (LPC) analysis techniques. Contrary to Fitch's claims, the results suggest that the F1 differences originally reported are, in fact, due to changes in vocal tract resonance characteristics. It is concluded that there are important acoustic-phonetic differences in speech when talkers speak in noise. These differences reflect changes in both glottal and supraglottal events that are designed to maintain speech intelligibility under adverse conditions.

7.
Pastore [J. Acoust. Soc. Am. 84, 2262-2266 (1988)] has written a lengthy response to Kewley-Port, Watson, and Foyle [J. Acoust. Soc. Am. 83, 1133-1145 (1988)]. In this reply to Pastore's letter, several of his arguments are addressed, and new data are reported which support the conclusion of the original article. That conclusion is, basically, that the temporal acuity of the auditory system does not appear to be the origin of categorical perception of speech or nonspeech sounds differing in temporal onsets.

8.
The extension to the speech intelligibility index (SII; ANSI S3.5-1997) proposed by Rhebergen and Versfeld [Rhebergen, K. S., and Versfeld, N. J. (2005). J. Acoust. Soc. Am. 117(4), 2181-2192] is able to predict for normal-hearing listeners the speech intelligibility in both stationary and fluctuating noise maskers with reasonable accuracy. The extended SII model was validated with speech reception threshold (SRT) data from the literature. However, further validation is required, and the present paper describes SRT experiments with nonstationary noise conditions that are critical to the extended model. From these data, it can be concluded that the extended SII model is able to predict the SRTs for the majority of conditions, but that predictions are better when the extended SII model includes a function to account for forward masking.
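The core idea of the extended model is to evaluate the SII frame by frame within the fluctuating masker and average the result. The sketch below uses the standard band-audibility mapping clip((SNR + 15)/30, 0, 1) from ANSI S3.5-1997; the forward-masking correction the abstract discusses, and everything about framing and level normalization, is omitted here as a simplification.

```python
def extended_sii(frame_snrs_db, band_importance):
    """Average of per-frame SII values over a fluctuating masker.
    frame_snrs_db: list of frames, each a list of per-band SNRs in dB.
    band_importance: per-band importance weights summing to 1."""
    def frame_sii(snrs):
        # band audibility = (SNR + 15) / 30, clipped to [0, 1]
        return sum(w * min(max((s + 15.0) / 30.0, 0.0), 1.0)
                   for w, s in zip(band_importance, snrs))
    return sum(frame_sii(f) for f in frame_snrs_db) / len(frame_snrs_db)
```

During masker dips the per-frame SII rises, so a fluctuating masker yields a higher average SII (and thus a lower predicted SRT) than a stationary masker of equal long-term level.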

9.
An auditory enhancement effect was evaluated in normal and hearing-impaired persons using a paradigm similar to that used by Viemeister and Bacon [J. Acoust. Soc. Am. 71, 1502-1507 (1982)]. Thresholds for a 2000-Hz probe were obtained in two forward-masking conditions: (1) the standard condition in which the masker was a four-component harmonic complex including 2000 Hz, and (2) the enhancing condition in which the same harmonic complex except for the exclusion of the 2000-Hz component preceded the four-component masker. In addition, enhancement for speech was evaluated by asking subjects to identify flat-spectrum harmonic complexes that were preceded by inverse vowel spectra. Finally, suppression effects were evaluated by measuring forward-masked thresholds for a 2000-Hz probe as a function of suppressor frequency added to a 2000-Hz masker. Across all subjects, there was evidence of enhancement and better vowel recognition in those persons who also demonstrated evidence of suppression; however, two of the normal-hearing persons demonstrated reduced enhancement yet normal suppression effects.

10.
These experiments are concerned with the intelligibility of target speech in the presence of a background talker. Using a noise vocoder, Stone and Moore [J. Acoust. Soc. Am. 114, 1023-1034 (2003)] showed that single-channel fast-acting compression degraded intelligibility, but slow compression did not. Stone and Moore [J. Acoust. Soc. Am. 116, 2311-2323 (2004)] showed that intelligibility was lower when fast single-channel compression was applied to the target and background after mixing rather than before, and suggested that this was partly due to compression after mixing introducing "comodulation" between the target and background talkers. Experiment 1 here showed a similar effect for multi-channel compression. In experiment 2, intelligibility was measured as a function of the speed of multi-channel compression applied after mixing. For both eight- and 12-channel vocoders with one compressor per channel, intelligibility decreased as compression speed increased. For the eight-channel vocoder, a compressor that only affected modulation depth for rates below 2 Hz still reduced intelligibility. Experiment 3 used 12- or 18-channel vocoders. There were between 1 and 12 compression channels, and four speeds of compression. Intelligibility decreased as the number and speed of compression channels increased. The results are interpreted using several measures of the effects of compression, especially "across-source modulation correlation."

11.
This study investigates the controversy regarding the influence of age on the acoustic reflex threshold for broadband noise, 500-, 1000-, 2000-, and 4000-Hz activators between Jerger et al. [Mono. Contemp. Audiol. 1 (1978)] and Jerger [J. Acoust. Soc. Am. 66 (1979)] on the one hand and Silman [J. Acoust. Soc. Am. 66 (1979)] and others on the other. The acoustic reflex thresholds for broadband noise, 500-, 1000-, 2000-, and 4000-Hz activators were evaluated under two measurement conditions. Seventy-two normal-hearing ears were drawn from 72 subjects ranging in age from 20-69 years. The results revealed that age was correlated with the acoustic reflex threshold for BBN activator but not for any of the tonal activators; the correlation was stronger under the 1-dB than under the 5-dB measurement condition. Also, the mean acoustic reflex thresholds for broadband noise activator were essentially similar to those reported by Jerger et al. (1978) but differed from those obtained in this study under the 1-dB measurement condition.

12.
The phenomenological framework outlined in the companion paper [C. A. Shera and G. Zweig, J. Acoust. Soc. Am. 92, 1356-1370 (1992)] characterizes both forward and reverse transmission through the middle ear. This paper illustrates its use in the analysis of noninvasive measurements of middle-ear and cochlear mechanics. A cochlear scattering framework is developed for the analysis of combination-tone and other experiments in which acoustic distortion products are used to drive the middle ear "in reverse." The framework is illustrated with a simple psychophysical Gedankenexperiment analogous to the neurophysiological experiments of P. F. Fahey and J. B. Allen [J. Acoust. Soc. Am. 77, 599-612 (1985)].

13.
An extended version of the equalization-cancellation (EC) model of binaural processing is described and applied to speech intelligibility tasks in the presence of multiple maskers. The model incorporates time-varying jitters, both in time and amplitude, and implements the equalization and cancellation operations in each frequency band independently. The model is consistent with the original EC model in predicting tone-detection performance for a large set of configurations. When the model is applied to speech, the speech intelligibility index is used to predict speech intelligibility performance in a variety of conditions. Specific conditions addressed include different types of maskers, different numbers of maskers, and different spatial locations of maskers. Model predictions are compared with empirical measurements reported by Hawley et al. [J. Acoust. Soc. Am. 115, 833-843 (2004)] and by Marrone et al. [J. Acoust. Soc. Am. 124, 1146-1158 (2008)]. The model succeeds in predicting speech intelligibility performance when maskers are speech-shaped noise or broadband-modulated speech-shaped noise but fails when the maskers are speech or reversed speech.
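Within one frequency band, the EC operation equalizes the masker's interaural gain and delay and then subtracts one ear's signal from the other, leaving the target relatively intact. A minimal single-band sketch, which omits the time and amplitude jitters that limit real cancellation performance (and uses an integer-sample delay for simplicity):

```python
import numpy as np

def ec_cancel(left, right, delay_samples, gain):
    """Equalize the right-ear band signal (apply gain and an integer-sample
    delay), then subtract it from the left-ear signal. With perfect
    equalization an interaurally coherent masker cancels exactly."""
    equalized = gain * np.roll(right, delay_samples)
    return left - equalized
```

In the full model, jitter keeps the residual from ever reaching zero, and the residual's SNR per band feeds the speech intelligibility index stage.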

14.
Binaural disparities are the primary acoustic cues employed in sound localization tasks. However, the degree of binaural correlation in a sound serves as a complementary cue for detecting competing sound sources [J. F. Culling, H. S. Colburn, and M. Spurchise, "Interaural correlation sensitivity," J. Acoust. Soc. Am. 110(2), 1020-1029 (2001) and L. R. Bernstein and C. Trahiotis, "On the use of the normalized correlation as an index of interaural envelope correlation," J. Acoust. Soc. Am. 100, 1754-1763 (1996)]. Here a random chord stereogram (RCS) sound is developed that produces a salient pop-out illusion of a slowly varying ripple sound [T. Chi et al., "Spectro-temporal modulation transfer functions and speech intelligibility," J. Acoust. Soc. Am. 106(5), 2719-2732 (1999)], even though the left and right ear sounds alone consist of noise-like random modulations. The quality and resolution of this percept is systematically controlled by adjusting the spectrotemporal correlation pattern between the left and right sounds. The prominence and limited time-frequency resolution for resolving the RCS suggests that envelope correlations are a dominant binaural cue for grouping acoustic objects.

15.
Two signal-processing algorithms, derived from those described by Stubbs and Summerfield [R. J. Stubbs and Q. Summerfield, J. Acoust. Soc. Am. 84, 1236-1249 (1988)], were used to separate the voiced speech of two talkers speaking simultaneously, at similar intensities, in a single channel. Both algorithms use fundamental frequency (F0) as the basis for segregation. One attenuates the interfering voice by filtering the cepstrum of the signal. The other is a hybrid algorithm that combines cepstral filtering with the technique of harmonic selection [T. W. Parsons, J. Acoust. Soc. Am. 60, 911-918 (1976)]. The algorithms were evaluated and compared in perceptual experiments involving listeners with normal hearing and listeners with cochlear hearing impairments. In experiment 1 the processing was used to separate voiced sentences spoken on a monotone. Both algorithms gave significant increases in intelligibility to both groups of listeners. The improvements were equivalent to an increase of 3-4 dB in the effective signal-to-noise ratio (SNR). In experiment 2 the processing was used to separate voiced sentences spoken with time-varying intonation. For normal-hearing listeners, cepstral filtering gave a significant increase in intelligibility, while the hybrid algorithm gave an increase that was on the margins of significance (p = 0.06). The improvements were equivalent to an increase of 2-3 dB in the effective SNR. For impaired listeners, no intelligibility improvements were demonstrated with intoned sentences. The decrease in performance for intoned material is attributed to limitations of the algorithms when F0 is nonstationary.
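Cepstral filtering exploits the fact that a voiced interferer shows up in the cepstrum as peaks (rahmonics) at multiples of its pitch period. A one-frame sketch of the idea, not the Stubbs and Summerfield implementation: zero the cepstral bins around the interferer's rahmonics, then resynthesize with the modified magnitude spectrum and the original phase. The frame length, liftering width, and resynthesis choices here are all illustrative assumptions.

```python
import numpy as np

def cepstral_filter(frame, fs, f0_interferer, half_width=2):
    """Attenuate an interfering voice of known F0 by zeroing the cepstral
    rahmonics at multiples of its pitch period (in samples)."""
    spec = np.fft.fft(frame)
    log_mag = np.log(np.abs(spec) + 1e-12)
    cep = np.fft.ifft(log_mag).real
    n = len(cep)
    period = int(round(fs / f0_interferer))       # pitch period in samples
    for k in range(period, n // 2 + 1, period):   # each rahmonic (and mirror)
        cep[max(0, k - half_width):k + half_width + 1] = 0.0
        cep[max(0, n - k - half_width):n - k + half_width + 1] = 0.0
    new_mag = np.exp(np.fft.fft(cep).real)
    return np.fft.ifft(new_mag * np.exp(1j * np.angle(spec))).real
```

This framewise view also makes the abstract's closing point concrete: if the interferer's F0 drifts within a frame, its rahmonic smears across quefrencies and the liftering removes it only partially.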

16.
Thresholds were measured for the detection of 20-ms sinusoids, with frequencies 500, 4000, or 6500 Hz, presented in bursts of bandpass noise of the same duration and centered around the signal frequency. A range of noise levels from 35 to 80 dB SPL was used. Noise at different center frequencies was equated in terms of the total noise power in an assumed auditory filter centered on the signal frequency. Thresholds were expressed as the signal levels, relative to these noise levels, necessary for subjects to achieve 71% correct. For 500-Hz signals, thresholds were about 5 dB regardless of noise level. For 6500-Hz signals, thresholds reached a maximum of 14 dB at intermediate noise levels of 55-65 dB SPL. For 4000-Hz signals, a maximum threshold of 10 dB was observed for noise levels of 45-55 dB SPL. When the bandpass noises were presented continuously, however, thresholds for 6500-Hz, 20-ms signals remained low (about 1 dB) and constant across level. These results are similar to those obtained for the intensity discrimination of brief tones in bandstop noise [R. P. Carlyon and B. C. J. Moore, J. Acoust. Soc. Am. 76, 1369-1376 (1984); R. P. Carlyon and B. C. J. Moore, J. Acoust. Soc. Am. 79, 453-460 (1986)].

17.
To examine whether auditory streaming contributes to unmasking, intelligibility of target sentences against two competing talkers was measured using the coordinate response measure (CRM) [Bolia et al., J. Acoust. Soc. Am. 107, 1065-1066 (2000)] corpus. In the control condition, the speech reception threshold (50% correct) was measured when the target and two maskers were collocated straight ahead. Separating maskers from the target by +/-30 degrees resulted in spatial release from masking of 12 dB. CRM sentences involve an identifier in the first part and two target words in the second part. In experimental conditions, masking talkers started spatially separated at +/-30 degrees but became collocated with the target before the scoring words. In one experiment, one target and two different maskers were randomly selected from a mixed-sex corpus. Significant unmasking of 4 dB remained despite the absence of persistent location cues. When same-sex talkers were used as maskers and target, unmasking was reduced. These data suggest that initial separation may permit confident identification and streaming of the target and masker speech where significant differences between target and masker voice characteristics exist, but where target and masker characteristics are similar, listeners must rely more heavily on continuing spatial cues.

18.
Recent studies of the relation between loudness and intensity difference limens (DLs) suggest that, if two tones of the same frequency are equally loud, they will have equal relative DLs [R. S. Schlauch and C. C. Wier, J. Speech Hear. Res. 30, 13-20 (1987); J. J. Zwislocki and H. N. Jordan, J. Acoust. Soc. Am. 79, 772-780 (1986)]. To test this hypothesis, loudness matches and intensity DLs for a 1000-Hz pure tone in quiet and in a 40-dB SPL spectrum level broadband noise were obtained for four subjects with normal hearing. The DLs were obtained in both gated- and continuous-pedestal conditions. Contrary to previous reports, equally loud tones do not yield equal relative DLs at several midintensities in the gated condition and at many intensities in the continuous condition. While the equal-loudness, equal-relative-DL hypothesis is not supported by the data, the relation between loudness and intensity discrimination appears to be well described by a model reported by Houtsma et al. [J. Acoust. Soc. Am. 68, 807-813 (1980)].

19.
Because they consist, in large part, of random turbulent noise, fricatives present a challenge to attempts to specify the phonetic correlates of phonological features. Previous research has focused on temporal properties, acoustic power, and a variety of spectral properties of fricatives in a number of contexts [Jongman et al., J. Acoust. Soc. Am. 108, 1252-1263 (2000); Jesus and Shadle, J. Phonet. 30, 437-467 (2002); Crystal and House, J. Acoust. Soc. Am. 83, 1553-1573 (1988a)]. However, no systematic investigation of the effects of focus and prosodic context on fricative production has been carried out. Manipulation of explicit focus can serve to selectively exaggerate linguistically relevant properties of speech in much the same manner as stress [de Jong, J. Acoust. Soc. Am. 97, 491-504 (1995); de Jong, J. Phonet. 32, 493-516 (2004); de Jong and Zawaydeh, J. Phonet. 30, 53-75 (2002)]. This experimental technique was exploited to investigate acoustic power along with temporal and spectral characteristics of American English fricatives in two prosodic contexts, to probe whether native speakers selectively attend to subsegmental features, and to consider variability in fricative production across speakers. While focus in general increased noise power and duration, speakers did not selectively enhance spectral features of the target fricatives.

20.
Models of auditory masking: a molecular psychophysical approach
Gilkey et al. [J. Acoust. Soc. Am. 78, 1207-1219 (1985)] measured hit proportions and false alarm proportions for detecting a 500-Hz tone at each of four starting phase angles in each of 25 reproducible noise samples. In the present paper, their results are modeled by fitting the general form of the electrical analog model of Jeffress [J. Acoust. Soc. Am. 48, 480-488 (1967)] to the diotic data. The best-fitting configurations of this model do not correspond to energy detectors or to envelope detectors. A detector composed of a 50-Hz-wide single-tuned filter, followed by a half-wave rectifier and an integrator with an integration time of 100 to 200 ms fits the data of all four subjects relatively well. Linear combinations of the outputs of several detectors that differ in center frequency or integration window provide even better fits to the data. These linear combinations assign negative weights to some frequencies or to some time intervals, suggesting that a subject's decision is based on a comparison of information in different spectral or temporal portions of the stimulus.
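The winning detector structure described above (narrow filter, half-wave rectifier, integrator) is easy to sketch. Below, an FFT brick-wall filter stands in for the single-tuned filter as a simplification, and the integration window, sampling rate, and function name are illustrative assumptions rather than the fitted model.

```python
import numpy as np

def detector_output(x, fs, fc=500.0, bw=50.0, t_int=0.15):
    """Single-channel detector: narrow band-pass around fc (brick-wall FFT
    filter as a stand-in for a single-tuned filter), half-wave rectification,
    then integration over the final t_int seconds."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec[(freqs < fc - bw / 2) | (freqs > fc + bw / 2)] = 0.0
    filtered = np.fft.irfft(spec, len(x))
    rectified = np.maximum(filtered, 0.0)   # half-wave rectifier
    n = int(t_int * fs)
    return rectified[-n:].sum() / fs        # integrate over the window
```

The paper's better-fitting linear combinations would simply weight several such outputs, at different center frequencies or integration windows, with signed coefficients.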
