
1.

Intelligibility of bandpass filtered speech: steepness of slopes required to eliminate transition band contributions (Times cited: 2; self-citations: 0; citations by others: 2)

Warren RM, Bashford JA, Lenz PW. The Journal of the Acoustical Society of America, 2004, 115(3):1292-1295

Despite the recognition that the steepness of filter slopes can play an important role in the intelligibility of bandpass speech, there has been no systematic examination of its importance. The present study used high orders of finite impulse response (FIR) filtering to produce slopes ranging from 150 to 10,000 dB/octave. The slopes flanked 1/3-octave passbands of everyday sentences having a center frequency of 1500 Hz (the region of highest intelligibility for the male speaker's voice). Presentation levels were approximately 75 and 45 dB. No significant differences were found for the two presentation levels. Average intelligibility scores ranged from 77% at 150 dB/octave down to the asymptotic intelligibility score of 12% at 4800 dB/octave. These results indicate that slopes of several thousand dB/octave may be required for accurate and unambiguous specification of the range of frequencies contributing to intelligibility of filtered speech. In addition, the extremely steep slopes are needed to ensure that none of the spectral components contributing to intelligibility has its relative importance diminished by spectral tilt.
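To make the dB/octave figures concrete, the sketch below computes the edges of a 1/3-octave band centred on 1500 Hz and the attenuation an idealised straight-line skirt would apply outside it. Two assumptions are made that the abstract does not state: the band is centred geometrically (edges at fc·2^(±1/6)), and the skirts are perfectly linear in dB per octave (real high-order FIR responses have ripple and a finite stopband floor). The helper names are hypothetical.

```python
import math


def third_octave_edges(fc_hz):
    """Edges of a 1/3-octave band geometrically centred on fc_hz
    (assumed convention): fc * 2**(-1/6) and fc * 2**(+1/6)."""
    half_width = 2 ** (1 / 6)
    return fc_hz / half_width, fc_hz * half_width


def skirt_attenuation_db(f_hz, lower_hz, upper_hz, slope_db_per_octave):
    """Idealised skirt: 0 dB inside the passband, then attenuation that
    grows linearly with distance (in octaves) from the nearer edge."""
    if lower_hz <= f_hz <= upper_hz:
        return 0.0
    edge = lower_hz if f_hz < lower_hz else upper_hz
    octaves_away = abs(math.log2(f_hz / edge))
    return slope_db_per_octave * octaves_away


if __name__ == "__main__":
    lo, hi = third_octave_edges(1500.0)
    print(f"passband edges: {lo:.1f} Hz to {hi:.1f} Hz")
    # A component 1/10 octave above the upper edge, for two of the slopes tested.
    probe = hi * 2 ** 0.1
    for slope in (150, 4800):
        print(slope, "dB/oct ->", round(skirt_attenuation_db(probe, lo, hi, slope), 1), "dB")
```

Under this idealisation a component one tenth of an octave beyond the edge is only 15 dB down at 150 dB/octave but 480 dB down at 4800 dB/octave, which illustrates why shallow skirts can leave audible transition-band energy.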

2.

Effect of spectral frequency range and separation on the perception of asynchronous speech

Healy EW, Bacon SP. The Journal of the Acoustical Society of America, 2007, 121(3):1691-1700

The use of across-frequency timing cues and the effect of disrupting these cues were examined across the frequency spectrum by introducing between-band asynchronies to pairs of narrowband temporal speech patterns. Sentence intelligibility by normal-hearing listeners fell when as little as 12.5 ms of asynchrony was introduced and was reduced to floor values by 100 ms. Disruptions to across-frequency timing had similar effects in the low-, mid-, and high-frequency regions, but band pairs having wider frequency separation were less disrupted by small amounts of asynchrony. In experiment 2, it was found that the disruptive influence of asynchrony on adjacent band pairs did not result from disruptions to the complex patterns present in overlapping excitation. The results of experiment 3 suggest that the processing of speech patterns may take place using mechanisms having different sensitivities to exact timing, similar to the dual mechanisms proposed for within- and across-channel gap detection. Preservation of relative timing can be critical to intelligibility. While the use of across-frequency timing cues appears similar across the spectrum, it may differ based on frequency separation. This difference appears to involve a greater reliance on exact timing during the processing of speech energy at proximate frequencies.

3.

Intelligibility of 1/3-octave speech: greater contribution of frequencies outside than inside the nominal passband

Warren RM, Bashford JA. The Journal of the Acoustical Society of America, 1999, 106(5):L47-L52

We reported previously that "everyday" sentences were highly intelligible when limited to a 1/3-octave passband centered at 1,500 Hz and having transition-band slopes of approximately 100 dB/octave. The present study determined the relative contributions to intelligibility made by the passband (PB) and the transition bands (TBs) by partitioning the same bandpass sentences using 2,000-order FIR filtering. Intelligibility scores were: PB with both TBs, 92%; deletion of both TBs (leaving only the 1/3-octave PB with nearly vertical slopes), 24%; deletion of the PB (leaving both TBs separated by a 1/3-octave gap), 83%. These and other results indicate a remarkable ability to compensate for severe spectral tilt and the consequent importance of considering frequencies outside the nominal passband in interpreting studies using filtered speech.

4.

The effect of smoothing filter slope and spectral frequency on temporal speech information

Healy EW, Steinbach HM. The Journal of the Acoustical Society of America, 2007, 121(2):1177-1181

It is known that information contained within the filter skirts can provide cues important to speech intelligibility. However, the role of filter slope during temporal smoothing has received little attention. In experiment 1, smoothing filter slope angle was found to have a large effect on the intelligibility of sentences represented by three amplitude-modulated sinusoids. In experiment 2, the use of temporal cues above 16 Hz was examined across various regions of the spectrum. When increases in rate were presented to individual spectral bands, intelligibility only increased when presented in the higher spectral region. This result suggests a greater reliance on higher-rate cues in this region. However, intelligibility was greatest when these cues were distributed across the spectrum, indicating that their effective use is not restricted solely to this region.

5.

Acoustic and linguistic factors in the perception of bandpass-filtered speech

Stickney GS, Assmann PF. The Journal of the Acoustical Society of America, 2001, 109(3):1157-1165

Speech can remain intelligible for listeners with normal hearing when processed by narrow bandpass filters that transmit only a small fraction of the audible spectrum. Two experiments investigated the basis for the high intelligibility of narrowband speech. Experiment 1 confirmed reports that everyday English sentences can be recognized accurately (82%-98% words correct) when filtered at center frequencies of 1500, 2100, and 3000 Hz. However, narrowband low predictability (LP) sentences were less accurately recognized than high predictability (HP) sentences (20% lower scores), and excised narrowband words were even less intelligible than LP sentences (a further 23% drop). While experiment 1 revealed similar levels of performance for narrowband and broadband sentences at conversational speech levels, experiment 2 showed that speech reception thresholds were substantially (>30 dB) poorer for narrowband sentences. One explanation for this increased disparity between narrowband and broadband speech at threshold (compared to conversational speech levels) is that spectral components in the sloping transition bands of the filters provide important cues for the recognition of narrowband speech, but these components become inaudible as the signal level is reduced. Experiment 2 also showed that performance was degraded by the introduction of a speech masker (a single competing talker). The elevation in threshold was similar for narrowband and broadband speech (11 dB, on average), but because the narrowband sentences required considerably higher sound levels to reach their thresholds in quiet compared to broadband sentences, their target-to-masker ratios were very different (+23 dB for narrowband sentences and -12 dB for broadband sentences). As in experiment 1, performance was better for HP than LP sentences. 
The LP-HP difference was larger for narrowband than broadband sentences, suggesting that context provides greater benefits when speech is distorted by narrow bandpass filtering.

6.

The effects of the addition of low-level, low-noise noise on the intelligibility of sentences processed to remove temporal envelope information

Hopkins K, Moore BC, Stone MA. The Journal of the Acoustical Society of America, 2010, 128(4):2150-2161

The intelligibility of sentences processed to remove temporal envelope information, as far as possible, was assessed. Sentences were filtered into N analysis channels, and each channel signal was divided by its Hilbert envelope to remove envelope information but leave temporal fine structure (TFS) intact. Channel signals were combined to give TFS speech. The effect of adding low-level low-noise noise (LNN) to each channel signal before processing was assessed. The addition of LNN reduced the amplification of low-level signal portions that contained large excursions in instantaneous frequency, and improved the intelligibility of simple TFS speech sentences, but not more complex sentences. It also reduced the time needed to reach a stable level of performance. The recovery of envelope cues by peripheral auditory filtering was investigated by measuring the intelligibility of 'recovered-envelope speech', formed by filtering TFS speech with an array of simulated auditory filters, and using the envelopes at the output of these filters to modulate sinusoids with frequencies equal to the filter center frequencies (i.e., tone vocoding). The intelligibility of TFS speech and recovered-envelope speech fell as N increased, although TFS speech was still highly intelligible for values of N for which the intelligibility of recovered-envelope speech was low.
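The core TFS step described above (dividing a channel signal by its Hilbert envelope) can be sketched for a single channel as follows. This is a minimal illustration, not the authors' processing: the N-channel analysis filterbank and the LNN addition are omitted, the analytic signal is built with the standard FFT construction (the same one `scipy.signal.hilbert` uses), and the `floor` guard against division by zero is an assumption added here.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal via the FFT: keep DC (and Nyquist for even n),
    double the positive-frequency bins, zero the negative ones."""
    x = np.asarray(x, dtype=float)
    n = x.size
    h = np.zeros(n)
    if n % 2 == 0:
        h[0] = h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[0] = 1.0
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)


def tfs_only(channel, floor=1e-12):
    """Divide a band-limited channel by its Hilbert envelope, leaving
    cos(instantaneous phase): envelope flattened, fine structure kept."""
    z = analytic_signal(channel)
    envelope = np.abs(z)
    # Re(z) / |z| is bounded by 1; the floor (hypothetical) avoids 0/0.
    return np.real(z) / np.maximum(envelope, floor)


if __name__ == "__main__":
    fs = 1000
    t = np.arange(fs) / fs
    am_tone = (1 + 0.5 * np.cos(2 * np.pi * 5 * t)) * np.cos(2 * np.pi * 100 * t)
    flat = tfs_only(am_tone)
    print(np.max(np.abs(flat - np.cos(2 * np.pi * 100 * t))))
```

For an amplitude-modulated tone the result is just the unmodulated carrier, which is the sense in which envelope information is removed while TFS is kept.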

7.

The behavioral response of mice to gaps in noise depends on its spectral components and its bandwidth

Ison JR, Allen PD, Rivoli PJ, Moore JT. The Journal of the Acoustical Society of America, 2005, 117(6):3944-3951

The purpose of these experiments was to determine whether detecting brief decrements in noise level ("gaps") varies with the spectral content and bandwidth of noise in mice as it does in humans. The behavioral effect of gaps was quantified by their inhibiting a subsequent acoustic startle reflex. Gap durations from 1 to 29 ms were presented in five adjacent 1-octave noise bands and one 5-octave band, their range being 2 kHz to 64 kHz. Gaps ended 60 ms before the startle stimulus (experiment 1) or at startle onset (experiment 2). Asymptotic inhibition was greater for higher-frequency 1-octave bands and highest for the 5-octave band in both experiments, but time constants were related to frequency only in experiment 1. For the lowest band (2-4 kHz) neither noise decrements (experiment 1 and 2) nor increments (experiment 3) had any behavioral consequence, but this band was effective when presented as a pulse in quiet (experiment 4). The lowest frequencies in the most effective 1-octave band were one octave above the spectral region where mice have their best absolute thresholds. These effects are similar to those obtained in humans, and reveal a special contribution of wide band, high-frequency stimulation to temporal acuity.

8.

Perception of interrupted speech: cross-rate variation in the intelligibility of gated and concatenated sentences

Shafiro V, Sheft S, Risley R. The Journal of the Acoustical Society of America, 2011, 130(2):EL108-EL114

Temporal constraints on the perception of variable-size speech fragments produced by interruption rates between 0.5 and 16 Hz were investigated by contrasting the intelligibility of gated sentences with and without silent intervals. Concatenation of consecutive speech fragments produced a significant decrease in intelligibility at 2 and 4 Hz, while having little effect at lower and higher rates. Consistent with previous studies, these findings (1) indicate that syllable-sized intervals associated with intermediate-rate interruptions are more susceptible to temporal distortions than the longer word-size or shorter phoneme-size intervals and (2) suggest qualitative differences in the underlying perceptual processes at different rates.
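The difference between gated and concatenated sentences can be sketched as two operations on a sample sequence: gating zeroes the "off" part of each interruption cycle, while concatenation drops those silent intervals so that consecutive fragments abut. The 50% duty cycle is an assumption (the abstract does not state one), the function names are hypothetical, and the onset/offset ramps normally used to avoid clicks are omitted.

```python
def gate(samples, fs, rate_hz, duty=0.5):
    """Square-wave interruption: keep samples in the first `duty`
    fraction of each 1/rate_hz cycle and replace the rest with silence."""
    period = fs / rate_hz  # samples per interruption cycle
    return [s if (i % period) / period < duty else 0.0
            for i, s in enumerate(samples)]


def gate_and_concatenate(samples, fs, rate_hz, duty=0.5):
    """Same fragments as `gate`, but with the silent intervals removed,
    so the output is shorter and fragments follow one another directly."""
    period = fs / rate_hz
    return [s for i, s in enumerate(samples) if (i % period) / period < duty]


if __name__ == "__main__":
    # Toy signal: 8 samples at fs = 8 Hz, interrupted at 2 Hz.
    signal = list(range(1, 9))
    print(gate(signal, fs=8, rate_hz=2))                  # silences kept
    print(gate_and_concatenate(signal, fs=8, rate_hz=2))  # silences removed
```

With a 50% duty cycle, concatenation halves the duration, so the fragments arrive twice as fast as in the original sentence; that change in timing is what the gated-versus-concatenated contrast manipulates.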

9.

The effect of segmental and suprasegmental corrections on the intelligibility of deaf speech

Maassen B, Povel DJ. The Journal of the Acoustical Society of America, 1985, 78(3):877-886

Three experiments were conducted to study the effect of segmental and suprasegmental corrections on the intelligibility and judged quality of deaf speech. By means of digital signal processing techniques, including LPC analysis, transformations of separate speech sounds, temporal structure, and intonation were carried out on 30 Dutch sentences spoken by ten deaf children. The transformed sentences were tested for intelligibility and acceptability by presenting them to inexperienced listeners. In experiment 1, LPC based reflection coefficients describing segmental characteristics of deaf speakers were replaced by those of hearing speakers. A complete segmental correction caused a dramatic increase in intelligibility from 24% to 72%, which, for a major part, was due to correction of vowels. Experiment 2 revealed that correction of temporal structure and intonation caused only a small improvement from 24% to about 34%. Combination of segmental and suprasegmental corrections yielded almost perfectly understandable sentences, due to a more than additive effect of the two corrections. Quality judgments, collected in experiment 3, were in close agreement with the intelligibility measures. The results show that, in order for these speakers to become more intelligible, improving their articulation is more important than improving their production of temporal structure and intonation.

10.

Cochlea-scaled spectral entropy predicts rate-invariant intelligibility of temporally distorted sentences

Stilp CE, Kiefte M, Alexander JM, Kluender KR. The Journal of the Acoustical Society of America, 2010, 128(4):2112-2126

Some evidence, mostly drawn from experiments using only a single moderate rate of speech, suggests that low-frequency amplitude modulations may be particularly important for intelligibility. Here, two experiments investigated intelligibility of temporally distorted sentences across a wide range of simulated speaking rates, and two metrics were used to predict results. Sentence intelligibility was assessed when successive segments of fixed duration were temporally reversed (exp. 1), and when sentences were processed through four third-octave-band filters, the outputs of which were desynchronized (exp. 2). For both experiments, intelligibility decreased with increasing distortion. However, in exp. 2, intelligibility recovered modestly with longer desynchronization. Across conditions, performances measured as a function of proportion of utterance distorted converged to a common function. Estimates of intelligibility derived from modulation transfer functions predict a substantial proportion of the variance in listeners' responses in exp. 1, but fail to predict performance in exp. 2. By contrast, a metric of potential information, quantified as relative dissimilarity (change) between successive cochlear-scaled spectra, is introduced. This metric reliably predicts listeners' intelligibility across the full range of speaking rates in both experiments. Results support an information-theoretic approach to speech perception and the significance of spectral change rather than physical units of time.
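The distortion used in exp. 1 (temporally reversing successive segments of fixed duration) is simple to express on a sample sequence. The sketch below is an illustration only: how the authors handled segment boundaries is not stated, so reversing a final partial segment and the absence of boundary smoothing are assumptions, and the helper names are hypothetical.

```python
def segment_samples(fs, segment_ms):
    """Number of samples in a segment of `segment_ms` milliseconds."""
    return max(1, round(fs * segment_ms / 1000))


def locally_time_reverse(samples, segment_len):
    """Reverse each successive block of `segment_len` samples while keeping
    the blocks in their original order (a final partial block is also
    reversed, an assumption)."""
    out = []
    for start in range(0, len(samples), segment_len):
        out.extend(reversed(samples[start:start + segment_len]))
    return out


if __name__ == "__main__":
    print(locally_time_reverse([1, 2, 3, 4, 5, 6, 7], 3))
    # e.g. 50-ms segments at a 16-kHz sampling rate
    print(segment_samples(16000, 50))
```

Longer segments reverse a larger share of each acoustic event, which is one way to see why intelligibility falls as segment duration grows.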

11.

Measuring the critical band for speech

Healy EW, Bacon SP. The Journal of the Acoustical Society of America, 2006, 119(2):1083-1091

The current experiments were designed to measure the frequency resolution employed by listeners during the perception of everyday sentences. Speech bands having nearly vertical filter slopes and narrow bandwidths were sharply partitioned into various numbers of equal log- or ERBN-width subbands. The temporal envelope from each partition was used to amplitude modulate a corresponding band of low-noise noise, and the modulated carriers were combined and presented to normal-hearing listeners. Intelligibility increased and reached asymptote as the number of partitions increased. In the mid- and high-frequency regions of the speech spectrum, the partition bandwidth corresponding to asymptotic performance matched current estimates of psychophysical tuning across a number of conditions. These results indicate that, in these regions, the critical band for speech matches the critical band measured using traditional psychoacoustic methods and nonspeech stimuli. However, in the low-frequency region, partition bandwidths at asymptote were somewhat narrower than would be predicted based upon psychophysical tuning. It is concluded that, overall, current estimates of psychophysical tuning represent reasonably well the ability of listeners to extract spectral detail from running speech.
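Partitioning a band into "equal log- or ERBN-width subbands" can be sketched as follows. The ERB-number scale used here is the widely cited Glasberg and Moore (1990) formula, ERB-number = 21.4·log10(0.00437·f + 1), with ERBN(f) = 24.7·(4.37·f/1000 + 1); that the study used exactly this formula is an assumption, and the 1000 to 2000 Hz band in the example is hypothetical rather than one of the study's bands.

```python
import math


def erb_bandwidth(f_hz):
    """ERB_N in Hz at centre frequency f_hz (Glasberg & Moore, 1990)."""
    return 24.7 * (4.37 * f_hz / 1000 + 1)


def erb_number(f_hz):
    """ERB-number (Cam) scale: 21.4 * log10(0.00437 f + 1)."""
    return 21.4 * math.log10(0.00437 * f_hz + 1.0)


def erb_number_to_hz(e):
    """Inverse of erb_number."""
    return (10 ** (e / 21.4) - 1.0) / 0.00437


def equal_erb_partitions(lo_hz, hi_hz, n):
    """n + 1 edge frequencies dividing [lo_hz, hi_hz] into n subbands
    of equal width on the ERB-number scale."""
    e_lo, e_hi = erb_number(lo_hz), erb_number(hi_hz)
    step = (e_hi - e_lo) / n
    return [erb_number_to_hz(e_lo + k * step) for k in range(n + 1)]


def equal_log_partitions(lo_hz, hi_hz, n):
    """n + 1 edge frequencies dividing [lo_hz, hi_hz] into n subbands
    of equal width in octaves (a constant frequency ratio)."""
    ratio = (hi_hz / lo_hz) ** (1.0 / n)
    return [lo_hz * ratio ** k for k in range(n + 1)]


if __name__ == "__main__":
    print([round(f, 1) for f in equal_erb_partitions(1000.0, 2000.0, 4)])
    print([round(f, 1) for f in equal_log_partitions(1000.0, 2000.0, 4)])
    print(round(erb_bandwidth(1000.0), 1))  # ERB_N at 1 kHz
```

Comparing the two edge lists shows how the two partitioning schemes diverge; the study's question is which partition width yields asymptotic intelligibility, which this sketch does not address.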

12.

Effects of frequency response characteristics on speech discrimination and perceived intelligibility and pleasantness of speech for hearing-impaired listeners

Byrne D. The Journal of the Acoustical Society of America, 1986, 80(2):494-504

Frequency response characteristics were selected for 14 hearing-impaired ears, according to six procedures. Three procedures were based on MCL measurements with speech bands of three bandwidths (1/3 octave, 1 octave, and 1 2/3 octaves). The other procedures were based on hearing thresholds, pure-tone MCLs, and pure-tone LDLs. The procedures were evaluated by speech discrimination testing, using nonsense syllables in noise, and by paired comparison judgments of the intelligibility and pleasantness of running speech. Speech discrimination testing showed significant differences between pairs of responses for only seven test ears. Nasals and glides were most affected by frequency response variations. Both intelligibility and pleasantness judgments showed significant differences for all test ears. Intelligibility in noise was less affected by frequency response differences than was intelligibility in quiet or pleasantness in quiet or in noise. For some ears, the ranking of responses depended on whether intelligibility or pleasantness was being judged and on whether the speech was in quiet or in noise. Overall, the three speech band MCL procedures were far superior to the others. Thus the studies strongly support the frequency response selection rationale of amplifying all frequency bands of speech to MCL. They also highlight some of the complications involved in achieving this aim.

13.

Speech intelligibility in cochlear implant simulations: Effects of carrier type, interfering noise, and subject experience

Whitmal NA, Poissant SF, Freyman RL, Helfer KS. The Journal of the Acoustical Society of America, 2007, 122(4):2376-2388

Channel vocoders using either tone or band-limited noise carriers have been used in experiments to simulate cochlear implant processing in normal-hearing listeners. Previous results from these experiments have suggested that the two vocoder types produce speech of nearly equal intelligibility in quiet conditions. The purpose of this study was to further compare the performance of tone and noise-band vocoders in both quiet and noisy listening conditions. In each of four experiments, normal-hearing subjects were better able to identify tone-vocoded sentences and vowel-consonant-vowel syllables than noise-vocoded sentences and syllables, both in quiet and in the presence of either speech-spectrum noise or two-talker babble. An analysis of consonant confusions for listening in both quiet and speech-spectrum noise revealed significantly different error patterns that were related to each vocoder's ability to produce tone or noise output that accurately reflected the consonant's manner of articulation. Subject experience was also shown to influence intelligibility. Simulations using a computational model of modulation detection suggest that the noise vocoder's disadvantage is in part due to the intrinsic temporal fluctuations of its carriers, which can interfere with temporal fluctuations that convey speech recognition cues.

14.

The spatial unmasking of speech: evidence for better-ear listening

Edmonds BA, Culling JF. The Journal of the Acoustical Society of America, 2006, 120(3):1539-1545

Speech reception thresholds (SRTs) were measured for target speech presented concurrently with interfering speech (spoken by a different speaker). In experiment 1, the target and interferer were divided spectrally into high- and low-frequency bands and presented over headphones in three conditions: monaural, dichotic (target and interferer to different ears), and swapped (the low-frequency target band and the high-frequency interferer band were presented to one ear, while the high-frequency target band and the low-frequency interferer band were presented to the other ear). SRTs were highest in the monaural condition and lowest in the dichotic condition; SRTs in the swapped condition were intermediate. In experiment 2, two new conditions were devised such that one target band was presented in isolation to one ear while the other band was presented at the other ear with the interferer. The pattern of SRTs observed in experiment 2 suggests that performance in the swapped condition reflects the intelligibility of the target frequency bands at just one ear; the auditory system appears unable to exploit advantageous target-to-interferer ratios at different ears when segregating target speech from a competing speech interferer.

15.

Compression and expansion of the temporal envelope: evaluation of speech intelligibility and sound quality

van Buuren RA, Festen JM, Houtgast T. The Journal of the Acoustical Society of America, 1999, 105(5):2903-2913

Sensorineural hearing loss is accompanied by loudness recruitment, a steeper-than-normal rise of perceived loudness with presentation level. To compensate for this abnormality, amplitude compression is often applied (e.g., in a hearing aid). Alternatively, since speech intelligibility has been modeled as the perception of fast energy fluctuations, enlarging these (by means of expansion) may improve speech intelligibility. Still, even if these signal-processing techniques prove useful in terms of speech intelligibility, practical application might be hindered by unacceptably low sound quality. Therefore, both speech intelligibility and sound quality were evaluated for syllabic compression and expansion of the temporal envelope. Speech intelligibility was evaluated with an adaptive procedure, based on short everyday sentences either in noise or with a competing speaker. Sound quality was measured by means of a rating-scale procedure, for both speech and music. In a systematic setup, both the ratio of compression or expansion and the number of independent processing bands were varied. Individual hearing thresholds were compensated for by a listener-specific filter and amplification. Both listeners with normal hearing and listeners with sensorineural hearing impairment participated as paid volunteers. The results show that, on average, both compression and expansion fail to show better speech intelligibility or sound quality than linear amplification.
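The idea of a compression or expansion "ratio" applied to the temporal envelope can be sketched as a static rule: level deviations from a reference are divided by the ratio, so a ratio above 1 shrinks envelope fluctuations (compression), a ratio between 0 and 1 enlarges them (expansion), and a ratio of 1 is linear amplification. This is a generic illustration, not the study's processor: the reference level, the single-band form, and the absence of syllabic attack/release smoothing are all assumptions, and the function names are hypothetical.

```python
def output_level_db(input_db, ratio, reference_db):
    """Static input/output rule in dB: deviations from reference_db are
    divided by `ratio` (>1 compresses, 0<ratio<1 expands, 1 is linear)."""
    return reference_db + (input_db - reference_db) / ratio


def envelope_gain(envelope, ratio, reference):
    """Per-sample linear gain that maps an envelope value e to
    reference * (e / reference) ** (1 / ratio), i.e. the same rule as
    output_level_db expressed on linear amplitudes."""
    return [(e / reference) ** (1.0 / ratio - 1.0) if e > 0 else 0.0
            for e in envelope]


if __name__ == "__main__":
    # Hypothetical 65-dB reference: a 20-dB peak above it becomes 10 dB
    # with 2:1 compression and 40 dB with 1:2 expansion.
    print(output_level_db(85.0, 2.0, 65.0), output_level_db(85.0, 0.5, 65.0))
    env = [0.5, 1.0, 2.0]
    gains = envelope_gain(env, ratio=2.0, reference=1.0)
    print([round(e * g, 4) for e, g in zip(env, gains)])
```

In a multiband processor of the kind the abstract describes, a rule like this would be applied independently in each band, with the envelope smoothed on a syllabic time scale before the gain is computed.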

16.

Speech intelligibility and localization in a multi-source environment (Times cited: 1; self-citations: 0; citations by others: 1)

Hawley ML, Litovsky RY, Colburn HS. The Journal of the Acoustical Society of America, 1999, 105(6):3436-3448

Natural environments typically contain sound sources other than the source of interest that may interfere with the ability of listeners to extract information about the primary source. Studies of speech intelligibility and localization by normal-hearing listeners in the presence of competing speech are reported on in this work. One, two or three competing sentences [IEEE Trans. Audio Electroacoust. 17(3), 225-246 (1969)] were presented from various locations in the horizontal plane in several spatial configurations relative to a target sentence. Target and competing sentences were spoken by the same male talker and at the same level. All experiments were conducted both in an actual sound field and in a virtual sound field. In the virtual sound field, both binaural and monaural conditions were tested. In the speech intelligibility experiment, there were significant improvements in performance when the target and competing sentences were spatially separated. Performance was similar in the actual sound-field and virtual sound-field binaural listening conditions for speech intelligibility. Although most of these improvements are evident monaurally when using the better ear, binaural listening was necessary for large improvements in some situations. In the localization experiment, target source identification was measured in a seven-alternative absolute identification paradigm with the same competing sentence configurations as for the speech study. Performance in the localization experiment was significantly better in the actual sound-field than in the virtual sound-field binaural listening conditions. Under binaural conditions, localization performance was very good, even in the presence of three competing sentences. Under monaural conditions, performance was much worse. For the localization experiment, there was no significant effect of the number or configuration of the competing sentences tested. 
For these experiments, the performance in the speech intelligibility experiment was not limited by localization ability.

17.

The effect of timing errors on the intelligibility of deaf children's speech

Osberger MJ, Levitt H. The Journal of the Acoustical Society of America, 1979, 66(5):1316-1324

The purpose of this study was to quantify the effect of timing errors on the intelligibility of deaf children's speech. Deviant timing patterns were corrected in the recorded speech samples of six deaf children using digital speech processing techniques. The speech waveform was modified to correct timing errors only, leaving all other aspects of the speech unchanged. The following six-stage approximation procedure was used to correct the deviant timing patterns: (1) original, unaltered utterances, (2) correction of pauses only, (3) correction of relative timing, (4) correction of absolute syllable duration, (5) correction of relative timing and pauses, and (6) correction of absolute syllable duration and pauses. Measures of speech intelligibility were obtained for the original and the computer-modified utterances. On the average, the highest intelligibility score was obtained when relative timing errors only were corrected. The correction of this type of error improved the intelligibility of both stressed and unstressed words within a phrase. Improvements in word intelligibility, which occurred when relative timing was corrected, appeared to be closely related to the number of phonemic errors present within a word. The second highest intelligibility score was obtained for the original, unaltered sentences. On the average, the intelligibility scores obtained for the other four forms of timing modification were poorer than those obtained for the original sentences. Thus, the data show that intelligibility improved, on the average, when only one type of error, relative timing, was corrected.

18.

Comodulation masking release in consonant recognition

Kwon BJ. The Journal of the Acoustical Society of America, 2002, 112(2):634-641

Comodulation masking release (CMR) refers to an improvement in the detection threshold of a signal masked by noise with coherent amplitude fluctuation across frequency, as compared to noise without the envelope coherence. The present study tested whether such an advantage for signal detection would facilitate the identification of speech phonemes. Consonant identification of bandpass speech was measured under the following three masker conditions: (1) a single band of noise in the speech band ("on-frequency" masker); (2) two bands of noise, one in the on-frequency band and the other in the "flanking band," with coherence of temporal envelope fluctuation between the two bands (comodulation); and (3) two bands of noise (on-frequency band and flanking band), without the coherence of the envelopes (noncomodulation). A pilot experiment with a small number of consonant tokens was followed by the main experiment with 12 consonants and the following masking conditions: three frequency locations of the flanking band and two masker levels. Results showed that in all conditions, the comodulation condition provided higher identification scores than the noncomodulation condition, and the difference in score was 3.5% on average. No significant difference was observed between the on-frequency only condition and the comodulation condition, i.e., an "unmasking" effect by the addition of a comodulated flanking band was not observed. The positive effect of CMR on consonant recognition found in the present study endorses a "cued-listening" theory, rather than an envelope correlation theory, as a basis of CMR in a suprathreshold task.

19.

Spatial unmasking of nearby speech sources in a simulated anechoic environment

Shinn-Cunningham BG, Schickler J, Kopco N, Litovsky R. The Journal of the Acoustical Society of America, 2001, 110(2):1118-1129

Spatial unmasking of speech has traditionally been studied with target and masker at the same, relatively large distance. The present study investigated spatial unmasking for configurations in which the simulated sources varied in azimuth and could be either near or far from the head. Target sentences and speech-shaped noise maskers were simulated over headphones using head-related transfer functions derived from a spherical-head model. Speech reception thresholds were measured adaptively, varying target level while keeping the masker level constant at the "better" ear. Results demonstrate that small positional changes can result in very large changes in speech intelligibility when sources are near the listener as a result of large changes in the overall level of the stimuli reaching the ears. In addition, the difference in the target-to-masker ratios at the two ears can be substantially larger for nearby sources than for relatively distant sources. Predictions from an existing model of binaural speech intelligibility are in good agreement with results from all conditions comparable to those that have been tested previously. However, small but important deviations between the measured and predicted results are observed for other spatial configurations, suggesting that current theories do not accurately account for speech intelligibility for some of the novel spatial configurations tested.

20.

Speechreading supplemented with auditorily presented speech parameters (Times cited: 2; self-citations: 0; citations by others: 2)

Breeuwer M, Plomp R. The Journal of the Acoustical Society of America, 1986, 79(2):481-499

Results are reported from two experiments in which the benefit of supplementing speechreading with auditorily presented information about the speech signal was investigated. In experiment I, speechreading was supplemented with information about the prosody of the speech signal. For ten normal-hearing subjects with no experience in speechreading, the intelligibility score for sentences increased significantly when speechreading was supplemented with information about the overall amplitude of the speech signal, information about the fundamental frequency, or both. Binary information about voicing appeared not to be a significant supplement. In experiment II, the best-scoring supplements of experiment I were compared with two supplementary signals from our previous studies, i.e., information about the sound-pressure levels in two 1-oct filter bands centered at 500 and 3160 Hz, or information about the frequencies of the first and second formants from voiced speech segments. Sentence-intelligibility scores were measured for 24 normal-hearing subjects with no experience in speechreading, and for 12 normal-hearing experienced speechreaders. For the inexperienced speechreaders, the sound-pressure levels appeared to be the best supplement (87.1% correct syllables). For the experienced speechreaders, the formant-frequency information (88.6% correct) and the fundamental-frequency plus amplitude information (86.0% correct) were supplements as efficient as the sound-pressure information (86.1% correct). Discrimination of phonemes (both consonants and vowels) was measured for the group of 24 inexperienced speechreaders. Percentage correct responses, confusion among phonemes, and the percentage of transmitted information about different types of manner and place of articulation and about the feature voicing are presented.
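The "percentage of transmitted information" about a feature such as voicing is conventionally computed from a stimulus-by-response confusion matrix in the manner of Miller and Nicely (1955): the mutual information between stimulus and response, divided by the stimulus entropy, after pooling phonemes into feature categories. The sketch below shows that standard calculation; whether the authors used exactly this normalisation is an assumption, and the toy /p b t d/ matrix is invented purely for illustration.

```python
import math


def relative_transmitted_information(confusions):
    """Mutual information T(x; y) between stimulus (rows) and response
    (columns), divided by the stimulus entropy H(x). Returns 0..1."""
    total = sum(sum(row) for row in confusions)
    row_p = [sum(row) / total for row in confusions]
    col_p = [sum(col) / total for col in zip(*confusions)]
    t = 0.0
    for i, row in enumerate(confusions):
        for j, count in enumerate(row):
            if count:
                p = count / total
                t += p * math.log2(p / (row_p[i] * col_p[j]))
    h_stim = -sum(p * math.log2(p) for p in row_p if p > 0)
    return t / h_stim if h_stim > 0 else 0.0


def collapse_by_feature(confusions, feature_of):
    """Pool a phoneme confusion matrix into feature categories;
    feature_of[k] is the feature value of phoneme k."""
    labels = sorted(set(feature_of))
    index = {lab: k for k, lab in enumerate(labels)}
    pooled = [[0] * len(labels) for _ in labels]
    for i, row in enumerate(confusions):
        for j, count in enumerate(row):
            pooled[index[feature_of[i]]][index[feature_of[j]]] += count
    return pooled


if __name__ == "__main__":
    # Toy matrix for /p b t d/: voicing always correct, place at chance.
    m = [[5, 0, 5, 0], [0, 5, 0, 5], [5, 0, 5, 0], [0, 5, 0, 5]]
    voicing = [0, 1, 0, 1]
    print(relative_transmitted_information(m))                                # overall
    print(relative_transmitted_information(collapse_by_feature(m, voicing)))  # voicing
```

In this toy case half of the overall stimulus information is transmitted, all of it carried by voicing, which is the kind of feature-level breakdown the abstract reports.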