Similar Documents
20 similar documents found.
1.
Stone et al. [J. Acoust. Soc. Am. 130, 2874-2881 (2011)], using vocoder processing, showed that the envelope modulations of a notionally steady noise were more effective than the envelope energy as a masker of speech. Here the same effect is demonstrated using non-vocoded signals. Speech was filtered into 28 channels. A masker centered on each channel was added to the channel signal at a target-to-background ratio of -5 or -10 dB. Maskers were sinusoids or noise bands with bandwidth 1/3 or 1 ERB(N) (ERB(N) being the bandwidth of "normal" auditory filters), synthesized with Gaussian (GN) or low-noise (LNN) statistics. To minimize peripheral interactions between maskers, odd-numbered channels were presented to one ear and even-numbered channels to the other. Speech intelligibility was assessed in the presence of each "steady" masker and that masker 100% sinusoidally amplitude modulated (SAM) at 8 Hz. Intelligibility decreased with increasing envelope fluctuation of the maskers. Masking release, the difference in intelligibility between the SAM and its "steady" counterpart, increased with bandwidth from near-zero to around 50 percentage points for the 1-ERB(N) GN. It is concluded that the sinusoidal and GN maskers behaved primarily as energetic and modulation maskers, respectively.
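A minimal sketch (not the authors' stimulus code) of how a narrowband Gaussian-noise masker with 100% sinusoidal amplitude modulation at 8 Hz might be generated; the center frequency, bandwidth, sampling rate, and filter order are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def sam_noise_masker(fc, bw, fs=44100, dur=1.0, fm=8.0, mod_depth=1.0, seed=0):
    """Gaussian noise band centred at fc (Hz) with bandwidth bw (Hz),
    sinusoidally amplitude modulated at fm Hz with the given depth."""
    rng = np.random.default_rng(seed)
    n = int(dur * fs)
    noise = rng.standard_normal(n)
    # band-limit the noise to [fc - bw/2, fc + bw/2]
    sos = butter(4, [fc - bw / 2, fc + bw / 2], btype="bandpass", fs=fs, output="sos")
    band = sosfiltfilt(sos, noise)
    t = np.arange(n) / fs
    modulator = 1.0 + mod_depth * np.sin(2 * np.pi * fm * t)
    return band * modulator

masker = sam_noise_masker(fc=1000.0, bw=130.0, fm=8.0)  # ~1 ERB(N) at 1 kHz
```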

2.
Tone recognition is important for speech understanding in tonal languages such as Mandarin Chinese. Cochlear implant patients are able to perceive some tonal information by using temporal cues such as periodicity-related amplitude fluctuations and similarities between the fundamental frequency (F0) contour and the amplitude envelope. The present study investigates whether modifying the amplitude envelope to better resemble the F0 contour can further improve tone recognition in multichannel cochlear implants. Chinese tone and vowel recognition were measured for six native Chinese normal-hearing subjects listening to a simulation of a four-channel cochlear implant speech processor with and without amplitude envelope enhancement. Two algorithms were proposed to modify the amplitude envelope to more closely resemble the F0 contour. In the first algorithm, the amplitude envelope as well as the modulation depth of periodicity fluctuations was adjusted for each spectral channel. In the second algorithm, the overall amplitude envelope was adjusted before multichannel speech processing, thus reducing any local distortions to the speech spectral envelope. The results showed that both algorithms significantly improved Chinese tone recognition. By adjusting the overall amplitude envelope to match the F0 contour before multichannel processing, vowel recognition was better preserved and less speech-processing computation was required. The results suggest that modifying the amplitude envelope to more closely resemble the F0 contour may be a useful approach toward improving Chinese-speaking cochlear implant patients' tone recognition.
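For illustration only: one crude way an overall amplitude envelope could be nudged toward the F0 contour before multichannel processing. The mapping from F0 to gain, the function name, and the parameter values are assumptions, not the two algorithms described in the abstract (which also adjust per-channel periodicity modulation depth).

```python
import numpy as np

def nudge_envelope_toward_f0(x, f0, f0_min=80.0, f0_max=300.0, weight=0.5):
    """Apply an F0-dependent gain contour so the overall amplitude envelope of x
    more closely resembles the normalised F0 contour. f0 is a per-sample F0
    track in Hz, with 0 marking unvoiced portions (left unmodified)."""
    norm_f0 = np.clip((f0 - f0_min) / (f0_max - f0_min), 0.0, 1.0)
    # gain rises from (1 - weight) at f0_min to 1.0 at f0_max during voicing
    gain = np.where(f0 > 0, (1.0 - weight) + weight * norm_f0, 1.0)
    return x * gain
```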

3.
The present study sought to establish whether speech recognition can be disrupted by the presence of amplitude modulation (AM) at a remote spectral region, and whether that disruption depends upon the rate of AM. The goal was to determine whether this paradigm could be used to examine which modulation frequencies in the speech envelope are most important for speech recognition. Consonant identification for a band of speech located in either the low- or high-frequency region was measured in the presence of a band of noise located in the opposite frequency region. The noise was either unmodulated or amplitude modulated by a sinusoid, a band of noise with a fixed absolute bandwidth, or a band of noise with a fixed relative bandwidth. The frequency of the modulator was 4, 16, 32, or 64 Hz. Small amounts of modulation interference were observed for all modulator types, irrespective of the location of the speech band. More important, the interference depended on modulation frequency, clearly supporting the existence of selectivity of modulation interference with speech stimuli. Overall, the results suggest a primary role of envelope fluctuations around 4 and 16 Hz without excluding the possibility of a contribution by faster rates.

4.
The study of speech from which the temporal fine structure (TFS) has been removed has become an important research area. Common procedures for removing TFS include noise and tone vocoders. In the noise vocoder, bands of noise are modulated by the envelope of the speech within each band, and in the tone vocoder the carrier is a sinusoid at the center of each frequency band. Five different procedures for removing TFS are evaluated in this paper: the noise vocoder, a low-noise noise approach in which the noise envelope is replaced by the speech envelope in each frequency band, phase randomization within each band, the tone vocoder, and sinusoidal modeling with random phase. The effects of TFS modification on the speech envelope are evaluated using an index based on the envelope time-frequency modulation. The results show that for all of the TFS techniques implemented in this study, there is a substantial loss in the accuracy of reproduction of the envelope time-frequency modulation. The tone vocoder gives the best accuracy, followed by the procedure that replaces the noise envelope with the speech envelope in each band.
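A minimal sketch of the two most common procedures named here, the noise vocoder and the tone vocoder: each band's Hilbert envelope modulates either a band-limited noise carrier or a sinusoid at the band center. Band edges, filter order, and the geometric-mean band center are assumptions for illustration.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def vocode(x, fs, band_edges, carrier="tone", seed=0):
    """Replace the temporal fine structure in each band with a tone or noise
    carrier modulated by that band's Hilbert envelope."""
    rng = np.random.default_rng(seed)
    t = np.arange(len(x)) / fs
    out = np.zeros(len(x))
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)
        env = np.abs(hilbert(band))                              # band envelope
        if carrier == "tone":
            carr = np.sin(2 * np.pi * np.sqrt(lo * hi) * t)      # tone at band centre
        else:
            carr = sosfiltfilt(sos, rng.standard_normal(len(x)))  # noise carrier
        out += env * carr
    return out
```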

5.
A wavelet representation of speech was used to display the instantaneous amplitude and phase within 14 octave frequency bands, representing the envelope and the carrier within each band. Adding stationary noise alters the wavelet pattern, which can be understood as a combination of three simultaneously occurring subeffects: two effects on the wavelet levels (one systematic and one stochastic) and one effect on the wavelet phases. Specific types of signal processing were applied to speech, which allowed each effect to be either included or excluded. The impact of each effect (and of combinations) on speech intelligibility was measured with CVC words. It appeared that the systematic level effect (i.e., the increase of each speech wavelet intensity with the mean noise intensity) has the most degrading effect on speech intelligibility, which is in accordance with measures such as the modulation transfer function and the speech transmission index. However, the introduction of stochastic level fluctuations and the disturbance of the carrier phase also contribute substantially to reduced intelligibility in noise. It is argued that these stochastic effects are responsible for the limited success of spectral subtraction as a means to improve speech intelligibility. Results can provide clues for effective noise suppression with respect to intelligibility.

6.
The intelligibility of sentences processed to remove temporal envelope information, as far as possible, was assessed. Sentences were filtered into N analysis channels, and each channel signal was divided by its Hilbert envelope to remove envelope information but leave temporal fine structure (TFS) intact. Channel signals were combined to give TFS speech. The effect of adding low-level low-noise noise (LNN) to each channel signal before processing was assessed. The addition of LNN reduced the amplification of low-level signal portions that contained large excursions in instantaneous frequency, and improved the intelligibility of simple TFS speech sentences, but not more complex sentences. It also reduced the time needed to reach a stable level of performance. The recovery of envelope cues by peripheral auditory filtering was investigated by measuring the intelligibility of 'recovered-envelope speech', formed by filtering TFS speech with an array of simulated auditory filters, and using the envelopes at the output of these filters to modulate sinusoids with frequencies equal to the filter center frequencies (i.e., tone vocoding). The intelligibility of TFS speech and recovered-envelope speech fell as N increased, although TFS speech was still highly intelligible for values of N for which the intelligibility of recovered-envelope speech was low.
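A minimal sketch of the core TFS-speech operation, dividing each channel signal by its Hilbert envelope; the optional noise floor uses Gaussian noise as a stand-in for the low-noise noise used in the study, and band edges, filter order, and the lnn_level parameter are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def tfs_speech(x, fs, band_edges, lnn_level=0.0, seed=0):
    """Divide each channel signal by its Hilbert envelope to discard envelope
    cues while keeping temporal fine structure; optionally add a low-level
    noise floor to each channel before the division."""
    rng = np.random.default_rng(seed)
    out = np.zeros(len(x))
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)
        if lnn_level > 0:
            floor = lnn_level * np.sqrt(np.mean(band ** 2))
            band = band + floor * sosfiltfilt(sos, rng.standard_normal(len(x)))
        env = np.abs(hilbert(band))
        out += band / np.maximum(env, 1e-12)   # TFS channel: unit envelope
    return out
```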

7.
In this study the perception of the fundamental frequency (F0) of periodic stimuli by cochlear implant users is investigated. A widely used speech processor is the Continuous Interleaved Sampling (CIS) processor, for which the fundamental frequency appears as temporal fluctuations in the envelopes at the output. Three experiments with four users of the LAURA (Registered trade mark of Philips Hearing Implants, now Cochlear Technology Centre Europe) cochlear implant were carried out to examine the influence of the modulation depth of these envelope fluctuations on pitch discrimination. In the first experiment, the subjects were asked to discriminate between two SAM (sinusoidally amplitude modulated) pulse trains on a single electrode channel differing in modulation frequency (Δf = 20%). As expected, the results showed a decrease in the performance for smaller modulation depths. Optimal performance was reached for modulation depths between 20% and 99%, depending on subject, electrode channel, and modulation frequency. In the second experiment, the smallest noticeable difference in F0 of synthetic vowels was measured for three algorithms that differed in the obtained modulation depth at the output: the default CIS strategy, the CIS strategy in which the F0 fluctuations in the envelope were removed (FLAT CIS), and a third CIS strategy, which was especially designed to control and increase the depth of these fluctuations (F0 CIS). In general, performance was poorest for the FLAT CIS strategy, where changes in F0 are only apparent as changes of the average amplitude in the channel outputs. This emphasizes the importance of temporal coding of F0 in the speech envelope for pitch perception. No significantly better results were obtained for the F0 CIS strategy compared to the default CIS strategy, although the latter results in envelope modulation depths at which sub-optimal scores were obtained in some cases of the first experiment. This indicates that less modulation is needed if all channels are stimulated with synchronous F0 fluctuations. This hypothesis is confirmed in a third experiment where subjects performed significantly better in a pitch discrimination task with SAM pulse trains, if three channels were stimulated concurrently, as opposed to only one.

8.
Comodulation masking release (CMR) refers to an improvement in the detection threshold of a signal masked by noise with coherent amplitude fluctuation across frequency, as compared to noise without the envelope coherence. The present study tested whether such an advantage for signal detection would facilitate the identification of speech phonemes. Consonant identification of bandpass speech was measured under the following three masker conditions: (1) a single band of noise in the speech band ("on-frequency" masker); (2) two bands of noise, one in the on-frequency band and the other in the "flanking band," with coherence of temporal envelope fluctuation between the two bands (comodulation); and (3) two bands of noise (on-frequency band and flanking band), without the coherence of the envelopes (noncomodulation). A pilot experiment with a small number of consonant tokens was followed by the main experiment with 12 consonants and the following masking conditions: three frequency locations of the flanking band and two masker levels. Results showed that in all conditions, the comodulation condition provided higher identification scores than the noncomodulation condition, and the difference in score was 3.5% on average. No significant difference was observed between the on-frequency only condition and the comodulation condition, i.e., an "unmasking" effect by the addition of a comodulated flanking band was not observed. The positive effect of CMR on consonant recognition found in the present study endorses a "cued-listening" theory, rather than an envelope correlation theory, as a basis of CMR in a suprathreshold task.
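A minimal sketch of how comodulated versus noncomodulated two-band maskers could be constructed: the comodulated flanking band is given the on-frequency band's envelope. The envelope-transplant approach, bandwidths, and filter settings are illustrative assumptions, not the study's exact stimulus generation.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def two_band_masker(fs, dur, on_fc, flank_fc, bw, comodulated=True, seed=0):
    """On-frequency and flanking noise bands whose temporal envelopes are
    either coherent (comodulated) or independent (noncomodulated)."""
    rng = np.random.default_rng(seed)
    n = int(dur * fs)

    def band(fc, source):
        sos = butter(4, [fc - bw / 2, fc + bw / 2], btype="bandpass", fs=fs, output="sos")
        return sosfiltfilt(sos, source)

    on_band = band(on_fc, rng.standard_normal(n))
    flank = band(flank_fc, rng.standard_normal(n))
    if comodulated:
        # impose the on-frequency band's envelope on the independent flank carrier
        env = np.abs(hilbert(on_band))
        flank = env * flank / np.maximum(np.abs(hilbert(flank)), 1e-12)
    return on_band + flank
```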

9.
A model for predicting the intelligibility of processed noisy speech is proposed. The speech-based envelope power spectrum model has a structure similar to that of the model of Ewert and Dau [(2000). J. Acoust. Soc. Am. 108, 1181-1196], developed to account for modulation detection and masking data. The model estimates the speech-to-noise envelope power ratio, SNR(env), at the output of a modulation filterbank and relates this metric to speech intelligibility using the concept of an ideal observer. Predictions were compared to data on the intelligibility of speech presented in stationary speech-shaped noise. The model was further tested in conditions with noisy speech subjected to reverberation and spectral subtraction. Good agreement between predictions and data was found in all cases. For spectral subtraction, an analysis of the model's internal representation of the stimuli revealed that the predicted decrease of intelligibility was caused by the estimated noise envelope power exceeding that of the speech. The classical concept of the speech transmission index fails in this condition. The results strongly suggest that the signal-to-noise ratio at the output of a modulation frequency selective process provides a key measure of speech intelligibility.
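A simplified, single-channel sketch of the SNR(env) idea: envelope power of the noisy speech relative to the noise alone, per modulation band. The full model uses a gammatone filterbank, a specific modulation filterbank, and an ideal-observer back end; the band layout, resampling rate, and function names below are assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert, resample_poly

def envelope_power(x, fs, fm_lo, fm_hi, fs_env=100):
    """AC-coupled envelope power of x within one modulation band [fm_lo, fm_hi] Hz,
    normalised by the squared mean envelope (DC power)."""
    env = np.abs(hilbert(x))
    env = resample_poly(env, fs_env, int(fs))            # envelope at fs_env Hz
    dc = np.mean(env)
    sos = butter(2, [fm_lo, fm_hi], btype="bandpass", fs=fs_env, output="sos")
    env_ac = sosfiltfilt(sos, env - dc)
    return np.mean(env_ac ** 2) / max(dc ** 2, 1e-12)

def snr_env(noisy_speech, noise_alone, fs,
            mod_bands=((1, 2), (2, 4), (4, 8), (8, 16), (16, 32))):
    """Per-band ratio of noisy-speech envelope power to noise-alone envelope power."""
    return [envelope_power(noisy_speech, fs, lo, hi) /
            max(envelope_power(noise_alone, fs, lo, hi), 1e-12)
            for lo, hi in mod_bands]
```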

10.
A robust feature extraction technique for phoneme recognition is proposed which is based on deriving modulation frequency components from the speech signal. The modulation frequency components are computed from syllable-length segments of sub-band temporal envelopes estimated using frequency domain linear prediction. Although the baseline features provide good performance in clean conditions, the performance degrades significantly in noisy conditions. In this paper, a technique for noise compensation is proposed where an estimate of the noise envelope is subtracted from the noisy speech envelope. The noise compensation technique suppresses the effect of additive noise in speech. The robustness of the proposed features is further enhanced by the gain normalization technique. The normalized temporal envelopes are compressed with static (logarithmic) and dynamic (adaptive loops) compression and are converted into modulation frequency features. These features are used in an automatic phoneme recognition task. Experiments are performed in mismatched train/test conditions where the test data are corrupted with various environmental distortions like telephone channel noise, additive noise, and room reverberation. Experiments are also performed on large amounts of real conversational telephone speech. In these experiments, the proposed features show substantial improvements in phoneme recognition rates compared to other speech analysis techniques. Furthermore, the contribution of various processing stages for robust speech signal representation is analyzed.

11.
Many hearing-impaired listeners suffer from distorted auditory processing capabilities. This study examines which aspects of auditory coding (i.e., intensity, time, or frequency) are distorted and how this affects speech perception. The distortion-sensitivity model is used: The effect of distorted auditory coding of a speech signal is simulated by an artificial distortion, and the sensitivity of speech intelligibility to this artificial distortion is compared for normal-hearing and hearing-impaired listeners. Stimuli (speech plus noise) are wavelet coded using a complex sinusoidal carrier with a Gaussian envelope (1/4 octave bandwidth). Intensity information is distorted by multiplying the modulus of each wavelet coefficient by a random factor. Temporal and spectral information are distorted by randomly shifting the wavelet positions along the temporal or spectral axis, respectively. Measured were (1) detection thresholds for each type of distortion, and (2) speech-reception thresholds for various degrees of distortion. For spectral distortion, hearing-impaired listeners showed increased detection thresholds and were also less sensitive to the distortion with respect to speech perception. For intensity and temporal distortion, this was not observed. Results indicate that a distorted coding of spectral information may be an important factor underlying reduced speech intelligibility for the hearing impaired.

12.
CAA broadband noise prediction for aeroacoustic design
The current status of a computational aeroacoustics (CAA) approach to simulate broadband noise is reviewed. The method rests on the use of steady Reynolds averaged Navier-Stokes (RANS) simulation to describe the time-averaged motion of turbulent flow. By means of synthetic turbulence the steady one-point statistics (e.g. turbulence kinetic energy) and turbulent length- and time-scales of RANS are translated into fluctuations having statistics that very accurately reproduce the initial RANS target-setting. The synthetic fluctuations are used to prescribe sound sources which drive linear perturbation equations. The whole approach represents a methodology to solve statistical noise theory with state-of-the-art CAA tools in the time-domain. A brief overview of the synthetic turbulence model and its numerical discretization in terms of the random particle-mesh (RPM) and fast random particle-mesh (FRPM) method is given. Results are presented for trailing-edge noise, slat noise, and jet noise. Some problems related to the formulation of vortex sound sources are discussed.

13.
This study tested a time-domain spectral enhancement algorithm that was recently proposed by Turicchia and Sarpeshkar [IEEE Trans. Speech Audio Proc. 13, 243-253 (2005)]. The algorithm uses a filter bank, with each filter channel comprising broadly tuned amplitude compression, followed by more narrowly tuned expansion (companding). Normal-hearing listeners were tested in their ability to recognize sentences processed through a noise-excited envelope vocoder that simulates aspects of cochlear-implant processing. The sentences were presented in a steady background noise at signal-to-noise ratios of 0, 3, and 6 dB and were either passed directly through an envelope vocoder, or were first processed by the companding algorithm. Using an eight-channel envelope vocoder, companding produced small but significant improvements in speech reception. Parametric variations of the companding algorithm showed that the improvement in intelligibility was robust to changes in filter tuning, whereas decreases in the time constants resulted in a decrease in intelligibility. Companding continued to provide a benefit when the number of vocoder frequency channels was increased to sixteen. When integrated within a sixteen-channel cochlear-implant simulator, companding also led to significant improvements in sentence recognition. Thus, companding may represent a readily implementable way to provide some speech recognition benefits to current cochlear-implant users.
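A simplified sketch of one companding channel in the spirit described here: broad band-pass filtering with envelope compression, then narrower band-pass filtering with envelope expansion. The filter orders, exponents, time constant, and the one-pole envelope follower are illustrative assumptions and not the published architecture's exact parameters.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, lfilter

def compand_channel(x, fs, fc, broad_bw, narrow_bw, n_comp=0.3, tau=0.005):
    """One companding-style channel: broadly tuned compression followed by
    more narrowly tuned expansion around centre frequency fc."""
    def bandpass(sig, bw):
        lo = max(fc - bw / 2, 1.0)
        sos = butter(2, [lo, fc + bw / 2], btype="bandpass", fs=fs, output="sos")
        return sosfiltfilt(sos, sig)

    def envelope(sig):
        # one-pole rectify-and-smooth envelope follower with time constant tau
        a = np.exp(-1.0 / (tau * fs))
        return np.maximum(lfilter([1 - a], [1, -a], np.abs(sig)), 1e-9)

    broad = bandpass(x, broad_bw)
    compressed = broad * envelope(broad) ** (n_comp - 1.0)          # compress
    narrow = bandpass(compressed, narrow_bw)
    expanded = narrow * envelope(narrow) ** (1.0 / n_comp - 1.0)    # expand
    return expanded
```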

14.
This experiment examined the effects of spectral resolution and fine spectral structure on recognition of spectrally asynchronous sentences by normal-hearing and cochlear implant listeners. Sentence recognition was measured in six normal-hearing subjects listening to either full-spectrum or noise-band processors and five Nucleus-22 cochlear implant listeners fitted with 4-channel continuous interleaved sampling (CIS) processors. For the full-spectrum processor, the speech signals were divided into either 4 or 16 channels. For the noise-band processor, after band-pass filtering into 4 or 16 channels, the envelope of each channel was extracted and used to modulate noise of the same bandwidth as the analysis band, thus eliminating the fine spectral structure available in the full-spectrum processor. For the 4-channel CIS processor, the amplitude envelopes extracted from four bands were transformed to electric currents by a power function and the resulting electric currents were used to modulate pulse trains delivered to four electrode pairs. For all processors, the output of each channel was time-shifted relative to other channels, varying the channel delay across channels from 0 to 240 ms (in 40-ms steps). Within each delay condition, all channels were desynchronized such that the cross-channel delays between adjacent channels were maximized, thereby avoiding local pockets of channel synchrony. Results show no significant difference between the 4- and 16-channel full-spectrum speech processor for normal-hearing listeners. Recognition scores dropped significantly only when the maximum delay reached 200 ms for the 4-channel processor and 240 ms for the 16-channel processor. When fine spectral structures were removed in the noise-band processor, sentence recognition dropped significantly when the maximum delay was 160 ms for the 16-channel noise-band processor and 40 ms for the 4-channel noise-band processor. There was no significant difference between implant listeners using the 4-channel CIS processor and normal-hearing listeners using the 4-channel noise-band processor. The results imply that when fine spectral structures are not available, as in the implant listener's case, increased spectral resolution is important for overcoming cross-channel asynchrony in speech signals.
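A rough sketch of cross-channel desynchronization: split the signal into bands and delay each band by a different multiple of a step size, assigning delays so that adjacent channels do not receive neighbouring values. The interleaved delay assignment, step size, and filter settings are illustrative assumptions, not the study's exact scheme.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def desynchronize_channels(x, fs, band_edges, step_ms=40.0):
    """Delay each analysis band by a different multiple of step_ms, with an
    interleaved assignment so adjacent channels get well-separated delays."""
    n_ch = len(band_edges) - 1
    delays = np.arange(n_ch) * step_ms / 1000.0            # seconds
    # interleave low and high delay indices, e.g. 4 channels -> [0, 2, 1, 3]
    order = np.concatenate([np.arange(0, n_ch, 2), np.arange(1, n_ch, 2)])
    pad = int(round(delays.max() * fs))
    out = np.zeros(len(x) + pad)
    for ch, (lo, hi) in enumerate(zip(band_edges[:-1], band_edges[1:])):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)
        d = int(round(delays[order[ch]] * fs))
        out[d:d + len(x)] += band
    return out
```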

15.
Background noise reduces the depth of the low-frequency envelope modulations known to be important for speech intelligibility. The relative strength of the target and masker envelope modulations can be quantified using a modulation signal-to-noise ratio, (S/N)(mod), measure. Such a measure can be used in noise-suppression algorithms to extract target-relevant modulations from the corrupted (target + masker) envelopes for potential improvement in speech intelligibility. In the present study, envelopes are decomposed in the modulation spectral domain into a number of channels spanning the range of 0-30 Hz. Target-dominant modulations are identified and retained in each channel based on the (S/N)(mod) selection criterion, while modulations which potentially interfere with perception of the target (i.e., those dominated by the masker) are discarded. The impact of modulation-selective processing on the speech-reception threshold for sentences in noise is assessed with normal-hearing listeners. Results indicate that the intelligibility of noise-masked speech can be improved by as much as 13 dB when preserving target-dominant modulations, present up to a modulation frequency of 18 Hz, while discarding masker-dominant modulations from the mixture envelopes.
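A minimal sketch of (S/N)(mod)-based modulation selection for a single envelope, assuming access to the separate target and masker envelopes (an oracle criterion). The modulation-band layout, criterion value, DC handling, and envelope sample rate are assumptions; the study applies such processing per auditory channel.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

MOD_BANDS = [(0.5, 4), (4, 8), (8, 12), (12, 18), (18, 24), (24, 30)]  # Hz, illustrative

def select_modulations(env_mix, env_target, env_masker, fs_env=200.0, criterion_db=0.0):
    """Keep modulation bands of the mixture envelope whose (S/N)mod exceeds the
    criterion; discard masker-dominated bands. Envelopes are float arrays
    sampled at fs_env Hz (a few hundred Hz)."""
    kept = np.full_like(env_mix, np.mean(env_mix))          # retain the DC term
    for lo, hi in MOD_BANDS:
        sos = butter(2, [lo, hi], btype="bandpass", fs=fs_env, output="sos")
        t_band = sosfiltfilt(sos, env_target)
        m_band = sosfiltfilt(sos, env_masker)
        snr_mod = 10 * np.log10(np.mean(t_band ** 2) / max(np.mean(m_band ** 2), 1e-12))
        if snr_mod > criterion_db:
            kept += sosfiltfilt(sos, env_mix)               # target-dominated: keep
    return np.maximum(kept, 0.0)
```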

16.
This study tested the hypothesis that a detection advantage for gaps in comodulated noise relative to random noise can be demonstrated in conditions of continuous noise and salient envelope fluctuations. Experiment 1 used five 25-Hz-wide bands of Gaussian noise, low-fluctuation noise, and a noise with increased salience of the inherent fluctuations (staccato noise). The bands were centered at 444, 667, 1000, 1500, and 2250 Hz, with the gap signal always inserted in the 1000-Hz band. Results indicated that a gap detection advantage existed in continuous comodulated noise only for Gaussian and staccato noise. Experiment 2 demonstrated that the advantage did not exist for gated presentation. This experiment also showed that the advantage bore some similarity to comodulation masking release. However, differences were also noted in terms of the effects of the number of flanking bands and the absence of a detection advantage in gated conditions. The detrimental effect of a gated flanking band was less pronounced for a comodulated band than for a random band. This study indicates that, under some conditions, a detection advantage for gaps carried by a narrow band of noise can occur in the presence of comodulated flanking bands of noise.

17.
The present study investigated the effect of envelope modulations in a background masker on consonant recognition by normal hearing listeners. It is well known that listeners understand speech better under a temporally modulated masker than under a steady masker at the same level, due to masking release. The possibility of an opposite phenomenon, modulation interference, whereby speech recognition could be degraded by a modulated masker due to interference with auditory processing of the speech envelope, was hypothesized and tested under various speech and masker conditions. It was of interest whether modulation interference for speech perception, if it were observed, could be predicted by modulation masking, as found in psychoacoustic studies using nonspeech stimuli. Results revealed that masking release measurably occurred under a variety of conditions, especially when the speech signal maintained a high degree of redundancy across several frequency bands. Modulation interference was also clearly observed under several circumstances when the speech signal did not contain a high redundancy. However, the effect of modulation interference did not follow the expected pattern from psychoacoustic modulation masking results. In conclusion, (1) both factors, modulation interference and masking release, should be accounted for whenever a background masker contains temporal fluctuations, and (2) caution needs to be taken when psychoacoustic theory on modulation masking is applied to speech recognition.

18.
Speech waveform envelope cues for consonant recognition
This study investigated the cues for consonant recognition that are available in the time-intensity envelope of speech. Twelve normal-hearing subjects listened to three sets of spectrally identical noise stimuli created by multiplying noise with the speech envelopes of 19 /aCa/ natural-speech nonsense syllables. The speech envelope for each of the three noise conditions was derived using a different low-pass filter cutoff (20, 200, and 2000 Hz). Average consonant identification performance was above chance for the three noise conditions and improved significantly with the increase in envelope bandwidth from 20 to 200 Hz. SINDSCAL multidimensional scaling analysis of the consonant confusions data identified three speech envelope features that divided the 19 consonants into four envelope feature groups ("envemes"). The enveme groups in combination with visually distinctive speech feature groupings ("visemes") can distinguish most of the 19 consonants. These results suggest that near-perfect consonant identification performance could be attained by subjects who receive only enveme and viseme information and no spectral information.
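A minimal sketch of the stimulus construction described here: wideband noise multiplied by the low-pass-filtered speech envelope, so the result carries only envelope cues up to the chosen cutoff. The 200 Hz cutoff, filter order, and sampling details are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def envelope_noise_stimulus(speech, fs, env_cutoff=200.0, seed=0):
    """Multiply wideband noise by the low-pass-filtered speech envelope,
    giving a spectrally flat noise stimulus that carries only envelope cues
    up to env_cutoff Hz (20, 200, or 2000 Hz in the study)."""
    rng = np.random.default_rng(seed)
    env = np.abs(hilbert(speech))
    sos = butter(4, env_cutoff, btype="low", fs=fs, output="sos")
    env = np.maximum(sosfiltfilt(sos, env), 0.0)
    noise = rng.standard_normal(len(speech))
    return env * noise
```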

19.
This paper focuses on masking speech with meaningless steady noise as a way of realizing a comfortable sound environment. As a basis for this research, the minimum sound pressure levels of meaningless steady noise required to mask meaningful male or female speech were determined in psychological experiments using a method of adjustment. The results show that band-limited pink noise is the most effective noise for masking speech. For speech at a lower sound pressure level, the sound pressure level of the meaningless steady noise needs to be set a little higher.

20.
Recent simulations of continuous interleaved sampling (CIS) cochlear implant speech processors have used acoustic stimulation that provides only weak cues to pitch, periodicity, and aperiodicity, although these are regarded as important perceptual factors of speech. Four-channel vocoders simulating CIS processors have been constructed, in which the salience of speech-derived periodicity and pitch information was manipulated. The highest salience of pitch and periodicity was provided by an explicit encoding, using a pulse carrier following fundamental frequency for voiced speech, and a noise carrier during voiceless speech. Other processors included noise-excited vocoders with envelope cutoff frequencies of 32 and 400 Hz. The use of a pulse carrier following fundamental frequency gave substantially higher performance in identification of frequency glides than did vocoders using envelope-modulated noise carriers. The perception of consonant voicing information was improved by processors that preserved periodicity, and connected discourse tracking rates were slightly faster with noise carriers modulated by envelopes with a cutoff frequency of 400 Hz compared to 32 Hz. However, consonant and vowel identification, sentence intelligibility, and connected discourse tracking rates were generally similar through all of the processors. For these speech tasks, pitch and periodicity beyond the weak information available from 400 Hz envelope-modulated noise did not contribute substantially to performance.
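A minimal sketch of an explicit periodicity encoding of the kind described here: a pulse carrier that follows the F0 track during voicing and a noise carrier elsewhere, band-filtered and modulated by each channel's envelope. The voicing decision (f0 == 0), noise level, filter layout, and processing order are illustrative assumptions, not the published processors.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def carrier_from_f0(f0, fs, noise_level=0.2, seed=0):
    """Pulse train whose rate follows the per-sample F0 track (Hz) during
    voiced speech, and wideband noise during unvoiced portions (f0 == 0)."""
    phase = np.cumsum(np.where(f0 > 0, f0, 0.0)) / fs          # cycles
    pulses = (np.diff(np.floor(phase), prepend=0.0) > 0).astype(float)
    noise = np.random.default_rng(seed).standard_normal(len(f0))
    return np.where(f0 > 0, pulses, noise_level * noise)

def f0_vocoder(x, f0, fs, band_edges):
    """Vocoder in which each band's envelope modulates the shared
    F0-following carrier, re-filtered into that band."""
    carrier = carrier_from_f0(f0, fs)
    out = np.zeros(len(x))
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        env = np.abs(hilbert(sosfiltfilt(sos, x)))
        out += env * sosfiltfilt(sos, carrier)
    return out
```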

