Related Articles
1.
Background noise reduces the depth of the low-frequency envelope modulations known to be important for speech intelligibility. The relative strength of the target and masker envelope modulations can be quantified using a modulation signal-to-noise ratio ((S/N)_mod) measure. Such a measure can be used in noise-suppression algorithms to extract target-relevant modulations from the corrupted (target + masker) envelopes for potential improvement in speech intelligibility. In the present study, envelopes are decomposed in the modulation spectral domain into a number of channels spanning the range of 0-30 Hz. Target-dominant modulations are identified and retained in each channel based on the (S/N)_mod selection criterion, while modulations which potentially interfere with perception of the target (i.e., those dominated by the masker) are discarded. The impact of modulation-selective processing on the speech-reception threshold for sentences in noise is assessed with normal-hearing listeners. Results indicate that the intelligibility of noise-masked speech can be improved by as much as 13 dB when preserving target-dominant modulations, present up to a modulation frequency of 18 Hz, while discarding masker-dominant modulations from the mixture envelopes.
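The (S/N)_mod selection idea above can be sketched roughly as follows. This is a minimal illustration, not the authors' algorithm: it assumes FFT-based modulation bands, oracle access to separate target and masker envelopes, and a 0 dB selection threshold; all function and parameter names are invented for the example.

```python
import numpy as np

def snr_mod_select(target_env, masker_env, fs_env=100, fmax=30.0, n_bands=6,
                   snr_thresh_db=0.0):
    """Sketch of modulation-domain selection: decompose the mixture envelope
    into modulation bands spanning 0..fmax Hz and keep only the bands where
    the target's modulation energy dominates the masker's ((S/N)_mod rule)."""
    n = len(target_env)
    mix_env = target_env + masker_env          # corrupted (target + masker) envelope
    freqs = np.fft.rfftfreq(n, d=1.0 / fs_env)
    T, M, X = (np.fft.rfft(e) for e in (target_env, masker_env, mix_env))
    edges = np.linspace(0.0, fmax, n_bands + 1)
    keep = np.zeros_like(X)
    kept_bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = (freqs >= lo) & (freqs < hi)
        s = np.sum(np.abs(T[band]) ** 2)       # target modulation energy in band
        m = np.sum(np.abs(M[band]) ** 2) + 1e-12
        if 10.0 * np.log10(s / m) >= snr_thresh_db:   # target-dominant: retain
            keep[band] = X[band]
            kept_bands.append((lo, hi))
    return np.fft.irfft(keep, n), kept_bands
```

With a target envelope modulated at 4 Hz and a masker modulated at 25 Hz, the 0-5 Hz band survives while the masker-dominated band is discarded.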

2.
Recent simulations of continuous interleaved sampling (CIS) cochlear implant speech processors have used acoustic stimulation that provides only weak cues to pitch, periodicity, and aperiodicity, although these are regarded as important perceptual factors of speech. Four-channel vocoders simulating CIS processors have been constructed, in which the salience of speech-derived periodicity and pitch information was manipulated. The highest salience of pitch and periodicity was provided by an explicit encoding, using a pulse carrier following fundamental frequency for voiced speech, and a noise carrier during voiceless speech. Other processors included noise-excited vocoders with envelope cutoff frequencies of 32 and 400 Hz. The use of a pulse carrier following fundamental frequency gave substantially higher performance in identification of frequency glides than did vocoders using envelope-modulated noise carriers. The perception of consonant voicing information was improved by processors that preserved periodicity, and connected discourse tracking rates were slightly faster with noise carriers modulated by envelopes with a cutoff frequency of 400 Hz compared to 32 Hz. However, consonant and vowel identification, sentence intelligibility, and connected discourse tracking rates were generally similar through all of the processors. For these speech tasks, pitch and periodicity beyond the weak information available from 400 Hz envelope-modulated noise did not contribute substantially to performance.
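A single channel of a noise-excited vocoder of the kind these simulations use can be sketched as below. This is a simplified stand-in: half-wave rectification plus a boxcar moving average approximates the envelope low-pass filter (real processors typically use Butterworth filters), and the function name and interface are invented.

```python
import numpy as np

def vocode_channel(band_signal, fs, env_cutoff_hz, rng=None):
    """One channel of a noise-excited vocoder: half-wave rectify, low-pass the
    envelope (a boxcar of roughly 1/cutoff seconds stands in for the usual
    low-pass filter), then modulate a white-noise carrier with that envelope."""
    rng = np.random.default_rng(0) if rng is None else rng
    rectified = np.maximum(band_signal, 0.0)
    win = max(1, int(fs / env_cutoff_hz))
    envelope = np.convolve(rectified, np.ones(win) / win, mode="same")
    carrier = rng.standard_normal(len(band_signal))
    out = envelope * carrier
    # Match the RMS of the input band so channels recombine at equal level.
    rms_in = np.sqrt(np.mean(band_signal ** 2))
    rms_out = np.sqrt(np.mean(out ** 2)) + 1e-12
    return out * (rms_in / rms_out), envelope
```

Raising `env_cutoff_hz` from 32 to 400 Hz shortens the smoothing window, letting periodicity fluctuations near the fundamental frequency survive in the envelope.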

3.
The present study sought to establish whether speech recognition can be disrupted by the presence of amplitude modulation (AM) at a remote spectral region, and whether that disruption depends upon the rate of AM. The goal was to determine whether this paradigm could be used to examine which modulation frequencies in the speech envelope are most important for speech recognition. Consonant identification for a band of speech located in either the low- or high-frequency region was measured in the presence of a band of noise located in the opposite frequency region. The noise was either unmodulated or amplitude modulated by a sinusoid, a band of noise with a fixed absolute bandwidth, or a band of noise with a fixed relative bandwidth. The frequency of the modulator was 4, 16, 32, or 64 Hz. Small amounts of modulation interference were observed for all modulator types, irrespective of the location of the speech band. More important, the interference depended on modulation frequency, clearly supporting the existence of selectivity of modulation interference with speech stimuli. Overall, the results suggest a primary role of envelope fluctuations around 4 and 16 Hz without excluding the possibility of a contribution by faster rates.

4.
Standard continuous interleaved sampling processing, and a modified processing strategy designed to enhance temporal cues to voice pitch, were compared on tests of intonation perception, and vowel perception, both in implant users and in acoustic simulations. In standard processing, 400 Hz low-pass envelopes modulated either pulse trains (implant users) or noise carriers (simulations). In the modified strategy, slow-rate envelope modulations, which convey dynamic spectral variation crucial for speech understanding, were extracted by low-pass filtering (32 Hz). In addition, during voiced speech, higher-rate temporal modulation in each channel was provided by 100% amplitude-modulation by a sawtooth-like waveform whose periodicity followed the fundamental frequency (F0) of the input. Channel levels were determined by the product of the lower- and higher-rate modulation components. Both in acoustic simulations and in implant users, the ability to use intonation information to identify sentences as questions or statements was significantly better with modified processing. However, while there was no difference in vowel recognition in the acoustic simulation, implant users performed worse with modified processing both in vowel recognition and in formant frequency discrimination. It appears that, while enhancing pitch perception, modified processing harmed the transmission of spectral information.

5.
Speech waveform envelope cues for consonant recognition
This study investigated the cues for consonant recognition that are available in the time-intensity envelope of speech. Twelve normal-hearing subjects listened to three sets of spectrally identical noise stimuli created by multiplying noise with the speech envelopes of 19 /aCa/ natural-speech nonsense syllables. The speech envelope for each of the three noise conditions was derived using a different low-pass filter cutoff (20, 200, and 2000 Hz). Average consonant identification performance was above chance for the three noise conditions and improved significantly with the increase in envelope bandwidth from 20 to 200 Hz. SINDSCAL multidimensional scaling analysis of the consonant confusions data identified three speech envelope features that divided the 19 consonants into four envelope feature groups ("envemes"). The enveme groups in combination with visually distinctive speech feature groupings ("visemes") can distinguish most of the 19 consonants. These results suggest that near-perfect consonant identification performance could be attained by subjects who receive only enveme and viseme information and no spectral information.

6.
A robust feature extraction technique for phoneme recognition is proposed which is based on deriving modulation frequency components from the speech signal. The modulation frequency components are computed from syllable-length segments of sub-band temporal envelopes estimated using frequency domain linear prediction. Although the baseline features provide good performance in clean conditions, the performance degrades significantly in noisy conditions. In this paper, a technique for noise compensation is proposed where an estimate of the noise envelope is subtracted from the noisy speech envelope. The noise compensation technique suppresses the effect of additive noise in speech. The robustness of the proposed features is further enhanced by the gain normalization technique. The normalized temporal envelopes are compressed with static (logarithmic) and dynamic (adaptive loops) compression and are converted into modulation frequency features. These features are used in an automatic phoneme recognition task. Experiments are performed in mismatched train/test conditions where the test data are corrupted with various environmental distortions like telephone channel noise, additive noise, and room reverberation. Experiments are also performed on large amounts of real conversational telephone speech. In these experiments, the proposed features show substantial improvements in phoneme recognition rates compared to other speech analysis techniques. Furthermore, the contribution of various processing stages for robust speech signal representation is analyzed.
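The envelope-subtraction noise compensation described above can be caricatured in a few lines. This is a loose sketch under strong assumptions (a known noise-envelope estimate, spectral-subtraction-style flooring, peak gain normalization), not the paper's FDLP-based method; the function name is invented.

```python
import numpy as np

def compensate_envelope(noisy_env, noise_env_estimate, floor=1e-3):
    """Noise compensation sketch: subtract an estimate of the additive-noise
    envelope from the noisy sub-band envelope, floor the result to keep it
    positive, then gain-normalize so overall level differences are removed."""
    clean_est = np.maximum(noisy_env - noise_env_estimate, floor)
    return clean_est / np.max(clean_est)        # gain normalization
```

In the real pipeline this step precedes logarithmic/adaptive compression and conversion of the compensated envelope to modulation-frequency features.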

7.
The role of different modulation frequencies in the speech envelope was studied by means of the manipulation of vowel-consonant-vowel (VCV) syllables. The envelope of the signal was extracted from the speech and the fine structure was replaced by speech-shaped noise. The temporal envelopes in every critical band of the speech signal were notch filtered in order to assess the relative importance of different modulation frequency regions between 0 and 20 Hz. For this purpose notch filters around three center frequencies (8, 12, and 16 Hz) with three notch widths (4, 8, and 12 Hz) were used. These stimuli were used in a consonant-recognition task in which ten normal-hearing subjects participated, and their results were analyzed in terms of recognition scores. More qualitative information was obtained with a multidimensional scaling method (INDSCAL) and sequential information analysis (SINFA). Consonant recognition is very robust for the removal of certain modulation frequency areas. Only when a wide notch around 8 Hz is applied does the speech signal become heavily degraded. As expected, the voicing information is lost, while there are different effects on plosiveness and nasality. Even the smallest filtering has a substantial effect on the transfer of the plosiveness feature, while on the other hand, filtering out only the low-modulation frequencies has a substantial effect on the transfer of nasality cues.
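A frequency-domain stand-in for the modulation notch filtering can be sketched as follows; zeroing FFT bins is a crude approximation to the actual notch filters applied per critical band, and the parameter names are invented.

```python
import numpy as np

def notch_envelope(envelope, fs_env, center_hz, width_hz):
    """Remove a band of modulation frequencies from a temporal envelope by
    zeroing FFT bins in a notch of width_hz centered on center_hz."""
    spec = np.fft.rfft(envelope)
    freqs = np.fft.rfftfreq(len(envelope), d=1.0 / fs_env)
    lo, hi = center_hz - width_hz / 2.0, center_hz + width_hz / 2.0
    spec[(freqs >= lo) & (freqs <= hi)] = 0.0
    return np.fft.irfft(spec, len(envelope))
```

Applied per critical band, the notched envelopes would then remodulate speech-shaped noise to produce the test stimuli.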

8.
Listeners often only have fragments of speech available to understand the intended message due to competing background noise. In order to maximize successful speech recognition, listeners must allocate their perceptual resources to the most informative acoustic properties. The speech signal contains temporally-varying acoustics in the envelope and fine structure that are present across the frequency spectrum. Understanding how listeners perceptually weigh these acoustic properties in different frequency regions during interrupted speech is essential for the design of assistive listening devices. This study measured the perceptual weighting of young normal-hearing listeners for the envelope and fine structure in each of three frequency bands for interrupted sentence materials. Perceptual weights were obtained during interruption at the syllabic rate (i.e., 4 Hz) and the periodic rate (i.e., 128 Hz) of speech. Potential interruption interactions with fundamental frequency information were investigated by shifting the natural pitch contour higher relative to the interruption rate. The availability of each acoustic property was varied independently by adding noise at different levels. Perceptual weights were determined by correlating a listener's performance with the availability of each acoustic property on a trial-by-trial basis. Results demonstrated similar relative weights across the interruption conditions, with emphasis on the envelope in the high frequencies.

9.
This experiment examined the effects of spectral resolution and fine spectral structure on recognition of spectrally asynchronous sentences by normal-hearing and cochlear implant listeners. Sentence recognition was measured in six normal-hearing subjects listening to either full-spectrum or noise-band processors and five Nucleus-22 cochlear implant listeners fitted with 4-channel continuous interleaved sampling (CIS) processors. For the full-spectrum processor, the speech signals were divided into either 4 or 16 channels. For the noise-band processor, after band-pass filtering into 4 or 16 channels, the envelope of each channel was extracted and used to modulate noise of the same bandwidth as the analysis band, thus eliminating the fine spectral structure available in the full-spectrum processor. For the 4-channel CIS processor, the amplitude envelopes extracted from four bands were transformed to electric currents by a power function and the resulting electric currents were used to modulate pulse trains delivered to four electrode pairs. For all processors, the output of each channel was time-shifted relative to other channels, varying the channel delay across channels from 0 to 240 ms (in 40-ms steps). Within each delay condition, all channels were desynchronized such that the cross-channel delays between adjacent channels were maximized, thereby avoiding local pockets of channel synchrony. Results show no significant difference between the 4- and 16-channel full-spectrum speech processor for normal-hearing listeners. Recognition scores dropped significantly only when the maximum delay reached 200 ms for the 4-channel processor and 240 ms for the 16-channel processor. When fine spectral structures were removed in the noise-band processor, sentence recognition dropped significantly when the maximum delay was 160 ms for the 16-channel noise-band processor and 40 ms for the 4-channel noise-band processor. There was no significant difference between implant listeners using the 4-channel CIS processor and normal-hearing listeners using the 4-channel noise-band processor. The results imply that when fine spectral structures are not available, as in the implant listener's case, increased spectral resolution is important for overcoming cross-channel asynchrony in speech signals.

10.
This article presents the results of two experiments investigating performance on a monaural envelope correlation discrimination task. Subjects were asked to discriminate pairs of noise bands that had identical envelopes (referred to as correlated stimuli) from pairs of noise bands that had envelopes which were independent (uncorrelated stimuli). In the first experiment, a number of stimulus parameters were varied: the center frequency of the lower frequency noise band in a pair, f1; the frequency separation between component noise bands; the duration of the stimuli; and the bandwidth of the component noise bands. For a long stimulus duration (500 ms) and a relatively wide bandwidth (100 Hz), subjects could easily discriminate correlated from uncorrelated stimuli for a wide range of frequency separations between the component noise bands. This was true both when f1 was 350 Hz, and when f1 was 2500 Hz. In each case, narrowing the bandwidth to 25 Hz, or shortening the duration to 100 ms, or both, made the task more difficult, but not impossible. In the second experiment, the level of the higher frequency noise band in a pair was varied. Performance did not decrease monotonically as the level of this band was decreased below the level of the other band, and only showed marked impairment when the level of the higher frequency band was at least 60 dB below that of the lower frequency band. The pattern of results in these two experiments is different from that which is obtained when the same stimulus parameters are varied in experiments investigating comodulation masking release (CMR). This suggests that the mechanisms underlying CMR and those underlying the discrimination of envelope correlation are not identical.
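Envelope correlation between two bands can be computed as below, using an FFT-based analytic signal so no SciPy is needed. The Pearson-style (mean-removed) normalization is one common choice, not necessarily the one used in the study, and the function name is invented.

```python
import numpy as np

def envelope_correlation(x, y):
    """Normalized correlation between the Hilbert envelopes of two signals,
    with the analytic signal computed via the FFT (one-sided spectrum)."""
    def env(s):
        n = len(s)
        h = np.zeros(n)                 # weights selecting the analytic spectrum
        h[0] = 1.0
        if n % 2 == 0:
            h[n // 2] = 1.0
            h[1:n // 2] = 2.0
        else:
            h[1:(n + 1) // 2] = 2.0
        return np.abs(np.fft.ifft(np.fft.fft(s) * h))
    a, b = env(x), env(y)
    a = a - a.mean()
    b = b - b.mean()
    return float(np.sum(a * b) / np.sqrt(np.sum(a ** 2) * np.sum(b ** 2)))
```

Two carriers (e.g., near 350 and 2500 Hz, as in the experiment) modulated by the same envelope score near +1; independent or opposed envelopes score near zero or below.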

11.
A composite auditory model for processing speech sounds
A composite inner-ear model, containing the middle ear, basilar membrane (BM), hair cells, and hair-cell/nerve-fiber synapses, is presented. The model incorporates either a linear-BM stage or a nonlinear one. The model with the nonlinear BM generally shows a high degree of success in reproducing the qualitative aspects of experimentally recorded cat auditory-nerve-fiber responses to speech. In modeling fiber population responses to speech and speech in noise, it was found that the BM nonlinearity allows bands of fibers in the model to synchronize strongly to a common spectral peak in the stimulus. A cross-channel correlation algorithm has been devised to further process the model's population outputs. With output from the nonlinear-BM model, the cross-channel correlation values are appreciably reduced only at those channels whose CFs coincide with the formant frequencies. This observation also holds, to a large extent, for noisy speech.

12.
These experiments were designed to examine the mechanism of detection of phase disparity in the envelopes of two sinusoidally amplitude-modulated (AM) sinusoids. Specifically, they were performed to determine whether detection of envelope phase disparity was consistent with processing within a single channel in which the AM tones were simply added. In the first condition, with an 8-Hz modulation frequency, phase-disparity thresholds increased sharply with an initial increase in separation of the carrier frequencies. They then remained approximately constant when the separation was an octave or above. In the second condition, with carrier pairs of 1 and 2 kHz or 1 and 3.2 kHz and a modulation frequency of 8 Hz, thresholds were little affected as the level of one carrier was decreased relative to the other. With a modulation frequency of 128 Hz, for most subjects there was more of an effect of level disparity on thresholds. In the third condition, when the modulation frequency was 8 Hz, subjects showed relatively constant thresholds whether the signals were presented monotically, dichotically, or dichotically with low- and high-pass noise. Dichotic thresholds were typically higher than monotic when the modulation frequency was 128 Hz. These results suggest that it is not necessary to have information available within a single additive channel to detect envelope phase disparity. In certain circumstances, a comparison across channels may be used to detect such disparities.

13.
蒋斌, 匡正, 吴鸣, 杨军. 《声学学报》 2012, 37(6): 659-666
This experiment investigated the effect of frame length on the intelligibility of segment-reversed (locally time-reversed) Mandarin speech. The results show that for frame lengths below 64 ms, segment-reversed Mandarin speech remains highly intelligible; for frame lengths between 64 and 203 ms, intelligibility decreases gradually as the frame length increases; and above 203 ms, intelligibility falls to zero. At a frame length of 8 ms, distortion of the Mandarin lexical tones reduces intelligibility. Analysis of the modulation spectra of the original and segment-reversed speech shows that the amount of modulation-spectrum distortion is closely related to intelligibility. Accordingly, the normalized correlation between the narrowband envelopes of the original and segment-reversed speech can quantify the modulation-spectrum distortion, and objective scores computed with the speech-based Speech Transmission Index method correlate significantly with the experimental results (r = 0.876, p < 0.01). The study indicates that speech intelligibility depends on the narrowband envelopes: the intelligibility of segment-reversed speech is closely tied to how well the narrowband envelopes of the original signal are preserved.
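Locally time-reversed ("segment-reversed") speech is straightforward to construct: cut the signal into fixed-length frames and flip each frame in time. A sketch, with the frame length in milliseconds as in the experiment (the function name is invented):

```python
import numpy as np

def segment_reverse(signal, fs, frame_ms):
    """Locally time-reverse a signal: split it into frames of frame_ms
    milliseconds and reverse each frame in time (the final partial frame,
    if any, is reversed as well)."""
    flen = max(1, int(fs * frame_ms / 1000.0))
    out = signal.copy()
    for start in range(0, len(signal), flen):
        out[start:start + flen] = signal[start:start + flen][::-1]
    return out
```

Applying the operation twice with the same frame length restores the original signal, which is a convenient sanity check.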

14.
Both dyslexics and auditory neuropathy (AN) subjects show inferior consonant-vowel (CV) perception in noise, relative to controls. To better understand these impairments, natural acoustic speech stimuli that were masked in speech-shaped noise at various intensities were presented to dyslexic, AN, and control subjects either in isolation or accompanied by visual articulatory cues. AN subjects were expected to benefit from the pairing of visual articulatory cues and auditory CV stimuli, provided that their speech perception impairment reflects a relatively peripheral auditory disorder. Assuming that dyslexia reflects a general impairment of speech processing rather than a disorder of audition, dyslexics were not expected to similarly benefit from an introduction of visual articulatory cues. The results revealed an increased effect of noise masking on the perception of isolated acoustic stimuli by both dyslexic and AN subjects. More importantly, dyslexics showed less effective use of visual articulatory cues in identifying masked speech stimuli and lower visual baseline performance relative to AN subjects and controls. Last, a significant positive correlation was found between reading ability and the ameliorating effect of visual articulatory cues on speech perception in noise. These results suggest that some reading impairments may stem from a central deficit of speech processing.

15.
In phonemic restoration, intelligibility of interrupted speech is enhanced when noise fills the speech gaps. When the broadband envelope of missing speech amplitude modulates the intervening noise, intelligibility is even better. However, this phenomenon represents a perceptual failure: The amplitude modulation, a noise feature, is misattributed to the speech. Experiments explored whether object formation influences how information in the speech gaps is perceptually allocated. Experiment 1 replicates the finding that intelligibility is enhanced when speech-modulated noise rather than unmodulated noise is presented in the gaps. In Experiment 2, interrupted speech was presented diotically, but intervening noises were presented either diotically or with an interaural time difference leading in the right ear, causing the noises to be perceived to the side of the listener. When speech-modulated noise and speech are perceived from different directions, intelligibility is no longer enhanced by the modulation. However, perceived location has no effect for unmodulated noise, which contains no speech-derived information. Results suggest that enhancing object formation reduces misallocation of acoustic features across objects, and demonstrate that our ability to understand noisy speech depends on a cascade of interacting processes, including glimpsing sensory inputs, grouping sensory inputs into objects, and resolving ambiguity through top-down knowledge.

16.
Experiments were performed to determine under what conditions quasi-frequency-modulated (QFM) noise and random-sideband noise are suitable comparisons for AM noise in measuring a temporal modulation transfer function (TMTF). Thresholds were measured for discrimination of QFM from random-sideband noise and AM from QFM noise as a function of sideband separation. In the first experiment, the upper spectral edge of the noise stimuli was at 2400 Hz and the bandwidth was 1600 Hz. For sideband separations up to 256 Hz, at threshold sideband levels for discriminating AM from QFM noise, QFM was indiscriminable from random-sideband noise. For the largest sideband separation used (512 Hz), listeners may have used within-stimulus envelope correlation in the QFM noise to discriminate it from the random-sideband noise. Results when stimulus bandwidth was varied suggest that listeners were able to use this cue when the carrier was wider than a critical band, and the sideband separation approached the carrier bandwidth. Within-stimulus envelope correlation was also present in AM noise, and thus QFM noise was a suitable comparison because it made this cue unusable and forced listeners to use across-stimulus envelope differences. When the carrier bandwidth was less than a critical band or was wideband, QFM noise and random-sideband noise were equally suitable comparisons for AM noise. When discrimination thresholds for QFM and random-sideband noise were converted to modulation depth and modulation frequency, they were nearly identical to those for discrimination of AM from QFM noise, suggesting that listeners were using amplitude modulation cues in both cases.
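The AM/QFM distinction can be illustrated numerically: both stimuli have identical sideband levels, but flipping the sign of the lower sideband puts the sidebands in quadrature with the carrier, which largely flattens the envelope. A sketch with tonal (rather than noise) carriers and invented parameter names:

```python
import numpy as np

def make_am_qfm(fc, fm, m, fs, n):
    """AM tone [1 + m*cos]*cos and its QFM counterpart: identical sideband
    levels at fc +/- fm, but the lower-sideband sign is flipped so the
    sidebands sit in quadrature with the carrier."""
    t = np.arange(n) / fs
    c, lo, hi = (np.cos(2 * np.pi * f * t) for f in (fc, fc - fm, fc + fm))
    am = c + (m / 2) * lo + (m / 2) * hi
    qfm = c - (m / 2) * lo + (m / 2) * hi
    return am, qfm

def envelope_depth(x):
    """(max - min)/(max + min) of the FFT-based Hilbert envelope."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    env = np.abs(np.fft.ifft(np.fft.fft(x) * h))
    return float((env.max() - env.min()) / (env.max() + env.min()))
```

For a modulation depth of m = 0.8, the AM envelope depth is 0.8, while the QFM envelope fluctuates only as sqrt(1 + m^2 sin^2), a much shallower modulation.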

17.
Sinusoidally amplitude-modulated (SAM) noise was monaurally presented to the neotropical frog, Eleutherodactylus coqui, while recording intracellularly from auditory-nerve fibers. Neuronal phase locking was measured to the SAM noise envelope in the form of a period histogram. The modulation depth was changed (in 10% steps) until the threshold modulation depth was determined. This was repeated for various modulation frequencies (20-1200 Hz) and different levels of SAM noise (34-64 dB/Hz). From these data, temporal modulation transfer functions (TMTFs) were produced and minimum integration time (MIT) for each auditory fiber was calculated. The median MIT was 0.42 ms (lower quartile 0.32, upper quartile 0.68 ms). A noise level-dependent effect was noted on the shape of the TMTF as well as the minimum integration time. The latter results may be explained as a loss in spectral resolution with increasing noise level, which is consistent with the correlation that was found between minimum integration time and bandwidth.

18.
Recent temporal models of pitch and amplitude modulation perception converge on a relatively realistic implementation of cochlear processing followed by a temporal analysis of periodicity. However, for modulation perception, a modulation filterbank is applied whereas for pitch perception, autocorrelation is applied. Considering the large overlap in pitch and modulation perception, this is not parsimonious. Two experiments are presented to investigate the interaction between carrier periodicity, which produces strong pitch sensations, and envelope periodicity using broadband stimuli. Results show that in the presence of carrier periodicity, detection of amplitude modulation is impaired throughout the tested range (8-1000 Hz). In contrast, detection of carrier periodicity in the presence of an additional amplitude modulation is impaired only for very low frequencies below the pitch range (<33 Hz). Predictions of a generic implementation of a modulation-filterbank model and an autocorrelation model are compared to the data. Both models were too insensitive to high-frequency envelope or carrier periodicity and to infra-pitch carrier periodicity. Additionally, both models simulated modulation detection quite well but underestimated the detrimental effect of carrier periodicity on modulation detection. It is suggested that a hybrid model consisting of bandpass envelope filters with a ripple in their passband may provide a functionally successful and physiologically plausible basis for a unified model of auditory periodicity extraction.

19.
The brain can restore missing speech segments using linguistic knowledge and context. The phonemic restoration effect is commonly quantified by the increase in intelligibility of interrupted speech when the silent gaps are filled with noise bursts. In normal hearing, the restoration effect is negatively correlated with the baseline scores with interrupted speech; listeners with poorer baseline show more benefit from restoration. Reanalyzing data from Başkent et al. [(2010). Hear. Res. 260, 54-62], correlations for listeners with mild and moderate hearing impairment were observed to differ from those for normal-hearing listeners. This analysis further shows that hearing impairment may affect top-down restoration of speech.

20.
Psychometric functions were measured for the discrimination of the interaural phase difference (IPD) of the envelope of a sinusoidally amplitude-modulated (SAM) 4-kHz pure tone for modulation frequencies of 128 and 300 Hz and modulation depths (m) of 0.2, 0.6, 0.9, and 1.0. Contrary to recent modeling assumptions, it was found that a constant change in normalized interaural envelope correlation, with or without additional model stages to simulate peripheral auditory processing, did not produce a constant level of performance. Rather, in some cases, performance could range from chance to near perfect across modulation depths for a given change in normalized interaural envelope correlation. This was also true for the maximum change in normalized interaural envelope correlation computed across the cross-correlation functions for the stimuli to be discriminated. The change in the interaural time difference (ITD) computed from the IPD accounted for discriminability across modulation depths better than the change in normalized interaural envelope correlation, although ITD could not account for all the data, particularly those for lower values of m.
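The two envelope quantities compared in this study can be sketched for idealized SAM envelopes: a Pearson-style normalized envelope correlation (one of several normalizations in use, not necessarily the study's) and the IPD-to-ITD conversion ITD = IPD / (2*pi*fm). Function names are illustrative:

```python
import numpy as np

def sam_envelope(fm, m, fs, n, phase=0.0):
    """Envelope of a SAM tone: 1 + m*cos(2*pi*fm*t + phase)."""
    t = np.arange(n) / fs
    return 1.0 + m * np.cos(2 * np.pi * fm * t + phase)

def normalized_envelope_correlation(a, b):
    """Mean-removed (Pearson-style) normalized correlation of two envelopes."""
    a = a - a.mean()
    b = b - b.mean()
    return float(np.sum(a * b) / np.sqrt(np.sum(a ** 2) * np.sum(b ** 2)))

def ipd_to_itd(ipd_rad, fm_hz):
    """Interaural envelope phase difference (rad) at modulation rate fm_hz,
    converted to the corresponding interaural time difference (s)."""
    return ipd_rad / (2.0 * np.pi * fm_hz)
```

For two SAM envelopes differing only in modulation phase, the mean-removed correlation equals cos(IPD) regardless of m, which is one way to see why a fixed correlation change need not map onto fixed discriminability across modulation depths.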
