Similar Articles
20 similar articles found (search time: 15 ms)
1.
The contribution of temporal fine structure (TFS) information to co-modulation masking release (CMR) was examined by comparing CMR obtained with unprocessed or vocoded stimuli. Tone thresholds were measured in the presence of a sinusoidally amplitude-modulated on-frequency band (OFB) of noise and zero, two, or four flanking bands (FBs) of noise whose envelopes were either co- or anti-modulated with the OFB envelope. Vocoding replaced the TFS of the tone and masker with unrelated TFS of noise or sinusoidal carriers. A maximum CMR of 11 dB was found as the difference between the co- and anti-modulated conditions for unprocessed stimuli. After vocoding, tone thresholds increased by 7 dB, and CMR was reduced to about 4 dB but remained significant. The magnitude of CMR was similar for both the sine and the noise vocoder. Co-modulation improved detection in the vocoded condition despite the absence of tone-masker TFS interactions; thus CMR appears to be a robust mechanism based on across-frequency processing. TFS information appears to contribute to across-channel CMR, since the magnitude of CMR was significantly reduced after vocoding. Because CMR was observed despite vocoding, it is hoped that co-modulation will also improve detection in cochlear-implant listening.
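As an illustration of the masker construction described above, here is a minimal sketch (not the authors' code) of co-modulated and anti-modulated flanking bands around an on-frequency band; the band center frequencies, 100-Hz bandwidths, and 10-Hz sinusoidal modulation rate are assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 44100
dur = 0.5
t = np.arange(int(fs * dur)) / fs
rng = np.random.default_rng(0)

def narrowband_noise(fc, bw):
    """White noise band-pass filtered to a bw-Hz band centered on fc."""
    sos = butter(4, [fc - bw / 2, fc + bw / 2], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, rng.standard_normal(t.size))

fm = 10.0                                    # modulation rate (assumed)
co_mod = 1.0 + np.sin(2 * np.pi * fm * t)    # shared sinusoidal envelope
anti_mod = 1.0 - np.sin(2 * np.pi * fm * t)  # inverted envelope for anti-modulation

ofb = co_mod * narrowband_noise(1000.0, 100.0)              # on-frequency band (OFB)
flank_cfs = [600.0, 800.0, 1200.0, 1400.0]                  # four flanking bands (assumed)
flank_co = sum(co_mod * narrowband_noise(fc, 100.0) for fc in flank_cfs)
flank_anti = sum(anti_mod * narrowband_noise(fc, 100.0) for fc in flank_cfs)

tone = np.sin(2 * np.pi * 1000.0 * t)        # signal to be detected at the OFB frequency
co_masker = ofb + flank_co                   # co-modulated condition
anti_masker = ofb + flank_anti               # anti-modulated condition
```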

2.
The present study assessed the relative contribution of the "target" and "masker" temporal fine structure (TFS) when identifying consonants. Accordingly, the TFS of the target and that of the masker were manipulated simultaneously or independently. A 30-band vocoder was used to replace the original TFS of the stimuli with tones. Four masker types were used: a speech-shaped noise, a speech-shaped noise modulated by a speech envelope, a sentence, and a sentence played backward. When the TFS of the target and that of the masker were disrupted simultaneously, consonant recognition dropped significantly compared to the unprocessed condition for all masker types except the speech-shaped noise. Disruption of only the target TFS led to a significant drop in performance with all masker types. In contrast, disruption of only the masker TFS had no effect on recognition. Overall, the present data are consistent with previous work showing that TFS information plays a significant role in speech recognition in noise, especially when the noise fluctuates over time. However, the present study indicates that listeners rely primarily on TFS information in the target and that the nature of the masker TFS has a very limited influence on the outcome of the unmasking process.

3.
The intelligibility of sentences processed to remove temporal envelope information, as far as possible, was assessed. Sentences were filtered into N analysis channels, and each channel signal was divided by its Hilbert envelope to remove envelope information but leave temporal fine structure (TFS) intact. Channel signals were combined to give TFS speech. The effect of adding low-level low-noise noise (LNN) to each channel signal before processing was assessed. The addition of LNN reduced the amplification of low-level signal portions that contained large excursions in instantaneous frequency, and improved the intelligibility of simple TFS speech sentences, but not more complex sentences. It also reduced the time needed to reach a stable level of performance. The recovery of envelope cues by peripheral auditory filtering was investigated by measuring the intelligibility of 'recovered-envelope speech', formed by filtering TFS speech with an array of simulated auditory filters, and using the envelopes at the output of these filters to modulate sinusoids with frequencies equal to the filter center frequencies (i.e., tone vocoding). The intelligibility of TFS speech and recovered-envelope speech fell as N increased, although TFS speech was still highly intelligible for values of N for which the intelligibility of recovered-envelope speech was low.
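For illustration, the core processing described above can be sketched as follows (not the authors' implementation): each analysis-band signal is divided by its Hilbert envelope, leaving TFS with a flat envelope; the eight logarithmically spaced bands, the Butterworth filters, and the envelope floor are assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def tfs_speech(x, fs, edges):
    """Remove the envelope of each analysis band and sum the TFS-only channels."""
    out = np.zeros(x.size)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)
        env = np.abs(hilbert(band))                 # Hilbert envelope of the band
        env = np.maximum(env, 1e-6 * env.max())     # floor to avoid dividing by ~0
        out += band / env                           # unit-envelope channel: TFS only
    return out

fs = 16000
edges = np.geomspace(100.0, 7000.0, 9)   # 8 analysis channels (an assumption)
# tfs = tfs_speech(x, fs, edges)         # `x` would be a speech waveform sampled at fs
```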

4.
Three experiments were designed to provide psychophysical evidence for the existence of envelope information in the temporal fine structure (TFS) of stimuli that were originally amplitude modulated (AM). The original stimuli typically consisted of the sum of a sinusoidally AM tone and two unmodulated tones, so that the envelope and TFS could be determined a priori. Experiment 1 showed that normal-hearing listeners not only perceive AM when presented with the Hilbert fine structure alone, but AM detection thresholds are lower than those observed when presenting the original stimuli. Based on our analysis, envelope recovery resulted from the failure of the decomposition process to remove the spectral components related to the original envelope from the TFS, and from the introduction of additional spectral components related to the original envelope, suggesting that frequency-to-amplitude-modulation conversion is not necessary to recover envelope information from TFS. Experiment 2 suggested that these spectral components interact in such a way that envelope fluctuations are minimized in the broadband TFS. Experiment 3 demonstrated that the modulation depth at the original carrier frequency is only slightly reduced compared to the depth of the original modulator. It also indicated that envelope recovery is not specific to the Hilbert decomposition.
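The following sketch, with assumed stimulus parameters, illustrates the kind of decomposition discussed above: a sinusoidally AM tone plus two unmodulated tones is split into its Hilbert envelope and fine structure, and the spectrum of the fine structure is inspected for components related to the original envelope (the sidebands at fc ± fm that the decomposition fails to remove).

```python
import numpy as np
from scipy.signal import hilbert

fs, dur = 44100, 1.0
t = np.arange(int(fs * dur)) / fs
fc, fm = 2000.0, 40.0                                   # carrier and modulation rate (assumed)
sam = (1 + 0.8 * np.sin(2 * np.pi * fm * t)) * np.sin(2 * np.pi * fc * t)
extra = np.sin(2 * np.pi * 1500.0 * t) + np.sin(2 * np.pi * 2500.0 * t)
x = sam + extra

analytic = hilbert(x)
envelope = np.abs(analytic)                             # Hilbert envelope
tfs = np.cos(np.angle(analytic))                        # Hilbert fine structure (unit envelope)

# Levels in the TFS spectrum at the carrier and at the original AM sideband frequencies:
spec = np.abs(np.fft.rfft(tfs)) / tfs.size
freqs = np.fft.rfftfreq(tfs.size, 1 / fs)
for f0 in (fc - fm, fc, fc + fm):
    k = np.argmin(np.abs(freqs - f0))
    print(f"{f0:7.1f} Hz: {20 * np.log10(spec[k] + 1e-12):6.1f} dB")
```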

5.
The study of speech from which the temporal fine structure (TFS) has been removed has become an important research area. Common procedures for removing TFS include noise and tone vocoders. In the noise vocoder, bands of noise are modulated by the envelope of the speech within each band, and in the tone vocoder the carrier is a sinusoid at the center of each frequency band. Five different procedures for removing TFS are evaluated in this paper: the noise vocoder, a low-noise noise approach in which the noise envelope is replaced by the speech envelope in each frequency band, phase randomization within each band, the tone vocoder, and sinusoidal modeling with random phase. The effects of TFS modification on the speech envelope are evaluated using an index based on the envelope time-frequency modulation. The results show that for all of the TFS techniques implemented in this study, there is a substantial loss in the accuracy of reproduction of the envelope time-frequency modulation. The tone vocoder gives the best accuracy, followed by the procedure that replaces the noise envelope with the speech envelope in each band.
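As a concrete example of one of the five procedures evaluated (the tone vocoder), here is a minimal sketch in which the smoothed Hilbert envelope of each band modulates a sinusoid at the band center frequency; the band layout, filter orders, and the 50-Hz envelope smoother are assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def tone_vocode(x, fs, edges, env_cutoff=50.0):
    """Replace the TFS of each band with a tonal carrier at the band center frequency."""
    env_sos = butter(2, env_cutoff, btype="lowpass", fs=fs, output="sos")
    t = np.arange(x.size) / fs
    out = np.zeros(x.size)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(band_sos, x)
        env = np.maximum(sosfiltfilt(env_sos, np.abs(hilbert(band))), 0.0)  # smoothed envelope
        fc = np.sqrt(lo * hi)                        # geometric-mean carrier frequency
        out += env * np.sin(2 * np.pi * fc * t)      # envelope imposed on a tone
    return out

fs = 16000
edges = np.geomspace(100.0, 7000.0, 17)   # 16 bands (an assumption)
# vocoded = tone_vocode(x, fs, edges)     # `x` would be a speech waveform sampled at fs
```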

6.
The precedence effect (PE) describes the ability to localize a direct, leading sound correctly when its delayed copy (lag) is present, though not separately audible. The relative contribution of binaural cues in the temporal fine structure (TFS) of lead-lag signals was compared to that of interaural level differences (ILDs) and interaural time differences (ITDs) carried in the envelope. In a localization dominance paradigm, participants indicated the spatial location of lead-lag stimuli processed with a binaural noise-band vocoder whose noise carriers introduced random TFS. The PE appeared for noise bursts of 10 ms duration, indicating dominance of envelope information. However, for three test words the PE often failed even at short lead-lag delays, producing two images, one toward the lead and one toward the lag. When interaural correlation in the carrier was increased, the images appeared more centered, but often remained split. Although previous studies suggest dominance of TFS cues, no image was lateralized in accord with the ITD in the TFS. An interpretation in the context of auditory scene analysis is proposed: by replacing the TFS with that of noise, the auditory system loses the ability to fuse lead and lag into one object, and thus to show the PE.

7.
The fused low pitch evoked by complex tones containing only unresolved high-frequency components demonstrates the ability of the human auditory system to extract pitch using a temporal mechanism in the absence of spectral cues. However, the temporal features used by such a mechanism have been a matter of debate. For stimuli with components lying exclusively in high-frequency spectral regions, the slowly varying temporal envelope of sounds is often assumed to be the only information contained in auditory temporal representations, and it has remained controversial to what extent the fast amplitude fluctuations, or temporal fine structure (TFS), of the conveyed signal can be processed. Using a pitch matching paradigm, the present study found that the low pitch of inharmonic transposed tones with unresolved components was consistent with the timing between the most prominent TFS maxima in their waveforms, rather than envelope maxima. Moreover, envelope cues did not take over as the absolute frequency or rank of the lowest component was raised and TFS cues thus became less effective. Instead, the low pitch became less salient. This suggests that complex pitch perception does not rely on envelope coding as such, and that TFS representation might persist at higher frequencies than previously thought.
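For reference, the sketch below shows a standard way of generating transposed stimuli of this kind (a half-wave-rectified low-frequency sinusoid, low-pass filtered, multiplied onto a high-frequency carrier); the specific frequencies and the 2-kHz cutoff are assumptions rather than the study's parameters.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs, dur = 44100, 0.5
t = np.arange(int(fs * dur)) / fs

def transposed_tone(f_env, f_carrier, lp_cutoff=2000.0):
    """Impose the temporal structure of a low-frequency tone on a high-frequency carrier."""
    rect = np.maximum(np.sin(2 * np.pi * f_env * t), 0.0)   # half-wave-rectified modulator
    sos = butter(4, lp_cutoff, btype="lowpass", fs=fs, output="sos")
    modulator = sosfiltfilt(sos, rect)
    return modulator * np.sin(2 * np.pi * f_carrier * t)

x = transposed_tone(125.0, 4000.0)   # e.g., 125-Hz temporal structure on a 4-kHz carrier
```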

8.
The ability to segregate two spectrally and temporally overlapping signals based on differences in temporal envelope structure and binaural cues was investigated. Signals were a harmonic tone complex (HTC) with 20 Hz fundamental frequency and a bandpass noise (BPN). Both signals had interaural differences of the same absolute value, but with opposite signs to establish lateralization to different sides of the medial plane, such that their combination yielded two different spatial configurations. As an indication for segregation ability, threshold interaural time and level differences were measured for discrimination between these spatial configurations. Discrimination based on interaural level differences was good, although absolute thresholds depended on signal bandwidth and center frequency. Discrimination based on interaural time differences required the signals' temporal envelope structures to be sufficiently different. Long-term interaural cross-correlation patterns or long-term averaged patterns after equalization-cancellation of the combined signals did not provide information for the discrimination. The binaural system must, therefore, have been capable of processing changes in interaural time differences within the period of the harmonic tone complex, suggesting that monaural information from the temporal envelopes influences the use of binaural information in the perceptual organization of signal components.

9.
Detection thresholds were determined for signals consisting of one, two, or five noise bands embedded in eight "cue" bands. All of the noise bands were 100 Hz wide. The center frequencies of the signal bands ranged from 1250 to 3250 Hz in 500-Hz steps, and those of the cue bands ranged from 500 to 4000 Hz in 500-Hz steps. The multiple-band signals either all had the same temporal envelope, or all had different temporal envelopes. Similarly, the cue bands either all had the same temporal envelope or all had different temporal envelopes. In separate listening conditions, signal thresholds were determined for various combinations of the temporal envelope patterns of the signal and cue bands. The results were analyzed both in terms of differences in threshold across listening conditions, and in terms of changes in threshold within a listening condition as the number of signal bands was increased. For both the single- and multiple-band signals, performance was best when the signal band(s) had a different envelope from the common envelope of the cue bands, and performance was worst when either the cue bands all had different envelopes, or the signal and cue bands all shared the same envelope. The thresholds of the multiple-band signals were better fitted by an independent-thresholds model than by a statistical-summation model. However, neither model predicted thresholds uniformly well in all listening conditions. The results are discussed in terms of both "within-channel" and "across-channel" models.

10.
Normal-hearing listeners receive less benefit from momentary dips in the level of a fluctuating masker for speech processed to degrade spectral detail or temporal fine structure (TFS) than for unprocessed speech. This has been interpreted as evidence that the magnitude of the fluctuating-masker benefit (FMB) reflects the ability to resolve spectral detail and TFS. However, the FMB for degraded speech is typically measured at a higher signal-to-noise ratio (SNR) to yield performance similar to normal speech for the baseline (stationary-noise) condition. Because the FMB decreases with increasing SNR, this SNR difference might account for the reduction in FMB for degraded speech. In this study, the FMB for unprocessed and processed (TFS-removed or spectrally smeared) speech was measured in a paradigm that adjusts word-set size, rather than SNR, to equate stationary-noise performance across processing conditions. Compared at the same SNR and percent-correct level (but with different set sizes), processed and unprocessed stimuli yielded a similar FMB for four different fluctuating maskers (speech-modulated noise, one opposite-gender interfering talker, two same-gender interfering talkers, and 16-Hz interrupted noise). These results suggest that, for these maskers, spectral or TFS distortions do not directly impair the ability to benefit from momentary dips in masker level.

11.
Within an auditory channel, the speech waveform contains both temporal envelope (E(O)) and temporal fine structure (TFS) information. Vocoder processing extracts a modified version of the temporal envelope (E') within each channel and uses it to modulate a channel carrier. The resulting signal, E'(Carr), has reduced information content compared to the original "E(O) + TFS" signal. The dynamic range over which listeners make additional use of E(O) + TFS over E'(Carr) cues was investigated in a competing-speech task. The target-and-background mixture was processed using a 30-channel vocoder. In each channel, E(O) + TFS replaced E'(Carr) at either the peaks or the valleys of the signal. The replacement decision was based on comparing the short-term channel level to a parametrically varied "switching threshold," expressed relative to the long-term channel level. Intelligibility was measured as a function of switching threshold, carrier type, target-to-background ratio, and replacement method. Scores showed a dependence on all four parameters. Derived intensity-importance functions (IIFs) showed that E(O) + TFS information from 8-13 dB below to 10 dB above the channel long-term level was important. When E(O) + TFS information was added at the peaks, IIFs peaked around -2 dB, but when E(O) + TFS information was added at the valleys, the peaks lay around +1 dB.
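A minimal sketch (frame length and the RMS-based level estimate are assumptions) of the peak/valley replacement rule described above, applied within a single vocoder channel: frames whose short-term level lies above (peaks) or below (valleys) the switching threshold, expressed relative to the long-term channel level, take the E(O) + TFS signal, while the remaining frames keep the vocoded E'(Carr) signal.

```python
import numpy as np

def peak_valley_mix(original_band, vocoded_band, fs, threshold_db, replace="peaks", frame_ms=10.0):
    """Mix E(O) + TFS and E'(Carr) versions of one channel according to short-term level."""
    frame = int(frame_ms * 1e-3 * fs)
    long_term = np.sqrt(np.mean(original_band ** 2)) + 1e-12    # long-term channel level
    out = vocoded_band.copy()
    for start in range(0, original_band.size - frame + 1, frame):
        sl = slice(start, start + frame)
        short_term = np.sqrt(np.mean(original_band[sl] ** 2))
        level_db = 20 * np.log10(short_term / long_term + 1e-12)
        above = level_db >= threshold_db
        if (replace == "peaks" and above) or (replace == "valleys" and not above):
            out[sl] = original_band[sl]                          # keep E(O) + TFS in this frame
    return out
```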

12.
This study measured the role of spectral details and temporal envelope (E) and fine structure (TFS) cues in reconstructing sentences from speech fragments. Four sets of sentences were processed using a 32-band vocoder. Twenty-one bands were either processed or removed, leading to sentences differing in their amount of spectral detail, E, and TFS information. These sentences remained perfectly intelligible, but intelligibility fell significantly after the introduction of periodic 120-ms silent gaps. While the role of E was unclear, the results unambiguously showed that TFS cues and spectral details influence the ability to reconstruct interrupted sentences.

13.
The intelligibility of speech signals processed to retain either temporal envelope (E) or fine structure (TFS) cues within sixteen 0.4-octave-wide frequency bands was evaluated when processed stimuli were periodically interrupted at different rates. The interrupted E- and TFS-coded stimuli were highly intelligible in all conditions. However, the different patterns of results obtained for E- and TFS-coded speech suggest that the two types of stimuli do not convey identical speech cues. When an effect of interruption rate was observed, the effect occurred at low interruption rates (<8 Hz) and was stronger for E- than for TFS-coded speech, suggesting a larger involvement of modulation masking with E-coded speech.

14.
The contribution of temporal fine structure (TFS) cues to consonant identification was assessed in normal-hearing listeners with two speech-processing schemes designed to remove temporal envelope (E) cues. Stimuli were processed vowel-consonant-vowel speech tokens. Carrier signals, derived from the analytic signal, were extracted from the output of a bank of analysis filters. The "PM" and "FM" processing schemes estimated a phase- and frequency-modulation function, respectively, of each carrier signal and applied them to a sinusoidal carrier at the analysis-filter center frequency. In the FM scheme, processed signals were further restricted to the analysis-filter bandwidth. A third scheme retaining only E cues from each band was used for comparison. Stimuli processed with the PM and FM schemes were found to be highly intelligible (50-80% correct identification) over a variety of experimental conditions designed to affect the putative reconstruction of E cues subsequent to peripheral auditory filtering. Analysis of confusions between consonants showed that the contribution of TFS cues was greater for place than for manner of articulation, whereas the converse was observed for E cues. Taken together, these results indicate that TFS cues convey important phonetic information that is not solely a consequence of E reconstruction.

15.
Speech waveform envelope cues for consonant recognition
This study investigated the cues for consonant recognition that are available in the time-intensity envelope of speech. Twelve normal-hearing subjects listened to three sets of spectrally identical noise stimuli created by multiplying noise with the speech envelopes of 19 /aCa/ natural-speech nonsense syllables. The speech envelope for each of the three noise conditions was derived using a different low-pass filter cutoff (20, 200, and 2000 Hz). Average consonant identification performance was above chance for the three noise conditions and improved significantly with the increase in envelope bandwidth from 20 to 200 Hz. SINDSCAL multidimensional scaling analysis of the consonant confusion data identified three speech envelope features that divided the 19 consonants into four envelope feature groups ("envemes"). The enveme groups in combination with visually distinctive speech feature groupings ("visemes") can distinguish most of the 19 consonants. These results suggest that near-perfect consonant identification performance could be attained by subjects who receive only enveme and viseme information and no spectral information.
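The following sketch illustrates the stimulus construction described above: spectrally flat noise multiplied by the speech envelope extracted with different low-pass cutoffs (20, 200, and 2000 Hz). The Hilbert-based envelope extraction and the filter order are assumptions about details the abstract does not specify.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def envelope_noise(speech, fs, cutoff):
    """Noise carrying only the low-pass-filtered time-intensity envelope of the speech."""
    env = np.abs(hilbert(speech))
    sos = butter(4, cutoff, btype="lowpass", fs=fs, output="sos")
    env = np.maximum(sosfiltfilt(sos, env), 0.0)
    noise = np.random.default_rng(0).standard_normal(speech.size)
    return env * noise

# for cutoff in (20.0, 200.0, 2000.0):
#     stim = envelope_noise(speech, fs, cutoff)   # `speech` would be an /aCa/ token at fs
```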

16.
High-frequency, "transposed" stimuli have been shown to yield enhanced processing of ongoing interaural temporal disparities (ITDs). This paper concerns determining which aspect or aspects of the envelopes of such stimuli mediate enhanced resolution of ITD. Behavioral measures and quantitative analyses utilizing special classes of transposed stimuli show that the "internal" interaural envelope correlation accounts both qualitatively and quantitatively for the enhancement. In contrast, the normalized fourth moment of the envelope (Y), which provides an index of the degree to which the envelopes of high-frequency stimuli fluctuate, does not lead to a successful accounting of the data.  相似文献   

17.
Behavioral responses obtained from chinchillas trained to discriminate a cosine-phase harmonic tone complex from wideband noise indicate that the perception of 'pitch' strength in chinchillas is largely influenced by periodicity information in the stimulus envelope. The perception of 'pitch' strength was examined in chinchillas in a stimulus generalization paradigm after animals had been retrained to discriminate infinitely iterated rippled noise from wideband noise. Retrained chinchillas gave larger behavioral responses to test stimuli having strong fine structure periodicity, but weak envelope periodicity. That is, chinchillas learn to use the information in the fine structure and consequently, their perception of 'pitch' strength is altered. Behavioral responses to rippled noises having similar periodicity strengths, but large spectral differences were also tested. Responses to these rippled noises were similar, suggesting a temporal analysis can be used to account for the behavior. Animals were then retested using the cosine-phase harmonic tone complex as the expected signal stimulus. Generalization gradients returned to those obtained originally in the naïve condition, suggesting that chinchillas do not remain "fine structure listeners," but rather revert back to being "envelope listeners" when the periodicity strength in the envelope of the expected stimulus is high.

18.
A robust feature extraction technique for phoneme recognition is proposed which is based on deriving modulation frequency components from the speech signal. The modulation frequency components are computed from syllable-length segments of sub-band temporal envelopes estimated using frequency domain linear prediction. Although the baseline features provide good performance in clean conditions, the performance degrades significantly in noisy conditions. In this paper, a technique for noise compensation is proposed where an estimate of the noise envelope is subtracted from the noisy speech envelope. The noise compensation technique suppresses the effect of additive noise in speech. The robustness of the proposed features is further enhanced by the gain normalization technique. The normalized temporal envelopes are compressed with static (logarithmic) and dynamic (adaptive loops) compression and are converted into modulation frequency features. These features are used in an automatic phoneme recognition task. Experiments are performed in mismatched train/test conditions where the test data are corrupted with various environmental distortions like telephone channel noise, additive noise, and room reverberation. Experiments are also performed on large amounts of real conversational telephone speech. In these experiments, the proposed features show substantial improvements in phoneme recognition rates compared to other speech analysis techniques. Furthermore, the contribution of various processing stages for robust speech signal representation is analyzed.
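As a rough illustration of modulation-frequency features, here is a simplified sketch that substitutes Hilbert sub-band envelopes for the FDLP envelope estimates used above, applies logarithmic compression, and takes an FFT over fixed-length segments; the substitution and all parameters are assumptions for illustration only.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def modulation_features(x, fs, edges, seg_dur=1.0, n_mod=16):
    """Per band and per segment, keep the lowest modulation-frequency magnitudes."""
    seg = int(seg_dur * fs)
    feats = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        env = np.log(np.abs(hilbert(sosfiltfilt(sos, x))) + 1e-8)   # compressed envelope
        band_feats = []
        for start in range(0, x.size - seg + 1, seg):
            spectrum = np.abs(np.fft.rfft(env[start:start + seg]))
            band_feats.append(spectrum[:n_mod])                     # lowest modulation frequencies
        feats.append(band_feats)
    return np.array(feats)                                          # shape: (bands, segments, n_mod)

fs = 8000
edges = np.geomspace(125.0, 3800.0, 16)     # 15 telephone-band channels (assumed)
# F = modulation_features(x, fs, edges)     # `x` would be a speech signal sampled at fs
```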

19.
Iterated rippled noise (IRN) is generated by a cascade of delay and add (the gain after the delay is 1.0) or delay and subtract (the gain is -1.0) operations. The delay and add/subtract operations impart a spectral ripple and a temporal regularity to the noise. The waveform fine structure is different in these two conditions, but the envelope can be extremely similar. Four experiments were used to determine conditions in which the processing of IRN stimuli might be mediated by the waveform fine structure or by the envelope. In experiments 1 and 3 listeners discriminated among three stimuli in a single-interval task: IRN stimuli generated with the delay and add operations (g = 1.0), IRN stimuli generated using the delay and subtract operations (g = -1.0), and a flat-spectrum noise stimulus. In experiment 2 the listeners were presented two IRN stimuli that differed in delay (4 vs 6 ms) and a flat-spectrum noise stimulus that was not an IRN stimulus. In experiments 1 and 2 both the envelope and waveform fine structure contained the spectral ripple and temporal regularity. In experiment 3 only the envelope had this spectral and temporal structure. In all experiments discrimination was determined as a function of high-pass filtering the stimuli, and listeners could discriminate between the two IRN stimuli up to frequency regions as high as 4000-6000 Hz. Listeners could discriminate the IRN stimuli from the flat-spectrum noise stimulus at even higher frequencies (as high as 8000 Hz), but these discriminations did not appear to depend on the pitch of the IRN stimuli. A control experiment (fourth experiment) suggests that IRN discriminations in high-frequency regions are probably not due entirely to low-frequency nonlinear distortion products. The results of the paper imply that pitch processing of IRN stimuli is based on the waveform fine structure.
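The IRN generation procedure described above (a cascade of delay-and-add or delay-and-subtract operations) can be sketched directly; the 4-ms delay, 16 iterations, and duration below are assumptions chosen for illustration.

```python
import numpy as np

def iterated_rippled_noise(fs=44100, dur=0.5, delay_ms=4.0, gain=1.0, iterations=16, seed=0):
    """Cascade of delay-and-add (gain = +1.0) or delay-and-subtract (gain = -1.0) operations."""
    x = np.random.default_rng(seed).standard_normal(int(fs * dur))
    d = int(round(delay_ms * 1e-3 * fs))      # delay in samples
    for _ in range(iterations):
        delayed = np.zeros_like(x)
        delayed[d:] = x[:-d]
        x = x + gain * delayed                # add or subtract the delayed copy
    return x

irn_add = iterated_rippled_noise(gain=1.0)    # delay-and-add: pitch near 1/delay = 250 Hz
irn_sub = iterated_rippled_noise(gain=-1.0)   # delay-and-subtract variant
```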

20.
Three different waveforms were generated from the same component frequencies by setting the phase of the components so they were either homophasic (all component sinusoids start at 0 degrees), diphasic (sinusoids alternate between -45 degrees and +45 degrees), or heterophasic (starting phase randomly selected). Listeners were asked to rate the saliency of all periodicity pitches they could detect in stimuli which contained 12 or more components at frequencies above the region where pitches were perceived. A major finding was that the highest ratings of fundamental frequency (f1) pitch "strength" were always obtained for homophasic waveforms, which among the test stimuli have the most abrupt envelope fluctuations. In contrast, diphasic and heterophasic waveforms, which have smoother envelopes, yielded lower pitch strength estimates at f1 and higher ratings two octaves above the fundamental. These data indicate that information concerning the stimulus waveform envelope influences the relative prominence of competing pitches evoked by periodicity pitch stimuli. However, no one-to-one correspondence between pitch and waveform periodicity is apparent.
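A minimal sketch of the three phase conditions described above (homophasic, diphasic, heterophasic) applied to one set of component frequencies; the choice of harmonics 12-24 of a 100-Hz fundamental is an assumption made for illustration.

```python
import numpy as np

fs, dur, f0 = 44100, 0.5, 100.0
t = np.arange(int(fs * dur)) / fs
harmonics = np.arange(12, 25)                 # components above the pitch region (assumed)
rng = np.random.default_rng(0)

def complex_tone(phases_deg):
    """Sum of equal-amplitude harmonics of f0 with the given starting phases."""
    ph = np.deg2rad(phases_deg)
    return sum(np.sin(2 * np.pi * n * f0 * t + p) for n, p in zip(harmonics, ph))

homophasic = complex_tone(np.zeros(harmonics.size))                            # all 0 degrees
diphasic = complex_tone(np.where(np.arange(harmonics.size) % 2, 45.0, -45.0))  # alternate -45/+45
heterophasic = complex_tone(rng.uniform(0.0, 360.0, harmonics.size))           # random phases
```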

