Similar Articles
1.
Two experiments investigated the effects of critical bandwidth and frequency region on the use of temporal envelope cues for speech. In both experiments, spectral details were reduced using vocoder processing. In experiment 1, consonant identification scores were measured in a condition for which the cutoff frequency of the envelope extractor was half the critical bandwidth (HCB) of the auditory filters centered on each analysis band. Results showed that performance was similar to that obtained in conditions for which the envelope cutoff was set to 160 Hz or above. Experiment 2 evaluated the impact of setting the cutoff frequency of the envelope extractor to 4, 8, or 16 Hz, or to HCB, in one or two contiguous bands of an eight-band vocoder. The cutoff was set to 16 Hz for all the other bands. Overall, consonant identification was not affected by removing envelope fluctuations above 4 Hz in the low- and high-frequency bands. In contrast, speech intelligibility decreased as the cutoff frequency in the midfrequency region was lowered from 16 to 4 Hz. The behavioral results were fairly consistent with a physical analysis of the stimuli, suggesting that clearly measurable envelope fluctuations cannot be attenuated without affecting speech intelligibility.
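The HCB cutoff can be computed from a bandwidth formula. The abstract does not say which critical-bandwidth formula was used. The sketch below assumes the Glasberg and Moore (1990) equivalent rectangular bandwidth (ERB), and the band centre frequencies are hypothetical.

```python
def erb_hz(fc_hz: float) -> float:
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore, 1990); assumed here."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)


def hcb_cutoff_hz(fc_hz: float) -> float:
    """Envelope-extractor cutoff set to half the (assumed ERB) auditory-filter bandwidth."""
    return erb_hz(fc_hz) / 2.0


# Hypothetical analysis-band centre frequencies (Hz)
for fc in (250, 1000, 4000):
    print(f"{fc:5d} Hz band -> HCB cutoff {hcb_cutoff_hz(fc):6.1f} Hz")
```

Under this assumption, the HCB cutoff is below the 160 Hz reference value for low-frequency bands. It exceeds that value for bands centred above roughly 2.7 kHz.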

2.
The contribution of temporal fine structure (TFS) cues to consonant identification was assessed in normal-hearing listeners using two speech-processing schemes designed to remove temporal envelope (E) cues. The stimuli were processed vowel-consonant-vowel speech tokens. Carrier signals, derived from the analytic signal, were extracted from the output of a bank of analysis filters. The "PM" and "FM" processing schemes estimated, respectively, a phase-modulation and a frequency-modulation function of each carrier signal and applied it to a sinusoidal carrier at the analysis-filter center frequency. In the FM scheme, processed signals were further restricted to the analysis-filter bandwidth. A third scheme, retaining only E cues from each band, was used for comparison. Stimuli processed with the PM and FM schemes were highly intelligible (50-80% correct identification) across a variety of experimental conditions designed to affect the putative reconstruction of E cues following peripheral auditory filtering. Analysis of confusions between consonants showed that TFS cues contributed more to place than to manner of articulation, whereas the converse was observed for E cues. Taken together, these results indicate that TFS cues convey important phonetic information that is not solely a consequence of E reconstruction.
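The FM idea can be sketched as follows. The code estimates the instantaneous frequency of one band from its analytic signal. It then imposes the deviation from a centre frequency `fc` on a unit-amplitude sinusoid at `fc`. This is a minimal single-band sketch under stated assumptions, not the authors' processing. The helper names are hypothetical, and the FM scheme's extra band-limiting step is omitted.

```python
import numpy as np


def analytic_signal(x: np.ndarray) -> np.ndarray:
    """FFT-based analytic signal (the same construction scipy.signal.hilbert uses)."""
    n = len(x)
    h = np.zeros(n)
    if n % 2 == 0:
        h[0] = h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[0] = 1.0
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)


def fm_resynthesis(band: np.ndarray, fs: float, fc: float):
    """Return (instantaneous frequency, flat-envelope carrier at fc following the band's FM)."""
    phase = np.unwrap(np.angle(analytic_signal(band)))
    inst_freq = np.diff(phase) * fs / (2 * np.pi)   # Hz; one sample shorter than the band
    deviation = inst_freq - fc                       # FM function relative to the centre frequency
    t = np.arange(len(inst_freq)) / fs
    pm = 2 * np.pi * np.cumsum(deviation) / fs       # integrate frequency deviation into phase
    return inst_freq, np.cos(2 * np.pi * fc * t + pm)
```

Because the output is a cosine of a phase function, the E cues of the input band are discarded. Only its timing (FM) information is kept.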

3.
The present study investigated the hypothesis that the cues for modulation rate discrimination for unresolved spectral components differ as a function of the spectral region occupied by the stimuli. Specifically, it was hypothesized that when components occupy relatively low spectral regions, phase locking both to the fine structure and to the envelope are useful cues. However, as the spectral region occupied by the components increases, phase locking to the fine structure becomes less robust, whereas phase locking to the envelope remains as a potentially strong cue. Observers were asked to detect a decrease in modulation rate for carrier frequencies between 1500 and 6000 Hz. Both amplitude-modulated (AM) and quasifrequency-modulated (QFM) tones were used to produce stimuli having strong and weak envelope cues, respectively. Although there were marked individual differences, the results showed an interaction between modulation type and spectral region. AM and QFM performance was relatively similar in the low spectral region, but QFM performance declined more steeply as the spectral region of the carrier frequency increased. Overall, the data are consistent with an interpretation that pitch perception for unresolved components depends upon both fine structure and envelope cues, and that the relative importance of these cues depends upon the spectral region occupied by the stimuli.
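The strong-versus-weak envelope contrast between AM and QFM tones can be shown with three-component stimuli. Flipping the sign of the lower sideband turns an AM tone into the narrowband-FM (quasi-FM) arrangement of the same power spectrum. This is a standard construction assumed here. The carrier lies in the 1500-6000 Hz range from the abstract, but the modulation rate, depth, and function names are hypothetical.

```python
import numpy as np


def three_component_tone(fc, fm, m, kind="AM", fs=48000, dur=0.5):
    """Return (waveform, Hilbert envelope) of a three-component AM or QFM tone."""
    t = np.arange(int(fs * dur)) / fs
    s = 1.0 if kind == "AM" else -1.0   # QFM: lower sideband sign-inverted
    x = (np.cos(2 * np.pi * fc * t)
         + s * (m / 2) * np.cos(2 * np.pi * (fc - fm) * t)
         + (m / 2) * np.cos(2 * np.pi * (fc + fm) * t))
    # Hilbert envelope of a sum of positive-frequency cosines = |complex baseband sum|
    env = np.abs(1 + s * (m / 2) * np.exp(-2j * np.pi * fm * t)
                 + (m / 2) * np.exp(2j * np.pi * fm * t))
    return x, env


def modulation_depth(env):
    return (env.max() - env.min()) / (env.max() + env.min())


for kind in ("AM", "QFM"):
    _, env = three_component_tone(fc=3000, fm=100, m=1.0, kind=kind)
    print(kind, round(modulation_depth(env), 3))
```

With full modulation (`m = 1`), the AM envelope swings from 0 to 2, a depth of about 1. The QFM envelope only varies between 1 and the square root of 2, a depth of about 0.17, at twice the modulation rate.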

4.
The fused low pitch evoked by complex tones containing only unresolved high-frequency components demonstrates the ability of the human auditory system to extract pitch using a temporal mechanism in the absence of spectral cues. However, the temporal features used by such a mechanism have been a matter of debate. For stimuli with components lying exclusively in high-frequency spectral regions, the slowly varying temporal envelope of sounds is often assumed to be the only information contained in auditory temporal representations, and it has remained controversial to what extent the fast amplitude fluctuations, or temporal fine structure (TFS), of the conveyed signal can be processed. Using a pitch matching paradigm, the present study found that the low pitch of inharmonic transposed tones with unresolved components was consistent with the timing between the most prominent TFS maxima in their waveforms, rather than envelope maxima. Moreover, envelope cues did not take over as the absolute frequency or rank of the lowest component was raised and TFS cues thus became less effective. Instead, the low pitch became less salient. This suggests that complex pitch perception does not rely on envelope coding as such, and that TFS representation might persist at higher frequencies than previously thought.

5.
The intelligibility of sentences processed to remove temporal envelope information, as far as possible, was assessed. Sentences were filtered into N analysis channels, and each channel signal was divided by its Hilbert envelope to remove envelope information but leave temporal fine structure (TFS) intact. Channel signals were combined to give TFS speech. The effect of adding low-level low-noise noise (LNN) to each channel signal before processing was assessed. The addition of LNN reduced the amplification of low-level signal portions that contained large excursions in instantaneous frequency, and improved the intelligibility of simple TFS speech sentences, but not more complex sentences. It also reduced the time needed to reach a stable level of performance. The recovery of envelope cues by peripheral auditory filtering was investigated by measuring the intelligibility of 'recovered-envelope speech', formed by filtering TFS speech with an array of simulated auditory filters, and using the envelopes at the output of these filters to modulate sinusoids with frequencies equal to the filter center frequencies (i.e., tone vocoding). The intelligibility of TFS speech and recovered-envelope speech fell as N increased, although TFS speech was still highly intelligible for values of N for which the intelligibility of recovered-envelope speech was low.
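The core "divide by the Hilbert envelope" step can be sketched for a single channel. This is a minimal sketch: the analysis filterbank, the LNN addition, and the summation across N channels described above are omitted. The function names and the division-guard floor are hypothetical.

```python
import numpy as np


def analytic_signal(x: np.ndarray) -> np.ndarray:
    """FFT-based analytic signal (the same construction scipy.signal.hilbert uses)."""
    n = len(x)
    h = np.zeros(n)
    if n % 2 == 0:
        h[0] = h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[0] = 1.0
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)


def tfs_only(channel: np.ndarray, floor: float = 1e-12) -> np.ndarray:
    """Divide a channel signal by its Hilbert envelope, leaving cos(instantaneous phase)."""
    env = np.abs(analytic_signal(channel))
    return channel / np.maximum(env, floor)   # floor guards against division by zero


fs = 16000
t = np.arange(fs) / fs
channel = (1 + 0.5 * np.cos(2 * np.pi * 4 * t)) * np.cos(2 * np.pi * 1000 * t)
tfs = tfs_only(channel)   # approximately cos(2*pi*1000*t): the 4 Hz envelope is removed
```

Dividing by a near-zero envelope greatly amplifies low-level portions of the signal. This is the effect the abstract reports LNN was added to reduce.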

6.
Previous studies have demonstrated that normal-hearing listeners can understand speech using the recovered "temporal envelopes," i.e., amplitude modulation (AM) cues from frequency modulation (FM). This study evaluated this mechanism in cochlear implant (CI) users for consonant identification. Stimuli containing only FM cues were created using 1, 2, 4, and 8-band FM-vocoders to determine if consonant identification performance would improve as the recovered AM cues become more available. A consistent improvement was observed as the band number decreased from 8 to 1, supporting the hypothesis that (1) the CI sound processor generates recovered AM cues from broadband FM, and (2) CI users can use the recovered AM cues to recognize speech. The correlation between the intact and the recovered AM components at the output of the sound processor was also generally higher when the band number was low, supporting the consonant identification results. Moreover, CI subjects who were better at using recovered AM cues from broadband FM cues showed better identification performance with intact (unprocessed) speech stimuli. This suggests that speech perception performance variability in CI users may be partly caused by differences in their ability to use AM cues recovered from FM speech cues.

7.
Speech recognition with altered spectral distribution of envelope cues.
Recognition of consonants, vowels, and sentences was measured in conditions of reduced spectral resolution and distorted spectral distribution of temporal envelope cues. Speech materials were processed through four bandpass filters (analysis bands), half-wave rectified, and low-pass filtered to extract the temporal envelope from each band. The envelope from each speech band modulated a band-limited noise (carrier bands). Analysis and carrier bands were manipulated independently to alter the spectral distribution of envelope cues. Experiment I demonstrated that the location of the cutoff frequencies defining the bands was not a critical parameter for speech recognition, as long as the analysis and carrier bands were matched in frequency extent. Experiment II demonstrated a dramatic decrease in performance when the analysis and carrier bands did not match in frequency extent, which warped the spectral distribution of envelope cues. Experiment III demonstrated a large decrease in performance when the carrier bands were shifted in frequency, mimicking the basal position of electrodes in a cochlear implant. Experiment IV showed a relatively minor effect of overlap in the noise carrier bands, which simulated the overlap in neural populations responding to adjacent electrodes in a cochlear implant. Overall, these results show that, for four bands, the frequency alignment of the analysis bands and carrier bands is critical for good performance, while the exact frequency divisions and overlap in carrier bands are not as critical.
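The processing chain described above can be sketched as a noise vocoder with separate analysis and carrier band edges. The chain is band-pass, half-wave rectify, low-pass, and then modulate band-limited noise. Brick-wall FFT filters stand in for whatever filters the study used. The band edges, the 160 Hz envelope cutoff, and all function names are hypothetical, and per-band level matching is omitted.

```python
import numpy as np


def fft_filter(x, fs, lo, hi):
    """Zero-phase brick-wall filter via the FFT: keeps components with lo <= f < hi."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1.0 / fs)
    X[(f < lo) | (f >= hi)] = 0.0
    return np.fft.irfft(X, n=len(x))


def noise_vocoder(x, fs, analysis_edges, carrier_edges, env_cutoff=160.0, seed=0):
    """Noise vocoder whose analysis and carrier bands can be set independently."""
    rng = np.random.default_rng(seed)
    out = np.zeros(len(x))
    for (a_lo, a_hi), (c_lo, c_hi) in zip(analysis_edges, carrier_edges):
        band = fft_filter(x, fs, a_lo, a_hi)                                # analysis band
        env = fft_filter(np.maximum(band, 0.0), fs, 0.0, env_cutoff)        # half-wave rectify + low-pass
        carrier = fft_filter(rng.standard_normal(len(x)), fs, c_lo, c_hi)   # band-limited noise
        out += env * carrier
    return out


fs = 16000
edges = [(100, 700), (700, 1500), (1500, 3000), (3000, 6000)]   # hypothetical four bands (Hz)
shifted = [(lo + 1000, hi + 1000) for lo, hi in edges]           # carriers shifted upward
```

`noise_vocoder(signal, fs, edges, edges)` corresponds to the matched condition of experiment I. Passing `shifted` as `carrier_edges` mimics the basal shift of experiment III.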

8.
Speech waveform envelope cues for consonant recognition
This study investigated the cues for consonant recognition that are available in the time-intensity envelope of speech. Twelve normal-hearing subjects listened to three sets of spectrally identical noise stimuli created by multiplying noise with the speech envelopes of 19 /aCa/ natural-speech nonsense syllables. The speech envelope for each of the three noise conditions was derived using a different low-pass filter cutoff (20, 200, and 2000 Hz). Average consonant identification performance was above chance in all three noise conditions and improved significantly as the envelope bandwidth increased from 20 to 200 Hz. SINDSCAL multidimensional scaling analysis of the consonant confusion data identified three speech envelope features that divided the 19 consonants into four envelope feature groups ("envemes"). The enveme groups, in combination with visually distinctive speech feature groupings ("visemes"), can distinguish most of the 19 consonants. These results suggest that subjects who receive only enveme and viseme information, and no spectral information, could attain near-perfect consonant identification.

9.
Synthesis (carrier) signals in acoustic models embody assumptions about the perception of electric auditory stimulation. This study compared the intelligibility of consonants and vowels processed through a set of nine acoustic models that used Spectral Peak (SPEAK) and Advanced Combination Encoder (ACE)-like speech processing. The synthesis signals were representative of those used previously in acoustic models, plus two new ones. Performance of the synthesis signals was evaluated by their correspondence with cochlear implant (CI) listener results for 12 attributes of phoneme perception (consonant and vowel recognition; F1, F2, and duration information transmission for vowels; voicing, manner, place of articulation, affrication, burst, nasality, and amplitude envelope information transmission for consonants), using four measures of performance. Modulated synthesis signals produced the best correspondence with CI consonant intelligibility, while sinusoids, narrow noise bands, and varying noise bands produced the best correspondence with CI vowel intelligibility. The signals that performed best overall (in terms of correspondence with both vowel and consonant attributes) were modulated and unmodulated noise bands of varying bandwidth corresponding to an excitation width that varied linearly from 0.4 mm at the most apical channel to 8 mm at the most basal channel.

10.
The role of harmonicity in masking was studied by comparing the effect of harmonic and inharmonic maskers on the masked thresholds of noise probes using a three-alternative, forced-choice method. Harmonic maskers were created by selecting sets of partials from a harmonic series with an 88-Hz fundamental and 45 consecutive partials. Inharmonic maskers differed in that the partial frequencies were perturbed to nearby values that were not integer multiples of the fundamental frequency. Average simultaneous-masked thresholds were as much as 10 dB lower with the harmonic masker than with the inharmonic masker, and this difference was unaffected by masker level. It was reduced or eliminated when the harmonic partials were separated by more than 176 Hz, suggesting that the effect is related to the extent to which the harmonics are resolved by auditory filters. The threshold difference was not observed in a forward-masking experiment. Finally, an across-channel mechanism was implicated when the threshold difference was found between a harmonic masker flanked by harmonic bands and a harmonic masker flanked by inharmonic bands. A model developed to explain the observed difference recognizes that an auditory filter output envelope is modulated when the filter passes two or more sinusoids, and that the modulation rate depends on the differences among the input frequencies. For a harmonic masker, the frequency differences of adjacent partials are identical, and all auditory filters have the same dominant modulation rate. For an inharmonic masker, however, the frequency differences are not constant and the envelope modulation rate varies across filters. The model proposes that a lower variability facilitates detection of a probe-induced change in the variability, thus accounting for the masked threshold difference. 
The model was supported by significantly improved predictions of observed thresholds when the predictor variables included envelope modulation rate variance measured using simulated auditory filters.
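The model's key quantity, the spread of envelope modulation rates across filters, can be illustrated with a deliberately crude stand-in for simulated auditory filters. Each filter is idealised as a rectangular passband one ERB wide. When the filter passes two or more partials, its dominant modulation rate is taken as the mean spacing of those partials. This is a simplification under stated assumptions, not the authors' model. The ERB formula, filter centres, partial selection (harmonics 1 to 45), and the size of the inharmonic perturbation are all hypothetical.

```python
import numpy as np


def erb_hz(fc):
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)   # Glasberg & Moore ERB (assumed)


def modulation_rate_variance(partials_hz, centers_hz):
    """Variance across idealised filters of each filter's dominant envelope rate."""
    partials = np.sort(np.asarray(partials_hz, dtype=float))
    rates = []
    for fc in centers_hz:
        half = erb_hz(fc) / 2.0
        inside = partials[(partials >= fc - half) & (partials <= fc + half)]
        if inside.size >= 2:                       # envelope is modulated only by >= 2 sinusoids
            rates.append(np.mean(np.diff(inside)))  # mean spacing = dominant rate (simplified)
    return float(np.var(rates)) if rates else 0.0


rng = np.random.default_rng(0)
harmonic = 88.0 * np.arange(1, 46)                                  # 88 Hz fundamental, 45 partials
inharmonic = harmonic + rng.uniform(-20, 20, size=harmonic.size)   # hypothetical perturbation
centers = np.arange(1500, 3501, 100)                                # hypothetical filter centres (Hz)
print(modulation_rate_variance(harmonic, centers))
print(modulation_rate_variance(inharmonic, centers))
```

Every filter sees the same 88 Hz spacing for the harmonic masker, so the variance is zero. The perturbed partials give a positive variance, which is the property the model relates to the lower masked thresholds.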

11.
Nowadays, it is widely believed that the temporal structure of the auditory nerve fibers' response to sound stimuli plays an important role in auditory perception. An influential hypothesis is that information is extracted from this temporal structure by neural operations akin to an autocorrelation algorithm. The goal of the present work was to test this hypothesis. The stimuli consisted of sequences of unipolar clicks that were high-pass filtered and mixed with low-pass noise so as to exclude spectral cues. In experiment 1, "interfering" clicks were inserted in an otherwise periodic (isochronous) click sequence. Each click belonging to the periodic sequence was followed, after a random portion of the period, by one interfering click. This disrupted the detection of temporal regularity, even when the interfering clicks were 5 dB less intense than the periodic clicks. Experiments 2-4 used click sequences that showed a single peak in their autocorrelation functions. For some sequences, this peak originated from "first-order" temporal regularities, that is from the temporal relations between consecutive clicks. For other sequences, the peak originated instead from "second-order" regularities, relative to nonconsecutive clicks. The detection of second-order regularities appeared to be much more difficult than the detection of comparable first-order regularities. Overall, these results do not tally with the current autocorrelation models of temporal processing. They suggest that the extraction of temporal information from a group of closely spaced spectral components makes no use of time intervals between nonconsecutive peaks of the amplitude envelope.
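The distinction between first-order and higher-order intervals in experiment 1 can be made concrete. In the sketch below, a periodic click train has one interfering click after each periodic click. The 10 ms period, the 50 clicks, and the 0.1 to 0.9 range for the random delay fraction are hypothetical choices.

```python
import numpy as np


def first_order_intervals(times):
    """Intervals between consecutive clicks only."""
    return np.diff(np.sort(times))


def all_order_intervals(times, max_lag):
    """All positive pairwise intervals up to max_lag (what an autocorrelation pools)."""
    t = np.sort(times)
    d = t[None, :] - t[:, None]
    return d[(d > 0) & (d <= max_lag)]


rng = np.random.default_rng(0)
T = 0.010                                                   # hypothetical 10 ms period
periodic = np.arange(50) * T
interfering = periodic + rng.uniform(0.1, 0.9, size=50) * T  # one click per period, random delay
clicks = np.concatenate([periodic, interfering])

fo = first_order_intervals(clicks)
ao = all_order_intervals(clicks, max_lag=1.5 * T)
print("first-order intervals equal to T:", int(np.sum(np.isclose(fo, T))))
print("all-order intervals equal to T:  ", int(np.sum(np.isclose(ao, T))))
```

The period disappears from the consecutive-click intervals but survives among the nonconsecutive (second-order) intervals. An autocorrelation-style analysis would therefore still expose it, which is the prediction the listeners' performance did not support.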

12.
The intelligibility of speech signals processed to retain either temporal envelope (E) or temporal fine structure (TFS) cues within 16 frequency bands, each 0.4 octave wide, was evaluated when the processed stimuli were periodically interrupted at different rates. The interrupted E- and TFS-coded stimuli were highly intelligible in all conditions. However, the different patterns of results obtained for E- and TFS-coded speech suggest that the two types of stimuli do not convey identical speech cues. When an effect of interruption rate was observed, it occurred at low interruption rates (<8 Hz) and was stronger for E- than for TFS-coded speech, suggesting a larger involvement of modulation masking with E-coded speech.
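Periodic interruption can be sketched as multiplication by a square-wave gate. The abstract does not give the duty cycle or any onset and offset ramps. This sketch assumes a 50% duty cycle and no ramps, and `interrupt` is a hypothetical helper name.

```python
import numpy as np


def interrupt(x, fs, rate_hz, duty=0.5):
    """Gate a signal on and off periodically (square-wave gate, no onset/offset ramps)."""
    t = np.arange(len(x)) / fs
    gate = ((t * rate_hz) % 1.0) < duty   # True during the "on" part of each cycle
    return x * gate, gate


fs = 16000
y, gate = interrupt(np.ones(fs), fs, rate_hz=4.0)   # 4 Hz: one of the low rates (<8 Hz)
```

At 4 Hz with a 50% duty cycle, each 250 ms cycle has 125 ms of signal followed by 125 ms of silence.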

13.
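To address the poor perception of pitch information with cochlear implants, this study built on a mechanistic model of a multichannel cochlear implant. It used zero-phase (or all-phase) filters and the Hilbert transform to extract and decompose the envelope and fine structure of the signals in 16 channels. These components were recombined into sounds in several ways, and normal-hearing listeners were tested on the pitch information of the synthesized sounds, to determine how signal envelope and fine structure affect pitch perception. The results showed that fine structure plays a more important role in pitch perception than envelope, and that this role appears mainly in the low-frequency channels (up to about 1.2 kHz). Within a fixed channel, the fine structure corresponds to the times at which the envelope occurs, as determined by the channel center frequency and phase information. The resulting pitch perception is related to the timing of the stimulation pulses delivered on low-frequency electrodes. Conclusion: adding control of stimulation-pulse timing on low-frequency electrodes is important for improving pitch perception with cochlear implants, and attention should also be paid to the phase fidelity of the filters.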
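One generic way to obtain the zero-phase filtering mentioned above is forward-backward filtering. Running an FIR filter forward, then again over the time-reversed output, gives an overall response equal to the filter convolved with its own time reverse. That response is symmetric and therefore introduces no delay. This is the same idea behind `scipy.signal.filtfilt`, without its edge padding. It is an illustration of the technique, not the authors' implementation, and `zero_phase_fir` is a hypothetical name.

```python
import numpy as np


def zero_phase_fir(x, h):
    """Forward-backward FIR filtering: effective response conv(h, h[::-1]), zero phase."""
    m = len(h)
    y = np.convolve(x, h)                # forward pass (full convolution)
    y = np.convolve(y[::-1], h)[::-1]    # backward pass on the time-reversed output
    return y[m - 1 : m - 1 + len(x)]     # drop m-1 samples at each end to align with x


x = np.zeros(200)
x[100] = 1.0                                               # impulse at sample 100
y = zero_phase_fir(x, np.array([1.0, 2.0, 3.0]) / 6.0)     # asymmetric FIR, yet no net delay
```

The output peak stays at sample 100 and is symmetric about it. A single forward pass of the same asymmetric filter would shift the waveform and distort the timing cues that the study found important.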

14.
Three experiments were conducted to study relative contributions of speaking rate, temporal envelope, and temporal fine structure to clear speech perception. Experiment I used uniform time scaling to match the speaking rate between clear and conversational speech. Experiment II decreased the speaking rate in conversational speech without processing artifacts by increasing silent gaps between phonetic segments. Experiment III created "auditory chimeras" by mixing the temporal envelope of clear speech with the fine structure of conversational speech, and vice versa. Speech intelligibility in normal-hearing listeners was measured over a wide range of signal-to-noise ratios to derive speech reception thresholds (SRT). The results showed that processing artifacts in uniform time scaling, particularly time compression, reduced speech intelligibility. Inserting gaps in conversational speech improved the SRT by 1.3 dB, but this improvement might be a result of increased short-term signal-to-noise ratios during level normalization. Data from auditory chimeras indicated that the temporal envelope cue contributed more to the clear speech advantage at high signal-to-noise ratios, whereas the temporal fine structure cue contributed more at low signal-to-noise ratios. Taken together, these results suggest that acoustic cues for the clear speech advantage are multiple and distributed.
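An auditory chimera can be sketched for one frequency band. It takes the Hilbert envelope of one signal and the fine structure (cosine of the Hilbert phase) of another. This minimal sketch omits the filterbank and the summation across bands that a full chimera would use, and the function names are hypothetical.

```python
import numpy as np


def analytic_signal(x: np.ndarray) -> np.ndarray:
    """FFT-based analytic signal (the same construction scipy.signal.hilbert uses)."""
    n = len(x)
    h = np.zeros(n)
    if n % 2 == 0:
        h[0] = h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[0] = 1.0
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)


def chimera(env_source: np.ndarray, tfs_source: np.ndarray) -> np.ndarray:
    """Envelope of env_source imposed on the fine structure of tfs_source (single band)."""
    env = np.abs(analytic_signal(env_source))
    tfs = np.cos(np.angle(analytic_signal(tfs_source)))
    return env * tfs


fs = 16000
t = np.arange(fs) / fs
a = (1 + 0.5 * np.cos(2 * np.pi * 4 * t)) * np.cos(2 * np.pi * 1000 * t)   # slow 4 Hz envelope
b = np.cos(2 * np.pi * 1200 * t)                                          # different fine structure
c = chimera(a, b)   # 4 Hz envelope of a on the 1200 Hz fine structure of b
```

In the study's terms, `env_source` would be a band of clear speech and `tfs_source` the matching band of conversational speech, or the other way round.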

15.
Behavioral responses obtained from chinchillas trained to discriminate a cosine-phase harmonic tone complex from wideband noise indicate that the perception of 'pitch' strength in chinchillas is largely influenced by periodicity information in the stimulus envelope. The perception of 'pitch' strength was examined in chinchillas in a stimulus generalization paradigm after animals had been retrained to discriminate infinitely iterated rippled noise from wideband noise. Retrained chinchillas gave larger behavioral responses to test stimuli having strong fine structure periodicity, but weak envelope periodicity. That is, chinchillas learn to use the information in the fine structure and consequently, their perception of 'pitch' strength is altered. Behavioral responses to rippled noises having similar periodicity strengths, but large spectral differences were also tested. Responses to these rippled noises were similar, suggesting a temporal analysis can be used to account for the behavior. Animals were then retested using the cosine-phase harmonic tone complex as the expected signal stimulus. Generalization gradients returned to those obtained originally in the naïve condition, suggesting that chinchillas do not remain "fine structure listeners," but rather revert back to being "envelope listeners" when the periodicity strength in the envelope of the expected stimulus is high.
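Infinitely iterated rippled noise (IIRN) is commonly generated with a recursive comb filter, y[n] = x[n] + g·y[n - d], where x is noise, d the delay in samples, and |g| < 1. The sketch below shows that construction and the resulting fine-structure periodicity at the delay. The delay, gain, and helper names are hypothetical, and nothing here reproduces the study's stimuli.

```python
import numpy as np


def iirn(n_samples, delay, gain, seed=0):
    """Infinitely iterated rippled noise: y[n] = x[n] + gain * y[n - delay], |gain| < 1."""
    x = np.random.default_rng(seed).standard_normal(n_samples)
    y = x.copy()
    for n in range(delay, n_samples):
        y[n] += gain * y[n - delay]   # recursive: y[n - delay] already includes earlier iterations
    return y


def normalized_autocorr(y, lag):
    return float(np.dot(y[lag:], y[:-lag]) / np.dot(y, y))


fs, d = 16000, 64                  # 64-sample delay = 4 ms, i.e. a pitch near 250 Hz
y = iirn(2 * fs, delay=d, gain=0.9)
print(normalized_autocorr(y, d))   # expected to be close to the gain (about 0.9)
```

For a stationary comb-filtered noise, the normalised autocorrelation at the delay approaches the gain `g`. That value is a simple index of the strong fine-structure periodicity the retrained animals responded to.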

16.
The present study investigated the effect of envelope modulations in a background masker on consonant recognition by normal hearing listeners. It is well known that listeners understand speech better under a temporally modulated masker than under a steady masker at the same level, due to masking release. The possibility of an opposite phenomenon, modulation interference, whereby speech recognition could be degraded by a modulated masker due to interference with auditory processing of the speech envelope, was hypothesized and tested under various speech and masker conditions. It was of interest whether modulation interference for speech perception, if it were observed, could be predicted by modulation masking, as found in psychoacoustic studies using nonspeech stimuli. Results revealed that masking release measurably occurred under a variety of conditions, especially when the speech signal maintained a high degree of redundancy across several frequency bands. Modulation interference was also clearly observed under several circumstances when the speech signal did not contain a high redundancy. However, the effect of modulation interference did not follow the expected pattern from psychoacoustic modulation masking results. In conclusion, (1) both factors, modulation interference and masking release, should be accounted for whenever a background masker contains temporal fluctuations, and (2) caution needs to be taken when psychoacoustic theory on modulation masking is applied to speech recognition.

17.
Cochlear implants provide users with limited spectral and temporal information. In this study, the amount of spectral and temporal information was systematically varied through simulations of cochlear implant processors using a noise-excited vocoder. Spectral information was controlled by varying the number of channels between 1 and 16, and temporal information was controlled by varying the lowpass cutoff frequencies of the envelope extractors from 1 to 512 Hz. Consonants and vowels processed using those conditions were presented to seven normal-hearing native-English-speaking listeners for identification. The results demonstrated that both spectral and temporal cues were important for consonant and vowel recognition with the spectral cues having a greater effect than the temporal cues for the ranges of numbers of channels and lowpass cutoff frequencies tested. The lowpass cutoff for asymptotic performance in consonant and vowel recognition was 16 and 4 Hz, respectively. The number of channels at which performance plateaued for consonants and vowels was 8 and 12, respectively. Within the above-mentioned ranges of lowpass cutoff frequency and number of channels, the temporal and spectral cues showed a tradeoff for phoneme recognition. Information transfer analyses showed different relative contributions of spectral and temporal cues in the perception of various phonetic/acoustic features.

18.
Recent temporal models of pitch and amplitude modulation perception converge on a relatively realistic implementation of cochlear processing followed by a temporal analysis of periodicity. However, for modulation perception, a modulation filterbank is applied whereas for pitch perception, autocorrelation is applied. Considering the large overlap in pitch and modulation perception, this is not parsimonious. Two experiments are presented to investigate the interaction between carrier periodicity, which produces strong pitch sensations, and envelope periodicity using broadband stimuli. Results show that in the presence of carrier periodicity, detection of amplitude modulation is impaired throughout the tested range (8-1000 Hz). On the contrary, detection of carrier periodicity in the presence of an additional amplitude modulation is impaired only for very low frequencies below the pitch range (<33 Hz). Predictions of a generic implementation of a modulation-filterbank model and an autocorrelation model are compared to the data. Both models were too insensitive to high-frequency envelope or carrier periodicity and to infra-pitch carrier periodicity. Additionally, both models simulated modulation detection quite well but underestimated the detrimental effect of carrier periodicity on modulation detection. It is suggested that a hybrid model consisting of bandpass envelope filters with a ripple in their passband may provide a functionally successful and physiologically plausible basis for a unified model of auditory periodicity extraction.

19.
Dynamic-range compression acting independently at each ear in a bilateral hearing-aid or cochlear-implant fitting can alter interaural level differences (ILDs) potentially affecting spatial perception. The influence of compression on the lateral position of sounds was studied in normal-hearing listeners using virtual acoustic stimuli. In a lateralization task, listeners indicated the leftmost and rightmost extents of the auditory event and reported whether they heard (1) a single, stationary image, (2) a moving/gradually broadening image, or (3) a split image. Fast-acting compression significantly affected the perceived position of high-pass sounds. For sounds with abrupt onsets and offsets, compression shifted the entire image to a more central position. For sounds containing gradual onsets and offsets, including speech, compression increased the occurrence of moving and split images by up to 57 percentage points and increased the perceived lateral extent of the auditory event. The severity of the effects was reduced when undisturbed low-frequency binaural cues were made available. At high frequencies, listeners gave increased weight to ILDs relative to interaural time differences carried in the envelope when compression caused ILDs to change dynamically at low rates, although individual differences were apparent. Specific conditions are identified in which compression is likely to affect spatial perception.
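Why independent compression at each ear shrinks ILDs can be shown with a static input/output rule. This sketch ignores the attack and release dynamics that make fast-acting compression time-varying, so it only illustrates the steady-state effect. The threshold, ratio, input levels, and the `compress_level` name are hypothetical.

```python
def compress_level(level_db, threshold_db=50.0, ratio=3.0):
    """Static compressor rule applied independently at one ear (levels in dB)."""
    if level_db <= threshold_db:
        return level_db                                        # linear below threshold
    return threshold_db + (level_db - threshold_db) / ratio    # compressed above threshold


left_in, right_in = 70.0, 60.0                # hypothetical ear-input levels (dB SPL)
ild_in = left_in - right_in                   # 10 dB input ILD
ild_out = compress_level(left_in) - compress_level(right_in)
print(ild_in, round(ild_out, 2))
```

When both ears are above threshold, the output ILD is the input ILD divided by the compression ratio: 10 dB becomes about 3.3 dB here. This is consistent with the more central image reported for sounds with abrupt onsets and offsets.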

20.
The precedence effect (PE) describes the ability to localize a direct, leading sound correctly when its delayed copy (lag) is present, though not separately audible. The relative contribution of binaural cues in the temporal fine structure (TFS) of lead-lag signals was compared to that of interaural level differences (ILDs) and interaural time differences (ITDs) carried in the envelope. In a localization dominance paradigm participants indicated the spatial location of lead-lag stimuli processed with a binaural noise-band vocoder whose noise carriers introduced random TFS. The PE appeared for noise bursts of 10 ms duration, indicating dominance of envelope information. However, for three test words the PE often failed even at short lead-lag delays, producing two images, one toward the lead and one toward the lag. When interaural correlation in the carrier was increased, the images appeared more centered, but often remained split. Although previous studies suggest dominance of TFS cues, no image is lateralized in accord with the ITD in the TFS. An interpretation in the context of auditory scene analysis is proposed: By replacing the TFS with that of noise the auditory system loses the ability to fuse lead and lag into one object, and thus to show the PE.


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417