
1.

Integration of spectral and temporal cues in discrimination of nonspeech sounds: a psychoacoustic analysis

Espinoza-Varas B. The Journal of the Acoustical Society of America, 1983, 74(6):1687-1694

This study presents a psychoacoustic analysis of the integration of spectral and temporal cues in the discrimination of simple nonspeech sounds. The experimental task was a same-different discrimination between a standard and a comparison pair of tones. Each pair consisted of two 80-ms, 1500-Hz tone bursts separated by a 60-ms interval. The just-discriminable (d' = 2.0) increment in the duration of one of the bursts, Δt, was measured as a function of the increment in the frequency of the other burst, Δf. A trade-off between the values of Δt and Δf required to perform at d' = 2.0 was observed, which suggests that listeners integrate the evidence from the two dimensions. Integration occurred with both sub- and suprathreshold values of Δt or Δf, regardless of the order in which the cues were presented. Performance based on the integration of cues was determined by the discriminability of Δt plus that of Δf, and thus lies within the psychophysical limits of auditory processing. To a first approximation, the results agreed with the prediction of orthogonal vector summation of evidence derived from signal detection theory. It is proposed that the ability to integrate spectral and temporal cues is part of the repertoire of auditory processing capabilities. This integration does not appear to depend on perceiving sounds as members of phonetic classes.
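
The orthogonal-vector-summation prediction mentioned in the abstract assumes that the duration and frequency cues supply independent evidence. Under that assumption, d'_combined = sqrt(d'_Δt² + d'_Δf²). The minimal Python sketch below computes that prediction and the resulting trade-off at the d' = 2.0 criterion. It is a hypothetical illustration, not the author's analysis code, and the function names are made up.

```python
import math


def combined_dprime(d_t: float, d_f: float) -> float:
    """Orthogonal vector summation: d' values from two independent cues add
    like perpendicular vectors, d'_combined = sqrt(d'_t**2 + d'_f**2)."""
    return math.hypot(d_t, d_f)


def required_df_dprime(d_t: float, target: float = 2.0) -> float:
    """d' the frequency cue must supply for the combined d' to reach `target`
    when the duration cue alone yields d_t (0 if d_t already suffices)."""
    if d_t >= target:
        return 0.0
    return math.sqrt(target ** 2 - d_t ** 2)


# Predicted trade-off at the d' = 2.0 criterion used in the study.
for d_t in (0.0, 1.0, 1.5, 2.0):
    print(f"d'_t = {d_t:.1f} -> d'_f needed = {required_df_dprime(d_t):.3f}")
```

Mapping d' back onto physical increments (Δt in ms, Δf in Hz) would also need each listener's psychometric functions. The sketch does not model that step.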

2.

Role of spectral and temporal cues in restoring missing speech information

Gilbert G, Lorenzi C. The Journal of the Acoustical Society of America, 2010, 128(5):EL294-EL299

This study measured the role of spectral details and of temporal envelope (E) and temporal fine structure (TFS) cues in reconstructing sentences from speech fragments. Four sets of sentences were processed using a 32-band vocoder. Twenty-one bands were either processed or removed, yielding sentences that differed in their amount of spectral detail and of E and TFS information. These sentences remained perfectly intelligible, but intelligibility fell significantly after the introduction of periodic 120-ms silent gaps. While the role of E was unclear, the results unambiguously showed that TFS cues and spectral details influence the ability to reconstruct interrupted sentences.
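
The interruption manipulation can be sketched as multiplying a signal by a periodic 0/1 gate. This is a minimal sketch under stated assumptions. The 120-ms gap comes from the abstract. The duration of each retained fragment (`on_ms`) is hypothetical, and the function name is illustrative. Real stimuli would normally also use short onset/offset ramps, which are omitted here.

```python
import numpy as np


def interrupt(signal: np.ndarray, fs: int, gap_ms: float = 120.0,
              on_ms: float = 120.0) -> np.ndarray:
    """Insert periodic silent gaps by multiplying with a 0/1 gating mask.

    gap_ms follows the 120-ms gaps described above; on_ms (the duration of
    each retained fragment) is a hypothetical choice. No onset/offset ramps
    are applied, so the gate edges are abrupt.
    """
    on = int(round(fs * on_ms / 1000))
    gap = int(round(fs * gap_ms / 1000))
    idx = np.arange(len(signal))
    keep = (idx % (on + gap)) < on  # True during the retained part of each cycle
    return signal * keep
```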

3.

Relative contributions of spectral and temporal cues for phoneme recognition (cited 4 times: 0 self-citations, 4 by others)

Xu L, Thompson CS, Pfingst BE. The Journal of the Acoustical Society of America, 2005, 117(5):3255-3267

Cochlear implants provide users with limited spectral and temporal information. In this study, the amount of spectral and temporal information was systematically varied through simulations of cochlear implant processors using a noise-excited vocoder. Spectral information was controlled by varying the number of channels between 1 and 16, and temporal information was controlled by varying the lowpass cutoff frequencies of the envelope extractors from 1 to 512 Hz. Consonants and vowels processed under these conditions were presented to seven normal-hearing, native-English-speaking listeners for identification. The results demonstrated that both spectral and temporal cues were important for consonant and vowel recognition, with spectral cues having a greater effect than temporal cues over the ranges of channel numbers and lowpass cutoff frequencies tested. The lowpass cutoffs for asymptotic performance in consonant and vowel recognition were 16 and 4 Hz, respectively. The numbers of channels at which performance plateaued for consonants and vowels were 8 and 12, respectively. Within these ranges of lowpass cutoff frequency and number of channels, temporal and spectral cues showed a trade-off for phoneme recognition. Information transfer analyses showed different relative contributions of spectral and temporal cues in the perception of various phonetic/acoustic features.
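
The two manipulated parameters (number of channels, envelope lowpass cutoff) map directly onto the arguments of a noise-excited vocoder. The sketch below is a simplified NumPy-only illustration, not the authors' processor. It makes several assumptions:

- FFT brick-wall filters stand in for real filter banks.
- The overall frequency range (150-5500 Hz) and the log spacing of channels are hypothetical.
- Envelopes are extracted by half-wave rectification followed by lowpass filtering.

```python
import numpy as np


def fft_filter(x: np.ndarray, fs: float, lo: float, hi: float) -> np.ndarray:
    """Zero-phase brick-wall filter keeping components with lo <= f < hi.
    (A simplification; published vocoders use IIR/FIR filter banks.)"""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    spec[(freqs < lo) | (freqs >= hi)] = 0
    return np.fft.irfft(spec, n=len(x))


def noise_vocoder(x, fs, n_channels=8, env_cutoff_hz=16.0,
                  f_lo=150.0, f_hi=5500.0, seed=0):
    """Noise-excited vocoder sketch.

    n_channels sets spectral information (1-16 in the study);
    env_cutoff_hz sets temporal information (1-512 Hz in the study).
    f_lo, f_hi and the log spacing of channel edges are hypothetical.
    """
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)
    out = np.zeros_like(x)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = fft_filter(x, fs, lo, hi)
        # Envelope: half-wave rectification followed by lowpass filtering.
        env = np.maximum(fft_filter(np.maximum(band, 0.0), fs, 0.0, env_cutoff_hz), 0.0)
        # Carrier: white noise restricted to the same analysis band.
        carrier = fft_filter(rng.standard_normal(len(x)), fs, lo, hi)
        out += env * carrier
    # Match the overall RMS of the input.
    rms_in, rms_out = np.sqrt(np.mean(x ** 2)), np.sqrt(np.mean(out ** 2))
    return out * (rms_in / rms_out) if rms_out > 0 else out
```

Calling `noise_vocoder(x, fs, n_channels=8, env_cutoff_hz=16.0)` would roughly correspond to the consonant plateau conditions in the abstract. The sketch does not refilter the modulated carriers, which many implementations do.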

4.

Relative contributions of temporal and place pitch cues to fundamental frequency discrimination in cochlear implantees

Laneau J, Wouters J, Moonen M. The Journal of the Acoustical Society of America, 2004, 116(6):3606-3619

The effect of the filter bank on fundamental frequency (F0) discrimination was examined in four Nucleus CI24 cochlear implant subjects using synthetic, stylized vowel-like stimuli. The four filter banks tested differed in cutoff frequencies, amount of overlap between filters, and filter shape. To assess the effects of temporal pitch cues on F0 discrimination, temporal fluctuations were removed above 10 Hz in one condition and above 200 Hz in another. Results indicate that F0 discrimination based upon place pitch cues is possible, but just-noticeable differences are 1 octave or larger, depending on the filter bank used. Increasing the frequency resolution in the F0 range improves F0 discrimination based upon place pitch cues. The results of F0 discrimination based upon place pitch agree with a model that compares the centroids of the electrical excitation patterns. The addition of temporal fluctuations up to 200 Hz significantly improves F0 discrimination. Just-noticeable differences using both place and temporal pitch cues range from 6% to 60%. Filter banks that do not resolve the higher harmonics provided the best temporal pitch cues, because temporal pitch cues are clearest when the fluctuations on all channels are at F0 and preferably in phase.
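
The place-pitch model in the abstract compares centroids of electrical excitation patterns. The minimal sketch below computes such a centroid in channel-index units and applies a simple "higher centroid sounds higher" rule. The assumptions are all hypothetical:

- Per-channel excitation levels are given as input.
- Channels are ordered from low to high frequency.
- The decision rule is a plain comparison.

The published model's spread-of-excitation and decision-noise details are not reproduced.

```python
import numpy as np


def excitation_centroid(levels) -> float:
    """Centroid, in channel-index units, of a non-negative excitation pattern.
    Channels are assumed ordered from low to high characteristic frequency."""
    levels = np.asarray(levels, dtype=float)
    idx = np.arange(len(levels))
    return float(np.dot(idx, levels) / levels.sum())


def place_pitch_higher(pattern_a, pattern_b) -> bool:
    """Predict that stimulus A sounds higher if its centroid lies further
    toward the high-frequency end (a hypothetical decision rule)."""
    return excitation_centroid(pattern_a) > excitation_centroid(pattern_b)
```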

5.

A spectral/temporal method for robust fundamental frequency tracking

Zahorian SA, Hu H. The Journal of the Acoustical Society of America, 2008, 123(6):4559-4571

In this paper, a fundamental frequency (F0) tracking algorithm is presented that is extremely robust for both high-quality and telephone speech, at signal-to-noise ratios ranging from clean speech to very noisy speech. The algorithm is named "YAAPT," for "yet another algorithm for pitch tracking." It is based on a combination of time-domain processing, using the normalized cross-correlation, and frequency-domain processing. Major steps include processing of both the original acoustic signal and a nonlinearly processed version of the signal; a new method for computing a modified autocorrelation function that incorporates information from multiple spectral harmonic peaks; peak picking to select multiple F0 candidates and associated figures of merit; and extensive use of dynamic programming to find the "best" track among the multiple F0 candidates. The algorithm was evaluated on three databases and compared with three other published F0 tracking algorithms, using both high-quality and telephone speech under various noise conditions. For clean speech, the error rates obtained are comparable to the best results reported for any other algorithm; for noisy telephone speech, the error rates are lower than those obtained with the other methods.
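
The time-domain step named in the abstract, normalized cross-correlation followed by peak picking of multiple F0 candidates, can be illustrated for a single frame. This is a sketch of that one ingredient only, not YAAPT itself. The nonlinear processing, spectral harmonic information and dynamic programming are omitted. The 60-400 Hz search range and the 0.95 relative threshold are assumptions.

```python
import numpy as np


def nccf(frame, fs, f0_min=60.0, f0_max=400.0):
    """Normalized cross-correlation of a frame with lagged copies of itself,
    over lags corresponding to f0_max..f0_min (search range is an assumption)."""
    frame = np.asarray(frame, dtype=float)
    k_min, k_max = int(fs / f0_max), int(fs / f0_min)
    n = len(frame) - k_max  # number of samples compared at every lag
    if n <= 0:
        raise ValueError("frame too short for the requested f0_min")
    x0 = frame[:n]
    e0 = np.dot(x0, x0)
    lags = np.arange(k_min, k_max + 1)
    vals = np.zeros(len(lags))
    for i, k in enumerate(lags):
        xk = frame[k:k + n]
        denom = np.sqrt(e0 * np.dot(xk, xk))
        if denom > 0:
            vals[i] = np.dot(x0, xk) / denom
    return lags, vals


def f0_candidates(frame, fs, rel_threshold=0.95, **kw):
    """Local NCCF peaks within rel_threshold of the best peak, as (F0, merit)
    pairs ordered from shortest lag (highest F0) to longest."""
    lags, vals = nccf(frame, fs, **kw)
    interior = (vals[1:-1] >= vals[:-2]) & (vals[1:-1] >= vals[2:])
    peaks = np.nonzero(interior)[0] + 1
    if peaks.size == 0:
        peaks = np.array([int(np.argmax(vals))])
    keep = peaks[vals[peaks] >= rel_threshold * vals[peaks].max()]
    return [(fs / lags[p], float(vals[p])) for p in keep]
```

A periodic frame produces near-equal peaks at multiples of the period, so several candidates survive (e.g. 200, 100 and 66.7 Hz for a 200-Hz signal). Resolving such octave ambiguities across frames is the job the abstract assigns to dynamic programming.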

6.

The effect of smoothing filter slope and spectral frequency on temporal speech information

Healy EW, Steinbach HM. The Journal of the Acoustical Society of America, 2007, 121(2):1177-1181

It is known that information contained within the filter skirts can provide cues important to speech intelligibility. However, the role of filter slope during temporal smoothing has received little attention. In experiment 1, the slope angle of the smoothing filter was found to have a large effect on the intelligibility of sentences represented by three amplitude-modulated sinusoids. In experiment 2, the use of temporal cues above 16 Hz was examined across various regions of the spectrum. When these higher-rate cues were presented in individual spectral bands, intelligibility increased only when they were presented in the higher spectral region. This result suggests a greater reliance on higher-rate cues in this region. However, intelligibility was greatest when these cues were distributed across the spectrum, indicating that their effective use is not restricted to this region.
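
To make "smoothing filter slope" concrete: for a lowpass filter, the slope above cutoff grows with filter order. The sketch below assumes an ideal Butterworth response, which the abstract does not specify. Under that assumption, the slope well above cutoff approaches roughly 6 dB/octave per order. The 16-Hz cutoff is only an illustrative value.

```python
import math


def butterworth_lp_attenuation_db(f_hz: float, fc_hz: float, order: int) -> float:
    """Attenuation (positive dB) of an ideal Butterworth lowpass filter:
    |H(f)|^2 = 1 / (1 + (f / fc)^(2 * order))."""
    return 10 * math.log10(1 + (f_hz / fc_hz) ** (2 * order))


fc = 16.0  # envelope smoothing cutoff in Hz (illustrative value)
for order in (1, 2, 4, 8):
    slope = (butterworth_lp_attenuation_db(8 * fc, fc, order)
             - butterworth_lp_attenuation_db(4 * fc, fc, order))
    print(f"order {order}: ~{slope:.1f} dB/octave well above cutoff")
```

With a shallow slope, envelope fluctuations above the nominal cutoff still pass at reduced amplitude. This is one way filter slope could alter the temporal information available.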

7.

The role of time and place cues in the detection of frequency modulation by hearing-impaired listeners

Ernst SM, Moore BC. The Journal of the Acoustical Society of America, 2012, 131(6):4722-4731

Frequency modulation detection limens (FMDLs) were measured for five hearing-impaired (HI) subjects for carrier frequencies fc = 1000, 4000, and 6000 Hz, using modulation frequencies fm = 2 and 10 Hz and levels of 20 dB sensation level and 90 dB SPL. FMDLs were smaller for fm = 10 Hz than for fm = 2 Hz for the two higher values of fc, but not for fc = 1000 Hz. FMDLs were also determined with additional random amplitude modulation (AM), to disrupt excitation-pattern cues. The disruptive effect was larger for fm = 10 Hz than for fm = 2 Hz. The smallest disruption occurred for fm = 2 Hz and fc = 1000 Hz. AM detection thresholds for normal-hearing and HI subjects were measured for the same values of fc and fm. Performance was better for the HI subjects for both values of fm. AM detection was much better for fm = 10 Hz than for fm = 2 Hz. Additional tests showed that most HI subjects could discriminate temporal fine structure (TFS) at 800 Hz. The results are consistent with the idea that, for fm = 2 Hz and fc = 1000 Hz, frequency modulation (FM) detection was partly based on the use of TFS information. For higher carrier frequencies, and for all carrier frequencies with fm = 10 Hz, FM detection was probably based on place cues.
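
The stimuli described above can be sketched as a sinusoidal FM tone whose instantaneous frequency is fc + Δf·cos(2π·fm·t), obtained with modulation index β = Δf/fm. Random AM can optionally be added to disrupt excitation-pattern cues. The random-AM scheme here is a hypothetical stand-in, since the abstract gives no details. It uses uniform level changes in dB at breakpoints every 50 ms, linearly interpolated. The parameter names are illustrative.

```python
import numpy as np


def fm_tone(fc, fm, delta_f, dur_s, fs, am_range_db=0.0, am_step_s=0.05, seed=0):
    """Sinusoidal FM tone whose instantaneous frequency is
    fc + delta_f * cos(2*pi*fm*t).

    If am_range_db > 0, a random level fluctuation (uniform in dB over
    +/- am_range_db/2, linearly interpolated between breakpoints every
    am_step_s seconds) is imposed; this AM scheme is a hypothetical stand-in.
    """
    t = np.arange(int(round(dur_s * fs))) / fs
    beta = delta_f / fm  # modulation index
    x = np.sin(2 * np.pi * fc * t + beta * np.sin(2 * np.pi * fm * t))
    if am_range_db > 0:
        rng = np.random.default_rng(seed)
        knots_t = np.arange(0.0, dur_s + am_step_s, am_step_s)
        knots_db = rng.uniform(-am_range_db / 2, am_range_db / 2, len(knots_t))
        x = x * 10 ** (np.interp(t, knots_t, knots_db) / 20)
    return x
```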

8.

The role of spectral and periodicity cues in auditory stream segregation, measured using a temporal discrimination task

Vliegen J, Moore BC, Oxenham AJ. The Journal of the Acoustical Society of America, 1999, 106(2):938-945

In a previous paper, it was shown that sequential stream segregation could be based on both spectral information and periodicity information, if listeners were encouraged to hear segregation [Vliegen and Oxenham, J. Acoust. Soc. Am. 105, 339-346 (1999)]. The present paper investigates whether segregation based on periodicity information alone also occurs when the task requires integration. This addresses the question: is segregation based on periodicity automatic and obligatory? A temporal discrimination task was used, as there is evidence that it is difficult to compare the timing of auditory events that are perceived as belonging to different perceptual streams. An ABA ABA ABA... sequence was used, in which tone B was either exactly at the temporal midpoint between two successive A tones or slightly delayed. Tones A and B were of three types: (1) both pure tones; (2) both complex tones, differing only in fundamental frequency (f0) and filtered through a fixed passband so as to contain only harmonics higher than the 10th, thereby eliminating detectable spectral differences; and (3) both complex tones with the same f0, but with the center frequency of the spectral passband varied between tones. Tone A had a fixed frequency of 300 Hz (when A and B were pure tones) or a fixed f0 of 100 Hz (when A and B were complex tones). Five different intervals, ranging from 1 to 18 semitones, were used. The results for all three conditions showed that shift thresholds increased with increasing interval between tones A and B, but the effect was largest for the conditions in which A and B differed in spectrum (i.e., the pure-tone and variable-center-frequency conditions). The results suggest that spectral information is dominant in inducing (involuntary) segregation, but periodicity information can also play a role.
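
Two pieces of the design can be written down directly. First, an interval of n semitones above tone A corresponds to a frequency (or f0) of f_A · 2^(n/12). Second, the sequence timing places B at the midpoint between the two A tones of each triplet, optionally delayed. This is a sketch under stated assumptions: the slot duration and the silent fourth slot of each "ABA-" triplet are hypothetical. The abstract does not state the five specific intervals, so none are listed.

```python
def semitones_above(f_ref_hz: float, semitones: float) -> float:
    """Frequency (or f0) lying `semitones` above f_ref_hz: f_ref * 2**(n/12)."""
    return f_ref_hz * 2 ** (semitones / 12)


def aba_onsets(n_triplets: int, slot_ms: float, delay_ms: float = 0.0):
    """Onset times (ms) for a galloping ABA- sequence.

    Each triplet occupies four slots (A, B, A, silence); B nominally sits at
    the temporal midpoint between the two A tones and is shifted by delay_ms.
    slot_ms and the silent fourth slot are assumptions, not values from the study.
    """
    a_onsets, b_onsets = [], []
    for k in range(n_triplets):
        start = 4 * slot_ms * k
        a_onsets += [start, start + 2 * slot_ms]
        b_onsets.append(start + slot_ms + delay_ms)
    return a_onsets, b_onsets
```

For example, with A at 300 Hz, an 18-semitone interval puts a pure-tone B at about 849 Hz.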

9.

Role of spectral cues in median plane localization (cited 6 times: 0 self-citations, 6 by others)

Asano F, Suzuki Y, Sone T. The Journal of the Acoustical Society of America, 1990, 88(1):159-168

This paper investigates the role of spectral cues in the source-to-ear transfer function in median plane sound localization. First, transfer functions were measured and analyzed. These transfer functions were then used in experiments in which sounds from a source on the median plane were simulated and presented to subjects through headphones. In these simulation experiments, the transfer functions were smoothed by ARMA models with different degrees of simplification to investigate the role of microscopic and macroscopic patterns in the transfer functions for median plane localization. The results are summarized as follows: (1) for front-rear judgments, information derived from microscopic peaks and dips in the low-frequency region (below 2 kHz) and from macroscopic patterns in the high-frequency region seems to be utilized; (2) for judgments of elevation angle, the major cues exist in the high-frequency region above 5 kHz, and the information in macroscopic patterns, rather than that in small peaks and dips, is utilized.
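
The study smoothed transfer functions with ARMA models of varying order. The sketch below swaps in a different and simpler technique, cepstral smoothing, purely to illustrate the idea of keeping "macroscopic" spectral shape while discarding "microscopic" peaks and dips. It is not the ARMA method used in the paper. The input is assumed to be a one-sided log-magnitude spectrum in dB. `n_keep` plays a role loosely analogous to model order.

```python
import numpy as np


def cepstral_smooth_db(mag_db, n_keep: int) -> np.ndarray:
    """Smooth a one-sided log-magnitude spectrum (in dB) by keeping only its
    first n_keep real-cepstrum coefficients (low quefrencies).

    Small n_keep keeps only the broad ("macroscopic") spectral shape; large
    n_keep retains fine ("microscopic") peaks and dips.
    """
    log_mag = np.asarray(mag_db, dtype=float) * np.log(10) / 20  # dB -> ln|H|
    ceps = np.fft.irfft(log_mag)  # real, even cepstrum of length 2*(len-1)
    lifter = np.zeros(len(ceps))
    lifter[:n_keep] = 1.0
    if n_keep > 1:
        lifter[-(n_keep - 1):] = 1.0  # mirror-image coefficients
    smooth_ln = np.fft.rfft(ceps * lifter).real
    return smooth_ln * 20 / np.log(10)
```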

10.

The ability of cochlear implant users to use temporal envelope cues recovered from speech frequency modulation

Won JH, Lorenzi C, Nie K, Li X, Jameyson EM, Drennan WR, Rubinstein JT. The Journal of the Acoustical Society of America, 2012, 132(2):1113-1119

Previous studies have demonstrated that normal-hearing listeners can understand speech using the recovered "temporal envelopes," i.e., amplitude modulation (AM) cues recovered from frequency modulation (FM). This study evaluated this mechanism in cochlear implant (CI) users for consonant identification. Stimuli containing only FM cues were created using 1-, 2-, 4-, and 8-band FM vocoders to determine whether consonant identification performance would improve as the recovered AM cues became more available. A consistent improvement was observed as the number of bands decreased from 8 to 1, supporting the hypotheses that (1) the CI sound processor generates recovered AM cues from broadband FM, and (2) CI users can use the recovered AM cues to recognize speech. The correlation between the intact and the recovered AM components at the output of the sound processor was also generally higher when the number of bands was low, consistent with the consonant identification results. Moreover, CI subjects who were better at using AM cues recovered from broadband FM showed better identification performance with intact (unprocessed) speech stimuli. This suggests that the variability in speech perception performance among CI users may be partly caused by differences in their ability to use AM cues recovered from FM speech cues.
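
The envelope (AM) and fine structure (FM/TFS) referred to above are commonly defined through the analytic signal. The minimal NumPy sketch below splits a band signal into its Hilbert envelope and TFS. It also computes the kind of intact-versus-recovered envelope correlation the abstract reports. It does not implement the FM vocoder or the CI processor. The analytic-signal routine follows the standard FFT construction, which is also what `scipy.signal.hilbert` uses. The function names are illustrative.

```python
import numpy as np


def analytic_signal(x) -> np.ndarray:
    """Analytic signal via the FFT (zero negative frequencies, double positive)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)


def envelope_and_tfs(x):
    """Split a (narrowband) signal into Hilbert envelope E and fine structure TFS."""
    z = analytic_signal(x)
    return np.abs(z), np.cos(np.angle(z))


def envelope_correlation(env_a, env_b) -> float:
    """Pearson correlation between two envelopes, e.g. intact vs. recovered AM."""
    return float(np.corrcoef(env_a, env_b)[0, 1])
```

In an FM-only stimulus, each band keeps only `cos(angle)` (the TFS) with a flat envelope. Recovered AM arises when such a signal passes through narrower filters, as in a CI processor's analysis bank.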

11.

Coherent radiation in time separated fields

Chebotayev VP, Dyuba NM, Skvortsov MI, Vasilenko LS. Applied Physics A: Materials Science & Processing, 1978, 15(3):319-322

This paper reports an experimental observation of coherent radiation after interaction of an absorbing rarefied gas with two standing-wave pulses separated by a time delay T. The radiation appears at times 2T, 3T, ... after the first pulse. The coherent radiation arises from the spatial transfer of polarization and the formation of a spatial polarization harmonic owing to a phase jump after two-photon interaction with the field of the second pulse. We observed intensity oscillations with frequency tuning of the exciting laser and with variation of the time delay between pulses. The paper was presented at the Vth Vavilov Conference, Novosibirsk, USSR (June 1977).

12.

Spectral and temporal cues for phoneme recognition in noise

Xu L, Zheng Y. The Journal of the Acoustical Society of America, 2007, 122(3):1758

Cochlear implant users receive limited spectral and temporal information, and their speech recognition deteriorates dramatically in noise. The aim of the present study was to determine the relative contributions of spectral and temporal cues to speech recognition in noise. Spectral information was manipulated by varying the number of channels from 2 to 32 in a noise-excited vocoder. Temporal information was manipulated by varying the low-pass cutoff frequency of the envelope extractor from 1 to 512 Hz. Ten normal-hearing native speakers of English participated in tests of phoneme recognition using vocoder-processed consonants and vowels under three conditions (quiet, and +6 and 0 dB signal-to-noise ratios). The number of channels required for vowel recognition performance to plateau increased from 12 in quiet to 16-24 in the two noise conditions. However, for consonant recognition, no further improvement in performance was evident when the number of channels was ≥ 12 in any of the three conditions. The contribution of temporal cues to phoneme recognition showed a similar pattern in quiet and in noise. As in quiet, there was a trade-off between temporal and spectral cues for phoneme recognition in noise.
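
The +6 and 0 dB conditions require scaling the noise relative to the speech before mixing. The minimal sketch below does this under two assumptions: power is computed over the whole token, and the noise is at least as long as the speech. The abstract specifies neither the noise type nor how power was measured, and the function name is illustrative.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that 10*log10(P_speech / P_noise) equals snr_db,
    then add it to `speech`. Returns (mixture, scaled_noise)."""
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    scaled = gain * noise
    return speech + scaled, scaled
```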

13.

The role of frequency versus informational cues in uncertain frequency detection

Howard JH, O'Toole AJ, Rice SE. The Journal of the Acoustical Society of America, 1986, 79(3):788-791

The role of frequency and informational cues in the detection of tones of uncertain frequency was investigated using the probe-signal method [G.Z. Greenberg and W.D. Larkin, J. Acoust. Soc. Am. 44, 1513-1523 (1968)] with auditory and visual test patterns. Patterns consisted of a sequence of events, either tones or visual stimuli, preceding the tone to be detected. Both frequency and informational cues were available in the auditory patterns, whereas only informational cues were available in the visual patterns. Results indicated that observers in the auditory condition displayed trial-by-trial selective attention to one or another frequency band as a function of the cues provided by earlier pattern components. In contrast, listeners in the visual condition displayed simultaneous attention to the two separate frequency bands that could possibly contain the tone, regardless of the information provided by the cues. Results are discussed in terms of single- and multiple-band models of attention.

14.

Target spectral, dynamic spectral, and duration cues in infant perception of German vowels

Bohn OS, Polka L. The Journal of the Acoustical Society of America, 2001, 110(1):504-515

Previous studies of vowel perception have shown that adult speakers of American English and of North German identify native vowels by exploiting at least three types of acoustic information contained in consonant-vowel-consonant (CVC) syllables: target spectral information reflecting the articulatory target of the vowel, dynamic spectral information reflecting CV- and -VC coarticulation, and duration information. The present study examined the contribution of each of these three types of information to vowel perception in prelingual infants and adults using a discrimination task. Experiment 1 examined German adults' discrimination of four German vowel contrasts (see text), originally produced in /dVt/ syllables, in eight experimental conditions in which the type of vowel information was manipulated. Experiment 2 examined German-learning infants' discrimination of the same vowel contrasts using a comparable procedure. The results show that German adults and German-learning infants appear able to use either dynamic spectral information or target spectral information to discriminate contrasting vowels. With respect to duration information, the removal of this cue selectively affected the discriminability of two of the vowel contrasts for adults. However, for infants, removal of contrastive duration information had a larger effect on the discrimination of all contrasts tested.

15.

Importance of temporal-envelope speech cues in different spectral regions

Ardoint M, Agus T, Sheft S, Lorenzi C. The Journal of the Acoustical Society of America, 2011, 130(2):EL115-EL121

This study investigated the ability to use temporal-envelope (E) cues in a consonant identification task when they were presented within one or two frequency bands. Syllables were split into five bands spanning the range 70-7300 Hz, with each band processed to preserve E cues and degrade temporal fine-structure cues. Identification scores were measured for normal-hearing listeners in quiet for individual processed bands and for pairs of bands. Consistent patterns of results were obtained in both the single- and dual-band conditions: identification scores increased systematically with band center frequency, showing that E cues in the higher bands (1.8-7.3 kHz) convey greater information.
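
The abstract does not say how the five bands were spaced across 70-7300 Hz. One plausible assumption is equal spacing on the ERB-number scale (Glasberg and Moore formula, 21.4·log10(0.00437·f + 1)), which the sketch below uses. Under that assumption, the two highest bands span roughly 1.84-7.3 kHz. This is close to the 1.8-7.3 kHz range quoted, but it is an inference, not the study's stated method.

```python
import numpy as np


def erb_number(f_hz):
    """ERB-number (Cams) for frequency in Hz: 21.4 * log10(0.00437 f + 1)."""
    return 21.4 * np.log10(0.00437 * np.asarray(f_hz, dtype=float) + 1)


def inverse_erb_number(e):
    """Frequency in Hz corresponding to an ERB-number."""
    return (10 ** (np.asarray(e, dtype=float) / 21.4) - 1) / 0.00437


def erb_spaced_edges(f_lo=70.0, f_hi=7300.0, n_bands=5):
    """Band edges equally spaced on the ERB-number scale (an assumed scheme)."""
    e = np.linspace(erb_number(f_lo), erb_number(f_hi), n_bands + 1)
    return inverse_erb_number(e)
```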

16.

Speaker recognition with temporal cues in acoustic and electric hearing

Vongphoe M, Zeng FG. The Journal of the Acoustical Society of America, 2005, 118(2):1055-1061

Natural spoken language processing includes not only speech recognition but also identification of the speaker's gender, age, and emotional and social status. Our purpose in this study was to evaluate whether temporal cues are sufficient to support both speech and speaker recognition. Ten cochlear-implant and six normal-hearing subjects were presented with vowel tokens spoken by three men, three women, two boys, and two girls. In one condition, the subject was asked to recognize the vowel; in the other, the subject was asked to identify the speaker. Extensive training was provided for the speaker recognition task. Normal-hearing subjects achieved nearly perfect performance in both tasks. Cochlear-implant subjects achieved good performance in vowel recognition but poor performance in speaker recognition. Cochlear-implant performance was functionally equivalent to normal-hearing performance with eight spectral bands for vowel recognition, but with only one band for speaker recognition. These results show a dissociation between speech and speaker recognition with primarily temporal cues, highlighting the limitations of current speech processing strategies in cochlear implants. Several methods, including explicit encoding of fundamental frequency and frequency modulation, are proposed to improve speaker recognition for current cochlear implant users.

17.

Influence of monaural spectral cues on binaural localization (cited 2 times: 0 self-citations, 2 by others)

Musicant AD, Butler RA. The Journal of the Acoustical Society of America, 1985, 77(1):202-208

Seven subjects located, monaurally and binaurally, narrow bands of noise originating in the horizontal plane. The stimuli were 1.0 kHz wide and centered at 4.0-14.0 kHz in steps of 0.5 kHz. The loudspeakers, 15 deg apart, were arranged in a semicircle (0-270-180 deg azimuth). In the first part of the experiment, all sounds emanated from the loudspeaker at 270 deg, but their apparent locations varied widely as a function of their center frequency. For each subject, the pattern of location judgments in the binaural listening condition corresponded to that recorded in the monaural condition. In the second part of the experiment, the loudspeaker from which each of the same narrow bands of noise emanated was varied in irregular order. Again, monaural location judgments were governed by the frequency content of the noise bands. Binaural location judgments were strongly influenced by the sounds' frequency composition when the stimuli originated from 315-225 deg, notwithstanding the presence of interaural differences in time and intensity. For narrow bands of noise emanating from off the midline, monaural spectral cues significantly override binaural difference cues, and they also determine the resolution of front-back ambiguities.
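
The stimulus set and loudspeaker layout described above can be enumerated explicitly. The sketch assumes the semicircle runs from 0 deg through 270 deg to 180 deg in 15-deg steps, giving 13 loudspeakers. It also lists the 315-225 deg sector in which binaural judgments were dominated by spectral content. The variable names are illustrative.

```python
# Narrow-band noises: 1.0 kHz wide, centered at 4.0-14.0 kHz in 0.5-kHz steps.
center_khz = [4.0 + 0.5 * k for k in range(21)]
bands_khz = [(fc - 0.5, fc + 0.5) for fc in center_khz]

# Loudspeakers every 15 deg along the semicircle 0 -> 270 -> 180 deg azimuth.
speaker_az = [(360 - 15 * k) % 360 for k in range(13)]

# Sources in the 315-225 deg sector (through 270 deg), where binaural judgments
# were strongly influenced by the bands' frequency content.
sector = [az for az in speaker_az if 225 <= az <= 315]
```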

18.

Vongpaisal T, Trehub SE, Glenn Schellenberg E, van Lieshout P. The Journal of the Acoustical Society of America, 2012, 131(1):501-508

Temporal information provided by cochlear implants enables successful speech perception in quiet, but limited spectral information precludes comparable success in voice perception. Talker identification and speech decoding by young hearing children (5-7 yr), older hearing children (10-12 yr), and hearing adults were examined by means of vocoder simulations of cochlear implant processing. In Experiment 1, listeners heard vocoder simulations of sentences from a man, woman, and girl and were required to identify the talker from a closed set. Younger children identified talkers more poorly than older listeners, but all age groups showed similar benefit from increased spectral information. In Experiment 2, children and adults provided verbatim repetition of vocoded sentences from the same talkers. The youngest children had more difficulty than older listeners, but all age groups showed comparable benefit from increasing spectral resolution. At comparable levels of spectral degradation, performance on the open-set task of speech decoding was considerably more accurate than on the closed-set task of talker identification. Hearing children's ability to identify talkers and decode speech from spectrally degraded material sheds light on the difficulty of these domains for child implant users.

19.

Contribution of spectral cues to human sound localization (cited 1 time: 0 self-citations, 1 by others)

Langendijk EH, Bronkhorst AW. The Journal of the Acoustical Society of America, 2002, 112(4):1583-1596

The contribution of spectral cues to human sound localization was investigated by removing cues in 1/2-, 1-, or 2-octave bands in the frequency range above 4 kHz. Localization responses were given by placing an acoustic pointer at the same apparent position as a virtual target. The pointer was generated by filtering a 100-ms harmonic complex with equalized head-related transfer functions (HRTFs). Listeners controlled the pointer via a hand-held stick that rotated about a fixed point. In the baseline condition, the target, a 200-ms noise burst, was filtered with the same HRTFs as the pointer. In the other conditions, the spectral information within a given frequency band was removed by replacing the directional transfer function within that band with its average over the band. Analysis of the data showed that removing cues in 1/2-octave bands did not affect localization, whereas for the 2-octave band correct localization was virtually impossible. The results obtained for the 1-octave bands indicate that up-down cues are located mainly in the 6-12-kHz band, and front-back cues in the 8-16-kHz band. The interindividual spread in response patterns suggests that different listeners use different localization cues. The response patterns in the median plane can be predicted using a model based on spectral comparison of the directional transfer functions for target and response directions.
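
The cue-removal manipulation can be sketched as flattening a sampled directional transfer function (DTF) within one band. The band is specified by its lower edge and width in octaves; for example, the 1-octave 6-12 kHz band is `f_lo_hz=6000, n_octaves=1`. Two assumptions apply. First, averaging is done on dB values, whereas the paper may have averaged differently. Second, the DTF is given as magnitudes on a frequency grid. The function name is hypothetical.

```python
import numpy as np


def remove_band_cues(freqs_hz, dtf_db, f_lo_hz, n_octaves):
    """Replace a directional transfer function (in dB) inside the band
    [f_lo_hz, f_lo_hz * 2**n_octaves) with its average over that band."""
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    out = np.array(dtf_db, dtype=float, copy=True)
    band = (freqs_hz >= f_lo_hz) & (freqs_hz < f_lo_hz * 2 ** n_octaves)
    if band.any():
        out[band] = out[band].mean()  # averaging in dB is an assumption
    return out
```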

20.

Dichotic speech recognition in noise using reduced spectral cues

Loizou PC, Mani A, Dorman MF. The Journal of the Acoustical Society of America, 2003, 114(1):475-483

It is generally accepted that the fusion of two speech signals presented dichotically is affected by their relative onset time. This study investigated the hypothesis that spectral resolution might be an additional factor influencing spectral fusion when the spectral information is split and presented dichotically to the two ears. To produce speech with varying degrees of spectral resolution, speech materials embedded in speech-shaped noise at +5 dB S/N were processed through 6-12 channels and synthesized as a sum of sine waves. Two methods of splitting the spectral information were investigated. In the first method, the odd-index channels were presented to one ear and the even-index channels to the other ear. In the second method, the low-frequency channels were presented to one ear and the high-frequency channels to the other ear. Results indicated that spectral resolution did affect spectral fusion, and the effect differed across speech materials, with sentences being affected the most. Sentences processed through six or eight channels and presented dichotically in the low-high frequency condition were not fused as accurately as when presented monaurally. Sentences presented dichotically in the odd-even frequency condition were identified more accurately than when presented in the low-high condition.
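
The two splitting schemes reduce to simple channel-assignment rules. The sketch below numbers channels from 1 (lowest frequency) and returns the sets for the two ears. Two details are assumptions: which ear receives which set, and how an odd channel count would be divided in the low-high scheme. The function name is illustrative.

```python
def split_channels(n_channels: int, method: str):
    """Assign vocoder channels (1 = lowest frequency) to the two ears."""
    channels = list(range(1, n_channels + 1))
    if method == "odd-even":
        return channels[0::2], channels[1::2]
    if method == "low-high":
        half = n_channels // 2  # with an odd count, the extra channel goes to the high side
        return channels[:half], channels[half:]
    raise ValueError(f"unknown method: {method!r}")
```

For six channels, the odd-even scheme interleaves ([1, 3, 5] vs. [2, 4, 6]), so each ear still samples the whole spectrum. The low-high scheme ([1, 2, 3] vs. [4, 5, 6]) gives each ear only half of it.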