1.

Speech identification based on temporal fine structure cues

Sheft S Ardoint M Lorenzi C 《The Journal of the Acoustical Society of America》2008,124(1):562-575

The contribution of temporal fine structure (TFS) cues to consonant identification was assessed in normal-hearing listeners with two speech-processing schemes designed to remove temporal envelope (E) cues. Stimuli were processed vowel-consonant-vowel speech tokens. Derived from the analytic signal, carrier signals were extracted from the output of a bank of analysis filters. The "PM" and "FM" processing schemes estimated a phase- and frequency-modulation function, respectively, of each carrier signal and applied them to a sinusoidal carrier at the analysis-filter center frequency. In the FM scheme, processed signals were further restricted to the analysis-filter bandwidth. A third scheme retaining only E cues from each band was used for comparison. Stimuli processed with the PM and FM schemes were found to be highly intelligible (50-80% correct identification) over a variety of experimental conditions designed to affect the putative reconstruction of E cues subsequent to peripheral auditory filtering. Analysis of confusions between consonants showed that the contribution of TFS cues was greater for place than manner of articulation, whereas the converse was observed for E cues. Taken together, these results indicate that TFS cues convey important phonetic information that is not solely a consequence of E reconstruction.

2.

The effects of the addition of low-level, low-noise noise on the intelligibility of sentences processed to remove temporal envelope information

Hopkins K Moore BC Stone MA 《The Journal of the Acoustical Society of America》2010,128(4):2150-2161

The intelligibility of sentences processed to remove temporal envelope information, as far as possible, was assessed. Sentences were filtered into N analysis channels, and each channel signal was divided by its Hilbert envelope to remove envelope information but leave temporal fine structure (TFS) intact. Channel signals were combined to give TFS speech. The effect of adding low-level low-noise noise (LNN) to each channel signal before processing was assessed. The addition of LNN reduced the amplification of low-level signal portions that contained large excursions in instantaneous frequency, and improved the intelligibility of simple TFS speech sentences, but not more complex sentences. It also reduced the time needed to reach a stable level of performance. The recovery of envelope cues by peripheral auditory filtering was investigated by measuring the intelligibility of 'recovered-envelope speech', formed by filtering TFS speech with an array of simulated auditory filters, and using the envelopes at the output of these filters to modulate sinusoids with frequencies equal to the filter center frequencies (i.e., tone vocoding). The intelligibility of TFS speech and recovered-envelope speech fell as N increased, although TFS speech was still highly intelligible for values of N for which the intelligibility of recovered-envelope speech was low.
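
The envelope-removal step described in this abstract (dividing a channel signal by its Hilbert envelope) can be illustrated with a minimal sketch. It is not the authors' implementation: it handles one channel only and omits the analysis filterbank and the low-noise noise. The function names `analytic_signal` and `split_envelope_tfs` and the `floor` guard against division by zero are assumptions made for illustration. The analytic signal is computed with a direct O(N^2) DFT so that the example needs only the Python standard library.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal of a real sequence, using a direct O(N^2) DFT."""
    n = len(x)
    # Forward DFT of the real input.
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Keep DC (and the Nyquist bin for even n), double the positive
    # frequencies, and zero the negative frequencies.
    for k in range(1, n):
        if n % 2 == 0 and k == n // 2:
            continue  # Nyquist bin is kept unchanged
        if k < (n + 1) // 2:
            spectrum[k] *= 2
        else:
            spectrum[k] = 0
    # Inverse DFT gives the complex analytic signal.
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def split_envelope_tfs(x, floor=1e-12):
    """Return (Hilbert envelope, signal divided by that envelope).

    `floor` is a hypothetical guard that avoids dividing by zero in
    silent portions of the signal.
    """
    z = analytic_signal(x)
    envelope = [abs(v) for v in z]
    tfs = [xi / max(e, floor) for xi, e in zip(x, envelope)]
    return envelope, tfs
```

For an amplitude-modulated tone whose carrier and sidebands all lie between DC and the Nyquist frequency, `envelope` matches the modulator and `tfs` is a unit-amplitude cosine at the carrier frequency. The division also amplifies low-level portions of the signal, which is the effect the abstract says the added noise reduces.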

3.

The role of temporal fine structure information for the low pitch of high-frequency complex tones

Santurette S Dau T 《The Journal of the Acoustical Society of America》2011,129(1):282-292

The fused low pitch evoked by complex tones containing only unresolved high-frequency components demonstrates the ability of the human auditory system to extract pitch using a temporal mechanism in the absence of spectral cues. However, the temporal features used by such a mechanism have been a matter of debate. For stimuli with components lying exclusively in high-frequency spectral regions, the slowly varying temporal envelope of sounds is often assumed to be the only information contained in auditory temporal representations, and it has remained controversial to what extent the fast amplitude fluctuations, or temporal fine structure (TFS), of the conveyed signal can be processed. Using a pitch matching paradigm, the present study found that the low pitch of inharmonic transposed tones with unresolved components was consistent with the timing between the most prominent TFS maxima in their waveforms, rather than envelope maxima. Moreover, envelope cues did not take over as the absolute frequency or rank of the lowest component was raised and TFS cues thus became less effective. Instead, the low pitch became less salient. This suggests that complex pitch perception does not rely on envelope coding as such, and that TFS representation might persist at higher frequencies than previously thought.

4.

The dynamic range of useful temporal fine structure cues for speech in the presence of a competing talker

Stone MA Moore BC Füllgrabe C 《The Journal of the Acoustical Society of America》2011,130(4):2162-2172

Within an auditory channel, the speech waveform contains both temporal envelope (E(O)) and temporal fine structure (TFS) information. Vocoder processing extracts a modified version of the temporal envelope (E') within each channel and uses it to modulate a channel carrier. The resulting signal, E'(Carr), has reduced information content compared to the original "E(O) + TFS" signal. The dynamic range over which listeners make additional use of E(O) + TFS over E'(Carr) cues was investigated in a competing-speech task. The target-and-background mixture was processed using a 30-channel vocoder. In each channel, E(O) + TFS replaced E'(Carr) at either the peaks or the valleys of the signal. The replacement decision was based on comparing the short-term channel level to a parametrically varied "switching threshold," expressed relative to the long-term channel level. Intelligibility was measured as a function of switching threshold, carrier type, target-to-background ratio, and replacement method. Scores showed a dependence on all four parameters. Derived intensity-importance functions (IIFs) showed that E(O) + TFS information from 8-13 dB below to 10 dB above the channel long-term level was important. When E(O) + TFS information was added at the peaks, IIFs peaked around -2 dB, but when E(O) + TFS information was added at the valleys, the peaks lay around +1 dB.
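
The replacement decision described in this abstract can be sketched as below. The abstract does not say how the short-term level was computed, so the fixed non-overlapping frames, the RMS level measure, the strict comparisons, and the names `rms_db`, `replacement_mask`, and `switch_signal` are all assumptions made for illustration, not the study's method.

```python
import math


def rms_db(samples, floor=1e-12):
    """RMS level in dB; `floor` avoids log(0) for silent input."""
    power = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(max(power, floor))


def replacement_mask(channel, frame_len, threshold_db, mode="peaks"):
    """Flag frames where the original signal would replace the vocoded one.

    The short-term (per-frame) level is compared with `threshold_db`,
    expressed relative to the long-term level of the whole channel.
    """
    if mode not in ("peaks", "valleys"):
        raise ValueError("mode must be 'peaks' or 'valleys'")
    long_term = rms_db(channel)
    mask = []
    for start in range(0, len(channel), frame_len):
        relative = rms_db(channel[start:start + frame_len]) - long_term
        if mode == "peaks":
            mask.append(relative > threshold_db)   # replace at high levels
        else:
            mask.append(relative < threshold_db)   # replace at low levels
    return mask


def switch_signal(original, vocoded, mask, frame_len):
    """Copy the original signal into the vocoded signal for flagged frames."""
    out = list(vocoded)
    for i, use_original in enumerate(mask):
        if use_original:
            start = i * frame_len
            out[start:start + frame_len] = original[start:start + frame_len]
    return out
```

Sweeping `threshold_db` in each mode changes which level range carries the original signal, which corresponds to the parameter the study varied to derive its intensity-importance functions.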

5.

Effects of periodic interruptions on the intelligibility of speech based on temporal fine-structure or envelope cues

Gilbert G Bergeras I Voillery D Lorenzi C 《The Journal of the Acoustical Society of America》2007,122(3):1336

The intelligibility of speech signals processed to retain either temporal envelope (E) or fine structure (TFS) cues within 16 0.4-oct-wide frequency bands was evaluated when processed stimuli were periodically interrupted at different rates. The interrupted E- and TFS-coded stimuli were highly intelligible in all conditions. However, the different patterns of results obtained for E- and TFS-coded speech suggest that the two types of stimuli do not convey identical speech cues. When an effect of interruption rate was observed, the effect occurred at low interruption rates (<8 Hz) and was stronger for E- than TFS-coded speech, suggesting larger involvement of modulation masking with E-coded speech.

6.

Failure of the precedence effect with a noise-band vocoder

Seeber BU Hafter ER 《The Journal of the Acoustical Society of America》2011,129(3):1509-1521

The precedence effect (PE) describes the ability to localize a direct, leading sound correctly when its delayed copy (lag) is present, though not separately audible. The relative contribution of binaural cues in the temporal fine structure (TFS) of lead-lag signals was compared to that of interaural level differences (ILDs) and interaural time differences (ITDs) carried in the envelope. In a localization dominance paradigm participants indicated the spatial location of lead-lag stimuli processed with a binaural noise-band vocoder whose noise carriers introduced random TFS. The PE appeared for noise bursts of 10 ms duration, indicating dominance of envelope information. However, for three test words the PE often failed even at short lead-lag delays, producing two images, one toward the lead and one toward the lag. When interaural correlation in the carrier was increased, the images appeared more centered, but often remained split. Although previous studies suggest dominance of TFS cues, no image is lateralized in accord with the ITD in the TFS. An interpretation in the context of auditory scene analysis is proposed: By replacing the TFS with that of noise the auditory system loses the ability to fuse lead and lag into one object, and thus to show the PE.

7.

Effects of spectral smearing and temporal fine-structure distortion on the fluctuating-masker benefit for speech at a fixed signal-to-noise ratio

Bernstein JG Brungart DS 《The Journal of the Acoustical Society of America》2011,130(1):473-488

Normal-hearing listeners receive less benefit from momentary dips in the level of a fluctuating masker for speech processed to degrade spectral detail or temporal fine structure (TFS) than for unprocessed speech. This has been interpreted as evidence that the magnitude of the fluctuating-masker benefit (FMB) reflects the ability to resolve spectral detail and TFS. However, the FMB for degraded speech is typically measured at a higher signal-to-noise ratio (SNR) to yield performance similar to normal speech for the baseline (stationary-noise) condition. Because the FMB decreases with increasing SNR, this SNR difference might account for the reduction in FMB for degraded speech. In this study, the FMB for unprocessed and processed (TFS-removed or spectrally smeared) speech was measured in a paradigm that adjusts word-set size, rather than SNR, to equate stationary-noise performance across processing conditions. Compared at the same SNR and percent-correct level (but with different set sizes), processed and unprocessed stimuli yielded a similar FMB for four different fluctuating maskers (speech-modulated noise, one opposite-gender interfering talker, two same-gender interfering talkers, and 16-Hz interrupted noise). These results suggest that, for these maskers, spectral or TFS distortions do not directly impair the ability to benefit from momentary dips in masker level.

8.

The effect of smoothing filter slope and spectral frequency on temporal speech information

Healy EW Steinbach HM 《The Journal of the Acoustical Society of America》2007,121(2):1177-1181

It is known that information contained within the filter skirts can provide cues important to speech intelligibility. However, the role of filter slope during temporal smoothing has received little attention. In experiment 1, smoothing filter slope angle was found to have a large effect on the intelligibility of sentences represented by three amplitude-modulated sinusoids. In experiment 2, the use of temporal cues above 16 Hz was examined across various regions of the spectrum. When increases in rate were presented to individual spectral bands, intelligibility only increased when presented in the higher spectral region. This result suggests a greater reliance on higher-rate cues in this region. However, intelligibility was greatest when these cues were distributed across the spectrum, indicating that their effective use is not restricted solely to this region.

9.

Speech recognition with altered spectral distribution of envelope cues. (Total citations: 8; self-citations: 0; citations by others: 8)

R V Shannon F G Zeng J Wygonski 《The Journal of the Acoustical Society of America》1998,104(4):2467-2476

Recognition of consonants, vowels, and sentences was measured in conditions of reduced spectral resolution and distorted spectral distribution of temporal envelope cues. Speech materials were processed through four bandpass filters (analysis bands), half-wave rectified, and low-pass filtered to extract the temporal envelope from each band. The envelope from each speech band modulated a band-limited noise (carrier bands). Analysis and carrier bands were manipulated independently to alter the spectral distribution of envelope cues. Experiment I demonstrated that the location of the cutoff frequencies defining the bands was not a critical parameter for speech recognition, as long as the analysis and carrier bands were matched in frequency extent. Experiment II demonstrated a dramatic decrease in performance when the analysis and carrier bands did not match in frequency extent, which resulted in a warping of the spectral distribution of envelope cues. Experiment III demonstrated a large decrease in performance when the carrier bands were shifted in frequency, mimicking the basal position of electrodes in a cochlear implant. And experiment IV showed a relatively minor effect of the overlap in the noise carrier bands, simulating the overlap in neural populations responding to adjacent electrodes in a cochlear implant. Overall, these results show that, for four bands, the frequency alignment of the analysis bands and carrier bands is critical for good performance, while the exact frequency divisions and overlap in carrier bands are not as critical.
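
The envelope-extraction chain named in this abstract (half-wave rectification followed by low-pass filtering, with the result modulating a noise carrier) can be sketched for a single band as follows. The abstract does not specify the filters, so the first-order low-pass filter, the uniform white-noise carrier, and all function names are assumptions made for illustration. The bandpass analysis filters and the band-limiting of the carrier are omitted.

```python
import math
import random


def half_wave_rectify(x):
    """Keep positive samples and set negative samples to zero."""
    return [max(v, 0.0) for v in x]


def one_pole_lowpass(x, cutoff_hz, sample_rate):
    """First-order IIR smoother (an assumed stand-in for the study's filter)."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    y, state = [], 0.0
    for v in x:
        state += alpha * (v - state)
        y.append(state)
    return y


def extract_envelope(band, cutoff_hz, sample_rate):
    """Temporal envelope of one band: rectify, then smooth."""
    return one_pole_lowpass(half_wave_rectify(band), cutoff_hz, sample_rate)


def modulate_noise(envelope, seed=0):
    """Multiply the envelope by a white-noise carrier in [-1, 1].

    A full noise vocoder would also band-limit the carrier; that step
    is left out of this sketch.
    """
    rng = random.Random(seed)
    return [e * rng.uniform(-1.0, 1.0) for e in envelope]
```

For a steady full-scale sine input, the smoothed envelope settles near 1/pi, the mean of a half-wave rectified sine, so any comparison with the input amplitude needs that scale factor.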

10.

Acoustic and linguistic factors in the perception of bandpass-filtered speech

Stickney GS Assmann PF 《The Journal of the Acoustical Society of America》2001,109(3):1157-1165

Speech can remain intelligible for listeners with normal hearing when processed by narrow bandpass filters that transmit only a small fraction of the audible spectrum. Two experiments investigated the basis for the high intelligibility of narrowband speech. Experiment 1 confirmed reports that everyday English sentences can be recognized accurately (82%-98% words correct) when filtered at center frequencies of 1500, 2100, and 3000 Hz. However, narrowband low predictability (LP) sentences were less accurately recognized than high predictability (HP) sentences (20% lower scores), and excised narrowband words were even less intelligible than LP sentences (a further 23% drop). While experiment 1 revealed similar levels of performance for narrowband and broadband sentences at conversational speech levels, experiment 2 showed that speech reception thresholds were substantially (>30 dB) poorer for narrowband sentences. One explanation for this increased disparity between narrowband and broadband speech at threshold (compared to conversational speech levels) is that spectral components in the sloping transition bands of the filters provide important cues for the recognition of narrowband speech, but these components become inaudible as the signal level is reduced. Experiment 2 also showed that performance was degraded by the introduction of a speech masker (a single competing talker). The elevation in threshold was similar for narrowband and broadband speech (11 dB, on average), but because the narrowband sentences required considerably higher sound levels to reach their thresholds in quiet compared to broadband sentences, their target-to-masker ratios were very different (+23 dB for narrowband sentences and -12 dB for broadband sentences). As in experiment 1, performance was better for HP than LP sentences. The LP-HP difference was larger for narrowband than broadband sentences, suggesting that context provides greater benefits when speech is distorted by narrow bandpass filtering.

11.

Importance of temporal-envelope speech cues in different spectral regions

Ardoint M Agus T Sheft S Lorenzi C 《The Journal of the Acoustical Society of America》2011,130(2):EL115-EL121

This study investigated the ability to use temporal-envelope (E) cues in a consonant identification task when presented within one or two frequency bands. Syllables were split into five bands spanning the range 70-7300 Hz with each band processed to preserve E cues and degrade temporal fine-structure cues. Identification scores were measured for normal-hearing listeners in quiet for individual processed bands and for pairs of bands. Consistent patterns of results were obtained in both the single- and dual-band conditions: identification scores increased systematically with band center frequency, showing that E cues in the higher bands (1.8-7.3 kHz) convey greater information.

12.

Benefit of temporal fine structure to speech perception in noise measured with controlled temporal envelopes

Eaves JM Summerfield AQ Kitterick PT 《The Journal of the Acoustical Society of America》2011,130(1):501-507

Previous studies have assessed the importance of temporal fine structure (TFS) for speech perception in noise by comparing the performance of normal-hearing listeners in two conditions. In one condition, the stimuli have useful information in both their temporal envelopes and their TFS. In the other condition, stimuli are vocoded and contain useful information only in their temporal envelopes. However, these studies have confounded differences in TFS with differences in the temporal envelope. The present study manipulated the analytic signal of stimuli to preserve the temporal envelope between conditions with different TFS. The inclusion of informative TFS improved speech-reception thresholds for sentences presented in steady and modulated noise, demonstrating that there are significant benefits of including informative TFS even when the temporal envelope is controlled. It is likely that the results of previous studies largely reflect the benefits of TFS, rather than uncontrolled effects of changes in the temporal envelope.

13.

On the mechanisms involved in the recovery of envelope information from temporal fine structure

Apoux F Millman RE Viemeister NF Brown CA Bacon SP 《The Journal of the Acoustical Society of America》2011,130(1):273-282

Three experiments were designed to provide psychophysical evidence for the existence of envelope information in the temporal fine structure (TFS) of stimuli that were originally amplitude modulated (AM). The original stimuli typically consisted of the sum of a sinusoidally AM tone and two unmodulated tones so that the envelope and TFS could be determined a priori. Experiment 1 showed that normal-hearing listeners not only perceive AM when presented with the Hilbert fine structure alone but AM detection thresholds are lower than those observed when presenting the original stimuli. Based on our analysis, envelope recovery resulted from the failure of the decomposition process to remove the spectral components related to the original envelope from the TFS and the introduction of spectral components related to the original envelope, suggesting that frequency- to amplitude-modulation conversion is not necessary to recover envelope information from TFS. Experiment 2 suggested that these spectral components interact in such a way that envelope fluctuations are minimized in the broadband TFS. Experiment 3 demonstrated that the modulation depth at the original carrier frequency is only slightly reduced compared to the depth of the original modulator. It also indicated that envelope recovery is not specific to the Hilbert decomposition.

14.

Effects of moderate cochlear hearing loss on the ability to benefit from temporal fine structure information in speech

Hopkins K Moore BC Stone MA 《The Journal of the Acoustical Society of America》2008,123(2):1140-1153

Speech reception thresholds (SRTs) were measured with a competing talker background for signals processed to contain variable amounts of temporal fine structure (TFS) information, using nine normal-hearing and nine hearing-impaired subjects. Signals (speech and background talker) were bandpass filtered into channels. Channel signals for channel numbers above a "cut-off channel" (CO) were vocoded to remove TFS information, while channel signals for channel numbers of CO and below were left unprocessed. Signals from all channels were combined. As a group, hearing-impaired subjects benefited less than normal-hearing subjects from the additional TFS information that was available as CO increased. The amount of benefit varied between hearing-impaired individuals, with some showing no improvement in SRT and one showing an improvement similar to that for normal-hearing subjects. The reduced ability to take advantage of TFS information in speech may partially explain why subjects with cochlear hearing loss get less benefit from listening in a fluctuating background than normal-hearing subjects. TFS information may be important in identifying the temporal "dips" in such a background.

15.

Dispersive white-light spectral interferometry including the effect of thin-film for distance measurement

P. Hlubina D. Ciprian J. Luňáček 《Optik》2007,118(7):319-324

A spectral-domain white-light interferometric technique is used for measuring distances in a Michelson interferometer with a mirror represented by a thin-film structure (TFS) on a substrate. A fibre-optic spectrometer is employed for recording spectral interferograms that include wide wavelength range effects of dispersion in a cube beam splitter and multiple reflection within the TFS. Knowing the effective thickness of the beam splitter, its dispersion and parameters of the TFS and substrate, the positions of the second interferometer mirror are determined precisely by a least-squares fitting of the theoretical spectral interferograms to the recorded ones. We apply the technique to the beam splitter made of BK7 optical glass and to a uniform SiO₂ thin film on a silicon wafer. We compare the results of the processing that include and do not include the effect of the TFS.

16.

The role of contrasting temporal amplitude patterns in the perception of speech

Healy EW Warren RM 《The Journal of the Acoustical Society of America》2003,113(3):1676-1688

Despite a lack of traditional speech features, novel sentences restricted to a narrow spectral slit can retain nearly perfect intelligibility [R. M. Warren et al., Percept. Psychophys. 57, 175-182 (1995)]. The current study employed 514 listeners to elucidate the cues allowing this high intelligibility, and to examine generally the use of narrow-band temporal speech patterns. When 1/3-octave sentences were processed to preserve the overall temporal pattern of amplitude fluctuation, but eliminate contrasting amplitude patterns within the band, sentence intelligibility dropped from values near 100% to values near zero (experiment 1). However, when a 1/3-octave speech band was partitioned to create a contrasting pair of independently amplitude-modulated 1/6-octave patterns, some intelligibility was restored (experiment 2). An additional experiment (3) showed that temporal patterns can also be integrated across wide frequency separations, or across the two ears. Despite the linguistic content of single temporal patterns, open-set intelligibility does not occur. Instead, a contrast between at least two temporal patterns is required for the comprehension of novel sentences and their component words. These contrasting patterns can reside together within a narrow range of frequencies, or they can be integrated across frequencies or ears. This view of speech perception, in which across-frequency changes in energy are seen as systematic changes in the temporal fluctuation patterns at two or more fixed loci, is more in line with the physiological encoding of complex signals.

17.

Beneficial acoustic speech cues for cochlear implant users with residual acoustic hearing

Visram AS Azadpour M Kluk K McKay CM 《The Journal of the Acoustical Society of America》2012,131(5):4042-4050

This study investigated which acoustic cues within the speech signal are responsible for bimodal speech perception benefit. Seven cochlear implant (CI) users with usable residual hearing at low frequencies in the non-implanted ear participated. Sentence tests were performed in near-quiet (some noise on the CI side to reduce scores from ceiling) and in a modulated noise background, with the implant alone and with the addition, in the hearing ear, of one of four types of acoustic signals derived from the same sentences: (1) a complex tone modulated by the fundamental frequency (F0) and amplitude envelope contours; (2) a pure tone modulated by the F0 and amplitude contours; (3) a noise-vocoded signal; (4) unprocessed speech. The modulated tones provided F0 information without spectral shape information, whilst the vocoded signal presented spectral shape information without F0 information. For the group as a whole, only the unprocessed speech condition provided significant benefit over implant-alone scores, in both near-quiet and noise. This suggests that, on average, F0 or spectral cues in isolation provided limited benefit for these subjects in the tested listening conditions, and that the significant benefit observed in the full-signal condition was derived from implantees' use of a combination of these cues.

18.

Relative contribution of target and masker temporal fine structure to the unmasking of consonants in noise

Apoux F Healy EW 《The Journal of the Acoustical Society of America》2011,130(6):4044-4052

The present study assessed the relative contribution of the "target" and "masker" temporal fine structure (TFS) when identifying consonants. Accordingly, the TFS of the target and that of the masker were manipulated simultaneously or independently. A 30 band vocoder was used to replace the original TFS of the stimuli with tones. Four masker types were used. They included a speech-shaped noise, a speech-shaped noise modulated by a speech envelope, a sentence, or a sentence played backward. When the TFS of the target and that of the masker were disrupted simultaneously, consonant recognition dropped significantly compared to the unprocessed condition for all masker types, except the speech-shaped noise. Disruption of only the target TFS led to a significant drop in performance with all masker types. In contrast, disruption of only the masker TFS had no effect on recognition. Overall, the present data are consistent with previous work showing that TFS information plays a significant role in speech recognition in noise, especially when the noise fluctuates over time. However, the present study indicates that listeners rely primarily on TFS information in the target and that the nature of the masker TFS has a very limited influence on the outcome of the unmasking process.

19.

Relative contributions of spectral and temporal cues for phoneme recognition (Total citations: 4; self-citations: 0; citations by others: 4)

Xu L Thompson CS Pfingst BE 《The Journal of the Acoustical Society of America》2005,117(5):3255-3267

Cochlear implants provide users with limited spectral and temporal information. In this study, the amount of spectral and temporal information was systematically varied through simulations of cochlear implant processors using a noise-excited vocoder. Spectral information was controlled by varying the number of channels between 1 and 16, and temporal information was controlled by varying the lowpass cutoff frequencies of the envelope extractors from 1 to 512 Hz. Consonants and vowels processed using those conditions were presented to seven normal-hearing native-English-speaking listeners for identification. The results demonstrated that both spectral and temporal cues were important for consonant and vowel recognition with the spectral cues having a greater effect than the temporal cues for the ranges of numbers of channels and lowpass cutoff frequencies tested. The lowpass cutoff for asymptotic performance in consonant and vowel recognition was 16 and 4 Hz, respectively. The number of channels at which performance plateaued for consonants and vowels was 8 and 12, respectively. Within the above-mentioned ranges of lowpass cutoff frequency and number of channels, the temporal and spectral cues showed a tradeoff for phoneme recognition. Information transfer analyses showed different relative contributions of spectral and temporal cues in the perception of various phonetic/acoustic features.

20.

Sexual characteristics of preadolescent children's voices.

S Bennett B Weinberg 《The Journal of the Acoustical Society of America》1979,65(1):179-189

This investigation was undertaken to enlarge current understanding of the acoustic properties which influence the perception of maleness and femaleness in the voices of prepubertal children. Perceptual judgments of sexual identity were obtained in response to tape recordings of whispered and normally phonated vowels, normally spoken sentences, and sentences spoken in a monotonous fashion. Seventy-three children provided recordings. The four utterance types were chosen to experimentally manipulate selected physical properties of speech thought to exert an influence on listener judgments of sexual identity. The results of this work suggest that cues stemming from differences in vocal tract dimensions and/or articulatory behaviors provided the primary cues about the sexual identity of these preadolescent children. Although laryngeal source cues could have provided relevant information about the sex of a few children, this variable was felt to play a relatively minor role in the sex recognition process. New information was uncovered about the role certain suprasegmental factors play in the identification of child sex.