期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Contextual effects in the identification of nonspeech auditory patterns

Kidd G Richards VM Streeter T Mason CR Huang R 《The Journal of the Acoustical Society of America》2011,130(6):3926-3938

This study investigated the benefit of a priori cues in a masked nonspeech pattern identification experiment. Targets were narrowband sequences of tone bursts forming six easily identifiable frequency patterns selected randomly on each trial. The frequency band containing the target was randomized. Maskers were also narrowband sequences of tone bursts chosen randomly on every trial. Targets and maskers were presented monaurally in mutually exclusive frequency bands, producing large amounts of informational masking. Cuing the masker produced a significant improvement in performance, while holding the target frequency band constant provided no benefit. The cue providing the greatest benefit was a copy of the masker presented ipsilaterally before the target-plus-masker. The masker cue presented contralaterally, and a notched-noise cue produced smaller benefits. One possible mechanism underlying these findings is auditory "enhancement" in which the neural response to the target is increased relative to the masker by differential prior stimulation of the target and masker frequency regions. A second possible mechanism provides a benefit to performance by comparing the spectrotemporal correspondence of the cue and target-plus-masker and is effective for either ipsilateral or contralateral cue presentation. These effects improve identification performance by emphasizing spectral contrasts in sequences or streams of sounds. 相似文献

2.

Speech categorization in context: joint effects of nonspeech and speech precursors

Holt LL 《The Journal of the Acoustical Society of America》2006,119(6):4016-4026

The extent to which context influences speech categorization can inform theories of pre-lexical speech perception. Across three conditions, listeners categorized speech targets preceded by speech context syllables. These syllables were presented as the sole context or paired with nonspeech tone contexts previously shown to affect speech categorization. Listeners' context-dependent categorization across these conditions provides evidence that speech and nonspeech context stimuli jointly influence speech processing. Specifically, when the spectral characteristics of speech and nonspeech context stimuli are mismatched such that they are expected to produce opposing effects on speech categorization the influence of nonspeech contexts may undermine, or even reverse, the expected effect of adjacent speech context. Likewise, when spectrally matched, the cross-class contexts may collaborate to increase effects of context. Similar effects are observed even when natural speech syllables, matched in source to the speech categorization targets, serve as the speech contexts. Results are well-predicted by spectral characteristics of the context stimuli. 相似文献

3.

The use of acoustic cues for phonetic identification: effects of spectral degradation and electric hearing

Winn MB Chatterjee M Idsardi WJ 《The Journal of the Acoustical Society of America》2012,131(2):1465-1479

Although some cochlear implant (CI) listeners can show good word recognition accuracy, it is not clear how they perceive and use the various acoustic cues that contribute to phonetic perceptions. In this study, the use of acoustic cues was assessed for normal-hearing (NH) listeners in optimal and spectrally degraded conditions, and also for CI listeners. Two experiments tested the tense/lax vowel contrast (varying in formant structure, vowel-inherent spectral change, and vowel duration) and the word-final fricative voicing contrast (varying in F1 transition, vowel duration, consonant duration, and consonant voicing). Identification results were modeled using mixed-effects logistic regression. These experiments suggested that under spectrally-degraded conditions, NH listeners decrease their use of formant cues and increase their use of durational cues. Compared to NH listeners, CI listeners showed decreased use of spectral cues like formant structure and formant change and consonant voicing, and showed greater use of durational cues (especially for the fricative contrast). The results suggest that although NH and CI listeners may show similar accuracy on basic tests of word, phoneme or feature recognition, they may be using different perceptual strategies in the process. 相似文献

4.

Effects of differing phonetic contexts on spectrographic speaker identification

B Hazen 《The Journal of the Acoustical Society of America》1973,54(3):650-660

相似文献

5.

Sex differences in voice onset time: a developmental study of phonetic context effects in British English

Whiteside SP Henry L Dobbin R 《The Journal of the Acoustical Society of America》2004,116(2):1179-1183

Voice onset time (VOT) data for the plosives /p b t d k g/ in two vowel contexts (/i a/) for 5 groups of 46 boys and girls aged 5; 8 (5 years, 8 months) to 13;2 years were investigated to examine patterns of sex differences. Results indicated that there was some evidence of females displaying longer VOT values than the males. In addition, these were found to be most marked for the data of the 13;2-year olds. Furthermore, the sex differences in the VOT values displayed phonetic context effects. For example, the greatest sex differences were observed for the voiceless plosives, and within the context of the vowel /i/. 相似文献

6.

Acoustic variability within and across German, French, and American English vowels: phonetic context effects

Strange W Weber A Levy ES Shafiro V Hisagi M Nishi K 《The Journal of the Acoustical Society of America》2007,122(2):1111-1129

Cross-language perception studies report influences of speech style and consonantal context on perceived similarity and discrimination of non-native vowels by inexperienced and experienced listeners. Detailed acoustic comparisons of distributions of vowels produced by native speakers of North German (NG), Parisian French (PF) and New York English (AE) in citation (di)syllables and in sentences (surrounded by labial and alveolar stops) are reported here. Results of within- and cross-language discriminant analyses reveal striking dissimilarities across languages in the spectral/temporal variation of coarticulated vowels. As expected, vocalic duration was most important in differentiating NG vowels; it did not contribute to PF vowel classification. Spectrally, NG long vowels showed little coarticulatory change, but back/low short vowels were fronted/raised in alveolar context. PF vowels showed greater coarticulatory effects overall; back and front rounded vowels were fronted, low and mid-low vowels were raised in both sentence contexts. AE mid to high back vowels were extremely fronted in alveolar contexts, with little change in mid-low and low long vowels. Cross-language discriminant analyses revealed varying patterns of spectral (dis)similarity across speech styles and consonantal contexts that could, in part, account for AE listeners' perception of German and French front rounded vowels, and "similar" mid-high to mid-low vowels. 相似文献

7.

Cross-language specialization in phonetic processing: English and Hindi perception of /w/-/v/ speech and nonspeech

Iverson P Wagner A Pinet M Rosen S 《The Journal of the Acoustical Society of America》2011,130(5):EL297-EL303

This study examined the perceptual specialization for native-language speech sounds, by comparing native Hindi and English speakers in their perception of a graded set of English /w/-/v/ stimuli that varied in similarity to natural speech. The results demonstrated that language experience does not affect general auditory processes for these types of sounds; there were strong cross-language differences for speech stimuli, and none for stimuli that were nonspeech. However, the cross-language differences extended into a gray area of speech-like stimuli that were difficult to classify, suggesting that the specialization occurred in phonetic processing prior to categorization. 相似文献

8.

Kidd G Mason CR Arbogast TL 《The Journal of the Acoustical Society of America》2002,111(3):1367-1376

This study examined whether increasing the similarity between informational maskers and signals would increase the amount of masking obtained in a nonspeech pattern identification task. The signals were contiguous sequences of pure-tone bursts arranged in six narrow-band spectro-temporal patterns. The informational maskers were sequences of multitone bursts played synchronously with the signal tones. The listener's task was to identify the patterns in a 1-interval 6-alternative forced-choice procedure. Three types of multitone maskers were generated according to different randomization rules. For the least signal-like informational masker, the components in each multitone burst were chosen at random within the frequency range of 200-6500 Hz, excluding a "protected region" around the signal frequencies. For the intermediate masker, the frequency components in the first burst were chosen quasirandomly, but the components in successive bursts were constrained to fall in narrow frequency bands around the frequencies of the components in the initial burst. Within the narrow bands the frequencies were randomized. This masker was considered to be more similar to the signal patterns because it consisted of a set of narrow-band sequences any one of which might be mistaken for a signal pattern. The most signal-like masker was similar to the intermediate masker in that it consisted of a set of synchronously played narrow-band sequences, but the variation in frequency within each sequence was sinusoidal, completing roughly one period in a sequence. This masker consisted of discernible patterns but not patterns that were part of the set of signals. In addition, masking produced by Gaussian noise bursts--thought to produce primarily peripherally based "energetic masking"--was measured and compared to the informational masking results. For the three informational maskers, more masking was produced by the maskers comprised of narrow-band sequences than for the masker in which the frequencies were not constrained to narrow bands. Also, the slopes of the performance-level functions for the three informational maskers were much shallower than for the Gaussian noise masker or for no masker. The findings provided qualified support for the hypothesis that increasing the similarity between signals and maskers, or parts of the maskers, causes greater informational masking. However, it is also possible that the greater masking was a consequence of increasing the number of perceptual "streams" that had to be evaluated by the listener. 相似文献

9.

Unequal effects of speech and nonspeech contexts on the perceptual normalization of Cantonese level tones

C Zhang G Peng WS Wang 《The Journal of the Acoustical Society of America》2012,132(2):1088-1099

Context is important for recovering language information from talker-induced variability in acoustic signals. In tone perception, previous studies reported similar effects of speech and nonspeech contexts in Mandarin, supporting a general perceptual mechanism underlying tone normalization. However, no supportive evidence was obtained in Cantonese, also a tone language. Moreover, no study has compared speech and nonspeech contexts in the multi-talker condition, which is essential for exploring the normalization mechanism of inter-talker variability in speaking F0. The other question is whether a talker's full F0 range and mean F0 equally facilitate normalization. To answer these questions, this study examines the effects of four context conditions (speech/nonspeech?×?F0 contour/mean F0) in the multi-talker condition in Cantonese. Results show that raising and lowering the F0 of speech contexts change the perception of identical stimuli from mid level tone to low and high level tone, whereas nonspeech contexts only mildly increase the identification preference. It supports the speech-specific mechanism of tone normalization. Moreover, speech context with flattened F0 trajectory, which neutralizes cues of a talker's full F0 range, fails to facilitate normalization in some conditions, implying that a talker's mean F0 is less efficient for minimizing talker-induced lexical ambiguity in tone perception. 相似文献

10.

Consonant identification in noise by native and non-native listeners: effects of local context

Cutler A Garcia Lecumberri ML Cooke M 《The Journal of the Acoustical Society of America》2008,124(2):1264-1268

Speech recognition in noise is harder in second (L2) than first languages (L1). This could be because noise disrupts speech processing more in L2 than L1, or because L1 listeners recover better though disruption is equivalent. Two similar prior studies produced discrepant results: Equivalent noise effects for L1 and L2 (Dutch) listeners, versus larger effects for L2 (Spanish) than L1. To explain this, the latter experiment was presented to listeners from the former population. Larger noise effects on consonant identification emerged for L2 (Dutch) than L1 listeners, suggesting that task factors rather than L2 population differences underlie the results discrepancy. 相似文献

11.

Stimulus variability and the phonetic relevance hypothesis: effects of variability in speaking style, fundamental frequency, and speaking rate on spoken word identification

Sommers MS Barcroft J 《The Journal of the Acoustical Society of America》2006,119(4):2406-2416

Three experiments were conducted to examine the effects of trial-to-trial variations in speaking style, fundamental frequency, and speaking rate on identification of spoken words. In addition, the experiments investigated whether any effects of stimulus variability would be modulated by phonetic confusability (i.e., lexical difficulty). In Experiment 1, trial-to-trial variations in speaking style reduced the overall identification performance compared with conditions containing no speaking-style variability. In addition, the effects of variability were greater for phonetically confusable words than for phonetically distinct words. In Experiment 2, variations in fundamental frequency were found to have no significant effects on spoken word identification and did not interact with lexical difficulty. In Experiment 3, two different methods for varying speaking rate were found to have equivalent negative effects on spoken word recognition and similar interactions with lexical difficulty. Overall, the findings are consistent with a phonetic-relevance hypothesis, in which accommodating sources of acoustic-phonetic variability that affect phonetically relevant properties of speech signals can impair spoken word identification. In contrast, variability in parameters of the speech signal that do not affect phonetically relevant properties are not expected to affect overall identification performance. Implications of these findings for the nature and development of lexical representations are discussed. 相似文献

12.

Effect of spatial uncertainty of masker on masked detection for nonspeech stimuli

Fan WL Streeter TM Durlach NI 《The Journal of the Acoustical Society of America》2008,124(1):36-39

Research on informational masking for nonspeech stimuli has focused on the effects of spectral uncertainty in the masker. In this letter, results are presented from some preliminary probe experiments in which the spectrum of the masker is held fixed but the spatial properties of the masker are randomized. In addition, in some tests, the overall level of the stimulus is randomized. These experiments differ from previous experiments that have measured the effect of spatial uncertainty on masking in that the only attributes (aside from level) that distinguish the target from the masker are the spatial attributes; in all of the tests, the target and masker were statistically identical, statistically independent, narrowband noise signals. In general, the results indicate that detection performance is degraded by spatial uncertainty in the masker but that compared both to the effects of spectral uncertainty and to the effects of overall-level uncertainty, the effects of spatial uncertainty are relatively small. 相似文献

13.

Temporal discrimination for single components of nonspeech auditory patterns

B Espinoza-Varas C S Watson 《The Journal of the Acoustical Society of America》1986,80(6):1685-1694

This paper extends previous research on listeners' abilities to discriminate the details of brief tonal components occurring within sequential auditory patterns (Watson et al., 1975, 1976). Specifically, the ability to discriminate increments in the duration delta t of tonal components was examined. Stimuli consisted of sequences of ten sinusoidal tones: a 40-ms test tone to which delta t was added, plus nine context tones with individual durations fixed at 40 ms or varying between 20 and 140 ms. The level of stimulus uncertainty was varied from high (any of 20 test tones occurring in any of nine factorial contexts), through medium (any of 20 test tones occurring in ten contexts), to minimal levels (one test tone occurring in a single context). The ability to discriminate delta t depended strongly on the level of stimulus uncertainty, and on the listener's experience with the tonal context. Asymptotic thresholds under minimal uncertainty approached 4-6 ms, or 15% of the duration of the test tones; under high uncertainty, they approached 40 ms, or 10% of the total duration of the tonal sequence. Initial thresholds exhibited by inexperienced listeners are two-to-four times greater than the asymptotic thresholds achieved after considerable training (20,000-30,000 trials). Isochronous sequences, with context tones of uniform, 40-ms duration, yield lower thresholds than those with components of varying duration. The frequency and temporal position of the test tones had only minor effects on temporal discrimination. It is proposed that a major determinant of the ability to discriminate the duration of components of sequential patterns is the listener's knowledge about "what to listen for and where." Reduced stimulus uncertainty and extensive practice increase the precision of this knowledge, and result in high-resolution discrimination performance. Increased uncertainty, limited practice, or both, would allow only discrimination of gross changes in the temporal or spectral structure of the sequential patterns. 相似文献

14.

Masking and stimulus intensity effects on duplex perception: a confirmation of the dissociation between speech and nonspeech modes

S Bentin V Mann 《The Journal of the Acoustical Society of America》1990,88(1):64-74

Using the phenomenon of duplex perception, previous researchers have shown that certain manipulations affect the perception of formant transitions as speech but not their perception as nonspeech "chirps," a dissociation that is consistent with the hypothesized distinction between speech and nonspeech modes of perception [Liberman et al., Percept. Psychophys. 30, 133-143 (1981); Mann and Liberman, Cognition 14, 211-235 (1983)]. The present study supports this interpretation of duplex perception by showing the existence of a "double dissociation" between the speech and chirp percepts. Five experiments compared the effects of stimulus onset asynchrony, backward masking, and transition intensity on the two sides of duplex percepts. It was found that certain manipulations penalize the chirp side but not the speech side, whereas other manipulations had the opposite effect of penalizing the speech side but not the chirp side. In addition, although effects on the speech side of duplex percepts have appeared to be much the same as in the case of normal (electronically fused) speech stimuli, the present study discovered that manipulations that impaired the chirp side of duplex percepts had considerably less effect on the perception of isolated chirps. Thus it would seem that duplex perception makes chirp perception more vulnerable to the effects of stimulus degradation. Several explanations of the data are discussed, among them, the view that speech perception may take precedence over other forms of auditory perception [Mattingly and Liberman, in Signals and Sense: Local and Global Order in Perceptual Maps, edited by G.M. Edelman, W.E. Gall, and W.M. Cowan (Wiley, New York, in press); Whalen and Liberman, Science 237, 169-171 (1987)]. 相似文献

15.

Auditory-visual speech perception and synchrony detection for speech and nonspeech signals

Conrey B Pisoni DB 《The Journal of the Acoustical Society of America》2006,119(6):4065-4073

Previous research has identified a "synchrony window" of several hundred milliseconds over which auditory-visual (AV) asynchronies are not reliably perceived. Individual variability in the size of this AV synchrony window has been linked with variability in AV speech perception measures, but it was not clear whether AV speech perception measures are related to synchrony detection for speech only or for both speech and nonspeech signals. An experiment was conducted to investigate the relationship between measures of AV speech perception and AV synchrony detection for speech and nonspeech signals. Variability in AV synchrony detection for both speech and nonspeech signals was found to be related to variability in measures of auditory-only (A-only) and AV speech perception, suggesting that temporal processing for both speech and nonspeech signals must be taken into account in explaining variability in A-only and multisensory speech perception. 相似文献

16.

Evaluation of context effects in sentence recognition

Bronkhorst AW Brand T Wagener K 《The Journal of the Acoustical Society of America》2002,111(6):2874-2886

It was investigated whether the model for context effects, developed earlier by Bronkhorst et al. [J. Acoust. Soc. Am. 93, 499-509 (1993)], can be applied to results of sentence tests, used for the evaluation of speech recognition. Data for two German sentence tests, that differed with respect to their semantic content, were analyzed. They had been obtained from normal-hearing listeners using adaptive paradigms in which the signal-to-noise ratio was varied. It appeared that the model can accurately reproduce the complete pattern of scores as a function of signal-to-noise ratio: both sentence recognition scores and proportions of incomplete responses. In addition, it is shown that the model can provide a better account of the relationship between average word recognition probability (p(e)) and sentence recognition probability (p(w)) than the relationship p(w) =p(e)j, which has been used in previous studies. Analysis of the relationship between j and the model parameters shows that j is, nevertheless, a very useful parameter, especially when it is combined with the parameter j', which can be derived using the equivalent relationship p(w,0) = (1 - p(e))(j'), where p(w,0) is the probability of recognizing none of the words in the sentence. These parameters not only provide complementary information on context effects present in the speech material, but they also can be used to estimate the model parameters. Because the model can be applied to both speech and printed text, an experiment was conducted in which part of the sentences was presented orthographically with 1-3 missing words. The results revealed a large difference between the values of the model parameters for the two presentation modes. This is probably due to the fact that, with speech, subjects can reduce the number of alternatives for a certain word using partial information that they have perceived (i.e., not only using the sentence context). A method for mapping model parameters from one mode to the other is suggested, but the validity of this approach has to be confirmed with additional data. 相似文献

17.

一种基于模糊聚类分析的异音混合共享模型

徐向华朱杰郭强《声学学报》2005,30(5):457-461

为减少语音识别中声学模型的参数量,提高参数训练的鲁棒性,提出了一种基于升值法模糊聚类的异音混合共享模型。在决策树结构的基础上,通过对初始三音子模型的高斯函数做模糊聚类得到该模型的高斯码本,并进一步通过对模型的方差做模糊聚类完成对方差的共享。识别实验结果表明,与相近高斯数量的传统异音混合共享模型相比,提出的异音混合共享模型的高斯权值数减少77.59%时,识别率提高7.92%;与相近参数量的三音子模型相比,方差共享的异音混合模型误识率降低了3.01%。相似文献

18.

A probabilistic framework for landmark detection based on phonetic features for automatic speech recognition

Juneja A Espy-Wilson C 《The Journal of the Acoustical Society of America》2008,123(2):1154-1168

A probabilistic framework for a landmark-based approach to speech recognition is presented for obtaining multiple landmark sequences in continuous speech. The landmark detection module uses as input acoustic parameters (APs) that capture the acoustic correlates of some of the manner-based phonetic features. The landmarks include stop bursts, vowel onsets, syllabic peaks and dips, fricative onsets and offsets, and sonorant consonant onsets and offsets. Binary classifiers of the manner phonetic features-syllabic, sonorant and continuant-are used for probabilistic detection of these landmarks. The probabilistic framework exploits two properties of the acoustic cues of phonetic features-(1) sufficiency of acoustic cues of a phonetic feature for a probabilistic decision on that feature and (2) invariance of the acoustic cues of a phonetic feature with respect to other phonetic features. Probabilistic landmark sequences are constrained using manner class pronunciation models for isolated word recognition with known vocabulary. The performance of the system is compared with (1) the same probabilistic system but with mel-frequency cepstral coefficients (MFCCs), (2) a hidden Markov model (HMM) based system using APs and (3) a HMM based system using MFCCs. 相似文献

19.

Adaptation of phonetic feature analyzers for place of articulation

W E Cooper 《The Journal of the Acoustical Society of America》1974,56(2):617-627

相似文献

20.

Informational masking for simultaneous nonspeech stimuli: psychometric functions for fixed and randomly mixed maskers

Durlach NI Mason CR Gallun FJ Shinn-Cunningham B Colburn HS Kidd G 《The Journal of the Acoustical Society of America》2005,118(4):2482-2497

Sensitivity d' and response bias beta were measured as a function of target level for the detection of a 1000-Hz tone in multitone maskers using a one interval, two-alternative forced-choice (1I-2AFC) paradigm. Ten such maskers, each with eight randomly selected components in the region 200-5000 Hz, with 800-1250 Hz excluded to form a protected zone, were presented under two conditions: the fixed condition, in which the same eight-component masker is used throughout an experimental run, and the random condition, in which an eight-component masker is chosen randomly trial-to-trial from the given set of ten such maskers. Differences between the results obtained with these two conditions help characterize the listener's susceptibility to informational masking (IM). The d' results show great intersubject variability, but can be reasonably well fit by simple energy-detector models in which internal noise and filter bandwidth are used as fitting parameters. In contrast, the beta results are not well fit by these models. In addition to presentation of new data and its relation to energy-detector models, this paper provides comments on a variety of issues, problems, and research needs in the IM area. 相似文献