期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Multiple routes to the perceptual learning of speech

Loebach JL Bent T Pisoni DB 《The Journal of the Acoustical Society of America》2008,124(1):552-561

相似文献

2.

Priming and sentence context support listening to noise-vocoded speech by younger and older adults

Sheldon S Pichora-Fuller MK Schneider BA 《The Journal of the Acoustical Society of America》2008,123(1):489-499

Older adults are known to benefit from supportive context in order to compensate for age-related reductions in perceptual and cognitive processing, including when comprehending spoken language in adverse listening conditions. In the present study, we examine how younger and older adults benefit from two types of contextual support, predictability from sentence context and priming, when identifying target words in noise-vocoded sentences. In the first part of the experiment, benefit from context based on primarily semantic knowledge was evaluated by comparing the accuracy of identification of sentence-final target words that were either highly predictable or not predictable from the sentence context. In the second part of the experiment, benefit from priming was evaluated by comparing the accuracy of identification of target words when noise-vocoded sentences were either primed or not by the presentation of the sentence context without noise vocoding and with the target word replaced with white noise. Younger and older adults benefited from each type of supportive context, with the most benefit realized when both types were combined. Supportive context reduced the number of noise-vocoded bands needed for 50% word identification more for older adults than their younger counterparts. 相似文献

3.

Effect of age, presentation method, and learning on identification of noise-vocoded words

Sheldon S Pichora-Fuller MK Schneider BA 《The Journal of the Acoustical Society of America》2008,123(1):476-488

Noise vocoding was used to investigate the ability of younger and older adults with normal audiometric thresholds in the speech range to use amplitude envelope cues to identify words. In Experiment 1, four 50-word lists were tested, with each word presented initially with one frequency band and the number of bands being incremented until it was correctly identified by the listener. Both age groups required an average of 5.25 bands for 50% correct word identification and performance improved across the four lists. In Experiment 2, the same participants who completed Experiment 1 identified words in four blocked noise-vocoded conditions (16, 8, 4, 2 bands). Compared to Experiment 1, both age groups required more bands to reach the 50% correct word identification threshold in Experiment 2, 6.13, and 8.55 bands, respectively, with younger adults outperforming older adults. Experiment 3 was identical to Experiment 2 except the participants had no prior experience with noise-vocoded speech. Again, younger adults outperformed older adults, with thresholds of 6.67 and 8.97 bands, respectively. The finding of age effects in Experiments 2 and 3, but not in Experiment 1, seems more likely to be related to differences in the presentation methods than to experience with noise vocoding. 相似文献

4.

Reiterant speech: an acoustic and perceptual validation

L S Larkey 《The Journal of the Acoustical Society of America》1983,73(4):1337-1345

Reiterant speech, or nonsense syllable mimicry, has been proposed as a way to study prosody, particularly syllable and word durations, unconfounded by segmental influences. Researchers have shown that segmental influences on durations can be neutralized in reiterant speech. If it is to be a useful tool in the study of prosody, it must also be shown that reiterant speech preserves the suprasegmental duration and intonation differences relevant to perception. In the present study, syllable durations for nonreiterant and reiterant ambiguous sentences were measured to seek evidence of the duration differences which can enable listeners to resolve surface structure ambiguities in nonreiterant speech. These duration patterns were found in both nonreiterant and reiterant speech. A perceptual study tested listeners' perception of these ambiguous sentences as spoken by four "good" speakers--speakers who neutralized intrinsic duration differences and whose sentences were independently rated by skilled listeners as good imitations of normal speech. The listeners were able to choose the correct interpretation when the ambiguous sentences were in reiterant form as well as they did when the sentences were spoken normally. These results support the notion that reiterant speech is like nonreiterant speech in aspects which are important in the study of prosody. 相似文献

5.

Revisiting perceptual compensation for effects of reverberation in speech identification

Nielsen JB Dau T 《The Journal of the Acoustical Society of America》2010,128(5):3088-3094

Listeners were given the task to identify the stop-consonant [t] in the test-word "stir" when the word was embedded in a carrier sentence. Reverberation was added to the test-word, but not to the carrier, and the ability to identify the [t] decreased because the amplitude modulations associated with the [t] were smeared. When a similar amount of reverberation was also added to the carrier sentence, the listeners' ability to identify the stop-consonant was restored. This phenomenon has in previous research been considered as evidence for an extrinsic compensation mechanism for reverberation in the human auditory system [Watkins (2005). J. Acoust. Soc. Am. 118, 249-262]. In the present study, the reverberant test-word was embedded in additional non-reverberant carriers, such as white noise, speech-shaped noise and amplitude modulated noise. In addition, a reference condition was included where the test-word was presented in isolation, i.e., without any carrier stimulus. In all of these conditions, the ability to identify the stop-consonant [t] was enhanced relative to the condition using the non-reverberant speech carrier. The results suggest that the non-reverberant speech carrier produces an interference effect that impedes the identification of the stop-consonant. These findings raise doubts about the existence of the compensation mechanism. 相似文献

6.

Steady-spectrum contexts and perceptual compensation for reverberation in speech identification

Watkins AJ Makin SJ 《The Journal of the Acoustical Society of America》2007,121(1):257-266

Perceptual compensation for reverberation was measured by embedding test words in contexts that were either spoken phrases or processed versions of this speech. The processing gave steady-spectrum contexts with no changes in the shape of the short-term spectral envelope over time, but with fluctuations in the temporal envelope. Test words were from a continuum between "sir" and "stir." When the amount of reverberation in test words was increased, to a level above the amount in the context, they sounded more like "sir." However, when the amount of reverberation in the context was also increased, to the level present in the test word, there was perceptual compensation in some conditions so that test words sounded more like "stir" again. Experiments here found compensation with speech contexts and with some steady-spectrum contexts, indicating that fluctuations in the context's temporal envelope can be sufficient for compensation. Other results suggest that the effectiveness of speech contexts is partly due to the narrow-band "frequency-channels" of the auditory periphery, where temporal-envelope fluctuations can be more pronounced than they are in the sound's broadband temporal envelope. Further results indicate that for compensation to influence speech, the context needs to be in a broad range of frequency channels. 相似文献

7.

Effects of noise on speech production: acoustic and perceptual analyses 总被引：4，自引：0，他引：4

W V Summers D B Pisoni R H Bernacki R I Pedlow M A Stokes 《The Journal of the Acoustical Society of America》1988,84(3):917-928

Acoustical analyses were carried out on a set of utterances produced by two male speakers talking in quiet and in 80, 90, and 100 dB SPL of masking noise. In addition to replicating previous studies demonstrating increases in amplitude, duration, and vocal pitch while talking in noise, these analyses also found reliable differences in the formant frequencies and short-term spectra of vowels. Perceptual experiments were also conducted to assess the intelligibility of utterances produced in quiet and in noise when they were presented at equal S/N ratios for identification. In each experiment, utterances originally produced in noise were found to be more intelligible than utterances produced in the quiet. The results of the acoustic analyses showed clear and consistent differences in the acoustic-phonetic characteristics of speech produced in quiet versus noisy environments. Moreover, these accounts differences produced reliable effects on intelligibility. The findings are discussed in terms of: (1) the nature of the acoustic changes that taken place when speakers produce speech under adverse conditions such as noise, psychological stress, or high cognitive load: (2) the role of training and feedback in controlling and modifying a talker's speech to improve performance of current speech recognizers; and (3) the development of robust algorithms for recognition of speech in noise. 相似文献

8.

Specificity of perceptual learning in a frequency discrimination task 总被引：3，自引：0，他引：3

Irvine DR Martin RL Klimkeit E Smith R 《The Journal of the Acoustical Society of America》2000,108(6):2964-2968

On a variety of visual tasks, improvement in perceptual discrimination with practice (perceptual learning) has been found to be specific to features of the training stimulus, including retinal location. This specificity has been interpreted as evidence that the learning reflects changes in neuronal tuning at relatively early processing stages. The aim of the present study was to examine the frequency specificity of human auditory perceptual learning in a frequency discrimination task. Difference limens for frequency (DLFs) were determined at 5 and 8 kHz, using a three-alternative forced choice method, for two groups of eight subjects before and after extensive training at one or the other frequency. Both groups showed substantial improvement at the training frequency, and much of this improvement generalized to the nontrained frequency. However, a small but statistically significant component of the improvement was specific to the training frequency. Whether this specificity reflects changes in neural frequency tuning or attentional changes remains unclear. 相似文献

9.

The effects of lung volume initiation on speech: a perceptual study

Claudio F. Milstein PhD Peter J. Watson PhD 《Journal of voice》2004,18(1):38-45

Audio recordings were made while six vocally untrained individuals read sentences aloud after breathing to three different lung volume levels-typical, high, and low. A perceptual experiment was conducted on these speech samples. The perceptual experiment consisted of a two-alternative forced-choice design, in which listeners heard matched pairs of sentences and were asked to identify which sentence in the pair departed from normal sounding speech. The results of the perceptual experiment showed that listeners can accurately discriminate between speech produced at both lung volume extremes. The percentage of correct identification was higher for speech produced at low lung volumes than that for high lung volumes. Factors such as order of presentation and removal of SPL as an acoustic cue made little difference in the ability of listeners to discriminate lung volume level from the speech signals. 相似文献

10.

Temporal-envelope constancy of speech in rooms and the perceptual weighting of frequency bands

Watkins AJ Raimond AP Makin SJ 《The Journal of the Acoustical Society of America》2011,130(5):2777-2788

Three experiments measured constancy in speech perception, using natural-speech messages or noise-band vocoder versions of them. The eight vocoder-bands had equally log-spaced center-frequencies and the shapes of corresponding "auditory" filters. Consequently, the bands had the temporal envelopes that arise in these auditory filters when the speech is played. The "sir" or "stir" test-words were distinguished by degrees of amplitude modulation, and played in the context; "next you'll get _ to click on." Listeners identified test-words appropriately, even in the vocoder conditions where the speech had a "noise-like" quality. Constancy was assessed by comparing the identification of test-words with low or high levels of room reflections across conditions where the context had either a low or a high level of reflections. Constancy was obtained with both the natural and the vocoded speech, indicating that the effect arises through temporal-envelope processing. Two further experiments assessed perceptual weighting of the different bands, both in the test word and in the context. The resulting weighting functions both increase monotonically with frequency, following the spectral characteristics of the test-word's [s]. It is suggested that these two weighting functions are similar because they both come about through the perceptual grouping of the test-word's bands. 相似文献

11.

用于噪声鲁棒性语音识别的子带能量规整感知线性预测系数

下载免费PDF全文

蔡尚金鑫高圣翔潘接林颜永红《声学学报》2012,37(6):667-672

为了提高感知线性预测系数(PLP)在噪声环境下的识别性能,使用子带能量偏差减的方法,提出了一种基于子带能量规整的感知线性预测系数(SPNPLP)。PLP有效地集中了语音中的有用信息,在安静环境下自动语音识别系统使用PLP可以取得良好的识别率;但是在噪声环境中其识别性能急剧下降。通过使用能量偏差减的方法对PLP的子带能量进行规整,抑制背景噪声激励,提出了SPNPLP,增强自动语音识别系统在噪声环境下的鲁棒性。在一个语法大小为501的孤立词识别任务和一个大词表连续语音识别任务上做了测试,SPNPLP在这两个任务上,与PLP相比,汉字识别精度分别绝对提升了11.26%和9.2%。实验结果表明SPNPLP比PLP具有更好的噪声鲁棒性。相似文献

12.

Acoustical and perceptual characteristics of speech produced with an electronic artificial larynx.

M S Weiss G H Yeni-Komshian J M Heinz 《The Journal of the Acoustical Society of America》1979,65(5):1298-1308

相似文献

13.

A perceptual learning investigation of the pitch elicited by amplitude-modulated noise

Fitzgerald MB Wright BA 《The Journal of the Acoustical Society of America》2005,118(6):3794-3803

Noise that is amplitude modulated at rates ranging from 40 to 850 Hz can elicit a sensation of pitch. Here, the processing of this temporally based pitch was investigated using a perceptual-learning paradigm. Nine listeners were trained (1 hour per day for 6-8 days) to discriminate a standard rate of sinusoidal amplitude modulation (SAM) from a faster rate in a single condition (150 Hz SAM rate, 5 kHz low-pass carrier). All trained listeners improved significantly on that condition. These trained listeners subsequently showed no more improvement than nine untrained controls on pure-tone and rippled-noise discrimination with the same pitch, and on SAM-rate discrimination with a 30 Hz rate, although they did show some improvement with a 300 Hz rate. In addition, most trained, but not control, listeners were worse at detecting SAM at 150 Hz after, compared to before training. These results indicate that listeners can learn to improve their ability to discriminate SAM rate with multiple-hour training and that the mechanism that is modified by learning encodes (1) the pitch of SAM noise but not that of pure tones and rippled noise, (2) different SAM rates separately, and (3) differences in SAM rate more effectively than cues for SAM detection. 相似文献

14.

Perceptual learning in speech: stability over time

Eisner F McQueen JM 《The Journal of the Acoustical Society of America》2006,119(4):1950-1953

Perceptual representations of phonemes are flexible and adapt rapidly to accommodate idiosyncratic articulation in the speech of a particular talker. This letter addresses whether such adjustments remain stable over time and under exposure to other talkers. During exposure to a story, listeners learned to interpret an ambiguous sound as [f] or [s]. Perceptual adjustments measured after 12 h were as robust as those measured immediately after learning. Equivalent effects were found when listeners heard speech from other talkers in the 12 h interval, and when they had the opportunity to consolidate learning during sleep. 相似文献

15.

时频字典学习的单通道语音增强算法

下载免费PDF全文

黄建军张雄伟张亚非邹霞《声学学报》2012,37(5):539-547

针对以往语音增强算法在非平稳噪声环境下性能急剧下降的问题,基于时频字典学习方法提出了一种新的单通道语音增强算法。首先,提出采用时频字典学习方法对噪声的频谱结构的先验信息进行建模,并将其融入到卷积非负矩阵分解的框架下;然后,在固定噪声时频字典情况下,推导了时变增益和语音时频字典的乘性迭代求解公式;最后,利用该迭代公式更新语音和噪声的时变增益系数以及语音的时频字典,通过语音时频字典和时变增益的卷积运算重构出语音的幅度谱并用二值时频掩蔽方法消除噪声干扰。实验结果表明,在多项语音质量评价指标上,本文算法都取得了更好的结果。在非平稳噪声和低信噪比环境下,相比于多带谱减法和非负稀疏编码去噪算法,本文算法更有效地消除了噪声,增强后的语音具有更好的质量。相似文献

16.

Influences of pellet markers on speech production behavior: acoustical and perceptual measures.

G Weismer K Bunton 《The Journal of the Acoustical Society of America》1999,105(5):2882-2894

Peri- and intraoral devices are often used to obtain measurements concerning articulator motions and placements. Surprisingly, there are few formal evaluations of the potential influence of these devices on speech production behavior. In particular, the potential effects of lingual pellets or coils used in x-ray or electromagnetic studies of tongue motion have never been evaluated formally, even though a large x-ray database exists and electromagnetic systems are commercially available. The x-ray microbeam database [Westbury, J. "X-ray Microbeam Speech Production Database User's Handbook, version 1" (1994)] includes several utterances produced with pellets-off and -on, which allowed us to evaluate effects of pellets for the utterance, She had your dark suit in greasy wash water all year, using acoustic and perceptual measures. Overall, there were no acoustic or perceptual measures that showed consistent effects of pellets across speakers, but certain effects were consistent either within a given speaker or in direction across a subgroup of the speakers. The results are discussed in terms of the general goodness of the assumption that point parameterization of lingual motion does not interfere with normal articulatory behaviors. A brief screening procedure is suggested to protect articulatory kinematic experiments from those individuals who may show consistent effects of having devices placed on perioral structures. 相似文献

17.

Single channel speech enhancement via time-frequency dictionary learning

HUANG Jianjun ZHANG Xiongwei ZHANG Yafei ZOU Xia 《声学学报：英文版》2013,(1):90-102

A time-frequency dictionary learning approach is proposed to enhance speech contaminated by additive nonstationary noise.In this framework,a time-frequency dictionary which is learned from noise data is incorporated into the convolutive nonnegative matrix factorization framework.The update rules for the time-varying gains and speech dictionary are derived by precomputing the noise dictionary.The magnitude spectra of speech are estimated using convolution operation between the learned speech dictionary and the time-varying gains. Finally,noise is removed via binary time-frequency masking.The experimental results indicate that the proposed scheme gives better enhancement results in terms of quality measures of speech.Moreover,the proposed algorithm outperforms the multiband spectra subtraction and the non-negative sparse coding based noise reduction algorithm in nonstationary noise conditions. 相似文献

18.

Perceptual learning of spectrally degraded speech and environmental sounds

Loebach JL Pisoni DB 《The Journal of the Acoustical Society of America》2008,123(2):1126-1139

Adaptation to the acoustic world following cochlear implantation does not typically include formal training or extensive audiological rehabilitation. Can cochlear implant (CI) users benefit from formal training, and if so, what type of training is best? This study used a pre-/posttest design to evaluate the efficacy of training and generalization of perceptual learning in normal hearing subjects listening to CI simulations (eight-channel sinewave vocoder). Five groups of subjects were trained on words (simple/complex), sentences (meaningful/anomalous), or environmental sounds, and then were tested using an open-set identification task. Subjects were trained on only one set of materials but were tested on all stimuli. All groups showed significant improvement due to training, which successfully generalized to some, but not all stimulus materials. For easier tasks, all types of training generalized equally well. For more difficult tasks, training specificity was observed. Training on speech did not generalize to the recognition of environmental sounds; however, explicit training on environmental sounds successfully generalized to speech. These data demonstrate that the perceptual learning of degraded speech is highly context dependent and the type of training and the specific stimulus materials that a subject experiences during perceptual learning has a substantial impact on generalization to new materials. 相似文献

19.

结合注意力机制的改进U-Net网络在端到端语音增强中的应用

下载免费PDF全文

武瑞沁陈雪勤俞杰王丽荣赵鹤鸣《声学学报》2022,47(2):266-275

设计了一个适用于端到端语音增强的改进的U-Net (Attention Dilated Convolution U-Net,ADC-U-Net)网络模型。与基线U-Net网络相比,一方面通过加入空洞卷积减小由采样带来的信息损失;另一方面引入了注意力机制结构,结合了含噪语音更多的上下文信息,提取更深层次和更丰富的特征信息。与传统语音增强方法相比,所提模型无需提取特征、对特征去噪、重构语音3个步骤,避免了对显性特征的依赖,转而由网络模型通过多层次多尺度学习获得隐性特征。用多个主客观指标对增强语音的质量和可懂度进行了评价。实验数据显示所提算法在噪声抑制能力和对噪声的适应度方面均表现出良好的性能,与基线U-Net网络及其它模型相比,展示了良好的语音质量和可懂度。相似文献

20.

Unequal effects of speech and nonspeech contexts on the perceptual normalization of Cantonese level tones

C Zhang G Peng WS Wang 《The Journal of the Acoustical Society of America》2012,132(2):1088-1099

Context is important for recovering language information from talker-induced variability in acoustic signals. In tone perception, previous studies reported similar effects of speech and nonspeech contexts in Mandarin, supporting a general perceptual mechanism underlying tone normalization. However, no supportive evidence was obtained in Cantonese, also a tone language. Moreover, no study has compared speech and nonspeech contexts in the multi-talker condition, which is essential for exploring the normalization mechanism of inter-talker variability in speaking F0. The other question is whether a talker's full F0 range and mean F0 equally facilitate normalization. To answer these questions, this study examines the effects of four context conditions (speech/nonspeech?×?F0 contour/mean F0) in the multi-talker condition in Cantonese. Results show that raising and lowering the F0 of speech contexts change the perception of identical stimuli from mid level tone to low and high level tone, whereas nonspeech contexts only mildly increase the identification preference. It supports the speech-specific mechanism of tone normalization. Moreover, speech context with flattened F0 trajectory, which neutralizes cues of a talker's full F0 range, fails to facilitate normalization in some conditions, implying that a talker's mean F0 is less efficient for minimizing talker-induced lexical ambiguity in tone perception. 相似文献