Similar Articles
20 similar articles found.
2.
Specificity of perceptual learning in a frequency discrimination task
On a variety of visual tasks, improvement in perceptual discrimination with practice (perceptual learning) has been found to be specific to features of the training stimulus, including retinal location. This specificity has been interpreted as evidence that the learning reflects changes in neuronal tuning at relatively early processing stages. The aim of the present study was to examine the frequency specificity of human auditory perceptual learning in a frequency discrimination task. Difference limens for frequency (DLFs) were determined at 5 and 8 kHz, using a three-alternative forced choice method, for two groups of eight subjects before and after extensive training at one or the other frequency. Both groups showed substantial improvement at the training frequency, and much of this improvement generalized to the nontrained frequency. However, a small but statistically significant component of the improvement was specific to the training frequency. Whether this specificity reflects changes in neural frequency tuning or attentional changes remains unclear.
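As an aside on method: difference limens like the DLFs above are commonly estimated with an adaptive staircase. The sketch below shows a generic 2-down-1-up track against a hypothetical deterministic listener; the study's actual three-alternative forced-choice procedure and all parameter values here are illustrative, not taken from the paper.

```python
# Hypothetical sketch: estimating a difference limen for frequency (DLF)
# with a 2-down-1-up adaptive staircase. The simulated listener and all
# parameter values are illustrative stand-ins, not the study's procedure.

def simulated_listener(delta_f_hz, true_dlf_hz=20.0):
    """Deterministic stand-in: 'correct' whenever the frequency
    difference exceeds the listener's true DLF."""
    return delta_f_hz >= true_dlf_hz

def run_staircase(start_delta=100.0, step=1.5, n_reversals=6):
    delta, correct_run, last_dir = start_delta, 0, None
    reversals = []
    while len(reversals) < n_reversals:
        if simulated_listener(delta):
            correct_run += 1
            if correct_run == 2:           # two correct -> make task harder
                correct_run = 0
                if last_dir == 'up':
                    reversals.append(delta)
                last_dir = 'down'
                delta /= step
        else:                              # one wrong -> make task easier
            correct_run = 0
            if last_dir == 'down':
                reversals.append(delta)
            last_dir = 'up'
            delta *= step
    # Geometric mean of the reversal points estimates the DLF.
    prod = 1.0
    for r in reversals:
        prod *= r
    return prod ** (1.0 / len(reversals))

dlf_estimate = run_staircase()
```

With the deterministic listener above, the track oscillates around the true DLF and the geometric mean of the reversals lands near it.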

3.
Fluctuations in sound amplitude provide important cues to the identity of many sounds including speech. Of interest here was whether the ability to detect these fluctuations can be improved with practice, and if so whether this learning generalizes to untrained cases. To address these issues, normal-hearing adults (n = 9) were trained to detect sinusoidal amplitude modulation (SAM; 80-Hz rate, 3-4 kHz bandpass carrier) 720 trials/day for 6-7 days and were tested before and after training on related SAM-detection and SAM-rate-discrimination conditions. Controls (n = 9) only participated in the pre- and post-tests. The trained listeners improved more than the controls on the trained condition between the pre- and post-tests, but different subgroups of trained listeners required different amounts of practice to reach asymptotic performance, ranging from 1 (n = 6) to 4-6 (n = 3) sessions. This training-induced learning did not generalize to detection with two untrained carrier spectra (5 kHz low-pass and 0.5-1.5 kHz bandpass) or to rate discrimination with the trained rate and carrier spectrum, but there was some indication that it generalized to detection with two untrained rates (30 and 150 Hz). Thus, practice improved the ability to detect amplitude modulation, but the generalization of this learning to untrained cases was somewhat limited.
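A SAM stimulus of the kind used in the trained condition (80-Hz modulator on a 3-4 kHz bandpass noise carrier) can be sketched as follows; the brick-wall FFT filtering, duration, and modulation depth here are simplifications, not the study's exact stimuli.

```python
# Illustrative sketch of a sinusoidally amplitude-modulated (SAM) stimulus:
# an 80-Hz modulator applied to a 3-4 kHz bandpass noise carrier.
# Filter design, duration, and depth are assumptions, not the study's values.
import numpy as np

fs, dur = 16000, 0.5                      # sample rate (Hz), duration (s)
t = np.arange(int(fs * dur)) / fs

# Bandpass noise carrier, 3-4 kHz, via brick-wall spectral masking.
rng = np.random.default_rng(0)
spec = np.fft.rfft(rng.standard_normal(t.size))
freqs = np.fft.rfftfreq(t.size, 1 / fs)
spec[(freqs < 3000) | (freqs > 4000)] = 0
carrier = np.fft.irfft(spec, n=t.size)

# Apply 80-Hz sinusoidal amplitude modulation at depth m = 0.8.
m, rate = 0.8, 80.0
modulator = 1.0 + m * np.sin(2 * np.pi * rate * t)
sam = modulator * carrier
```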

4.
The Tickle Talker is an electrotactile speech perception device. Subjects were evaluated using the device in various tactile-alone and tactile-visual contexts to assess the generalization to other contexts of tactile-alone perceptual skills. The subjects were from a group of six normally hearing subjects who had previously received 12 to 33 h of tactile-alone word recognition training and had learned an average vocabulary of 50 words [Galvin et al., J. Acoust. Soc. Am. 106, 1084-1089 (1999)]. The tactile-alone evaluation contexts were sentences, unfamiliar talkers, and untrained words. The tactile-visual evaluation contexts were closed-set words, open-set words, and open-set sentences. Tactile-alone perceptual skills were generalized to unfamiliar speakers, sentences, and untrained words, though scores indicated that generalization was not complete. In contrast, the generalization of skills to tactile-visual contexts was minimal or absent. The potential value of tactile-alone training for hearing-impaired users of the Tickle Talker is discussed.

5.
Speech perception in the presence of another competing voice is one of the most challenging tasks for cochlear implant users. Several studies have shown that (1) the fundamental frequency (F0) is a useful cue for segregating competing speech sounds and (2) the F0 is better represented by the temporal fine structure than by the temporal envelope. However, current cochlear implant speech processing algorithms emphasize temporal envelope information and discard the temporal fine structure. In this study, speech recognition was measured as a function of the F0 separation of the target and competing sentence in normal-hearing and cochlear implant listeners. For the normal-hearing listeners, the combined sentences were processed through either a standard implant simulation or a new algorithm which additionally extracts a slowed-down version of the temporal fine structure (called Frequency-Amplitude-Modulation-Encoding). The results showed no benefit of increasing F0 separation for the cochlear implant or simulation groups. In contrast, the new algorithm resulted in gradual improvements with increasing F0 separation, similar to that found with unprocessed sentences. These results emphasize the importance of temporal fine structure for speech perception and demonstrate a potential remedy for difficulty in the perceptual segregation of competing speech sounds.
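The envelope/fine-structure distinction this abstract turns on is conventionally made via the analytic signal: the magnitude gives the slow envelope, the cosine of the phase gives the rapid fine structure. The sketch below shows that generic decomposition on a toy AM tone; it is not the Frequency-Amplitude-Modulation-Encoding algorithm itself.

```python
# Generic envelope / temporal-fine-structure split via the analytic signal
# (Hilbert transform built from the FFT). Illustrative only; this is not
# the paper's FAME algorithm.
import numpy as np

def analytic_signal(x):
    """Analytic signal via a one-sided spectrum (assumes even length)."""
    n = x.size
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = h[n // 2] = 1.0
    h[1:n // 2] = 2.0
    return np.fft.ifft(X * h)

fs = 1000
t = np.arange(fs) / fs                      # 1 s
true_env = 1.0 + 0.5 * np.sin(2 * np.pi * 4 * t)
x = true_env * np.cos(2 * np.pi * 100 * t)  # amplitude-modulated tone

z = analytic_signal(x)
envelope = np.abs(z)                        # slow amplitude contour
fine_structure = np.cos(np.angle(z))        # rapid carrier oscillation
```

For a slowly modulated carrier like this, the recovered envelope closely tracks the true modulator away from the signal edges.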

6.
Behavioral experiments with infants, adults, and nonhuman animals converge with neurophysiological findings to suggest that there is a discontinuity in auditory processing of stimulus components differing in onset time by about 20 ms. This discontinuity has been implicated as a basis for boundaries between speech categories distinguished by voice onset time (VOT). Here, it is investigated how this discontinuity interacts with the learning of novel perceptual categories. Adult listeners were trained to categorize nonspeech stimuli that mimicked certain temporal properties of VOT stimuli. One group of listeners learned categories with a boundary coincident with the perceptual discontinuity. Another group learned categories defined such that the perceptual discontinuity fell within a category. Listeners in the latter group required significantly more experience to reach criterion categorization performance. Evidence of interactions between the perceptual discontinuity and the learned categories extended to generalization tests as well. It has been hypothesized that languages make use of perceptual discontinuities to promote distinctiveness among sounds within a language inventory. The present data suggest that discontinuities interact with category learning. As such, "learnability" may play a predictive role in selection of language sound inventories.

7.
Listeners' abilities to learn to hear all the details of an initially unfamiliar sequence of ten 45-ms tones were studied by tracking detection thresholds for each tonal component over a prolonged period of training. After repeated listening to this sequence, the presence or absence of individual tones could be recognized, even though they were attenuated by 40-50 dB relative to the remainder of the pattern. Threshold-tracking histories suggest that listeners tend to employ two different learning strategies, one of which is considerably more efficient. Special training by reducing stimulus uncertainty and extending the duration of the target component was effective in increasing the rate of threshold improvement. Strategies acquired with the first pattern studied generalized to new sequences of tones. The possible implications of these results for the perceptual learning of speech or other auditory codes are discussed.

8.
The present study examined the effects of short-term perceptual training on normal-hearing listeners' ability to adapt to spectrally altered speech patterns. Using noise-band vocoder processing, acoustic information was spectrally distorted by shifting speech information from one frequency region to another. Six subjects were tested with spectrally shifted sentences after five days of practice with upwardly shifted training sentences. Training with upwardly shifted sentences significantly improved recognition of upwardly shifted speech; recognition of downwardly shifted speech was nearly unchanged. Three subjects were later trained with downwardly shifted speech. Results showed that the mean improvement was comparable to that observed with the upwardly shifted training. In this retrain and retest condition, performance was largely unchanged for upwardly shifted sentence recognition, suggesting that these listeners had retained some of the improved speech perception resulting from the previous training. The results suggest that listeners are able to partially adapt to a spectral shift in acoustic speech patterns over the short-term, given sufficient training. However, the improvement was localized to where the spectral shift was trained, as no change in performance was observed for spectrally altered speech outside of the trained regions.
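Shifted noise-band vocoding of this general kind can be sketched as: extract the envelope in each analysis band, then use it to modulate noise in a *higher* synthesis band. The band edges, band count, envelope smoothing, and brick-wall filtering below are all illustrative assumptions, not the study's processing.

```python
# Hypothetical sketch of an upwardly shifted noise-band vocoder: analysis-
# band envelopes modulate noise in higher synthesis bands. All parameters
# are stand-ins, not the study's values.
import numpy as np

def band_filter(x, lo, hi, fs):
    """Brick-wall bandpass via the FFT (illustrative, not realistic filters)."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, 1 / fs)
    spec[(freqs < lo) | (freqs >= hi)] = 0
    return np.fft.irfft(spec, n=x.size)

def shifted_noise_vocoder(x, fs, analysis_edges, synthesis_edges, seed=0):
    rng = np.random.default_rng(seed)
    out = np.zeros_like(x)
    for (alo, ahi), (slo, shi) in zip(analysis_edges, synthesis_edges):
        band = band_filter(x, alo, ahi, fs)
        env = np.abs(band)                                     # rectify
        env = np.convolve(env, np.ones(64) / 64, mode='same')  # smooth
        noise = band_filter(rng.standard_normal(x.size), slo, shi, fs)
        out += env * noise
    return out

fs = 8000
t = np.arange(fs) / fs
speech_like = np.sin(2 * np.pi * 300 * t)        # toy stand-in for speech
analysis = [(200, 600), (600, 1200)]
synthesis = [(400, 1200), (1200, 2400)]          # shifted upward
vocoded = shifted_noise_vocoder(speech_like, fs, analysis, synthesis)
```

Because only the envelope survives, the output carries the input's amplitude pattern in the shifted frequency region, which is exactly the mismatch listeners must adapt to.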

9.
Currently there are few standardized speech testing materials for Mandarin-speaking cochlear implant (CI) listeners. In this study, Mandarin speech perception (MSP) sentence test materials were developed and validated in normal-hearing subjects listening to acoustic simulations of CI processing. Percent distribution of vowels, consonants, and tones within each MSP sentence list was similar to that observed across commonly used Chinese characters. There was no significant difference in sentence recognition across sentence lists. Given the phonetic balancing within lists and the validation with spectrally degraded speech, the present MSP test materials may be useful for assessing speech performance of Mandarin-speaking CI listeners.

10.
Several studies have shown that extensive training with synthetic speech sounds can result in substantial improvements in listeners' perception of intraphonemic differences. The purpose of the present study was to investigate the effects of listening experience on the perception of intraphonemic differences in the absence of specific training with the synthetic speech sounds being tested. Phonetically trained listeners, musicians, and untrained listeners were tested on a two-choice identification task, a three-choice identification task, and an ABX discrimination task using a synthetic [bi]-[phi] continuum and a synthetic [wei]-[rei] continuum. The three-choice identification task included the identification of stimuli with an "indefinite" or "ambiguous" quality in addition to clear instances of the opposing phonetic categories. Results included: (1) All three subject groups showed some ability to identify ambiguous stimuli; (2) phonetically trained listeners were better at identifying ambiguous stimuli than musicians and untrained listeners; (3) phonetically trained listeners performed better on the discrimination task than musicians and untrained listeners; (4) musicians and untrained listeners did not differ on any of the listening tasks; and (5) participation by the inexperienced listeners in a 10-week introductory phonetics course did not result in improvements in either the three-choice identification task or the discrimination task.

11.
In the work described here, the backpropagation neural network learning procedure is applied to the analysis and recognition of speech. This procedure takes a set of input/output pattern pairs and attempts to learn their functional relationship; it develops the necessary representational features during the course of learning. A series of computer simulation studies was carried out to assess the ability of these networks to accurately label sounds, to learn to recognize sounds without labels, and to learn feature representations of continuous speech. These studies demonstrated that the networks can learn to label presegmented test tokens with accuracies of up to 95%. Networks trained on segmented sounds using a strategy that requires no external labels were able to recognize and delineate sounds in continuous speech. These networks developed rich internal representations that included units which corresponded to such traditional distinctions as vowels and consonants, as well as units that were sensitive to novel and nonstandard features. Networks trained on a large corpus of unsegmented, continuous speech without labels also developed interesting feature representations, which may be useful in both segmentation and label learning. The results of these studies, while preliminary, demonstrate that backpropagation learning can be used with complex, natural data to identify a feature structure that can serve as the basis for both analysis and nontrivial pattern recognition.
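The backpropagation procedure itself can be shown in miniature: a small fully connected network trained by gradient descent on a toy labeling task. The data, layer sizes, and learning rate below are illustrative; the networks in the study were trained on speech tokens, not on this synthetic task.

```python
# Minimal sketch of backpropagation: a tiny 2-8-1 network trained by
# full-batch gradient descent on a toy labeling task. All settings are
# illustrative stand-ins for the speech-trained networks in the study.
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 2))
y = (X.sum(axis=1) > 0).astype(float).reshape(-1, 1)   # toy "labels"

W1 = rng.standard_normal((2, 8)) * 0.5
b1 = np.zeros(8)
W2 = rng.standard_normal((8, 1)) * 0.5
b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X):
    h = np.tanh(X @ W1 + b1)
    return h, sigmoid(h @ W2 + b2)

losses = []
lr = 0.5
for _ in range(500):
    h, p = forward(X)
    losses.append(float(np.mean((p - y) ** 2)))
    # Backpropagate the mean-squared-error gradient layer by layer.
    dp = 2 * (p - y) / len(X) * p * (1 - p)
    dW2, db2 = h.T @ dp, dp.sum(axis=0)
    dh = dp @ W2.T * (1 - h ** 2)
    dW1, db1 = X.T @ dh, dh.sum(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
```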

12.
This study aimed to determine whether bats using frequency modulated (FM) echolocation signals adapt the features of their vocalizations to the perceptual demands of a particular sonar task. Quantitative measures were obtained from the vocal signals produced by echolocating bats (Eptesicus fuscus) that were trained to perform in two distinct perceptual tasks, echo delay and Doppler-shift discriminations. In both perceptual tasks, the bats learned to discriminate electronically manipulated playback signals of their own echolocation sounds, which simulated echoes from sonar targets. Both tasks utilized a single-channel electronic target simulator and tested the bats in a two-alternative forced-choice procedure. The results of this study demonstrate changes in the features of the FM bats' sonar sounds with echolocation task demands, lending support to the notion that this animal actively controls the echo information that guides its behavior.

13.
Studies evaluating phonological contrast learning typically investigate either the predictiveness of specific pretraining aptitude measures or the efficacy of different instructional paradigms. However, little research considers how these factors interact--whether different students learn better from different types of instruction--and what the psychological basis for any interaction might be. The present study demonstrates that successfully learning a foreign-language phonological contrast for pitch depends on an interaction between individual differences in perceptual abilities and the design of the training paradigm. Training from stimuli with high acoustic-phonetic variability is generally thought to improve learning; however, we found high-variability training enhanced learning only for individuals with strong perceptual abilities. Learners with weaker perceptual abilities were actually impaired by high-variability training relative to a low-variability condition. A second experiment assessing variations on the high-variability training design determined that the property of this learning environment most detrimental to perceptually weak learners is the amount of trial-by-trial variability. Learners' perceptual limitations can thus override the benefits of high-variability training where trial-by-trial variability in other irrelevant acoustic-phonetic features obfuscates access to the target feature. These results demonstrate the importance of considering individual differences in pretraining aptitudes when evaluating the efficacy of any speech training paradigm.

14.
Speech recognition was measured as a function of spectral resolution (number of spectral channels) and speech-to-noise ratio in normal-hearing (NH) and cochlear-implant (CI) listeners. Vowel, consonant, word, and sentence recognition were measured in five normal-hearing listeners, ten listeners with the Nucleus-22 cochlear implant, and nine listeners with the Advanced Bionics Clarion cochlear implant. Recognition was measured as a function of the number of spectral channels (noise bands or electrodes) at signal-to-noise ratios of +15, +10, +5, and 0 dB, and in quiet. Performance with three different speech processing strategies (SPEAK, CIS, and SAS) was similar across all conditions, and improved as the number of electrodes increased (up to seven or eight) for all conditions. For all noise levels, vowel and consonant recognition with the SPEAK speech processor did not improve with more than seven electrodes, while for normal-hearing listeners, performance continued to increase up to at least 20 channels. Speech recognition on more difficult speech materials (word and sentence recognition) showed a marginally significant increase in Nucleus-22 listeners from seven to ten electrodes. The average implant score on all processing strategies was poorer than scores of NH listeners with similar processing. However, the best CI scores were similar to the normal-hearing scores for that condition (up to seven channels). CI listeners with the highest performance level increased in performance as the number of electrodes increased up to seven, while CI listeners with low levels of speech recognition did not increase in performance as the number of electrodes was increased beyond four. These results quantify the effect of number of spectral channels on speech recognition in noise and demonstrate that most CI subjects are not able to fully utilize the spectral information provided by the number of electrodes used in their implant.
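Setting a signal-to-noise ratio in dB, as in the conditions above, amounts to rescaling the noise so that the power ratio matches the target. A small sketch, with a placeholder tone standing in for the speech materials:

```python
# Sketch of mixing a target and masker at a fixed SNR in dB. The "speech"
# is a placeholder tone, not the study's test materials.
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that 10*log10(P_speech / P_noise) == snr_db."""
    p_s = np.mean(speech ** 2)
    p_n = np.mean(noise ** 2)
    target_p_n = p_s / (10 ** (snr_db / 10))
    noise_scaled = noise * np.sqrt(target_p_n / p_n)
    return speech + noise_scaled, noise_scaled

fs = 8000
t = np.arange(fs) / fs
speech = np.sin(2 * np.pi * 440 * t)           # stand-in for speech
rng = np.random.default_rng(0)
noise = rng.standard_normal(t.size)

mixture, noise_scaled = mix_at_snr(speech, noise, snr_db=5.0)
achieved = 10 * np.log10(np.mean(speech ** 2) / np.mean(noise_scaled ** 2))
```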

15.
This study examined whether cochlear implant users must perceive differences along phonetic continua in the same way as do normal hearing listeners (i.e., sharp identification functions, poor within-category sensitivity, high between-category sensitivity) in order to recognize speech accurately. Adult postlingually deafened cochlear implant users, who were heterogeneous in terms of their implants and processing strategies, were tested on two phonetic perception tasks using a synthetic /da/-/ta/ continuum (phoneme identification and discrimination) and two speech recognition tasks using natural recordings from ten talkers (open-set word recognition and forced-choice /d/-/t/ recognition). Cochlear implant users tended to have identification boundaries and sensitivity peaks at voice onset times (VOT) that were longer than found for normal-hearing individuals. Sensitivity peak locations were significantly correlated with individual differences in cochlear implant performance; individuals who had a /d/-/t/ sensitivity peak near normal-hearing peak locations were most accurate at recognizing natural recordings of words and syllables. However, speech recognition was not strongly related to identification boundary locations or to overall levels of discrimination performance. The results suggest that perceptual sensitivity affects speech recognition accuracy, but that many cochlear implant users are able to accurately recognize speech without having typical normal-hearing patterns of phonetic perception.

16.
We present results from a pilot study directed at developing an anchorable subjective speech quality test. The test uses multidimensional scaling techniques to obtain quantitative information about the perceptual attributes of speech. In the first phase of the study, subjects ranked perceptual distances between samples of speech produced by two different talkers, one male and one female, processed by a variety of codecs. The resulting distance matrices were processed to obtain, for each talker, a stimulus space for the various speech samples. This stimulus space has the properties that distances between stimuli in this space correspond to perceptual distances between stimuli and that the dimensions of this space correspond to attributes used by the subjects in determining perceptual distances. Mean opinion scores (MOS) obtained in an earlier study were found to be highly correlated with position in the stimulus space, and the three dimensions of the stimulus space were found to have identifiable physical and perceptual correlates. In the second phase of the study, we developed techniques for fitting speech generated by a new codec under investigation into a previously established stimulus space. The user is provided with a collection of speech samples and with the stimulus space for these speech samples as determined by a large-scale listening test. The user then carries out a much smaller listening test to determine the position of the new stimulus in the previously established stimulus space. This system is anchorable, so that different versions of a codec under development can be compared directly, and it provides more detailed information than the single number provided by MOS testing. We suggest that this information could be used to advantage in algorithm development and in development of objective measures of speech quality.
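Recovering a stimulus space from a distance matrix is the core of multidimensional scaling. The sketch below uses classical MDS (Torgerson's double-centering method) on synthetic Euclidean distances, where recovery is exact up to rotation; the study's scaling procedure for perceptual distances may well have differed.

```python
# Sketch of classical multidimensional scaling: embed a distance matrix
# into k dimensions via the double-centered Gram matrix. The "perceptual"
# distances here are synthetic Euclidean distances, so recovery is exact.
import numpy as np

def classical_mds(D, k=2):
    """Embed an n x n distance matrix D into k dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:k]           # top-k eigenpairs
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0))

rng = np.random.default_rng(0)
points = rng.standard_normal((10, 2))            # hidden "stimulus space"
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
embedding = classical_mds(D, k=2)
D_hat = np.linalg.norm(embedding[:, None, :] - embedding[None, :, :], axis=-1)
```

The embedded configuration reproduces the input distances, which is the property the abstract describes: positions in the recovered space mirror perceptual distances.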

17.
The recognition of three suprasegmental aspects of speech--the number of syllables in a word, the stress pattern of a word, and rising or falling intonation patterns--through a single-channel tactile device and through a 24-channel tactile vocoder, using two groups of normal-hearing subjects, was compared. All subjects received an initial pretest on three recognition tasks, one for each prosodic feature. Half the subjects from each group then received 12 h of training with feedback on the tasks and stimuli used in the pretest. All subjects received a post-test which contained physically different stimuli from those previously tested. Performance was significantly better on the syllable-number and syllabic stress tasks with the single-channel than with the multichannel device on both the pre- and post-tests; no difference was found for the intonation task. Performance on the post-test was poorer for all trained subjects compared to their final training results, suggesting that cues learned in training were not readily transferable to new stimuli, even those with similar prosodic characteristics. Overall, the results provide support for the notion that certain prosodic features of speech may be conveyed more readily when the waveform envelope is preserved.

18.
Natural spoken language processing includes not only speech recognition but also identification of the speaker's gender, age, emotional, and social status. Our purpose in this study is to evaluate whether temporal cues are sufficient to support both speech and speaker recognition. Ten cochlear-implant and six normal-hearing subjects were presented with vowel tokens spoken by three men, three women, two boys, and two girls. In one condition, the subject was asked to recognize the vowel. In the other condition, the subject was asked to identify the speaker. Extensive training was provided for the speaker recognition task. Normal-hearing subjects achieved nearly perfect performance in both tasks. Cochlear-implant subjects achieved good performance in vowel recognition but poor performance in speaker recognition. The level of the cochlear implant performance was functionally equivalent to normal performance with eight spectral bands for vowel recognition but only to one band for speaker recognition. These results show a dissociation between speech and speaker recognition with primarily temporal cues, highlighting the limitation of current speech processing strategies in cochlear implants. Several methods, including explicit encoding of fundamental frequency and frequency modulation, are proposed to improve speaker recognition for current cochlear implant users.
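One of the proposed remedies is explicit fundamental-frequency encoding. A minimal sketch of estimating F0 by autocorrelation, the kind of cue that separates male from female or child voices; the vowel token below is synthetic and the method is a generic one, not the paper's proposal.

```python
# Illustrative F0 estimation by autocorrelation peak-picking on a synthetic
# harmonic "vowel". Generic method, not the paper's encoding scheme.
import numpy as np

def estimate_f0(x, fs, f_lo=60.0, f_hi=400.0):
    """Pick the autocorrelation peak within the plausible pitch-lag range."""
    ac = np.correlate(x, x, mode='full')[x.size - 1:]
    lag_min = int(fs / f_hi)
    lag_max = int(fs / f_lo)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))
    return fs / lag

fs = 8000
t = np.arange(int(0.2 * fs)) / fs
f0 = 120.0                                       # male-like F0
vowel = sum(np.sin(2 * np.pi * f0 * h * t) / h for h in range(1, 6))
f0_hat = estimate_f0(vowel, fs)
```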

19.
An environmental-sound recognition experiment with acoustic simulations of cochlear implant processing was conducted with 14 normal-hearing listeners, comparing recognition under two vocoder simulations (sine-wave carrier and noise carrier); an environmental-sound recognition experiment was then conducted with 9 Mandarin-speaking adult cochlear implant users. The materials were 67 environmental sounds collected from the Internet and screened after subjective validation by 12 normal-hearing listeners. The results showed that carrier type had no significant effect on average recognition of the 67 environmental sounds, but differences in acoustic features made recognition of individual sounds dependent on carrier type. In addition, the cochlear implant users' environmental-sound recognition was poor and awaits improvement through advances in signal processing strategies, neural interfaces, and rehabilitation methods. The environmental-sound materials developed in this study can be used to evaluate environmental-sound recognition with cochlear implants.

20.
This study examined perceptual learning of spectrally complex nonspeech auditory categories in an interactive multi-modal training paradigm. Participants played a computer game in which they navigated through a three-dimensional space while responding to animated characters encountered along the way. Characters' appearances in the game correlated with distinctive sound category distributions, exemplars of which repeated each time the characters were encountered. As the game progressed, the speed and difficulty of required tasks increased and characters became harder to identify visually, so quick identification of approaching characters by sound patterns was, although never required or encouraged, of gradually increasing benefit. After 30 min of play, participants performed a categorization task, matching sounds to characters. Despite not being informed of audio-visual correlations, participants exhibited reliable learning of these patterns at posttest. Categorization accuracy was related to several measures of game performance and category learning was sensitive to category distribution differences modeling acoustic structures of speech categories. Category knowledge resulting from the game was qualitatively different from that gained from an explicit unsupervised categorization task involving the same stimuli. Results are discussed with respect to information sources and mechanisms involved in acquiring complex, context-dependent auditory categories, including phonetic categories, and to multi-modal statistical learning.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号