Similar Documents
20 similar documents retrieved (search time: 15 ms).
1.
Two experiments were conducted to investigate whether or not anchoring and selective adaptation induce basically the same psychological effects. The purpose of the first experiment was to show how an audiovisual anchor modifies the perception of consonant-vowel (CV) syllables. The anchors were two purely acoustical, two purely optical, and three audiovisual CV syllables. The results were compared with those of audiovisual speech selective-adaptation experiments conducted by Roberts and Summerfield [Percept. Psychophys. 30, 309-314 (1981)] and Saldaña and Rosenblum [J. Acoust. Soc. Am. 95, 3658-3661 (1994)]. The audiovisual anchoring effects were found to be very similar to the audiovisual selective-adaptation effects, but the incompatible audiovisual anchor produced more auditory-based contrast than the purely acoustical anchor or the compatible audiovisual anchor. This difference in contrast had not been found in the previous selective-adaptation experiments. The second experiment directly compared audiovisual anchoring and selective-adaptation effects using the same stimuli and the same subjects. The compatible audiovisual syllable (AbVb) caused more contrast in selective adaptation than in anchoring, whereas the discrepant audiovisual syllable (AbVg) showed no difference between anchoring and selective adaptation. The anchor AbVg also caused more auditory-based contrast than the anchor AbVb. It is suggested that different mechanisms underlie these two sets of results.

2.
Studies with adults have demonstrated that acoustic cues cohere in speech perception such that two stimuli cannot be discriminated if separate cues bias responses equally, but oppositely, in each. This study examined whether this kind of coherence exists for children's perception of speech signals, a test that first required that a contrast be found for which adults and children show similar cue weightings. Accordingly, experiment 1 demonstrated that adults, 7-, and 5-year-olds weight F2-onset frequency and gap duration similarly in "spa" versus "sa" decisions. In experiment 2, listeners of these same ages made "same" or "not-the-same" judgments for pairs of stimuli in an AX paradigm when only one cue differed, when the two cues were set within a stimulus to bias the phonetic percept towards the same category (relative to the other stimulus in the pair), and when the two cues were set within a stimulus to bias the phonetic percept towards different categories. Unexpectedly, adults' results contradicted earlier studies: They were able to discriminate stimuli when the two cues conflicted in how they biased phonetic percepts. Results for 7-year-olds replicated those of adults, but were not as strong. Only the results of 5-year-olds revealed the kind of perceptual coherence reported by earlier studies for adults. Thus, it is concluded that perceptual coherence for speech signals is present from an early age, and in fact listeners learn to overcome it under certain conditions.

3.
4.
Speech perception is a field of psychology whose development is linked to many disciplines, including phonetics, speech engineering, and artificial intelligence. This paper briefly surveys, against the background of advances in cognitive psychology and related disciplines, the main line of development of speech perception research, its current central problems, and the state of the art.

5.
6.
Three experiments were conducted to study relative contributions of speaking rate, temporal envelope, and temporal fine structure to clear speech perception. Experiment I used uniform time scaling to match the speaking rate between clear and conversational speech. Experiment II decreased the speaking rate in conversational speech without processing artifacts by increasing silent gaps between phonetic segments. Experiment III created "auditory chimeras" by mixing the temporal envelope of clear speech with the fine structure of conversational speech, and vice versa. Speech intelligibility in normal-hearing listeners was measured over a wide range of signal-to-noise ratios to derive speech reception thresholds (SRT). The results showed that processing artifacts in uniform time scaling, particularly time compression, reduced speech intelligibility. Inserting gaps in conversational speech improved the SRT by 1.3 dB, but this improvement might be a result of increased short-term signal-to-noise ratios during level normalization. Data from auditory chimeras indicated that the temporal envelope cue contributed more to the clear speech advantage at high signal-to-noise ratios, whereas the temporal fine structure cue contributed more at low signal-to-noise ratios. Taken together, these results suggest that acoustic cues for the clear speech advantage are multiple and distributed.
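The "auditory chimera" construction in experiment III pairs the temporal envelope of one signal with the temporal fine structure of another, typically via a Hilbert decomposition. A minimal single-band, numpy-only sketch (illustrative only: the study presumably applied this within a multi-channel filterbank, and all signal names here are stand-ins):

```python
import numpy as np

def analytic(x):
    """Analytic signal via the FFT (a numpy-only Hilbert transform)."""
    n = len(x)
    spec = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    h[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        h[n // 2] = 1.0  # keep the Nyquist bin for even-length signals
    return np.fft.ifft(spec * h)

def chimera(env_source, tfs_source):
    """Combine the temporal envelope of one signal with the fine structure of another."""
    env = np.abs(analytic(env_source))            # temporal envelope
    tfs = np.cos(np.angle(analytic(tfs_source)))  # temporal fine structure
    return env * tfs

fs = 8000
t = np.arange(fs) / fs
# Stand-ins for a "clear" and a "conversational" token of the same utterance:
clear = (0.5 + 0.5 * np.sin(2 * np.pi * 3 * t)) * np.sin(2 * np.pi * 440 * t)
conv = np.sin(2 * np.pi * 220 * t)
mix = chimera(clear, conv)  # clear-speech envelope on conversational fine structure
```

Swapping the arguments gives the complementary chimera (conversational envelope on clear-speech fine structure).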

7.
Because the human ear is relatively insensitive to phase in speech, phase information has received little attention in speech coding. In fact, perceptual speech quality degrades when phase distortion becomes large. The perceptual effect of the STFT (short-time Fourier transform) phase spectrum was studied through subjective listening tests. Three main conclusions emerged: (1) if the phase information is discarded entirely, the subjective quality of the reconstructed speech can be very poor; (2) whether the discarded phase lies in the low- or the high-frequency band, the difference from the original speech is audible; and (3) the human ear can hardly perceive any quality difference between the original and the reconstructed speech when the phase quantization step size is smaller than π/7.

8.
Children 5, 9, and 11 years of age and young adults attempted to identify the final word of sentences recorded by a female speaker. The sentences were presented in two levels of multitalker babble, and participants responded by selecting one of four pictures. In a low-noise condition, the signal-to-noise ratio (SNR) was adjusted for each age group to yield 85% correct performance. In a high-noise condition, the SNR was set 7 dB lower than the low-noise condition. Although children required more favorable SNRs than adults to achieve comparable performance in low noise, an equivalent decrease in SNR had comparable consequences for all age groups. Thus age-related differences on this task can be attributed primarily to sensory factors.

9.
An experimental study of the auditory perception of phase in speech
Human hearing is relatively insensitive to phase in speech signals, so phase distortion is often disregarded when speech is processed and coded. In fact, once phase distortion exceeds a certain degree it causes a noticeable drop in speech quality, and for a high-quality vocoder the phase of the spectral components cannot be ignored. This paper studies, through subjective listening tests, how the short-time Fourier transform phase spectrum of speech affects auditory perception. The results show that (1) if the original phase information is discarded entirely, the reconstructed speech contains strong noise and sounds very unnatural; (2) discarding the phase information in either the high- or the low-frequency band produces an audible difference; and (3) when the phase quantization step is smaller than π/7, the human auditory system cannot distinguish the reconstructed speech from the original.
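The π/7 step-size result can be illustrated on a single analysis frame: quantize the FFT phase with a given step, resynthesize, and measure the waveform error. A hypothetical numpy sketch (the study used the full STFT analysis-synthesis chain and formal listening tests; the frame length and step choices here are illustrative):

```python
import numpy as np

def quantize_phase(frame, step):
    """Resynthesize a frame after quantizing its FFT phase to multiples of `step`."""
    spec = np.fft.rfft(frame)
    mag, phase = np.abs(spec), np.angle(spec)
    q_phase = np.round(phase / step) * step  # uniform phase quantizer
    return np.fft.irfft(mag * np.exp(1j * q_phase), n=len(frame))

rng = np.random.default_rng(0)
frame = rng.standard_normal(512)
fine = quantize_phase(frame, np.pi / 7)    # at the reported audibility threshold
coarse = quantize_phase(frame, np.pi / 2)  # much coarser quantization
err_fine = np.linalg.norm(frame - fine) / np.linalg.norm(frame)
err_coarse = np.linalg.norm(frame - coarse) / np.linalg.norm(frame)
```

The relative error grows with the quantization step, consistent with coarser phase quantization becoming audible.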

10.
11.
This study assessed the effects of binaural spectral resolution mismatch on the intelligibility of Mandarin speech in noise using bilateral cochlear implant simulations. Noise-vocoded Mandarin speech, corrupted by speech-shaped noise at 0 and 5 dB signal-to-noise ratios, was presented unilaterally or bilaterally to normal-hearing listeners with mismatched spectral resolution between ears. Significant binaural benefits for Mandarin speech recognition were observed only with matched spectral resolution between ears. In addition, tone identification was more robust to noise than sentence recognition, suggesting that factors other than tone identification might account more for the degraded sentence recognition in noise.

12.
A glimpsing model of speech perception in noise
Do listeners process noisy speech by taking advantage of "glimpses", spectro-temporal regions in which the target signal is least affected by the background? This study used an automatic speech recognition system, adapted for use with partially specified inputs, to identify consonants in noise. Twelve masking conditions were chosen to create a range of glimpse sizes. Several different glimpsing models were employed, differing in the local signal-to-noise ratio (SNR) used for detection, the minimum glimpse size, and the use of information in the masked regions. Recognition results were compared with behavioral data. A quantitative analysis demonstrated that the proportion of the time-frequency plane glimpsed is a good predictor of intelligibility. Recognition scores in each noise condition confirmed that sufficient information exists in glimpses to support consonant identification. Close fits to listeners' performance were obtained at two local SNR thresholds: one at around 8 dB and another in the range -5 to -2 dB. A transmitted information analysis revealed that cues to voicing are degraded more in the model than in human auditory processing.
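The model's key predictor, the proportion of the time-frequency plane glimpsed, amounts to counting spectro-temporal cells whose local SNR exceeds a detection threshold. A toy numpy sketch (the spectrogram values and names are invented, and the real model additionally imposed a minimum glimpse size):

```python
import numpy as np

def glimpse_proportion(target_power, masker_power, snr_threshold_db=-2.0):
    """Fraction of time-frequency cells whose local SNR exceeds the threshold."""
    local_snr_db = 10.0 * np.log10(target_power / masker_power)
    return np.mean(local_snr_db > snr_threshold_db)

rng = np.random.default_rng(1)
# Toy power spectrograms: (frequency bins x time frames).
target = rng.exponential(1.0, size=(64, 100))
masker_weak = rng.exponential(0.25, size=(64, 100))
masker_strong = rng.exponential(4.0, size=(64, 100))
p_weak = glimpse_proportion(target, masker_weak)      # many glimpses survive
p_strong = glimpse_proportion(target, masker_strong)  # few glimpses survive
```

A stronger masker leaves a smaller glimpsed proportion, which in the model translates into lower predicted intelligibility.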

13.
The present study evaluated auditory-visual speech perception in cochlear-implant users as well as normal-hearing and simulated-implant controls to delineate relative contributions of sensory experience and cues. Auditory-only, visual-only, or auditory-visual speech perception was examined in the context of categorical perception, in which an animated face mouthing ba, da, or ga was paired with synthesized phonemes from an 11-token auditory continuum. A three-alternative, forced-choice method was used to yield percent identification scores. Normal-hearing listeners showed sharp phoneme boundaries and strong reliance on the auditory cue, whereas actual and simulated implant listeners showed much weaker categorical perception but stronger dependence on the visual cue. The implant users were able to integrate both congruent and incongruent acoustic and optical cues to derive relatively weak but significant auditory-visual integration. This auditory-visual integration was correlated with the duration of the implant experience but not the duration of deafness. Compared with the actual implant performance, acoustic simulations of the cochlear implant could predict the auditory-only performance but not the auditory-visual integration. These results suggest that both altered sensory experience and improvised acoustic cues contribute to the auditory-visual speech perception in cochlear-implant users.

14.
15.
The respective influences of spectral and temporal aspects of sound in roughness perception are examined by way of phase manipulations. In a first experiment, the phase of the central component of three-component signals is shown to modify perceived roughness, for a given amplitude spectrum, regardless of whether it modifies the waveform envelope. A second experiment shows that the shape of the waveform envelope, for a given amplitude spectrum and a given modulation depth, also influences perceived roughness. We interpret both of these results by considering the envelope of an internal representation that is deduced from the physical signal by taking into account peripheral auditory processing. The results indicate that the modulation depth of such an internal representation is not the only determinant of roughness, but that an effect of temporal asymmetry is also to be taken into account.

16.
Previous research has identified a "synchrony window" of several hundred milliseconds over which auditory-visual (AV) asynchronies are not reliably perceived. Individual variability in the size of this AV synchrony window has been linked with variability in AV speech perception measures, but it was not clear whether AV speech perception measures are related to synchrony detection for speech only or for both speech and nonspeech signals. An experiment was conducted to investigate the relationship between measures of AV speech perception and AV synchrony detection for speech and nonspeech signals. Variability in AV synchrony detection for both speech and nonspeech signals was found to be related to variability in measures of auditory-only (A-only) and AV speech perception, suggesting that temporal processing for both speech and nonspeech signals must be taken into account in explaining variability in A-only and multisensory speech perception.

17.
Although both perceived vocal effort and intensity are known to influence the perceived distance of speech, little is known about the processes listeners use to integrate these two parameters into a single estimate of talker distance. In this series of experiments, listeners judged the distances of prerecorded speech samples presented over headphones in a large open field. In the first experiment, virtual synthesis techniques were used to simulate speech signals produced by a live talker at distances ranging from 0.25 to 64 m. In the second experiment, listeners judged the apparent distances of speech stimuli produced over a 60-dB range of different vocal effort levels (production levels) and presented over a 34-dB range of different intensities (presentation levels). In the third experiment, the listeners judged the distances of time-reversed speech samples. The results indicate that production level and presentation level influence distance perception differently for each of three distinct categories of speech. When the stimulus was high-level voiced speech (produced above 66 dB SPL 1 m from the talker's mouth), the distance judgments doubled with each 8-dB increase in production level and each 12-dB decrease in presentation level. When the stimulus was low-level voiced speech (produced at or below 66 dB SPL at 1 m), the distance judgments doubled with each 15-dB increase in production level but were relatively insensitive to changes in presentation level at all but the highest intensity levels tested. When the stimulus was whispered speech, the distance judgments were unaffected by changes in production level and only decreased with increasing presentation level when the intensity of the stimulus exceeded 66 dB SPL. 
The distance judgments obtained in these experiments were consistent across a range of different talkers, listeners, and utterances, suggesting that voice-based distance cueing could provide a robust way to control the apparent distances of speech sounds in virtual audio displays.
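The doubling rules reported for high-level voiced speech can be read as a power law in production and presentation level. A hypothetical sketch (the reference distance and reference levels are arbitrary illustration choices, not values fitted by the study):

```python
# Illustrative power-law reading of the high-level voiced-speech result:
# perceived distance doubles per +8 dB production level
# and per -12 dB presentation level.
def perceived_distance(prod_db, pres_db,
                       ref_distance=1.0, ref_prod_db=66.0, ref_pres_db=66.0):
    """Hypothetical distance estimate relative to an arbitrary reference point."""
    return (ref_distance
            * 2.0 ** ((prod_db - ref_prod_db) / 8.0)
            * 2.0 ** ((ref_pres_db - pres_db) / 12.0))

base = perceived_distance(66.0, 66.0)              # reference condition
louder_talker = perceived_distance(74.0, 66.0)     # +8 dB production: distance doubles
quieter_playback = perceived_distance(66.0, 54.0)  # -12 dB presentation: distance doubles
```

Under this reading, raising vocal effort and lowering playback intensity both push the apparent source farther away, matching the pattern described for high-level voiced speech.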

18.
This positron emission tomography study used a correlational design to investigate neural activity during speech perception in six normal subjects and two aphasic patients. The normal subjects listened either to speech or to signal-correlated noise equivalents; the latter were nonspeech stimuli, similar to speech in complexity but not perceived as speechlike. Regions common to the auditory processing of both types of stimuli were dissociated from those specific to spoken words. Increasing rates of presentation of both speech and nonspeech correlated with cerebral activity in bilateral transverse gyri and adjacent superior temporal cortex. Correlations specific to speech stimuli were located more anteriorly in both superior temporal sulci. The only asymmetry in normal subjects was a left lateralized response to speech in the posterior superior temporal sulcus, corresponding closely to structural asymmetry on the subjects' magnetic resonance images. Two patients, who had left temporal infarction but performed well on single word comprehension tasks, were also scanned while listening to speech. These cases showed right superior temporal activity correlating with increasing rates of hearing speech, but no significant left temporal activation. These findings together suggest that the dorsolateral temporal cortex of both hemispheres can be involved in prelexical processing of speech.

19.
20.
Studies comparing native and non-native listener performance on speech perception tasks can distinguish the roles of general auditory and language-independent processes from those involving prior knowledge of a given language. Previous experiments have demonstrated a performance disparity between native and non-native listeners on tasks involving sentence processing in noise. However, the effects of energetic and informational masking have not been explicitly distinguished. Here, English and Spanish listener groups identified keywords in English sentences in quiet and masked by either stationary noise or a competing utterance, conditions known to produce predominantly energetic and informational masking, respectively. In the stationary noise conditions, non-native listeners suffered more from increasing levels of noise for two of the three keywords scored. In the competing talker condition, the performance differential also increased with masker level. A computer model of energetic masking in the competing talker condition ruled out the possibility that the native advantage could be explained wholly by energetic masking. Both groups drew equal benefit from differences in mean F0 between target and masker, suggesting that processes which make use of this cue do not engage language-specific knowledge.
