Similar Articles
20 similar articles found.
1.
Efficiency of automatic recognition of male and female voices based on solving the inverse problem for glottis area dynamics and for the waveform of the glottal airflow volume velocity pulse is studied. The inverse problem is regularized through the use of analytical models of the voice excitation pulse and of the dynamics of the glottis area, as well as a model of one-dimensional glottal airflow. Parameters of these models and spectral parameters of the volume velocity pulse are considered. The following parameters are found to be most promising: the instant of maximum glottis area, the maximum derivative of the area, the slope of the spectrum of the glottal airflow volume velocity pulse, the amplitude ratios of harmonics of this spectrum, and the pitch. On the plane of the first two principal components in the space of these parameters, an almost twofold decrease in classification error relative to that for pitch alone is attained. The male voice recognition probability is found to be 94.7%, and the female voice recognition probability is 95.9%.
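The classification step described above, projecting voice parameters onto the first two principal components, can be sketched as follows. The parameter names, toy data, and clusters are illustrative stand-ins, not the study's measurements:

```python
import numpy as np

def first_two_components(X):
    """Project feature vectors onto the first two principal components.

    X: (n_samples, n_features) matrix of voice parameters, e.g. pitch,
    glottal spectrum slope, maximum area derivative (illustrative names).
    """
    Xc = X - X.mean(axis=0)                  # center each parameter
    # SVD of the centered data gives the principal axes in the rows of Vt
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T                     # scores on PC1 and PC2

# Toy data: two clusters standing in for male/female parameter vectors
rng = np.random.default_rng(0)
male = rng.normal([120.0, -12.0, 0.5], 0.1, size=(50, 3))
female = rng.normal([220.0, -9.0, 0.8], 0.1, size=(50, 3))
X = np.vstack([male, female])
scores = first_two_components(X)
print(scores.shape)  # (100, 2)
```

With the pitch-like dimension dominating the variance, the two groups separate cleanly along the first component, which is the effect exploited in the abstract above.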

2.
Learning to recognize complex sensory signals can change the way they are perceived. European starlings (Sturnus vulgaris) recognize other starlings by their song, which consists of a series of complex, stereotyped motifs. Song recognition learning is accompanied by plasticity in secondary auditory areas, suggesting that perceptual learning is involved. Here, to investigate whether perceptual learning can be observed behaviorally, a same-different operant task was used to measure how starlings perceived small differences in motif structure. Birds trained to recognize conspecific songs were better at detecting variations in motifs from the songs they learned, even though this variation was not directly necessary to learn the associative task. Discrimination also improved as the reference stimulus was repeated multiple times. Perception of the much larger differences between different motifs was unaffected by training. These results indicate that sensory representations of motifs are enhanced when starlings learn to recognize songs.

3.
A method of image enhancement and real-time input of 3-D, microscopic phase objects into a coherent optical pattern recognition system is described. The method consists of directing a low-power laser beam into a microscope objective to produce a real, magnified, coherent image of the specimen under test. The image plane is followed by two successive Fourier transform (FT) planes. In the first FT plane, low and high frequency spatial filters, one of which is photographically produced, are used as pre-processing filters to enhance the image quality. The enhanced signal is imaged from the first FT plane to the second FT plane which contains a matched spatial filter used for specimen identification. The system does not require an expensive incoherent-to-coherent light transducer and in addition, is capable of utilizing both phase and amplitude information from 3-D objects. Examples of results are given.

4.
The relationship between auditory perception and vocal production has typically been investigated by evaluating the effect of either altered or degraded auditory feedback on speech production in either normal-hearing or hearing-impaired individuals. Our goal in the present study was to examine this relationship in individuals with superior auditory abilities. Thirteen professional musicians and thirteen nonmusicians, with no vocal or singing training, participated in this study. For vocal production accuracy, subjects were presented with three tones and asked to reproduce each pitch using the vowel /a/. This procedure was repeated three times. The fundamental frequency of each production was measured using an autocorrelation pitch detection algorithm designed for this study. The musicians' superior auditory abilities (compared to the nonmusicians) were established in a frequency discrimination task reported elsewhere. Results indicate that (a) musicians had better vocal production accuracy than nonmusicians (production errors of half a semitone versus 1.3 semitones, respectively); (b) frequency discrimination thresholds explain 43% of the variance in the production data; and (c) all subjects with superior frequency discrimination thresholds showed accurate vocal production, although the reverse relationship does not hold. In this study we provide empirical evidence for the importance of auditory feedback to vocal production in listeners with superior auditory skills.
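The study's pitch detector was purpose-built; a minimal sketch of the textbook autocorrelation approach it is based on, with illustrative search limits, looks like this:

```python
import numpy as np

def pitch_autocorr(signal, fs, fmin=75.0, fmax=500.0):
    """Estimate fundamental frequency from the autocorrelation peak.

    The lag of the largest autocorrelation peak inside the plausible
    pitch range [fmin, fmax] gives the period 1/F0.
    """
    signal = signal - signal.mean()
    ac = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)   # lag search window
    lag = lo + np.argmax(ac[lo:hi])
    return fs / lag

fs = 8000
t = np.arange(int(0.1 * fs)) / fs
tone = np.sin(2 * np.pi * 220 * t)            # a 220 Hz /a/-like stand-in
f0 = pitch_autocorr(tone, fs)
print(round(f0, 1))
```

The resolution is limited to integer sample lags, so the estimate lands within a few hertz of the true 220 Hz; real detectors typically add interpolation around the peak.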

5.
Vocal vibrato and tremor are characterized by oscillations in voice fundamental frequency (F0). These oscillations may be sustained by a control loop within the auditory system. One component of the control loop is the pitch-shift reflex (PSR). The PSR is a closed loop negative feedback reflex that is triggered in response to discrepancies between intended and perceived pitch with a latency of approximately 100 ms. Consecutive compensatory reflexive responses lead to oscillations in pitch every approximately 200 ms, resulting in approximately 5-Hz modulation of F0. Pitch-shift reflexes were elicited experimentally in six subjects while they sustained /u/ vowels at a comfortable pitch and loudness. Auditory feedback was sinusoidally modulated at discrete integer frequencies (1 to 10 Hz) with +/- 25 cents amplitude. Modulated auditory feedback induced oscillations in voice F0 output of all subjects at rates consistent with vocal vibrato and tremor. Transfer functions revealed peak gains at 4 to 7 Hz in all subjects, with an average peak gain at 5 Hz. These gains occurred in the modulation frequency region where the voice output and auditory feedback signals were in phase. A control loop in the auditory system may sustain vocal vibrato and tremorlike oscillations in voice F0.
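The +/- 25-cent sinusoidal feedback manipulation above relies on the standard cents-to-frequency-ratio conversion; a small sketch (all parameter values illustrative):

```python
import numpy as np

def modulated_pitch(f0, mod_rate, depth_cents, t):
    """Pitch contour with sinusoidal modulation expressed in cents.

    One cent is 1/100 of a semitone, so a deviation of c cents scales
    frequency by 2**(c / 1200).
    """
    cents = depth_cents * np.sin(2 * np.pi * mod_rate * t)
    return f0 * 2.0 ** (cents / 1200.0)

t = np.linspace(0.0, 1.0, 1000, endpoint=False)
f = modulated_pitch(200.0, 5.0, 25.0, t)   # 5 Hz rate, +/- 25 cents around 200 Hz
print(round(f.max(), 2), round(f.min(), 2))
```

Note that 25 cents is a small excursion, roughly a 1.5% frequency swing, which is why such feedback perturbations go largely unnoticed by speakers while still driving the reflex.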

6.
The problem of recognition of blurred-edge objects is discussed. An optimal algorithm for the recognition of such objects is proposed. The holographic filter realizing this algorithm can be formed with a spatially modulated reference wave by using the non-linear regime of the filter recording. The choice of the optimal filter recording regime was made by computer simulation of the holographic process.

7.
Recognition of man-made targets using polarization techniques (Cited by: 17; self-citations: 1; other citations: 17)
Sun Wei, Liu Zhengkai, Shan Lie. Optical Technique (光学技术), 2004, 30(3): 267-269
A novel method for identifying man-made targets from their polarization information is proposed. A self-built multiband polarization CCD ground experiment setup was used to acquire polarization images of targets and to extract the polarization information they contain. Because man-made and natural targets differ considerably in their polarization characteristics, man-made targets in an image can be identified well from this information using fairly conventional image-processing techniques. Experiments show that the method is quite effective for identifying man-made targets against natural backgrounds.

8.
The neural processes underlying concurrent sound segregation were examined by using event-related brain potentials. Participants were presented with complex sounds comprised of multiple harmonics, one of which could be mistuned so that it was no longer an integer multiple of the fundamental. In separate blocks of trials, short-, middle-, and long-duration sounds were presented and participants indicated whether they heard one sound (i.e., buzz) or two sounds (i.e., buzz plus another sound with a pure-tone quality). The auditory stimuli were also presented while participants watched a silent movie in order to evaluate the extent to which the mistuned harmonic could be automatically detected. The perception of the mistuned harmonic as a separate sound was associated with a biphasic negative-positive potential that peaked at about 150 and 350 ms after sound onset, respectively. Long duration sounds also elicited a sustained potential that was greater in amplitude when the mistuned harmonic was perceptually segregated from the complex sound. The early negative wave, referred to as the object-related negativity (ORN), was present during both active and passive listening, whereas the positive wave and the mistuning-related changes in sustained potentials were present only when participants attended to the stimuli. These results are consistent with a two-stage model of auditory scene analysis in which the acoustic wave is automatically decomposed into perceptual groups that can be identified by higher executive functions. The ORN and the positive waves were little affected by sound duration, indicating that concurrent sound segregation depends on transient neural responses elicited by the discrepancy between the mistuned harmonic and the harmonic frequency expected based on the fundamental frequency of the incoming stimulus.

9.
Auditory scene analysis involves the simultaneous grouping and parsing of acoustic data into separate mental representations (i.e., objects). Over two experiments, we examined the sequence of neural processes underlying concurrent sound segregation by means of recording of human middle latency auditory evoked responses. Participants were presented with complex sounds comprising several harmonics, one of which could be mistuned such that it was not an integer multiple of the fundamental frequency. In both experiments, Na (approximately 22 ms) and Pa (approximately 32 ms) waves were reliably generated for all classes of stimuli. For stimuli with a fundamental frequency of 200 Hz, the mean Pa amplitude was significantly larger when the third harmonic was mistuned by 16% of its original value, relative to when it was tuned. The enhanced Pa amplitude was related to an increased likelihood in reporting the presence of concurrent auditory objects. Our results are consistent with a low-level stage of auditory scene analysis in which acoustic properties such as mistuning act as preattentive segregation cues that can subsequently lead to the perception of multiple auditory objects.
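A stimulus of the kind used in the two studies above, a harmonic complex whose third harmonic is mistuned by 16%, can be generated as a simple sketch (sampling rate, duration, and harmonic count are illustrative, not the studies' exact stimulus parameters):

```python
import numpy as np

def harmonic_complex(f0, n_harmonics, fs, dur, mistuned=None, shift=0.16):
    """Sum of equal-amplitude sinusoids at integer multiples of f0.

    Harmonic `mistuned` (1-based), if given, is shifted upward in
    frequency by `shift` (0.16 reproduces the 16% mistuning above).
    """
    t = np.arange(int(dur * fs)) / fs
    x = np.zeros_like(t)
    for k in range(1, n_harmonics + 1):
        f = k * f0 * ((1.0 + shift) if k == mistuned else 1.0)
        x += np.sin(2 * np.pi * f * t)
    return x / n_harmonics

fs, dur = 16000, 0.5
shifted = harmonic_complex(200, 6, fs, dur, mistuned=3)

# Inspect the spectrum: the third component sits at 696 Hz, not 600 Hz
spec = np.abs(np.fft.rfft(shifted))
freqs = np.fft.rfftfreq(len(shifted), 1.0 / fs)
peaks = sorted(freqs[spec > spec.max() / 2])
print(peaks)
```

The mistuned component no longer fits the 200-Hz harmonic series, which is exactly the cue listeners use to hear it "pop out" as a second auditory object.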

10.
A method for synthesizing vocal-tract spectra from phoneme sequences by mimicking the human speech production process is presented. The model consists of four main processes and is particularly characterized by an adaptive formation of articulatory movements. First, the model determines the time at which each phoneme is articulated. Next, it generates the articulatory constraints that must be met for the production of each phoneme, and then generates trajectories of the articulatory movements that satisfy those constraints. Finally, the time sequence of spectra is estimated from the produced articulatory trajectories. The articulatory constraint of each phoneme does not change with phonemic context, yet the contextual variability of speech is reproduced because of the dynamic articulatory model. The accuracy of the synthesis model was evaluated using data collected by simultaneous measurement of speech and articulatory movements: the accuracy of the phonemic timing estimates was measured, and the synthesized results were compared with the measured ones. Experimental results showed that the model captured the contextual variability of both the articulatory movements and the speech acoustics.

11.
This study tested the hypothesis that temporal processing deficits are evident in the pre-senescent (middle-aged) auditory system for listening tasks that involve brief stimuli, across-frequency-channel processing, and/or significant processing loads. A gap duration discrimination (GDD) task was employed that used either fixed-duration gap markers (experiment 1) or random-duration markers (experiment 2). Independent variables included standard gap duration (0, 35, and 250 ms), marker frequency (within- and across-frequency), and task complexity. A total of 18 young and 23 middle-aged listeners with normal hearing participated in the GDD experiments. Middle age was defined operationally as 40-55 years of age. The results indicated that middle-aged listeners performed more poorly than the young listeners in general, and that this deficit was sometimes, but not always, exacerbated by increases in task complexity. A third experiment employed a categorical perception task that measured the gap duration associated with a perceptual boundary. The results from 12 young and 12 middle-aged listeners with normal hearing indicated that the categorical boundary was associated with shorter gaps in the young listeners. The results of these experiments indicate that temporal processing deficits can be observed relatively early in the aging process, and are evident in middle age.

12.
Temporal processing in the aging auditory system. (Cited by: 2; self-citations: 0; other citations: 2)
Measures of monaural temporal processing and binaural sensitivity were obtained from 12 young (mean age = 26.1 years) and 12 elderly (mean age = 70.9 years) adults with clinically normal hearing (pure-tone thresholds < or = 20 dB HL from 250 to 6000 Hz). Monaural temporal processing was measured by gap detection thresholds. Binaural sensitivity was measured by interaural time difference (ITD) thresholds. Gap and ITD thresholds were obtained at three sound levels (4, 8, or 16 dB above individual threshold). Subjects were also tested on two measures of speech perception, a masking level difference (MLD) task, and a syllable identification/discrimination task that included phonemes varying in voice onset time (VOT). Elderly listeners displayed poorer monaural temporal analysis (higher gap detection thresholds) and poorer binaural processing (higher ITD thresholds) at all sound levels. There were significant interactions between age and sound level, indicating that the age difference was larger at lower stimulus levels. Gap detection performance was found to correlate significantly with performance on the ITD task for young, but not elderly adult listeners. Elderly listeners also performed more poorly than younger listeners on both speech measures; however, there was no significant correlation between psychoacoustic and speech measures of temporal processing. Findings suggest that age-related factors other than peripheral hearing loss contribute to temporal processing deficits of elderly listeners.
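A gap-detection stimulus of the general kind used in the two aging studies above, a noise burst with a brief silent gap, can be sketched as follows (durations and sampling rate are illustrative, not the studies' parameters):

```python
import numpy as np

def gapped_noise(fs, dur, gap_ms, rng):
    """Noise burst with a silent gap in the middle, the basic stimulus
    for measuring gap detection thresholds."""
    x = rng.standard_normal(int(dur * fs))
    gap = int(gap_ms / 1000.0 * fs)          # gap length in samples
    start = len(x) // 2 - gap // 2
    x[start:start + gap] = 0.0               # carve out the silent gap
    return x

fs = 16000
rng = np.random.default_rng(1)
x = gapped_noise(fs, 0.4, 10.0, rng)         # 400 ms of noise, 10 ms gap
```

In an adaptive procedure the gap duration would be varied from trial to trial to find the shortest gap a listener can reliably detect.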

13.
To address the degraded robustness of missing-data feature methods in low-SNR speaker recognition, a missing-data feature extraction method based on perceptual auditory scene analysis is proposed. First, the missing-data feature spectrum of the speech is computed, and the perceptual speech content is derived from the perceptual characteristics of speech. After perceptually motivated enhancement of the noisy speech and two-dimensional enhancement of its spectrogram, the speech distribution is estimated, and a perceptual auditory factor is extracted by combining the perceptual speech content with the missing-intensity parameter. Combined with the missing-data feature spectrum, the feature extraction process is then decomposed into different auditory scenes that are analyzed and processed separately, improving the robustness of the speaker recognition system. Experimental results show that, at low SNRs from -10 dB to 10 dB and for four different noise types, the proposed method is more robust than all five baseline methods, with average recognition rates improved by 26.0%, 19.6%, 12.7%, 4.6%, and 6.5%, respectively. The proposed method searches for robust speech features in the time-frequency domain and is thus better suited to speaker recognition in low-SNR environments.

14.
Human babies need to learn how to talk. The need for a tutor to achieve acceptable vocalisations is a feature we share with only a few species in the animal kingdom. Among them are songbirds, which account for nearly half of all known bird species. For that reason, songbirds have become an ideal animal model for studying how a brain reconfigures itself while learning a complex task. In the last few years, neuroscientists have invested substantial resources in unveiling the neural architecture involved in birdsong production and learning. Yet behaviour emerges from the interaction between a nervous system, a peripheral biomechanical architecture, and the environment, and therefore its study should be just as integrated. In particular, the physical study of the avian vocal organ can help elucidate which features of birdsong are under direct control of specific neural instructions and which emerge from the biomechanics of its generation. This work describes recent advances in the study of the physics of birdsong production.

15.
16.
This paper extends previous research on listeners' abilities to discriminate the details of brief tonal components occurring within sequential auditory patterns (Watson et al., 1975, 1976). Specifically, the ability to discriminate increments in the duration delta t of tonal components was examined. Stimuli consisted of sequences of ten sinusoidal tones: a 40-ms test tone to which delta t was added, plus nine context tones with individual durations fixed at 40 ms or varying between 20 and 140 ms. The level of stimulus uncertainty was varied from high (any of 20 test tones occurring in any of nine factorial contexts), through medium (any of 20 test tones occurring in ten contexts), to minimal levels (one test tone occurring in a single context). The ability to discriminate delta t depended strongly on the level of stimulus uncertainty, and on the listener's experience with the tonal context. Asymptotic thresholds under minimal uncertainty approached 4-6 ms, or 15% of the duration of the test tones; under high uncertainty, they approached 40 ms, or 10% of the total duration of the tonal sequence. Initial thresholds exhibited by inexperienced listeners are two-to-four times greater than the asymptotic thresholds achieved after considerable training (20,000-30,000 trials). Isochronous sequences, with context tones of uniform, 40-ms duration, yield lower thresholds than those with components of varying duration. The frequency and temporal position of the test tones had only minor effects on temporal discrimination. It is proposed that a major determinant of the ability to discriminate the duration of components of sequential patterns is the listener's knowledge about "what to listen for and where." Reduced stimulus uncertainty and extensive practice increase the precision of this knowledge, and result in high-resolution discrimination performance. Increased uncertainty, limited practice, or both, would allow only discrimination of gross changes in the temporal or spectral structure of the sequential patterns.

17.
Application of an auditory model to speech recognition (Cited by: 3; self-citations: 0; other citations: 3)
Some aspects of auditory processing are incorporated in a front end for the IBM speech-recognition system [F. Jelinek, "Continuous speech recognition by statistical methods," Proc. IEEE 64 (4), 532-556 (1976)]. This new process includes adaptation, loudness scaling, and mel warping. Tests show that the design is an improvement over previous algorithms.
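Mel warping, one of the front-end steps mentioned, maps frequency onto a perceptual pitch scale. A common textbook formula is sketched below; the paper's exact warping function may differ:

```python
import math

def hz_to_mel(f):
    """Common mel-scale formula (O'Shaughnessy variant):
    mel = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# By construction, 1000 Hz maps to approximately 1000 mel
print(round(hz_to_mel(1000.0), 1))
```

The warping is roughly linear below 1 kHz and logarithmic above it, compressing the high frequencies the way the ear does, which is why it helps recognition front ends.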

18.
19.
This paper describes a detailed study of recovery from forward masking in six users of the Nucleus-22 cochlear implant with a range of performance in speech-recognition tests. Recovery from a 300-ms-long pulse train presented at 1000 pps was found to be fastest in the poorer performers. The shape of the recovery function was found to be most strongly influenced by masker duration, suggesting that temporal integration plays a prominent role in recovery from forward masking. The recovery functions are reasonably well described by a sum of two exponentially decaying processes. Their relative weights depend on the amount of temporal integration occurring during the masker, and show strong intersubject variability. Nonmonotonicities sometimes observed in the recovery functions may be accounted for by considering the influence of neural adaptation.
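The two-process recovery model described above can be written as a weighted sum of two decaying exponentials; the weights and time constants below are illustrative, not the fitted values from the study:

```python
import numpy as np

def recovery(t, w_fast, tau_fast, w_slow, tau_slow):
    """Residual forward masking at probe delay t (ms), modeled as a sum
    of a fast and a slow exponentially decaying process."""
    return w_fast * np.exp(-t / tau_fast) + w_slow * np.exp(-t / tau_slow)

t = np.linspace(0.0, 500.0, 6)          # probe delays in ms
m = recovery(t, 0.7, 30.0, 0.3, 300.0)  # hypothetical weights / time constants
```

Fitting such a function per subject lets the relative weight of the slow process serve as an index of how much temporal integration occurred during the masker.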

20.