Similar Literature
20 similar documents found (search time: 15 ms)
1.
Previous research has identified a "synchrony window" of several hundred milliseconds over which auditory-visual (AV) asynchronies are not reliably perceived. Individual variability in the size of this AV synchrony window has been linked with variability in AV speech perception measures, but it was not clear whether AV speech perception measures are related to synchrony detection for speech only or for both speech and nonspeech signals. An experiment was conducted to investigate the relationship between measures of AV speech perception and AV synchrony detection for speech and nonspeech signals. Variability in AV synchrony detection for both speech and nonspeech signals was found to be related to variability in measures of auditory-only (A-only) and AV speech perception, suggesting that temporal processing for both speech and nonspeech signals must be taken into account in explaining variability in A-only and multisensory speech perception.

2.
陈锴, 卢晶, 徐柏龄. 《声学学报》 (Acta Acustica), 2006, 31(3): 211-216
A speech separation algorithm based on talker-state detection is proposed. The algorithm automatically detects each talker's activity state and uses it to control the adaptive filtering process, thereby estimating the acoustic transfer function of each channel and separating the mixed speech signals. Simulation results show that, compared with conventional separation algorithms in which the output signals serve as each other's reference signals, this algorithm avoids the degradation of adaptive separation caused by impure reference signals; because it does not need to artificially slow the adaptive filters' convergence, it offers faster convergence and tracking; in addition, it has low computational cost and good real-time performance.
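The gating idea in this abstract, adapting the separation filter only while the competing talker is silent so that the reference signal stays pure, can be sketched with a scalar LMS toy. Everything below (signals, the mixing gain, the activity detector) is invented for illustration; the paper estimates full transfer functions, not a single gain.

```python
import math

def lms_estimate_gain(x1, x2, s1_active, mu=0.05):
    """Estimate mixing gain a in x1 = s1 + a*x2, adapting only while s1 is silent."""
    a_hat = 0.0
    for n in range(len(x1)):
        if s1_active[n]:              # talker 1 speaking: reference impure, freeze filter
            continue
        err = x1[n] - a_hat * x2[n]   # with s1 == 0 here, err -> 0 as a_hat -> a
        a_hat += mu * err * x2[n]     # LMS update
    return a_hat

# Toy signals: talker 1 is silent in the first half, active in the second.
N = 400
s2 = [math.sin(0.3 * n) for n in range(N)]
s1 = [0.0] * (N // 2) + [math.sin(0.7 * n) for n in range(N // 2)]
active = [abs(v) > 0 for v in s1]
a_true = 0.6
x1 = [s1[n] + a_true * s2[n] for n in range(N)]

a_hat = lms_estimate_gain(x1, s2, active)
separated = [x1[n] - a_hat * s2[n] for n in range(N)]  # recovered talker-1 signal
```

Because adaptation happens only on pure-reference samples, the convergence rate does not have to be artificially lowered to tolerate crosstalk.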

3.
Detection of voiced Chinese speech in noisy environments   (Cited 1 time: 0 self-citations, 1 by others)
To detect voiced Chinese speech at low SNR and in complex noise environments, a robust voiced-speech detection method based on the harmonic structure of voiced sounds is proposed. An improved spectral-line tracking algorithm extracts a cluster of spectral lines that characterizes the harmonics of voiced speech, and harmonic features derived from this cluster serve as the basis for detection. In comparative experiments across SNRs and noise types, the method outperformed conventional approaches throughout, with a correct-detection rate about 30% higher at 0 dB SNR. The results show that the method detects voiced speech reliably at low SNR and under non-stationary, complex noise.
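A bare-bones way to see how harmonic structure separates voiced speech from noise is a harmonicity measure: project the frame onto the first few harmonics of a candidate fundamental and compare harmonic energy to total energy. This is a plain DFT projection, not the paper's spectral-line tracker, and the signals and thresholds are made up.

```python
import math
import random

def dft_power(frame, freq, sr):
    """Power of `frame` at `freq` Hz via direct projection onto cos/sin."""
    n = len(frame)
    re = sum(frame[k] * math.cos(2 * math.pi * freq * k / sr) for k in range(n))
    im = sum(frame[k] * math.sin(2 * math.pi * freq * k / sr) for k in range(n))
    return (re * re + im * im) / n

def harmonic_ratio(frame, f0, sr, n_harm=5):
    """Fraction of frame energy carried by the first n_harm harmonics of f0.
    Factor 2: a sinusoid of amplitude A has projection power A^2*n/4
    but signal energy A^2*n/2."""
    total = sum(v * v for v in frame) + 1e-12
    harm = sum(dft_power(frame, f0 * h, sr) for h in range(1, n_harm + 1))
    return 2 * harm / total

sr, n = 8000, 400
# Synthetic voiced frame: 200 Hz fundamental plus two weaker harmonics.
voiced = [sum(math.sin(2 * math.pi * 200 * h * k / sr) / h for h in (1, 2, 3))
          for k in range(n)]
random.seed(0)
noise = [random.uniform(-1, 1) for _ in range(n)]
```

A voiced frame concentrates almost all its energy on the harmonic grid, while broadband noise spreads it evenly, so a simple ratio threshold already separates the two in this toy case.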

4.
This paper presents an accurate speech detection algorithm for improving the performance of speech recognition systems working in noisy environments. The proposed method is based on a hard-decision clustering approach where a set of prototypes is used to characterize the noisy channel. Detecting the presence of speech is enabled by a decision rule formulated in terms of an averaged distance between the observation vector and a cluster-based noise model. The algorithm benefits from using contextual information, a strategy that considers not only a single speech frame but also a neighborhood of data in order to smooth the decision function and improve speech detection robustness. The proposed scheme exhibits reduced computational cost, making it adequate for real-time applications, i.e., automated speech recognition systems. An exhaustive analysis is conducted on the AURORA 2 and AURORA 3 databases in order to assess the performance of the algorithm and to compare it to existing standard voice activity detection (VAD) methods. The results show significant improvements in detection accuracy and speech recognition rate over standard VADs such as ITU-T G.729, ETSI GSM AMR, and ETSI AFE for distributed speech recognition and a representative set of recently reported VAD algorithms.
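The decision rule as described, averaged distance from the observation vector to cluster-based noise prototypes, smoothed over a context window of neighboring frames, can be sketched as follows. The feature vectors, prototypes, and threshold are invented for illustration.

```python
import math

def vad(frames, noise_prototypes, context=1, threshold=1.0):
    """Flag speech frames by averaged distance to a cluster-based noise model."""
    def dist_to_noise(f):
        # averaged Euclidean distance to the noise prototypes
        ds = [math.dist(f, p) for p in noise_prototypes]
        return sum(ds) / len(ds)

    raw = [dist_to_noise(f) for f in frames]
    flags = []
    for i in range(len(raw)):
        # contextual smoothing: average the distance over a frame neighborhood
        lo, hi = max(0, i - context), min(len(raw), i + context + 1)
        flags.append(sum(raw[lo:hi]) / (hi - lo) > threshold)
    return flags

noise_protos = [[0.1, 0.1], [0.2, 0.1]]                       # hypothetical noise clusters
frames = [[0.1, 0.15], [0.12, 0.1], [2.0, 2.1], [2.2, 1.9], [0.1, 0.1]]
flags = vad(frames, noise_protos)
```

Smoothing over the neighborhood makes the decision function less sensitive to single-frame outliers, which is the robustness argument the abstract makes.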

5.
The effect of perceived spatial differences on masking release was examined using a 4AFC speech detection paradigm. Targets were 20 words produced by a female talker. Maskers were recordings of continuous streams of nonsense sentences spoken by two female talkers and mixed into each of two channels (two talker, and the same masker time reversed). Two masker spatial conditions were employed: "RF" with a 4 ms time lead to the loudspeaker 60 degrees horizontally to the right, and "FR" with the time lead to the front (0 degrees) loudspeaker. The reference nonspatial "F" masker was presented from the front loudspeaker only. Target presentation was always from the front loudspeaker. In Experiment 1, target detection threshold for both natural and time-reversed spatial maskers was 17-20 dB lower than that for the nonspatial masker, suggesting that significant release from informational masking occurs with spatial speech maskers regardless of masker understandability. In Experiment 2, the effectiveness of the FR and RF maskers was evaluated as the right loudspeaker output was attenuated until the two-source maskers were indistinguishable from the F masker, as measured independently in a discrimination task. Results indicated that spatial release from masking can be observed with barely noticeable target-masker spatial differences.

6.
万伊, 杨飞然, 杨军. 《应用声学》 (Journal of Applied Acoustics), 2023, 42(1): 26-33
Automatic speaker verification systems are a common way to authenticate a target speaker's identity, but they are vulnerable to attacks using synthesized speech; synthetic-speech detection systems aim to solve this problem. This paper proposes a synthetic-speech detection method based on a Transformer encoder, using self-attention to learn long-range dependencies within the input features. Because synthetic-speech detection does not depend on a sentence's abstract semantic features, good detection performance can be achieved even with a small model. Four commonly used synthetic-speech detection features were evaluated on the Transformer encoder. On the logical-access dataset of the international ASVspoof2019 challenge, the system using linear frequency cepstral coefficient features achieved an equal error rate of 3.13% and a tandem detection cost function of 0.0708 with only 0.082 M model parameters, i.e., good detection performance at a small model size.

7.
Studies have shown that supplementary articulatory information can help to improve the recognition rate of automatic speech recognition systems. Unfortunately, articulatory information is not directly observable, necessitating its estimation from the speech signal. This study describes a system that recognizes articulatory gestures from speech, and uses the recognized gestures in a speech recognition system. Recognizing gestures for a given utterance involves recovering the set of underlying gestural activations and their associated dynamic parameters. This paper proposes a neural network architecture for recognizing articulatory gestures from speech and presents ways to incorporate articulatory gestures for a digit recognition task. The lack of a natural speech database containing gestural information prompted us to use three stages of evaluation. First, the proposed gestural annotation architecture was tested on a synthetic speech dataset, which showed that the use of estimated tract-variable-time-functions improved gesture recognition performance. In the second stage, gesture-recognition models were applied to natural speech waveforms and word recognition experiments revealed that the recognized gestures can improve the noise-robustness of a word recognition system. In the final stage, a gesture-based Dynamic Bayesian Network was trained and the results indicate that incorporating gestural information can improve word recognition performance compared to acoustic-only systems.

8.
Current automatic acoustic detection and classification of microchiroptera utilize global features of individual calls (i.e., duration, bandwidth, frequency extrema), an approach that stems from expert knowledge of call sonograms. This approach parallels the acoustic phonetic paradigm of human automatic speech recognition (ASR), which relied on expert knowledge to account for variations in canonical linguistic units. ASR research eventually shifted from acoustic phonetics to machine learning, primarily because of the superior ability of machine learning to account for signal variation. To compare machine learning with conventional methods of detection and classification, nearly 3000 search-phase calls were hand labeled from recordings of five species: Pipistrellus bodenheimeri, Molossus molossus, Lasiurus borealis, L. cinereus semotus, and Tadarida brasiliensis. The hand labels were used to train two machine learning models: a Gaussian mixture model (GMM) for detection and classification and a hidden Markov model (HMM) for classification. The GMM detector produced 4% error compared to 32% error for a baseline broadband energy detector, while the GMM and HMM classifiers produced errors of 0.6 +/- 0.2% compared to 16.9 +/- 1.1% error for a baseline discriminant function analysis classifier. The experiments showed that machine learning algorithms produced errors an order of magnitude smaller than those for conventional methods.
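The GMM classification step can be illustrated with a toy likelihood comparison: score a feature under each species' Gaussian mixture and pick the class with the higher log-likelihood. The per-species mixtures over a single "peak frequency" feature below are entirely invented; the study fits models to real call features.

```python
import math

def gmm_loglik(x, components):
    """Log-likelihood of scalar x under a mixture of (weight, mean, std) Gaussians."""
    p = sum(w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
            for w, m, s in components)
    return math.log(p + 1e-300)

# Hypothetical two-component mixtures over call peak frequency (kHz).
species_models = {
    "T. brasiliensis": [(0.6, 25.0, 2.0), (0.4, 29.0, 2.5)],
    "L. borealis":     [(0.5, 40.0, 3.0), (0.5, 45.0, 3.0)],
}

def classify(peak_khz):
    # maximum-likelihood decision across species models
    return max(species_models, key=lambda sp: gmm_loglik(peak_khz, species_models[sp]))
```

In the study these mixtures are trained on hand-labeled calls; here the parameters are fixed by hand just to show the decision rule.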

9.
10.
Physics of Life Reviews, 2014, 11(4): 650-686
The term "personality" is used to describe a distinctive and relatively stable set of mental traits that aim to explain the organism's behavior. The concept of personality that emerged in human psychology has also been applied to the study of non-human organisms from birds to horses. In this paper, I critically review the concept of personality from an interdisciplinary perspective, and point to some ideas that may be used for developing a cognitive-biological theory of personality. Integrating theories and research findings from various fields such as cognitive ethology, clinical psychology, and neuroscience, I argue that the common denominator of various personality theories is neural systems of threat/trust management and their emotional, cognitive, and behavioral dimensions. In this context, personality may also be conceived as a meta-heuristic that both human and non-human organisms apply to model and predict the behavior of others. The paper concludes by suggesting a minimal computational model of personality that may guide future research.

11.
Speech emotion recognition based on pitch-parameter normalization and statistical-distribution-model distance   (Cited 17 times: 0 self-citations, 17 by others)
An improved Parzen-window method is proposed that estimates the pitch probability density with an adaptive window determined by the frequency resolution of pitch extraction, balancing high resolution in the low-frequency range of the pitch distribution model with smoothness at high frequencies. A gender-classification algorithm exploiting the different pitch distributions of male and female speakers reaches 98% accuracy on long sentences. By analyzing gender differences in pitch mean, variance, and distribution model, the pitch parameters are normalized for gender; the normalized pitch mean and variance, together with the distance between pitch statistical-distribution models, are introduced as emotional feature parameters. Finally, a K-nearest-neighbor classifier is applied to a Chinese emotional speech corpus. Features extracted by conventional methods yield a recognition rate of 73.8%, while the gender-normalized pitch parameters and distribution-model distance raise it to 81%.
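The starting point the paper refines is a plain fixed-bandwidth Parzen (kernel) density estimate of the pitch distribution; the refinement, a window width adaptive to the pitch extractor's frequency resolution, is omitted in this sketch, and the pitch samples and bandwidth are invented.

```python
import math

def parzen_pdf(x, samples, h):
    """Gaussian-kernel Parzen estimate of the pitch density at x (bandwidth h, in Hz)."""
    k = sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)
    return k / (len(samples) * h * math.sqrt(2 * math.pi))

# Toy pitch samples (Hz) standing in for per-gender training data.
female_f0 = [190.0, 200.0, 210.0, 205.0, 195.0]
male_f0 = [110.0, 120.0, 115.0, 105.0, 125.0]

def is_female(f0, h=15.0):
    # gender decision by comparing the two estimated pitch densities
    return parzen_pdf(f0, female_f0, h) > parzen_pdf(f0, male_f0, h)
```

Comparing the two densities at an utterance's pitch is one simple way to realize the gender-discrimination step that precedes the paper's gender-based normalization.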

12.
张东, 余朝刚. 《应用声学》 (Journal of Applied Acoustics), 2016, 24(1): 14-14
To meet the need for routine inspection of the geometric parameters of electrified-railway catenary, a laser-scanning measurement method is proposed: a laser scanner, photoelectric encoder, and industrial PC are combined in software to perform non-contact acquisition, sampling-point localization, and data processing, yielding each geometric parameter of the catenary. Tests show the method effectively improves the accuracy of catenary geometry inspection, reaching a measurement accuracy of ±3 mm, and is of practical value.
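At its core, converting a laser return into catenary geometry is a polar-to-Cartesian computation: the contact wire's height above rail and its stagger (lateral offset from track centre) follow from the scan angle and measured range. The mounting height and scan values below are invented; the actual system also fuses encoder position and calibration data.

```python
import math

def wire_geometry(range_m, angle_deg, scanner_height_m=1.0):
    """Contact-wire height above rail and stagger from one laser return.
    Angle is measured from vertical, positive toward the right rail."""
    theta = math.radians(angle_deg)
    height = scanner_height_m + range_m * math.cos(theta)  # vertical component
    stagger = range_m * math.sin(theta)                    # lateral component
    return height, stagger

# Hypothetical return: 4.3 m range at 4 degrees from vertical.
h, s = wire_geometry(4.3, 4.0)
```

Achieving the stated ±3 mm accuracy is then a matter of scanner resolution and calibration, not of the conversion itself.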

13.
A method for computing the speech transmission index (STI) using real speech stimuli is presented and evaluated. The method reduces the effects of some of the artifacts that can be encountered when speech waveforms are used as probe stimuli. Speech-based STIs are computed for conversational and clearly articulated speech in several noisy, reverberant, and noisy-reverberant environments and compared with speech intelligibility scores. The results indicate that, for each speaking style, the speech-based STI values are monotonically related to intelligibility scores for the degraded speech conditions tested. Therefore, the STI can be computed using speech probe waveforms and the values of the resulting indices are as good predictors of intelligibility scores as those derived from MTFs by theoretical methods.

14.
Shift-parameter detection for shifted-frame accumulation   (Cited 1 time: 0 self-citations, 1 by others)
宋勇, 郝群. 《光学技术》 (Optical Technique), 2005, 31(5): 786-789
Shifted-frame accumulation suppresses image noise while avoiding the edge blur that direct frame accumulation causes in moving images, effectively improving moving-image quality. Its key requirement is accurately determining the target's direction of motion and shift amount within the image. A fast shift-parameter detection method for moving image sequences is proposed: an algorithm generates a frame-difference image containing feature pixels, and statistics over those feature pixels determine the motion direction and compute the shift amount, enabling automatic detection of the target's motion parameters. Experiments show good agreement between detected and actual shift values.
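A simpler stand-in for the shift-parameter detection is to estimate the integer shift between consecutive frames by minimising the sum of absolute differences over candidate shifts. The 1-D toy data below is invented; the paper works on feature pixels of 2-D frame-difference images.

```python
def estimate_shift(prev, curr, max_shift=3):
    """Integer shift d minimising sum |curr[i] - prev[i-d]| over the overlap."""
    best, best_err = 0, float("inf")
    for d in range(-max_shift, max_shift + 1):
        err = sum(abs(curr[i] - prev[i - d])
                  for i in range(max_shift, len(curr) - max_shift))
        if err < best_err:
            best, best_err = d, err
    return best

# A bright target (values 1..9) moves 2 pixels to the right between frames.
frame0 = [0, 0, 1, 5, 9, 5, 1, 0, 0, 0, 0, 0]
frame1 = [0, 0, 0, 0, 1, 5, 9, 5, 1, 0, 0, 0]
```

Once the shift is known, each frame can be realigned before accumulation, which is what keeps moving edges sharp.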

15.
A pitch-period extraction method based on the Hilbert-Huang transform   (Cited 6 times: 0 self-citations, 6 by others)
黄海, 潘家强. 《声学学报》 (Acta Acustica), 2006, 31(1): 35-41
A pitch-period extraction method for speech signals based on the Hilbert-Huang transform is proposed. The method requires no framing of the speech signal: the Hilbert-Huang transform is applied to the signal directly, and a pitch-frequency search then yields the pitch frequency and its variation over time. Experiments show that, compared with conventional pitch extraction methods, this approach both faithfully captures the non-stationary, nonlinear character of speech and improves the accuracy and resolution of pitch extraction.

16.
An optimized speech endpoint detection algorithm based on sub-band energy features   (Cited 9 times: 2 self-citations, 7 by others)
陈振标, 徐波. 《声学学报》 (Acta Acustica), 2005, 30(2): 171-176
To improve the robustness of speech endpoint detection in noise, an algorithm is proposed that combines multiple sub-band energy features with an optimal edge-detection decision criterion. Its outstanding advantage is that the output of the endpoint detection filter stays essentially unchanged across SNRs, avoiding the difficulty of threshold adjustment. Experiments show the algorithm achieves good endpoint detection in many noise environments, overcoming the weaknesses of conventional detectors based on short-time energy, pitch, or zero-crossing rate, which need dynamically adjusted thresholds and are unreliable at low SNR.
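The idea of edge detection on an energy contour can be sketched with a difference-of-boxes filter: onsets give strong positive responses and offsets strong negative ones, so the endpoints fall out of an argmax/argmin rather than an energy threshold. The filter width and energy values are illustrative, not the paper's optimal filter.

```python
def endpoints(energy, half=2):
    """Speech onset/offset as the strongest rising/falling edges of the energy contour."""
    def edge(i):
        left = sum(energy[max(0, i - half):i])
        right = sum(energy[i:i + half])
        return right - left          # > 0 at onsets, < 0 at offsets

    scores = [edge(i) for i in range(len(energy))]
    onset = max(range(len(scores)), key=lambda i: scores[i])
    offset = min(range(len(scores)), key=lambda i: scores[i])
    return onset, offset

# Per-frame sub-band energy: silence, speech in frames 3-6, silence again.
energy = [0.1, 0.1, 0.1, 5.0, 6.0, 5.5, 6.2, 0.1, 0.1, 0.1]
on, off = endpoints(energy)
```

Because scaling all energies by a common factor scales the edge scores uniformly, the locations of the maximum and minimum responses do not move, which hints at the SNR-invariance the abstract claims for its filter output.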

17.
The aim of this study was to identify across-site patterns of modulation detection thresholds (MDTs) in subjects with cochlear implants and to determine if removal of sites with the poorest MDTs from speech processor programs would result in improved speech recognition. Five hundred millisecond trains of symmetric-biphasic pulses were modulated sinusoidally at 10 Hz and presented at a rate of 900 pps using monopolar stimulation. Subjects were asked to discriminate a modulated pulse train from an unmodulated pulse train for all electrodes in quiet and in the presence of an interleaved unmodulated masker presented on the adjacent site. Across-site patterns of masked MDTs were then used to construct two 10-channel MAPs such that one MAP consisted of sites with the best masked MDTs and the other MAP consisted of sites with the worst masked MDTs. Subjects' speech recognition skills were compared when they used these two different MAPs. Results showed that MDTs were variable across sites and were elevated in the presence of a masker by various amounts across sites. Better speech recognition was observed when the processor MAP consisted of sites with best masked MDTs, suggesting that temporal modulation sensitivity has important contributions to speech recognition with a cochlear implant.
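Constructing the "best masked MDTs" MAP is, computationally, a selection problem: rank electrode sites by threshold and keep the ten best. The threshold values below are invented; lower (more negative) dB values mean better modulation sensitivity.

```python
def best_sites(mdt_db, n=10):
    """Indices of the n sites with the lowest masked MDTs, in electrode order."""
    ranked = sorted(range(len(mdt_db)), key=lambda i: mdt_db[i])
    return sorted(ranked[:n])

# Hypothetical masked MDTs (dB) for a 16-electrode array.
masked_mdt = [-18, -5, -16, -7, -20, -6, -15, -9, -19, -4,
              -17, -8, -14, -3, -13, -2]
map_best = best_sites(masked_mdt)
```

The complementary "worst sites" MAP used in the comparison would simply be the remaining electrodes.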

18.
Inspired by recent evidence that a binary pattern may provide sufficient information for human speech recognition, this letter proposes a fundamentally different approach to robust automatic speech recognition. Specifically, recognition is performed by classifying binary masks corresponding to a word utterance. The proposed method is evaluated using a subset of the TIDigits corpus to perform isolated digit recognition. Despite dramatic reduction of speech information encoded in a binary mask, the proposed system performs surprisingly well. The system is compared with a traditional HMM based approach and is shown to perform well under low SNR conditions.
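A deliberately bare-bones stand-in for classifying binary masks is nearest-template matching by Hamming distance. The letter's system is trained on TIDigits masks; the tiny flattened time-frequency templates below are invented.

```python
def hamming(a, b):
    """Number of positions where two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))

# Hypothetical word templates: binary time-frequency masks flattened to bit lists.
templates = {
    "one": [1, 1, 0, 0, 1, 0, 0, 1],
    "two": [0, 0, 1, 1, 0, 1, 1, 0],
}

def recognize(mask):
    # nearest template wins
    return min(templates, key=lambda w: hamming(mask, templates[w]))
```

Even with two bits flipped by "noise", the corrupted mask stays closer to its own template than to the other, which is the intuition behind the method's low-SNR robustness.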

19.
An illusion is explored in which a spoken phrase is perceptually transformed to sound like song rather than speech, simply by repeating it several times over. In experiment I, subjects listened to ten presentations of the phrase and judged how it sounded on a five-point scale with endpoints marked "exactly like speech" and "exactly like singing." The initial and final presentations of the phrase were identical. When the intervening presentations were also identical, judgments moved solidly from speech to song. However, this did not occur when the intervening phrases were transposed slightly or when the syllables were presented in jumbled orderings. In experiment II, the phrase was presented either once or ten times, and subjects repeated it back as they finally heard it. Following one presentation, the subjects repeated the phrase back as speech; however, following ten presentations they repeated it back as song. The pitch values of the subjects' renditions following ten presentations were closer to those of the original spoken phrase than were the pitch values following a single presentation. Furthermore, the renditions following ten presentations were even closer to a hypothesized representation in terms of a simple tonal melody than they were to the original spoken phrase.

20.
In order to improve the performance of deception detection based on Chinese speech signals, a method of sparse decomposition of spectral features is proposed. First, the wavelet packet transform divides the speech signal into multiple sub-bands; wavelet packet band cepstral features are obtained by applying the discrete cosine transform to the logarithmic energy of each sub-band, and the cepstral feature is generated by combining Mel Frequency Cepstral Coefficients with Wavelet Packet Band Cepstral Coefficients. Second, the K-singular value decomposition (K-SVD) algorithm trains an over-complete mixture dictionary on both the truthful and deceptive feature sets, and an orthogonal matching pursuit algorithm performs sparse coding against this dictionary to obtain sparse features. Finally, recognition experiments are performed with various classifiers. Experimental results show that the sparse decomposition method outperforms conventional dimensionality-reduction methods. The recognition accuracy of the proposed method is 78.34%, higher than that of methods using other features, significantly improving the recognition ability of the deception detection system.
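The sparse-coding step can be illustrated with plain matching pursuit over a small dictionary of unit-norm atoms: repeatedly pick the atom most correlated with the residual and subtract its projection. With mutually orthogonal atoms, as in this invented example, the result coincides with OMP; the real method uses a K-SVD-trained over-complete dictionary.

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def matching_pursuit(signal, atoms, n_iter=2):
    """Greedy sparse coding: coefficients over `atoms` plus the final residual."""
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_iter):
        # correlation of every atom with the current residual
        corr = [sum(r * a for r, a in zip(residual, atom)) for atom in atoms]
        k = max(range(len(atoms)), key=lambda i: abs(corr[i]))
        coeffs[k] += corr[k]
        residual = [r - corr[k] * a for r, a in zip(residual, atoms[k])]
    return coeffs, residual

atoms = [normalize(v) for v in ([1, 1, 0, 0], [0, 0, 1, 1], [1, -1, 0, 0])]
signal = [3.0, 3.0, 2.0, 2.0]   # = 3*sqrt(2)*atoms[0] + 2*sqrt(2)*atoms[1]
coeffs, residual = matching_pursuit(signal, atoms)
```

The sparse coefficient vector (here only two non-zero entries) is what the paper then feeds to its classifiers in place of the raw cepstral features.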


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号