20 similar documents found (search time: 62 ms)
1.
2.
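Speaker recognition is an important biometric identification technology. In recent years, speaker recognition algorithms that use deep neural networks to extract vocal features have achieved outstanding results. The time-delay neural network (TDNN), a typical representative of these, has been shown to have excellent feature extraction ability. To further improve recognition accuracy and save computing resources, a densely connected time-delay neural network with attention mechanisms is proposed for speaker recognition, based on a study of existing speaker recognition algorithms. The densely connected structure strengthens information reuse across network layers while keeping the model size under control. Channel attention and frame attention help the network focus on the more critical feature details, so that the speaker features extracted by statistics pooling are more representative. Experiments on the VoxCeleb1 test set yield an equal error rate (EER) of 1.40% and a minimum detection cost function (DCF) of 0.15, demonstrating the effectiveness of the approach for speaker recognition.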
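The statistics pooling step with frame attention mentioned in the abstract above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function name `attentive_stats_pooling` and the `scores` input are assumptions made for illustration, and in a real network the frame scores would come from a learned attention layer.

```python
import numpy as np

def attentive_stats_pooling(frames, scores):
    """Pool frame-level features (T, D) into one utterance-level vector (2*D,).

    frames: array of shape (T, D), one D-dimensional feature per frame.
    scores: array of shape (T,), unnormalized frame attention scores
            (hypothetical input; a trained network would produce these).
    """
    frames = np.asarray(frames, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # Softmax over time turns the scores into weights that sum to 1.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Attention-weighted mean and standard deviation per feature dimension.
    mean = weights @ frames
    var = weights @ (frames - mean) ** 2
    std = np.sqrt(np.maximum(var, 1e-12))
    # The utterance-level embedding input is the concatenation [mean, std].
    return np.concatenate([mean, std])
```

With equal scores for all frames this reduces to plain (unweighted) statistics pooling; larger scores make the corresponding frames count more in both statistics.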
3.
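Pitch was extracted from speech signals using MATLAB programs, and a GMM-based speaker recognition system was built using either MFCC alone or a combination of pitch period and MFCC as the feature parameters. Recognition experiments show that, for populations of 50 to 180 speakers, the combined pitch-period and MFCC features effectively improve the recognition rate of the GMM-based speaker recognition system.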
4.
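Speaker model synthesis and feature mapping methods based on maximum likelihood linear regression (MLLR) adaptation are proposed. With MAP adaptation, the linear relationship between corresponding models is determined after the fact and the transform parameters are set manually; with MLLR adaptation, the linear relationship between corresponding models is specified first, the transform parameters are estimated from training data, and the adaptation can be restricted to the mean vectors. In model synthesis, MLLR adaptation specifies the linear transforms between the parameters of the universal channel background models; in feature mapping, it specifies the linear transforms between the root GMM-UBM and the parameters of the universal channel background models. Adapting the model parameters in groups makes it possible to balance the amount of training data against the number of parameters. Experiments show that, with suitably chosen MLLR regression classes, recognition performance is better than with the corresponding MAP adaptation methods.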
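The grouped mean adaptation described above can be sketched as follows: each Gaussian mean is assigned to a regression class, and every class shares one affine transform of the form A·mu + b. This is only an illustration under stated assumptions. The function name, the `class_of_component` mapping, and the `transforms` dictionary are hypothetical, and the maximum likelihood estimation of each A and b from training data is not shown.

```python
import numpy as np

def mllr_adapt_means(means, class_of_component, transforms):
    """Apply per-regression-class affine transforms to GMM mean vectors.

    means: array of shape (K, D), one mean vector per Gaussian component.
    class_of_component: length-K sequence of regression class ids.
    transforms: dict mapping class id -> (A, b), with A of shape (D, D)
                and b of shape (D,). These are assumed to be given here.
    """
    means = np.asarray(means, dtype=float)
    adapted = np.empty_like(means)
    for k, mu in enumerate(means):
        A, b = transforms[class_of_component[k]]
        # All components in the same regression class share one transform.
        adapted[k] = np.asarray(A, dtype=float) @ mu + np.asarray(b, dtype=float)
    return adapted
```

Using fewer regression classes means fewer transform parameters to estimate, which is the trade-off between training data and parameter count that the abstract refers to.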
5.
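An ear recognition method fusing scale-invariant features and geometric features (total citations: 2, self-citations: 1, citations by others: 2)
The key to improving the ear recognition rate is feature extraction and representation. The scale-invariant feature transform (SIFT) is a local keypoint feature extraction algorithm that searches for extrema in scale space and extracts feature vectors that are invariant to image scale and rotation and fairly robust to illumination changes and image deformation. SIFT is used to extract structural feature points from outer-ear images and form stable feature descriptors. To overcome the problem that several local descriptors within one image can be similar, an auricle geometric feature is incorporated into the SIFT descriptor. Finally, the Euclidean distance between feature vectors serves as the similarity measure between two images for ear recognition. Experiments on an ear image database show that the method not only extracts ear features effectively and achieves a high recognition rate with a small number of features, but is also robust to rigid transformations of ear images.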
6.
7.
8.
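The use of prosodic features in speaker verification is studied. The whole prosodic contour is divided into segments with a fixed segment length and segment shift, each segment is fitted with Legendre polynomials to obtain continuous prosodic features, the features are mapped into the total variability factor space, and probabilistic linear discriminant analysis (PLDA) is used to compensate for speaker and scenario differences. A test set is built by adding noise to extended core condition 5 of the NIST 2010 Speaker Recognition Evaluation, and both the prosodic features and conventional Mel-frequency cepstral coefficients (MFCCs) are tested on it. The results show that, as the signal-to-noise ratio decreases, MFCC performance drops sharply while prosodic feature performance stays relatively stable. Fusing the two features improves system performance further, reducing the equal error rate and the minimum detection cost by up to 9% and 11%, respectively, relative to the MFCC-only system. The experiments indicate that prosodic features are strongly noise-robust in speaker recognition and strongly complementary to conventional MFCCs.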
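The segment-wise Legendre fitting described in the abstract above can be sketched with NumPy's polynomial module. This is a minimal illustration, not the paper's pipeline: the function name and the parameters `seg_len`, `seg_shift`, and `order` are assumptions, and the later total variability and PLDA stages are omitted.

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_contour_features(contour, seg_len, seg_shift, order):
    """Fit each fixed-length segment of a prosodic contour with Legendre polynomials.

    contour: 1-D sequence, e.g. a pitch or energy track (one value per frame).
    seg_len, seg_shift: segment length and shift in frames (assumed values).
    order: polynomial degree; each segment yields order + 1 coefficients.
    """
    contour = np.asarray(contour, dtype=float)
    # Legendre polynomials are orthogonal on [-1, 1], so map each segment there.
    x = np.linspace(-1.0, 1.0, seg_len)
    feats = []
    for start in range(0, len(contour) - seg_len + 1, seg_shift):
        segment = contour[start:start + seg_len]
        # Least-squares fit; the coefficients are the segment's feature vector.
        feats.append(legendre.legfit(x, segment, order))
    return np.array(feats)
```

The low-order coefficients have a direct reading: the first is the segment's mean level, the second its overall slope, and the third its curvature.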
9.
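In solar energy applications such as solar water heaters and solar cells, weather conditions such as rain, snow, and overcast skies severely reduce power generation, and the solar tracking system itself consumes energy, so it is very important to determine the current weather quickly and to design an adaptive on/off tracking system. When the weather is overcast, rainy, or snowy, the system should shut down to reduce energy consumption. To address the low efficiency, poor accuracy, and heavy computation of traditional weather recognition methods, a multi-class weather classification dataset is built from public weather images, and a weather image recognition technique based on convolutional neural networks and feature fusion is presented. Color, texture, and shape features obtained by traditional methods serve as the model's low-level features, and an improved VGG16 (Visual Geometry Group 16) model extracts the deep features of the image. Finally, the low-level and deep features are fused and passed to a Softmax output, giving an overall recognition rate of 94%.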
10.
11.
To further improve speaker recognition performance, feature fusion and model fusion are proposed. The feature fusion method fuses deep and shallow features. Because features at different levels are complementary, the fused feature describes speaker characteristics more comprehensively than a single feature. The model fusion method fuses i-vectors extracted from different speaker recognition systems, so the fused model can combine the advantages of those systems. Experimental results show the effectiveness of the proposed methods. Compared with the state-of-the-art system on the CASIA North and South dialect corpus, the proposed feature fusion system and model fusion system achieve relative improvements of about 54.8% and 69.5% in equal error rate (EER), respectively.
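Several abstracts in this list report equal error rates and relative improvements. As a rough illustration of how these quantities are typically computed from verification scores, here is a minimal NumPy sketch. The function names are assumptions, and the EER is approximated at the score threshold where the false acceptance and false rejection rates are closest, without interpolation.

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER from target (same-speaker) and non-target trial scores."""
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    # Use every observed score as a candidate decision threshold.
    thresholds = np.sort(np.concatenate([target, nontarget]))
    # False rejection rate: target trials scoring below the threshold.
    frr = np.array([(target < t).mean() for t in thresholds])
    # False acceptance rate: non-target trials scoring at or above the threshold.
    far = np.array([(nontarget >= t).mean() for t in thresholds])
    # The EER is where the two error rates meet; take the closest point.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0

def relative_improvement(baseline_eer, new_eer):
    """Relative EER reduction, e.g. 0.25 means the EER fell by 25% of the baseline."""
    return (baseline_eer - new_eer) / baseline_eer
```

A relative improvement is measured against the baseline value, so it says nothing about the absolute EER unless the baseline is also given.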
12.
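To address the robustness of continuous recognition during target motion in helicopter detection, a framework for helicopter acoustic feature extraction and recognition based on a composite deep neural network is proposed. The composite network combines a convolutional neural network and a long short-term memory network in a parallel structure to optimize helicopter acoustic features and identify the helicopter type. The convolutional neural network is modified to suit the characteristics of helicopter acoustic signals, so that the composite network refines the acoustic feature representation on top of the short-time spectrum and extracts correlation information between neighboring frames. This makes up for the failure of typical acoustic target recognition methods to fully exploit the time history of the target signal. Tests on real field data show that, compared with traditional recognition methods, the algorithm significantly improves both the robustness of continuous recognition and the recognition accuracy once the helicopter enters the effective detection range.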
13.
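To address domain mismatch in voiceprint recognition when the target domain lacks labeled data, an unsupervised domain adaptation method is proposed that combines adversarial learning with distribution alignment. Aligning statistical distributions during training reduces the domain gap, so that more speaker-discriminative features are extracted from speech and a stable performance gain is obtained. In the text-dependent voiceprint recognition task, adversarial learning and distribution alignment work together and reduce the equal error rate by 11% relative. In the text-independent task, adversarial learning is unstable, while distribution alignment still gives an 8% relative improvement. The experiments show that, under domain mismatch with no labeled target-domain data, the method effectively extracts speaker-discriminative information from speech and stably improves recognition performance.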
14.
Automatic modulation recognition (AMR) can be used in dynamic spectrum access (DSA) techniques to reduce the pressure on spectrum resources. In this paper, we propose a multiscale convolution-based network model called MSNet-SF, which combines traditional statistical features and deep learning (DL) to balance recognition accuracy and complexity. In the model, the feature information is extracted by two multiscale modules, each consisting of a unit convolution and three convolution kernels of different sizes arranged in parallel. Additionally, the sparse connectivity of the unit convolution makes the network more lightweight. Five statistical features (four higher-order cumulants (HOCs) and one zero-centered normalized instantaneous magnitude tightness) are also input into the model and are fused with the main feature map by multiplication to achieve complementarity of long-term and short-term features. This approach yields a large performance gain at a small cost and greatly reduces the confusion between QAM16 and QAM64. Simulation results on the RML2018.10A dataset show that the average recognition accuracy of the model improves by 4% after adding the statistical features and exceeds 97% at 12 dB and above.
15.
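Deep learning is currently one of the most powerful families of machine learning algorithms. Among its models, the convolutional neural network (CNN) learns features automatically and holds a considerable performance advantage over other deep learning models in image processing. This paper first briefly reviews the history of deep learning, then surveys the application and development of deep learning in defect recognition for ultrasonic testing, from early shallow neural networks to the current state of deep learning applications. Drawing on methods from medical image recognition and radiographic image recognition, it analyzes the suitability of CNNs for defect recognition in ultrasonic images. Finally, it discusses and summarizes current problems with using CNNs in ultrasonic testing image recognition and the research directions of the main strategies for addressing them.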
16.
17.
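Based on a model of human hearing, this paper proposes a robust method for extracting speaker feature parameters. First, a Gammatone auditory filter bank and the Meddis inner hair cell firing model are used to obtain an auditory correlogram that characterizes auditory nerve activity. Motivated by the phase-locking and two-tone suppression properties of auditory nerve firing, the frequency component with the largest amplitude in each band of the correlogram is taken as the feature parameter of that band, and the feature parameters of all bands form the feature vector of the current speech segment. A DCT is then applied to further remove the correlation among the feature parameters and reduce the dimension of the feature vector. Validity experiments show that the feature vector essentially reflects the spectral envelope of the input speech. Noise-robustness experiments show that, under Gaussian white noise and car noise, the feature has lower relative distortion than LPCC and MFCC. Text-independent speaker identification experiments based on vector quantization show that the feature achieves good recognition results at low signal-to-noise ratios for all three types of noise interference.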
18.
19.
The quality of feature extraction plays a significant role in the performance of speech emotion recognition (SER). To extract discriminative, affect-salient features from speech signals and thereby improve SER performance, this paper proposes a multi-stream convolution-recurrent neural network based on an attention mechanism (MSCRNN-A). First, a multi-stream sub-branch fully convolutional network (MSFCN) based on AlexNet is presented to limit the loss of emotional information. In the MSFCN, sub-branches are added after each pooling layer to retain features at different resolutions, and these features are fused by addition. Second, the MSFCN and a Bi-LSTM network are combined into a hybrid network that extracts speech emotion features and supplies the temporal structure information of the emotional features. Finally, a feature fusion model based on a multi-head attention mechanism is developed to obtain the best fused features. The proposed method uses the attention mechanism to calculate the contribution of the features from each network and then fuses them adaptively by weighting. To restrain gradient divergence in the network, the individual network features and the fused features are connected through shortcut connections to obtain the final fused features for recognition. Experimental results on three conventional SER corpora, CASIA, EMODB, and SAVEE, show that the proposed method significantly improves recognition performance, with a recognition rate superior to most existing state-of-the-art methods.
20.
Chinese Physics B, 2021, 30(5): 054201
We present a ghost handwritten digit recognition method for unknown handwritten digits based on ghost imaging (GI) with a deep neural network (DNN). A few detection signals from the bucket detector, generated by cosine transform speckle, are used as the characteristic information and as the input of the designed DNN, and the output of the DNN is the classification. The results show that the proposed scheme has a higher recognition accuracy (as high as 98% in simulations and 91% in experiments) with a smaller sampling ratio (say 12.76%). As the sampling ratio increases, the recognition accuracy improves. Compared with the traditional recognition scheme using the same DNN structure, the proposed scheme has slightly better performance with lower complexity and a non-locality property. The proposed scheme provides a promising approach for remote sensing.