Similar Articles
Found 20 similar articles; search time 531 ms.
1.
To address the "curse of dimensionality" that arises when measuring the similarity of near-infrared spectra, which are high-dimensional, highly redundant, nonlinear, and available only in small samples, a locality preserving projection algorithm based on kernel mapping and rank-order distance (KRLPP) is proposed. First, the spectral data are mapped into a higher-dimensional space via a kernel transform, which preserves the nonlinear character of the manifold structure. Next, the locality preserving projection (LPP) algorithm is improved for dimensionality reduction: the rank-order distance replaces the conventional Euclidean or geodesic distance and, by exploiting information from shared neighbors, yields a more accurate local neighborhood structure. Finally, spectral similarity is measured by computing distances in the low-dimensional space. This approach not only resolves the "distance failure" problem of high-dimensional spaces (distances losing their discriminating power) but also improves the accuracy of the similarity measurements. To validate KRLPP, the optimal parameters, namely the number of nearest neighbors k and the reduced dimensionality d, were first determined from the change in the information residual of the data set before and after reduction. KRLPP was then compared with PCA, LPP, and INLPP in terms of both the spectral projection after dimensionality reduction and classification performance; the results show that KRLPP distinguishes tobacco-leaf stalk positions well, and its dimensionality-reduction quality and recognition accuracy for different positions clearly exceed those of PCA, LPP, and INLPP. Finally, five representative tobacco leaves from a branded cigarette leaf-blend formula were chosen as targets, and PCA, LPP, and KRLPP were each used to find similar leaves for every target among 300 leaf samples maintained for formula replacement; the leaves and blend formulas before and after replacement were evaluated in terms of chemical composition and sensory quality. LPP and KRLPP used identical dimensionality-reduction parameters, and PCA retained the first six principal components. The replacement leaves and blends selected by KRLPP showed the smallest differences from the originals in chemical components such as total sugar, reducing sugar, total nicotine, and total nitrogen, as well as in sensory indicators such as aroma, smoke, and taste, giving the highest similarity-measurement accuracy among the three methods. The approach can be applied to finding substitute raw materials for formula products, helping enterprises maintain product quality.
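As a rough illustration of the LPP baseline this entry builds on (not the authors' KRLPP: the kernel mapping and rank-order distance are omitted), a minimal locality preserving projection can be sketched with NumPy/SciPy; the k-NN graph and heat-kernel weights below are standard choices, and all sizes are illustrative assumptions:

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import kneighbors_graph

def lpp(X, k=10, d=2, t=1.0):
    """Basic locality preserving projection; X is (n_samples, n_features)."""
    # 1. k-NN adjacency with heat-kernel weights W_ij = exp(-||xi-xj||^2 / t)
    G = kneighbors_graph(X, k, mode='distance', include_self=False)
    W = G.toarray()
    W[W > 0] = np.exp(-W[W > 0] ** 2 / t)
    W = np.maximum(W, W.T)                 # symmetrize the graph
    D = np.diag(W.sum(axis=1))             # degree matrix
    L = D - W                              # graph Laplacian
    # 2. Solve the generalized eigenproblem X^T L X a = lam X^T D X a
    A = X.T @ L @ X
    B = X.T @ D @ X + 1e-6 * np.eye(X.shape[1])  # regularize for stability
    vals, vecs = eigh(A, B)
    return vecs[:, :d]                     # smallest-eigenvalue directions

# Usage: project spectra to low dimension, then compare them by distance.
X = np.random.rand(100, 50)                # stand-in for NIR spectra
P = lpp(X, k=10, d=2)
Z = X @ P
```

Replacing the Euclidean distances in the graph construction with rank-order distances, and X with its kernel-mapped image, would be the entry's two modifications.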

2.
Iron ore is a key element of the basic industries of China's national economy and plays a pivotal role in its economic development, and the efficiency of ore-grade determination strongly affects mining efficiency. The current chemical-assay method for determining iron ore grade is not only costly and slow; above all, it cannot measure grade in situ, so it lags the ore-blending process and cannot effectively reduce the loss and dilution rate of mining. In-situ grade determination based on visible/near-infrared (VIS-NIR) spectroscopy is an effective way to solve this problem. Using the VIS-NIR spectra and chemical-assay data of 225 test samples from the Hongling skarn-type iron deposit as the data source, the raw data were first smoothed and the VIS-NIR spectral features of skarn-type iron ore were analyzed. The smoothed spectra were then pre-processed with two methods, the reciprocal logarithm and multiplicative scatter correction (MSC), and the spectra before and after pre-processing were reduced with two algorithms, principal component analysis (PCA) and a genetic algorithm (GA), yielding six data sources from the different pre-processing combinations. PCA reduced the data to 3, 3, and 7 dimensions, respectively, while GA selected 477, 489, and 509 bands, respectively. Finally, quantitative inversion models of the iron grade of the skarn-type ore were built with random forests (RF) and extreme learning machines (ELM), and the stability, accuracy, and reliability of the models were evaluated with the coefficient of determination (R2), root-mean-square error (RMSE), and mean relative error (MRE). The results show that the ELM model built on MSC-processed, PCA-reduced data performs best, with R2 of 0.99, RMSE of 0.0057, and MRE of 2.0%, clearly improving the grade-inversion accuracy for the Hongling skarn-type iron ore. The method provides an effective means of real-time, rapid analysis of skarn iron ore grade and has practical significance for the efficient mining of skarn-type iron deposits.
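A minimal sketch of the MSC, PCA, and regression pipeline described above, using scikit-learn; a random forest stands in for both of the paper's models (its ELM has no standard scikit-learn implementation), and the band count and targets are stand-ins:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor

def msc(spectra):
    """Multiplicative scatter correction: regress each spectrum on the mean."""
    ref = spectra.mean(axis=0)
    out = np.empty_like(spectra)
    for i, s in enumerate(spectra):
        b, a = np.polyfit(ref, s, deg=1)   # s ~ a + b * ref
        out[i] = (s - a) / b
    return out

X = np.random.rand(225, 600)               # 225 samples, illustrative band count
y = np.random.rand(225)                    # assayed iron grades (stand-in)

X_msc = msc(X)
X_pca = PCA(n_components=3).fit_transform(X_msc)  # 3 components, as in the paper
model = RandomForestRegressor(n_estimators=200).fit(X_pca, y)
```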

3.
4.
Principal component analysis (PCA) is a popular technique in remote sensing for dimensionality reduction. While PCA is suitable for data compression, it is not necessarily an optimal technique for feature extraction, particularly when the features are exploited in supervised learning applications (Cheriyadat and Bruce, 2003) [1]. Preserving features belonging to the target is crucial to the performance of target detection/recognition techniques. A supervised band-reduction technique based on the Fukunaga–Koontz transform (FKT) can meet this requirement. FKT achieves feature selection by transforming the data into a new space in which the feature classes have complementary eigenvectors. Analyzing these eigenvectors under the two classes, target and background clutter, enables target-oriented band reduction, since each basis function best represents the target class while carrying the least information about the background class. By selecting the few eigenvectors most relevant to the target class, the dimension of the hyperspectral data can be reduced, which offers significant advantages for near-real-time target detection applications. A kernel approach can extract the nonlinear properties of the data and thus provide better target features, so we propose a kernel FKT (KFKT) for target-oriented band reduction. The performance of the proposed KFKT-based target-oriented dimensionality reduction algorithm has been tested on two real-world hyperspectral data sets, and the results are reported.
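A compact sketch of the linear FKT step described above (the kernel extension is omitted), assuming two labelled pixel sets; the whitening-then-eigendecomposition structure is the standard FKT recipe, and the variable names and sizes are illustrative:

```python
import numpy as np

def fkt(X_target, X_background, d=5):
    """Linear Fukunaga-Koontz transform: d target-oriented basis vectors."""
    S1 = np.cov(X_target, rowvar=False)        # target covariance
    S2 = np.cov(X_background, rowvar=False)    # background covariance
    # Whiten the summed covariance: P (S1 + S2) P^T = I
    vals, vecs = np.linalg.eigh(S1 + S2)
    P = vecs @ np.diag(vals ** -0.5) @ vecs.T
    # In the whitened space S1' and S2' share eigenvectors, and their
    # eigenvalues are complementary (lam1 + lam2 = 1)
    lam1, V = np.linalg.eigh(P @ S1 @ P.T)
    # Eigenvectors with the largest lam1 best represent the target class
    # while carrying the least background information
    return (P.T @ V)[:, np.argsort(lam1)[::-1][:d]]

target = np.random.rand(50, 120)      # 50 target pixels, 120 bands (stand-in)
background = np.random.rand(500, 120)
B = fkt(target, background, d=5)      # reduced representation: pixels @ B
```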

5.
Singular spectrum analysis and its multivariate or multichannel singular spectrum analysis (MSSA) variant are effective methods for time series representation, denoising and prediction, with broad application in many fields. However, a key element in MSSA is the singular value decomposition of a high-dimensional matrix stack of component matrices, where the spatial (structural) information among multivariate time series is lost or distorted. This vector-space model also leads to difficulties including high dimensionality, small sample size, and numerical instability when applied to multi-dimensional time series. We present a generalized multivariate singular spectrum analysis (GMSSA) method to simultaneously decompose multivariate time series into constituent components, which can overcome the limitations of conventional multivariate singular spectrum analysis. In addition, we propose a SampEn-based method to determine the dominant components in GMSSA. We demonstrate the effectiveness and efficiency of GMSSA in simultaneously de-noising multivariate time series for attractor reconstruction, and in predicting both simulated and real-world multivariate noisy time series.
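For orientation, a minimal single-channel SSA decomposition (the classical building block that MSSA and the proposed GMSSA generalize) can be sketched as follows; the window length and component grouping are illustrative choices, not the paper's:

```python
import numpy as np

def ssa(x, L=30, groups=(range(0, 2), range(2, 30))):
    """Basic SSA: embed, SVD, group, and diagonal-average back to series."""
    N = len(x)
    K = N - L + 1
    # 1. Trajectory (Hankel) matrix: columns are lagged windows of x
    X = np.column_stack([x[i:i + L] for i in range(K)])
    # 2. SVD of the trajectory matrix
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # 3/4. Group elementary matrices and diagonal-average (Hankelize)
    parts = []
    for g in groups:
        Xg = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in g)
        comp = np.array([Xg[::-1].diagonal(k).mean() for k in range(-L + 1, K)])
        parts.append(comp)
    return parts  # e.g. [signal estimate, residual]

t = np.linspace(0, 10, 200)
noisy = np.sin(2 * np.pi * t) + 0.3 * np.random.randn(200)
signal, residual = ssa(noisy, L=30)
```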

6.
As a powerful tool for measuring complexity and randomness, multivariate multi-scale permutation entropy (MMPE) has been widely applied to feature representation and extraction for multi-channel signals. However, MMPE still has intrinsic shortcomings in its coarse-graining procedure and lacks a precise estimate of the entropy value. To address these issues, this paper proposes a novel nonlinear dynamic method named composite multivariate multi-scale permutation entropy (CMMPE), which optimizes the insufficient coarse-graining process in MMPE and thus avoids the loss of information. Simulated signals are used to verify the validity of CMMPE by comparing it with the widely used MMPE method. An intelligent fault diagnosis method is then put forward on the basis of CMMPE, the Laplacian score (LS), and a bat-algorithm-optimized support vector machine (BA-SVM). Finally, the proposed method is applied to test data from rolling bearings and compared with fault diagnosis methods based on MMPE, multivariate multi-scale fuzzy entropy (MMFE), and multi-scale permutation entropy (MPE). The results indicate that the proposed method achieves effective identification of rolling-bearing fault categories and is superior to the comparison methods.
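To make the coarse-graining point concrete, here is a sketch of the composite coarse-graining idea for a single channel, under the usual reading of composite multi-scale entropies: at scale s, plain MMPE keeps only the offset-0 coarse-grained series, whereas the composite variant averages the entropy over all s shifted series. `entropy_fn` can be any permutation entropy estimator (one is sketched under entry 13 below); this is illustrative, not the authors' exact formulation:

```python
import numpy as np

def coarse_grain(x, scale, offset=0):
    """Non-overlapping means of length `scale`, starting at `offset`."""
    n = (len(x) - offset) // scale
    return x[offset:offset + n * scale].reshape(n, scale).mean(axis=1)

def composite_entropy(x, scale, entropy_fn):
    """Composite multi-scale entropy for one channel: average the entropy
    over all `scale` possible coarse-graining offsets instead of using
    only offset 0, reducing the information lost by coarse-graining."""
    return np.mean([entropy_fn(coarse_grain(x, scale, k))
                    for k in range(scale)])
```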

7.
杨丽荣  江川  黎嘉骏  曹冲  周俊 《应用声学》2023,42(5):971-983
To obtain effective acoustic emission (AE) signal features of the rock fracture process and better classify rock fracture states, a feature-fusion dimensionality reduction method based on the manifold learning algorithm LLE is proposed. Taking red sandstone as the study object, a laboratory uniaxial compression experiment was designed to collect the signals; the raw AE signals were pre-processed, features were extracted, and the time-domain and frequency-domain feature vectors were recombined into a new multi-dimensional feature vector, which was then reduced separately with linear principal component analysis (PCA) and the manifold learning LLE algorithm. Comparing the 2D and 3D clustering plots of the fused features after the two reductions shows that after LLE reduction the four states lie relatively closer together along a roughly horizontal trend with less cross-state overlap; not a single sample of the first state was misclassified, and all four states cluster more compactly than after PCA reduction. Comparing the summed sensitivities of the fused features, that of LLE far exceeds that of PCA, indicating that the LLE-reduced fused features capture more of the local information contained in the raw signals and confirming that LLE clusters better than PCA. Finally, a classification experiment on sandstone fracture states with LLE feature fusion shows that the recognition rate with fused features is 6% higher than with time-domain features alone. The method thus markedly improves the recognition rate of rock fracture state classification, with comparatively outstanding dimensionality-reduction performance.
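The LLE reduction step itself is available off the shelf; a minimal sketch with scikit-learn's LocallyLinearEmbedding on a stand-in fused feature matrix (feature counts and neighbor settings are illustrative, not the paper's):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import LocallyLinearEmbedding

X = np.random.rand(200, 16)   # 200 AE events x 16 fused time/frequency features

Z_lle = LocallyLinearEmbedding(n_neighbors=12, n_components=3).fit_transform(X)
Z_pca = PCA(n_components=3).fit_transform(X)   # linear baseline for comparison
```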

8.
Oil pollution damages the ecological environment, so research on oil identification methods is important for environmental protection. Fluorescence spectroscopy is used to acquire oil spectral data, which are pre-processed; a dimensionality reduction method then extracts feature information, and a pattern recognition algorithm performs classification, enabling qualitative analysis of oils. Developing a more efficient dimensionality reduction method and classification algorithm is therefore essential. Based on three-dimensional fluorescence spectroscopy, sparse principal component analysis (SPCA) was used to extract features from fluorescence spectra measured with an FS920 spectrometer, and a support vector machine (SVM) classified the extracted features, yielding a more efficient oil identification method. First, seawater and sodium dodecyl sulfate (SDS) were used to prepare a 0.1 mol·L-1 micellar solution, which served as the solvent for 20 solutions each, at different concentrations, of diesel, aviation kerosene, gasoline, and lubricating oil. The three-dimensional fluorescence spectra of the sample solutions were then measured with the FS920 spectrometer and pre-processed. Finally, features were extracted from the pre-processed data with SPCA and with principal component analysis (PCA), and the feature vectors were classified with SVM and K-nearest neighbors (KNN), giving results for four models: PCA-KNN, SPCA-KNN, PCA-SVM, and SPCA-SVM. The classification accuracies of the four models were 85%, 90%, 90%, and 95%, respectively. For a given classifier, feature extraction with SPCA yielded accuracy 5% higher than PCA, showing that the sparsity of SPCA highlights the principal components and reduces the influence of non-essential components during spectral feature extraction; the sparsified loading matrix also removes redundant information between variables and optimizes the reduced feature information, providing more effective features for subsequent classification. For a given feature extraction algorithm, SVM classified 5% more accurately than KNN, showing the advantage of SVM in classification. Three-dimensional fluorescence spectroscopy combined with SPCA and SVM thus achieves accurate identification and classification of oils and offers a new approach to the efficient detection of oil pollutants.
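A minimal sketch of the SPCA-SVM pipeline with scikit-learn on stand-in spectra; the unfolding of the 3-D excitation-emission matrices into row vectors and all sizes are assumptions:

```python
import numpy as np
from sklearn.decomposition import SparsePCA
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

X = np.random.rand(80, 500)            # 80 samples x unfolded EEM spectra
y = np.repeat([0, 1, 2, 3], 20)        # diesel / kerosene / gasoline / lubricant

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
spca = SparsePCA(n_components=10, alpha=1.0).fit(X_tr)
clf = SVC(kernel='rbf').fit(spca.transform(X_tr), y_tr)
print(clf.score(spca.transform(X_te), y_te))
```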

9.
Shi-Jie Pan 《中国物理 B》2022,31(6):60304-060304
Neighborhood preserving embedding (NPE) is an important linear dimensionality reduction technique that aims at preserving the local manifold structure. NPE contains three steps, i.e., finding the nearest neighbors of each data point, constructing the weight matrix, and obtaining the transformation matrix. Liang et al. proposed a variational quantum algorithm (VQA) for NPE [Phys. Rev. A 101 032323 (2020)]. The algorithm consists of three quantum sub-algorithms, corresponding to the three steps of NPE, and was expected to have an exponential speedup on the dimensionality n. However, the algorithm has two disadvantages: (i) It is not known how to efficiently obtain the input of the third sub-algorithm from the output of the second one. (ii) Its complexity cannot be rigorously analyzed because the third sub-algorithm in it is a VQA. In this paper, we propose a complete quantum algorithm for NPE, in which we redesign the three sub-algorithms and give a rigorous complexity analysis. It is shown that our algorithm can achieve a polynomial speedup on the number of data points m and an exponential speedup on the dimensionality n under certain conditions over the classical NPE algorithm, and achieve a significant speedup compared to Liang et al.'s algorithm even without considering the complexity of the VQA.
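For reference, the three classical NPE steps that the quantum sub-algorithms correspond to can be sketched directly; this is the classical baseline only, with standard regularization added for numerical stability:

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import NearestNeighbors

def npe(X, k=10, d=2):
    """Classical neighborhood preserving embedding; X is (m, n)."""
    m, n = X.shape
    # Step 1: k nearest neighbors of each data point
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X)
    idx = nbrs.kneighbors(X, return_distance=False)[:, 1:]  # drop self
    # Step 2: reconstruction weights (each point from its neighbors, sum = 1)
    W = np.zeros((m, m))
    for i in range(m):
        Z = X[idx[i]] - X[i]                 # centered neighbors
        G = Z @ Z.T + 1e-6 * np.eye(k)       # local Gram matrix, regularized
        w = np.linalg.solve(G, np.ones(k))
        W[i, idx[i]] = w / w.sum()
    # Step 3: transformation matrix from X^T M X a = lam X^T X a
    M = (np.eye(m) - W).T @ (np.eye(m) - W)
    A = X.T @ M @ X
    B = X.T @ X + 1e-6 * np.eye(n)
    vals, vecs = eigh(A, B)
    return vecs[:, :d]                       # smallest-eigenvalue directions

X = np.random.rand(150, 40)
Z = X @ npe(X, k=10, d=2)                    # low-dimensional embedding
```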

10.
With population explosion and globalization, the spread of infectious diseases has been a major concern. In 2019, a newly identified type of Coronavirus caused an outbreak of respiratory illness, popularly known as COVID-19, and became a pandemic. Although enormous efforts have been made to understand the spread of COVID-19, our knowledge of the COVID-19 dynamics still remains limited. The present study employs the concepts of chaos theory to examine the temporal dynamic complexity of COVID-19 around the world. The false nearest neighbor (FNN) method is applied to determine the dimensionality and, hence, the complexity of the COVID-19 dynamics. The methodology involves: (1) reconstruction of a single-variable COVID-19 time series in a multi-dimensional phase space to represent the underlying dynamics; and (2) identification of “false” neighbors in the reconstructed phase space and estimation of the dimension of the COVID-19 series. For implementation, COVID-19 data from 40 countries/regions around the world are studied. Two types of COVID-19 data are analyzed: (1) daily COVID-19 cases; and (2) daily COVID-19 deaths. The results for the 40 countries/regions indicate that: (1) the dynamics of COVID-19 cases exhibit low- to medium-level complexity, with dimensionality in the range 3 to 7; and (2) the dynamics of COVID-19 deaths exhibit complexity anywhere from low to high, with dimensionality ranging from 3 to 13. The results also suggest that the complexity of the dynamics of COVID-19 deaths is greater than or at least equal to that of the dynamics of COVID-19 cases for most (three-fourths) of the countries/regions. These results have important implications for modeling and predicting the spread of COVID-19 (and other infectious diseases), especially in the identification of the appropriate complexity of models.
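A compact sketch of the false nearest neighbors test on a scalar series, following the usual distance-ratio criterion: embed the series, find each point's nearest neighbor, and call the neighbor "false" if adding the next delay coordinate stretches the distance beyond a tolerance. The threshold, delay, and stand-in series are illustrative, not the paper's settings:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def fnn_fraction(x, dim, tau=1, r_tol=10.0):
    """Fraction of false nearest neighbors at embedding dimension `dim`."""
    n = len(x) - dim * tau
    # Delay embedding in `dim` dimensions
    emb = np.column_stack([x[i * tau:i * tau + n] for i in range(dim)])
    nbrs = NearestNeighbors(n_neighbors=2).fit(emb)
    dist, idx = nbrs.kneighbors(emb)
    d, j = dist[:, 1], idx[:, 1]             # nearest neighbor (excluding self)
    # False if the (dim+1)-th coordinate stretches the distance > r_tol-fold
    extra = np.abs(x[dim * tau:dim * tau + n] - x[j + dim * tau])
    return np.mean(extra / np.maximum(d, 1e-12) > r_tol)

x = np.cumsum(np.random.randn(1000))         # stand-in for a daily case series
for m in range(1, 11):
    print(m, fnn_fraction(x, m))             # dimension where the fraction ~ 0
```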

11.
Linear regression (LR) is a core model in supervised machine learning performing a regression task. One can fit this model using either an analytic/closed-form formula or an iterative algorithm. Fitting it via the analytic formula becomes a problem when the number of predictors is greater than the number of samples because the closed-form solution contains a matrix inverse that is not defined when having more predictors than samples. The standard approach to solve this issue is using the Moore–Penrose inverse or the L2 regularization. We propose another solution starting from a machine learning model that, this time, is used in unsupervised learning performing a dimensionality reduction task or just a density estimation one—factor analysis (FA)—with one-dimensional latent space. The density estimation task represents our focus since, in this case, it can fit a Gaussian distribution even if the dimensionality of the data is greater than the number of samples; hence, we obtain this advantage when creating the supervised counterpart of factor analysis, which is linked to linear regression. We also create its semisupervised counterpart and then extend it to be usable with missing data. We prove an equivalence to linear regression and create experiments for each extension of the factor analysis model. The resulting algorithms are either a closed-form solution or an expectation–maximization (EM) algorithm. The latter is linked to information theory by optimizing a function containing a Kullback–Leibler (KL) divergence or the entropy of a random variable.
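To make the p > n failure concrete, here is a small sketch of the closed-form least-squares fit and the two standard fixes the abstract mentions (the Moore-Penrose pseudoinverse and L2/ridge regularization); the factor-analysis-based estimator itself is not reproduced, and the data are synthetic stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 100                       # more predictors than samples
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + 0.1 * rng.normal(size=n)

# Naive normal equations: X^T X is (p x p) with rank <= n < p, so singular
# beta = np.linalg.solve(X.T @ X, X.T @ y)   # would fail: singular matrix

beta_pinv = np.linalg.pinv(X) @ y            # Moore-Penrose: min-norm solution
lam = 1.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)  # L2 fix
```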

12.
Subject-level resting-state fMRI (RS-fMRI) spatial independent component analysis (sICA) may provide new ways to analyze the data when performed in a sliding time window. However, whether principal component analysis (PCA) and voxel-wise variance normalization (VN) are applicable pre-processing procedures in the sliding-window context, as they are for regular sICA, has not been addressed so far. Model order selection in sliding-window sICA also requires further study. In this paper we have addressed these concerns. First, we compared PCA-retained subspaces over the overlapping parts of consecutive temporal windows to answer whether in-window PCA and VN can confound comparisons between sICA analyses in consecutive windows. Second, we compared the PCA subspaces between windowed and full data to assess the expected comparability between windowed and full-data sICA results. Third, the temporal evolution of dimensionality estimates in RS-fMRI data sets was monitored to identify potential challenges in model order selection in a sliding-window sICA context. Our results illustrate that in-window VN can be safely used, that in-window PCA is applicable with most window widths, and that comparisons between windowed and full data should not be performed from a subspace-similarity point of view. In addition, our studies on dimensionality estimates demonstrated that there are sustained, periodic and very case-specific changes in signal-to-noise ratio within RS-fMRI data sets. Consequently, dimensionality estimation is needed for well-founded model order determination in the sliding-window case. The observed periodic changes correspond to a frequency band of ≤ 0.1 Hz, which is commonly associated with brain activity in RS-fMRI, and become on average most pronounced at window widths of 80 and 60 time points (144 and 108 s, respectively). Wider windows provided only slightly better comparability between consecutive windows, and windows of 60 time points or shorter also provided the best comparability with full-data results. Further studies are needed to determine the cause of the dimensionality variations.
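A schematic of the sliding-window pipeline under discussion (voxel-wise variance normalization, in-window PCA, then spatial ICA) on stand-in data; the window width of 80 time points matches one of the widths studied, but the stride, model order, and data sizes are assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

data = np.random.randn(240, 5000)        # time points x voxels (stand-in)
width, step, order = 80, 10, 20          # window width/stride, model order

for start in range(0, data.shape[0] - width + 1, step):
    win = data[start:start + width]
    win = win / win.std(axis=0)          # voxel-wise variance normalization (VN)
    reduced = PCA(n_components=order).fit_transform(win.T)   # in-window PCA
    maps = FastICA(n_components=order, max_iter=500).fit_transform(reduced)
    # `maps` holds the spatial independent components for this window
```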

13.
Software aging is a phenomenon referring to the performance degradation of a long-running software system. It is an accumulative process during execution that gradually leads the system from a normal state to a failure-prone state, and accurately predicting Aging-Related Failures (ARFs) is a crucial challenge for system reliability. In this paper, permutation entropy (PE) is extended to Multidimensional Multi-scale Permutation Entropy (MMPE), a novel aging indicator for detecting performance anomalies, since MMPE is sensitive to dynamic state changes. An experiment is set up on the distributed database system Voldemort, and MMPE is calculated from the performance metrics collected during execution. Finally, a failure prediction model based on MMPE and machine learning is presented, which can predict failures with high accuracy.
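As background for the MMPE indicator, a minimal normalized permutation entropy routine for a single metric series; the multidimensional multi-scale extension aggregates this over channels and time scales (see the coarse-graining sketch under entry 6 above). Order, delay, and the stand-in series are illustrative:

```python
import numpy as np
from math import factorial
from collections import Counter

def perm_entropy(x, m=3, tau=1):
    """Normalized permutation entropy of a 1-D series via ordinal patterns."""
    n = len(x) - (m - 1) * tau
    # Ordinal pattern = argsort of each length-m delay vector
    patterns = [tuple(np.argsort(x[i:i + m * tau:tau])) for i in range(n)]
    counts = np.array(list(Counter(patterns).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p)) / np.log(factorial(m))  # in [0, 1]

latency = np.random.rand(2000)        # stand-in for a performance metric
print(perm_entropy(latency, m=4))
```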

14.
Principal component analysis (PCA), also known as proper orthogonal decomposition or Karhunen-Loève transform, is commonly used to reduce the dimensionality of a data set with a large number of interdependent variables. PCA is the optimal linear transformation with respect to minimizing the mean square reconstruction error but it only considers second-order statistics. If the data have non-linear dependencies, an important issue is to develop a technique which takes higher order statistics into account and which can eliminate dependencies not removed by PCA. Recognizing the shortcomings of PCA, researchers in the field of statistics and neural networks have developed non-linear extensions of PCA. The purpose of this paper is to present a non-linear generalization of PCA, called VQPCA. This algorithm builds local linear models by combining PCA with clustering of the input space. This paper concludes by observing from two illustrative examples that VQPCA is potentially a more effective tool than conventional PCA.
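A rough sketch of the local-linear-model idea behind VQPCA: partition the input space by clustering, then fit a PCA in each region and reconstruct points through their local model. Here k-means stands in for the vector quantizer (the actual algorithm couples quantization and PCA through the reconstruction error), and the data are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def vq_pca(X, n_regions=4, d=2):
    """Local PCA: one linear model per k-means region of the input space."""
    labels = KMeans(n_clusters=n_regions, n_init=10).fit_predict(X)
    recon = np.empty_like(X)
    for r in range(n_regions):
        pca = PCA(n_components=d).fit(X[labels == r])
        Z = pca.transform(X[labels == r])        # local low-dim codes
        recon[labels == r] = pca.inverse_transform(Z)
    return recon, labels

# Nonlinear data that a single global PCA plane fits poorly
t = np.random.rand(500) * 4 * np.pi
X = np.column_stack([t * np.cos(t), t * np.sin(t), np.random.randn(500)])
recon, _ = vq_pca(X, n_regions=6, d=2)
print(np.mean((X - recon) ** 2))                 # local reconstruction error
```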

15.
A novel joint kernel principal component analysis (PCA) and relational perspective map (RPM) method called KPmapper is proposed for hyperspectral dimensionality reduction and spectral feature recognition. Kernel PCA is used to analyze hyperspectral data so that the major information corresponding to features can be better extracted. RPM is used to visualize hyperspectral data in two-dimensional (2D) maps; it is an efficient approach for discovering regularities and extracting information by partitioning the data into pieces and mapping them onto a 2D space. The experimental results show that the KPmapper algorithm can effectively obtain the intrinsic features of nonlinear, high-dimensional data. It is useful and effective for dimensionality reduction and spectral feature recognition.
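The kernel PCA half of the pipeline is standard; a minimal scikit-learn sketch on stand-in hyperspectral pixels (RPM has no common library implementation, so only the KPCA step is shown, with an RBF kernel as an assumed choice):

```python
import numpy as np
from sklearn.decomposition import KernelPCA

X = np.random.rand(1000, 200)          # 1000 pixels x 200 bands (stand-in)
kpca = KernelPCA(n_components=10, kernel='rbf', gamma=1e-3)
features = kpca.fit_transform(X)       # nonlinear features for 2-D mapping
```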

16.
Band selection for hyperspectral remote sensing based on orthogonal projection divergence
Because hyperspectral data are massive and high-dimensional, dimensionality reduction is an important problem in hyperspectral remote sensing. Band selection algorithms, which effectively retain the information of the original data, have clear advantages for hyperspectral dimensionality reduction and for the subsequent remote sensing recognition and classification. This paper proposes a band selection method based on orthogonal projection divergence (OPD). The method inherits the characteristics of the orthogonal subspace projection (OSP) algorithm: by projecting the original data into a feature space...
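A tiny sketch of the orthogonal subspace projection operator that OPD builds on: project spectra onto the orthogonal complement of a set of undesired signatures. The signature matrix and sizes are illustrative assumptions:

```python
import numpy as np

def osp_projector(U):
    """P = I - U (U^T U)^{-1} U^T : annihilates the column space of U."""
    return np.eye(U.shape[0]) - U @ np.linalg.pinv(U.T @ U) @ U.T

bands = 200
U = np.random.rand(bands, 3)          # undesired/background signatures
P = osp_projector(U)
pixel = np.random.rand(bands)
residual = P @ pixel                  # component orthogonal to the background
```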

17.
Laser-induced breakdown spectroscopy (LIBS) offers minimally destructive, in-situ, rapid analysis and has broad application prospects in sample classification and compositional analysis. To explore the feasibility of applying the technique to identifying natural geological samples, a method combining a self-organizing feature map (SOM) neural network with correlation discrimination is proposed for classifying LIBS spectra of natural geological samples. To reduce the interference of background noise and other irrelevant data in the full spectrum and to lower the computational load, characteristic spectral lines were extracted on the basis of elemental line assignment, reducing the dimensionality of the high-dimensional spectral data. A network training model was built with the characteristic-line data as input, yielding weight vectors that carry the features of the input samples; classification is achieved through correlation analysis between the weight vectors and the samples to be identified. Classification experiments on 16 natural geological samples show that, among the three data treatments (full spectrum, principal-component reduction, and characteristic spectral segments), the characteristic-line treatment performs best at dimensionality reduction and at extracting the main features of the LIBS data. The improved SOM network combined with correlation discrimination achieves higher classification accuracy than the support vector machine method and the direct SOM method, providing preliminary confirmation of the method's effectiveness.
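A heavily hedged sketch of the SOM-plus-correlation idea, using the third-party minisom package as a stand-in for the paper's improved SOM; the neuron labelling and correlation step below are one simple reading of "correlation discrimination", and all sizes are illustrative:

```python
import numpy as np
from minisom import MiniSom   # third-party SOM package, used as a stand-in

X = np.random.rand(160, 50)            # 160 spectra x 50 characteristic lines
labels = np.repeat(np.arange(16), 10)  # 16 geological sample types

som = MiniSom(4, 4, X.shape[1], sigma=1.0, learning_rate=0.5, random_seed=0)
som.train_random(X, 5000)              # weight vectors absorb sample features
W = som.get_weights().reshape(-1, X.shape[1])   # one weight vector per neuron

# Label each neuron by its best-correlated training spectrum, then classify
# a test spectrum by its correlation with the neuron weight vectors
neuron_label = np.array(
    [labels[np.argmax([np.corrcoef(w, x)[0, 1] for x in X])] for w in W])

def classify(x):
    corr = [np.corrcoef(x, w)[0, 1] for w in W]
    return neuron_label[int(np.argmax(corr))]

print(classify(X[0] + 0.01 * np.random.randn(50)))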

18.
Near-infrared spectroscopy is one of the most popular food-inspection methods, and analyzing such high-dimensional spectral data usually requires dimensionality reduction algorithms to extract features; however, the vast majority of algorithms can only analyze a single data set. Contrastive PCA, based on contrastive learning, has been successfully applied to the NIR detection of pesticide residues on the surfaces of different fruits, but it can only combine existing features linearly, which limits its feature extraction, and it requires tuning a contrast parameter to control the influence of the background set, at a larger time cost. cVAE (contrastive variational autoencoder) is an improved algorithm based on contrastive learning and variational autoencoders that has been used in image denoising and RNA sequence analysis; it retains the ability to analyze multiple data sets while, by incorporating a neural-network probabilistic generative model, gaining the ability to extract nonlinear latent features. The cVAE algorithm was applied to NIR spectral analysis, and an accurate dimensionality reduction model for NIR spectral data was established. In a practical validation, cVAE was used to detect melamine adulteration in purchased pure milk of different brands and batches. The results show that the VAE algorithm can only distinguish the different brands and batches of pure milk, while the key information of whether melamine was added is not expressed; when cVAE is used for the analysis, the added background data set separates out the irrelevant variation, so samples with and without melamine adulteration are clearly classified. This shows that cV...
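A heavily simplified sketch of a contrastive VAE in PyTorch, following the common construction in which a shared background latent is trained on both data sets while a salient latent is active only for the target set; the architecture, sizes, and single training step are all illustrative assumptions, not the paper's model:

```python
import torch
import torch.nn as nn

class CVAE(nn.Module):
    """Minimal contrastive VAE: background latent z is shared by both data
    sets; salient latent s is active only for the target set."""
    def __init__(self, dim, z=2, s=2):
        super().__init__()
        self.enc_z = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 2 * z))
        self.enc_s = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 2 * s))
        self.dec = nn.Sequential(nn.Linear(z + s, 64), nn.ReLU(), nn.Linear(64, dim))

    def sample(self, stats):
        mu, logvar = stats.chunk(2, dim=-1)
        kl = -0.5 * torch.sum(1 + logvar - mu ** 2 - logvar.exp())
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp(), kl

    def loss(self, x, is_target):
        z, kl_z = self.sample(self.enc_z(x))
        s, kl_s = self.sample(self.enc_s(x))
        if not is_target:                    # background: salient latent off
            s = torch.zeros_like(s)
        recon = self.dec(torch.cat([z, s], dim=-1))
        return ((recon - x) ** 2).sum() + kl_z + (kl_s if is_target else 0.0)

# One optimization step on stand-in spectra
target = torch.rand(64, 100)      # milk spectra, some adulterated (stand-in)
background = torch.rand(64, 100)  # plain milk of various brands (stand-in)
model = CVAE(dim=100)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = model.loss(target, True) + model.loss(background, False)
opt.zero_grad()
loss.backward()
opt.step()
```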

19.
Rapid determination of the sugar content of fruit vinegar by near-infrared spectroscopy
To detect the sugar content of fruit vinegar rapidly and accurately, a detection model was built using near-infrared spectroscopy combined with least-squares support vector machine (LS-SVM) analysis. NIR transmission spectra were acquired for 300 fruit vinegar samples of five types; principal component analysis was used to reduce the dimensionality of the raw spectral data, and six principal components were selected according to their cumulative contribution rate. The selected principal components served as the optimized spectral feature subset, replacing the complex original spectral data. The 300 samples were then randomly divided into a calibration set and a prediction set; an LS-SVM sugar-content prediction model was built on the 225 calibration samples and applied to the 75 prediction samples. Evaluated by the root-mean-square error of prediction (RMSEP) and the correlation coefficient (r) of the predictions, the model achieved r = 0.9939 and RMSEP = 0.363, a good predictive performance.
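A minimal sketch of the PCA-then-regression pipeline; scikit-learn has no LS-SVM class, so KernelRidge (closely related to LS-SVM regression) is used as a stand-in, the split sizes follow the abstract, and the spectra and targets are synthetic:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

X = np.random.rand(300, 700)        # 300 vinegar samples x NIR wavelengths
y = np.random.rand(300) * 10        # sugar content (stand-in)

X_pc = PCA(n_components=6).fit_transform(X)   # 6 PCs, as in the abstract
X_tr, X_te, y_tr, y_te = train_test_split(X_pc, y, train_size=225,
                                          random_state=0)
model = KernelRidge(kernel='rbf', alpha=1.0, gamma=0.1).fit(X_tr, y_tr)
rmsep = mean_squared_error(y_te, model.predict(X_te)) ** 0.5
```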

20.
To overcome the shortcomings of traditional chemical detection of pesticide residues on Chinese cabbage, such as cumbersome sample preparation and long detection cycles, a rapid, non-destructive method for identifying the type of pesticide residue on Chinese cabbage is proposed. One group of pesticide-free samples and four groups uniformly sprayed with pesticides (chlorpyrifos, dimethoate, methomyl, and cypermethrin, at solution concentrations of 0.10, 1.00, 0.20, and 2.00 mg·kg-1, respectively) were taken as the study objects; after 12 hours of natural absorption, ...
