Similar literature
1.
Chen Jing, Zhang Zhenxing. 《光子学报》 (Acta Photonica Sinica), 2021, 50(2): 103-114
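Existing unsupervised band selection methods for hyperspectral image classification cannot compute the similarity between bands and suffer from high dimensionality during selection. This paper therefore proposes a greedy unsupervised hyperspectral band selection method based on variable precision rough sets. First, a new dependency measure is defined using variable precision rough sets; the measure is insensitive to the misclassification parameter of the variable precision rough set, so the similarity between bands can be fully exploited. Second, a new discriminant criterion is proposed to find the bands with higher and lower similarity values among the unselected and selected band subsets. Then a first-order incremental search selects the required informative bands one at a time, which avoids generating a large amount of information and reduces computational complexity. Finally, the proposed band selection technique is compared with five state-of-the-art techniques on three hyperspectral datasets. The results show that the proposed method achieves good classification accuracy on all datasets, and that with 50% labeled pixels the average classification accuracy drops by only 1.9%, 3.1% and 4.1%, respectively, relative to the average accuracy with all pixels. The proposed method delivers good classification performance, generalizes across datasets, and is robust to its parameters.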
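
The one-band-at-a-time (first-order incremental) search described in this abstract can be illustrated with a minimal sketch. It is not the paper's method: the rough-set dependency measure is replaced here by the absolute Pearson correlation as an assumed stand-in for band similarity, and the names `pearson` and `greedy_band_selection` are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def greedy_band_selection(bands, k):
    """Pick up to k band indices, one per step.

    bands: list of bands, each a list of pixel values.
    Similarity is |Pearson r|, an assumption standing in for the
    variable precision rough set dependency measure of the paper.
    """
    n = len(bands)
    k = min(k, n)
    sim = [[abs(pearson(bands[i], bands[j])) for j in range(n)] for i in range(n)]
    # Seed with the band that is, on average, most similar to all others.
    selected = [max(range(n), key=lambda i: sum(sim[i]))]
    while len(selected) < k:
        candidates = [i for i in range(n) if i not in selected]
        # Add the band whose highest similarity to any selected band is lowest.
        best = min(candidates, key=lambda i: max(sim[i][j] for j in selected))
        selected.append(best)
    return selected

# Toy data: bands 0 and 1 are nearly identical, band 2 carries different information.
toy = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.1, 2.0, 3.1, 4.0, 5.1],
    [2.0, 1.0, 2.0, 1.0, 2.0],
]
chosen = greedy_band_selection(toy, 2)
```

On the toy data the search keeps only one of the two near-duplicate bands and adds the dissimilar one, which is the redundancy-avoiding behaviour the abstract attributes to its criterion.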

2.
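Fourier transform infrared (FTIR) spectra usually contain a large number of wavelength variables, so qualitative analysis requires a robust and interpretable classification model. Sparse linear discriminant analysis (SLDA) is a relatively new and effective machine learning algorithm that is often used for variable screening and discriminant analysis of high-dimensional, small-sample data. By introducing a regularization term into linear discriminant analysis, SLDA completes classifier training and variable selection simultaneously, and the sparsity of the loading coefficients along the different discriminant directions improves model interpretability. A total of 94 samples of Qinjiao (秦艽) from different producing areas in Gansu were collected: 30 of Gentiana straminea Maxim, 28 of Gentiana officinalis, and 36 of Gentiana macrophylla Pall. Spectra of all samples were acquired by FTIR spectroscopy. Seventy samples formed the training set and the remaining 24 the test set. An SLDA model was built on the training set, and the number of non-zero loading coefficients along the two discriminant directions was optimized by grid search to obtain the optimal parameters. The SLDA model predicted the test samples with a classification accuracy of 100%, achieving rapid and accurate identification of the three Qinjiao species. The experimental results show that, compared with PLS-DA, the SLDA model has certain advantages in classification accuracy, sparsity and interpretability, and is a novel and effective method for qualitative spectral analysis.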

3.
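Seed vigor is an important indicator of seed quality; high-vigor seeds have stronger stress resistance, growth advantages and production potential. Seed vigor peaks at physiological maturity and then declines naturally and irreversibly as storage time increases. Timely and accurate testing and screening of seed vigor before sowing therefore has important practical significance. Traditional seed vigor tests are complicated and tedious, time-consuming, poorly repeatable and destructive to the seeds. This study therefore attempts to use hyperspectral imaging to establish a rapid, non-destructive and accurate method for detecting the viability of single wheat seeds. A total of 190 wheat seeds aged under high temperature and high humidity (128 germinated, 62 did not) served as samples. Spectral images of the seeds were first acquired with a visible-near-infrared (Vis-NIR) hyperspectral imaging system and a standard germination test was carried out, with a one-to-one correspondence maintained between the seeds in the spectral acquisition and in the germination test. The regions of interest of the seed spectral images were then extracted, and their spectral data were averaged and analyzed for features. The raw spectra were preprocessed with first derivative (FD), mean centering (MC), orthogonal signal correction (OSC) and multiplicative scatter correction (MSC), respectively, and full-band models were built with partial least squares discriminant analysis (PLS-DA) and compared to identify the most suitable preprocessing method. Characteristic bands were then screened using uninformative variable elimination (UVE), competitive adaptive reweighted sampling (CARS), the successive projections algorithm (SPA) and couplings of these variable selection methods; PLS-DA discriminant models were built on each set of extracted bands and compared, to establish the method for extracting the hyperspectral characteristic bands most correlated with single wheat seed viability. The results show that models built with different spectral preprocessing perform differently. Among MC, FD, OSC and MSC, the full-band MC-PLS-DA model, built on MC-preprocessed raw hyperspectral data, achieved overall viability identification accuracies of 82.5% and 83.0% on the calibration and prediction sets, respectively, better than the full-band PLS-DA models built on raw data or with other preprocessing; its identification accuracies for viable seeds on the calibration and prediction sets were 94.8% and 90.6%, respectively. Further comparison of the three single characteristic-band extraction methods and their couplings showed that coupling the three variable selection methods (UVE-CARS-SPA) compressed the 688 full-band variables to 8 (473, 492, 811, 829, 875, 880, 947 and 969 nm). The MC-UVE-CARS-SPA-PLS-DA model built on these 8 variables gave the best identification performance: overall viability identification accuracies on the calibration and prediction sets were 86.7% and 85.1%, respectively, 4.2 and 2.1 percentage points higher than the full-band model (MC-Full-PLS-DA), and the identification accuracies for viable seeds were 93.8% and 84.4%. After screening with this model, the final germination rate of the seed lot could reach 93.1%. The experimental results show that hyperspectral imaging combined with the UVE-CARS-SPA-PLS-DA model can qualitatively discriminate the viability of single wheat seeds. This work provides theoretical support for rapid, accurate and non-destructive testing of wheat seed vigor.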
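
Two of the preprocessing steps named in this abstract, mean centering (MC) and the first derivative (FD), are simple enough to sketch. The code below is only an illustration under assumptions: spectra are plain lists of reflectance values, FD is taken as an adjacent-point finite difference (the abstract does not say which derivative scheme was used), and the function names are hypothetical.

```python
def mean_center(spectra):
    """Subtract the per-wavelength mean (computed over samples) from every spectrum.

    Returns the centered spectra and the means, so the same means can be
    reused to center a prediction set.
    """
    n = len(spectra)
    means = [sum(column) / n for column in zip(*spectra)]
    centered = [[v - m for v, m in zip(row, means)] for row in spectra]
    return centered, means

def first_derivative(spectrum):
    """Finite-difference first derivative between adjacent wavelength points."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]

# Two toy calibration spectra with three wavelength points each.
calibration = [[0.10, 0.20, 0.40], [0.30, 0.40, 0.80]]
centered, means = mean_center(calibration)
```

After mean centering every wavelength column of the calibration set sums to zero; a prediction set would normally be centered with the calibration means rather than its own.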

4.
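Driven by industrial demand, and because the varying mineral content and uneven distribution of magnesite ore make its grade hard to determine, a magnesite ore grade classification model combining near-infrared spectroscopy with an extreme learning machine (ELM) is proposed. The model enables rapid classification of magnesite ore grade. Near-infrared spectroscopy exploits the different near-infrared absorption of the various hydrogen-containing groups in magnesite to determine the composition and content of the ore; it is simple to operate, non-destructive, fast, accurate and efficient. Thirty groups of magnesite ore from Dashiqiao, Yingkou, Liaoning Province were studied, giving a 30×973 matrix of near-infrared spectral data. Principal component analysis (PCA) was used for dimensionality reduction; requiring a principal component contribution rate above 99.99% gave 10-dimensional feature variables. An ELM-based quantitative analysis model was established, with 20 groups as training samples (6 special grade, 14 non-special grade) and the remaining 10 as test samples (4 special grade, 6 non-special grade); the number of hidden-layer nodes in the ELM model was set to 20. To further improve classification, two improvements to the ELM model are proposed: a selective ELM, which uses a loop to optimize the input weights and thresholds of the conventional ELM, and an ensemble-selective ELM, which builds an ensemble on top of the selective ELM. The results were compared with grading of the magnesite ore samples by manual methods, chemical methods and a BP neural network model. The results show that the near-infrared spectroscopy and ELM magnesite grade classification model has clear advantages in both time and cost, with accuracy above 90%, providing a new approach to magnesite ore grade classification.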
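
The rule used in this abstract to fix the PCA dimension (keep components until the contribution rate exceeds 99.99%) can be sketched as follows. This assumes the contribution rate is the cumulative share of the total eigenvalue sum, which is the usual reading; the function name `n_components_for` and the toy eigenvalues are hypothetical.

```python
def n_components_for(eigenvalues, threshold=0.9999):
    """Smallest number of leading components whose cumulative
    contribution rate (share of the eigenvalue sum) exceeds `threshold`."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total > threshold:
            return count
    # Threshold never exceeded (e.g. threshold >= 1): keep everything.
    return len(ordered)

# Toy covariance eigenvalues: the first two components explain 90% of the variance.
toy_eigenvalues = [6.0, 3.0, 1.0]
kept = n_components_for(toy_eigenvalues, threshold=0.85)
```

With a threshold as strict as 0.9999, the retained dimension is driven by how quickly the trailing eigenvalues decay, which is consistent with 973 spectral points collapsing to 10 features in the abstract.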

5.
Crop and weed identification based on imaging spectroscopy   Cited by: 5 in total (self-citations: 0, citations by others: 5)
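Weed identification is a prerequisite for variable-rate spraying and precise physical weeding. A self-designed ground-based imaging spectroscopy system was used to acquire hyperspectral data, in the 380-760 nm range and under natural conditions, of carrot seedlings and of weeds including purslane (马齿苋), goosegrass (牛筋草) and dijin (地锦). After normalizing the data to remove the influence of illumination conditions, bands were selected by a stepwise method and Fisher linear discriminant analysis was used to distinguish weeds from carrot seedlings. The results show that when each weed species is treated as its own class, a model built on 8 selected bands identifies weeds and carrot seedlings with an accuracy of about 85%; when all weeds are treated as a single class against carrot seedlings, the 7 selected bands give an accuracy above 91%. In addition, to support the design of a low-cost weed identification system, the optimal 2- and 3-band combinations were chosen by exhaustive search; the optimal 3-band combination discriminates weeds from carrot seedlings about as well as the 5 bands chosen by the stepwise method, with an overall accuracy of 89%. Red-edge bands were also found to have significant power for identifying weeds.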
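
The exhaustive search for the best 2- or 3-band combination can be sketched as below. This is a simplified illustration, not the study's procedure: a nearest-centroid rule scored on the training samples is swapped in for Fisher linear discriminant analysis to keep the example dependency-free, and all names and toy values are hypothetical.

```python
from itertools import combinations

def centroid(rows, bands):
    """Mean value of each chosen band over the given samples."""
    return [sum(r[b] for r in rows) / len(rows) for b in bands]

def nearest_centroid_accuracy(samples, labels, bands):
    """Training-set accuracy of a nearest-centroid rule restricted to `bands`."""
    classes = sorted(set(labels))
    cents = {
        c: centroid([s for s, l in zip(samples, labels) if l == c], bands)
        for c in classes
    }
    hits = 0
    for s, l in zip(samples, labels):
        pred = min(
            classes,
            key=lambda c: sum((s[b] - m) ** 2 for b, m in zip(bands, cents[c])),
        )
        hits += pred == l
    return hits / len(samples)

def best_band_combo(samples, labels, size):
    """Try every band combination of the given size and return the best one."""
    n_bands = len(samples[0])
    return max(
        combinations(range(n_bands), size),
        key=lambda bands: nearest_centroid_accuracy(samples, labels, bands),
    )

# Toy reflectances: only band 2 separates the two classes.
samples = [[0.5, 0.1, 0.20], [0.1, 0.5, 0.25], [0.1, 0.1, 0.80], [0.5, 0.5, 0.75]]
labels = ["crop", "crop", "weed", "weed"]
best_single = best_band_combo(samples, labels, 1)
best_pair = best_band_combo(samples, labels, 2)
```

Exhaustive search is only practical for small subset sizes because the number of combinations grows combinatorially with the number of bands, which matches its use here for 2- and 3-band sets only.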

6.
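China is a major producer and consumer of aquatic products. Because fish products differ widely in quality and price, and closely related species look and feel similar, adulteration and mislabeling of fish products occur frequently, directly harming consumers' economic and health interests. Rapid detection of the species and quality of fish products is therefore of great practical significance. Laser-induced breakdown spectroscopy (LIBS) uses a pulsed laser to ablate the sample surface and generate a laser-induced plasma, and analyzes the elemental composition of the sample qualitatively and quantitatively from the plasma emission spectrum. It requires little or no sample preparation, detects multiple elements simultaneously and is fast, giving it great potential for rapid food analysis. LIBS was combined with the random forest (RF) algorithm for rapid identification of different fish products. Samples of 6 kinds of fish were first pressed into pellets, and their spectra were collected with a handheld LIBS analyzer; clear characteristic lines of C, Mg, CN, Ca, Na, H, K, O and other components were detected. The raw spectra were normalized and clustered using principal component analysis (PCA). Marine and freshwater fish samples could be separated, but different marine fish, and different freshwater fish, could not be separated effectively, indicating that PCA has limited ability to classify LIBS spectra of fish. A classification model was then built with the nonlinear random forest algorithm; after optimizing the number and depth of the decision trees, the overall identification accuracy for the fish samples was 90%. To further improve accuracy and efficiency, spectral features were extracted using the variable importance output by the RF model, which raised the identification accuracy to 94.44% and reduced the number of model input variables from 23,431 to 597, significantly reducing computation time. This shows that the RF model combined with variable-importance extraction can pick out weak signals in LIBS spectra that have high variable importance and contribute strongly to classification, effectively removing interference from spectral noise, background and other irrelevant variables and improving the model's accuracy and efficiency. It also verifies the feasibility of a handheld LIBS device combined with machine learning for rapid identification of fish products on the market.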

7.
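How to "purify" useful information from massive or high-dimensional data is a major challenge in data analysis today and an active research topic. Variable selection techniques can extract informative feature variables from numerous, complex measurement data, simplifying multivariate models and even improving their predictive performance. In spectral analysis, noise and many other factors mean that measured data inevitably contain interfering and irrelevant variables, and multicollinearity exists among variables; all of these affect the robustness and predictive ability of models. In recent years, variable (wavelength) selection methods have made considerable progress in research and application in spectral analysis. Drawing on the domestic and international literature and the authors' own research experience, this review covers variable selection methods not only for near-infrared spectroscopy but also for mid-infrared spectroscopy, Raman spectroscopy and others: their origins, characteristics, development, categories and comparison, and their applications in different fields over the past five years. The keys to a variable selection method are the parameter used to evaluate variable importance, the choice of its criterion or threshold, and the strategy and route for searching variables. Each method has its own strengths and limitations, and in practice a suitable method should be chosen according to the characteristics of both the method and the target system. Highlights: (1) commonly used wavelength selection and band selection methods in spectral data analysis are compared; (2) the principles and characteristics of different variable selection methods based on PLS model parameters are compared; (3) variable selection methods are classified and reviewed according to their search and selection strategies. Finally, problems such as overfitting and instability that arise when variable selection methods are applied to real complex systems are discussed and corresponding solutions proposed, and the research trends, prospects and application directions of variable selection methods are outlined. New criteria for evaluating variable importance and new variable search strategies still require in-depth study. It is hoped that this review will help advance subsequent research on and application of spectral variable selection.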

8.
Selection of biologically relevant genes from high-dimensional expression data is a key research problem in gene expression genomics. Most available gene selection methods are based on either a relevancy or a redundancy measure and are usually judged by post-selection classification accuracy. With these methods, genes are ranked on a single high-dimensional expression dataset, which leads to the selection of spuriously associated and redundant genes. Hence, we developed a statistical approach that combines a support vector machine with Maximum Relevance and Minimum Redundancy under a sound statistical setup for the selection of biologically relevant genes. Here, genes were selected through statistical significance values computed using a nonparametric test statistic under a bootstrap-based subject sampling model. Further, a systematic and rigorous evaluation of the proposed approach against nine existing competitive methods was carried out on six different real crop gene expression datasets. This performance analysis was carried out under three comparison settings, i.e., subject classification and biological relevance criteria based on quantitative trait loci and on gene ontology. Our analytical results showed that the proposed approach selects genes that are more biologically relevant than those selected by the existing methods, and that it also compares favorably with the competing methods. The proposed statistical approach provides a framework for combining filter and wrapper methods of gene selection.

9.
10.
Multi-label learning aims to learn functions that assign each sample its true label set. As data and knowledge accumulate, feature dimensionality keeps increasing. However, high-dimensional data may contain noise, which makes multi-label learning harder. Feature selection is an effective way to reduce the data dimension. In feature selection research, multi-objective optimization algorithms have shown excellent global optimization performance, and the Pareto relationship handles the conflicting objectives of a multi-objective problem well. Therefore, a Shapley value-fused feature selection algorithm for multi-label learning (SHAPFS-ML) is proposed. The method takes multi-label criteria as the optimization objectives, and its crossover and mutation operators, which are based on the Shapley value, help identify relevant, redundant and irrelevant features. Comparative experiments on real-world datasets show that SHAPFS-ML is an effective feature selection method for multi-label classification: it reduces the computational complexity of the classification algorithm and improves classification accuracy.

11.
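Fluorescence spectroscopy combined with pattern recognition is widely used to distinguish oil-spill sources such as refined oils (diesel, gasoline, heavy fuel oil) and crude oil. Traditional three-dimensional fluorescence spectral analysis yields rich compositional information on spill samples, but it is hard to adapt to field use and remains at the laboratory stage. Developing crude oil identification methods suited to field use is of great importance for rapid response to and treatment of marine oil-spill pollution. To meet the needs of lidar, a crude oil identification method was developed based on laser-induced time-resolved fluorescence combined with a support vector machine (SVM) model. Working along the two dimensions of time and wavelength, the choice of time window and wavelength range was optimized to obtain a satisfactory oil-type identification accuracy. Experiments show that selecting an ICCD detection delay of 54-74 ns raises the classification accuracy from 83.3% with full-spectrum data to 88.1%. Selecting spectral data in the 387.00-608.87 nm wavelength range raises the classification accuracy for suspected oil types from 84% with full-spectrum data to 100%. In practice, the detection delay of a laser fluorescence lidar fluctuates somewhat because of waves, motion of the carrier platform and other factors. The classification method described here, by screening along both time and wavelength, is better suited to identifying field detection data; it also further highlights the time-resolved fluorescence spectral features of crude oil and provides an important basis for compressing the data volume in classifying suspected oil types.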
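
The two screening steps in this abstract (a detection-delay window and a wavelength range) amount to cropping a delay-by-wavelength intensity matrix before classification. The sketch below shows that cropping only; the data layout, the function name `crop_time_resolved` and the toy values are assumptions, and applying both crops in one call is an illustrative choice, since the abstract reports the two selections separately.

```python
def crop_time_resolved(spectra, delays_ns, wavelengths_nm, delay_window, wavelength_range):
    """Keep only rows (delays) and columns (wavelengths) inside the given closed ranges.

    spectra[i][j] is the intensity at delays_ns[i] and wavelengths_nm[j].
    """
    rows = [i for i, d in enumerate(delays_ns)
            if delay_window[0] <= d <= delay_window[1]]
    cols = [j for j, w in enumerate(wavelengths_nm)
            if wavelength_range[0] <= w <= wavelength_range[1]]
    return [[spectra[i][j] for j in cols] for i in rows]

# Toy 3x3 measurement; the window and range are the values quoted in the abstract.
toy_spectra = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
toy_delays = [40, 60, 80]                # ns
toy_wavelengths = [350.0, 450.0, 650.0]  # nm
cropped = crop_time_resolved(
    toy_spectra, toy_delays, toy_wavelengths,
    delay_window=(54, 74), wavelength_range=(387.00, 608.87),
)
```

The cropped matrix would then be flattened into the feature vector passed to the classifier, which is also where the data-volume reduction mentioned in the abstract comes from.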

12.
Human activity recognition (HAR) plays a vital role in real-world applications such as tracking activities in elderly care services, assisted living environments, smart home interactions, healthcare monitoring, electronic games and various human–computer interaction (HCI) applications, and it is an essential part of Internet of Healthcare Things (IoHT) services. However, the high dimensionality of the data collected in these applications has the largest influence on the quality of the HAR model. Therefore, in this paper, we propose an efficient HAR system that uses a lightweight feature selection (FS) method to enhance the HAR classification process. The developed FS method, called GBOGWO, aims to improve the performance of the gradient-based optimizer (GBO) algorithm by using the operators of the grey wolf optimizer (GWO). First, GBOGWO is used to select the appropriate features; then a support vector machine (SVM) is used to classify the activities. To assess the performance of GBOGWO, extensive experiments were conducted on the well-known UCI-HAR and WISDM datasets. Overall, the results show that GBOGWO improved the classification accuracy, with an average accuracy of 98%.

13.
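Near-infrared spectra are high-dimensional with few samples, and variable selection is an effective way to improve the robustness and interpretability of quantitative analysis models. Sure independence screening (SIS) is a variable selection method for ultrahigh-dimensional data based on marginal correlation and is widely used for variable selection on gene microarray data. SIS can reduce the data dimension to the scale of the sample size, with a dimension-reduction ability comparable to LASSO; under fairly general conditions, because of its sure screening property, the probability that all important variables are retained tends to 1. Variable selection based on sure independence screening and partial least squares (SIS-SPLS) is an iterative SIS method. It first uses SIS for an initial selection of important spectral variables, then performs stepwise forward selection according to the marginal correlation of those variables: partial least squares regression models are built and the final selection is determined by the Bayesian information criterion (BIC). SIS-SPLS thus screens important variables incrementally by stepwise forward selection; as the number of latent variables increases and the residual of the response variable gradually decreases, the number of variables selected by SIS-SPLS stabilizes. However, because variable importance is evaluated by marginal correlation alone, when the number of spectral variables far exceeds the number of samples the method tends to select too many variables and its selection results are not robust enough. To further improve the robustness of variable selection with small samples, ensemble learning is introduced into SIS-SPLS, giving an ensemble SIS-SPLS variable selection method (Ensemble-SISPLS). The method first bootstrap-resamples the calibration set and applies SIS-SPLS to each resampled calibration subset. The selection results of all calibration subsets are then combined by voting with a frequency threshold: variables whose selection frequency exceeds the given threshold are chosen, a partial least squares regression model is built, and the root mean square error of 5-fold cross-validation is computed. The two key parameters, the frequency threshold and the number of latent variables, are optimized by grid search; sub-models are evaluated jointly on their cross-validation root mean square error and number of variables, and the variables in the best sub-model form the final selection. Variable selection experiments were carried out on the Corn dataset and an Angelica sinensis (当归) dataset to compare the performance of Ensemble-SISPLS, SIS-SPLS and UVE-PLS. The Angelica dataset contains 77 samples collected from Min County and Weiyuan County, Gansu; near-infrared spectra of all samples were scanned with a Nicolet-6700 near-infrared spectrometer and used to predict the ferulic acid content of the samples. On the Corn dataset, the number of selected variables, RMSEP and coefficient of determination were 22, 0.0008 and 0.9998 for Ensemble-SISPLS, and 97, 0.0073 and 0.9988 for SIS-SPLS. On the Angelica dataset they were 24, 0.0181 and 0.9963 for Ensemble-SISPLS, and 38, 0.0226 and 0.9943 for SIS-SPLS. The results show that the method further improves the robustness and predictive ability of variable selection. Ensemble-SISPLS effectively combines the strong variable selection ability of SIS-SPLS with the good generalization of ensemble learning, improving the robustness of variable selection. In addition, because it trades off the predictive ability of the sub-models against their number of variables, it reduces the number of selected variables to some extent and improves model interpretability.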
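
The bootstrap-and-vote idea behind Ensemble-SISPLS can be sketched in a few lines. The sketch is deliberately reduced and is not the published algorithm: each bootstrap subset is screened with a plain top-k marginal-correlation filter (the SIS step only), with no PLS regression, BIC, cross-validation or grid search, and all names, defaults and toy data are hypothetical.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def sis_top_k(X, y, k):
    """Indices of the k variables with the largest |marginal correlation| with y."""
    p = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(p)]
    return sorted(range(p), key=lambda j: scores[j], reverse=True)[:k]

def ensemble_select(X, y, k, n_boot=50, min_freq=0.6, seed=0):
    """Vote over bootstrap resamples; keep variables selected in more than
    `min_freq` of them (the frequency threshold)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    votes = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        for j in sis_top_k(Xb, yb, k):
            votes[j] += 1
    return [j for j in range(p) if votes[j] / n_boot > min_freq]

# Toy data: variable 0 is exactly 2*y, variables 1 and 2 are unrelated to y.
y = [1, 2, 3, 4, 5, 6, 7, 8]
X = [[2 * t, a, b] for t, a, b in zip(y, [3, 1, 4, 1, 5, 9, 2, 6], [2, 7, 1, 8, 2, 8, 1, 8])]
selected = ensemble_select(X, y, k=1)
```

Voting suppresses variables that a single resample happens to favour by chance, which is the robustness argument the abstract makes for small calibration sets.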

14.
Feature extraction and variable selection are two important issues in monitoring and diagnosing a planetary gearbox. The preparation of datasets for final classification and decision making is usually a multi-stage process. We consider data from two gearboxes, one in a healthy and the other in a faulty state. First, the raw vibration data gathered in the time domain were segmented and transformed to the frequency domain using power spectral density. Next, 15 variables denoting amplitudes of the calculated power spectra were extracted; these variables were further examined with respect to their diagnostic ability. We applied a novel hybrid approach: an all-subsets search using multivariate linear regression (MLR), and variable shrinkage by the least absolute shrinkage and selection operator (Lasso), which acts as a non-linear approach. Both methods gave consistent results and yielded subsets with healthy or faulty diagnostic properties.
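
The shrinkage performed by the Lasso is easiest to see in one special case. Assuming an orthonormal design matrix and the objective ½‖y − Xβ‖² + λ‖β‖₁ (assumptions made for this illustration, not details from the abstract), each Lasso coefficient is the least-squares coefficient passed through a soft-threshold, which is a non-linear function. The function names below are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_orthonormal(ols_coefficients, lam):
    """Lasso solution for an orthonormal design: soft-threshold each
    ordinary least-squares coefficient independently."""
    return [soft_threshold(b, lam) for b in ols_coefficients]

# Toy least-squares coefficients; the small one is removed from the model.
shrunk = lasso_orthonormal([3.0, -0.5, -2.0], lam=1.0)
```

Coefficients set exactly to zero drop the corresponding variables, which is how shrinkage doubles as variable selection; for a general design matrix the solution needs an iterative solver rather than this closed form.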

15.
Imbalance ensemble classification is one of the most essential and practical strategies for improving decision performance in data analysis. A growing body of literature on ensemble techniques for imbalance learning has appeared in recent years, and various extensions of imbalanced classification methods have been established from different points of view. The present study reviews the state-of-the-art ensemble classification algorithms for dealing with imbalanced datasets and offers a comprehensive analysis of incorporating the dynamic selection of base classifiers into classification. Running 14 existing ensemble algorithms with dynamic selection on 56 datasets, the experiments reveal that a classical algorithm with a dynamic selection strategy delivers a practical way to improve classification performance on both binary and multi-class imbalanced datasets. In addition, by combining patch learning with dynamic selection ensemble classification, a patch-ensemble classification method is designed, which uses the misclassified samples to train patch classifiers and thereby increases the diversity of the base classifiers. The experimental results indicate that the designed method has some potential for multi-class imbalanced classification.
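
Dynamic selection of base classifiers means choosing, for each test sample, the classifier that looks most competent near that sample. The abstract does not say which selection rule was used, so the sketch below assumes a simple local-accuracy rule on the k nearest validation samples; the function names, the toy classifiers and the data are all hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def predict_with_dynamic_selection(x, classifiers, val_X, val_y, k=3):
    """Predict x with the classifier that is most accurate on the
    k validation samples closest to x."""
    nearest = sorted(range(len(val_X)), key=lambda i: sq_dist(x, val_X[i]))[:k]

    def local_accuracy(clf):
        return sum(clf(val_X[i]) == val_y[i] for i in nearest) / len(nearest)

    best = max(classifiers, key=local_accuracy)
    return best(x)

# Two deliberately one-sided base classifiers, each right in one region only.
def always_zero(_features):
    return 0

def always_one(_features):
    return 1

val_X = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
val_y = [0, 0, 0, 1, 1, 1]
pool = [always_zero, always_one]
low = predict_with_dynamic_selection([0.05], pool, val_X, val_y)
high = predict_with_dynamic_selection([1.15], pool, val_X, val_y)
```

Neither base classifier is useful globally, yet selecting per sample recovers the right label in both regions; on imbalanced data the neighbourhood would typically be drawn from a validation set that preserves minority-class samples.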

16.
To increase classification accuracy, this paper presents a new feature selection method, RFFIM-PCA, based on the random forest feature importance measure (RFFIM) and principal component analysis (PCA), for analyzing the near-infrared (NIR) spectra of tobacco. We applied the method to the qualitative evaluation of cigarettes and compared it with other methods. The results showed that RFFIM-PCA discriminates the high-dimensional data effectively and can be used to identify cigarette quality. The feature selection step filters out noise, while PCA eliminates redundant features and reduces the dimensionality. The experimental results showed that RFFIM-PCA successfully eliminated noise and redundant features in high-dimensional data, leading to a promising improvement in feature selection and classification accuracy.

17.
18.
To improve the prediction accuracy of calibration in noninvasive human blood glucose measurement using near-infrared (NIR) spectroscopy, a modified uninformative variable elimination (mUVE) method combined with kernel partial least squares (KPLS), named mUVE–KPLS, is proposed as an alternative nonlinear modeling strategy. The mUVE method eliminates high-frequency noise and matrix background simultaneously, which provides optimized data for the subsequent calibration. The kernel trick constructs a nonlinear relationship between the response variable and the predictor variables, unlike PLS, which yields a complex model that is inappropriate for describing underlying data structure with significant nonlinear characteristics. Two NIR spectral datasets from basic research experiments (an in vitro experiment on simulated physiological solution samples and an in vivo human noninvasive measurement experiment) are used to evaluate the performance of the proposed method. The results indicate that, after eliminating high-frequency noise and the matrix background arising from the optical absorption of water in the NIR region, high-quality spectral data are available for calibration, and that, with a suitable choice of kernel function and kernel parameter, KPLS with a Gaussian kernel gives the best prediction accuracy compared with Spline-PLS and PLS. These results suggest that mUVE–KPLS is a promising nonlinear calibration strategy with higher prediction accuracy for noninvasive blood glucose measurement using NIR spectroscopy.
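
Kernel PLS replaces the spectra by a matrix of pairwise kernel values, and the Gaussian kernel mentioned in this abstract can be computed as sketched below. The parameterization exp(−‖a − b‖² / (2σ²)) is an assumption (papers differ in how they define the kernel width), and the function name is hypothetical; the PLS step itself is not shown.

```python
import math

def gaussian_kernel_matrix(X, sigma):
    """K[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)) for all sample pairs."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [[math.exp(-sq_dist(a, b) / (2.0 * sigma ** 2)) for b in X] for a in X]

# Two toy samples at Euclidean distance 5; with sigma = 5 the off-diagonal is exp(-0.5).
K = gaussian_kernel_matrix([[0.0, 0.0], [3.0, 4.0]], sigma=5.0)
```

The kernel width controls how quickly similarity decays with spectral distance, which is why the abstract treats the kernel parameter as something to be selected.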

19.
A phenomenology-based virtual metrology (VM) for monitoring SiO2 etching depth was proposed by Park (2015). Called PI-VM, it achieved high prediction accuracy by introducing newly developed plasma information (PI) variables as designated inputs. The PI variables represent the state of the plasma, the sheath, and the target during the process. We investigate how a PI variable can help improve the prediction accuracy of VM and what special role it plays in statistical selection. To focus the investigation, we choose only PIEEDF among the three PI variables. PIEEDF is determined from the ratio of line intensities in optical emission spectroscopy (OES). We apply Pearson's correlation filter (PCF), principal component analysis (PCA), and stepwise variable selection (SVS) as statistical selection methods to variable sets with and without PIEEDF. Multilinear regression is used to model the VM. This study reveals that PIEEDF is a good variable in terms of independence from the other input variables and explanatory power for the output variable. In particular, VM using the SVS method applied to variable sets including PIEEDF achieves the highest accuracy, comparable to Park's PI-VM. This study shows that the PIEEDF variable is particularly useful for monitoring fine variations in semiconductor manufacturing processes, and it also extends the utilization of OES sensor data.

20.
Wang Xin, Xia Guangyuan. 《应用声学》 (Journal of Applied Acoustics), 2023, 42(5): 954-962
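To meet the need for detecting leaks caused by loosened pipeline flange connections, and to address the shortage of data samples and reduce the tedious manual selection of feature indicators, this paper studies leak detection for loosened flange connections of aluminum alloy pipes based on generative adversarial networks (GAN) and convolutional neural networks (CNN). GANs, as a data augmentation tool, have been shown to generate samples similar to real data, while CNNs, as a deep learning method, offer an effective way to extract features from data automatically. First, a test rig for pipeline leak calibration and data acquisition was built, and acoustic emission was used to acquire raw leak signals of different levels. Second, a GAN was used to generate samples to augment the raw data, and statistical features were introduced to assess the quality of the generated data and hence the performance of the generative model. Finally, the generated samples and the raw data were combined into different training sets, and an intelligent classification model based on a CNN was built and applied to pipeline leak detection. The classification results were also compared with SVM, an intelligent classification method for small samples. The experimental results show that the intelligent classification model built on GAN and CNN significantly improves the accuracy of leak detection for loosened pipeline flange connections.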
