Similar literature
1.
2.
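To meet the need for non-destructive, efficient analytical methods for discriminating human from non-human blood, a blood species identification method (RF_AdaBoost) is proposed that combines the Random Forest (RF) and AdaBoost (Adaptive Boosting) algorithms. The method uses RF as the weak classifier of AdaBoost in order to improve the discrimination accuracy and robustness of the model. RF, support vector machine (SVM), extreme learning machine (ELM), kernel extreme learning machine (KELM), stacked autoencoder network (SAE), back-propagation network (BP), principal component analysis-linear discriminant analysis (PCA-LDA), and partial least squares discriminant analysis (PLS-DA) were compared with the RF_AdaBoost model, and performance was evaluated in discrimination experiments using blood Raman spectral training sets of different sizes. The results show that, as the number of training samples increases, the discrimination accuracy of RF_AdaBoost reaches up to 100% and the standard deviation of prediction tends to 0. Compared with the other models, RF_AdaBoost has higher classification accuracy and greater stability, providing a new method for blood species identification.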

3.
Different classification methods (Partial Least Squares Discriminant Analysis, Extended Canonical Variates Analysis and Linear Discriminant Analysis), in combination with variable selection approaches (Forward Selection and Genetic Algorithms), were compared by evaluating their capabilities in the geographical discrimination of wine samples. Sixty-two samples were analysed by means of dynamic headspace gas chromatography mass spectrometry (HS-GC-MS) and the entire chromatographic profile was considered to build the dataset. Since variable selection techniques pose a risk of overfitting when a large number of variables is used, a method for coupling data dimension reduction and variable selection was proposed. This approach compresses windows of the original data by retaining only significant components of local Principal Component Analysis models. The subsequent variable selection is then performed on these locally derived score variables. The results confirmed that the classification models obtained on the reduced data were better than those obtained on the entire chromatographic profile, with the exception of Extended Canonical Variates Analysis, which gave acceptable models in both cases. Copyright © 2008 John Wiley & Sons, Ltd.

4.
The application of supervised pattern recognition methodology is becoming important within chemistry. The aim of the study is to compare the accuracies of classification methods using McNemar’s statistical test. Three qualitative parameters of sugar beet are studied: disease resistance (DR), geographical origin and crop period. Samples are analyzed by near-infrared spectroscopy (NIRS) and by wet chemical analysis (WCA). Firstly, the performances of eight well-known classification methods on NIRS data are compared: Linear Discriminant Analysis (LDA), K-Nearest Neighbors (KNN), Soft Independent Modeling of Class Analogy (SIMCA), Discriminant Partial Least Squares (DPLS), Procrustes Discriminant Analysis (PDA), Classification And Regression Tree (CART), Probabilistic Neural Network (PNN) and Learning Vector Quantization (LVQ) neural network. Across the three data sets, SIMCA, DPLS and PDA have the highest classification accuracies. LDA and KNN are not significantly different. The non-linear neural methods give the least accurate results. The three most accurate methods are linear, non-parametric and based on modeling methods. Secondly, we want to emphasize the power of near-infrared reflectance data for sample discrimination. McNemar’s tests compare the classifications developed with WCA data and with NIRS data. For two of the three data sets, the classification results are significantly improved by the use of NIRS data.
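
The comparison of two classifiers with McNemar's test can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the authors' procedure: the function names `discordant_counts` and `mcnemar` and the example counts are hypothetical, and the sketch assumes the common chi-square form of the test with continuity correction.

```python
import math


def discordant_counts(y_true, pred_a, pred_b):
    """Count samples that exactly one of the two classifiers gets right."""
    triples = list(zip(y_true, pred_a, pred_b))
    only_a = sum(1 for t, a, b in triples if a == t and b != t)
    only_b = sum(1 for t, a, b in triples if a != t and b == t)
    return only_a, only_b


def mcnemar(only_a, only_b):
    """Return (chi-square statistic, p-value) with continuity correction."""
    n = only_a + only_b
    if n == 0:
        return 0.0, 1.0  # the classifiers never disagree on correctness
    stat = (abs(only_a - only_b) - 1) ** 2 / n
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts: classifier A alone is right on 10 samples, B alone on 2.
stat, p_value = mcnemar(10, 2)
print(f"chi2 = {stat:.3f}, p = {p_value:.4f}")
```

Only the discordant samples enter the statistic, which is why two methods with similar overall accuracy (such as LDA and KNN above) can still be tested against each other. When the discordant counts are small, an exact binomial version of the test is usually preferred over the chi-square approximation.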

5.
In the present work, the emission and the absorption spectra of numerous Greek olive oil samples and mixtures of them, obtained by two spectroscopic techniques, namely Laser-Induced Breakdown Spectroscopy (LIBS) and Absorption Spectroscopy, and aided by machine learning algorithms, were employed for the discrimination/classification of olive oils regarding their geographical origin. Both emission and absorption spectra were initially preprocessed by means of Principal Component Analysis (PCA) and were subsequently used for the construction of predictive models, employing Linear Discriminant Analysis (LDA) and Support Vector Machines (SVM). All data analysis methodologies were validated by both “k-fold” cross-validation and external validation methods. In all cases, very high classification accuracies were found, up to 100%. The present results demonstrate the advantages of machine learning implementation for improving the capabilities of these spectroscopic techniques as tools for efficient olive oil quality monitoring and control.

6.
The Mallotus and Phyllanthus genera, both containing several species commonly used as traditional medicines around the world, are the subjects of this discrimination and classification study. The objective of this study was to compare different discrimination and classification techniques to distinguish the two genera (Mallotus and Phyllanthus) on the one hand, and the six species (Mallotus apelta, Mallotus paniculatus, Phyllanthus emblica, Phyllanthus reticulatus, Phyllanthus urinaria L. and Phyllanthus amarus), on the other. Fingerprints of 36 samples from the 6 species were developed using reversed-phase high-performance liquid chromatography with ultraviolet detection (RP-HPLC-UV). After fingerprint data pretreatment, an exploratory data analysis was first performed using Principal Component Analysis (PCA), revealing two outlying samples, which were excluded from the calibration set used to develop the discrimination and classification models. Models were built by means of Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), Classification and Regression Trees (CART) and Soft Independent Modeling of Class Analogy (SIMCA). Application of the models to the total data set (outliers included) confirmed a possible labeling issue for the outliers. LDA, QDA and CART, independently of the pretreatment, or SIMCA after “normalization and column centering (N_CC)” or after “Standard Normal Variate transformation and column centering (SNV_CC)” were found best to discriminate the two genera, while LDA after column centering (CC), N_CC or SNV_CC; QDA after SNV_CC; and SIMCA after N_CC or after SNV_CC best distinguished between the 6 species. As a classification technique, SIMCA after N_CC or after SNV_CC results in the best overall sensitivity and specificity.
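
The SNV_CC pretreatment named above can be sketched as follows. This is a minimal illustration in Python using only the standard library: the function names `snv` and `column_center` and the toy fingerprints are hypothetical, and the sketch assumes the usual definition of the Standard Normal Variate transformation (each fingerprint is centered and scaled by its own mean and sample standard deviation), followed by subtraction of the column means.

```python
import statistics


def snv(rows):
    """Standard Normal Variate: scale each row by its own mean and std."""
    out = []
    for row in rows:
        mean = statistics.mean(row)
        std = statistics.stdev(row)  # sample standard deviation of this row
        out.append([(x - mean) / std for x in row])
    return out


def column_center(rows):
    """Subtract the mean of every column (variable) across all samples."""
    n = len(rows)
    col_means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, col_means)] for row in rows]


# Hypothetical fingerprints: 3 samples x 4 time points.
fingerprints = [
    [1.0, 2.0, 4.0, 3.0],
    [2.0, 4.0, 8.0, 6.0],   # same shape as the first, twice the intensity
    [1.5, 2.5, 3.0, 5.0],
]
snv_cc = column_center(snv(fingerprints))
```

SNV works row by row, so it removes overall intensity differences between samples (the first two toy fingerprints become identical), while column centering works variable by variable across the whole set. The order of the two steps therefore matters.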

7.
8.
9.
10.
This paper deals with the application of a voltammetric electronic tongue (ET) to beer classification. For this purpose, samples were analyzed using cyclic voltammetry without any sample pretreatment other than dilution with distilled water. The voltammetric signals were first preprocessed employing the Fast Fourier Transform (FFT). Then, using the obtained coefficients, responses were evaluated using three different multivariate techniques: Principal Component Analysis (PCA), Partial Least Squares Discriminant Analysis (PLS-DA) and Linear Discriminant Analysis (LDA). In this case, the ET has demonstrated a good capability to correctly discriminate and classify the different beer samples according to their type (Lager, Stout and IPA) and manufacturing process (commercial and craft).

11.
12.
In multivariate regression and classification problems, variable selection is an important procedure used to select an optimal subset of variables with the aim of producing more parsimonious and eventually more predictive models. Variable selection is often necessary when dealing with methodologies that produce thousands of variables, such as Quantitative Structure-Activity Relationships (QSARs) and highly dimensional analytical procedures. In this paper a novel method for variable selection for classification purposes is introduced. This method exploits the recently proposed Canonical Measure of Correlation between two sets of variables (CMC index). The CMC index is in this case calculated for two specific sets of variables, the former being comprised of the independent variables and the latter of the unfolded class matrix. The CMC values, calculated by considering one variable at a time, can be sorted to give a ranking of the variables on the basis of their class discrimination capabilities. Alternatively, the CMC index can be calculated for all the possible combinations of variables and the variable subset with the maximal CMC can be selected, but this procedure is computationally more demanding and the classification performance of the selected subset is not always the best one. The effectiveness of the CMC index in selecting variables with discriminative ability was compared with that of other well-known strategies for variable selection, such as the Wilks’ Lambda, the VIP index based on Partial Least Squares-Discriminant Analysis, and the selection provided by classification trees. A variable Forward Selection based on the CMC index was finally used in conjunction with Linear Discriminant Analysis. This approach was tested on several chemical data sets. The results obtained were encouraging.
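
The one-variable-at-a-time ranking idea can be sketched with Wilks' Lambda, one of the reference strategies named above (the abstract does not define the CMC index, so it is not reproduced here). This is a minimal illustration in Python using only the standard library: the function names and the toy data are hypothetical, and the sketch assumes the univariate form of Wilks' Lambda, the within-class sum of squares divided by the total sum of squares.

```python
def wilks_lambda(values, labels):
    """Univariate Wilks' Lambda: within-class SS / total SS (0 = best)."""
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        return 1.0  # a constant variable carries no class information
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    ss_within = 0.0
    for members in groups.values():
        mean = sum(members) / len(members)
        ss_within += sum((v - mean) ** 2 for v in members)
    return ss_within / ss_total


def rank_variables(X, labels):
    """Return column indices sorted from most to least discriminating."""
    scores = [wilks_lambda(col, labels) for col in zip(*X)]
    order = sorted(range(len(scores)), key=lambda j: scores[j])
    return order, scores


# Hypothetical data: 6 samples, 2 variables, 2 classes.
X = [[1.0, 5.0], [1.2, 3.0], [0.9, 4.0],
     [3.0, 4.5], [3.1, 3.5], [2.9, 4.0]]
labels = ["a", "a", "a", "b", "b", "b"]
order, scores = rank_variables(X, labels)
```

A value near 0 means that almost all of the variance of a variable lies between the classes, and a value near 1 means the class means coincide. Ranking variables individually is cheap but ignores correlations between them, which is the trade-off against the all-combinations search described in the abstract.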

13.
A fast and objective chemometric classification method is developed and applied to the analysis of gas chromatography (GC) data from five commercial gasoline samples. The gasoline samples serve as model mixtures, whereas the focus is on the development and demonstration of the classification method. The method is based on objective retention time alignment (referred to as piecewise alignment) coupled with analysis of variance (ANOVA) feature selection prior to classification by principal component analysis (PCA) using optimal parameters. The degree-of-class-separation is used as a metric to objectively optimize the alignment and feature selection parameters using a suitable training set thereby reducing user subjectivity, as well as to indicate the success of the PCA clustering and classification. The degree-of-class-separation is calculated using Euclidean distances between the PCA scores of a subset of the replicate runs from two of the five fuel types, i.e., the training set. The unaligned training set that was directly submitted to PCA had a low degree-of-class-separation (0.4), and the PCA scores plot for the raw training set combined with the raw test set failed to correctly cluster the five sample types. After submitting the training set to piecewise alignment, the degree-of-class-separation increased (1.2), but when the same alignment parameters were applied to the training set combined with the test set, the scores plot clustering still did not yield five distinct groups. Applying feature selection to the unaligned training set increased the degree-of-class-separation (4.8), but chemical variations were still obscured by retention time variation and when the same feature selection conditions were used for the training set combined with the test set, only one of the five fuels was clustered correctly. However, piecewise alignment coupled with feature selection yielded a reasonably optimal degree-of-class-separation for the training set (9.2), and when the same alignment and ANOVA parameters were applied to the training set combined with the test set, the PCA scores plot correctly classified the gasoline fingerprints into five distinct clusters.
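
A separation metric of this kind can be sketched as follows. The abstract states only that the degree-of-class-separation is computed from Euclidean distances between PCA scores and does not give the formula, so the definition below is an assumption: the distance between the two class centroids divided by the pooled root-mean-square spread of the scores around their own centroids. The function names and the toy scores are hypothetical, and the code is Python using only the standard library.

```python
import math


def centroid(points):
    """Mean position of a set of score vectors."""
    return [sum(coord) / len(points) for coord in zip(*points)]


def rms_spread(points, center):
    """Root-mean-square Euclidean distance of the points from a center."""
    return math.sqrt(sum(math.dist(p, center) ** 2 for p in points) / len(points))


def degree_of_class_separation(class_a, class_b):
    """Assumed definition: centroid distance / pooled within-class spread."""
    ca, cb = centroid(class_a), centroid(class_b)
    between = math.dist(ca, cb)
    pooled = math.sqrt(rms_spread(class_a, ca) ** 2 + rms_spread(class_b, cb) ** 2)
    if pooled == 0:
        return math.inf  # replicates coincide exactly within each class
    return between / pooled


# Hypothetical PCA scores (PC1, PC2) for replicate runs of two fuel types.
fuel_1 = [(0.0, 0.0), (0.0, 2.0)]
fuel_2 = [(4.0, 0.0), (4.0, 2.0)]
print(degree_of_class_separation(fuel_1, fuel_2))
```

Under this assumed definition the value grows when the clusters move apart or tighten, which matches the way the abstract uses the metric to compare raw, aligned, and feature-selected training sets. The numerical values reported in the abstract (0.4, 1.2, 4.8, 9.2) come from the authors' own definition and are not expected to be reproduced by this sketch.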

14.
15.
QSAR modeling based on geostatistics and support vector regression (total citations: 4; self-citations: 0; citations by others: 4)
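Based on principal component analysis (PCA), geostatistics (GS), and support vector regression (SVR), a new individualized prediction method for quantitative structure-activity relationships (QSAR), Weight-PCA-GS-SVR, is proposed. The basic idea is as follows: PCA is first used to reduce dimensionality and remove information redundancy among the independent variables; SVR-based nonlinear screening of principal components then removes the components unrelated to the dependent variable; the retained principal components are used to compute weighted distances between samples; and high-dimensional GS then determines a common range. Each sample to be predicted takes itself as the center and finds, in the training set, its own k nearest neighbors whose weighted distance is less than the common range; an SVR model trained on these neighbors completes the individualized prediction. Weight-PCA-GS-SVR optimizes the model in both the row and column directions, provides a new weighting method for the independent variables, offers a new approach to the problem of choosing the optimal k nearest neighbors, and retains the original advantages of SVR. Validated on three compound-activity data sets, the new method achieved the highest prediction accuracy among all reference models and was clearly better than results reported in the literature. Weight-PCA-GS-SVR has broad application prospects in QSAR and other regression prediction fields.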

16.
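In forensic science practice, the identity of questioned and reference documents often needs to be determined accurately by analyzing the composition of the writing ink in the documents. A method for classifying inkjet printing inks is proposed that combines hyperspectral imaging and spectrophotometry with chemometrics. Hyperspectral data and colorimetric values were collected for four-color inkjet printing inks from 14 printers of different brands and models. After the mean colorimetric values were calculated, PCA dimensionality reduction and K-Means cluster analysis were applied to classify the samples preliminarily. Three classification models, LightGBM, XGBoost, and SVM, were then applied, with the test and training sets split in a 1:4 ratio, to identify the samples in each cluster one by one. The results show that LightGBM and XGBoost both achieve classification accuracies above 95% for the four-color samples, and SVM achieves 100%. The proposed method can distinguish inkjet printing inks of different brands and even models non-destructively, accurately, and rapidly.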

17.
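Establishment of a hypertension classification and diagnosis system based on a radial basis function neural network (total citations: 1; self-citations: 0; citations by others: 1)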
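To study the correlation between hypertension and five trace elements in hair (Ca, Mg, Al, Cu, and Zn) together with the ratio w(Zn)/w(Cu), a radial basis function neural network (RBF-NN) was used, taking advantage of its strong function approximation, pattern recognition, and classification capabilities and its fast learning speed. On the Matlab platform, the raw data were standardized as preprocessing. Using 45 training samples, 8 test samples, and 2 target outputs, an auxiliary diagnostic model for hypertension classification was established, and a control experiment with principal component analysis was carried out. The best network parameters obtained were sc = 0.1 and me = 43, giving a classification accuracy of 96.226%; the RBF neural network outperformed principal component analysis in discriminant classification. RBF-NN is therefore feasible for revealing the correlation between trace elements in hair and hypertension, and provides a new method for the clinical classification and diagnosis of hypertension.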

18.
19.
The multi-elemental composition of three typical Italian Pecorino cheeses, Protected Designation of Origin (PDO) Pecorino Romano (PR), PDO Pecorino Sardo (PS) and Pecorino di Farindola (PF), was determined by Inductively Coupled Plasma Optical Emission Spectrometry (ICP-OES). The ICP-OES method developed here allowed the accurate and precise determination of eight major elements (Ba, Ca, Fe, K, Mg, Na, P, and Zn). The ICP-OES data acquired from 17 PR, 20 PS, and 16 PF samples were processed by unsupervised (Principal Component Analysis, PCA) and supervised (Partial Least Squares-Discriminant Analysis, PLS-DA) multivariate methods. PCA revealed a relatively high variability of the multi-elemental composition within the samples of a given variety, and a fairly good separation of the Pecorino cheeses according to geographical origin. Concerning the supervised classification, PLS-DA yielded excellent results, both in calibration (in cross-validation) and in validation (on the external test set). In fact, the model led to a cross-validated total accuracy of 93.3% and a predictive accuracy of 91.3%, corresponding to 2 (out of 23) misclassified test samples, indicating the adequacy of the model in discriminating Pecorino cheese in accordance with its origin.
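
The reported accuracies can be checked with a short calculation. This is a minimal sketch in Python: the helper `accuracy_percent` is hypothetical, and the size of the calibration set (30 samples, with 2 misclassified in cross-validation) is an inference from 17 + 20 + 16 = 53 samples minus the 23 test samples, not a figure stated in the abstract.

```python
def accuracy_percent(n_samples, n_misclassified):
    """Percentage of correctly classified samples, rounded to one decimal."""
    return round(100.0 * (n_samples - n_misclassified) / n_samples, 1)


n_total = 17 + 20 + 16          # PR + PS + PF samples
n_test = 23                     # external test set, stated in the abstract
n_calibration = n_total - n_test  # assumed: all remaining samples

# Stated in the abstract: 2 of 23 test samples misclassified.
predictive = accuracy_percent(n_test, 2)

# Assumption: 2 of the 30 calibration samples misclassified in cross-validation.
cross_validated = accuracy_percent(n_calibration, 2)

print(predictive, cross_validated)
```

With so few test samples each misclassification changes the accuracy by more than four percentage points, so differences of a few percent between models evaluated on sets of this size should be read with caution.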

20.