Similar literature
 20 similar documents found (search time: 515 ms)
1.
In multivariate regression and classification problems, variable selection is an important procedure used to select an optimal subset of variables with the aim of producing more parsimonious and eventually more predictive models. Variable selection is often necessary when dealing with methodologies that produce thousands of variables, such as Quantitative Structure-Activity Relationships (QSARs) and highly dimensional analytical procedures. In this paper a novel method for variable selection for classification purposes is introduced. This method exploits the recently proposed Canonical Measure of Correlation between two sets of variables (CMC index). The CMC index is in this case calculated for two specific sets of variables, the former comprising the independent variables and the latter the unfolded class matrix. The CMC values, calculated by considering one variable at a time, can be sorted to give a ranking of the variables on the basis of their class discrimination capabilities. Alternatively, the CMC index can be calculated for all possible combinations of variables and the variable subset with the maximal CMC can be selected, but this procedure is computationally more demanding and the classification performance of the selected subset is not always the best. The effectiveness of the CMC index in selecting variables with discriminative ability was compared with that of other well-known strategies for variable selection, such as Wilks’ Lambda, the VIP index based on Partial Least Squares-Discriminant Analysis, and the selection provided by classification trees. A variable Forward Selection based on the CMC index was finally used in conjunction with Linear Discriminant Analysis. This approach was tested on several chemical data sets. The results obtained were encouraging.

2.
《Analytica chimica acta》2004,515(1):117-125
In this work a supervised chemometric approach to the discrimination of Italian honey samples of different floral origins is presented. The analytical data of 73 Italian honey samples from six varieties (chestnut, eucalyptus, heather, sulla, honeydew, and wildflower) were processed by Linear Discriminant Analysis (LDA), using two different variable selection procedures (Fisher F-based and stepwise LDA). Three and two variables, respectively, were necessary to obtain a 100% predictive ability as evaluated by cross-validation. Subsequently, a class modeling approach was followed, using UNEQ. The resulting models showed 100% sensitivity and specificity.
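
The abstract does not say how the Fisher F-based selection was carried out. As a rough illustration only, a univariate ranking can be sketched by computing a one-way ANOVA F statistic for each variable and sorting the variables by it. The function names `f_statistic` and `rank_variables` and the toy data in the test are hypothetical, not taken from the paper.

```python
from statistics import mean


def f_statistic(values, labels):
    """One-way ANOVA F statistic of a single variable across classes.

    Assumes at least two classes and a non-zero within-class variance.
    """
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    grand = mean(values)
    n, k = len(values), len(groups)
    # Between-class mean square: spread of class means around the grand mean.
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values()) / (k - 1)
    # Within-class mean square: spread of samples around their own class mean.
    within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g) / (n - k)
    return between / within


def rank_variables(X, labels):
    """Return column indices of X sorted from most to least discriminating."""
    columns = [list(col) for col in zip(*X)]
    scores = [f_statistic(col, labels) for col in columns]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```

The top-ranked columns would then be passed to the classifier (LDA in the study) and checked by cross-validation; that step is outside this sketch.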

3.
4.
Inductively Coupled Plasma Atomic Emission Spectroscopy measurements of six trace elements were performed on the scalp hair of 155 donors, 73 of whom had been diagnosed with Hepatitis C and 82 of whom were controls. Principal Components Analysis (PCA) was employed to visualise the separation between groups and show the relationship between the elements and the diseased state. Pattern recognition methods for classification involving Quadratic Discriminant Analysis and Partial Least Squares Discriminant Analysis (PLS-DA) were applied to the data. The number of significant components for both PCA and PLS was determined using the bootstrap. The stability of the training set models was determined by repeatedly splitting the data into training and test sets and employing visualisation for two-component models: the percent classification ability (CC), predictive ability (PA) and model stability (MS) were computed for test and training sets.

5.
This paper deals with the application of a voltammetric electronic tongue (ET) to beer classification. For this purpose, samples were analyzed using cyclic voltammetry without any sample pretreatment other than dilution with distilled water. The voltammetric signals were first preprocessed employing the Fast Fourier Transform (FFT). Then, using the obtained coefficients, responses were evaluated using three different clustering techniques: Principal Component Analysis (PCA), Partial Least Squares Discriminant Analysis (PLS‐DA) and Linear Discriminant Analysis (LDA). In this case, the ET demonstrated a good capability to correctly discriminate and classify the different beer samples according to their type (Lager, Stout and IPA) and manufacturing process (commercial and craft).
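
To make the preprocessing step concrete, the sketch below compresses a signal into its first few Fourier coefficient magnitudes. It is an assumption-laden illustration: it uses a direct discrete Fourier transform (which yields the same coefficients as an FFT, only more slowly), and both the use of magnitudes and the number of coefficients kept (`n_keep`) are hypothetical choices that the abstract does not specify.

```python
import cmath


def dft_magnitudes(signal, n_keep):
    """Return the magnitudes of the first n_keep DFT coefficients of a signal.

    Direct O(n * n_keep) evaluation of the DFT definition; an FFT routine
    would give the same coefficients more efficiently.
    """
    n = len(signal)
    magnitudes = []
    for k in range(n_keep):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        magnitudes.append(abs(coeff))
    return magnitudes
```

Each voltammogram would be reduced in this way to a short feature vector before being passed to PCA, PLS‐DA or LDA.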

6.
Dried and ground red pepper is a spice used as seasoning in various traditional dishes all over the world; nevertheless, the pedoclimatic conditions of the diverse cultivation areas give this product different chemical characteristics and, consequently, diverse organoleptic properties. In the present study, the volatile profiles of 96 samples of two different ground bell peppers harvested in diverse Italian geographical areas, Altino (Abruzzo) and Senise (Lucania), and a commercial sweet paprika, were studied by means of headspace solid-phase microextraction (HS-SPME) coupled with gas chromatography-mass spectrometry (GC-MS). The investigation of their volatile profiles led to the identification of 59 analytes. Eventually, a discriminant classifier, Partial Least Squares Discriminant Analysis (PLS-DA), was exploited to discriminate samples according to their geographical origin. The model provided very accurate results in external validation; in fact, it correctly classified all 30 test samples, achieving 100% correct classification (on the validation set). Furthermore, in order to understand which volatiles contribute the most to differentiating the bell peppers from the different origins, a variable selection approach, Variable Importance in Projection (VIP), was used. This strategy led to the selection of sixteen diverse compounds which characterize the different bell pepper spices.

7.
8.
This paper proposes a methodology for cigarette classification employing Near Infrared Reflectance spectrometry and variable selection. For this purpose, the Successive Projections Algorithm (SPA) is employed to choose an appropriate subset of wavenumbers for a Linear Discriminant Analysis (LDA) model. The proposed methodology is applied to a set of 210 cigarettes of four different brands. For comparison, Soft Independent Modelling of Class Analogy (SIMCA) is also employed for full-spectrum classification. The resulting SPA-LDA model successfully classified all test samples with respect to their brands using only two wavenumbers (5058 and 4903 cm−1). In contrast, the SIMCA models were not able to achieve 100% classification accuracy, regardless of the significance level adopted for the F-test. The results obtained in this investigation suggest that the proposed methodology is a promising alternative for assessment of cigarette authenticity.

9.
ASTM clustering for improving coal analysis by near-infrared spectroscopy   Total citations: 1, self-citations: 0, citations by others: 1
Andrés JM  Bona MT 《Talanta》2006,70(4):711-719
Multivariate analysis techniques have been applied to near-infrared (NIR) spectra of coals to investigate the relationship between nine coal properties (moisture (%), ash (%), volatile matter (%), fixed carbon (%), heating value (kcal/kg), carbon (%), hydrogen (%), nitrogen (%) and sulphur (%)) and the corresponding predictor variables. In this work, a whole set of coal samples was grouped into six more homogeneous clusters following the ASTM reference method for classification prior to the application of calibration methods to each coal set. The results obtained showed a considerable improvement in the determination error compared with the calibration for the whole sample set. For some groups, the established calibrations approached the quality required by the ASTM/ISO norms for laboratory analysis. To predict property values for a new coal sample, that sample must first be assigned to its respective group. Thus, the discrimination and classification ability of coal samples by Diffuse Reflectance Infrared Fourier Transform Spectroscopy (DRIFTS) in the NIR range was also studied by applying Soft Independent Modelling of Class Analogy (SIMCA) and Linear Discriminant Analysis (LDA) techniques. Modelling of the groups by SIMCA led to overlapping models that cannot discriminate for unique classification. On the other hand, the application of Linear Discriminant Analysis improved the classification of the samples but not enough to be satisfactory for every group considered.

10.
The Mallotus and Phyllanthus genera, both containing several species commonly used as traditional medicines around the world, are the subjects of this discrimination and classification study. The objective of this study was to compare different discrimination and classification techniques to distinguish the two genera (Mallotus and Phyllanthus) on the one hand, and the six species (Mallotus apelta, Mallotus paniculatus, Phyllanthus emblica, Phyllanthus reticulatus, Phyllanthus urinaria L. and Phyllanthus amarus), on the other. Fingerprints of 36 samples from the 6 species were developed using reversed-phase high-performance liquid chromatography with ultraviolet detection (RP-HPLC-UV). After fingerprint data pretreatment, first an exploratory data analysis was performed using Principal Component Analysis (PCA), revealing two outlying samples, which were excluded from the calibration set used to develop the discrimination and classification models. Models were built by means of Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), Classification and Regression Trees (CART) and Soft Independent Modeling of Class Analogy (SIMCA). Application of the models to the total data set (outliers included) confirmed a possible labeling issue for the outliers. LDA, QDA and CART, independently of the pretreatment, or SIMCA after “normalization and column centering (N_CC)” or after “Standard Normal Variate transformation and column centering (SNV_CC)” were found best to discriminate the two genera, while LDA after column centering (CC), N_CC or SNV_CC; QDA after SNV_CC; and SIMCA after N_CC or after SNV_CC best distinguished between the 6 species. As a classification technique, SIMCA after N_CC or after SNV_CC resulted in the best overall sensitivity and specificity.
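
The SNV_CC pretreatment named above can be sketched as two small steps: Standard Normal Variate scaling of each fingerprint (row), followed by centering of each variable (column). This is a minimal sketch assuming the usual SNV definition with the sample standard deviation; the function names `snv` and `column_center` are hypothetical, and the "normalization" used in N_CC is not defined in the abstract, so it is not shown.

```python
from statistics import mean, stdev


def snv(rows):
    """Standard Normal Variate: scale each row to zero mean and unit standard deviation.

    Assumes every row has at least two points and is not constant.
    """
    transformed = []
    for row in rows:
        m, s = mean(row), stdev(row)
        transformed.append([(x - m) / s for x in row])
    return transformed


def column_center(rows):
    """Subtract from each column its mean over all rows."""
    col_means = [mean(col) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, col_means)] for row in rows]


def snv_cc(rows):
    """SNV followed by column centering."""
    return column_center(snv(rows))
```

Because SNV works row by row, two fingerprints that differ only by a multiplicative factor become identical after the transformation, which is why it is commonly used to remove intensity differences between samples.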

11.
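To meet the need for non-destructive, efficient analytical methods for discriminating human from non-human blood, a blood species identification method (RF_AdaBoost) combining the Random Forest (RF) and AdaBoost (Adaptive Boosting) algorithms is proposed. The method uses RF as the weak classifier of AdaBoost in order to improve discrimination accuracy and strengthen model robustness. RF, support vector machine (SVM), extreme learning machine (ELM), kernel extreme learning machine (KELM), stacked autoencoder network (SAE), back-propagation network (BP), principal component analysis-linear discriminant analysis (PCA-LDA) and partial least squares discriminant analysis (PLS-DA) were compared with the RF_AdaBoost model, and their performance was evaluated in discrimination experiments using blood Raman spectral training sets of different sizes. The results show that, as the number of training samples increases, the discrimination accuracy of RF_AdaBoost reaches up to 100% and the standard deviation of prediction tends to 0. Compared with the other models, RF_AdaBoost has higher classification accuracy and better stability, providing a new method for blood species identification.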

12.
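Pattern recognition classification study of the relationship between trace elements in human hair and sex   Total citations: 6, self-citations: 1, citations by others: 5
By subjecting the content data for 22 elements in human hair samples to variable expansion followed by compression and screening, the variables with a more significant influence on sex determination were selected. These variables were processed by the PLS method, giving a two-dimensional discriminant plot with a clear separation between males and females, together with a prediction model. Using the established prediction model and the trace element contents of human hair, a person's sex was determined with an accuracy of 81%.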

13.
Taking into consideration the global analysis of complex samples proposed by the metabolomic approach, the chromatographic fingerprint provides an attractive chemical characterization of herbal medicines. Thus, it can be used as a tool in quality control analysis of phytomedicines. The generated multivariate data are better evaluated by chemometric analyses, and they can be modeled by classification methods. “Stone breaker” is a popular Brazilian plant of the Phyllanthus genus, used worldwide to treat renal calculus, hepatitis, and many other diseases. In this study, gradient elution under reversed-phase conditions with detection in the ultraviolet region was used to obtain chemical profiles (fingerprints) of botanically identified samples of six Phyllanthus species. The chromatograms obtained at 275 nm were organized in data matrices, and the time shifts of peaks were adjusted using the Correlation Optimized Warping algorithm. Principal Component Analyses were performed to evaluate similarities among cultivated and uncultivated samples and the discrimination among the species; after that, the samples were used to build three classification models using Soft Independent Modeling of Class Analogy, K-Nearest Neighbor, and Partial Least Squares for Discriminant Analysis. The ability of the classification models was discussed after their successful application to the authenticity evaluation of 25 commercial samples of “stone breaker.”

14.
《Analytical letters》2012,45(13):1810-1823
Chromatographic profiles of Rhizoma et Radix Notopterygii (RRN, “Qianghuo” in Chinese), a complex traditional Chinese medicine (TCM), were collected by high-performance liquid chromatography with diode array detection (HPLC-DAD) at 330 nm. These data profiles were used as fingerprints to investigate quality control classification modeling of the RRN samples. In contrast to the classical methods for discrimination of TCMs, that is, just using common HPLC peaks, all chromatographic profile data were pre-processed by the correlation optimized warping method and polynomial functions; then, these data were submitted as fingerprints (variables) for classification on the basis of sample origin. The chemometrics methods used for calibration modeling and subsequent sample classification were least squares support vector machine (LS-SVM), artificial neural network (ANN), and partial least squares discriminant analysis (PLS-DA); all produced satisfactory calibrations as well as classification results.

15.
The aim of the present study was to characterize and classify olive oils from Western Greece according to cultivar and geographical origin, based on volatile compound composition, by means of Linear Discriminant Analysis. A total of 51 olive oil samples were collected during the harvesting period 2007-2008 from six regions of Western Greece and from six local cultivars. Forty-five of the samples were characterized as extra virgin olive oils. The analysis of volatile compounds was performed by Headspace Solid Phase Microextraction-Gas Chromatography/Mass Spectrometry (HS-SPME-GC/MS). Fifty-three (53) different volatile compounds were tentatively identified and semi-quantified. Using selected volatile compound composition data (selection was based on the application of ANOVA to total volatiles to determine those variables showing substantial differences among samples of different geographical origin/cultivar), the olive oil samples were satisfactorily classified according to geographical origin (87.2%) and cultivar (74%).

16.
17.
The nearest shrunken centroid (NSC) Classifier is successfully applied for class prediction in a wide range of studies based on microarray data. The contribution from seemingly irrelevant variables to the classifier is minimized by the so‐called soft‐thresholding property of the approach. In this paper, we first show that for the two‐class prediction problem, the NSC Classifier is similar to a one‐component discriminant partial least squares (PLS) model with soft‐shrinkage of the loading weights. Then we introduce the soft‐threshold‐PLS (ST‐PLS) as a general discriminant‐PLS model with soft‐thresholding of the loading weights of multiple latent components. This method is especially suited for classification and variable selection when the number of variables is large compared to the number of samples, which is typical for gene expression data. A characteristic feature of ST‐PLS is the ability to identify important variables in multiple directions in the variable space. Both the ST‐PLS and the NSC classifiers are applied to four real data sets. The results indicate that ST‐PLS performs better than the shrunken centroid approach if there are several directions in the variable space which are important for classification, and there are strong dependencies between subsets of variables. Copyright © 2007 John Wiley & Sons, Ltd.
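
The soft‐thresholding operation mentioned above has a standard form: each weight is shrunk toward zero by a fixed amount, and weights smaller in magnitude than that amount become exactly zero. The sketch below shows only this generic operator; how ST‐PLS chooses the threshold and rescales the loading weights is not described in the abstract, so the names `soft_threshold`, `selected_variables` and the parameter `delta` are illustrative assumptions.

```python
def soft_threshold(weights, delta):
    """Apply sign(w) * max(|w| - delta, 0) to every weight."""
    shrunk = []
    for w in weights:
        magnitude = abs(w) - delta
        if magnitude <= 0:
            # Small weights are removed entirely: this is the selection effect.
            shrunk.append(0.0)
        else:
            shrunk.append(magnitude if w > 0 else -magnitude)
    return shrunk


def selected_variables(weights, delta):
    """Indices of the variables whose weight survives the thresholding."""
    return [i for i, w in enumerate(soft_threshold(weights, delta)) if w != 0.0]
```

Applied to the loading weights of each latent component in turn, this leaves a different sparse set of variables per component, which is consistent with the abstract's remark about identifying important variables in multiple directions.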

18.
A new procedure with high ability to enhance prediction of multivariate calibration models with a small number of interpretable variables is presented. The core of this methodology is to sort the variables according to an informative vector, followed by a systematic investigation of PLS regression models with the aim of finding the most relevant set of variables by comparing the cross‐validation parameters of the models obtained. In this work, seven main informative vectors, i.e. the regression vector, correlation vector, residual vector, variable influence on projection (VIP), net analyte signal (NAS), covariance procedures vector (CovProc) and signal‐to‐noise ratios vector (StN), and their combinations were automated and tested with the main purpose of feature selection. Six data sets from different sources were employed to validate this methodology. They originated from: near‐infrared (NIR) spectroscopy, Raman spectroscopy, gas chromatography (GC), fluorescence spectroscopy, quantitative structure‐activity relationships (QSAR) and computer simulation. The results indicate that all vectors and their combinations were able to enhance prediction capability with respect to the full data sets. However, the regression and NAS informative vectors from partial least squares (PLS) regression, both built using more latent variables than when building the model, were the best informative vectors for variable selection in most of the tested data sets. In all the applications, the selected variables were quite effective and useful for interpretation. Copyright © 2008 John Wiley & Sons, Ltd.

19.
It is insufficient to evaluate a subsampling variable selection method using only its prediction performance. To further assess the reliability of subsampling variable selection methods, dummy noise variables of different amplitudes were added to the original spectral data, and the number of falsely selected variables was recorded. The reliabilities of three subsampling variable selection methods, Monte Carlo uninformative variable elimination (MC‐UVE), competitive adaptive reweighted sampling (CARS), and stability CARS (SCARS), were evaluated using this dummy noise strategy. The evaluation results indicated that both CARS and SCARS produced more parsimonious variable sets, but the reliabilities of their final variable sets were weaker than those of MC‐UVE. In contrast, only marginal improvement in prediction performance was obtained using MC‐UVE. Further experiments showed that removing white noise‐like variables beforehand would improve the reliability of variables extracted by CARS and SCARS. Copyright © 2014 John Wiley & Sons, Ltd.
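
The dummy noise strategy can be sketched as follows: append artificial noise columns to the data matrix, run any variable selection method on the augmented matrix, and count how many of the selected indices point at the artificial columns. This is only a sketch under assumptions: Gaussian noise, a fixed random seed, and the names `augment_with_noise` and `count_false_selections` are hypothetical, and the selection method itself (MC‐UVE, CARS or SCARS in the study) is not implemented here.

```python
import random


def augment_with_noise(X, n_dummy, amplitude, seed=0):
    """Append n_dummy Gaussian noise columns of the given amplitude to each row of X."""
    rng = random.Random(seed)
    return [
        list(row) + [amplitude * rng.gauss(0.0, 1.0) for _ in range(n_dummy)]
        for row in X
    ]


def count_false_selections(selected, n_original):
    """Count selected column indices that refer to appended dummy columns."""
    return sum(1 for j in selected if j >= n_original)
```

A reliable selector should return a count close to zero, since the appended columns carry no information about the response by construction.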

20.
An electronic nose and a UV-Vis spectrophotometer, in combination with multivariate analysis, have been used to verify the geographical origin of extra virgin olive oils. Forty-six oil samples from three different areas of Liguria were included in this analysis. Initially, the data obtained from the two instruments were analysed separately. Then, the potential of the synergy between these two technologies for testing food authenticity and quality was investigated. Application of Linear Discriminant Analysis, after feature selection, was sufficient to differentiate the three geographical denominations of Liguria (“Riviera dei Fiori”, “Riviera del Ponente Savonese” and “Riviera di Levante”), obtaining 100% success in classification and close to 100% in prediction. The models built using SIMCA as a class-modelling tool were not as effective, but confirmed that the results improve when different analytical techniques are used in synergy. This paper shows that objective instrumental data related to two important organoleptic features, oil colour and aroma, supply complementary information.
