Similar Articles
20 similar articles found
1.
A rapid Raman spectroscopy protocol is reported to classify gasoline according to its distributor and to identify and quantify common adulterants. Gasoline from three distributors was collected from 19 stations in São Paulo, Brazil. Principal component analysis (PCA) showed specific clusters for each distributor, and partial least squares discriminant analysis (PLS-DA) correctly identified the origin of the samples. To evaluate the technique for the identification and quantification of adulterants, authentic samples from each distributor were fortified with ethanol, methanol, toluene, and turpentine at levels from 2.5 up to 25.0% (v/v), giving 120 altered samples. PCA showed clear separation among the adulterated samples, and PLS-DA identified the adulterants precisely (478 of 480 cross-validation predictions correct), irrespective of distributor and concentration. A single classification model was used for all distributors. To quantify the adulterants, 36 multivariate calibration models were constructed using partial least squares (PLS), interval PLS, and PLS combined with a genetic algorithm, for each distributor and each adulterant. Cross-validation errors below 5.0% were obtained for all adulterants regardless of the distributor. Raman spectroscopy combined with multivariate analysis was shown to be a powerful tool for the rapid and inexpensive characterization of gasoline origin and for the identification and quantification of common adulterants.
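The sample and model counts above follow from a full factorial layout, which a few lines of Python make explicit. This is only an arithmetic check: it assumes ten evenly spaced fortification levels in steps of 2.5% (v/v), which the abstract implies but does not state, and the distributor names are placeholders.

```python
from itertools import product

# Placeholder names: the abstract does not name the three distributors.
distributors = ["distributor_1", "distributor_2", "distributor_3"]
adulterants = ["ethanol", "methanol", "toluene", "turpentine"]
methods = ["PLS", "interval PLS", "PLS + genetic algorithm"]

# Assumption: ten evenly spaced fortification levels, 2.5 to 25.0 % (v/v).
levels = [2.5 * k for k in range(1, 11)]

# One altered sample per distributor x adulterant x level.
altered_samples = list(product(distributors, adulterants, levels))
# One calibration model per method x distributor x adulterant.
calibration_models = list(product(methods, distributors, adulterants))

print(len(altered_samples), len(calibration_models))  # 120 36
```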

2.
Li-Juan Tang, Hai-Long Wu. Talanta, 2009, 79(2): 260-1694
One problem with discriminant analysis of microarray data is that each sample is represented by a large number of genes, many of which may be irrelevant, insignificant or redundant. Methods of variable selection are therefore of great importance in microarray data analysis. To circumvent this problem, a new gene mining approach is proposed, based on the similarity, on each gene, between the probability density function of the class of interest and that of the other classes. This method identifies significant genes that are informative for discriminating each individual class, rather than genes that maximize the separability of all classes together, so that genes carrying important information about particular disease subtypes can be selected. Based on the significant genes mined for each class, a support vector machine with a local kernel transform is constructed to classify the different diseases. The combination of the gene mining approach with the support vector machine is demonstrated for cancer classification on two public data sets. The results show that significant genes are identified for each cancer, and that the classification model performs satisfactorily in both training and prediction for both data sets.
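The per-class gene ranking can be sketched as follows. The abstract does not say which similarity measure between densities is used, so this sketch assumes each gene's values are normally distributed within the class of interest and within the pooled remaining classes, and scores their overlap with the Bhattacharyya coefficient; the function names and toy data are hypothetical. Genes whose two densities overlap least are ranked as most informative for that class.

```python
import math
from statistics import mean, pvariance

def gaussian_overlap(a, b, eps=1e-12):
    """Bhattacharyya coefficient between normal densities fitted to samples a and b
    (an assumed similarity measure). 1.0 = identical densities, near 0 = no overlap."""
    m1, m2 = mean(a), mean(b)
    v1, v2 = pvariance(a) + eps, pvariance(b) + eps
    return math.sqrt(2 * math.sqrt(v1 * v2) / (v1 + v2)) * math.exp(-((m1 - m2) ** 2) / (4 * (v1 + v2)))

def rank_genes_for_class(X, y, target):
    """Order gene indices from most to least informative for `target` versus all other classes."""
    scores = []
    for g in range(len(X[0])):
        in_class = [row[g] for row, label in zip(X, y) if label == target]
        others = [row[g] for row, label in zip(X, y) if label != target]
        scores.append((gaussian_overlap(in_class, others), g))
    return [g for _, g in sorted(scores)]  # smallest overlap first

# Toy data (made up): gene 0 separates class "A" from "B"; genes 1 and 2 do not.
X = [[5.0, 1.0, 0.2], [5.2, 1.1, 0.1], [4.9, 0.9, 0.3],
     [1.0, 1.0, 0.2], [1.1, 0.9, 0.1], [0.9, 1.1, 0.3]]
y = ["A", "A", "A", "B", "B", "B"]
print(rank_genes_for_class(X, y, "A"))
```

Because each class gets its own ranking, a gene can rank high for one subtype and low for another, which is the point of mining genes per class rather than for all classes at once.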

3.
Four cultivars of olives picked in the Moroccan region of Beni Mellal were subjected to a characterization and classification study. Analytical data were collected by Fourier transform infrared spectroscopy (FTIR) applied to the mesocarp of the fresh olives without any preliminary treatment. The spectra were pre-treated by Savitzky-Golay derivative computation to reduce noise and increase the analytical information. Partial least squares discriminant analysis (PLS-DA) was performed on the measurement data to assess the discriminant features of the four cultivars. The PLS model was optimized by applying Martens' uncertainty test, which selected the vibrational frequencies giving the most useful information. The optimized model was able to separate the four classes and to assign new objects to the appropriate classes with a correct prediction rate of 97%. The proposed method is a novel, rapid, inexpensive and reliable procedure for classifying olives of different varieties.
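A Savitzky-Golay first derivative fits a low-order polynomial in a sliding window and takes the derivative of that fit at the window centre. The minimal sketch below assumes a 5-point window and a quadratic polynomial (the study's settings are not given); for a symmetric window with a polynomial of order 1 or 2, the derivative at the centre reduces to sum(k * y[i+k]) / sum(k**2). Edge points are simply left empty.

```python
def savgol_first_derivative(y, half_window=2, dx=1.0):
    """First derivative from a local quadratic (Savitzky-Golay) fit over
    2*half_window+1 points. By symmetry of the window, the fitted slope at the
    centre is sum(k * y[i+k]) / sum(k**2); edge points are returned as None."""
    ks = range(-half_window, half_window + 1)
    norm = sum(k * k for k in ks) * dx
    out = [None] * len(y)
    for i in range(half_window, len(y) - half_window):
        out[i] = sum(k * y[i + k] for k in ks) / norm
    return out

# Usage: the derivative of x**2 is 2x, so the value at x = 5 should be 10.
d = savgol_first_derivative([x * x for x in range(10)])
print(d[5])  # 10.0
```

In practice a library routine such as SciPy's `scipy.signal.savgol_filter` with `deriv=1` handles arbitrary window lengths, polynomial orders and edge modes.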

4.
Unbiased evaluation of classification and calibration methods is important, especially as these methods are applied to increasingly complex, under-determined data sets. Precision bounds, such as confidence intervals, are required for interpreting any experimental result. Using bootstrapped Latin partitions to evaluate classification and calibration models, bounds on the average predictions were obtained. These bounds characterize sources of variation attributed to building the model and to the composition of the training set with respect to the test set. Furthermore, precision bounds on the average model-variable loadings allow the significance of characteristic features to be estimated. The procedure for bootstrapped Latin partitions is given and demonstrated with synthetic data sets, for classification using linear discriminant analysis and fuzzy rule-building expert systems, and for calibration using partial least squares regression with one and three properties. All analyses were implemented on a personal computer, with the longest evaluation requiring 6 h of processing time. Analysis of variance and matched-sample t-tests were also used to demonstrate the statistical power of these tests.
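A bootstrapped Latin partition, as described here, can be sketched in three parts: a stratified split in which every object is tested exactly once and each class is spread evenly over the partitions, repetition of that split with fresh random assignments, and an interval on the repeated results. This is a sketch under assumptions: "bootstrap" is read as re-randomising the Latin partition, `evaluate` is a hypothetical user-supplied callback (for example, one returning test-set accuracy), and the percentile interval is a simple stand-in for the precision bounds described in the abstract.

```python
import random
from collections import defaultdict

def latin_partition(labels, k, rng):
    """Split sample indices into k test partitions: every sample is tested exactly once
    and each class is spread as evenly as possible over the partitions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    partitions = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        for j, idx in enumerate(members):
            # Continue the round-robin where the previous class stopped to keep sizes balanced.
            partitions[(offset + j) % k].append(idx)
        offset += len(members)
    return partitions

def bootstrapped_latin_partitions(labels, k, n_repeats, evaluate, seed=0):
    """Repeat the Latin partition with fresh randomisation; return one averaged score per repeat.
    `evaluate(train_idx, test_idx)` is a hypothetical user-supplied callback."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_repeats):
        scores = []
        for test in latin_partition(labels, k, rng):
            held_out = set(test)
            train = [i for i in range(len(labels)) if i not in held_out]
            scores.append(evaluate(train, test))
        results.append(sum(scores) / k)
    return results

def percentile_interval(values, alpha=0.05):
    """Simple empirical (1 - alpha) interval over the repeated results (an assumed bound)."""
    s = sorted(values)
    return s[int(alpha / 2 * (len(s) - 1))], s[int((1 - alpha / 2) * (len(s) - 1))]

labels = ["a"] * 6 + ["b"] * 4
print(latin_partition(labels, 2, random.Random(1)))
```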

5.
This work describes multi-class classification based on binary probabilistic discriminant partial least squares (p-DPLS) models, developed with the one-against-one strategy and the winner-takes-all principle. The multi-class problem is split into binary classification problems, each handled by a p-DPLS model, and the results of these models are combined to obtain the final classification. The classification criterion uses the specific characteristics of an object (its position in the multivariate space and its prediction uncertainty) to estimate the reliability of the classification, so that the object is assigned to the class with the highest reliability. The new methodology is tested on the well-known Iris data set and on a data set of Italian olive oils. Compared with CART and SIMCA, the proposed method has better average classification performance and also provides a statistic that evaluates the reliability of the classification. For the olive oil set, the average percentage of correct classification for the training set was close to 84% with p-DPLS, against 75% with CART and 100% with SIMCA, while for the test set it was close to 94% with p-DPLS, against 50% with CART and 62% with SIMCA.
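The one-against-one decomposition splits a K-class problem into K(K-1)/2 binary problems (three for the three Iris species) and then combines their outputs. The sketch below shows only the combination step. It assumes each binary model returns a probability for one class of its pair, and it scores each class by the mean of its pairwise probabilities before taking the winner; this averaging rule is an assumption and not necessarily the reliability statistic used with p-DPLS. The centroid values in the usage lines are made up.

```python
import math
from itertools import combinations

def one_vs_one_predict(x, classes, pairwise_prob):
    """Combine one-against-one binary models by winner-takes-all.
    pairwise_prob(a, b, x) returns the probability that x belongs to a rather than b."""
    support = {c: [] for c in classes}
    for a, b in combinations(classes, 2):
        p = pairwise_prob(a, b, x)
        support[a].append(p)
        support[b].append(1.0 - p)
    # Assumed reliability: mean pairwise probability per class.
    reliability = {c: sum(v) / len(v) for c, v in support.items()}
    winner = max(reliability, key=reliability.get)
    return winner, reliability

# Hypothetical binary models: logistic score on distance to made-up 1-D class centroids.
centroids = {"setosa": 1.5, "versicolor": 4.3, "virginica": 5.6}

def toy_pairwise_prob(a, b, x):
    margin = abs(x - centroids[b]) - abs(x - centroids[a])
    return 1.0 / (1.0 + math.exp(-margin))

print(one_vs_one_predict(1.4, list(centroids), toy_pairwise_prob))
```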

6.
A large suite of natural carbonate, fluorite and silicate geological materials was studied using laser-induced breakdown spectroscopy (LIBS). Both single- and double-pulse LIBS spectra were acquired using close-contact benchtop and standoff (25 m) LIBS systems. Principal components analysis (PCA) and partial least squares discriminant analysis (PLS-DA) were used to identify the distinguishing characteristics of the geological samples and to classify the materials. Excellent discrimination was achieved for all sample types using PLS-DA, and several techniques for improving sample classification were identified. The laboratory double-pulse LIBS system offered no advantage for sample classification over the single-pulse system, except for the soil samples. The standoff LIBS system provided results comparable to those of the laboratory systems. This work also demonstrates how PCA can be used to identify spectral differences between similar sample types based on minor impurities.

7.
Gastrodia elata from different geographical origins varies in quality and pharmacological activity. This study focused on the classification and identification of Gastrodia elata from six producing areas using high-performance liquid chromatography (HPLC) fingerprints combined with boosting partial least-squares discriminant analysis (boosting PLS-DA). Before recognition analysis, principal component analysis was applied to assess whether the HPLC fingerprints could discriminate the samples. Boosting PLS-DA and conventional PLS-DA were then applied. The experimental results indicated that the adaptive iteratively reweighted penalized least-squares algorithm effectively eliminated the baseline drift of the HPLC chromatograms. Compared with conventional PLS-DA, HPLC fingerprints combined with boosting PLS-DA improved the total recognition rates for the calibration and prediction sets from 94% to 100% and from 86% to 97%, respectively. In conclusion, HPLC combined with boosting PLS-DA, which is effective, specific, accurate and non-polluting, is advantageous for discriminating traditional Chinese medicines from different geographical origins, and the proposed methodology is a useful tool for classifying and identifying Gastrodia elata from different geographical origins.

8.
Automotive fuel adulteration is an old and significant problem. One common type of fuel adulteration is the addition of diesel to gasoline. Unsupervised models were developed using hierarchical cluster analysis and principal component analysis. Supervised models based on partial least squares discriminant analysis, with ¹H nuclear magnetic resonance spectra as input, were used to classify samples as adulterated or unadulterated. Quantitative models were developed using partial least squares to determine the gasoline and diesel concentrations in the samples. The sample set contained blends of pure gasoline and anhydrous ethanol reproducing commercial gasoline, and other samples to which diesel had been added. Hierarchical cluster and principal component analysis did not distinguish between adulterated and unadulterated samples, except for the most heavily adulterated materials. However, partial least squares discriminant analysis classified 100% of the samples correctly. The partial least squares algorithm provided excellent regression models for the gasoline and diesel content. The coefficient of determination was 0.9920 for both models, whereas the root mean square errors of cross-validation and of prediction were 2.32% and 1.42%, respectively, for the diesel model, and 2.40% and 1.38% for the gasoline model.
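The figures of merit quoted here can be computed as below. RMSE is the same formula in both cases; it is called RMSECV when the predictions come from cross-validation and RMSEP when they come from an independent prediction set. The sketch assumes the coefficient of determination is defined as 1 - SS_res/SS_tot (some chemometrics software instead reports the squared correlation between reference and predicted values), and the numbers in the usage lines are made up.

```python
import math

def rmse(y_ref, y_pred):
    """Root mean square error between reference and predicted values
    (RMSECV for cross-validated predictions, RMSEP for a prediction set)."""
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(y_ref, y_pred)) / len(y_ref))

def r_squared(y_ref, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_ref = sum(y_ref) / len(y_ref)
    ss_res = sum((r - p) ** 2 for r, p in zip(y_ref, y_pred))
    ss_tot = sum((r - mean_ref) ** 2 for r in y_ref)
    return 1.0 - ss_res / ss_tot

# Made-up diesel contents (% v/v): reference values and cross-validated predictions.
reference = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 31.0, 39.0]
print(round(rmse(reference, predicted), 4), round(r_squared(reference, predicted), 4))  # 1.5811 0.98
```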

9.
10.
Gene expression data are characterized by thousands or even tens of thousands of measured genes on only a few tissue samples. This can lead to overfitting and the curse of dimensionality, or even to complete failure of the analysis of microarray data. Gene selection is therefore an important component of gene expression-based tumor classification systems. In this paper, we develop a hybrid particle swarm optimization (PSO) and tabu search (HPSOTS) approach to gene selection for tumor classification. Incorporating tabu search (TS) as a local improvement procedure enables HPSOTS to escape local optima and perform satisfactorily. The proposed approach is applied to three different microarray data sets, and its performance is compared with that of stepwise selection and of the pure TS and PSO algorithms. The results demonstrate that HPSOTS is a useful tool for gene selection and for mining high-dimensional data.
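The tabu-search component of such a hybrid can be sketched on its own. The version below works on a binary mask over genes and, at each step, takes the best single-bit flip that is not tabu, even if it lowers the fitness; recently flipped positions stay tabu for `tenure` iterations unless flipping them would beat the best mask found so far (an aspiration rule). The PSO part is not shown, and the toy fitness (agreement with a hypothetical target mask) merely stands in for a classifier's cross-validated accuracy.

```python
def tabu_search(mask, fitness, n_iter=50, tenure=3):
    """Tabu search over a binary gene-selection mask (1 = gene kept)."""
    current = list(mask)
    best, best_fit = current[:], fitness(current)
    tabu_until = {}  # bit position -> last iteration at which flipping it is forbidden
    for it in range(n_iter):
        moves = []
        for pos in range(len(current)):
            neighbour = current[:]
            neighbour[pos] = 1 - neighbour[pos]
            fit = fitness(neighbour)
            is_tabu = tabu_until.get(pos, -1) >= it
            # Aspiration: a tabu move is still allowed if it beats the best mask found so far.
            if not is_tabu or fit > best_fit:
                moves.append((fit, -pos, neighbour))
        if not moves:
            continue
        # Take the best admissible move even when it is worse than the current mask;
        # this is what lets tabu search walk out of local optima. Ties go to the lowest position.
        fit, neg_pos, current = max(moves, key=lambda m: (m[0], m[1]))
        tabu_until[-neg_pos] = it + tenure
        if fit > best_fit:
            best, best_fit = current[:], fit
    return best, best_fit

# Hypothetical target: pretend genes 0, 2, 3 and 6 are the informative ones.
target = [1, 0, 1, 1, 0, 0, 1, 0]

def toy_fitness(mask):
    return -sum(m != t for m, t in zip(mask, target))

best, score = tabu_search([0] * len(target), toy_fitness, n_iter=20)
print(best, score)  # [1, 0, 1, 1, 0, 0, 1, 0] 0
```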

11.
Headspace solid-phase microextraction (HS-SPME) coupled with gas chromatography (GC) and multivariate data analysis was applied to classify different vinegar types (white and red wine, balsamic, sherry and cider vinegars) on the basis of their volatile composition. The chromatographic signals were analysed with stepwise linear discriminant analysis (SLDA), which performs feature selection and classification simultaneously. Several options, more or less restrictive depending on the final number of categories considered, were explored to identify the one that afforded the highest discrimination ability. The simplicity and effectiveness of the proposed classification methodology (all samples were correctly classified and predicted by cross-validation) are promising and support the feasibility of a similar strategy for evaluating the quality and origin of vinegar samples reliably, quickly, reproducibly and cost-efficiently in routine applications. These high-quality results are even more remarkable considering the small number of discriminant variables finally selected by the stepwise procedure. Only 14 peaks were needed to differentiate cider, balsamic, sherry and wine vinegars, whereas only 3 variables were selected to discriminate between red wine (RW) and white wine (WW) vinegars. The subsequent identification by gas chromatography-mass spectrometry (GC-MS) of the volatile compounds associated with the discriminant peaks selected during classification served to interpret their chemical significance.

12.
13.
This article discusses problems of validating classification models, especially for datasets in which the sample size is small and the number of variables is large. It describes the use of the percentage correctly classified (%CC) as an indicator of the success of a classification model. For small datasets, %CC should not be used uncritically, and its interpretation depends on sample size. The article illustrates the use of a common classification method, discriminant partial least squares (D-PLS), on a randomly generated dataset of 200 samples and 200 variables.

One aim of the classifier is to determine whether the null hypothesis (that there is no distinction between the two classes) can be rejected. Autoprediction gives an 84.5% CC. It is shown that, if variable selection is used, it must be performed on the training set alone to obtain a CC close to 50% on the test set, as expected for random data; otherwise, over-optimistic and false conclusions can be reached about the ability to classify samples into groups.

Finally, two purposes of assessing model quality are frequently confused, namely optimisation (often used to determine the most appropriate number of components in a model) and independent validation; to overcome this, the data should be split into three groups.

Model building is often difficult if validation and optimisation have been done on different groups of samples, especially with iterative methods, because each group may be modelled using different properties, such as a different number of components or different variables.
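The central warning of this abstract, that variable selection must not see the test samples, can be reproduced with a small simulation loosely modelled on its 200 × 200 random data set. The sketch swaps in a nearest-centroid classifier and a class-mean-difference ranking for D-PLS (assumptions, not the article's method), and compares test-set accuracy when the variables are chosen using all samples with accuracy when they are chosen from the training half alone. Because the labels are random, any accuracy clearly above 50% is spurious.

```python
import random

def simulate(n_samples=200, n_vars=200, n_selected=10, seed=0):
    """Return (leaky, honest) test-set accuracies on pure-noise data with random labels."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_vars)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]  # two balanced classes ...
    rng.shuffle(y)                          # ... assigned at random, so there is nothing to learn
    half = n_samples // 2
    train, test = list(range(half)), list(range(half, n_samples))

    def select(rows):
        """Indices of the n_selected variables with the largest class-mean difference in `rows`."""
        scores = []
        for j in range(n_vars):
            ones = [X[i][j] for i in rows if y[i] == 1]
            zeros = [X[i][j] for i in rows if y[i] == 0]
            scores.append((abs(sum(ones) / len(ones) - sum(zeros) / len(zeros)), j))
        return [j for _, j in sorted(scores, reverse=True)[:n_selected]]

    def test_accuracy(variables):
        """Nearest-centroid classifier fitted on the training half, scored on the test half."""
        centroids = {}
        for c in (0, 1):
            rows = [i for i in train if y[i] == c]
            centroids[c] = [sum(X[i][j] for i in rows) / len(rows) for j in variables]
        correct = 0
        for i in test:
            dist = {c: sum((X[i][j] - m) ** 2 for j, m in zip(variables, centroids[c])) for c in (0, 1)}
            correct += min(dist, key=dist.get) == y[i]
        return correct / len(test)

    leaky = test_accuracy(select(train + test))  # selection has seen the test labels
    honest = test_accuracy(select(train))        # selection uses the training half only
    return leaky, honest

print(simulate())  # (leaky, honest) test-set accuracies
```

With the default settings the leaky estimate usually lands noticeably above 50%, while the honest estimate scatters around 50%; the exact values depend on the random seed.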


14.
Laser-induced breakdown spectroscopy has been used to obtain spectral fingerprints from live bacterial specimens from thirteen distinct taxonomic bacterial classes representative of five bacterial genera. By taking sums, ratios, and complex ratios of measured atomic emission line intensities, three unique sets of independent variables (models) were constructed to determine which choice of independent variables provided optimal genus-level classification of unknown specimens using a discriminant function analysis. A model composed of 80 independent variables, constructed from simple and complex ratios of the measured emission line intensities, was found to provide the greatest sensitivity and specificity. This model was then used in a partial least squares discriminant analysis to compare the performance of this multivariate technique with that of discriminant function analysis. Partial least squares discriminant analysis had both a higher true-positive rate and a higher false-positive rate, and was more effective at distinguishing between highly similar spectra from closely related bacterial genera. This suggests that it may be the preferred multivariate technique for future species-level or strain-level classifications.
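The trade-off reported here, where PLS-DA has both more true positives and more false positives, is easiest to see when both rates are computed per genus, treating every other genus as negative. A minimal sketch with made-up genus labels; specificity is 1 minus the false-positive rate.

```python
def rates(y_true, y_pred, positive):
    """True-positive rate (sensitivity) and false-positive rate (1 - specificity)
    for one class, treating every other class as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    tpr = tp / (tp + fn) if tp + fn else float("nan")
    fpr = fp / (fp + tn) if fp + tn else float("nan")
    return tpr, fpr

# Made-up genus labels for eight specimens and a classifier's predictions.
y_true = ["G1", "G1", "G1", "G1", "G2", "G2", "G3", "G3"]
y_pred = ["G1", "G1", "G1", "G2", "G1", "G2", "G3", "G1"]
print(rates(y_true, y_pred, "G1"))  # (0.75, 0.5)
```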

15.
Serum and urine samples from patients with type 2 diabetes mellitus and control samples were analyzed by UPLC-TOF-MS; fast and slow separation gradients were compared using both positive and negative ionization modes. The resulting data were analyzed using partial least squares discriminant analysis (PLS-DA), and models were developed to differentiate between patient and control samples. The models were evaluated on external test sets to assess their predictive ability. Under both fast and slow gradient conditions, the PLS-DA models generated from serum samples were more robust than those generated from urine samples, and the positive ionization mode produced better differentiation and higher classification rates than the negative ionization mode. In addition, fast gradient conditions were found to differentiate the samples as well as slow gradient conditions.

16.
This work presents the possibility of devising a simple, flexible and accurate non-linear classification method by extending the locally weighted partial least squares (LW-PLS) approach to cases where the algorithm is used in a discriminant way (partial least squares discriminant analysis, PLS-DA). In particular, to assess which category an unknown sample belongs to, the proposed algorithm identifies the training objects most similar to the one to be predicted and builds a PLS-DA model using only these calibration samples. Moreover, the influence of the selected training samples on the local model can be further modulated by a non-uniform, distance-based weighting scheme that gives the farthest calibration objects less impact than the closest ones.
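The local modelling step can be sketched for the two-class case. This is a minimal sketch under assumptions the abstract does not fix: Euclidean distance for similarity, a Gaussian kernel for the non-uniform weights, a single PLS latent variable, classes coded 0/1 with a 0.5 decision threshold, and made-up toy data. For each unknown sample it selects the k nearest calibration objects, weights them by distance, fits a weighted one-component PLS model on the class indicator, and predicts.

```python
import math

def lw_plsda_predict(x_new, X, y, k=5):
    """Locally weighted PLS-DA for two classes coded 0/1, with ONE latent variable (sketch)."""
    # 1. The k calibration objects closest to x_new (assumed Euclidean distance).
    nearest = sorted((math.dist(x_new, xi), i) for i, xi in enumerate(X))[:k]
    d_max = nearest[-1][0] or 1.0
    idx = [i for _, i in nearest]
    # 2. Non-uniform weights (assumed Gaussian kernel): farther neighbours count less.
    w = [math.exp(-(d / d_max) ** 2) for d, _ in nearest]
    sw = sum(w)
    p = len(X[0])
    # 3. Weighted centring of the local X block and of the 0/1 class indicator.
    x_mean = [sum(wi * X[i][j] for wi, i in zip(w, idx)) / sw for j in range(p)]
    y_mean = sum(wi * y[i] for wi, i in zip(w, idx)) / sw
    Xc = [[X[i][j] - x_mean[j] for j in range(p)] for i in idx]
    yc = [y[i] - y_mean for i in idx]
    # 4. One-component weighted PLS1: weight vector proportional to Xc^T W yc.
    v = [sum(wi * row[j] * yi for wi, row, yi in zip(w, Xc, yc)) for j in range(p)]
    norm = math.sqrt(sum(a * a for a in v))
    if norm == 0.0:  # all neighbours belong to one class
        y_hat = y_mean
    else:
        v = [a / norm for a in v]
        t = [sum(r * a for r, a in zip(row, v)) for row in Xc]  # local scores
        b = sum(wi * ti * yi for wi, ti, yi in zip(w, t, yc)) / sum(wi * ti * ti for wi, ti in zip(w, t))
        t_new = sum((xn - m) * a for xn, m, a in zip(x_new, x_mean, v))
        y_hat = y_mean + b * t_new
    return (1 if y_hat > 0.5 else 0), y_hat

# Made-up calibration set: class 0 near (0, 0), class 1 near (5, 5).
X = [[0.0, 0.0], [0.5, 0.2], [0.1, 0.6], [0.4, 0.4],
     [5.0, 5.0], [5.3, 4.8], [4.7, 5.2], [5.1, 5.4]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
print(lw_plsda_predict([4.9, 5.1], X, y, k=6))
```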

17.
Classification problems have received considerable attention in biological and medical applications. In particular, classification methods combined with microarray technology play an important role in diagnosing and predicting diseases, such as cancer, in medical research. The primary objective in classification is to build an optimal classifier from the training sample in order to predict the unknown classes in the test sample. In this paper, we propose a unified approach for optimal gene classification in conjunction with functional principal component analysis (FPCA) in a functional data analysis (FNDA) framework, to classify time-course gene expression profiles on the basis of information from their patterns. To derive an optimal classifier in FNDA, we also propose using cross-validation to find the optimal number of basis functions in the smoothing step and of functional principal components in FPCA, and we compare the performance of some popular classification techniques in the proposed setting. We illustrate the proposed method with a simulation study and a real-world data analysis.

18.
19.
Using a series of thirteen organic materials that includes novel high-nitrogen energetic materials, conventional organic military explosives, and benign organic materials, we have demonstrated the importance of variable selection for maximizing residue discrimination with partial least squares discriminant analysis (PLS-DA). We built several PLS-DA models using different variable sets based on laser-induced breakdown spectroscopy (LIBS) spectra of the organic residues on an aluminum substrate under an argon atmosphere. The model classification results for each sample are presented, and the influence of the variables on these results is discussed. We found that using the whole spectrum as the data input for the PLS-DA model gave the best results. However, variables due to the surrounding atmosphere and the substrate contribute to the discrimination when the whole spectrum is used, indicating that this may not be the most robust model. Further iterative testing with additional validation data sets is needed to determine the most robust model.

20.
The ability of multivariate analysis methods, such as hierarchical cluster analysis, principal component analysis and partial least squares discriminant analysis (PLS-DA), to classify olive oils by olive fruit variety from their triacylglycerol profiles has been investigated. The variations in the raw chromatographic data sets of 56 olive oil samples, acquired by high-temperature gas chromatography with (ion trap) mass spectrometry detection, were studied. The samples belonged to four categories (“extra-virgin olive oil”, “virgin olive oil”, “olive oil” and “olive-pomace oil”); the “extra-virgin” category comprised six well-identified varieties (“hojiblanca”, “manzanilla”, “picual”, “cornicabra”, “arbequina” and “frantoio”) as well as some blends of unidentified varieties. Moreover, chemometric pre-processing methods (to linearise the response of the variables), such as peak shifting, baseline correction (weighted least squares) and mean centering, improved the model and the grouping of the different olive oil varieties. The first three principal components accounted for 79.50% of the information in the original data. The fitted PLS-DA model succeeded in classifying the samples, and correct classification rates were assessed by cross-validation.
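Mean centering and the "information" captured by the leading principal components can be illustrated with a small power-iteration PCA. The sketch assumes "information" means the share of total variance, and uses power iteration with deflation on the covariance matrix purely to stay dependency-free; for real chromatographic data with many variables, an SVD routine such as `numpy.linalg.svd` would be the practical choice. The toy data are made up.

```python
import math
import random

def mean_center(X):
    """Subtract each column's mean (the mean-centering pre-processing step)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]

def explained_variance(X, n_components=3, n_iter=500, seed=0):
    """Fraction of total variance captured by each leading principal component,
    found by power iteration with deflation on the covariance matrix."""
    Xc = mean_center(X)
    n, p = len(Xc), len(Xc[0])
    C = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(p)] for i in range(p)]
    total = sum(C[i][i] for i in range(p))
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_components):
        v = [rng.random() + 0.1 for _ in range(p)]
        for _ in range(n_iter):
            w = [sum(C[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        eigval = sum(v[i] * sum(C[i][j] * v[j] for j in range(p)) for i in range(p))
        ratios.append(eigval / total)
        # Deflate: remove the component just found before looking for the next one.
        C = [[C[i][j] - eigval * v[i] * v[j] for j in range(p)] for i in range(p)]
    return ratios

# Made-up data with uncorrelated columns of variance 8/3 and 2/3.
X = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
ratios = explained_variance(X, n_components=2)
print([round(r, 3) for r in ratios])  # [0.8, 0.2]
```

The cumulative share of the first three components, `sum(ratios[:3])`, corresponds to the 79.50% figure quoted in the abstract.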


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417