Similar Articles
1.
Building on the fundamental components of PLS‐DA, Fisher's canonical discriminant analysis (FCDA) and powered PLS (PPLS), we develop the concept of powered PLS for classification problems (PPLS‐DA). Using a sequence of data‐reducing linear transformations (consistent with the computation of ordinary PLS‐DA components), PPLS‐DA computes each component from the transformed data by maximizing a parameterized Rayleigh quotient associated with FCDA. Models found by the powered PLS methodology can help reveal the relevance of particular predictors and often require fewer and simpler components than their ordinary PLS counterparts. By imposing restrictions on the powers available for optimization, we obtain an explorative approach to predictive modeling that is not available with traditional PLS methods. Copyright © 2008 John Wiley & Sons, Ltd.
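Fisher's criterion is the Rayleigh quotient J(w) = w'Bw / w'Ww, where B and W are the between-class and within-class scatter matrices. PPLS-DA maximizes a parameterized form of this quotient. The NumPy sketch below shows only the ordinary FCDA step, with no powers and no PLS-style deflation. It solves the generalized eigenproblem Bw = λWw. The helper names and the simulated data are hypothetical.

```python
import numpy as np

def scatter_matrices(X, y):
    """Between-class (B) and within-class (W) scatter matrices."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    grand_mean = X.mean(axis=0)
    p = X.shape[1]
    B = np.zeros((p, p))
    W = np.zeros((p, p))
    for label in np.unique(y):
        Xc = X[y == label]
        class_mean = Xc.mean(axis=0)
        diff = class_mean - grand_mean
        B += len(Xc) * np.outer(diff, diff)
        centered = Xc - class_mean
        W += centered.T @ centered
    return B, W

def rayleigh_quotient(w, B, W):
    """Fisher criterion J(w) = (w' B w) / (w' W w)."""
    return float(w @ B @ w) / float(w @ W @ w)

def fisher_directions(X, y):
    """Solve B w = lambda W w; columns are sorted by decreasing lambda."""
    B, W = scatter_matrices(X, y)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(W, B))
    order = np.argsort(eigvals.real)[::-1]
    return eigvals.real[order], eigvecs.real[:, order]

# Toy two-class data (simulated, not from the paper)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(50, 2)),
               rng.normal([3.0, 0.5], 1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)
vals, vecs = fisher_directions(X, y)
B, W = scatter_matrices(X, y)
print(vals[0], rayleigh_quotient(vecs[:, 0], B, W))  # the two numbers agree up to rounding
```

The leading eigenvalue is the maximum of J(w). With two classes, B has rank one, so only one direction carries discriminant information.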

2.
Metabolomics datasets generated by modern analytical instruments are increasingly complex. In this study, a recent method, shrunken centroids regularized discriminant analysis (SCRDA), is introduced and applied to the exploration of metabolomics datasets. It is a supervised method for variable selection, discriminant analysis and biomarker screening. By regularizing the estimate of the within‐class covariance matrix, SCRDA avoids the singularity problem of linear discriminant analysis; a shrinkage estimator is then applied to perform variable selection. The method is illustrated on simulated datasets and three complex metabolomics datasets. The commonly used orthogonal partial least squares discriminant analysis and two other similar statistical methods, penalized linear discriminant analysis and nearest shrunken centroids, are used for comparison. The results show that SCRDA has desirable capabilities in variable selection, classification and prediction. Moreover, the biomarkers identified by SCRDA are consistent with findings from biochemical research. These results indicate that SCRDA is a promising strategy for metabolomics. Copyright © 2014 John Wiley & Sons, Ltd.
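The method has two ingredients: a regularized within-class covariance, and a shrinkage (soft-thresholding) step that zeroes uninformative variables. Both can be sketched in NumPy as follows. This is a simplified illustration in the spirit of SCRDA, not the authors' implementation. The following are all assumptions: the blend alpha*S + (1 - alpha)*I, centring on the overall mean, the linear discriminant score and every function name. The data are simulated.

```python
import numpy as np

def soft_threshold(v, delta):
    """sign(v) * max(|v| - delta, 0), elementwise."""
    return np.sign(v) * np.maximum(np.abs(v) - delta, 0.0)

def fit_scrda_like(X, y, alpha=0.5, delta=0.5):
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    center = X.mean(axis=0)          # assumption: shrink toward the overall centroid
    Xc = X - center
    classes = np.unique(y)
    n, p = Xc.shape
    means = np.array([Xc[y == c].mean(axis=0) for c in classes])
    # Pooled within-class covariance
    S = sum((Xc[y == c] - m).T @ (Xc[y == c] - m)
            for c, m in zip(classes, means)) / (n - len(classes))
    # Blending with the identity keeps the matrix invertible even when p > n
    S_reg = alpha * S + (1.0 - alpha) * np.eye(p)
    shrunk = soft_threshold(np.linalg.solve(S_reg, means.T).T, delta)
    priors = np.array([np.mean(y == c) for c in classes])
    return {"center": center, "classes": classes, "means": means,
            "shrunk": shrunk, "priors": priors,
            # a variable survives if it is non-zero for at least one class
            "selected": np.flatnonzero(np.any(shrunk != 0, axis=0))}

def predict_scrda_like(model, X):
    Xc = np.asarray(X, dtype=float) - model["center"]
    # d_k(x) = x' s_k - 0.5 * m_k' s_k + log(pi_k), s_k = shrunken vector of class k
    scores = (Xc @ model["shrunk"].T
              - 0.5 * np.sum(model["means"] * model["shrunk"], axis=1)
              + np.log(model["priors"]))
    return model["classes"][np.argmax(scores, axis=1)]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.array([0] * 50 + [1] * 50)
X[y == 1, 0] += 4.0                  # only variable 0 carries class information
model = fit_scrda_like(X, y, alpha=0.5, delta=1.0)
accuracy = np.mean(predict_scrda_like(model, X) == y)
print(model["selected"], accuracy)
```

Raising delta removes more variables. Lowering alpha pulls the covariance estimate further toward the identity.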

3.
In this paper, fault detection and identification methods based on semi‐supervised Laplacian regularization kernel partial least squares (LRKPLS) are proposed. Within the Laplacian regularization learning framework, both unlabeled and labeled samples are used to improve the estimate of the data manifold, so that a more robust data model can be established. We show that LRKPLS can avoid the over‐fitting that may be caused by insufficient samples and the presence of outliers. Moreover, the proposed LRKPLS approach places no special restriction on the data distribution; in other words, it can be used with nonlinear or non‐Gaussian data. Fault detection and identification methods are then built on LRKPLS. These methods are applied to monitor a numerical example and a Hot Galvanizing Pickling Waste Liquor Treatment Process (HGPWLTP), and the case studies show the effectiveness of the proposed approaches. Copyright © 2016 John Wiley & Sons, Ltd.
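Laplacian regularization is commonly built from a neighbourhood graph over all samples, labelled and unlabelled. Its penalty is f'Lf with L = D - W, which stays small when a fitted function f varies smoothly over the graph. The abstract does not say how the graph is constructed. The k-nearest-neighbour heat-kernel graph, the parameter values and the function names below are assumptions for illustration. The kernel PLS part of LRKPLS is not shown.

```python
import numpy as np

def knn_heat_kernel_graph(X, k=3, t=1.0):
    """Symmetric kNN graph with heat-kernel weights exp(-||xi - xj||^2 / t)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.zeros((n, n))
    for i in range(n):
        # position 0 of the sort is the point itself (assumes no duplicate points)
        nbrs = np.argsort(d2[i])[1:k + 1]
        W[i, nbrs] = np.exp(-d2[i, nbrs] / t)
    return np.maximum(W, W.T)

def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W."""
    return np.diag(W.sum(axis=1)) - W

def smoothness_penalty(f, L):
    """f' L f, which equals 0.5 * sum_ij W_ij (f_i - f_j)^2."""
    f = np.asarray(f, dtype=float)
    return float(f @ L @ f)

# Two well-separated clusters; the graph uses every sample, labelled or not
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, size=(10, 2)),
               rng.normal(5.0, 0.3, size=(10, 2))])
W = knn_heat_kernel_graph(X, k=3)
L = graph_laplacian(W)
smooth = np.array([1.0] * 10 + [-1.0] * 10)   # constant within each cluster
rough = np.tile([1.0, -1.0], 10)              # alternates inside the clusters
print(smoothness_penalty(smooth, L), smoothness_penalty(rough, L))
```

Adding this penalty to a regression loss favours models whose outputs follow the cluster structure that the unlabelled samples reveal.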

4.
Analytical Letters, 2012, 45(8): 920-932
Different artificial neural network (ANN) models [multi-layer perceptrons (MLPs) and radial basis function (RBF) networks] were developed and evaluated for discriminating olive oils produced in four Greek regions according to their geographical origin. For this purpose, ninety-seven samples were analyzed for 10 rare earth elements (REE) by ICP-MS. Two other supervised techniques, discriminant analysis (DA) and classification trees (CTs), were applied to the same data set for data pre-treatment and for comparison. Two approaches were used for model training and evaluation. The first was the classical random selection of samples for the learning set. The second, a novel approach, used the two linear discriminant functions (LDFs) of the preceding DA to choose the most representative learning set. The new ANN classifiers gave very satisfactory results. Over-fitting was avoided, and the prediction ability was 73%, as evaluated on an independent test set. The results are encouraging regarding the efficiency of ANNs even for demanding databases such as the one considered here.

[Supplementary materials are available for this article. Go to the publisher's online edition of Analytical Letters for the following free supplemental resources: Additional figures and tables.]

5.
Standard classification algorithms are generally designed to maximize the number of correct predictions (concordance). Maximizing concordance may not be appropriate in certain applications. Some applications emphasize high sensitivity (e.g., clinical diagnostic tests), while others emphasize high specificity (e.g., epidemiological screening studies). This paper considers the effects of the decision threshold on sensitivity, specificity, and concordance for four classification methods: logistic regression, classification tree, Fisher's linear discriminant analysis, and weighted k-nearest neighbor. We investigated whether adjusting the decision threshold can improve either the sensitivity or the specificity of a classifier under specific conditions. A Monte Carlo simulation showed that as the decision threshold increases, sensitivity decreases and specificity increases. Concordance values within an interval around the maximum concordance, however, are similar. For specified sensitivity and specificity levels, an optimal decision threshold that meets the requirement might be found within this interval around the maximum concordance. Three example data sets were analyzed for illustration.
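The threshold trade-off described above can be reproduced on the scores of any classifier. The plain-Python sketch below computes sensitivity, specificity and concordance at a given threshold. It then picks, among candidates that meet minimum sensitivity and/or specificity levels, the one with the highest concordance. The function names, toy scores and candidate thresholds are hypothetical. The code assumes both classes are present, so no rate divides by zero.

```python
def rates_at_threshold(scores, labels, threshold):
    """(sensitivity, specificity, concordance) when score >= threshold means positive."""
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(labels)

def best_threshold(scores, labels, candidates, min_sensitivity=0.0, min_specificity=0.0):
    """Highest-concordance candidate meeting both minimum levels (None if none do)."""
    best = None
    for t in candidates:
        sens, spec, conc = rates_at_threshold(scores, labels, t)
        if sens >= min_sensitivity and spec >= min_specificity:
            if best is None or conc > best[1]:
                best = (t, conc)
    return None if best is None else best[0]

scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.65, 0.8, 0.9]   # toy classifier outputs
labels = [0, 0, 1, 0, 1, 0, 1, 1]                    # 1 = positive class
print(rates_at_threshold(scores, labels, 0.5))       # (0.75, 0.75, 0.75)
candidates = [0.05, 0.2, 0.32, 0.5, 0.62, 0.7, 0.85, 0.95]
print(best_threshold(scores, labels, candidates, min_sensitivity=1.0))  # 0.32
```

Raising the threshold can only move samples from "positive" to "negative". Sensitivity therefore never increases and specificity never decreases as the threshold grows.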

6.
Osteoarthritis (OA) is an insidious joint disease that gradually leads to cartilage loss and morphological impairment of other joint tissues. Early diagnosis and timely therapeutic intervention are therefore important. Although a few diagnostic techniques are used in clinics, these methods have various drawbacks. Infrared spectroscopy has emerged as an important analytical technique with wide applications in many areas, including clinical diagnosis. Research has shown that OA is associated with biochemical changes that are presumed to be reflected in serum or joint fluid. OA may therefore be detectable if serum or joint fluid is measured by infrared spectroscopy and appropriate data analysis methods are used to extract diagnostic information from the spectra. In this work, five discrimination and classification methods were used to build OA diagnostic models based on mid‐infrared spectra of serum and joint fluid:
[1] principal component analysis coupled with linear discriminant analysis;
[2] principal component analysis coupled with multiple logistic regression;
[3] partial least squares discriminant analysis;
[4] regularized linear discriminant analysis;
[5] support vector machine.
Useful diagnostic models were developed, indicating that infrared spectroscopy coupled with multivariate data analysis is a very promising, simple and accurate approach to OA diagnosis. The results also showed that the models built with the five methods differed, as did their predictive performance. The choice of data analysis method should therefore be considered carefully during model development.

7.
Desorption electrospray ionization mass spectrometry (DESI‐MS) and easy ambient sonic‐spray ionization mass spectrometry (EASI‐MS) are employed here in the forensic analysis of chemical compounds found in condoms and their traces, and their analytical performances are compared. Statistical analysis of the mass spectral data alone was applied to obtain classification rules for distinguishing ten types of condoms. Two supervised chemometric techniques were applied to absolute and relative intensity values to test the predictive capacity of the statistical models: linear discriminant analysis (LDA) and soft independent modeling of class analogy (SIMCA). Sample classification was excellent, with high prediction percentages for both DESI and EASI mass spectrometry analyses. This confirms both as potential ambient ionization techniques for forensic analyses in cases of sexual assault. EASI‐MS showed 99% prediction ability for LDA using relative data and 100% prediction ability for SIMCA using both absolute and relative data, while DESI showed 94% prediction ability for both LDA and SIMCA. Because no sample preparation is needed, the sample is preserved and contamination is reduced, allowing successive analyses to be performed on the same sample by other techniques. Copyright © 2015 John Wiley & Sons, Ltd.
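SIMCA builds a separate principal component model for each class and assigns a sample according to how well each class model reconstructs it. The sketch below is a simplified version under three stated assumptions:

- It uses only the orthogonal (residual) distance, scaled by each class's typical residual.
- It always assigns the single closest class. Full SIMCA uses statistical limits and can reject a sample or accept several classes.
- It takes "relative intensity" to mean base-peak normalization to 100.

The spectra are simulated and the names are hypothetical.

```python
import numpy as np

def to_relative(X):
    """Base-peak normalization: scale each spectrum so its most intense peak is 100."""
    X = np.asarray(X, dtype=float)
    return 100.0 * X / X.max(axis=1, keepdims=True)

def fit_class_model(Xc, n_components):
    """PCA model of one class: mean, loadings, typical orthogonal distance."""
    mean = Xc.mean(axis=0)
    centered = Xc - mean
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    P = Vt[:n_components].T
    resid = centered - centered @ P @ P.T
    s0 = np.sqrt(np.mean(np.sum(resid ** 2, axis=1)))
    return mean, P, max(s0, 1e-12)   # guard against a zero residual scale

def orthogonal_distance(x, mean, P):
    d = x - mean
    r = d - P @ (P.T @ d)            # part of x not explained by the class PCs
    return float(np.sqrt(r @ r))

def simca_assign(models, x):
    """Class whose model leaves the smallest scaled residual."""
    scaled = {c: orthogonal_distance(x, m, P) / s0 for c, (m, P, s0) in models.items()}
    return min(scaled, key=scaled.get)

rng = np.random.default_rng(1)
profiles = {"A": np.array([1.0, 0.5, 0.2, 0.0, 0.0, 0.0]),
            "B": np.array([0.0, 0.0, 0.2, 0.5, 1.0, 0.0])}

def simulate(profile, n):
    # Hypothetical spectra: random overall intensity times a class profile, plus noise
    return rng.uniform(1.0, 3.0, size=(n, 1)) * profile + rng.normal(0.0, 0.05, size=(n, 6))

models = {c: fit_class_model(to_relative(simulate(p, 30)), n_components=1)
          for c, p in profiles.items()}
test_X = to_relative(np.vstack([simulate(profiles["A"], 10), simulate(profiles["B"], 10)]))
predicted = [simca_assign(models, x) for x in test_X]
print(predicted)
```

Normalizing to relative intensities removes the random overall intensity factor. That factor would otherwise inflate the within-class variation that each PCA model has to absorb.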

8.
Direct‐injection mass spectrometry (DIMS) techniques have evolved into powerful methods for analysing volatile organic compounds (VOCs) without the need for chromatographic separation. Combined with chemometrics, they have been used in many domains to solve sample categorization problems based on volatilome determination. In this paper, different DIMS methods that have largely outperformed conventional electronic noses (e‐noses) in classification tasks are briefly reviewed, with an emphasis on food‐related applications. Particular attention is paid to proton transfer reaction mass spectrometry (PTR‐MS), and many results obtained with the powerful PTR‐time‐of‐flight‐MS (PTR‐ToF‐MS) instrument are reviewed. Data analysis and feature selection issues are also summarized and discussed. As a case study, a challenging classification problem is presented: dark chocolates that had previously been assigned to four distinct categories by sensory evaluation. The VOC profiles of 206 chocolate samples classified into the four sensory categories were analysed by PTR‐ToF‐MS. A supervised multivariate data analysis based on partial least squares regression‐discriminant analysis yielded a classification model with excellent prediction capability: 97% of a test set of 62 samples were assigned to the correct sensory category. Tentative identification of ions aided the characterisation of the chocolate classes. Variable selection with dedicated methods pinpointed several volatile compounds important for discriminating the chocolates. Among these methods, CovSel was applied to PTR‐MS data for the first time, selecting 10 features that allowed good prediction. Finally, challenges and future needs in the field are discussed.
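CovSel, as usually described, selects variables greedily. At each step it picks the column of X with the largest summed squared covariance with the dummy-coded class matrix Y. It then projects both X and Y onto the orthogonal complement of that column before the next pick. A NumPy sketch under that description follows. The function names are hypothetical, as are the simulated data, in which only columns 3 and 7 carry class information.

```python
import numpy as np

def covsel(X, Y, n_select):
    """Greedy covariance-based variable selection with deflation of X and Y."""
    Xd = np.asarray(X, dtype=float)
    Yd = np.asarray(Y, dtype=float)
    if Yd.ndim == 1:
        Yd = Yd[:, None]
    Xd = Xd - Xd.mean(axis=0)
    Yd = Yd - Yd.mean(axis=0)
    selected = []
    for _ in range(n_select):
        scores = np.sum((Xd.T @ Yd) ** 2, axis=1)   # summed squared covariance with all Y columns
        if selected:
            scores[selected] = -np.inf               # never pick a variable twice
        j = int(np.argmax(scores))
        selected.append(j)
        x = Xd[:, j].copy()
        denom = x @ x
        # Project X and Y onto the orthogonal complement of the selected column
        Xd = Xd - np.outer(x, x @ Xd) / denom
        Yd = Yd - np.outer(x, x @ Yd) / denom
    return selected

def dummy_matrix(labels):
    """One column per class, 1.0 where the sample belongs to that class."""
    classes = sorted(set(labels))
    return np.array([[1.0 if lab == c else 0.0 for c in classes] for lab in labels])

rng = np.random.default_rng(0)
labels = [k for k in range(4) for _ in range(25)]    # four hypothetical classes
X = rng.normal(0.0, 1.0, size=(100, 20))
X[:, 3] = 2.0 * np.array(labels) + rng.normal(0.0, 0.5, size=100)
X[:, 7] = np.array([3.0, 0.0, 0.0, 3.0])[labels] + rng.normal(0.0, 0.5, size=100)
print(covsel(X, dummy_matrix(labels), 2))           # expected to pick columns 3 and 7
```

The deflation step keeps the next pick from simply repeating information already captured by earlier variables.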

9.
In chemometrics, the supervised and unsupervised classification of high‐dimensional data has become a recurrent problem. Model‐based techniques for discriminant analysis and clustering are popular tools, renowned for their probabilistic foundations and flexibility. However, classical model‐based techniques behave disappointingly in high‐dimensional spaces, which has so far limited their use in chemometrics. Recent developments in model‐based classification have overcome these drawbacks and enabled efficient classification of high‐dimensional data, even in the 'small n / large p' setting. This work presents a comprehensive review of these recent approaches, including regularization‐based techniques, parsimonious modelling, subspace classification methods and classification methods based on variable selection. The use of these model‐based methods is also illustrated on real‐world classification problems in chemometrics using R packages. Copyright © 2013 John Wiley & Sons, Ltd.

10.
11.
Multivariate classification methods were used to evaluate data on the concentrations of eight metals in human senile lenses, measured by atomic absorption spectrometry. Principal components analysis and hierarchical clustering separated senile cataract lenses, nuclei from cataract lenses, and normal lenses into three classes on the basis of the eight elements. Stepwise discriminant analysis yielded discriminant functions with five selected variables. Results from the linear learning machine method were also satisfactory; the k-nearest neighbour method was less useful.
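Metal concentrations span very different ranges. Distance-based methods such as k-nearest neighbours are therefore normally applied after autoscaling, which mean-centres each variable and divides it by its standard deviation. The plain-Python sketch below shows both steps. The two-variable toy data, the class names and k = 3 are assumptions, not the lens measurements.

```python
import math
import statistics
from collections import Counter

def autoscale_fit(X):
    """Column means and population standard deviations."""
    columns = list(zip(*X))
    return [statistics.fmean(c) for c in columns], [statistics.pstdev(c) for c in columns]

def autoscale_apply(X, means, sds):
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training samples closest to x (Euclidean distance)."""
    neighbours = sorted(zip((math.dist(xi, x) for xi in train_X), train_y))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical concentrations of two metals on very different scales
train_X = [[1.0, 100.0], [1.2, 110.0], [0.9, 95.0],
           [3.0, 300.0], [3.2, 310.0], [2.9, 290.0]]
train_y = ["normal"] * 3 + ["cataract"] * 3
means, sds = autoscale_fit(train_X)
scaled = autoscale_apply(train_X, means, sds)
query = autoscale_apply([[1.1, 105.0]], means, sds)[0]
print(knn_predict(scaled, train_y, query, k=3))   # normal
```

The scaling parameters come from the training set only and are reused for new samples, so test data do not leak into the model.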

12.
The Taft-Kamlet-Abboud hydrogen-bond acidity, hydrogen-bond basicity and polarity-polarizability parameters are widely used as empirical characteristics of solvent-solute interactions. These solvatochromic parameters are determined from the absorption band positions of solvatochromic probes in a standard medium and in the medium under study. Solvatochromic probing is a rapidly growing practice, and the parameter values are refined from time to time. Because these values are rather close for many media, classifying media on their basis can be tedious. This motivates widening the range of algorithms used to reduce the ambiguity of classification. Classification algorithms that are stable to small variations of solvatochromic parameters are of special interest. Artificial neural networks (ANNs) have proved to be a powerful tool for supervised classification. This paper focuses on the search for optimal parameters of probabilistic, dynamic, Elman, feed-forward, and cascade ANNs for classifying solvents on the basis of their solvatochromic characteristics. The influence of data variation on the stability of classification is also examined. The dynamic and probabilistic neural networks were found to be error-free and stable, and they have the potential to become as common a tool for supervised classification as linear discriminant analysis.
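A probabilistic neural network is essentially a Parzen-window classifier. Each training sample contributes a Gaussian kernel, the kernel activations are averaged per class, and the largest class average wins. The plain-Python sketch below assumes equal class priors and a single smoothing width sigma; the paper's tuned parameters are not given here. The triples and group names are illustrative toy values, not measured solvatochromic parameters. Because the class averages change smoothly with the input, small perturbations of a sample lying well inside a class tend not to change its assignment. This is the kind of stability the abstract examines.

```python
import math

def pnn_predict(train_X, train_y, x, sigma=0.2):
    """Probabilistic neural network with a Gaussian kernel.

    Pattern layer: one kernel per training sample.
    Summation layer: average activation per class (equal priors assumed).
    Output layer: the class with the largest average.
    """
    totals, counts = {}, {}
    for xi, yi in zip(train_X, train_y):
        d2 = sum((a - b) ** 2 for a, b in zip(xi, x))
        totals[yi] = totals.get(yi, 0.0) + math.exp(-d2 / (2.0 * sigma ** 2))
        counts[yi] = counts.get(yi, 0) + 1
    return max(totals, key=lambda c: totals[c] / counts[c])

# Toy (acidity, basicity, polarity)-like triples; illustrative numbers only
train_X = [(0.90, 0.50, 0.60), (1.00, 0.45, 0.55), (0.80, 0.55, 0.65),
           (0.00, 0.40, 0.70), (0.05, 0.50, 0.80), (0.00, 0.45, 0.75)]
train_y = ["group 1"] * 3 + ["group 2"] * 3
query = (0.85, 0.50, 0.60)
print(pnn_predict(train_X, train_y, query))   # group 1
```

Sigma is the only parameter to tune. If it is too small, the kernels for distant queries underflow to zero. If it is too large, the class averages converge and the classes blur together.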

13.
A chemometric study was carried out to characterize three types of ionic liquids (ILs) with hexacationic imidazolium, polymeric imidazolium, and phosphonium cationic cores, combined with a range of counter-anions such as halides, thiocyanate, boron anions, triflate, and bistriflimide. Several techniques were used to characterize these functionalized ILs:
- the solvation parameter model developed by Abraham et al.;
- unsupervised techniques such as cluster analysis (CA);
- supervised techniques such as linear discriminant analysis (LDA), stepwise LDA, and quadratic discriminant analysis (QDA);
- multivariate regression techniques such as discriminant partial least squares (D-PLS) and multiple linear regression (MLR).
CA revealed two main groups of phases: those with acidic H-bond character and those with basic H-bond character. Once these two natural groups were detected, linear and quadratic classifiers with good classification (>96 %) and prediction (>92 %) capacities were computed. Stepwise LDA established that the a, b, and s solvation parameters were the most discriminant variables. These variables were used for modeling, and D-PLS and MLR models were constructed using a binary response. The cross-validated model explained 65 % of the variance of the categorical variable, and 94.5 % of ILs were correctly predicted. This characterization should allow the appropriate selection of phases for gas chromatography (GC).

14.
Authenticity is an important food quality criterion, and rapid methods to guarantee it are in wide demand among food producers, processors, consumers and regulatory bodies. The objective of this work was to develop a classification system to confirm the authenticity of Galician potatoes with the Certified Brand of Origin and Quality (CBOQ) 'Denominación Específica: Patata de Galicia' and to differentiate them from potatoes without this CBOQ. Ten selected metals were determined by atomic spectroscopy in 102 potato samples, which were divided into two categories: CBOQ and non-CBOQ potatoes. Multivariate chemometric techniques, such as cluster analysis and principal component analysis, were applied for a preliminary study of the data structure. Four supervised pattern recognition procedures were used to classify samples into the two categories on the basis of the chemical data: linear discriminant analysis (LDA), K-nearest neighbours (KNN), soft independent modelling of class analogy (SIMCA) and multilayer feed-forward neural networks (MLF-ANN). Results for LDA, KNN and MLF-ANN were acceptable for the non-CBOQ class, whereas SIMCA showed better recognition and prediction abilities for the CBOQ class. A more sophisticated neural network approach was employed to optimize the classification: a self-organizing with adaptive neighbourhood network (SOAN) combined with an MLF network. This combined method gave excellent classification and prediction abilities for both categories, with success rates from 98 to 100%. The metal profiles provided sufficient information to develop classification rules, based on SOAN-MLF neural networks, for identifying potatoes according to their origin brand.

15.
To extract discriminant information from analytical data, results from eight conventional biochemical tests of liver function and from determinations of two serum bile acids were studied by supervised pattern recognition methods. The population comprised healthy subjects and seven groups of people affected by different liver diseases. Principal components, linear discriminant, k-nearest neighbours and Bayesian methods were applied. Because the prediction ability computed on the whole data set was poor, the problem was simplified by dividing the data set into three subsets. Each subset comprised two liver diseases that were contiguous and overlapped in the variable hyperspace. Across the three subsets, the prediction ability of the Bayesian method ranged from 75% at minimum to 96% at best. The best performance was achieved in distinguishing healthy subjects from those with mild liver diseases on the basis of four biochemical assays.

16.
Insomnia, depression, and Alzheimer's disease are all neurodegenerative diseases associated with steroid hormone levels. The study aimed to investigate the internal connections and differences in steroid hormones among these three diseases, and to distinguish the diseases from the perspective of biomarkers. To this end, an easy, quick, and efficient high‐performance liquid chromatography with tandem mass spectrometry method was established and validated to determine six steroid hormones simultaneously in rat serum. Separation was accomplished on a SHIM‐PACK XR‐ODS column with 0.1% v/v formic acid and methanol as the mobile phase. Detection was performed with an electrospray ionization source in positive ion mode. Based on the steroid hormone concentrations, partial least squares discriminant analysis clearly distinguished all groups from each other. In addition, 11‐deoxycortisol, corticosterone, and cortisol were identified as potential biomarkers, and Bayes’ discriminant function classified 100% of samples correctly. These biomarkers were further screened by one‐way analysis of variance, and cortisol differed significantly among all the groups. A Bayes’ discriminant function built on cortisol alone gave a classification accuracy of 87.2%. This workflow, combining the determination of steroid hormones with discrimination among three neurological diseases, would provide a basis for further clinical studies.
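The biomarker analysis used two statistical steps: one-way ANOVA screening and a Bayes discriminant function on a single variable. Both can be sketched in plain Python. The discriminant below assumes Gaussian class densities with a common (pooled) variance, which reduces it to a linear score. The paper's exact formulation is not given in the abstract, and the toy groups are not cortisol measurements.

```python
import math
from statistics import fmean

def one_way_anova_f(groups):
    """F = between-group mean square / within-group mean square."""
    values = [v for g in groups for v in g]
    grand_mean = fmean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - fmean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def fit_bayes_1d(groups, priors=None):
    """Gaussian class densities sharing one pooled variance (an assumption)."""
    means = [fmean(g) for g in groups]
    n, k = sum(len(g) for g in groups), len(groups)
    pooled_var = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g) / (n - k)
    if priors is None:
        priors = [len(g) / n for g in groups]   # priors from group sizes
    return means, pooled_var, priors

def predict_bayes_1d(x, means, pooled_var, priors):
    """Index of the group with the largest linear discriminant score."""
    scores = [x * m / pooled_var - m * m / (2 * pooled_var) + math.log(p)
              for m, p in zip(means, priors)]
    return scores.index(max(scores))

groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]   # toy values
print(one_way_anova_f(groups))   # 27.0
means, pooled_var, priors = fit_bayes_1d(groups)
print([predict_bayes_1d(x, means, pooled_var, priors) for x in (2.2, 5.4, 7.6)])   # [0, 1, 2]
```

With equal priors and a shared variance, the decision boundaries fall midway between adjacent group means (3.5 and 6.5 here).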

17.
A metabolomics method based on ultra high performance liquid chromatography with quadrupole time‐of‐flight mass spectrometry was developed to evaluate the influence of processing time on the quality of raw and processed Polygoni Multiflora Radix. Principal component analysis and partial least‐squares discriminant analysis were used to screen potential marker metabolites contributing to the quality changes. These marker metabolites were then used as variables in Fisher's discriminant analysis to build models for distinguishing raw and processed Polygoni Multiflora Radix on the market. In addition, 36 compounds were identified. Twelve raw and 23 processed Polygoni Multiflora Radix samples were distinguished. All 12 raw samples belonged to the group with a processing time of 0 h. Of the processed samples, two belonged to the 4 h group, 12 to the 8 to 16 h group, and nine to the 24 to 48 h group. The results demonstrate that the method can provide scientific support for standardizing the processing of Polygoni Multiflora Radix.

18.
Shen H, Carter JF, Brereton RG, Eckers C. The Analyst, 2003, 128(3): 287-292
Wet granulation and direct compression are two processes employed in tablet preparation. In this paper, pyrolysis-gas chromatography-mass spectrometry (Py-GC-MS) is used, with the help of chemometric techniques, to discriminate between these processes. The data analysis procedure is as follows.
1. The Py-GC-MS data of each sample are deconvoluted into concentration profiles and spectra. A matrix is then constructed with one column per compound, and compounds present in only a small number of samples are removed.
2. After three variables and one sample are excluded, the main principal components are retained and further processed by Fisher discriminant analysis.
3. The resulting data are assigned to classes using unsupervised and supervised classification methods.
Cross-validation shows that only 3 of 20 samples are misclassified using the Mahalanobis distance measure.
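Classification by Mahalanobis distance assigns a sample to the class whose mean is closest once the pooled within-class covariance is taken into account. Leave-one-out cross-validation refits the model without each sample in turn. The NumPy sketch below assumes a pooled covariance and uses leave-one-out as the cross-validation scheme; the abstract specifies neither. The two-dimensional scores are simulated stand-ins for the Fisher discriminant outputs.

```python
import numpy as np

def fit_mahalanobis(X, y):
    """Class means plus the inverse of the pooled within-class covariance."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    means = {c: X[y == c].mean(axis=0) for c in classes}
    resid = np.vstack([X[y == c] - means[c] for c in classes])
    cov = resid.T @ resid / (len(X) - len(classes))
    return means, np.linalg.inv(cov)

def mahalanobis_assign(means, cov_inv, x):
    """Class whose mean has the smallest squared Mahalanobis distance to x."""
    d2 = {c: float((x - m) @ cov_inv @ (x - m)) for c, m in means.items()}
    return min(d2, key=d2.get)

def loo_errors(X, y):
    """Leave-one-out cross-validation: number of misclassified samples."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    errors = 0
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        means, cov_inv = fit_mahalanobis(X[keep], y[keep])
        errors += int(mahalanobis_assign(means, cov_inv, X[i]) != y[i])
    return errors

# Hypothetical discriminant scores for the two processes (simulated)
rng = np.random.default_rng(0)
cov = [[1.0, 0.8], [0.8, 1.0]]
X = np.vstack([rng.multivariate_normal([0.0, 0.0], cov, size=20),
               rng.multivariate_normal([3.0, -3.0], cov, size=20)])
y = np.array(["wet granulation"] * 20 + ["direct compression"] * 20)
print(loo_errors(X, y), "of", len(X), "misclassified")
```

Unlike Euclidean distance, this measure down-weights directions in which the classes naturally vary a lot, such as the correlated axis in this toy covariance.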

19.
Proton (¹H) nuclear magnetic resonance (NMR) spectroscopy at 400 MHz and multivariate data analysis were used in the context of food surveillance to discriminate 46 authentic rice samples by type. The optimal sample preparation was found to be aqueous rice extracts at pH 1.9. For the first time, the chemometric method independent component analysis (ICA) was applied to differentiate clusters of rice of the same type (Basmati, non‐Basmati long‐grain rice, and round‐grain rice) and, to a certain extent, their geographical origin. ICA was found to be superior to classical principal component analysis (PCA) for verifying rice authenticity. The chemical shifts of the principal saccharides and acetic acid were mostly responsible for the observed clustering. Several classification methods were tested: linear discriminant analysis, factorial discriminant analysis, partial least squares discriminant analysis (PLS‐DA), soft independent modeling of class analogy, and ICA. PLS‐DA and ICA gave the best specificity (0.96 for both methods) and sensitivity (0.94 for PLS‐DA and 1.0 for ICA). NMR spectroscopy combined with chemometrics could therefore be used as a screening method in the official control of rice samples. Copyright © 2013 John Wiley & Sons, Ltd.

20.
When information is quantified in metabolomics, the results are often expressed as data carrying only relative information. Vectors of such data have positive components, and the only relevant information lies in the ratios between their parts; such observations are called compositional data. The paper aims to demonstrate how partial least squares discriminant analysis (PLS‐DA), the most widely used method in chemometrics for multivariate classification, can be applied to compositional data. Theoretical arguments are provided, and metabolomics data sets related to the diagnosis of inherited metabolic disorders (IMDs) are investigated. The first example uses a small data set from targeted metabolomics, in which only a subset of potential markers is selected, to analyze the significance of the corresponding regression parameters (metabolites). The second example takes an untargeted metabolomics approach, detecting almost 500 metabolites. The significance of the metabolites is investigated by applying PLS‐DA adapted to the compositional approach. In both examples, the significance of important metabolites (disease markers) is more clearly visible with the compositional method. Cross‐validation also gives better results with the compositional approach. Copyright © 2014 John Wiley & Sons, Ltd.
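A common way to respect the relative nature of compositional data before PLS-DA is a log-ratio transform. The sketch below shows the centred log-ratio (clr) transform in plain Python. The abstract does not say whether the paper uses clr or the related isometric log-ratio (ilr) coordinates, so the choice here is an assumption. Clr coordinates always sum to zero, which makes their covariance matrix singular. This is one reason ilr coordinates are often preferred for regression-type models.

```python
import math

def clr(parts):
    """Centred log-ratio: log of each part over the geometric mean of all parts."""
    if any(v <= 0 for v in parts):
        raise ValueError("compositional parts must be strictly positive")
    logs = [math.log(v) for v in parts]
    mean_log = sum(logs) / len(logs)
    return [lg - mean_log for lg in logs]

sample = [1.0, 2.0, 4.0]                 # toy relative abundances of three metabolites
print(clr(sample))                       # approximately [-0.693, 0.0, 0.693]
print(clr([10 * v for v in sample]))     # same values: only the ratios matter
```

Because multiplying a sample by any constant leaves its clr coordinates unchanged, differences in total signal between samples do not affect the downstream PLS-DA model.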

