Similar literature
1.
High throughput data are frequently observed in contemporary chemical studies. Classification through spectral information is an important issue in chemometrics. Linear discriminant analysis (LDA) fails in the large‐p‐small‐n situation for two main reasons: (1) the sample covariance matrix is singular when p > n and (2) there is an accumulation of noise in the estimation of the class centroid in high dimensional feature space. The Independence Rule is a class of methods used to overcome these drawbacks by ignoring the correlation information between spectral variables. However, a strong correlation is an essential characteristic of spectral data. We proposed a new correlation‐assisted nearest shrunken centroid classifier (CA‐NSC) to incorporate correlation information into the classification. CA‐NSC combines two sources of information [class centroid (mean) and correlation structure (variance)] to generate the classification. We used two real data analyses and a simulation study to verify our CA‐NSC method. In addition to NSC, we also performed a comparison with the soft independent modeling of class analogy (SIMCA) approach, which uses only correlation structure information for classification. The results show that CA‐NSC consistently improves on NSC and SIMCA. The misclassification rate of CA‐NSC is reduced by almost half compared with NSC in one of the real data analyses. Generally, correlation among variables will worsen the performance of NSC, even though the discriminatory information contained in the class centroid remains unchanged. If only correlation structure information is used (as in the case of SIMCA), the result will be satisfactory only when the correlation structure alone can provide sufficient information for classification. Copyright © 2015 John Wiley & Sons, Ltd.
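
As a rough illustration of the baseline that CA‐NSC builds on, the sketch below shows a simplified nearest shrunken centroid rule: each class centroid is pulled toward the overall centroid by soft thresholding, and a sample is assigned to the closest shrunken centroid. This is not the CA‐NSC method of the abstract. The function names, the threshold `delta`, and the omission of the usual per‐variable standardization and class priors are all assumptions made to keep the example short.

```python
def soft_threshold(value, delta):
    """Shrink value toward zero by delta; anything within [-delta, delta] becomes 0."""
    if value > delta:
        return value - delta
    if value < -delta:
        return value + delta
    return 0.0


def shrunken_centroids(X, y, delta):
    """Return {label: shrunken centroid} for data rows X with class labels y.

    Simplified: no within-class standardization and no class priors (assumption).
    """
    n_features = len(X[0])
    overall = [sum(row[j] for row in X) / len(X) for j in range(n_features)]
    centroids = {}
    for label in sorted(set(y)):
        rows = [row for row, lab in zip(X, y) if lab == label]
        mean = [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]
        # Shrink the class-vs-overall difference, then add the overall centroid back.
        centroids[label] = [
            overall[j] + soft_threshold(mean[j] - overall[j], delta)
            for j in range(n_features)
        ]
    return centroids


def predict(centroids, x):
    """Assign x to the label whose shrunken centroid is nearest (squared Euclidean)."""
    def squared_distance(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Toy data: only the first variable separates the two classes.
X = [[0.0, 0.0], [0.0, 0.2], [4.0, 0.0], [4.0, 0.2]]
y = ["a", "a", "b", "b"]
centroids = shrunken_centroids(X, y, delta=0.5)
label = predict(centroids, [1.0, 5.0])
```

Variables whose class centroids differ from the overall centroid by less than `delta` end up with identical shrunken values in every class, so they stop contributing to the decision; that is the variable selection effect of the shrinkage.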

2.
A modified partial least squares (PLS) algorithm is presented on the basis of a novel weight updating strategy. The new weights can handle situations in which directions in X space have large variance unrelated to Y, where linear PLS may not work well. In the proposed algorithm, the slice transform technique is introduced to provide a piecewise linear representation of the weight vectors. Then, the corresponding mapping functions are estimated by a least squares criterion on the inner relation between the observed variables and the scores of the response variables. Finally, the weight vectors are updated by the obtained mapping functions, and the corresponding scores and loadings are calculated with the new weights. An optimal piecewise linear replacement of the PLS weights is thus achieved by the proposed method. The predictive performances of the new approach and other methods are compared statistically using the Wilcoxon signed rank test. Experimental results show that the proposed method can achieve simpler models, while model performance is at least comparable with PLS and other methods. Copyright © 2012 John Wiley & Sons, Ltd.

3.
The well‐known Martens factorization for PLS1 produces a single y‐related score, with all subsequent scores being y‐unrelated. The X‐explanatory value of these y‐orthogonal scores can be summarized by a simple expression, which is analogous to the ‘P’ loading weights in the orthogonalized NIPALS algorithm. This can be used to rearrange the factorization into entirely y‐related and y‐unrelated parts. Systematic y‐unrelated variation can thus be removed from the X data through a single post hoc calculation following conventional PLS, without any recourse to the orthogonal projections to latent structures (OPLS) algorithm. The work presented is consistent with the development by Ergon (PLS post‐processing by similarity transformation (PLS + ST): a simple alternative to OPLS. J. Chemom. 2005; 19: 1–4), which shows that conventional PLS and OPLS are equivalent within a similarity transform. Copyright © 2009 John Wiley & Sons, Ltd.

4.
We propose a new data compression method for estimating optimal latent variables in multi‐variate classification and regression problems where more than one response variable is available. The latent variables are found according to a common innovative principle combining PLS methodology and canonical correlation analysis (CCA). The suggested method is able to extract predictive information for the latent variables more effectively than ordinary PLS approaches. Only simple modifications of existing PLS and PPLS algorithms are required to adopt the proposed method. Copyright © 2009 John Wiley & Sons, Ltd.

5.
Different published versions of partial least squares discriminant analysis (PLS‐DA) are shown as special cases of an approach exploiting prior probabilities in the estimated between groups covariance matrix used for calculation of loading weights. With prior probabilities included in the calculation of both PLS components and canonical variates, a complete strategy for extracting appropriate decision spaces with multicollinear data is obtained. This idea easily extends to weighted linear dummy regression so that the corresponding fitted values also span the canonical space. Two different choices of prior probabilities are applied with a real dataset to illustrate the effect for the obtained decision spaces. Copyright © 2007 John Wiley & Sons, Ltd.

6.
The issue of outer model weight updating is important in extending partial least squares (PLS) regression to modelling data that shows significant non‐linearity. This paper presents a novel co‐evolutionary component approach to the weight updating problem. Specification of the non‐linear PLS model is achieved using an evolutionary computational (EC) method that can co‐evolve all non‐linear inner models and all input projection weights simultaneously. In this method, modular symbolic non‐linear equations are used to represent the inner models and binary sequences are used to represent the projection weights. The approach is flexible, and other representations could be employed within the same co‐evolutionary framework. The potential of these methods is illustrated using a simulated pH neutralisation process data set exhibiting significant non‐linearity. It is demonstrated that the co‐evolutionary component architecture can produce results which are competitive with non‐linear neural network‐based PLS algorithms that use iterative projection weight updating. In addition, a data sampling method for mitigating overfitting to the training data is described. Copyright © 2007 John Wiley & Sons, Ltd.

7.
This paper presents a modified version of the NIPALS algorithm for PLS regression with a single response variable. This version, denoted CF‐PLS, provides significant advantages over standard PLS. First, it strongly reduces the over‐fit of the regression. Secondly, R² under the null hypothesis follows a Beta distribution that is a function only of the number of observations, which allows the use of a probabilistic framework to test the validity of a component. Thirdly, the models generated with CF‐PLS have comparable if not better prediction ability than the models fitted with NIPALS. Finally, the scores and loadings of CF‐PLS are directly related to R², which makes the model and its interpretation more reliable. Copyright © 2011 John Wiley & Sons, Ltd.
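
For orientation, the standard NIPALS procedure for a single response, which the abstract takes as its starting point, can be sketched as follows. This is the textbook PLS1 iteration and not the CF‐PLS modification, whose details are not given here. The function and variable names are illustrative, and the data are assumed to be column‐centered beforehand.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def nipals_pls1(X, y, n_components):
    """Standard NIPALS PLS1 on centered X (list of rows) and centered y.

    Returns weights W, scores T, loadings P and inner coefficients q,
    each as a list with one entry per component.
    """
    X = [row[:] for row in X]  # copies, because both blocks are deflated in place
    y = y[:]
    n, p = len(X), len(X[0])
    W, T, P, q = [], [], [], []
    for _ in range(n_components):
        # Weight vector: direction of maximal covariance with y, unit length.
        w = [dot([X[i][j] for i in range(n)], y) for j in range(p)]
        norm = dot(w, w) ** 0.5
        w = [wj / norm for wj in w]
        # Scores, X loadings and the regression of y on the scores.
        t = [dot(X[i], w) for i in range(n)]
        tt = dot(t, t)
        p_load = [dot([X[i][j] for i in range(n)], t) / tt for j in range(p)]
        q_a = dot(y, t) / tt
        # Deflate X and y by the part explained by this component.
        for i in range(n):
            for j in range(p):
                X[i][j] -= t[i] * p_load[j]
            y[i] -= q_a * t[i]
        W.append(w)
        T.append(t)
        P.append(p_load)
        q.append(q_a)
    return W, T, P, q


# Small centered example with two predictors and four observations.
X = [[-1.5, -1.0], [-0.5, 1.0], [0.5, -1.0], [1.5, 1.0]]
y = [-2.0, 0.0, 0.0, 2.0]
W, T, P, q = nipals_pls1(X, y, n_components=2)
```

With this deflation scheme the score vectors in `T` are mutually orthogonal and the weight vectors in `W` are orthonormal, which is the property that later abstracts in this list (on orthonormal scores and weights) build on.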

8.
Target projection (TP), also called target rotation (TR), was introduced to facilitate interpretation of latent‐variable regression models. Orthogonal partial least squares (OPLS) regression and PLS post‐processing by similarity transform (PLS + ST) represent two alternative algorithms for the same purpose. In addition, OPLS and PLS + ST provide components to explain systematic variation in X orthogonal to the response. We show that, for the same number of components, OPLS and PLS + ST provide score and loading vectors for the predictive latent variable that are the same as for TP except for a scaling factor. Furthermore, we show how the TP approach can be extended to become a hybrid of latent‐variable (LV) regression and exploratory LV analysis and thus embrace systematic variation in X unrelated to the response. Principal component analysis (PCA) of the residual variation after removal of the target component is here used to extract the orthogonal components, but X‐tended TP (XTP) permits other criteria for decomposition of the residual variation. If PCA is used for decomposing the orthogonal variation in XTP, the variance of the major orthogonal components obtained for OPLS and XTP is observed to be almost the same, showing the close relationship between the methods. The XTP approach is tested and compared with OPLS for a three‐component mixture analyzed by infrared spectroscopy and a multicomponent mixture measured by near infrared spectroscopy in a reactor. Copyright © 2008 John Wiley & Sons, Ltd.

9.
An evaluation of the computational performance and cross‐validation error precision of five partial least squares (PLS) algorithms that are available and widely used in the literature (NIPALS, modified NIPALS, Kernel, SIMPLS and bidiagonal PLS) is presented. When dealing with large data sets, computational time is an important issue, mainly in cross‐validation and variable selection. In the present paper, the PLS algorithms are compared in terms of the run time and the relative error in the precision obtained when performing leave‐one‐out cross‐validation using simulated and real data sets. The simulated data sets were investigated through factorial and Latin square experimental designs. The evaluations were based on the number of rows, the number of columns and the number of latent variables. With respect to performance, the results for both simulated and real data sets show that the differences in run time are statistically significant. Bidiagonal PLS is the fastest algorithm, followed by Kernel and SIMPLS. Regarding cross‐validation error, all algorithms showed similar results. However, in some situations, for example when many latent variables were used, discrepancies were observed, especially with respect to SIMPLS. Copyright © 2010 John Wiley & Sons, Ltd.

10.
In this work, a strategy was proposed to discriminate Polygoni Multiflori Radix (PMR) from its adulterant (Cynanchi Auriculati Radix, CAR). Ultra‐high performance liquid chromatography (UHPLC) fingerprints were established to analyze samples containing PMR, CAR and their mixtures. Multivariate classification methods were applied to analyze the obtained UHPLC fingerprints, including principal component analysis (PCA), partial least squares discriminant analysis (PLS‐DA), soft independent modeling of class analogy (SIMCA), support vector machine discriminant analysis (SVMDA) and counter‐propagation artificial neural network (CP‐ANN). A PCA score plot showed that PMR and CAR samples belonged to separate clusters (PMR class and CAR class), and samples of mixtures were located near the PMR or CAR classes. PLS‐DA, SVMDA and CP‐ANN performed well for recognition and prediction of PMR and CAR samples. Moreover, the PLS‐DA method performed best in the detection of adulterated samples, even when the adulterant content was about 25%.

11.
The complexity of metabolic profiles makes chemometric tools indispensable for extracting the most significant information. Partial least‐squares discriminant analysis (PLS‐DA) is one of the most effective strategies for data analysis in metabonomics. However, its actual efficacy in metabonomics is often weakened by the high similarity of metabolic profiles, which contain excessive variables. To rectify this situation, particle swarm optimization (PSO) was introduced to improve PLS‐DA by simultaneously selecting the optimal sample and variable subsets, the appropriate variable weights, and the best number of latent variables (SVWL) in PLS‐DA, forming a new algorithm named PSO‐SVWL‐PLSDA. Combined with 1H nuclear magnetic resonance‐based metabonomics, PSO‐SVWL‐PLSDA was applied to distinguish patients with lung cancer from healthy controls. PLS‐DA was also investigated as a comparison. PLS‐DA yielded recognition rates of 86% and 65% for the training and test sets, respectively, whereas PSO‐SVWL‐PLSDA achieved 98.3% and 90%. Moreover, several of the most discriminative metabolites were identified by PSO‐SVWL‐PLSDA to aid the diagnosis of lung cancer, including lactate, glucose (α‐glucose and β‐glucose), threonine, valine, taurine, trimethylamine, glutamine, glycoprotein, proline, and lipid. Copyright © 2015 John Wiley & Sons, Ltd.

12.
Partial Least Squares (PLS) is a wide class of regression methods that model relationships between sets of observed variables by means of latent variables. Specifically, PLS2 was developed to correlate two blocks of data, the X‐block representing the independent or explanatory variables and the Y‐block representing the dependent or response variables. Later, OPLS was introduced to further reduce model complexity by removing Y‐orthogonal sources of variation from X in the latent space, thus improving data interpretation through the generated predictive latent variables. Nevertheless, the relationships between PLS2 and OPLS in the case of multiple Y responses have not yet been fully explored. With this perspective and taking inspiration from some basic mathematical properties of PLS2, we here present a novel and general approach consisting of a post‐transformation of PLS2 (ptPLS2), which results in a decomposition of the latent space into orthogonal and predictive components, while preserving the same goodness of fit and predictive ability as PLS2. Additionally, we discuss the application of the ptPLS2 approach to two metabolomic data sets extracted from earlier published studies and its advantages in model interpretation as compared with the ‘standard’ PLS approach. Copyright © 2016 John Wiley & Sons, Ltd.

13.
Representation or compression of data sets in the wavelet space is usually performed to retain the maximum variance of the original or pretreated data, as in compression by means of principal components. In order to represent a number of objects together in the wavelet space, a common basis is required, and this common basis is usually obtained by means of the variance spectrum or of the variance wavelet tree. In this study, the use of alternative common bases is suggested, both for classification and regression problems. In the case of classification or class-modeling, the suggested common bases are based on the spectrum of the Fisher weights (a measure of the between-class to within-class variance ratio) or on the spectrum of the SIMCA discriminant weights. In the case of regression, the suggested common bases are obtained by the correlation spectrum (the correlation coefficients of the predictor variables with a response variable) or by the PLS (Partial Least Squares regression) importance of the predictors (the product between the absolute value of the regression coefficient of the predictor in the PLS model and its standard deviation). Other alternative strategies apply the Gram–Schmidt supervised orthogonalization to the wavelet coefficients. The results indicate that, both in classification and regression, the information retained after compression in the wavelet space can be more efficient than that retained with a common basis obtained by variance.
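
The Fisher weight mentioned above is described only as a between‐class to within‐class variance ratio. A minimal sketch of one common way to compute such a ratio per variable is given below; the exact definition used in the study may differ, and the function name and the sum‐of‐squares form are assumptions. In the setting of the abstract, the columns would be wavelet coefficients rather than raw variables.

```python
def fisher_weights(X, y):
    """Per-variable ratio of between-class to within-class sum of squares.

    X is a list of rows, y the class labels. A variable with zero within-class
    spread and non-zero between-class spread gets float('inf').
    """
    n_features = len(X[0])
    labels = sorted(set(y))
    weights = []
    for j in range(n_features):
        column = [row[j] for row in X]
        overall_mean = sum(column) / len(column)
        between = 0.0
        within = 0.0
        for label in labels:
            values = [v for v, lab in zip(column, y) if lab == label]
            class_mean = sum(values) / len(values)
            between += len(values) * (class_mean - overall_mean) ** 2
            within += sum((v - class_mean) ** 2 for v in values)
        if within == 0.0:
            weights.append(float("inf") if between > 0.0 else 0.0)
        else:
            weights.append(between / within)
    return weights


# The first variable separates the classes, the second does not.
X = [[0.0, 1.0], [1.0, 2.0], [10.0, 1.0], [11.0, 2.0]]
y = [0, 0, 1, 1]
weights = fisher_weights(X, y)
# Rank variables from most to least discriminating.
ranking = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
```

Keeping only the highest‐ranked coefficients would then give a common basis chosen for class separation rather than for retained variance.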

14.
The on‐line monitoring of batch processes based on principal component analysis (PCA) has been widely studied. Nonetheless, researchers have paid far less attention to the on‐line application of partial least squares (PLS). In this paper, the influence of several issues on the predictive power of a PLS model for the on‐line estimation of key variables in a batch process is studied. Some of the conclusions can help to better understand the capabilities of the proposals presented for on‐line PCA‐based monitoring. Issues like the convenience of batch‐wise or variable‐wise unfolding, the method for the imputation of future measurements and the use of several sub‐models are addressed. This is the first time that the adaptive hierarchical (or multi‐block) approach is extended to PLS modelling. Also, the formulation of the so‐called trimmed scores regression (TSR), a powerful imputation method defined for PCA, is extended for its application with PLS modelling. Data from two processes, one simulated and one real, are used to illustrate the results. Copyright © 2008 John Wiley & Sons, Ltd.

15.
Metabolomics datasets generated by modern analytical instruments tend to be increasingly complex. In this study, a recent method named shrunken centroids regularized discriminant analysis (SCRDA) is introduced and applied to the exploration of metabolomics datasets. It is a supervised method for variable selection, discriminant analysis and biomarker screening. By regularizing the estimate of the within‐class covariance matrix, SCRDA can deal with the singularity issue of linear discriminant analysis. Then a shrinkage estimator is applied to perform variable selection. The method is illustrated using simulated datasets and three complex metabolomics datasets. The commonly used orthogonal partial least squares discriminant analysis and two other similar statistical methods, penalized linear discriminant analysis and nearest shrunken centroids, are used for comparison. The results illustrate that SCRDA has some desirable abilities in variable selection, classification and prediction. Moreover, the biomarkers identified by SCRDA are further demonstrated to be in accordance with the biochemical research. The results indicate that SCRDA is a promising strategy for metabolomics. Copyright © 2014 John Wiley & Sons, Ltd.
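
The singularity problem and its fix by regularization can be seen on a very small example. The sketch below blends the sample covariance with the identity matrix, one commonly used form of covariance regularization; the abstract does not state which form SCRDA uses, so the formula, the parameter name `alpha` and the helper names are assumptions.

```python
def covariance(X):
    """Sample covariance matrix (divisor n - 1) of the rows of X."""
    n, p = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(p)]
    return [
        [sum((row[i] - mean[i]) * (row[j] - mean[j]) for row in X) / (n - 1)
         for j in range(p)]
        for i in range(p)
    ]


def regularize(S, alpha):
    """Blend S with the identity: alpha * S + (1 - alpha) * I, with 0 <= alpha <= 1."""
    p = len(S)
    return [
        [alpha * S[i][j] + (1.0 - alpha) * (1.0 if i == j else 0.0)
         for j in range(p)]
        for i in range(p)
    ]


def det2(M):
    """Determinant of a 2 x 2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


# Two samples in two dimensions: the covariance has rank 1, so it is singular.
X = [[1.0, 2.0], [3.0, 6.0]]
S = covariance(X)               # [[2.0, 4.0], [4.0, 8.0]]
S_reg = regularize(S, alpha=0.5)  # [[1.5, 2.0], [2.0, 4.5]]
singular_det = det2(S)
regular_det = det2(S_reg)
```

Because the blended matrix is positive definite for any `alpha` below 1, it can be inverted in the discriminant rule even when there are fewer samples than variables.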

16.
A novel projection modeling method for quantitative structure activity relationship (QSAR) and quantitative structure property relationship (QSPR) is developed in this paper. Orthogonalization of block variables is introduced to deal with the problem of variable selection. Projections based on least squares are used to construct the modeling space in order to search for the best regression directions for chemical modeling. A suitable prediction space for such a model is further defined to confine the usage range of the model. Three real data sets were analyzed to check the performance of the proposed modeling method. The results obtained from Monte‐Carlo cross‐validation (MCCV) showed that the proposed modeling method might provide better results for QSAR and QSPR modeling than PCR and PLS with respect to both fitting and prediction abilities. Copyright © 2007 John Wiley & Sons, Ltd.

17.
Infrared emissions (IREs) of samples of pentaerythritol tetranitrate (PETN) deposited as contamination residues on various substrates were measured to generate models for the detection and discrimination of the important nitrate ester from the emissions of the substrates. Mid‐infrared emissions were generated by heating the samples remotely using laser‐induced thermal emission (LITE). Chemometrics multivariate analysis techniques such as principal component analysis (PCA), soft independent modeling by class analogy (SIMCA), partial least squares‐discriminant analysis (PLS‐DA), support vector machines (SVMs), and neural network (NN) were employed to generate the models for the classification and discrimination of PETN IREs from substrate thermal emissions. PCA exhibited less variability for the LITE spectra of PETN/substrates. SIMCA was able to predict only 44.7% of all samples, while SVM proved to be the most effective statistical analysis routine, with a discrimination performance of 95%. PLS‐DA and NN achieved prediction accuracies of 94% and 88%, respectively. High sensitivity and specificity values were achieved for five of the seven substrates investigated. Copyright © 2015 John Wiley & Sons, Ltd.

18.
The insights and conclusions of this paper motivate efficient and numerically robust ‘new’ variants of algorithms for solving the single response partial least squares regression (PLS1) problem. Prototype MATLAB code for these variants is included in the Appendix. The analysis of and conclusions regarding PLS1 modelling are based on a rich and nontrivial application of numerous key concepts from elementary linear algebra. The investigation starts with a simple analysis of the nonlinear iterative partial least squares (NIPALS) PLS1 algorithm variant computing orthonormal scores and weights. A rigorous interpretation of the squared P‐loadings as the variable‐wise explained sum of squares is presented. We show that the orthonormal row‐subspace basis of W‐weights can be found from a recurrence equation. Consequently, the NIPALS deflation steps of the centered predictor matrix can be replaced by a corresponding sequence of Gram–Schmidt steps that compute the orthonormal column‐subspace basis of T‐scores from the associated non‐orthogonal scores. The transitions between the non‐orthogonal and orthonormal scores and weights (illustrated by an easy‐to‐grasp commutative diagram), respectively, are both given by QR factorizations of the non‐orthogonal matrices. The properties of singular value decomposition combined with the mappings between the alternative representations of the PLS1 ‘truncated’ X data (including P^t W) are taken to justify an invariance principle to distinguish between the PLS1 truncation alternatives. The fundamental orthogonal truncation of PLS1 is illustrated by a Lanczos bidiagonalization type of algorithm where the predictor matrix deflation is required to be different from the standard NIPALS deflation. A mathematical argument concluding the PLS1 inconsistency debate (published in 2009 in this journal) is also presented. Copyright © 2014 John Wiley & Sons, Ltd.

19.
Run to run (R2R) optimization based on unfolded Partial Least Squares (u‐PLS) is a promising approach for improving the performance of batch and fed‐batch processes as it is able to continuously adapt to changing processing conditions. Using this technique, the regression coefficients of PLS are used to modify the input profile of the process in order to optimize the yield. When this approach was initially proposed, it was observed that the optimization performed better when PLS was combined with a smoothing technique, in particular a sliding window filtering, which constrained the regression coefficients to be smooth. In the present paper, this result is further investigated and some modifications to the original approach are proposed. Also, the suitability of different smoothing techniques in combination with PLS is studied for both end‐of‐batch quality prediction and R2R optimization. The smoothing techniques considered in this paper include the original filtering approach, the introduction of smoothing constraints in the PLS calibration (Penalized PLS), and the use of functional analysis (Functional PLS). Two fed‐batch process simulators are used to illustrate the results. Copyright © 2015 John Wiley & Sons, Ltd.

20.
A PLS method for discriminant analysis of two‐class samples   Total citations: 4, self‐citations: 0, citations by others: 4
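A new PLS method for two‐class discriminant analysis problems is proposed. The method applies to the response function y a transformation similar to the sigmoid function used in neural network algorithms. A new optimization criterion can then be used to extract a set of mutually orthogonal PLS latent variables t1, t2, ..., which can be used to construct discriminant classification plots and to obtain a fairly satisfactory discriminant direction vector.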

