Similar Documents (20 results)
1.
It is well known that the predictions of orthogonal projections to latent structures (OPLS) and partial least squares regression (PLS1) are identical in the single-response case. The present paper presents an approach to identifying the complete y-orthogonal structure by starting from the viewpoint of standard PLS1 regression. Three alternative non-deflating OPLS algorithms and a modified principal component analysis (PCA)-driven method (including MATLAB code) are presented. The first algorithm implements a post-processing routine of the standard PLS1 solution, where QR factorization applied to a shifted version of the non-orthogonal scores is the key to expressing the OPLS solution. The second algorithm finds the OPLS model directly by an iterative procedure. By a rigorous mathematical argument, we explain that orthogonal filtering is a 'built-in' property of the traditional PLS1 regression coefficients. Consequently, OPLS has no capability of improving the predictions (also for new samples) compared with PLS1. The PCA-driven method is based on the fact that truncating off one dimension from the row subspace of X results in a matrix X_orth with y-orthogonal columns and a rank one less than the rank of X. The desired truncation corresponds exactly to the first X-deflation step of Martens' non-orthogonal PLS algorithm. The significant y-orthogonal structure of X found by PCA of X_orth is split into two fundamental parts: one part that contributes significantly to correcting the first PLS score toward y, and one part that does not. The third and final OPLS algorithm presented is a modification of Martens' non-orthogonal algorithm into an efficient dual PLS1-OPLS algorithm. Copyright © 2014 John Wiley & Sons, Ltd.
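As a point of reference for the algorithms discussed above, the following is a minimal sketch of single-response PLS1 with orthogonal scores (NIPALS-style) in Python/NumPy; it is our own illustration, not the paper's MATLAB code, and it assumes X and y are centered.

```python
import numpy as np

def pls1(X, y, n_comp):
    """Minimal PLS1 (orthogonal-scores NIPALS) for a single response.

    Returns loading weights W, scores T, X-loadings P, y-loadings q,
    and the regression vector b. X and y are assumed centered.
    """
    X = X.copy().astype(float)
    y = y.astype(float).ravel()
    n, p = X.shape
    W = np.zeros((p, n_comp))
    T = np.zeros((n, n_comp))
    P = np.zeros((p, n_comp))
    q = np.zeros(n_comp)
    for a in range(n_comp):
        w = X.T @ y
        w /= np.linalg.norm(w)          # loading weight
        t = X @ w                       # score
        tt = t @ t
        p_a = X.T @ t / tt              # X-loading
        q[a] = y @ t / tt               # y-loading
        X -= np.outer(t, p_a)           # deflate X
        W[:, a], T[:, a], P[:, a] = w, t, p_a
    # regression vector expressed in terms of the original X
    b = W @ np.linalg.solve(P.T @ W, q)
    return W, T, P, q, b
```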

2.
The well-known Martens factorization for PLS1 produces a single y-related score, with all subsequent scores being y-unrelated. The X-explanatory value of these y-orthogonal scores can be summarized by a simple expression, which is analogous to the 'P' loading weights in the orthogonalized NIPALS algorithm. This can be used to rearrange the factorization into entirely y-related and y-unrelated parts. Systematic y-unrelated variation can thus be removed from the X data through a single post hoc calculation following conventional PLS, without any recourse to the orthogonal projections to latent structures (OPLS) algorithm. The work presented is consistent with the development by Ergon (PLS post-processing by similarity transformation (PLS + ST): a simple alternative to OPLS. J. Chemom. 2005; 19: 1-4), which shows that conventional PLS and OPLS are equivalent within a similarity transform. Copyright © 2009 John Wiley & Sons, Ltd.
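The y-related/y-unrelated split rests on the fact that a single deflation step of Martens' non-orthogonalized PLS1 already leaves a matrix whose columns are orthogonal to y. A minimal sketch of that first step (our own illustration, not the paper's summarizing expression):

```python
import numpy as np

def martens_first_deflation(X, y):
    """First deflation step of Martens' non-orthogonalized PLS1.

    With w proportional to X.T @ y one can verify that
    y.T @ (X - outer(t, w)) = 0: a single deflation leaves a matrix
    whose columns are all orthogonal to y, so the first score carries
    the entire y-related part of the factorization.
    """
    w = X.T @ y
    w /= np.linalg.norm(w)
    t = X @ w
    X_orth = X - np.outer(t, w)     # columns orthogonal to y
    return t, w, X_orth
```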

3.
Nine PLS1 algorithms were evaluated, primarily in terms of their numerical stability and secondarily in terms of their speed. There were six existing algorithms: (a) NIPALS by Wold; (b) the non-orthogonalized scores algorithm by Martens; (c) Bidiag2 by Golub and Kahan; (d) SIMPLS by de Jong; (e) improved kernel PLS by Dayal; and (f) PLSF by Manne. Three new algorithms were created: (g) direct-scores PLS1, based on a new recurrent formula for the calculation of basis vectors yielding scores directly from X and y; (h) Krylov PLS1, with its regression vector defined explicitly, using only the original X and y; (i) PLSPLS1, with its regression vector recursively defined from X and the regression vectors of its previous recursions. Data from IR and NIR spectrometers applied to food, agricultural, and pharmaceutical products were used to demonstrate the numerical stability. It was found that three methods (c, f, h) create regression vectors that do not closely resemble the corresponding precise PLS1 regression vectors. Because of this, their loading and score vectors were also concluded to be deviating, and their models of X and the corresponding residuals could be shown to be numerically suboptimal in a least squares sense. Methods (a, b, e, g) were the most stable. Two of them (e, g) were not only numerically stable but also much faster than methods (a, b). The fast method (d) and the moderately fast method (i) showed a tendency to become unstable at high numbers of PLS factors. Copyright © 2009 John Wiley & Sons, Ltd.
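Of the compared methods, Bidiag2 is the one most directly tied to the numerical-stability question, since the Lanczos recurrence loses orthogonality in finite precision unless reorthogonalization is added. A sketch of the recurrence, under the assumption that it is started from X.T @ y as in PLS1:

```python
import numpy as np

def bidiag2(X, y, n_comp):
    """Golub-Kahan (Lanczos) bidiagonalization started from X.T @ y.

    Produces bases V (row-space directions) and U (scores) plus the
    bidiagonal coefficients alpha, beta; without reorthogonalization
    the orthogonality of U and V degrades as n_comp grows.
    """
    n, p = X.shape
    V = np.zeros((p, n_comp))
    U = np.zeros((n, n_comp))
    alpha = np.zeros(n_comp)
    beta = np.zeros(n_comp)
    v = X.T @ y
    v /= np.linalg.norm(v)
    u = X @ v
    alpha[0] = np.linalg.norm(u)
    u /= alpha[0]
    V[:, 0], U[:, 0] = v, u
    for k in range(1, n_comp):
        v = X.T @ u - alpha[k - 1] * v
        beta[k] = np.linalg.norm(v)
        v /= beta[k]
        u = X @ v - beta[k] * u
        alpha[k] = np.linalg.norm(u)
        u /= alpha[k]
        V[:, k], U[:, k] = v, u
    return U, V, alpha, beta
```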

4.
The issue of outer model weight updating is important in extending partial least squares (PLS) regression to modelling data that shows significant non-linearity. This paper presents a novel co-evolutionary component approach to the weight updating problem. Specification of the non-linear PLS model is achieved using an evolutionary computational (EC) method that can co-evolve all non-linear inner models and all input projection weights simultaneously. In this method, modular symbolic non-linear equations are used to represent the inner models and binary sequences are used to represent the projection weights. The approach is flexible, and other representations could be employed within the same co-evolutionary framework. The potential of these methods is illustrated using a simulated pH neutralisation process data set exhibiting significant non-linearity. It is demonstrated that the co-evolutionary component architecture can produce results which are competitive with non-linear neural network-based PLS algorithms that use iterative projection weight updating. In addition, a data sampling method for mitigating overfitting to the training data is described. Copyright © 2007 John Wiley & Sons, Ltd.

5.
This paper presents a modified version of the NIPALS algorithm for PLS regression with a single response variable. This version, denoted CF-PLS, provides significant advantages over standard PLS. First, it strongly reduces the over-fitting of the regression. Second, under the null hypothesis R² follows a Beta distribution that depends only on the number of observations, which allows the use of a probabilistic framework to test the validity of a component. Third, the models generated with CF-PLS have prediction ability comparable to, if not better than, that of models fitted with NIPALS. Finally, the scores and loadings of CF-PLS are directly related to the R², which makes the model and its interpretation more reliable. Copyright © 2011 John Wiley & Sons, Ltd.
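The Beta-distribution test can be sketched as follows, assuming the classical null law Beta(1/2, (n-2)/2) for the squared correlation between a score and an independent response; the exact parameters used by CF-PLS may differ, so this is only an illustration of the probabilistic framework.

```python
import numpy as np
from scipy.stats import beta

def component_pvalue(t, y):
    """P-value for a single component score t against response y.

    Assumption: under H0 (no association) the squared correlation of
    two independent length-n vectors follows Beta(1/2, (n-2)/2); the
    actual CF-PLS null law may use different Beta parameters.
    """
    n = len(y)
    r2 = np.corrcoef(t, y)[0, 1] ** 2
    return beta.sf(r2, 0.5, (n - 2) / 2)
```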

6.
Ideally, the score vectors numerically computed by an orthogonal-scores partial least squares (PLS) algorithm should be orthogonal close to machine precision. However, this is not ensured without taking special precautions. The progressive loss of orthogonality with an increasing number of components is illustrated for two widely used PLS algorithms: one that can be considered a standard PLS algorithm, and SIMPLS. It is shown that the original standard PLS algorithm outperforms the original SIMPLS in terms of numerical stability. However, SIMPLS is confirmed to perform much better in terms of speed. We have investigated reorthogonalization as the special precaution that ensures orthogonality close to machine precision. Since the increase in computing time is relatively small for SIMPLS, we recommend SIMPLS with reorthogonalization. Copyright © 2008 John Wiley & Sons, Ltd.
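Reorthogonalizing each new score vector against the previously computed ones is the standard form of this precaution; a minimal sketch (assuming the earlier scores are stored as orthonormal columns):

```python
import numpy as np

def reorthogonalize(t, T_prev):
    """Re-orthogonalize a new score vector against earlier scores.

    T_prev is assumed to have orthonormal columns. Classical
    Gram-Schmidt applied twice ("twice is enough") restores
    orthogonality close to machine precision at the cost of two extra
    matrix-vector products per component.
    """
    for _ in range(2):
        t = t - T_prev @ (T_prev.T @ t)
    return t
```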

7.
An evaluation of the computational performance and precision, with respect to cross-validation error, of five partial least squares (PLS) algorithms (NIPALS, modified NIPALS, Kernel, SIMPLS, and bidiagonal PLS), all available and widely used in the literature, is presented. When dealing with large data sets, computational time is an important issue, mainly in cross-validation and variable selection. In the present paper, the PLS algorithms are compared in terms of run time and the relative error in precision obtained when performing leave-one-out cross-validation on simulated and real data sets. The simulated data sets were investigated through factorial and Latin square experimental designs. The evaluations were based on the number of rows, the number of columns, and the number of latent variables. With respect to performance, the results for both simulated and real data sets show that the differences in run time are statistically significant. Bidiagonal PLS is the fastest algorithm, followed by Kernel and SIMPLS. Regarding cross-validation error, all algorithms showed similar results. However, in some situations, for example when many latent variables were in question, discrepancies were observed, especially with respect to SIMPLS. Copyright © 2010 John Wiley & Sons, Ltd.
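A leave-one-out cross-validation harness of the kind used in such comparisons might look as follows; fit_predict is a hypothetical interface wrapping whichever PLS implementation is being timed.

```python
import time
import numpy as np

def loo_cv(X, y, n_comp, fit_predict):
    """Leave-one-out cross-validation error and wall-clock time.

    fit_predict(X_train, y_train, X_test, n_comp) -> prediction for
    X_test is a hypothetical interface; any of the compared PLS
    implementations could be wrapped to match it.
    """
    n = len(y)
    press = 0.0
    start = time.perf_counter()
    for i in range(n):
        keep = np.arange(n) != i
        y_hat = fit_predict(X[keep], y[keep], X[i:i + 1], n_comp)
        press += (y[i] - float(np.ravel(y_hat)[0])) ** 2
    rmsecv = np.sqrt(press / n)
    return rmsecv, time.perf_counter() - start
```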

8.
Advances in sensory systems have led to many industrial applications with large amounts of highly correlated data, particularly in chemical and pharmaceutical processes. With these correlated data sets, it becomes important to consider advanced modeling approaches built to deal with correlated inputs, in order to understand the underlying sources of variability and how this variability will affect the final quality of the product. In addition to the correlated nature of the data sets, it is also common to find missing elements and noise in these data matrices. Latent-variable regression methods such as partial least squares or projection to latent structures (PLS) have gained much attention in industry for their ability to handle ill-conditioned matrices with missing elements. This feature of the PLS method is accomplished through the nonlinear iterative PLS (NIPALS) algorithm, with a simple modification to account for the missing data. Moreover, in expectation-maximization PLS (EM-PLS), imputed values are provided for missing data elements as initial estimates, conventional PLS is then applied to update these elements, and the process iterates to convergence. This study is an extension of previous work on principal component analysis (PCA), where we introduced nonlinear programming (NLP) as a means to estimate the parameters of the PCA model. Here, we focus on the parameters of a PLS model. As an alternative to modified NIPALS and EM-PLS, this paper presents an efficient NLP-based technique to find the model parameters for PLS, in which the desired properties of the parameters can be explicitly posed as constraints in the optimization problem of the proposed algorithm. We also present a number of simulation studies, in which we compare the effectiveness of the proposed algorithm with that of competing algorithms. Copyright © 2014 John Wiley & Sons, Ltd.
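The EM-PLS iteration described above can be sketched as follows; fit_pls is a hypothetical callback returning scores and loadings (for instance, an adaptation of the PLS1 sketch under item 1), and the convergence handling is simplified.

```python
import numpy as np

def em_pls(X, y, n_comp, fit_pls, n_iter=50, tol=1e-8):
    """EM-style PLS for an X containing NaNs (a sketch).

    fit_pls(X, y, n_comp) must return (T, P) such that T @ P.T
    reconstructs the modeled part of X. Missing cells start at the
    column means and are refilled from the model until convergence.
    """
    X = X.copy().astype(float)
    miss = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X[miss] = np.take(col_means, np.where(miss)[1])  # initial estimates
    for _ in range(n_iter):
        T, P = fit_pls(X, y, n_comp)
        X_hat = T @ P.T
        delta = np.max(np.abs(X[miss] - X_hat[miss])) if miss.any() else 0.0
        X[miss] = X_hat[miss]           # refill missing cells from model
        if delta < tol:
            break
    return X, T, P
```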

9.
Within the framework of nonlinear partial least squares (PLS), the quadratic PLS regression approach, involving both linear and quadratic terms in the criterion, is discussed. A new algorithm for the determination of the components is proposed, and its advantages over the original algorithm are outlined. The approach is illustrated on simulated and real data. Copyright © 2012 John Wiley & Sons, Ltd.
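The defining change relative to linear PLS is the quadratic inner relation between the X-score t and the Y-score u; a minimal sketch of fitting it (our illustration, not the paper's component algorithm):

```python
import numpy as np

def quadratic_inner_fit(t, u):
    """Fit the quadratic inner relation u ~ c0 + c1*t + c2*t**2.

    In quadratic PLS the linear inner regression of standard PLS is
    replaced by this polynomial; np.polyfit returns [c2, c1, c0].
    """
    c2, c1, c0 = np.polyfit(t, u, 2)
    u_hat = c0 + c1 * t + c2 * t ** 2
    return (c0, c1, c2), u_hat
```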

10.
The nearest shrunken centroid (NSC) classifier has been successfully applied for class prediction in a wide range of studies based on microarray data. The contribution of seemingly irrelevant variables to the classifier is minimized by the so-called soft-thresholding property of the approach. In this paper, we first show that for the two-class prediction problem, the NSC classifier is similar to a one-component discriminant partial least squares (PLS) model with soft shrinkage of the loading weights. We then introduce soft-threshold-PLS (ST-PLS) as a general discriminant-PLS model with soft-thresholding of the loading weights of multiple latent components. This method is especially suited for classification and variable selection when the number of variables is large compared with the number of samples, which is typical for gene expression data. A characteristic feature of ST-PLS is the ability to identify important variables in multiple directions in the variable space. Both the ST-PLS and the NSC classifiers are applied to four real data sets. The results indicate that ST-PLS performs better than the shrunken centroid approach if there are several directions in the variable space that are important for classification, and there are strong dependencies between subsets of variables. Copyright © 2007 John Wiley & Sons, Ltd.
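The soft-thresholding of a loading-weight vector can be sketched as follows; the threshold delta would in practice be selected, for example, by cross-validation.

```python
import numpy as np

def soft_threshold_weights(w, delta):
    """Soft-threshold a PLS loading-weight vector (ST-PLS style).

    Coefficients within delta of zero are removed entirely, the rest
    are shrunk toward zero, and the vector is renormalized, so that
    seemingly irrelevant variables drop out of the component.
    """
    w = np.sign(w) * np.maximum(np.abs(w) - delta, 0.0)
    norm = np.linalg.norm(w)
    return w / norm if norm > 0 else w
```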

11.
We present the response-oriented sequential alternation (ROSA) method for multiblock data analysis. ROSA is a novel and transparent multiblock extension of partial least squares regression (PLSR). Following a "winner takes all" approach, each component of the model is calculated from the block of predictors that most reduces the current residual error. The suggested algorithm is computationally fast compared with other multiblock methods because orthogonal scores and loading weights are calculated without deflation of the predictor blocks. Therefore, it can work effectively even with a large number of blocks included. The ROSA method is invariant to block scaling and ordering. The ROSA model has the same attributes (vectors of scores, loadings, and loading weights) as PLSR and is identical to PLSR modeling in the case of only one block of predictors.
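A sketch of the "winner takes all" step under our reading of the description above: each block proposes a PLS-type component, candidate scores are kept orthogonal to earlier ones, and the block leaving the smallest residual in y wins. This is an illustration, not the published ROSA algorithm.

```python
import numpy as np

def rosa_select_block(blocks, y_res, T_prev):
    """One ROSA-style component: pick the block that best reduces the
    current response residual y_res.

    blocks is a list of (centered) predictor matrices; T_prev holds the
    orthonormal scores of earlier components (or None). No block is
    deflated, matching the description above.
    """
    best = None
    for k, Xb in enumerate(blocks):
        w = Xb.T @ y_res
        w /= np.linalg.norm(w)
        t = Xb @ w
        if T_prev is not None and T_prev.size:
            t = t - T_prev @ (T_prev.T @ t)   # keep scores orthogonal
        t /= np.linalg.norm(t)
        res = np.linalg.norm(y_res - t * (t @ y_res))
        if best is None or res < best[0]:
            best = (res, k, w, t)
    return best  # (residual norm, winning block index, weights, score)
```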

12.
Squared prediction errors (SPE) are discussed in relation to the conventional PLSR versus bidiagonalization model and algorithm issue concerning residual and prediction consistency, with a focus on process monitoring and fault detection. Our analysis leads to the conclusion that conventional PLSR based on the NIPALS algorithm gives ambiguous SPE values for process faults. The basic reason for this is that the sample residuals are not found as projections onto the orthogonal complement of the space where the scores and the regression solution are located, and where the statistical limit is also defined. The alternative non-orthogonalized PLSR and bidiagonalization (Bidiag2) algorithms, as well as a simple re-formulation of the NIPALS algorithm (RE-PLSR), give unambiguous SPE values, and the last two of these also retain orthogonal score vectors. While the prediction results from all of these methods are in theory identical, our conclusion is that methods where the predicted response and SPE values for process faults are uncorrelated should be preferred. Tests with added errors on real data do not indicate that this conclusion should be altered because of such errors. Copyright © 2011 John Wiley & Sons, Ltd.
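For concreteness, one common way of computing SPE values from a PLS model is sketched below; the paper's point is precisely that different algorithms place the residual in different subspaces, so this definition is not unambiguous across implementations.

```python
import numpy as np

def spe(X_new, R, P):
    """Squared prediction errors for new samples under a PLS model.

    Scores are formed with a weight matrix R such that T = X @ R, and
    the residual is taken against the loading reconstruction T @ P.T;
    SPE is the row-wise squared norm of that residual.
    """
    T = X_new @ R
    E = X_new - T @ P.T
    return np.sum(E ** 2, axis=1)
```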

13.
The on-line monitoring of batch processes based on principal component analysis (PCA) has been widely studied. Nonetheless, researchers have paid less attention to the on-line application of partial least squares (PLS). In this paper, the influence of several issues on the predictive power of a PLS model for the on-line estimation of key variables in a batch process is studied. Some of the conclusions can help to better understand the capabilities of the proposals presented for on-line PCA-based monitoring. Issues such as the choice between batch-wise and variable-wise unfolding, the method for the imputation of future measurements, and the use of several sub-models are addressed. This is the first time that the adaptive hierarchical (or multi-block) approach is extended to PLS modelling. Also, the formulation of the so-called trimmed scores regression (TSR), a powerful imputation method defined for PCA, is extended for application with PLS modelling. Data from two processes, one simulated and one real, are used to illustrate the results. Copyright © 2008 John Wiley & Sons, Ltd.
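A sketch of TSR-style score estimation for a sample whose future measurements are still missing, following the usual trimmed-scores construction (variable names are ours, and the PLS-specific extension in the paper may differ in detail):

```python
import numpy as np

def tsr_impute_scores(x_obs, obs_idx, P, T_cal, X_cal):
    """Trimmed scores regression (TSR) score estimation (a sketch).

    Trimmed scores use only the observed variables; a regression fitted
    on the calibration data maps trimmed scores to full scores, which
    is then applied to the partially observed new sample.
    """
    P_obs = P[obs_idx, :]                    # loadings of observed variables
    T_trim = X_cal[:, obs_idx] @ P_obs       # calibration trimmed scores
    B, *_ = np.linalg.lstsq(T_trim, T_cal, rcond=None)
    tau = x_obs @ P_obs                      # new sample's trimmed score
    return tau @ B                           # estimated full score vector
```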

14.
Cheng Zhong, Zhu Aishi. Chinese Journal of Analytical Chemistry (分析化学), 2008, 36(6): 788-792
To address the characteristic problems of spectral data (broad peaks, pronounced local effects, noise, a large number of variables, and often severe multicollinearity among them), an improved local correction method for spectral data is designed: a piecewise orthogonal signal correction (OSC) based on window smoothing, combined with partial least squares (PLS) regression for preprocessing and quantitative analysis of the spectra. The orthogonal components to be filtered out are initialized with the NIPALS algorithm, and orthogonal signal correction is carried out wavelength point by wavelength point in a nearest-neighbor piecewise fashion. The denoised spectral matrix is then taken as the new predictor matrix, and a calibration model relating it to the property variable is built by PLS regression. Application to near-infrared diffuse reflectance spectra of wheat shows that the method yields stable estimates of the orthogonal components and clear noise removal; its predictive performance is better than that of the other methods considered, fewer PLS components are needed, and the resulting model is more parsimonious.
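The building block of the method is a single orthogonal signal correction (OSC) component; a minimal sketch of plain OSC (not the paper's windowed, piecewise variant), assuming centered X and y:

```python
import numpy as np

def osc_component(X, y):
    """Remove one orthogonal-signal-correction component from X.

    The leading principal-component score of X is orthogonalized
    against y, and the corresponding structure is deflated from X; the
    piecewise, window-smoothed variant applies this idea segment by
    segment along the wavelength axis.
    """
    # leading principal-component score of X
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    t = X @ Vt[0]
    # make the score orthogonal to y
    t = t - y * (y @ t) / (y @ y)
    p = X.T @ t / (t @ t)
    return X - np.outer(t, p), t, p
```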

15.
Target projection (TP), also called target rotation (TR), was introduced to facilitate interpretation of latent-variable regression models. Orthogonal partial least squares (OPLS) regression and PLS post-processing by similarity transform (PLS + ST) represent two alternative algorithms for the same purpose. In addition, OPLS and PLS + ST provide components to explain systematic variation in X orthogonal to the response. We show that, for the same number of components, OPLS and PLS + ST provide score and loading vectors for the predictive latent variable that are the same as for TP except for a scaling factor. Furthermore, we show how the TP approach can be extended to become a hybrid of latent-variable (LV) regression and exploratory LV analysis and thus embrace systematic variation in X unrelated to the response. Principal component analysis (PCA) of the residual variation after removal of the target component is here used to extract the orthogonal components, but X-tended TP (XTP) permits other criteria for decomposition of the residual variation. If PCA is used for decomposing the orthogonal variation in XTP, the variance of the major orthogonal components obtained for OPLS and XTP is observed to be almost the same, showing the close relationship between the methods. The XTP approach is tested and compared with OPLS for a three-component mixture analyzed by infrared spectroscopy and a multicomponent mixture measured by near-infrared spectroscopy in a reactor. Copyright © 2008 John Wiley & Sons, Ltd.
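A sketch of TP and of the XTP-style extension with PCA of the residual, under the notation that b is the model's regression vector; this follows the general construction described above rather than the paper's exact implementation.

```python
import numpy as np

def target_projection(X, b):
    """Target projection (target rotation) of a latent-variable model.

    The predictive score is the projection of X onto the normalized
    regression vector; its loading summarizes the X-variation used
    for prediction.
    """
    v = b / np.linalg.norm(b)
    t_tp = X @ v
    p_tp = X.T @ t_tp / (t_tp @ t_tp)
    return t_tp, p_tp

def xtp(X, b, n_orth):
    """XTP-style split: target component plus PCA of the residual."""
    t_tp, p_tp = target_projection(X, b)
    E = X - np.outer(t_tp, p_tp)             # response-unrelated residual
    U, S, Vt = np.linalg.svd(E, full_matrices=False)
    T_orth = U[:, :n_orth] * S[:n_orth]      # orthogonal scores
    P_orth = Vt[:n_orth].T                   # orthogonal loadings
    return t_tp, p_tp, T_orth, P_orth
```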

16.
A modified partial least squares (PLS) algorithm is presented on the basis of a novel weight-updating strategy. The new weights can handle situations in which directions in the X space have large variance unrelated to Y, for which linear PLS may not work well. In the proposed algorithm, the slice transform technique is introduced to provide a piecewise linear representation of the weight vectors. The corresponding mapping functions are then estimated by a least squares criterion on the inner relation between the observed variables and the score of the response variables. Finally, the weight vectors are updated by the obtained mapping functions, and the corresponding scores and loadings are calculated with the new weights. An optimal piecewise linear replacement of the PLS weights is thus achieved. The predictive performances of the new approach and other methods are compared statistically using the Wilcoxon signed rank test. Experimental results show that the proposed method can achieve simpler models, while model performance is at least comparable with that of PLS and the other methods. Copyright © 2012 John Wiley & Sons, Ltd.

17.
PLS works     
In a recent paper, claims were made that most current implementations of PLS provide wrong and misleading residuals [1]. In this paper, the relation between PLS and Lanczos bidiagonalization is described, and it is shown that there is a good rationale behind current implementations of PLS. Most importantly, the residuals determined in current implementations of PLS are independent of the scores used for predicting the dependent variable(s). By contrast, in the newly suggested approach, the residuals are correlated with the scores and may hence be large because of variation that is actually used for prediction. It is concluded that the current practice of calculating residuals should be maintained. Copyright © 2008 John Wiley & Sons, Ltd.

18.
Processing plants can produce large amounts of data that process engineers use for analysis, monitoring, or control. Principal component analysis (PCA) is well suited to analyzing large amounts of (possibly) correlated data and to reducing the dimensionality of the variable space. Failing online sensors, lost historical data, or missing experiments can lead to data sets that have missing values, for which the current methods for obtaining the PCA model parameters may give questionable results due to the properties of the estimated parameters. This paper proposes a method based on nonlinear programming (NLP) techniques to obtain the parameters of PCA models in the presence of incomplete data sets. We show the relationship that exists between the nonlinear iterative partial least squares (NIPALS) algorithm and the optimality conditions of the squared-residuals minimization problem, and how this leads to the modified NIPALS used for the missing-value problem. Moreover, we compare the current NIPALS-based methods with the proposed NLP approach on a simulation example and an industrial case study, and show that the latter is better suited when there are large amounts of missing values. The solutions obtained with the NLP and the iterative algorithm (IA) are very similar. However, when using the NLP-based method, the loadings and scores are guaranteed to be orthogonal, and the scores will have zero mean. The latter is emphasized in the industrial case study. With the industrial data used here, we are also able to show that the models obtained with the NLP were easier to interpret and required many fewer iterations to obtain. Copyright © 2010 John Wiley & Sons, Ltd.
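A sketch of the general NLP idea for missing-data PCA (not the paper's exact formulation or constraint set): minimize the squared residuals over the observed cells only, then orthogonalize the solution.

```python
import numpy as np
from scipy.optimize import minimize

def nlp_pca_missing(X, n_comp, seed=0):
    """PCA of a matrix with NaNs via general-purpose optimization.

    The objective sums squared residuals over observed cells only;
    orthogonality of the result is enforced afterwards by an SVD of
    the low-rank fit. This illustrates the NLP idea only; the paper
    poses such properties directly as constraints.
    """
    n, p = X.shape
    mask = ~np.isnan(X)
    Xf = np.where(mask, X, 0.0)
    rng = np.random.default_rng(seed)
    z0 = rng.standard_normal(n * n_comp + p * n_comp) * 0.1

    def objective(z):
        T = z[:n * n_comp].reshape(n, n_comp)
        P = z[n * n_comp:].reshape(p, n_comp)
        R = (Xf - T @ P.T) * mask        # residuals on observed cells only
        return np.sum(R ** 2)

    res = minimize(objective, z0, method="L-BFGS-B")
    T = res.x[:n * n_comp].reshape(n, n_comp)
    P = res.x[n * n_comp:].reshape(p, n_comp)
    # orthogonalize scores and loadings via an SVD of the fitted matrix
    U, S, Vt = np.linalg.svd(T @ P.T, full_matrices=False)
    return U[:, :n_comp] * S[:n_comp], Vt[:n_comp].T
```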

19.
Several approaches for investigating the relationships between two datasets in which the individuals are structured into groups are discussed. These strategies fit within the framework of partial least squares (PLS) regression. Each strategy of analysis is introduced on the basis of a maximization criterion involving the covariances between components associated with the groups of individuals in each dataset. Thereafter, algorithms are proposed to solve these maximization problems. The strategies of analysis can be considered as extensions of multi-group principal component analysis to the context of PLS regression. Copyright © 2014 John Wiley & Sons, Ltd.

20.
An extension of standard regression to the case of multiple regressor arrays is given via the Kronecker product. The method is illustrated using ordinary least squares regression (OLS) as well as the latent-variable (LV) methods principal component regression (PCR) and partial least squares regression (PLS). Denoting the method applied to PLS as mrPLS, the latter was shown to explain as much or more variance for the first LV relative to the comparable L-partial least squares regression (L-PLS) model. The same relationship holds when mrPLS is compared with PLS or n-way partial least squares (N-PLS) and the response array is 2-way or 3-way, respectively, where the regressor array corresponding to the first mode of the response array is 2-way and the second-mode regressor array is an identity matrix. In a comparison with N-PLS using fragrance data, mrPLS proved superior in a validation sense when model selection was used. Though the focus is on 2-way regressor arrays, the method can be applied to n-way regressors via N-PLS. Copyright © 2007 John Wiley & Sons, Ltd.
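The Kronecker-product construction for two regressor arrays can be sketched as follows (a generic illustration of the construction, not mrPLS itself); note that forming the Kronecker product explicitly is memory-hungry for large arrays.

```python
import numpy as np

def kronecker_ols(A, B, Y):
    """OLS with two regressor arrays combined via the Kronecker product.

    For the bilinear model Y = A @ X @ B.T, column-stacking gives
    vec(Y) = kron(B, A) @ vec(X); A acts on the first mode of Y and B
    on the second. With B = I this reduces to ordinary column-wise
    regression on A, matching the identity-matrix case above.
    """
    Z = np.kron(B, A)                    # combined regressor matrix
    y_vec = Y.reshape(-1, order="F")     # column-stacked vec(Y)
    beta, *_ = np.linalg.lstsq(Z, y_vec, rcond=None)
    return beta
```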
