Similar documents
20 similar documents found.
1.
The topic of this paper is regression models based on designed experiments in which additional spectroscopic measurements are also available. The particular case considered involves two spectral blocks with no natural order: the blocks are parallel. Three methods are described that combine least squares regression on the design variables with PCA or PLS on the spectra. The properties of the methods are explored in two simulation studies based on real experiments. The results show that the methods perform equally well in prediction but differ in interpretability. One of the methods, LS‐ParPLS, is especially interesting with respect to interpretability because it splits the spectral information into two parts: information that is common to both blocks and information that is unique to each block. Copyright © 2008 John Wiley & Sons, Ltd.

2.
This paper presents a new approach to path modelling based on sequential multi‐block modelling with latent variables. The approach is exploratory and focused on interpretation. The method breaks with the standard tradition of estimating all paths in one single model. Instead, a separate model is estimated for each endogenous block. Each separate model is constructed by stepwise use of standard PLS regression on matrices that are orthogonalised with respect to each other. The approach has several advantages. It allows for different dimensionality within each block, and it is invariant to the relative weighting of the blocks. It is also based on simple, standard methodology that permits straightforward outlier detection, validation and interpretation. No convergence problems are involved, and the method can be used in situations with many more variables than samples. An application based on sensory analysis of wines is used to illustrate the method. Copyright © 2010 John Wiley & Sons, Ltd.
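The orthogonalisation step described above can be illustrated with a small sketch, assuming centred data. A later block X2 is stripped of everything already explained by a score vector t from an earlier block, X2_orth = X2 - t (t'X2)/(t't). A subsequent PLS model on X2_orth can then only pick up new information. The function name `orthogonalize_block` is hypothetical. Only one score vector is handled, and the PLS fits themselves are omitted.

```python
def orthogonalize_block(X, t):
    """Remove from every column of X its projection onto the score vector t.

    X is a list of rows (n samples x p variables); t has length n.
    Returns X - t (t'X) / (t't), whose columns are orthogonal to t.
    """
    n, p = len(X), len(X[0])
    tt = sum(ti * ti for ti in t)
    if tt == 0:
        raise ValueError("score vector must be non-zero")
    # Least squares coefficient of each column of X on t: c_j = t'x_j / t't
    c = [sum(t[i] * X[i][j] for i in range(n)) / tt for j in range(p)]
    return [[X[i][j] - t[i] * c[j] for j in range(p)] for i in range(n)]


# Usage: t is a score vector from a model of the first block, X2 the next block.
t = [1.0, -1.0, 2.0, 0.0]
X2 = [[2.0, 1.0], [0.0, 3.0], [1.0, 1.0], [4.0, -2.0]]
X2_orth = orthogonalize_block(X2, t)
```

With several score vectors from the first block, the same projection would be applied with the full score matrix T in place of t.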

3.
The extension of standard regression to the case of multiple regressor arrays is given via the Kronecker product. The method is illustrated using ordinary least squares regression (OLS) as well as two latent variable (LV) methods: principal component regression (PCR) and partial least squares regression (PLS). The method applied to PLS is denoted mrPLS. For the first LV, mrPLS was shown to explain as much or more variance than the comparable L‐partial least squares regression (L‐PLS) model. The same relationship holds when mrPLS is compared to PLS or n‐way partial least squares (N‐PLS) and the response array is 2‐way or 3‐way, respectively. In that setting, the regressor array corresponding to the first mode of the response array is 2‐way and the second‐mode regressor array is an identity matrix. In a comparison with N‐PLS using fragrance data, mrPLS proved superior in a validation sense when model selection was used. Although the focus is on 2‐way regressor arrays, the method can be applied to n‐way regressors via N‐PLS. Copyright © 2007 John Wiley & Sons, Ltd.
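The Kronecker formulation rests on a standard identity, shown here as general background rather than as the paper's own derivation. Let Y be n × m, X an n × p row regressor matrix, Z an m × q column regressor matrix, B a p × q coefficient matrix, and let vec stack columns. A bilinear model is then an ordinary linear regression on vec(Y). Assuming X and Z have full column rank, its OLS solution separates into two small inverses. Latent-variable versions such as PCR or PLS would regularise this step instead; they are not shown.

```latex
Y = X B Z^{\top} + E
\;\Longleftrightarrow\;
\operatorname{vec}(Y) = (Z \otimes X)\,\operatorname{vec}(B) + \operatorname{vec}(E),
\qquad
\hat{B} = (X^{\top}X)^{-1} X^{\top}\, Y\, Z\, (Z^{\top}Z)^{-1}.
```

With Z = I (the identity-matrix case mentioned in the abstract), this reduces to ordinary multivariate regression, Y = XB + E.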

4.
Partial Least Squares (PLS) is a wide class of regression methods that model relationships between sets of observed variables by means of latent variables. Specifically, PLS2 was developed to correlate two blocks of data. The X‐block represents the independent or explanatory variables, and the Y‐block represents the dependent or response variables. More recently, OPLS was introduced to further reduce model complexity by removing Y‐orthogonal sources of variation from X in the latent space. This improves data interpretation through the generated predictive latent variables. Nevertheless, the relationships between PLS2 and OPLS in the case of multiple Y‐responses have not yet been fully explored. With this perspective, and taking inspiration from some basic mathematical properties of PLS2, we present a novel and general approach: a post‐transformation of PLS2 (ptPLS2). It decomposes the latent space into orthogonal and predictive components while preserving the goodness of fit and predictive ability of PLS2. Additionally, we apply the ptPLS2 approach to two metabolomic data sets extracted from previously published studies and discuss its advantages in model interpretation compared with the ‘standard’ PLS approach. Copyright © 2016 John Wiley & Sons, Ltd.

5.
Optimized sample-weighted partial least squares (total citations: 2; self-citations: 0; citations by others: 2)
Lu Xu. Talanta, 2007, 71(2): 561-566
In ordinary multivariate calibration methods, once the calibration set has been chosen to build the model relating the dependent variables to the predictor variables, every sample in the calibration set contributes equally to the model. Differences in representativeness between samples are ignored. In this paper, a new multivariate regression method is proposed by introducing the concept of weighted sampling into partial least squares (PLS): optimized sample-weighted PLS (OSWPLS). OSWPLS differs from PLS in that it builds a new calibration set in which each sample of the original calibration set is weighted according to its representativeness, in order to improve the prediction ability of the algorithm. Particle swarm optimization (PSO), a recently suggested global optimization algorithm, is used to search for the sample weights that best optimize both the calibration of the original training set and the prediction of an independent validation set. The proposed method is applied to two real data sets and compared with PLS. The most significant improvement is obtained for the meat data, where the root mean squared error of prediction (RMSEP) is reduced from 3.03 to 2.35. For the fuel data, OSWPLS performs slightly better than, or no worse than, PLS for the prediction of the four analytes. The stability and efficiency of OSWPLS are also studied. The results demonstrate that the proposed method can obtain desirable results within a moderate number of PSO cycles.
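To show how PSO searches a continuous weight space, here is a minimal global-best particle swarm optimizer in plain Python. It is a generic sketch, not the paper's implementation. In OSWPLS the objective would be the calibration and validation error of a sample-weighted PLS model; that model is not reproduced here. The quadratic objective in the usage line is a hypothetical stand-in, and the parameter values (inertia 0.7, acceleration coefficients 1.5) are common textbook choices, not the paper's.

```python
import random


def pso_minimize(f, dim, bounds, n_particles=20, n_iter=200, seed=0,
                 inertia=0.7, c_personal=1.5, c_global=1.5):
    """Minimal global-best particle swarm optimization.

    f: objective taking a list of floats; bounds: (low, high) for every coordinate.
    Returns (best_position, best_value).
    """
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # best position seen by each particle
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (pbest[i][d] - pos[i][d])
                             + c_global * r2 * (gbest[d] - pos[i][d]))
                # Clamp the particle to the search box (e.g. weights in [0, 1]).
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Usage with a hypothetical stand-in objective over three sample weights in [0, 1].
target = [0.2, 0.9, 0.5]
best_w, best_val = pso_minimize(
    lambda w: sum((a - b) ** 2 for a, b in zip(w, target)), dim=3, bounds=(0.0, 1.0))
```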

6.
A novel projection modeling method for quantitative structure–activity relationship (QSAR) and quantitative structure–property relationship (QSPR) studies is developed in this paper. Orthogonalization of block variables is introduced to deal with the problem of variable selection. Projections based on least squares are used to construct the modeling space, in order to search for the best regression directions for chemical modeling. A suitable prediction space is further defined to confine the range of applicability of the model. Three real data sets were analyzed to check the performance of the proposed method. The results obtained from Monte‐Carlo cross‐validation (MCCV) showed that the proposed method might provide better results for QSAR and QSPR modeling than PCR and PLS with respect to both fitting and prediction abilities. Copyright © 2007 John Wiley & Sons, Ltd.

7.
The selection abilities of two well‐known variable selection techniques, synergy interval‐partial least‐squares (SiPLS) and genetic algorithm‐partial least‐squares (GA‐PLS), have been examined and compared. Different simulated and real (corn and metabolite) datasets were used, taking into account the spectral overlap of the components. With these, the study examined how selecting intervals of variables, rather than individual variables, influences prediction performance. In the simulated datasets, GA‐PLS gave better results as the spectral overlap of the components decreased and for components with narrow bands. In contrast, SiPLS performed better for data with intermediate overlap. For mixtures of highly overlapping analytes, GA‐PLS showed slightly better performance. However, no significant differences between the results of the two selection methods were observed in most cases. SiPLS gave slightly better prediction performance for the corn dataset, except for the prediction of moisture content, but the improvement over GA‐PLS was not significant. For real data with less overlapped components (the metabolite dataset), GA‐PLS, which tends to select far fewer variables, did not give significantly better root mean square error of cross‐validation (RMSECV), cross‐validated R2 (Q2), or root mean square error of prediction (RMSEP) than SiPLS. Irrespective of the type of dataset, GA‐PLS resulted in models with fewer latent variables (LVs). In terms of computational time, GA‐PLS is considered superior to SiPLS. Copyright © 2010 John Wiley & Sons, Ltd.
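SiPLS, as commonly described, divides the spectrum into equal-width intervals, fits PLS models on every combination of a small number of intervals, and keeps the combination with the lowest RMSECV. The sketch below covers only the candidate-generation part. The function names are hypothetical, and the PLS fitting and RMSECV scoring are left out.

```python
from itertools import combinations


def split_intervals(n_vars, n_intervals):
    """Split variable indices 0..n_vars-1 into contiguous, near-equal intervals."""
    base, extra = divmod(n_vars, n_intervals)
    intervals, start = [], 0
    for k in range(n_intervals):
        size = base + (1 if k < extra else 0)  # the first `extra` intervals get one more
        intervals.append(list(range(start, start + size)))
        start += size
    return intervals


def synergy_candidates(intervals, k):
    """Yield (interval ids, merged sorted variable indices) for every k-combination."""
    for combo in combinations(range(len(intervals)), k):
        yield combo, sorted(i for c in combo for i in intervals[c])


intervals = split_intervals(10, 4)                 # sizes 3, 3, 2, 2
candidates = list(synergy_candidates(intervals, 2))
# In SiPLS, each candidate variable set would now be scored by the RMSECV of a PLS model.
```

The number of candidates grows combinatorially with the number of intervals. This is consistent with the abstract's observation that SiPLS is the more computationally expensive method.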

8.
It is well known that the predictions of orthogonal projections to latent structures (OPLS) and of partial least squares regression (PLS1) are identical in the single‐response case. The present paper presents an approach to identifying the complete y‐orthogonal structure, starting from the viewpoint of standard PLS1 regression. Three alternative non‐deflating OPLS algorithms and a modified principal component analysis (PCA)‐driven method (including MATLAB code) are presented. The first algorithm implements a postprocessing routine for the standard PLS1 solution. In it, QR factorization applied to a shifted version of the non‐orthogonal scores is the key to expressing the OPLS solution. The second algorithm finds the OPLS model directly by an iterative procedure. By a rigorous mathematical argument, we explain that orthogonal filtering is a ‘built‐in’ property of the traditional PLS1 regression coefficients. Consequently, OPLS cannot improve predictions compared with PLS1, including predictions for new samples. The PCA‐driven method relies on the following fact: truncating one dimension from the row subspace of X yields a matrix X_orth with y‐orthogonal columns and a rank one less than the rank of X. The desired truncation corresponds exactly to the first X deflation step of Martens' non‐orthogonal PLS algorithm. The significant y‐orthogonal structure of X found by PCA of X_orth is split into two fundamental parts: one part that contributes significantly to correcting the first PLS score toward y, and one part that does not. The third and final OPLS algorithm presented is a modification of Martens' non‐orthogonal algorithm into an efficient dual PLS1–OPLS algorithm. Copyright © 2014 John Wiley & Sons, Ltd.
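The property behind the PCA-driven method can be checked numerically. Take the first PLS1 weight w = X'y/||X'y|| and the first score t = Xw, then form X_orth = X - t w', i.e. X(I - ww'). This gives X_orth'y = X'y - w(w'X'y) = X'y - w||X'y|| = 0, so every column is orthogonal to y. It also gives X_orth w = t - t = 0, so one dimension of the row space has been removed. The sketch below is a plain-Python illustration with a hypothetical function name and toy data; it is not the MATLAB code that accompanies the paper.

```python
import math


def first_deflation(X, y):
    """Return (w, X_orth) with w = X'y / ||X'y|| and X_orth = X - (X w) w'.

    X is a list of rows (n x p) and y a list of length n, normally mean-centred.
    """
    n, p = len(X), len(X[0])
    xty = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in xty))
    w = [v / norm for v in xty]                                     # first PLS1 weight
    t = [sum(X[i][j] * w[j] for j in range(p)) for i in range(n)]  # first score
    X_orth = [[X[i][j] - t[i] * w[j] for j in range(p)] for i in range(n)]
    return w, X_orth


# Toy data with mean-centred columns and response.
X = [[1.0, 2.0, -1.0], [-2.0, 0.5, 1.0], [0.5, -1.5, 2.0], [0.5, -1.0, -2.0]]
y = [1.0, -0.5, 0.5, -1.0]
w, X_orth = first_deflation(X, y)
```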

9.
10.
Partial least squares (PLS) is a widely used algorithm in the field of chemometrics. In calibration studies, a PLS variant called orthogonal projections to latent structures (O‐PLS) has been shown to reduce the number of model components while maintaining good prediction accuracy. However, no theoretical analysis exists demonstrating its applicability in this context. Using a discrete formulation of the linear mixture model known as Beer's law, we explicitly analyze the properties of O‐PLS solutions for calibration data. We find that, in the absence of noise and for large n, O‐PLS solutions are simpler than, but just as accurate as, PLS solutions for systems in which analyte and background concentrations are uncorrelated. However, the same is not true for the most general chemometric data, in which correlations between analyte and background concentrations are nonzero and pure profiles overlap. In such cases, forcing the removal of orthogonal components may actually degrade the interpretability of the model. This situation can also arise when the data are noisy and n is small: O‐PLS may then identify and model noise as orthogonal variation, because the noise is statistically uncorrelated with the analytes. Systems biology studies produce data in which the number of response variables may be much greater than the number of observations. For such data, we show that O‐PLS is unlikely to discover orthogonal variation whether or not it exists. In this case, O‐PLS and PLS solutions are the same. Copyright © 2011 John Wiley & Sons, Ltd.

11.
Advances in sensor systems have led to many industrial applications with large amounts of highly correlated data, particularly in chemical and pharmaceutical processes. With such data sets, it becomes important to use modeling approaches built to deal with correlated inputs. These approaches help identify the underlying sources of variability and how this variability affects the final quality of the product. In addition to being correlated, these data matrices commonly contain missing elements and noise. Latent variable regression methods such as partial least squares, or projection to latent structures (PLS), have gained much attention in industry for their ability to handle ill‐conditioned matrices with missing elements. This capability is provided by the nonlinear iterative partial least squares (NIPALS) algorithm, with a simple modification to account for missing data. In expectation maximization PLS (EM‐PLS), by contrast, imputed values are supplied as initial estimates for the missing elements. Conventional PLS is then applied to update these elements, and the process iterates to convergence. This study extends our previous work on principal component analysis (PCA), in which we introduced nonlinear programming (NLP) as a means of estimating the parameters of the PCA model. Here, we focus on the parameters of a PLS model. As an alternative to modified NIPALS and EM‐PLS, this paper presents an efficient NLP‐based technique for finding the parameters of a PLS model. In this technique, the desired properties of the parameters can be posed explicitly as constraints in the optimization problem. We also present a number of simulation studies comparing the effectiveness of the proposed algorithm with competing algorithms. Copyright © 2014 John Wiley & Sons, Ltd.
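The "simple modification" of NIPALS for missing data usually means that every sum runs only over observed entries. Each weight uses the observed entries of its column, and each score is a least squares fit of a row's observed entries onto the weights. The sketch below shows one PLS1 component under that common convention, with missing values marked as `None`. It is not the NLP method proposed in the paper, and the function name is hypothetical.

```python
import math


def nipals_pls1_first_component(X, y):
    """First PLS1 weight vector w and score vector t, skipping missing entries (None).

    X is a list of rows (n x p); y is a list of length n. Centring is not shown.
    """
    n, p = len(X), len(X[0])
    w = []
    for j in range(p):
        obs = [i for i in range(n) if X[i][j] is not None]
        # Regression of column j on y using only the observed rows.
        w.append(sum(X[i][j] * y[i] for i in obs) / sum(y[i] * y[i] for i in obs))
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    t = []
    for i in range(n):
        obs = [j for j in range(p) if X[i][j] is not None]
        # Least squares score from the observed entries of row i.
        t.append(sum(X[i][j] * w[j] for j in obs) / sum(w[j] ** 2 for j in obs))
    return w, t


y = [1.0, 0.0, 2.0]
w, t = nipals_pls1_first_component([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0]], y)
w_m, t_m = nipals_pls1_first_component([[1.0, None], [2.0, 1.0], [3.0, 4.0]], y)
```

With no missing values, the score reduces to the usual t = Xw because w has unit length.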

12.
This paper presents a multivariate regression method for the simultaneous determination of sugar (sucrose as a sugar equivalent) and ethanol concentrations in aqueous solutions via temperature‐dependent ultrasonic velocity. Several samples with different combinations of concentrations were exposed to temperatures ranging from 2 to 30 °C to investigate the temperature dependence of ultrasonic velocity. Model calibration was performed in order to predict the concentrations of interest. From the experimental results, equations for calculating unknown concentrations were derived by polynomial regression. This yielded two equations in which each concentration depends functionally on the other. Moreover, side effects and systematic errors remain in this model. To avoid such problems, and to reduce the absolute errors in determining unknown samples, multivariate regression methods such as partial least squares (PLS) were tested and compared with the results obtained by polynomial regression. On average, the accuracy achieved with the chemometric models was three times higher. For sucrose concentration, the prediction error averaged around 0.4 g/100 g in the regression model with polynomial background (RMPA) and around 0.12 g/100 g in the PLS model. For ethanol concentration, the corresponding errors were 0.13 and 0.04 g/100 g. Furthermore, the concentrations can be calculated without knowing the concentration of the other solute. Copyright © 2011 John Wiley & Sons, Ltd.

13.
The purpose of our study was to evaluate the most important factors affecting two sets of properties: the volumetric and conventional mechanical properties of the produced asphalt mix, and the volumetric properties of the built‐in asphalt layer. Asphalt mix design follows the standard procedure (the Marshall procedure). We were interested not only in the quantity of bitumen specified by the Marshall procedure but also in the quantities of the stone aggregate fractions, the production temperatures and the properties of the bitumen used. The influence of these factors was investigated with several models built from 444 asphalt samples analysed by one laboratory. To select the most important factors, several multiple linear regression (MLR) models, partial least squares (PLS) regression models and counterpropagation neural network models were built. The resulting models were tested with leave‐one‐out (LOO) and leave‐10%‐out cross‐validation procedures. The results of the MLR and PLS models show that the independent variables are closely related; of the 21 variables, only one was found to be less important. The MLR and PLS models show better predictive ability than the counterpropagation neural network models. The best MLR models will be used to prepare asphalt mix designs (recipes) involving some unknown material. Copyright © 2009 John Wiley & Sons, Ltd.
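The two validation schemes differ only in how samples are grouped into test sets. Each fold is predicted by a model fitted on the remaining samples, and the model fitting is not shown. The grouping below is an assumption: sample i goes to fold i % n_folds, which gives 10 folds of 44 or 45 samples for the 444 samples mentioned above. Random or contiguous splits are equally common, and the study's own grouping is not stated. The function name is hypothetical.

```python
def cv_folds(n_samples, n_folds=None):
    """Test-index lists for cross-validation.

    n_folds=None gives leave-one-out (one sample per fold).
    n_folds=10 gives leave-10%-out with interleaved assignment: sample i -> fold i % 10.
    """
    if n_folds is None:
        n_folds = n_samples
    return [list(range(f, n_samples, n_folds)) for f in range(n_folds)]


loo = cv_folds(444)            # 444 folds of one sample each
ten_pct = cv_folds(444, 10)    # 10 folds of 44 or 45 samples each
```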

14.
Analytical Letters, 2012, 45(12): 1910-1921
Multiblock partial least squares (MB-PLS) is applied to the analysis of corn and tobacco samples using near-infrared diffuse reflectance spectroscopy. In the model, the spectra are divided into several sub-blocks along the wavenumber axis, and a different number of latent variables is used for each sub-block. Compared with ordinary PLS, MB-PLS can balance the importance and contribution of each sub-block through the super-weights and the use of different numbers of latent variables. As a result, the predictions obtained with the MB-PLS model are superior to those of ordinary PLS, especially for the large tobacco data sets with many variables.
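The sub-block and super-weight idea can be sketched as a single pass of one MB-PLS component for a single response. The columns are split into contiguous wavenumber blocks, and each block gets its own weight vector and block score. The super-weights then measure how strongly each block score relates to y. Dividing each block score by the square root of the block width is one common convention, assumed here. This sketch also uses the same single component for every block, so it does not show the per-block choice of latent-variable numbers described in the abstract. All function names are hypothetical.

```python
import math


def split_blocks(X, n_blocks):
    """Split the columns of X (e.g. wavenumbers) into contiguous sub-blocks."""
    p = len(X[0])
    base, extra = divmod(p, n_blocks)
    blocks, start = [], 0
    for b in range(n_blocks):
        size = base + (1 if b < extra else 0)
        blocks.append([row[start:start + size] for row in X])
        start += size
    return blocks


def unit(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def mbpls_first_component(blocks, y):
    """One pass of an MB-PLS component for a single response y (sketch).

    Returns (block_scores, super_weights, super_score).
    """
    n = len(y)
    block_scores = []
    for Xb in blocks:
        m = len(Xb[0])
        wb = unit([sum(Xb[i][j] * y[i] for i in range(n)) for j in range(m)])
        # Block score divided by sqrt(block width) so wide blocks do not dominate.
        block_scores.append([sum(Xb[i][j] * wb[j] for j in range(m)) / math.sqrt(m)
                             for i in range(n)])
    # Super-weights: covariance of each block score with y, normalised.
    super_w = unit([sum(tb[i] * y[i] for i in range(n)) for tb in block_scores])
    super_t = [sum(super_w[b] * block_scores[b][i] for b in range(len(blocks)))
               for i in range(n)]
    return block_scores, super_w, super_t


X = [[1.0, 0.0, 2.0, 1.0, -1.0],
     [0.0, 1.0, 1.0, 2.0, 0.0],
     [-1.0, 2.0, 0.0, -1.0, 1.0],
     [2.0, -1.0, 1.0, 0.0, 2.0]]
y = [1.0, 0.0, -1.0, 2.0]
blocks = split_blocks(X, 2)          # block widths 3 and 2
block_scores, super_w, super_t = mbpls_first_component(blocks, y)
```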

15.
The nearest shrunken centroid (NSC) classifier has been successfully applied to class prediction in a wide range of studies based on microarray data. The contribution of seemingly irrelevant variables to the classifier is minimized by the so‐called soft‐thresholding property of the approach. In this paper, we first show that, for the two‐class prediction problem, the NSC classifier is similar to a one‐component discriminant partial least squares (PLS) model with soft‐shrinkage of the loading weights. We then introduce soft‐threshold PLS (ST‐PLS) as a general discriminant PLS model with soft‐thresholding of the loading weights of multiple latent components. This method is especially suited to classification and variable selection when the number of variables is large compared with the number of samples, which is typical of gene expression data. A characteristic feature of ST‐PLS is its ability to identify important variables in multiple directions in the variable space. Both the ST‐PLS and NSC classifiers are applied to four real data sets. The results indicate that ST‐PLS performs better than the shrunken centroid approach under two conditions: several directions in the variable space are important for classification, and there are strong dependencies between subsets of variables. Copyright © 2007 John Wiley & Sons, Ltd.
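Soft-thresholding shrinks every loading weight toward zero by a fixed amount δ and sets weights with |w_j| ≤ δ exactly to zero, i.e. sign(w_j) · max(|w_j| - δ, 0). The sketch applies it to a loading-weight vector and renormalises the result to unit length. Expressing δ relative to the largest |w_j| is an assumption made here for convenience. The function names are hypothetical, and the full multi-component ST-PLS fit is not reproduced.

```python
import math


def soft_threshold(w, delta):
    """Shrink each entry toward zero by delta; entries with |w_j| <= delta become 0."""
    return [math.copysign(max(abs(v) - delta, 0.0), v) for v in w]


def soft_thresholded_weights(w, delta):
    """Soft-threshold a loading-weight vector, then rescale it to unit length.

    delta is taken relative to max |w_j| (an assumption), so 0 <= delta < 1.
    """
    cutoff = delta * max(abs(v) for v in w)
    shrunk = soft_threshold(w, cutoff)
    norm = math.sqrt(sum(v * v for v in shrunk))
    return [v / norm for v in shrunk]


shrunk = soft_threshold([0.9, -0.3, 0.05, -0.6], 0.2)
sparse_w = soft_thresholded_weights([0.9, -0.3, 0.05, -0.6], 0.5)
```

Larger δ values give sparser weight vectors, which is what turns the shrinkage into variable selection.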

16.
A PLS model for predicting somatic cell count (SCC) from near-infrared (NIR) spectra of unhomogenized milk is presented. Samples of raw milk were collected from cows in the early lactation period (7th to 29th day after parturition). The NIR spectra were measured in the 400–1100 nm region. A fluoro-opto-electronic method was used as the reference method. Different preprocessing methods were investigated. A robust version of PLS regression was applied to handle outliers in the dataset. The uninformative variable elimination–partial least squares (UVE–PLS) method was used to eliminate uninformative variables. The final model is acceptable for predicting SCC in raw milk.
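In UVE-PLS, as commonly described, artificial noise variables are appended to X and PLS regression coefficients are computed repeatedly, for example leave-one-out. Each variable is then scored by its reliability, mean(b_j)/std(b_j). Real variables that are no more reliable than the noise variables are discarded. The sketch below implements only that final selection step on precomputed coefficient vectors. The cutoff (the largest absolute reliability among the noise variables) is one variant among several. The coefficient values and the function name are hypothetical, and the robust PLS step is not shown.

```python
import statistics


def uve_select(coef_runs, n_real):
    """Uninformative-variable elimination from repeated regression coefficients.

    coef_runs: list of coefficient vectors, one per leave-one-out PLS model; each
               holds n_real real variables followed by artificial noise variables.
    Returns indices of real variables whose |mean/stdev| exceeds the largest
    value found among the noise variables.
    """
    n_total = len(coef_runs[0])
    reliability = []
    for j in range(n_total):
        column = [run[j] for run in coef_runs]
        reliability.append(statistics.mean(column) / statistics.stdev(column))
    cutoff = max(abs(c) for c in reliability[n_real:])
    return [j for j in range(n_real) if abs(reliability[j]) > cutoff]


# Hypothetical coefficients: 3 real variables followed by 2 noise variables, 4 runs.
runs = [
    [1.0, 0.1, -0.5, 0.05, 0.2],
    [1.1, -0.2, -0.6, -0.05, 0.0],
    [0.9, 0.3, -0.4, 0.1, 0.1],
    [1.0, -0.1, -0.5, -0.1, 0.3],
]
kept = uve_select(runs, n_real=3)
```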

17.
18.
The on‐line monitoring of batch processes based on principal component analysis (PCA) has been widely studied. Researchers have paid less attention, however, to the on‐line application of partial least squares (PLS). This paper studies how several issues influence the predictive power of a PLS model for the on‐line estimation of key variables in a batch process. Some of the conclusions can help to better understand the capabilities of the approaches proposed for on‐line PCA‐based monitoring. The issues addressed include whether batch‐wise or variable‐wise unfolding is more suitable, the method used to impute future measurements, and the use of several sub‐models. This is the first time that the adaptive hierarchical (or multi‐block) approach has been extended to PLS modelling. The formulation of the so‐called trimmed scores regression (TSR), a powerful imputation method defined for PCA, is also extended for use with PLS modelling. Data from two processes, one simulated and one real, are used to illustrate the results. Copyright © 2008 John Wiley & Sons, Ltd.
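The two unfolding options turn a three-way batch array (I batches × J variables × K time points) into a matrix in different ways. Batch-wise unfolding produces one row per batch (I × JK), so a batch still in progress has missing future columns that must be imputed on-line. Variable-wise unfolding produces one row per batch and time point (IK × J). The sketch uses nested lists indexed as data[i][j][k]. The time-major column ordering in the batch-wise case is an assumption, and the function names are hypothetical.

```python
def unfold_batchwise(data):
    """I x J x K array (batches x variables x time) -> I x (K*J) matrix.

    Each row is one batch; the J variables for time k are placed together
    before those for time k + 1 (time-major ordering).
    """
    return [[batch[j][k] for k in range(len(batch[0])) for j in range(len(batch))]
            for batch in data]


def unfold_variablewise(data):
    """I x J x K array -> (I*K) x J matrix: one row per batch and time point."""
    return [[batch[j][k] for j in range(len(batch))]
            for batch in data for k in range(len(batch[0]))]


# Toy data: 2 batches x 3 variables x 4 time points, value = 100*i + 10*j + k.
data = [[[100 * i + 10 * j + k for k in range(4)] for j in range(3)] for i in range(2)]
bw = unfold_batchwise(data)      # 2 rows x 12 columns
vw = unfold_variablewise(data)   # 8 rows x 3 columns
```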

19.
Ni Xin, Qinghua Meng, Yizhen Li, Yuzhu Hu. Chinese Journal of Chemistry, 2011, 29(11): 2533-2540
This paper demonstrates the possibility of using near infrared (NIR) spectral similarity as a rapid method to estimate the quality of Flos Lonicerae. Variable selection together with modelling techniques is used to select representative variables for calculating the similarity. NIR spectra are used to build calibration models that predict the bacteriostatic activity of Flos Lonicerae, which is determined by an in vitro experiment. Models are built for both Gram‐positive and Gram‐negative bacteria. A genetic algorithm combined with partial least squares regression (GA‐PLS) is used to perform the calibration. The results of the GA‐PLS models are compared with those of interval partial least squares (iPLS), full‐spectrum PLS and full‐spectrum principal component regression (PCR) models. The variables in the two GA‐PLS models are then combined and used to calculate the NIR spectral similarity of samples. Similarities based on the characteristic variables and on the full spectrum are each used to evaluate the fingerprints of Flos Lonicerae. The results show that combining variable selection, modelling techniques and similarity analysis might provide a powerful tool for the quality control of traditional Chinese medicine (TCM).
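The abstract does not name the similarity measure. The sketch assumes cosine similarity (also called the congruence coefficient), one common choice for comparing spectral fingerprints. It can be computed either over the full spectrum or over a selected subset of variables. In this study, that subset would be the combined variables of the two GA-PLS models. The function name and the toy spectra are hypothetical.

```python
import math


def spectral_similarity(a, b, selected=None):
    """Cosine similarity of two spectra, optionally restricted to selected variable indices."""
    idx = range(len(a)) if selected is None else selected
    dot = sum(a[i] * b[i] for i in idx)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in idx))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in idx))
    return dot / (norm_a * norm_b)


a = [0.2, 0.5, 0.9, 0.4]
b = [0.4, 1.0, 1.8, 0.8]          # same shape as a, twice the intensity
sim_full = spectral_similarity(a, b)
# Two spectra that differ only outside the selected variables (index 1).
sim_sel = spectral_similarity([1.0, 5.0, 2.0], [1.0, -3.0, 2.0], selected=[0, 2])
```

Because cosine similarity ignores overall intensity, it compares spectral shape only. Whether that suits a quality-control fingerprint is a modelling decision the sketch does not settle.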

20.
We present the response‐oriented sequential alternation (ROSA) method for multiblock data analysis. ROSA is a novel and transparent multiblock extension of partial least squares regression (PLSR). Following a "winner takes all" approach, each component of the model is calculated from the block of predictors that most reduces the current residual error. The algorithm is computationally fast compared with other multiblock methods because orthogonal scores and loading weights are calculated without deflation of the predictor blocks. It therefore works effectively even when a large number of blocks is included. The ROSA method is invariant to block scaling and ordering. The ROSA model has the same attributes as PLSR (vectors of scores, loadings and loading weights), and it is identical to PLSR when there is only one block of predictors.
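The "winner takes all" step can be sketched for a single response, assuming centred data. At each component, every block proposes a score t = X_b X_b'r built from the current residual r. The proposal is orthogonalised against earlier scores by Gram-Schmidt and normalised, and the block whose score leaves the smallest residual sum of squares wins. The blocks themselves are never deflated. With one block, this reproduces the PLS1 score sequence, in line with the abstract's last sentence. This is a simplified reading of the abstract, not the published algorithm. Loadings and multi-response handling are omitted, and the function names are hypothetical.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rosa(blocks, y, n_components):
    """Sketch of response-oriented sequential alternation for a single response.

    blocks: list of X blocks (lists of rows with the same n); y: centred response.
    Returns (winning block index per component, orthonormal scores, final residual).
    """
    n = len(y)
    r = list(y)
    scores, winners = [], []
    for _ in range(n_components):
        best = None
        for b, Xb in enumerate(blocks):
            p = len(Xb[0])
            w = [sum(Xb[i][j] * r[i] for i in range(n)) for j in range(p)]  # weights X_b'r
            t = [sum(Xb[i][j] * w[j] for j in range(p)) for i in range(n)]
            for s in scores:  # Gram-Schmidt against earlier unit-length scores
                c = _dot(s, t)
                t = [ti - c * si for ti, si in zip(t, s)]
            norm = math.sqrt(_dot(t, t))
            if norm < 1e-12:  # this block has nothing new to offer
                continue
            t = [ti / norm for ti in t]
            coef = _dot(t, r)
            new_r = [ri - coef * ti for ri, ti in zip(r, t)]
            ss = _dot(new_r, new_r)
            if best is None or ss < best[0]:
                best = (ss, b, t, new_r)
        if best is None:
            break
        _, b, t, r = best
        winners.append(b)
        scores.append(t)
    return winners, scores, r


block_a = [[1.0], [-1.0], [0.0], [0.0]]
block_b = [[0.0], [0.0], [1.0], [-1.0]]
y = [1.0, -1.0, 2.0, -2.0]
winners, scores, residual = rosa([block_a, block_b], y, n_components=2)
```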


Copyright © Beijing Qinyun Technology Development Co., Ltd. (Beijing ICP registration No. 09084417)