Similar literature
 20 similar documents found
1.
The presence of multicollinearity in regression data is no exception in real life examples. Instead of applying ordinary regression methods, biased regression techniques such as principal component regression and ridge regression have been developed to cope with such datasets. In this paper, we consider partial least squares (PLS) regression by means of the SIMPLS algorithm. Because the SIMPLS algorithm is based on the empirical variance-covariance matrix of the data and on least squares regression, outliers have a damaging effect on the estimates. To reduce this pernicious effect of outliers, we propose to replace the empirical variance-covariance matrix in SIMPLS by a robust covariance estimator. We derive the influence function of the resulting PLS weight vectors and the regression estimates, and conclude that they will be bounded if the robust covariance estimator has a bounded influence function. Also the breakdown value is inherited from the robust estimator. We illustrate the results using the MCD estimator and the reweighted MCD estimator (RMCD) for low-dimensional datasets. Also some empirical properties are provided for a high-dimensional dataset.

2.
Two novel algorithms which employ the idea of stacked generalization or stacked regression, stacked partial least squares (SPLS) and stacked moving‐window partial least squares (SMWPLS) are reported in the present paper. The new algorithms establish parallel, conventional PLS models based on all intervals of a set of spectra to take advantage of the information from the whole spectrum by incorporating parallel models in a way to emphasize intervals highly related to the target property. It is theoretically and experimentally illustrated that the predictive ability of these two stacked methods combining all subsets or intervals of the whole spectrum is never poorer than that of a PLS model based only on the best interval. These two stacking algorithms generate more parsimonious regression models with better predictive power than conventional PLS, and perform best when the spectral information is neither isolated to a single, small region, nor spread uniformly over the response. A simulation data set is employed in this work not only to demonstrate this improvement, but also to demonstrate that stacked regressions have the potential capability of predicting property information from an outlier spectrum in the prediction set. Moisture, oil, protein and starch in Cargill corn samples have been successfully predicted by these new algorithms, as well as hydroxyl number for different instruments of terpolymer samples including and excluding an outlier spectrum. Copyright © 2009 John Wiley & Sons, Ltd.

3.
The work summarized in this paper presents the first part of a three‐paper series on robust partial least squares (RPLS) regression. Motivated by recent research activities in this area, this part provides a detailed algorithmic analysis of associated techniques, showing that existing work (i) may not represent a true robust formulation of partial least squares (PLS), (ii) may lead to convergence problems or (iii) may be insensitive to a certain type of outlier. On the basis of this analysis, Part I introduces a new conceptual RPLS algorithm that overcomes the deficiencies of existing work. The second part of this work details this new RPLS technique, compares its performance with existing RPLS methods and provides an analysis on the computational efficiency and sensitivity of these algorithms. Whilst the first two parts of this work discuss algorithmic developments of RPLS, the final part concentrates on practical issues of RPLS implementations. This third part is devoted to practitioners of chemistry and chemical engineering covering a wide range of applications involving a calibration experiment, the analysis of recorded data from an industrial debutanizer process and data from a number of Raman spectroscopy experiments. Copyright © 2007 John Wiley & Sons, Ltd.

4.
We introduce a new nonlinear partial least squares algorithm ‘Quadratic Fuzzy PLS (QFPLS)’ that combines the outer linear Partial Least Squares (PLS) framework and the Takagi–Sugeno–Kang (TSK) fuzzy inference system. The inner relation between the input and the output PLS score vectors is modeled by a quadratic TSK fuzzy inference system. The performance of the proposed QFPLS method is tested and compared against four other well‐known partial least squares methods (Linear PLS (LPLS), Quadratic PLS (QPLS), Linear Fuzzy PLS (LFPLS), and Neural Network PLS (NNPLS)) on various different types of randomly generated test data. QFPLS outperformed competitors based on two comparison measures: the output variables cumulative per cent variance captured by the PLS latent variables and the root mean‐square error of prediction (RMSEP). Copyright © 2009 John Wiley & Sons, Ltd.
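For reference, the RMSEP named in this abstract is conventionally defined as below. The symbols are notation introduced here, not taken from the abstract: n is the number of prediction samples, y_i the reference value and \hat{y}_i the model prediction for sample i.

```latex
\mathrm{RMSEP} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
```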

5.
Two alternative partial least squares (PLS) methods, averaged PLS and weighted average PLS, are proposed and compared with the classical PLS in terms of root mean square error of prediction (RMSEP) for three real data sets. These methods compute the (weighted) average of PLS models with different complexity. The prediction abilities of the alternative methods are comparable to that of the classical PLS, but they do not require determining how many components should be included in the model. They are also more robust in the sense that the quality of prediction depends less on a good choice of the number of components to be included. In addition, weighted average PLS is also compared with the weighted average part of LOCAL, a published method that also applies weighted average PLS, albeit with an entirely different weighting scheme.
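The idea of averaging PLS models of different complexity can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a single response (PLS1), a textbook NIPALS fit, and equal weights for every model, since the abstract does not give the weighting scheme. The function names `pls1_coefficients` and `averaged_pls_predict` are hypothetical.

```python
import numpy as np

def pls1_coefficients(X, y, n_components):
    """Return a list of PLS1 coefficient vectors for models with 1..n_components
    components, fitted by NIPALS on mean-centered data."""
    E = X - X.mean(axis=0)          # centered predictors (deflated in the loop)
    f = y - y.mean()                # centered response (deflated in the loop)
    W, P, q, betas = [], [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w = w / np.linalg.norm(w)   # weight vector
        t = E @ w                   # score vector
        tt = t @ t
        p = E.T @ t / tt            # X loading
        c = f @ t / tt              # y loading
        E = E - np.outer(t, p)      # deflate X
        f = f - c * t               # deflate y
        W.append(w); P.append(p); q.append(c)
        Wm, Pm = np.array(W).T, np.array(P).T
        # Coefficients in terms of the original (centered) predictors
        betas.append(Wm @ np.linalg.solve(Pm.T @ Wm, np.array(q)))
    return betas

def averaged_pls_predict(X_train, y_train, X_new, max_components):
    """Predict with the equal-weight average of PLS1 models with
    1..max_components components (equal weights are an assumption)."""
    betas = pls1_coefficients(X_train, y_train, max_components)
    beta_avg = np.mean(betas, axis=0)
    return y_train.mean() + (X_new - X_train.mean(axis=0)) @ beta_avg
```

Because each model is linear, averaging the coefficient vectors gives the same predictions as averaging the individual model predictions, so no single number of components has to be chosen.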

6.
The estimation of the prediction region of partial least squares (PLS) is necessary in many engineering applications. However, research in this area focuses on the estimation of prediction intervals only. In this work, a new recursive formulation of PLS is proposed to facilitate the calculation of the Jacobian matrix of the estimated coefficient matrix. Furthermore, the computational complexity analysis indicates that the proposed algorithm is O(m²N + mpN + mpN² + mN³ + mpN⁴) per component. The prediction region of the multivariate PLS is obtained through local linearization. The new formulation provides one way to obtain the prediction region of the multivariate PLS. Simulation and near‐infrared spectra of corn case studies indicate the utility of the proposed method. Copyright © 2013 John Wiley & Sons, Ltd.

7.
8.
9.
Changeable size moving window partial least squares (CSMWPLS) and searching combination moving window partial least squares (SCMWPLS) are proposed to search for an optimized spectral interval and an optimized combination of spectral regions from informative regions obtained by a previously proposed spectral interval selection method, moving window partial least squares (MWPLSR) [Anal. Chem. 74 (2002) 3555]. The utilization of informative regions aims to construct better PLS models than those based on the whole spectrum. The purpose of CSMWPLS and SCMWPLS is to optimize the informative regions and their combination to further improve the prediction ability of the PLS models. The results of their application to an open-path (OP)/FT-IR spectra data set show that the proposed methods, especially SCMWPLS, can find an optimized combination that improves, often significantly, the performance of the corresponding PLS model, in terms of a low root mean square error of prediction (RMSEP) with a reasonable number of latent variables (LVs), compared with the results obtained using whole spectra or a direct combination of informative regions for a compound. Regions consisting of the combinations obtained can easily be explained by the existence of IR absorption bands in those spectral regions.

10.
A partial least squares (PLS-1) calibration model based on kinetic-spectrophotometric measurement for the simultaneous determination of Cu(II), Ni(II) and Co(II) ions is described. The method was based on the difference in the rate of the reaction between Co(II), Ni(II) and Cu(II) ions with 1-(2-pyridylazo)-2-naphthol in a pH 5.8 buffer solution and in micellar media at 25°C. The absorption kinetic profiles of the solutions were monitored by measuring the absorbance at 570 nm at 2 s intervals during the time range of 0–10 min after initiation of the reaction. The experimental calibration matrix for the partial least squares (PLS-1) model was designed with 30 samples. The cross-validation method was used for selecting the number of factors. The results showed that simultaneous determination could be performed in the range 0.1–2 μg mL−1 for each cation. The proposed method was successfully applied to the simultaneous determination of Cu(II), Ni(II) and Co(II) ions in water and in synthetic alloy samples.

11.
In this study we compared the use of ordinary least squares and weighted least squares in the calibration of the method for analyzing essential and toxic metals present in human milk by ICP-OES, in order to avoid systematic errors in the measurements used. Human milk samples were provided by the Odete Valadares maternity clinic and digested by means of a high-performance microwave (MW) oven. Evaluation of plasma short and long-term stability was made using a solution of digested milk (1:50) with 2.0 mg L−1 Mg in HNO3 2% (v/v). The detection power was at or below the μg L−1 level, whilst the precision, expressed as relative standard deviation (R.S.D.), was almost always equal to or better than 3.3%. Certified reference material Infant Formula (NIST SRM 1846) was used to assess the accuracy of the proposed method, which proved to be accurate and precise. Recovery rates were in the range of 83-117%. Aqueous calibration was carried out for each element under study.
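The difference between ordinary and weighted least squares for a straight-line calibration can be sketched as below. This is an illustrative assumption, not the procedure of the study: the abstract does not state the weighting scheme, so the weights are left to the caller (a common choice is the inverse variance of each standard), and the function name `fit_line` is hypothetical.

```python
import numpy as np

def fit_line(x, y, weights=None):
    """Fit y = a + b*x by least squares and return (a, b).

    With weights=None this is ordinary least squares; otherwise each
    squared residual is multiplied by its weight (weighted least squares).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)
    sw = np.sqrt(w)
    # Scaling rows by sqrt(w) turns the weighted problem into an ordinary one
    A = np.column_stack([np.ones_like(x), x]) * sw[:, None]
    coef, *_ = np.linalg.lstsq(A, y * sw, rcond=None)
    return coef[0], coef[1]
```

Calling `fit_line(conc, signal)` gives the ordinary fit, while `fit_line(conc, signal, weights=1.0 / s**2)` down-weights standards with a large standard deviation `s`; `conc`, `signal` and `s` are placeholder names for the user's own calibration data.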

12.
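The titration system was adjusted to pH 2.0, and the volume of titrant consumed in titrating with a standard base solution to specified pH values was taken as the measured quantity to construct a titration data matrix for multicomponent organic acids. Multicomponent fitting was then performed by principal component regression, partial least squares, and artificial neural networks. The results show that partial least squares gave the best fit: the relative root mean square errors of prediction for the total amounts of acetic, lactic, oxalic, succinic, citric, and aconitic acids in the mixed system were 5.80%, 8.88%...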

13.
We propose a new data compression method for estimating optimal latent variables in multi‐variate classification and regression problems where more than one response variable is available. The latent variables are found according to a common innovative principle combining PLS methodology and canonical correlation analysis (CCA). The suggested method is able to extract predictive information for the latent variables more effectively than ordinary PLS approaches. Only simple modifications of existing PLS and PPLS algorithms are required to adopt the proposed method. Copyright © 2009 John Wiley & Sons, Ltd.

14.
With an increasing number of publicly available microarray datasets, it becomes attractive to borrow information from other relevant studies to have more reliable and powerful analysis of a given dataset. We do not assume that subjects in the current study and other relevant studies are drawn from the same population as assumed by meta-analysis. In particular, the set of parameters in the current study may be different from that of the other studies. We consider sample classification based on gene expression profiles in this context. We propose two new methods, a weighted partial least squares (WPLS) method and a weighted penalized partial least squares (WPPLS) method, to build a classifier by a combined use of multiple datasets. The methods can weight the individual datasets depending on their relevance to the current study. A more standard approach is first to build a classifier using each of the individual datasets, then to combine the outputs of the multiple classifiers using a weighted voting. Using two quite different datasets on human heart failure, we show first that WPLS/WPPLS, by borrowing information from the other dataset, can improve the performance of PLS/PPLS built on only a single dataset. Second, WPLS/WPPLS performs better than the standard approach of combining multiple classifiers. Third, WPPLS can improve over WPLS, just as PPLS does over PLS for a single dataset.

15.
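This paper reports a simple, rapid, and accurate method for the simultaneous determination of three artificial sweeteners: acesulfame potassium, aspartame, and sodium saccharin. The method is based on ultraviolet spectrophotometric measurement of ternary mixtures of acesulfame potassium, aspartame, and sodium saccharin in hydrochloric acid solution at pH 3.21. The resulting overlapping spectral data were processed by partial least squares regression (PLSR), by characteristic peaks combined with PLSR, and by characteristic peaks combined with local partial least squares regression (LPLSR). The results show that taking the peak values of the characteristic bands as independent variables and fitting with 4 local samples gave the smallest prediction error, with a total relative deviation of only 3.05%. Fruit juice samples were analyzed, and good quantitative results were obtained. The linear quantitation ranges for acesulfame potassium, aspartame, and sodium saccharin were 1.0–30.0 mg/L, 1.0–10.0 mg/L, and 1.0–10.0 mg/L, respectively.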

16.
17.
From the fundamental parts of PLS‐DA, Fisher's canonical discriminant analysis (FCDA) and Powered PLS (PPLS), we develop the concept of powered PLS for classification problems (PPLS‐DA). By taking advantage of a sequence of data reducing linear transformations (consistent with the computation of ordinary PLS‐DA components), PPLS‐DA computes each component from the transformed data by maximization of a parameterized Rayleigh quotient associated with FCDA. Models found by the powered PLS methodology can help reveal the relevance of particular predictors and often require fewer and simpler components than their ordinary PLS counterparts. From the possibility of imposing restrictions on the powers available for optimization we obtain an explorative approach to predictive modeling not available to the traditional PLS methods. Copyright © 2008 John Wiley & Sons, Ltd.

18.
19.
Kernel partial least squares (KPLS) has become a popular technique for regression and classification of complex data sets, which is a nonlinear extension of linear PLS in which training samples are transformed into a feature space via a nonlinear mapping. The PLS algorithm can then be carried out in the feature space. In the present study, we attempt to develop a novel tree KPLS (TKPLS) classification algorithm by constructing an informative kernel on the basis of decision tree ensembles. The constructed tree kernel can effectively discover the similarities of samples and select informative features by variable importance ranking in the process of building the kernel. Simultaneously, TKPLS can also handle nonlinear relationships in the structure–activity relationship data by such a kernel. Finally, three data sets related to different categorical bioactivities of compounds are used to evaluate the performance of TKPLS. The results show that the TKPLS algorithm can be regarded as an alternative and promising classification technique. Copyright © 2013 John Wiley & Sons, Ltd.

20.
Multi-way partial least squares modeling of water quality data
A 10-year surface water quality data set pertaining to a polluted river was analyzed using partial least squares (PLS) regression models. Both the unfold-PLS and N-PLS (tri-PLS and quadri-PLS) models were calibrated through the leave-one-out cross-validation method. These were applied to the multivariate, multi-way data array with a view to assessing and comparing their predictive capabilities for biochemical oxygen demand (BOD) of river water in terms of their relative mean squares error of cross-validation, prediction and variance captured. The sum of squares of residuals and leverages were computed and analyzed to identify the sites, variables, years and months which may have influence on the constructed model. Both the tri- and quadri-PLS models yielded relatively low validation error as compared to unfold-PLS and captured high variance in model. Moreover, both of these methods produced acceptable model precision and accuracy. In case of tri-PLS the root mean squares errors were 1.65 and 2.17 for calibration and prediction, respectively; whereas these were 2.58 and 1.09 for quadri-PLS. At a preliminary level it seems that BOD can be predicted but a different data arrangement is needed. Moreover, analysis of the scores and loadings plots of the N-PLS models could provide information on time evolution of the river water quality.
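The unfolding step that precedes unfold-PLS can be sketched as below: a multi-way array is rearranged into an ordinary two-way matrix so that a standard PLS model can be fitted to it. This is a generic illustration under assumed names; the function `unfold` and the axis order (for example sites × variables × months) are hypothetical and not taken from the paper.

```python
import numpy as np

def unfold(array3, keep_axis=0):
    """Unfold a three-way array into a matrix.

    Rows index `keep_axis`; the two remaining modes are combined into
    the columns, keeping their original relative order.
    """
    moved = np.moveaxis(array3, keep_axis, 0)
    return moved.reshape(moved.shape[0], -1)
```

For an array of shape (I, J, K) unfolded along the first mode, element (i, j, k) lands in row i and column j*K + k of the resulting I × (J·K) matrix. N-PLS, by contrast, works on the multi-way array directly and keeps the mode structure.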
