Similar literature
 20 similar documents found; search took 562 ms
1.
Multicollinearity is common in real-life regression data. Instead of applying ordinary regression methods, biased regression techniques such as principal component regression and ridge regression have been developed to cope with such datasets. In this paper, we consider partial least squares (PLS) regression by means of the SIMPLS algorithm. Because the SIMPLS algorithm is based on the empirical variance-covariance matrix of the data and on least squares regression, outliers have a damaging effect on the estimates. To reduce this pernicious effect of outliers, we propose to replace the empirical variance-covariance matrix in SIMPLS by a robust covariance estimator. We derive the influence function of the resulting PLS weight vectors and the regression estimates, and conclude that they will be bounded if the robust covariance estimator has a bounded influence function. The breakdown value is also inherited from the robust estimator. We illustrate the results using the MCD estimator and the reweighted MCD estimator (RMCD) for low-dimensional datasets. Some empirical properties are also provided for a high-dimensional dataset.
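
As a rough illustration of why the covariance estimator matters here: for a single response, the first SIMPLS weight vector is proportional to the cross-covariance between the predictors and the response, so it can be read straight off a joint covariance matrix of (x, y). The sketch below is not the authors' code; the function name `first_pls_weight` is hypothetical, and the empirical covariance from `np.cov` stands in for whichever estimator is plugged in (a robust estimate such as MCD would be passed in the same way).

```python
import numpy as np

def first_pls_weight(joint_cov, n_predictors):
    """First PLS weight vector for one response, taken from a joint covariance of (x, y).

    joint_cov is the (p+1) x (p+1) covariance matrix whose last row/column belongs to y.
    """
    s_xy = joint_cov[:n_predictors, n_predictors]   # cross-covariance between x and y
    return s_xy / np.linalg.norm(s_xy)              # normalise to unit length

# Toy data (assumed for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, 0.5, 0.0]) + 0.1 * rng.normal(size=50)

# Empirical covariance; a robust covariance estimate could be substituted here
empirical_cov = np.cov(np.column_stack([X, y]), rowvar=False)
w = first_pls_weight(empirical_cov, n_predictors=3)
```

Because the weight depends on the data only through the covariance matrix, any outlier sensitivity of the weight comes from the estimator that produced that matrix.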

2.
3.
A novel projection modeling method for quantitative structure-activity relationship (QSAR) and quantitative structure-property relationship (QSPR) is developed in this paper. Orthogonalization of block variables is introduced to deal with the problem of variable selection. Projections based on least squares are used to construct the modeling space in order to search for the best regression directions for chemical modeling. A suitable prediction space for such a model is further defined to confine the usage range of the model. Three real data sets were analyzed to check the performance of the proposed modeling method. The results obtained from Monte‐Carlo cross‐validation (MCCV) showed that the proposed modeling method might provide better results for QSAR and QSPR modeling than PCR and PLS with respect to both fitting and prediction abilities. Copyright © 2007 John Wiley & Sons, Ltd.

4.
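Constructing a support vector machine-partial least squares method for modeling drug structure-activity relationships   Total citations: 6, self-citations: 0, citations by others: 6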
李剑  陈德钊  成忠  叶子青 《分析化学》2006,34(2):263-266
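While sample data are being accumulated for the study of drug structure-activity relationships, models must be built from small samples. Overfitting then occurs easily and harms the predictive performance and stability of the model. To address this, partial least squares (PLS) can be used to extract optimal components pairwise from the sample data, eliminating multicollinearity among the independent variables and reducing dimensionality effectively. A least squares support vector machine then performs nonlinear regression on the paired components, adjusted by an error-correction strategy, so that the nonlinear relationship between the independent and dependent variables is expressed more effectively. The resulting algorithm is called EB-LSSVM-PLS; the models it builds have high prediction accuracy and good stability. Applied to QSAR modeling of novel flavanone derivatives, it gave satisfactory results, and its generalization performance was better than that of other methods.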

5.
Different published versions of partial least squares discriminant analysis (PLS‐DA) are shown as special cases of an approach exploiting prior probabilities in the estimated between-groups covariance matrix used for calculation of loading weights. With prior probabilities included in the calculation of both PLS components and canonical variates, a complete strategy for extracting appropriate decision spaces with multicollinear data is obtained. This idea easily extends to weighted linear dummy regression so that the corresponding fitted values also span the canonical space. Two different choices of prior probabilities are applied with a real dataset to illustrate the effect for the obtained decision spaces. Copyright © 2007 John Wiley & Sons, Ltd.

6.
Extension of standard regression to the case of multiple regressor arrays is given via the Kronecker product. The method is illustrated using ordinary least squares regression (OLS) as well as the latent variable (LV) methods principal component regression (PCR) and partial least squares regression (PLS). Denoting the method applied to PLS as mrPLS, the latter was shown to explain as much or more variance for the first LV relative to the comparable L‐partial least squares regression (L‐PLS) model. The same relationship holds when mrPLS is compared to PLS or n‐way partial least squares (N‐PLS) and the response array is 2‐way or 3‐way, respectively, where the regressor array corresponding to the first mode of the response array is 2‐way and the second mode regressor array is an identity matrix. In a comparison with N‐PLS using fragrance data, mrPLS proved superior in a validation sense when model selection was used. Though the focus is on 2‐way regressor arrays, the method can be applied to n‐way regressors via N‐PLS. Copyright © 2007 John Wiley & Sons, Ltd.
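
One common way a Kronecker product enters a regression with two regressor matrices is the identity vec(X1 B X2^T) = (X2 ⊗ X1) vec(B), which turns a two-sided model into an ordinary least squares problem. The sketch below shows only that identity on noise-free toy data; it is an assumption that this matches the paper's formulation, and the names `X1`, `X2` and `B_true` are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
X1 = rng.normal(size=(6, 2))      # regressors for the first mode (rows) of Y
X2 = rng.normal(size=(5, 3))      # regressors for the second mode (columns) of Y
B_true = rng.normal(size=(2, 3))  # coefficient matrix to recover
Y = X1 @ B_true @ X2.T            # noise-free 6 x 5 response

# vec(X1 B X2^T) = (X2 kron X1) vec(B), with column-major (Fortran-order) vec
design = np.kron(X2, X1)          # shape (30, 6)
b_hat, *_ = np.linalg.lstsq(design, Y.flatten(order="F"), rcond=None)
B_hat = b_hat.reshape(2, 3, order="F")
```

If the second regressor matrix is an identity matrix, the design reduces to a block-diagonal copy of `X1`, that is, ordinary regression applied column by column.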

7.
This study compares the performance of partial least squares (PLS) regression analysis and artificial neural networks (ANN) for the prediction of total anthocyanin concentration in red-grape homogenates from their visible-near-infrared (Vis-NIR) spectra. The PLS prediction of anthocyanin concentrations for new-season samples from Vis-NIR spectra was characterised by regression non-linearity and prediction bias. In practice, this usually requires the inclusion of some samples from the new vintage to improve the prediction. The use of WinISI LOCAL partly alleviated these problems but still resulted in increased error at high and low extremes of the anthocyanin concentration range. Artificial neural networks regression was investigated as an alternative method to PLS, due to the inherent advantages of ANN for modelling non-linear systems. The method proposed here combines the advantages of the data reduction capabilities of PLS regression with the non-linear modelling capabilities of ANN. With the use of PLS scores as inputs for ANN regression, the model was shown to be quicker and easier to train than using raw full-spectrum data. The ANN calibration for prediction of new vintage grape data, using PLS scores as inputs, was more linear and accurate than global and LOCAL PLS models and appears to reduce the need for refreshing the calibration with new-season samples. ANN with PLS scores required fewer inputs and was less prone to overfitting than using PCA scores. A variation of the ANN method, using carefully selected spectral frequencies as inputs, resulted in prediction accuracy comparable to that obtained using PLS scores but, as for PCA inputs, was also prone to overfitting with redundant wavelengths.

8.
Kernel partial least squares (KPLS) and support vector regression (SVR) have become popular techniques for regression of complex non-linear data sets. The modeling is performed by mapping the data into a higher-dimensional feature space through the kernel transformation. The disadvantage of such a transformation is, however, that information about the contribution of the original variables in the regression is lost. In this paper we introduce a method which can retrieve and visualize the contribution of the variables to the regression model and the way the variables contribute to the regression of complex data sets. The method is based on the visualization of trajectories using so-called pseudo samples representing the original variables in the data. We test and illustrate the proposed method on several synthetic and real benchmark data sets. The results show that for linear and non-linear regression models the important variables were identified with corresponding linear or non-linear trajectories. The results were verified by comparing with ordinary PLS regression and by selecting those variables which were indicated as important and rebuilding a model with only those variables.
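
The pseudo-sample idea can be sketched as follows: build artificial samples in which one variable sweeps over its observed range while the others are held fixed, pass them through the fitted model, and look at the resulting prediction trajectory. This is a minimal sketch under assumptions, not the authors' procedure: the helper `pseudo_sample_trajectory` is hypothetical, the other variables are held at their column means, and a toy function stands in for a fitted KPLS or SVR model.

```python
import numpy as np

def pseudo_sample_trajectory(predict, X, var_index, n_points=20):
    """Predictions for pseudo-samples that vary one variable over its observed
    range while every other variable is held at its column mean."""
    grid = np.linspace(X[:, var_index].min(), X[:, var_index].max(), n_points)
    pseudo = np.tile(X.mean(axis=0), (n_points, 1))   # start from the mean sample
    pseudo[:, var_index] = grid                       # sweep only the chosen variable
    return grid, predict(pseudo)

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(40, 2))

# Toy stand-in for a fitted kernel model: depends non-linearly on variable 0 only
toy_model = lambda A: A[:, 0] ** 2

grid0, traj0 = pseudo_sample_trajectory(toy_model, X, var_index=0)
grid1, traj1 = pseudo_sample_trajectory(toy_model, X, var_index=1)
```

Plotting `traj0` against `grid0` would show a curved trajectory for the relevant variable, while `traj1` stays flat because variable 1 does not influence the toy model.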

9.
This paper presents several methods for analysis of data from reflectometric interference spectroscopic measurements (RIfS) of water samples. The set-up consists of three sensors with different polymer layers. Mixtures of butanol and ethanol in water were measured from 0 to 12,000 ppm each. The data space was characterized by principal component analysis (PCA). Calibration and prediction were achieved by multivariate methods, e.g. multiple linear regression (MLR), partial least squares (PLS) with additional predictors, and quadratic partial least squares (Q-PLS), and by use of artificial neural networks. Artificial neural networks gave the best results of all the calibration methods used. Calibration and prediction of the concentration of the two analytes by artificial neural nets were robust and the set-up could be reduced to only two sensors without deterioration of the prediction.

10.
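The titration system was adjusted to pH 2.0, and the volume of titrant consumed in titrating with a standard base solution to specified pH values was used as the measured quantity to construct a titration data matrix for multicomponent organic acids. Principal component regression, partial least squares, and artificial neural networks were each used for multicomponent fitting. The results showed that partial least squares gave the best fit; for acetic acid, lactic acid, oxalic acid, succinic acid, citric acid, and total aconitic acid in the mixed system, the relative root mean square errors of prediction were 5.80%, 8.88%...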

11.
This paper is about how to incorporate interaction effects in multi‐block methodologies. The method proposed is inspired by polynomial regression modelling in the case with only a few independent variables but extends/generalises the idea to situations where the blocks are potentially very large with respect to the number of variables. The method follows a so‐called type I sums of squares strategy where the linear effects (main effects) are incorporated sequentially and before the interactions. The sequential and orthogonalised partial least squares (SO‐PLS) technique is used as a basis for the proposal. The SO‐PLS method is based on sequential estimation of each new block by the PLS regression method after orthogonalisation with respect to blocks already fitted. The new method preserves the invariance already established for SO‐PLS and can be used for blocks with different dimensionality. The method is tested on one real data set with two independent blocks with different complexity and on a simulated data set with a large number of variables in each block. Copyright © 2011 John Wiley & Sons, Ltd.
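
The orthogonalisation step described above (removing from a new block whatever is already explained by the blocks fitted so far) can be sketched with a plain least squares projection. The helper name `orthogonalise` and the toy matrices are assumptions for illustration; in SO-PLS the `scores` would be the PLS scores of the previously fitted block, which this sketch simply simulates.

```python
import numpy as np

def orthogonalise(block, scores):
    """Remove from `block` everything that lies in the column space of `scores`."""
    coef, *_ = np.linalg.lstsq(scores, block, rcond=None)  # regress block on scores
    return block - scores @ coef                           # keep only the residual part

rng = np.random.default_rng(4)
T1 = rng.normal(size=(20, 2))                                # stand-in for scores of block 1
X2 = rng.normal(size=(20, 5)) + T1 @ rng.normal(size=(2, 5)) # block 2 overlaps with block 1
X2_orth = orthogonalise(X2, T1)
```

After this step the columns of `X2_orth` are orthogonal to the earlier scores, so a model fitted on them can only add information not already captured by the first block.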

12.
The nearest shrunken centroid (NSC) Classifier is successfully applied for class prediction in a wide range of studies based on microarray data. The contribution from seemingly irrelevant variables to the classifier is minimized by the so‐called soft‐thresholding property of the approach. In this paper, we first show that for the two‐class prediction problem, the NSC Classifier is similar to a one‐component discriminant partial least squares (PLS) model with soft‐shrinkage of the loading weights. Then we introduce the soft‐threshold‐PLS (ST‐PLS) as a general discriminant‐PLS model with soft‐thresholding of the loading weights of multiple latent components. This method is especially suited for classification and variable selection when the number of variables is large compared to the number of samples, which is typical for gene expression data. A characteristic feature of ST‐PLS is the ability to identify important variables in multiple directions in the variable space. Both the ST‐PLS and the NSC classifiers are applied to four real data sets. The results indicate that ST‐PLS performs better than the shrunken centroid approach if there are several directions in the variable space which are important for classification, and there are strong dependencies between subsets of variables. Copyright © 2007 John Wiley & Sons, Ltd.
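
Soft-thresholding itself is a one-line operation: each weight is shrunk toward zero by a fixed amount, and weights smaller than that amount become exactly zero. The sketch below shows the generic operator on an invented weight vector; how ST-PLS chooses the threshold and rescales the weights is not stated in the abstract, so the value `delta=0.1` is purely an assumption.

```python
import numpy as np

def soft_threshold(w, delta):
    """Shrink every entry of w toward zero by delta; entries with |w| <= delta become 0."""
    return np.sign(w) * np.maximum(np.abs(w) - delta, 0.0)

# Hypothetical loading weights: two large, one moderate, two near zero
w = np.array([0.9, -0.05, 0.3, -0.6, 0.02])
w_sparse = soft_threshold(w, delta=0.1)
# w_sparse is approximately [0.8, 0.0, 0.2, -0.5, 0.0]
```

Variables whose weights are set to zero drop out of the latent component, which is what gives the approach its variable selection behaviour.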

13.
The issue of outer model weight updating is important in extending partial least squares (PLS) regression to modelling data that shows significant non‐linearity. This paper presents a novel co‐evolutionary component approach to the weight updating problem. Specification of the non‐linear PLS model is achieved using an evolutionary computational (EC) method that can co‐evolve all non‐linear inner models and all input projection weights simultaneously. In this method, modular symbolic non‐linear equations are used to represent the inner models and binary sequences are used to represent the projection weights. The approach is flexible, and other representations could be employed within the same co‐evolutionary framework. The potential of these methods is illustrated using a simulated pH neutralisation process data set exhibiting significant non‐linearity. It is demonstrated that the co‐evolutionary component architecture can produce results which are competitive with non‐linear neural network‐based PLS algorithms that use iterative projection weight updating. In addition, a data sampling method for mitigating overfitting to the training data is described. Copyright © 2007 John Wiley & Sons, Ltd.

14.
邵学广  陈达  徐恒  刘智超  蔡文生 《中国化学》2009,27(7):1328-1332
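Partial least squares (PLS) plays an important role in quantitative near-infrared (NIR) spectroscopic analysis, but its predictions are easily affected by factors such as sample partitioning and outlying samples, so its robustness is limited. The multi-model (ensemble) PLS method (EPLS) improves model robustness, but it cannot identify outliers among the samples. To improve both prediction accuracy and robustness, this paper proposes a multi-model PLS method that resamples according to sampling probabilities, called robust consensus PLS (RE-PLS). The method computes sample regression residuals by iteratively reweighted partial least squares (IRPLS) to obtain a sampling probability for each calibration sample, then selects training subsets according to these probabilities to build multiple PLS models, and finally averages the predictions of all PLS models to give the final prediction. The method was applied to NIR spectral modeling of two different plant samples and compared with conventional PLS and EPLS. The results show that it effectively avoids the influence of outliers in the calibration set on the model while improving prediction accuracy and robustness. For real, complex NIR spectra of tobacco samples containing many outliers, simple PLS or EPLS modeling does not predict well, whereas RE-PLS, with its distinctive advantages, is expected to find wide application in quantitative analysis of such complex spectra.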
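
The three steps described above (residuals to sampling probabilities, probability-weighted subset selection, averaging of sub-model predictions) can be sketched as follows. This is a simplified illustration, not the RE-PLS algorithm: ordinary least squares replaces both IRPLS and the PLS sub-models, the residual-to-probability mapping in `sampling_probabilities` is an invented choice, and all function names and parameters are hypothetical.

```python
import numpy as np

def sampling_probabilities(residuals):
    """Map residuals to sampling probabilities: large |residual| -> small probability."""
    abs_res = np.abs(residuals)
    scale = np.median(abs_res) + 1e-12            # robust scale; epsilon avoids division by zero
    weights = 1.0 / (1.0 + (abs_res / scale) ** 2)
    return weights / weights.sum()

def resampled_ensemble_predict(X, y, X_new, n_models=50, subset_size=None, seed=0):
    """Average the predictions of sub-models fitted on probability-weighted subsets."""
    rng = np.random.default_rng(seed)
    n = len(y)
    if subset_size is None:
        subset_size = n // 2
    # Initial residuals from a least squares fit (stand-in for IRPLS)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    probs = sampling_probabilities(y - X @ coef)
    predictions = []
    for _ in range(n_models):
        # Samples with large residuals are rarely drawn into a training subset
        idx = rng.choice(n, size=subset_size, replace=False, p=probs)
        sub_coef, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)  # stand-in for a PLS sub-model
        predictions.append(X_new @ sub_coef)
    return np.mean(predictions, axis=0)

# Toy data with one gross outlier (assumed for illustration)
rng = np.random.default_rng(3)
X = rng.normal(size=(30, 2))
y = X @ np.array([2.0, -1.0])
y[0] += 20.0
y_hat = resampled_ensemble_predict(X, y, X[:5])
```

Because the outlying sample receives a small sampling probability, it appears in few training subsets and therefore has limited influence on the averaged prediction.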

15.
成忠  诸爱士 《分析化学》2008,36(6):788-792
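Spectral data have broad peaks, pronounced local effects, noise, and many variables that are often severely multicollinear. To address these problems, a local correction method for spectral data is improved and designed: a piecewise orthogonal signal correction method based on window smoothing, which is combined with partial least squares regression for the preprocessing and quantitative analysis of spectral data. The orthogonal components to be filtered out are initialized with the NIPALS algorithm, and orthogonal signal correction is carried out wavelength point by wavelength point in nearest-neighbor segments. The denoised spectral matrix is then used as the new independent-variable matrix, and a calibration model between it and the property variables is built by partial least squares regression. Experiments on near-infrared diffuse reflectance spectra of wheat show that the orthogonal components are estimated stably, denoising is evident, the predictive performance of the model is better than that of other methods, fewer PLS components are needed, and the model is more parsimonious.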

16.
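Application of the adaptive fuzzy partial least squares method to modeling drug structure-activity relationships   Total citations: 2, self-citations: 0, citations by others: 2
As a local approximation method, the adaptive neuro-fuzzy inference system (ANFIS) is well suited to modeling quantitative structure-activity relationships (QSAR) of drugs. The parameters describing a drug's molecular structure are numerous and often coupled, which makes modeling harder and degrades the predictive performance of the model. ANFIS is therefore combined with partial least squares (PLS): PLS first extracts components from the sample data, ANFIS then realizes the nonlinear mapping between each pair of components, and the extracted components are further corrected on the basis of the output error so that they explain the dependent variable optimally. The resulting method is called EB-AFPLS. It has been applied successfully to QSAR modeling of HIV-1 protease inhibitors with good results, showing strong learning ability, and the predictive performance of the resulting models was also better than that of other methods.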

17.
Ren S  Gao L 《The Analyst》2011,136(6):1252-1261
This paper suggests a novel method named DF-LS-SVM, which is based on least squares support vector machines (LS-SVM) regression combined with data fusion (DF) to enhance the ability to extract characteristic information and improve the quality of the regression. Simultaneous multicomponent determination of Fe(III), Co(II) and Cu(II) was conducted for the first time by using the proposed method. Data fusion is a technique that integrates information from disparate sources to produce a single model or decision. The LS-SVM technique allows for learning a high-dimensional feature with fewer training data, and reduces the computational complexity by only requiring the solution of a set of linear equations instead of a quadratic programming problem. Experimental results showed that the DF-LS-SVM method was successful for simultaneous multicomponent determination even when severe overlap of spectra existed. The DF-LS-SVM method is an attractive and promising hybrid approach that combines the best properties of the two techniques. The results obtained from an additional test case, simultaneous differential pulse voltammetric determination of o-nitrophenol, m-nitrophenol and p-nitrophenol, also demonstrated that the DF-LS-SVM method performed somewhat better than LS-SVM and PLS methods.
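
The remark that LS-SVM needs only a set of linear equations can be made concrete with the standard LS-SVM regression system, in which the bias `b` and dual weights `alpha` solve one bordered linear system built from the kernel matrix. The sketch below covers plain LS-SVM regression only, without the data fusion step; the RBF kernel, the values of `gamma` and `sigma`, and the function names are assumptions chosen for the example.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvm_fit(X, y, gamma=100.0, sigma=1.0):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y] for b and alpha."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
    return sol[0], sol[1:]                      # bias b, dual weights alpha

def lssvm_predict(X_train, b, alpha, X_new, sigma=1.0):
    """Prediction f(x) = sum_i alpha_i k(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

# Toy non-linear regression problem (assumed for illustration)
X = np.linspace(-3, 3, 40)[:, None]
y = np.sin(X).ravel()
b, alpha = lssvm_fit(X, y)
y_fit = lssvm_predict(X, b, alpha, X)
```

A single call to `np.linalg.solve` replaces the quadratic programming step of a classical SVM; the price is that, in general, every training sample receives a non-zero weight.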

18.
Partial least squares (PLS) regression is a linear regression technique developed to relate many regressors to one or several response variables. Robust methods are introduced to reduce or remove the effect of outlying data points. In this paper, we show that if the sample covariance matrix is properly robustified, further robustification of the linear regression steps of the PLS algorithm becomes unnecessary. The robust estimate of the covariance matrix is computed by searching for outliers in univariate projections of the data on a combination of random directions (Stahel-Donoho) and specific directions obtained by maximizing and minimizing the kurtosis coefficient of the projected data, as proposed by Peña and Prieto [1]. It is shown that this procedure is fast to apply and provides better results than other methods proposed in the literature. Its performance is illustrated by Monte Carlo and by an example, where the algorithm is able to show features of the data which were undetected by previous methods. Copyright © 2008 John Wiley & Sons, Ltd.
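
The random-direction part of the outlier search can be sketched as a Stahel-Donoho style outlyingness: project the data onto many random unit directions and, for each sample, record its largest robustly standardised distance from the median. The sketch below covers only this part and omits the kurtosis-based directions; the function name, the number of directions, and the toy data are assumptions.

```python
import numpy as np

def stahel_donoho_outlyingness(X, n_directions=500, seed=0):
    """Largest |projection - median| / MAD over random unit directions, per sample."""
    rng = np.random.default_rng(seed)
    D = rng.normal(size=(X.shape[1], n_directions))
    D /= np.linalg.norm(D, axis=0)                 # unit-length directions (columns)
    proj = X @ D                                   # univariate projections of every sample
    med = np.median(proj, axis=0)
    mad = np.median(np.abs(proj - med), axis=0)    # robust spread in each direction
    return np.max(np.abs(proj - med) / mad, axis=1)

# Toy data: 100 regular samples plus one far-away point as the last row
rng = np.random.default_rng(6)
X = np.vstack([rng.normal(size=(100, 3)), [[10.0, 10.0, 10.0]]])
outlyingness = stahel_donoho_outlyingness(X)
suspect = int(np.argmax(outlyingness))             # index of the most outlying sample
```

Samples with large outlyingness can then be downweighted or excluded before the covariance matrix is estimated.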

19.
This paper presents several methods for analysis of data from reflectometric interference spectroscopic measurements (RIfS) of water samples. The set-up consists of three sensors with different polymer layers. Mixtures of butanol and ethanol in water were measured from 0 to 12,000 ppm each. The data space was characterized by principal component analysis (PCA). Calibration and prediction were achieved by multivariate methods, e.g. multiple linear regression (MLR), partial least squares (PLS) with additional predictors, and quadratic partial least squares (Q-PLS), and by use of artificial neural networks. Artificial neural networks gave the best results of all the calibration methods used. Calibration and prediction of the concentration of the two analytes by artificial neural nets were robust and the set-up could be reduced to only two sensors without deterioration of the prediction. Received: 29 September 2000 / Revised: 30 April 2001 / Accepted: 3 May 2001

20.
Linear and non-linear calibration methods (principal component regression (PCR), partial least squares regression (PLS), and neural networks (NN)) were applied to a slightly non-linear Raman data set. Because of the large size of this data set, recently introduced linear calibration methods, specifically optimised for speed, were also used. These fast methods achieve speed improvement by using the Lanczos decomposition for the singular value decomposition steps of the calibration procedures, and for some of their variants, by optimising the models without cross-validation (CV). Linear methods could deal with the slight non-linearity present in the data by including extra components, therefore performing comparably to NNs. The fast methods performed as well as their classical equivalents in terms of precision in prediction, but the results were obtained considerably faster. However, it appeared that CV remains the most appropriate method for model complexity estimation.
