Similar literature
Found 20 similar documents; search took 31 ms
1.
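High-resolution extractive electrospray ionization mass spectrometry (EESI-MS) was used for rapid analysis of exhaled breath samples from liver failure patients and healthy volunteers. Multiblock partial least squares (MB-PLS) was used to build statistical models from breath metabolite data acquired in multiple batches, and the results were compared with those of conventional PLS. The results show that MB-PLS effectively removes the influence of batch differences on statistical modeling. In addition, screening variables by their VIP values in the MB-PLS model reduces data redundancy and eliminates the influence of irrelevant variables, which effectively improves model performance.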

2.
The number of latent variables (LVs), or the factor number, is a key parameter in PLS modeling for obtaining a correct prediction. Although much work has been done on this issue, determining a suitable LV number remains difficult in practice. A method named independent factor diagnostics (IFD) is proposed for investigating the contribution of each LV to the predicted results, based on a discussion of how the LV number is determined in PLS modeling of near infrared (NIR) spectra of complex samples. The NIR spectra of three data sets of complex samples, including a public data set and two tobacco lamina data sets, are investigated. It is shown that several high order LVs make the main contributions to the predicted results, although the contribution of the low order LVs should not be neglected in the PLS models. Therefore, in practical uses of PLS for NIR spectral analysis of complex samples, it may be better to use a slightly larger LV number. Supported by the National Natural Science Foundation of China (Grant Nos. 20775036 & 20835002)

3.
4.
The nearest shrunken centroid (NSC) classifier is successfully applied for class prediction in a wide range of studies based on microarray data. The contribution from seemingly irrelevant variables to the classifier is minimized by the so‐called soft‐thresholding property of the approach. In this paper, we first show that for the two‐class prediction problem, the NSC classifier is similar to a one‐component discriminant partial least squares (PLS) model with soft‐shrinkage of the loading weights. Then we introduce the soft‐threshold‐PLS (ST‐PLS) as a general discriminant‐PLS model with soft‐thresholding of the loading weights of multiple latent components. This method is especially suited for classification and variable selection when the number of variables is large compared to the number of samples, which is typical for gene expression data. A characteristic feature of ST‐PLS is the ability to identify important variables in multiple directions in the variable space. Both the ST‐PLS and the NSC classifiers are applied to four real data sets. The results indicate that ST‐PLS performs better than the shrunken centroid approach if there are several directions in the variable space which are important for classification, and there are strong dependencies between subsets of variables. Copyright © 2007 John Wiley & Sons, Ltd.
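
The soft-thresholding of loading weights mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact scheme, so scaling the weights by their largest absolute value before shrinking, the `shrink` parameter, and the function names are all assumptions made for illustration only.

```python
import math

def soft_threshold(weights, delta):
    """Shrink each weight toward zero by delta; weights within delta of zero become 0."""
    out = []
    for w in weights:
        shrunk = abs(w) - delta
        if shrunk <= 0:
            out.append(0.0)
        else:
            out.append(shrunk if w > 0 else -shrunk)
    return out

def st_loading_weights(X, y, shrink=0.5):
    """Soft-thresholded loading weights for one PLS component.

    X is a list of mean-centered rows, y a mean-centered response.
    Assumption: weights are scaled so the largest has magnitude 1,
    shrunk by `shrink`, then normalized to unit length.
    """
    n_vars = len(X[0])
    # Ordinary PLS loading weights are proportional to X^T y.
    w = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(n_vars)]
    w_max = max(abs(v) for v in w) or 1.0
    w = soft_threshold([v / w_max for v in w], shrink)
    norm = math.sqrt(sum(v * v for v in w)) or 1.0
    return [v / norm for v in w]

# Toy centered data: variables 0 and 2 track y, variable 1 does not.
X = [[-1.0, 0.1, -1.0], [0.0, -0.2, 0.0], [1.0, 0.1, 1.0]]
y = [-1.0, 0.0, 1.0]
w_st = st_loading_weights(X, y)
print(w_st)  # the weight of variable 1 is shrunk to exactly zero
```

Variables whose weight is shrunk to zero drop out of the component, which is how thresholding doubles as variable selection in this sketch.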

5.
We propose a new data compression method for estimating optimal latent variables in multivariate classification and regression problems where more than one response variable is available. The latent variables are found according to a common innovative principle combining PLS methodology and canonical correlation analysis (CCA). The suggested method is able to extract predictive information for the latent variables more effectively than ordinary PLS approaches. Only simple modifications of existing PLS and PPLS algorithms are required to adopt the proposed method. Copyright © 2009 John Wiley & Sons, Ltd.

6.
Traditionally the partial least-squares (PLS) algorithm, commonly used in chemistry for ill-conditioned multivariate linear regression, has been derived (motivated) and presented in terms of data matrices. In this work the PLS algorithm is derived probabilistically in terms of stochastic variables where sample estimates calculated using data matrices are employed at the end. The derivation, which offers a probabilistic motivation to each step of the PLS algorithm, is performed for the general multiresponse case and without reference to any latent variable model of the response variable and also without any so-called "inner relation". On the basis of the derivation, some theoretical issues of the PLS algorithm are briefly considered: the complexity of the original motivation of PLS regression which involves an "inner relation"; the original motivation behind the prediction stage of the PLS algorithm; the relationship between uncorrelated and orthogonal latent variables; the limited possibilities to make natural interpretations of the latent variables extracted.

7.
An evaluation of computational performance and precision regarding the cross‐validation error of five partial least squares (PLS) algorithms (NIPALS, modified NIPALS, Kernel, SIMPLS and bidiagonal PLS), available and widely used in the literature, is presented. When dealing with large data sets, computational time is an important issue, mainly in cross‐validation and variable selection. In the present paper, the PLS algorithms are compared in terms of the run time and the relative error in the precision obtained when performing leave‐one‐out cross‐validation using simulated and real data sets. The simulated data sets were investigated through factorial and Latin square experimental designs. The evaluations were based on the number of rows, the number of columns and the number of latent variables. With respect to performance, the results for both simulated and real data sets show that the differences in run time are statistically significant. Bidiagonal PLS is the fastest algorithm, followed by Kernel and SIMPLS. Regarding cross‐validation error, all algorithms showed similar results. However, in some situations, for example when many latent variables were involved, discrepancies were observed, especially with respect to SIMPLS. Copyright © 2010 John Wiley & Sons, Ltd.

8.
An outlier detection method is proposed for near-infrared spectral analysis. The underlying philosophy of the method is that, in random test (Monte Carlo) cross-validation, the probability of outliers appearing in good models with a smaller prediction residual error sum of squares (PRESS) or in bad models with a larger PRESS should differ clearly from that of normal samples. The method first builds a large number of PLS models using random test cross-validation, then sorts the models by PRESS, and finally identifies the outliers according to the cumulative probability of each sample in the sorted models. To validate the proposed method, four data sets, including three published data sets and a large data set of tobacco lamina, were investigated. The proposed method proved to be highly efficient and accurate compared with the conventional leave-one-out (LOO) cross-validation method.
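
The procedure described in this abstract (many random splits, models sorted by PRESS, per-sample frequencies in the sorted list) can be sketched roughly as follows. This is one possible reading, not the authors' implementation: the one-component PLS1 model, the choice to count how often each sample sits in the validation set of the worst models, and all parameter names and defaults are assumptions.

```python
import random

def fit_pls1(X, y):
    """One-component PLS1 on mean-centered data; returns a predict function."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Loading weights point along the covariance between X and y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5 or 1.0
    w = [v / norm for v in w]
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    tt = sum(v * v for v in t) or 1.0
    q = sum(ti * yi for ti, yi in zip(t, yc)) / tt

    def predict(row):
        score = sum((row[j] - x_mean[j]) * w[j] for j in range(p))
        return y_mean + q * score

    return predict

def mccv_outlier_scores(X, y, n_models=200, n_valid=5, worst_fraction=0.1, seed=0):
    """Fraction of the worst (highest-PRESS) models whose validation set holds each sample."""
    rng = random.Random(seed)
    n = len(X)
    runs = []
    for _ in range(n_models):
        valid = set(rng.sample(range(n), n_valid))
        train = [i for i in range(n) if i not in valid]
        predict = fit_pls1([X[i] for i in train], [y[i] for i in train])
        press = sum((y[i] - predict(X[i])) ** 2 for i in valid)
        runs.append((press, valid))
    runs.sort(key=lambda run: run[0], reverse=True)  # worst models first
    worst = runs[: max(1, int(worst_fraction * n_models))]
    return [sum(i in valid for _, valid in worst) / len(worst) for i in range(n)]

# Synthetic data: an exact linear relation with one gross outlier planted at index 9.
X = [[float(i), 0.5 * i] for i in range(1, 21)]
y = [3.0 * row[0] for row in X]
y[9] += 50.0
scores = mccv_outlier_scores(X, y)
suspect = max(range(len(scores)), key=scores.__getitem__)
print(suspect, scores[suspect])
```

On this toy data the planted sample dominates the validation sets of the worst models, while ordinary samples appear there only at roughly the sampling rate. Real spectra would call for more components and a less clear-cut decision threshold.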

9.
Near-infrared spectroscopy (NIR) models built on a particular instrument are often invalid on other instruments due to spectral inconsistencies between the instruments. In the present work, global and robust NIR calibration models were constructed by partial least squares (PLS) regression based on hybrid calibration sets, which are composed of both primary and secondary spectra. Three datasets were used as case studies. The first consisted of 72 radix scutellaria samples measured on two NIR spectrometers with known baicalin content. The second was composed of 80 corn samples measured on two instruments with known moisture, oil, and protein concentrations. The third dataset included 279 primary samples of tobacco with known nicotine content and 78 secondary samples of tobacco with known nicotine concentrations. The effects of the number of secondary spectra in the hybrid calibration sets and of the methods for selecting secondary spectra on PLS model performance were investigated by comparing the results obtained from different calibration sets. This study shows that the global and robust calibration models accurately predicted both primary and secondary samples as long as the ratio of the number of primary spectra to the number of secondary spectra was less than 22. Model performance was not influenced by the selection method of the secondary spectra. The hybrid calibration sets included both the primary and the secondary spectral information, rendering the constructed global and robust models applicable to both primary and secondary instruments.

10.
Target projection (TP), also called target rotation (TR), was introduced to facilitate interpretation of latent‐variable regression models. Orthogonal partial least squares (OPLS) regression and PLS post‐processing by similarity transform (PLS + ST) represent two alternative algorithms for the same purpose. In addition, OPLS and PLS + ST provide components to explain systematic variation in X orthogonal to the response. We show that, for the same number of components, OPLS and PLS + ST provide score and loading vectors for the predictive latent variable that are the same as for TP except for a scaling factor. Furthermore, we show how the TP approach can be extended to become a hybrid of latent‐variable (LV) regression and exploratory LV analysis and thus embrace systematic variation in X unrelated to the response. Principal component analysis (PCA) of the residual variation after removal of the target component is here used to extract the orthogonal components, but X‐tended TP (XTP) permits other criteria for decomposition of the residual variation. If PCA is used for decomposing the orthogonal variation in XTP, the variance of the major orthogonal components obtained for OPLS and XTP is observed to be almost the same, showing the close relationship between the methods. The XTP approach is tested and compared with OPLS for a three‐component mixture analyzed by infrared spectroscopy and a multicomponent mixture measured by near infrared spectroscopy in a reactor. Copyright © 2008 John Wiley & Sons, Ltd.

11.
Partial least squares (PLS) is a widely used algorithm in the field of chemometrics. In calibration studies, a PLS variant called orthogonal projection to latent structures (O‐PLS) has been shown to successfully reduce the number of model components while maintaining good prediction accuracy, although no theoretical analysis exists demonstrating its applicability in this context. Using a discrete formulation of the linear mixture model known as Beer's law, we explicitly analyze O‐PLS solution properties for calibration data. We find that, in the absence of noise and for large n, O‐PLS solutions are simpler but just as accurate as PLS solutions for systems in which analyte and background concentrations are uncorrelated. However, the same is not true for the most general chemometric data in which correlations between the analyte and background concentrations are nonzero and pure profiles overlap. On the contrary, forcing the removal of orthogonal components may actually degrade interpretability of the model. This situation can also arise when the data are noisy and n is small, because O‐PLS may identify and model the noise as orthogonal when it is statistically uncorrelated with the analytes. For the types of data arising from systems biology studies, in which the number of response variables may be much greater than the number of observations, we show that O‐PLS is unlikely to discover orthogonal variation whether or not it exists. In this case, O‐PLS and PLS solutions are the same. Copyright © 2011 John Wiley & Sons, Ltd.

12.
The calibration performance of partial least squares for one response variable (PLS1) can be improved by elimination of uninformative variables. Many methods are based on so-called predictive variable properties, which are functions of various PLS-model parameters, and which may change during the variable reduction process. In these methods variable reduction is made on the variables ranked in descending order for a given variable property. The methods start with full spectrum modelling. Iteratively, until a specified number of remaining variables is reached, the variable with the smallest property value is eliminated; a new PLS model is calculated, followed by a renewed ranking of the variables. The Stepwise Variable Reduction methods using Predictive-Property-Ranked Variables are denoted as SVR-PPRV. In the existing SVR-PPRV methods the PLS model complexity is kept constant during the variable reduction process. In this study, three new SVR-PPRV methods are proposed, in which a possibility for decreasing the PLS model complexity during the variable reduction process is built in. Therefore we denote our methods as PPRVR-CAM methods (Predictive-Property-Ranked Variable Reduction with Complexity Adapted Models). The selective and predictive abilities of the new methods are investigated and tested, using the absolute PLS regression coefficients as predictive property. They were compared with two modifications of existing SVR-PPRV methods (with constant PLS model complexity) and with two reference methods: uninformative variable elimination followed by either a genetic algorithm for PLS (UVE-GA-PLS) or an interval PLS (UVE-iPLS). The performance of the methods is investigated in conjunction with two data sets from near-infrared sources (NIR) and one simulated set. The selective and predictive performances of the variable reduction methods are compared statistically using the Wilcoxon signed rank test.
The three newly developed PPRVR-CAM methods were able to retain significantly smaller numbers of informative variables than the existing SVR-PPRV, UVE-GA-PLS and UVE-iPLS methods without loss of prediction ability. Contrary to UVE-GA-PLS and UVE-iPLS, there is no variability in the number of retained variables in each PPRV(R) method. Renewed variable ranking, after deletion of a variable, followed by remodelling, combined with the possibility to decrease the PLS model complexity, is beneficial. A preferred PPRVR-CAM method is proposed.
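
The basic stepwise loop described in this abstract (fit, rank by absolute regression coefficient, drop the weakest variable, refit) might look like the sketch below. It covers only the constant-complexity SVR-PPRV idea; the complexity-adapting step of the PPRVR-CAM methods is not reproduced. The NIPALS-style PLS1 routine, the function names and the toy data are assumptions for illustration.

```python
def pls1_coefficients(X, y, n_components):
    """Regression coefficients of a PLS1 model fitted by NIPALS-style deflation."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    f = [v - y_mean for v in y]
    comps = []
    for _ in range(min(n_components, p)):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-10:  # nothing left for X to explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(t[i] * f[i] for i in range(n)) / tt
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        comps.append((w, load, q))
    # The model is linear, so coefficient j is the prediction for unit vector e_j.
    coefs = []
    for j in range(p):
        x = [1.0 if k == j else 0.0 for k in range(p)]
        b = 0.0
        for w, load, q in comps:
            t_new = sum(xk * wk for xk, wk in zip(x, w))
            b += q * t_new
            x = [xk - t_new * lk for xk, lk in zip(x, load)]
        coefs.append(b)
    return coefs

def stepwise_reduce(X, y, n_keep, n_components=2):
    """Repeatedly drop the variable with the smallest |coefficient|, refitting each time."""
    kept = list(range(len(X[0])))
    while len(kept) > n_keep:
        sub = [[row[j] for j in kept] for row in X]
        coefs = pls1_coefficients(sub, y, n_components)
        weakest = min(range(len(kept)), key=lambda k: abs(coefs[k]))
        del kept[weakest]
    return kept

# Toy design with four orthogonal variables; y depends mainly on the first two.
a = [1, 1, 1, 1, -1, -1, -1, -1]
b = [1, 1, -1, -1, 1, 1, -1, -1]
c = [1, -1, 1, -1, 1, -1, 1, -1]
ab = [u * v for u, v in zip(a, b)]
X = [list(map(float, row)) for row in zip(a, b, c, ab)]
y = [2.0 * row[0] + row[1] + 0.1 * row[2] for row in X]
kept = stepwise_reduce(X, y, n_keep=2)
print(kept)
```

With correlated spectral variables the ranking generally changes after each deletion, which is why the loop refits the model before every elimination.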

13.
Analytical Letters, 2012, 45(1): 171-183
Based on wavelet transformation (WT) and mutual information (MI), a simple and effective procedure is proposed for multivariate calibration of near-infrared spectroscopy. In this procedure, the original spectra of the training set are first transformed into a set of wavelet representations by wavelet prism transform. Then, the MI value between each wavelet coefficient variable and the dependent variable is calculated, resulting in an MI spectrum; by retaining a subset of coefficients with higher MI, an updated training set consisting of wavelet coefficients is obtained and reconstructed (converted back) to the original domain. Based on this, a partial least squares (PLS) model can be constructed and optimized. The optimal wavelet and decomposition level are determined by experiment. An NIR quantitative problem involving the determination of total sugar in tobacco is used to demonstrate the overall performance of the proposed procedure, named RPLS, meaning PLS in the reconstructed original domain coupled with MI-induced variable selection in the wavelet domain. Three procedures are used as references: conventional full-spectrum PLS in the original domain (FPLS), PLS in the original domain coupled with MI-induced variable selection (OPLS), and direct PLS on MI-based wavelet coefficients (WPLS). The results confirm that RPLS can build more accurate and robust calibration models without increasing the complexity.

14.
The partial least squares class model (PLSCM) was recently proposed for multivariate quality control based on a partial least squares (PLS) regression procedure. This paper presents a case study of quality control of peanut oils based on mid‐infrared (MIR) spectroscopy and class models, focusing mainly on the following aspects: (i) to explain the meanings of PLSCM components and make comparisons between PLSCM and soft independent modeling of class analogy (SIMCA); (ii) to correct the estimation of the original PLSCM confidence interval by considering a nonzero intercept term for center estimation; (iii) to investigate the potential of MIR spectroscopy combined with class models for identifying peanut oils with low doping concentrations of other edible oils. It is demonstrated that PLSCM is actually different from the ordinary PLS procedure, but it estimates the class center and class dispersion in the framework of a latent variable projection model. While SIMCA projects the original variables onto a few dimensions explaining most of the data variances, PLSCM components consider simultaneously the explained variances and the compactness of samples belonging to the same class. The analysis results indicate PLSCM is an intuitive and easy‐to‐use tool to tackle one‐class problems and has comparable performance with SIMCA. The advantages of PLSCM might be attributed to the great success and well‐established foundations of PLS. For PLSCM, the optimization of model complexity and estimation of decision region can be performed as in multivariate calibration routines. Copyright © 2011 John Wiley & Sons, Ltd.

15.
Changeable size moving window partial least squares (CSMWPLS) and searching combination moving window partial least squares (SCMWPLS) are proposed to search for an optimized spectral interval and an optimized combination of spectral regions from informative regions obtained by a previously proposed spectral interval selection method, moving window partial least squares (MWPLSR) [Anal. Chem. 74 (2002) 3555]. The utilization of informative regions aims to construct better PLS models than those based on the whole spectral points. The purpose of CSMWPLS and SCMWPLS is to optimize the informative regions and their combination to further improve the prediction ability of the PLS models. The results of their application to an open-path (OP)/FT-IR spectral data set show that the proposed methods, especially SCMWPLS, can find an optimized combination with which one can improve, often significantly, the performance of the corresponding PLS model, in terms of a lower root mean square error of prediction (RMSEP) with a reasonable number of latent variables (LVs), compared with the results obtained using whole spectra or a direct combination of informative regions for a compound. Regions consisting of the combinations obtained can easily be explained by the existence of IR absorption bands in those spectral regions.
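
The moving-window idea underlying these methods can be sketched as follows: slide a fixed-width window across the variables, fit a PLS model on each window, and look for positions with small residuals. This sketch uses a one-component PLS1 model and the calibration residual sum of squares as the criterion, both of which are simplifying assumptions; it does not implement the size-changing or combination-searching steps of CSMWPLS and SCMWPLS.

```python
def pls1_window_rss(X, y, cols):
    """Residual sum of squares of a one-component PLS1 fit using only the given columns."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in cols]
    y_mean = sum(y) / n
    Xc = [[row[j] - m for j, m in zip(cols, means)] for row in X]
    yc = [v - y_mean for v in y]
    w = [sum(Xc[i][k] * yc[i] for i in range(n)) for k in range(len(cols))]
    norm = sum(v * v for v in w) ** 0.5 or 1.0
    w = [v / norm for v in w]
    t = [sum(xk * wk for xk, wk in zip(row, w)) for row in Xc]
    tt = sum(v * v for v in t) or 1.0
    q = sum(ti * yi for ti, yi in zip(t, yc)) / tt
    return sum((yi - q * ti) ** 2 for ti, yi in zip(t, yc))

def moving_window_rss(X, y, width):
    """RSS for every window position; low values suggest informative regions."""
    n_vars = len(X[0])
    return [
        pls1_window_rss(X, y, list(range(start, start + width)))
        for start in range(n_vars - width + 1)
    ]

# Toy "spectra": six orthogonal variables, with y driven by variables 2 and 3.
a = [1, 1, 1, 1, -1, -1, -1, -1]
b = [1, 1, -1, -1, 1, 1, -1, -1]
c = [1, -1, 1, -1, 1, -1, 1, -1]
ab = [u * v for u, v in zip(a, b)]
ac = [u * v for u, v in zip(a, c)]
bc = [u * v for u, v in zip(b, c)]
X = [list(map(float, row)) for row in zip(a, b, c, ab, ac, bc)]
y = [row[2] + row[3] for row in X]
rss = moving_window_rss(X, y, width=2)
best_start = min(range(len(rss)), key=rss.__getitem__)
print(best_start, rss)
```

In practice a cross-validated error would usually replace the calibration residual, since wider windows and more components otherwise always look better.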

16.
17.
18.
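Quantitative structure-activity relationship study of μ-conotoxin and its analogues   Cited by: 1 (self-citations: 0, citations by others: 1)
μ-Conotoxin is a specific blocker of muscle-type sodium channels. In this work, a quantitative structure-activity relationship (QSAR) study of μ-conotoxin and 17 of its analogues was carried out, mainly using the partial least squares (PLS) multivariate analysis method, and a QSAR model was established. The cross-validated R2 of the model is 0.813, and the correlation coefficient between the experimental and predicted Y values is 0.903. The results indicate that the arginine residue at position 13 and changes in the molecular charge have the largest influence on activity: increasing the positive charge of the molecule increases its activity. The next most influential are the amino acid residues at positions 19, 2, 12, 9, and 17.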

19.
Extension of standard regression to the case of multiple regressor arrays is given via the Kronecker product. The method is illustrated using ordinary least squares regression (OLS) as well as the latent variable (LV) methods principal component regression (PCR) and partial least squares regression (PLS). Denoting the method applied to PLS as mrPLS, the latter was shown to explain as much or more variance for the first LV relative to the comparable L‐partial least squares regression (L‐PLS) model. The same relationship holds when mrPLS is compared to PLS or n‐way partial least squares (N‐PLS) and the response array is 2‐way or 3‐way, respectively, where the regressor array corresponding to the first mode of the response array is 2‐way and the second mode regressor array is an identity matrix. In a comparison with N‐PLS using fragrance data, mrPLS proved superior in a validation sense when model selection was used. Though the focus is on 2‐way regressor arrays, the method can be applied to n‐way regressors via N‐PLS. Copyright © 2007 John Wiley & Sons, Ltd.

20.
Optimized sample-weighted partial least squares   Cited by: 2 (self-citations: 0, citations by others: 2)
Lu Xu, Talanta, 2007, 71(2): 561-566
In ordinary multivariate calibration methods, when the calibration set is determined to build the model describing the relationship between the dependent variables and the predictor variables, each sample in the calibration set makes the same contribution to the model, and the difference in representativeness between the samples is ignored. In this paper, by introducing the concept of weighted sampling into partial least squares (PLS), a new multivariate regression method, optimized sample-weighted PLS (OSWPLS), is proposed. OSWPLS differs from PLS in that it builds a new calibration set, where each sample in the original calibration set is weighted differently to account for its representativeness and so improve the prediction ability of the algorithm. A recently suggested global optimization algorithm, particle swarm optimization (PSO), is used to search for the best sample weights to optimize the calibration of the original training set and the prediction of an independent validation set. The proposed method is applied to two real data sets and compared with the results of PLS. The most significant improvement is obtained for the meat data, where the root mean squared error of prediction (RMSEP) is reduced from 3.03 to 2.35. For the fuel data, OSWPLS also performs slightly better than, or no worse than, PLS for the prediction of the four analytes. The stability and efficiency of OSWPLS are also studied; the results demonstrate that the proposed method can obtain desirable results within a moderate number of PSO cycles.
