Similar literature
20 similar documents found.
1.
A new procedure with a high ability to enhance the prediction of multivariate calibration models using a small number of interpretable variables is presented. The core of this methodology is to sort the variables according to an informative vector, followed by a systematic investigation of PLS regression models with the aim of finding the most relevant set of variables by comparing the cross‐validation parameters of the models obtained. In this work, seven main informative vectors, i.e., regression vector, correlation vector, residual vector, variable influence on projection (VIP), net analyte signal (NAS), covariance procedures vector (CovProc) and signal‐to‐noise ratios vector (StN), and their combinations were automated and tested with the main purpose of feature selection. Six data sets from different sources were employed to validate this methodology. They originated from near‐infrared (NIR) spectroscopy, Raman spectroscopy, gas chromatography (GC), fluorescence spectroscopy, quantitative structure‐activity relationships (QSAR) and computer simulation. The results indicate that all vectors and their combinations were able to enhance prediction capability with respect to the full data sets. However, the regression and NAS informative vectors from partial least squares (PLS) regression, both built using more latent variables than were used to build the model, were the best informative vectors for variable selection in most of the tested data sets. In all the applications, the selected variables were quite effective and useful for interpretation. Copyright © 2008 John Wiley & Sons, Ltd.

2.
Ideally, the score vectors numerically computed by an orthogonal scores partial least squares (PLS) algorithm should be orthogonal close to machine precision. However, this is not ensured without taking special precautions. The progressive loss of orthogonality with increasing number of components is illustrated for two widely used PLS algorithms, i.e., one that can be considered as a standard PLS algorithm, and SIMPLS. It is shown that the original standard PLS algorithm outperforms the original SIMPLS in terms of numerical stability. However, SIMPLS is confirmed to perform much better in terms of speed. We have investigated reorthogonalization as the special precaution to ensure orthogonality close to machine precision. Since the increase in computing time is relatively small for SIMPLS, we recommend SIMPLS with reorthogonalization. Copyright © 2008 John Wiley & Sons, Ltd.
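
The reorthogonalization idea in this abstract can be sketched as follows. This is a minimal plain-Python illustration, assuming a single Gram-Schmidt pass of a new score vector against the previously computed ones; the helper names `dot` and `reorthogonalize` and the toy vectors are hypothetical and are not taken from the paper.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def reorthogonalize(t_new, previous_scores):
    """Remove from t_new its projections onto each earlier score vector."""
    t = list(t_new)
    for t_prev in previous_scores:
        coeff = dot(t_prev, t) / dot(t_prev, t_prev)
        t = [ti - coeff * pi for ti, pi in zip(t, t_prev)]
    return t


# Toy example (assumed data): a second score vector that has drifted
# slightly away from orthogonality with the first one.
t1 = [1.0, 2.0, -3.0]
t2_drifted = [2.0, -1.0, 1e-6]
t2 = reorthogonalize(t2_drifted, [t1])
```

After the pass, `dot(t1, t2)` is zero up to rounding error, whereas `dot(t1, t2_drifted)` is not. In a real PLS implementation this step would be applied to each new score vector before it is stored.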

3.
It is well known that the predictions of orthogonal projections to latent structures (OPLS) and of partial least squares regression (PLS1) are identical in the single‐response case. The present paper presents an approach to identification of the complete y‐orthogonal structure by starting from the viewpoint of standard PLS1 regression. Three alternative non‐deflating OPLS algorithms and a modified principal component analysis (PCA)‐driven method (including MATLAB code) are presented. The first algorithm implements a postprocessing routine of the standard PLS1 solution where QR factorization applied to a shifted version of the non‐orthogonal scores is the key to express the OPLS solution. The second algorithm finds the OPLS model directly by an iterative procedure. By a rigorous mathematical argument, we explain that orthogonal filtering is a ‘built‐in’ property of the traditional PLS1 regression coefficients. Consequently, the capabilities of OPLS with respect to improving the predictions (also for new samples) compared with PLS1 are non‐existent. The PCA‐driven method is based on the fact that truncating off one dimension from the row subspace of X results in a matrix X_orth with y‐orthogonal columns and a rank of one less than the rank of X. The desired truncation corresponds exactly to the first X deflation step of Martens' non‐orthogonal PLS algorithm. The significant y‐orthogonal structure of X found by PCA of X_orth is split into two fundamental parts: one part that contributes significantly to correcting the first PLS score toward y and one part that does not. The third and final OPLS algorithm presented is a modification of Martens' non‐orthogonal algorithm into an efficient dual PLS1–OPLS algorithm. Copyright © 2014 John Wiley & Sons, Ltd.

4.
The insights from, and conclusions of, this paper motivate efficient and numerically robust ‘new’ variants of algorithms for solving the single response partial least squares regression (PLS1) problem. Prototype MATLAB code for these variants is included in the Appendix. The analysis of and conclusions regarding PLS1 modelling are based on a rich and nontrivial application of numerous key concepts from elementary linear algebra. The investigation starts with a simple analysis of the nonlinear iterative partial least squares (NIPALS) PLS1 algorithm variant computing orthonormal scores and weights. A rigorous interpretation of the squared P‐loadings as the variable‐wise explained sum of squares is presented. We show that the orthonormal row‐subspace basis of W‐weights can be found from a recurrence equation. Consequently, the NIPALS deflation steps of the centered predictor matrix can be replaced by a corresponding sequence of Gram–Schmidt steps that compute the orthonormal column‐subspace basis of T‐scores from the associated non‐orthogonal scores. The transitions between the non‐orthogonal and orthonormal scores and weights (illustrated by an easy‐to‐grasp commutative diagram), respectively, are both given by QR factorizations of the non‐orthogonal matrices. The properties of singular value decomposition combined with the mappings between the alternative representations of the PLS1 ‘truncated’ X data (including P^t W) are taken to justify an invariance principle to distinguish between the PLS1 truncation alternatives. The fundamental orthogonal truncation of PLS1 is illustrated by a Lanczos bidiagonalization type of algorithm where the predictor matrix deflation is required to be different from the standard NIPALS deflation. A mathematical argument concluding the PLS1 inconsistency debate (published in 2009 in this journal) is also presented. Copyright © 2014 John Wiley & Sons, Ltd.
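
For orientation, the standard NIPALS PLS1 procedure with deflation that this abstract takes as its starting point can be sketched as below. This is a textbook orthogonal-scores variant in plain Python (unit-length weights, unnormalized scores), not one of the reformulated algorithms from the paper; the function names and the small data set are assumptions made for illustration only.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def center_columns(X):
    """Subtract the column means from a matrix stored as a list of rows."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in X]


def nipals_pls1(X, y, n_components):
    """PLS1 via NIPALS with deflation of X; X and y must be centered."""
    X = [list(row) for row in X]
    n_vars = len(X[0])
    W, T, P, q = [], [], [], []
    for _ in range(n_components):
        # Weight vector: direction of maximal covariance with y, unit length.
        w = [dot([row[j] for row in X], y) for j in range(n_vars)]
        norm = math.sqrt(dot(w, w))
        w = [wj / norm for wj in w]
        # Scores, X-loadings and y-loading for this component.
        t = [dot(row, w) for row in X]
        tt = dot(t, t)
        p = [dot([row[j] for row in X], t) / tt for j in range(n_vars)]
        q.append(dot(y, t) / tt)
        # Deflate X by removing the part explained by this component.
        X = [[row[j] - ti * p[j] for j in range(n_vars)]
             for row, ti in zip(X, t)]
        W.append(w)
        T.append(t)
        P.append(p)
    return W, T, P, q


# Assumed toy data: 5 samples, 3 variables.
X_raw = [[1.0, 2.0, 0.5], [2.0, 1.0, 1.5], [3.0, 4.0, 2.0],
         [4.0, 3.0, 3.5], [5.0, 6.0, 3.0]]
y_raw = [1.1, 1.9, 3.2, 3.8, 5.3]
Xc = center_columns(X_raw)
y_mean = sum(y_raw) / len(y_raw)
yc = [v - y_mean for v in y_raw]
W, T, P, q = nipals_pls1(Xc, yc, 2)
```

With this deflation scheme the score vectors in `T` come out mutually orthogonal and the weight vectors in `W` orthonormal, which are the properties the abstract builds on.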

5.
Target projection (TP), also called target rotation (TR), was introduced to facilitate interpretation of latent‐variable regression models. Orthogonal partial least squares (OPLS) regression and PLS post‐processing by similarity transform (PLS + ST) represent two alternative algorithms for the same purpose. In addition, OPLS and PLS + ST provide components to explain systematic variation in X orthogonal to the response. We show that, for the same number of components, OPLS and PLS + ST provide score and loading vectors for the predictive latent variable that are the same as for TP except for a scaling factor. Furthermore, we show how the TP approach can be extended to become a hybrid of latent‐variable (LV) regression and exploratory LV analysis and thus embrace systematic variation in X unrelated to the response. Principal component analysis (PCA) of the residual variation after removal of the target component is here used to extract the orthogonal components, but X‐tended TP (XTP) permits other criteria for decomposition of the residual variation. If PCA is used for decomposing the orthogonal variation in XTP, the variance of the major orthogonal components obtained for OPLS and XTP is observed to be almost the same, showing the close relationship between the methods. The XTP approach is tested and compared with OPLS for a three‐component mixture analyzed by infrared spectroscopy and a multicomponent mixture measured by near infrared spectroscopy in a reactor. Copyright © 2008 John Wiley & Sons, Ltd.

6.
With projection based calibration approaches, such as partial least squares (PLS) and principal component regression (PCR), the calibration space is spanned by respective basis vectors (latent vectors). Up to rank k basis vectors are formed, where k ≤ min(m,n) with m and n denoting the number of calibration samples and measured variables, respectively. The user needs to decide how many and which basis vectors to use (the tuning parameters). To avoid the second issue, basis vectors are selected top‐down, starting with the first and sequentially adding more until model criteria are satisfied. Ridge regression (RR) avoids both issues by using the full set of basis vectors. Another approach is to select a subset from the total available. The presented work develops a process based on the L1 vector norm to select basis vectors. Specifically, the L1 norm is used to select singular value decomposition (SVD) basis set vectors for PCR (LPCR). Because PCR, PLS, RR, and others can be expressed as linear combinations of the SVD basis vectors, the focus is on selection and comparison using the SVD basis set. Results based on respective tuning parameter selections and weights applied to the SVD basis vectors for LPCR, top‐down PCR, correlation PCR (CPCR), PLS, and RR are compared for calibration and calibration updating using spectroscopic data sets. The methods are found to predict equivalently. In particular, the L1 norm produces results similar to those obtained by the well‐studied CPCR process. Thus, the new method provides a different theoretical framework than CPCR for selecting basis vectors. Copyright © 2016 John Wiley & Sons, Ltd.

7.
Coptis chinensis Franch contains berberine (1), palmatine (2), jatrorrhizine (3), etc. Phellodendron amurense Rupr also contains these components. Qingwei-Huanglian Wan is made from Coptis chinensis Franch, Phellodendron amurense Rupr, etc. Berberine, palmatine and jatrorrhizine in Qingwei-Huanglian Wan were determined by HPLC. The optimal composition of the mobile phase, CH3COOEt-HCOOH-EtOH (15:3:2), for HPLC separation of berberine, palmatine and jatrorrhizine was successfully determined by using the window diagram technique. The detection wavelength was 345 nm and the flow rate was 1.5 mL/min. Calibration graphs for (1), (2) and (3) were rectilinear for 0.06–0.39 μg, 0.06–0.61 μg and 0.01–0.12 μg, respectively. The basic principle and method of partial least squares (PLS) are presented in this paper. The data from HPLC were treated with a PLS program to obtain Coptis chinensis contents that met the requirements. All these results indicate that the PLS-HPLC method is new and feasible for the determination of crude drugs in Chinese patent medicine.

2. PLS is a new multivariate statistical method, and its performance is better than that of traditional methods such as ordinary multivariate regression and principal components regression. This is because PLS uses a calibration technique that makes good use of the information in both the concentration matrix Y and the content matrix X.

3. The method is applicable to the quality control of the products. The contents of IV and V in VI can be predicted accurately and quickly if the method described here is used by factories making VI.

8.
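A new descriptor for peptide sequences, SZOTT, was derived by principal component analysis of 1369 property parameters of the 20 natural amino acids. It was used to characterize 71 peptide sequences of different lengths, and quantitative structure-retention models (QSRM) were built by partial least squares (PLS) and support vector machines (SVM). The results show that SZOTT characterizes the 71 peptide sequences well, carries a large amount of information, and is easy to use. Compared with PLS, SVM showed stronger fitting ability and good external predictive ability in modeling lg k. The SZOTT descriptor combined with SVM modeling can be further applied to studies of peptide retention behavior in HPLC.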

9.
Regression from high dimensional observation vectors is particularly difficult when training data is limited. Partial least squares (PLS) partly solves the high dimensional regression problem by projecting the data to a latent variable space. The key issue in PLS is the computation of the weight vector, which describes the covariance between the responses and observations. For small-sample-size and high-dimensional regression problems, the covariance estimation is usually inaccurate and the correlated components in the predictors will distort the PLS weight. In this paper, we propose a sparse matrix transform (SMT) based PLS (SMT-PLS) method for high-dimensional spectroscopy regression. In SMT-PLS, the observation data is first decorrelated by SMT. Then, in the decorrelated data space, the PLS loading weight is computed by least squares regression. The SMT technique provides an accurate data covariance estimation, which can overcome the effect of small sample size and benefit both the PLS weight computation and subsequent regression prediction. The proposed SMT-PLS method is compared, in terms of root mean square errors of prediction, to PLS, Power PLS and PLS with orthogonal scatter correction on four real spectroscopic data sets. Experimental results demonstrate the effectiveness of the proposed method.

10.
The authentication of virgin olive oil samples usually requires the use of sophisticated and time consuming analytical techniques. There is a need for fast and simple analytical techniques for quality control. Virgin olive oils present characteristic NIR spectra. Chemometric treatment of NIR spectra was assessed for the quantification of fatty acids and triacylglycerols in virgin olive oil samples (n=125) and for their classification (PLS1-DA) into five geographically very close registered designations of origin (RDOs) of French virgin olive oils ("Aix-en-Provence", "Haute-Provence", "Nice", "Nyons" and "Vallée des Baux"). The spectroscopic interpretation of regression vectors showed that each RDO was correlated to one or two specific components of virgin olive oils according to their cultivar compositions. The results were quite satisfactory, in spite of the similarity of cultivar compositions between two denominations of origin ("Aix-en-Provence" and "Vallée des Baux"). Chemometric treatments of NIR spectra allow us to obtain results similar to those obtained by time consuming analytical techniques such as GC and HPLC, and constitute a fast and robust aid for the authentication of those French virgin olive oils.

11.
A modified partial least squares (PLS) algorithm is presented on the basis of a novel weight updating strategy. The new weight can handle situations with directions in X space having large variance unrelated to Y, where linear PLS may not work well. In the proposed algorithm, the slice transform technique is introduced to provide a piecewise linear representation of the weight vectors. Then, the corresponding mapping functions are estimated by a least squares criterion of the inner relation between the observed variables and the score of response variables. Finally, weight vectors are updated by the obtained mapping functions, and the corresponding scores and loadings are calculated with the new weights. Optimal piecewise linear replacements of the PLS weights are achieved by the proposed method. The predictive performances of the new approach and other methods are compared statistically using the Wilcoxon signed rank test. Experimental results show that the proposed method can achieve simpler models, whereas the model performances are at least comparable with PLS and other methods. Copyright © 2012 John Wiley & Sons, Ltd.

12.
This study compares the performance of partial least squares (PLS) regression analysis and artificial neural networks (ANN) for the prediction of total anthocyanin concentration in red-grape homogenates from their visible-near-infrared (Vis-NIR) spectra. The PLS prediction of anthocyanin concentrations for new-season samples from Vis-NIR spectra was characterised by regression non-linearity and prediction bias. In practice, this usually requires the inclusion of some samples from the new vintage to improve the prediction. The use of WinISI LOCAL partly alleviated these problems but still resulted in increased error at high and low extremes of the anthocyanin concentration range. Artificial neural network regression was investigated as an alternative method to PLS, due to the inherent advantages of ANN for modelling non-linear systems. The method proposed here combines the advantages of the data reduction capabilities of PLS regression with the non-linear modelling capabilities of ANN. With the use of PLS scores as inputs for ANN regression, the model was shown to be quicker and easier to train than using raw full-spectrum data. The ANN calibration for prediction of new vintage grape data, using PLS scores as inputs, was more linear and accurate than global and LOCAL PLS models and appears to reduce the need for refreshing the calibration with new-season samples. ANN with PLS scores required fewer inputs and was less prone to overfitting than using PCA scores. A variation of the ANN method, using carefully selected spectral frequencies as inputs, resulted in prediction accuracy comparable to that obtained using PLS scores but, as for PCA inputs, was also prone to overfitting with redundant wavelengths.

13.
In this study, different approaches to the multivariate calibration of the vapors of two refrigerants are reported. As the relationships between the time-resolved sensor signals and the concentrations of the analytes are nonlinear, the widely used partial least-squares regression (PLS) fails. Therefore, different methods are used, which are known to be able to deal with nonlinearities present in data. First, the Box–Cox transformation, which transforms the dependent variables nonlinearly, was applied. The second approach, the implicit nonlinear PLS regression, tries to account for nonlinearities by adding squared terms of the independent variables to the original independent variables. The third approach, quadratic PLS (QPLS), uses a nonlinear quadratic inner relationship for the model instead of the linear relationship used in PLS. Tree algorithms are also used, which split a nonlinear problem into smaller subproblems, which are modeled using linear methods or discrete values. Finally, neural networks are applied, which are able to model any relationship. Different special implementations, like genetic algorithms with neural networks and growing neural networks, are also used to prevent overfitting. Among the faster and simpler algorithms, QPLS shows good results. Different implementations of neural networks show excellent results. Among the different implementations, the most sophisticated and computing-intensive algorithms (growing neural networks) show the best results. Thus, the optimal method for the data set presented is a compromise between quality of calibration and complexity of the algorithm. Electronic supplementary material is available for this article.
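
The Box–Cox transformation mentioned in this abstract has a simple closed form, sketched below in plain Python. The function name `box_cox` and the example values are assumptions for illustration; how the exponent lambda was chosen in the study is not stated here.

```python
import math


def box_cox(values, lam):
    """Box-Cox transform of strictly positive values for a given lambda.

    For lambda != 0 the transform is (y**lambda - 1) / lambda;
    for lambda == 0 it is the natural logarithm.
    """
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


# Assumed example: a square-root-like transform (lambda = 0.5).
transformed = box_cox([1.0, 4.0], 0.5)
```

In a calibration setting the transform would be applied to the dependent variable before fitting a linear model such as PLS, and predictions would be mapped back with the inverse transform.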

14.
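Quantitative analysis of vitamin E by near-infrared spectroscopy (total citations: 9; self-citations: 0; citations by others: 9)
Over the concentration range 93% to 97.4%, a regression equation relating the integrated area of the near-infrared absorption peak of vitamin E (VE) at 6061 to 5246 cm^-1 to its concentration was established: Y=103.43~0.078624X. When this equation was used to predict samples of known concentration, both the errors and the relative errors fell within -0.79% to 0.9%. For VE over the wider concentration range of 80% to 97%, models relating near-infrared absorbance to content were built with the PLS algorithm using different data preprocessing methods, and were validated with VE samples of known concentration.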

15.
16.
A molecular structural characterization (MSC) method called reduced molecular electronegativity-distance vector (MEDVR) was used to describe the molecular structures of 55 components of Meconopsis integrifolia flowers. By use of stepwise multiple regression (SMR) and partial least squares (PLS) methods, a model with a correlation coefficient (R1) of 0.987 and a standard deviation (SD1) of 1.377 was obtained. Then, through multiple linear regression (MLR), another model with a correlation coefficient (R2) of 0.989 and a standard deviation (SD2) of 1.395 was constructed. Furthermore, by means of variable screening with the stepwise multiple regression technique (SMR), 8 vectors were selected to build another model, with a correlation coefficient (R3) and standard deviation (SD3) of 0.989 and 1.366, respectively. All three models were then evaluated by cross-validation with the leave-one-out (LOO) procedure, and the correlation coefficients (QCV) were 0.981, 0.976 and 0.979, respectively. The results show that the models constructed provide estimation stability and favorable predictive ability.

17.
Different calibration techniques are available for spectroscopic applications that show nonlinear behavior. This comprehensive study compares different nonlinear calibration techniques: kernel PLS (KPLS), support vector machines (SVM), least-squares SVM (LS-SVM), relevance vector machines (RVM), Gaussian process regression (GPR), artificial neural network (ANN), and Bayesian ANN (BANN). In this comparison, partial least squares (PLS) regression is used as a linear benchmark, while the relationship of the methods is considered in terms of traditional calibration by ridge regression (RR). The performance of the different methods is demonstrated by their practical applications using three real-life near infrared (NIR) data sets. Different aspects of the various approaches, including computational time, model interpretability, potential over-fitting using the non-linear models on linear problems, robustness to small or medium sample sets, and robustness to pre-processing, are discussed. The results suggest that GPR and BANN are powerful and promising methods for handling linear as well as nonlinear systems, even when the data sets are moderately small. The LS-SVM is also attractive due to its good predictive performance for both linear and nonlinear calibrations.

18.
Balabin RM, Smirnov SV. The Analyst, 2012, 137(7): 1604-1610
Modern analytical chemistry of industrial products is in need of rapid, robust, and cheap analytical methods to continuously monitor product quality parameters. For this reason, spectroscopic methods are often used to control the quality of industrial products in an on-line/in-line regime. Vibrational spectroscopy, including mid-infrared (MIR), Raman, and near-infrared (NIR), is one of the best ways to obtain information about the chemical structures and the quality coefficients of multicomponent mixtures. Together with chemometric algorithms and multivariate data analysis (MDA) methods, which were especially created for the analysis of complicated, noisy, and overlapping signals, NIR spectroscopy shows great results in terms of its accuracy, including classical prediction error, RMSEP. However, it is unclear whether the combined NIR + MDA methods are capable of dealing with much more complex interpolation or extrapolation problems that are inevitably present in real-world applications. In the current study, we try to make a rather general comparison of linear, such as partial least squares or projection to latent structures (PLS); "quasi-nonlinear", such as the polynomial version of PLS (Poly-PLS); and intrinsically non-linear, such as artificial neural networks (ANNs), support vector regression (SVR), and least-squares support vector machines (LS-SVM/LSSVM), regression methods in terms of their robustness. As a measure of robustness, we will try to estimate their accuracy when solving interpolation and extrapolation problems. Petroleum and biofuel (biodiesel) systems were chosen as representative examples of real-world samples. Six very different chemical systems that differed in complexity, composition, structure, and properties were studied; these systems were gasoline, ethanol-gasoline biofuel, diesel fuel, aromatic solutions of petroleum macromolecules, petroleum resins in benzene, and biodiesel. Eighteen different sample sets were used in total. 
General conclusions are made about the applicability of ANN- and SVM-based regression tools in modern analytical chemistry. The effectiveness of different multivariate algorithms is different when going from classical accuracy to robustness. Neural networks, which are capable of producing very accurate results with respect to classical RMSEP, are not able to solve interpolation problems or, especially, extrapolation problems. The chemometric methods that are based on the support vector machine (SVM) ideology are capable of solving both classical regression and interpolation/extrapolation tasks.
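
The classical prediction error referred to in this abstract, RMSEP, is the root mean squared difference between reference and predicted values on an independent test set. A minimal plain-Python sketch follows; the function name `rmsep` and the sample numbers are assumptions for illustration and are not data from the study.

```python
import math


def rmsep(y_true, y_pred):
    """Root mean squared error of prediction over an independent test set."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - t) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Assumed example: every prediction is off by exactly 2 units.
error = rmsep([0.0, 0.0, 0.0, 0.0], [2.0, 2.0, 2.0, 2.0])
```

Robustness in the sense of the abstract would be probed by computing the same quantity on test samples that lie between or outside the ranges covered by the calibration samples.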

19.
Carbamazepine is a poorly soluble drug with known bioavailability problems related to its polymorphism. A form (C-monoclinic or form IV) that is less soluble than the pharmaceutically acceptable one (P-monoclinic or form III) can be formed under various conditions that may occur during drug formulation. Therefore, quantitative analysis of form IV in form III is important to drug formulators. In the present study, a fast and simple non-destructive method was developed for quantification of form IV in form III, by using DRIFTS spectral data subjected to the standard normal variate transformation (row centering and scaling) and to the lazy learning algorithm. Fast principal component regression (fast PCR) and partial least squares (PLS) regression methods of multivariate calibration were also used and compared with lazy learning. The lazy learning algorithm performed better than the fast PCR and PLS methods (root mean squared error of cross-validation 1.318% versus 3.337 and 3.058%, respectively). Even with a small number of calibration samples it gave satisfactory predictive performance (root mean squared error of prediction <2.0% versus >3.3% for fast PCR and >2.6% for PLS), in the concentration range below 30% (w/w) of form IV. This is attributed to the capability of handling non-linearity in the relation between reflectance and concentration as well as to local modeling using a pre-selected number of nearest neighbor concentrations.
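
The standard normal variate (SNV) transformation described in this abstract as row centering and scaling can be sketched as below. This plain-Python version assumes the sample standard deviation (n - 1 in the denominator), which is a common convention but is not confirmed by the abstract; the function name `snv` and the toy spectrum are hypothetical.

```python
import math


def snv(spectrum):
    """Standard normal variate: center and scale one spectrum (one row)."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    # Sample standard deviation (n - 1 denominator); an assumed convention.
    sd = math.sqrt(sum((x - mean) ** 2 for x in spectrum) / (n - 1))
    return [(x - mean) / sd for x in spectrum]


# Assumed toy spectrum with three intensity values.
corrected = snv([1.0, 2.0, 3.0])
```

Each spectrum is treated independently, so the transform needs no calibration set and can be applied to new samples in the same way.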

20.