Similar literature
 Found 20 similar articles (search time: 15 ms)
1.
In developing partial least squares calibration models, selecting the number of latent variables used for their construction to minimize both model bias and model variance remains a challenge. Several metrics exist for incorporating these trade‐offs, but the cost of model parsimony and the potential for underfitting on achievable prediction errors are difficult to anticipate. We propose a metric that penalizes growing model variance against decreasing bias as additional latent variables are added. The magnitude of the penalty is scaled by a user‐defined parameter that is formulated to provide a constraint on the fractional increase in root mean square error of cross‐validation (RMSECV) when selecting a parsimonious model over the conventional minimum RMSECV solution. We evaluate this approach for quantification of four organic functional groups using 238 laboratory standards and 750 complex atmospheric organic aerosol mixtures with mid‐infrared spectroscopy. Parametric variation of this penalty demonstrates that increase in prediction errors due to underfitting is bounded by the magnitude of the penalty for samples similar to laboratory standards used for model training and validation. Imposing an ensemble of penalties corresponding to a 0–30% allowable increase in RMSECV through sum of ranking differences leads to the selection of a model that increases the actual RMSECV up to 20% for laboratory standards but achieves an 85% reduction in the mean error in predicted concentrations for environmental mixtures. Partial least squares models developed with laboratory mixtures can provide useful predictions in complex environmental samples, but may benefit from protection against overfitting. © 2015 The Authors. Journal of Chemometrics published by John Wiley & Sons Ltd.
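The constraint described in this abstract (accepting a bounded fractional increase in RMSECV in exchange for fewer latent variables) can be illustrated with a simple threshold rule. This is only a sketch and not the authors' penalty metric, whose formula the abstract does not give; the function name `pick_parsimonious_lv` and the RMSECV values are hypothetical.

```python
def pick_parsimonious_lv(rmsecv, allowed_increase):
    """Return the smallest number of latent variables (1-based) whose RMSECV
    stays within (1 + allowed_increase) times the minimum RMSECV.

    Assumption: a plain threshold rule standing in for the paper's penalty.
    """
    limit = min(rmsecv) * (1.0 + allowed_increase)
    for n_lv, err in enumerate(rmsecv, start=1):
        if err <= limit:
            return n_lv
    raise ValueError("rmsecv must not be empty")


# Hypothetical RMSECV curve for models with 1..8 latent variables.
rmsecv_curve = [0.90, 0.52, 0.40, 0.36, 0.33, 0.32, 0.325, 0.34]

for alpha in (0.0, 0.1, 0.3):
    print(alpha, pick_parsimonious_lv(rmsecv_curve, alpha))
```

With an allowed increase of 0 the rule falls back to the conventional minimum-RMSECV model; larger values trade a bounded loss in cross-validated error for fewer latent variables.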

2.
Two alternative partial least squares (PLS) methods, averaged PLS and weighted average PLS, are proposed and compared with classical PLS in terms of root mean square error of prediction (RMSEP) for three real data sets. These methods compute the (weighted) average of PLS models with different complexity. The prediction abilities of the alternative methods are comparable to that of classical PLS, but they do not require determining how many components should be included in the model. They are also more robust in the sense that the quality of prediction depends less on a good choice of the number of components to be included. In addition, weighted average PLS is compared with the weighted average part of LOCAL, a published method that also applies weighted average PLS, albeit with an entirely different weighting scheme.
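The averaging idea can be sketched without fitting any models: given the predictions of PLS models with 1..A components, combine them with equal or unequal weights. The abstract does not state the weighting scheme, so the inverse-squared-error weights below are an assumption for illustration, and both function names are hypothetical.

```python
def average_predictions(preds_by_complexity, weights=None):
    """Combine predictions from models of different complexity.

    preds_by_complexity: one list of predictions per model (same length each).
    weights: one non-negative weight per model; None gives a plain average.
    """
    if weights is None:
        weights = [1.0] * len(preds_by_complexity)
    total = sum(weights)
    n_samples = len(preds_by_complexity[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, preds_by_complexity)) / total
        for i in range(n_samples)
    ]


def inverse_squared_error_weights(cv_errors):
    """Assumed weighting: models with lower cross-validation error count more."""
    return [1.0 / (e * e) for e in cv_errors]


# Hypothetical predictions for two samples from models with 1 and 2 components.
preds = [[1.0, 2.0], [3.0, 4.0]]
print(average_predictions(preds))                 # plain average
print(average_predictions(preds, [1.0, 3.0]))     # weighted average
```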

3.
Changeable size moving window partial least squares (CSMWPLS) and searching combination moving window partial least squares (SCMWPLS) are proposed to search for an optimized spectral interval and an optimized combination of spectral regions from informative regions obtained by a previously proposed spectral interval selection method, moving window partial least squares (MWPLSR) [Anal. Chem. 74 (2002) 3555]. The utilization of informative regions aims to construct better PLS models than those based on the whole spectrum. The purpose of CSMWPLS and SCMWPLS is to optimize the informative regions and their combination to further improve the prediction ability of the PLS models. The results of their application to an open-path (OP)/FT-IR spectra data set show that the proposed methods, especially SCMWPLS, can find an optimized combination that improves, often significantly, the performance of the corresponding PLS model, giving a lower root mean square error of prediction (RMSEP) with a reasonable number of latent variables (LVs), compared with the results obtained using whole spectra or a direct combination of informative regions for a compound. Regions consisting of the combinations obtained can easily be explained by the existence of IR absorption bands in those spectral regions.
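The "changeable size" search can be pictured as scoring every contiguous sub-window of an informative region and keeping the best one. The sketch below only shows that enumeration; `score` is a hypothetical callback that in practice would fit a PLS model on the channels `start..end` and return its cross-validated error. The toy scorer used here is purely illustrative.

```python
def best_subwindow(region_start, region_end, score, min_width=1):
    """Score every contiguous sub-window (inclusive channel indices) of an
    informative region and return (lowest_score, start, end)."""
    best = None
    max_width = region_end - region_start + 1
    for width in range(min_width, max_width + 1):
        for start in range(region_start, region_end - width + 2):
            end = start + width - 1
            candidate = (score(start, end), start, end)
            if best is None or candidate[0] < best[0]:
                best = candidate
    return best


# Toy scorer (assumption): pretends the ideal window is channels 12..13.
def toy_score(start, end):
    return abs(start - 12) + abs(end - 13)


print(best_subwindow(10, 14, toy_score))
```

The exhaustive loop costs one model fit per sub-window, which is why such searches are usually restricted to regions already flagged as informative.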

4.
Target projection (TP), also called target rotation (TR), was introduced to facilitate interpretation of latent‐variable regression models. Orthogonal partial least squares (OPLS) regression and PLS post‐processing by similarity transform (PLS + ST) represent two alternative algorithms for the same purpose. In addition, OPLS and PLS + ST provide components to explain systematic variation in X orthogonal to the response. We show that, for the same number of components, OPLS and PLS + ST provide score and loading vectors for the predictive latent variable that are the same as for TP except for a scaling factor. Furthermore, we show how the TP approach can be extended to become a hybrid of latent‐variable (LV) regression and exploratory LV analysis and thus embrace systematic variation in X unrelated to the response. Principal component analysis (PCA) of the residual variation after removal of the target component is here used to extract the orthogonal components, but X‐tended TP (XTP) permits other criteria for decomposition of the residual variation. If PCA is used for decomposing the orthogonal variation in XTP, the variance of the major orthogonal components obtained for OPLS and XTP is observed to be almost the same, showing the close relationship between the methods. The XTP approach is tested and compared with OPLS for a three‐component mixture analyzed by infrared spectroscopy and a multicomponent mixture measured by near infrared spectroscopy in a reactor. Copyright © 2008 John Wiley & Sons, Ltd.

5.
A partial least squares (PLS-1) calibration model based on spectrophotometric measurement for the simultaneous determination of CN− and SCN− ions is described. The method is based on the difference in the rates of the reactions of CN− and SCN− ions with chloramine-T in a pH 4.0 buffer solution at 30 °C. The cyanogen chloride (CNCl) produced reacts with pyridine, and the product condenses with barbituric acid to form a final colored product. The absorption kinetic profiles of the solutions were monitored by measuring absorbance at 578 nm at 2 s intervals over the time range 20-180 s after initiation of the reaction. The experimental calibration matrix for the PLS-1 calibration was designed with 31 samples. The cross-validation method was used for selecting the number of factors. The results showed that simultaneous determination could be performed in the ranges 10.0-900.0 and 50.0-1200.0 ng mL−1 for CN− and SCN− ions, respectively. The proposed method was successfully applied to the simultaneous determination of cyanide and thiocyanate in water samples.
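The sampling scheme fixes the size of the data matrix fed to PLS-1. A small sketch of that arithmetic follows, assuming (the abstract does not say) that readings are taken at both end points of the 20-180 s range; `time_grid` is a hypothetical helper.

```python
def time_grid(start_s, stop_s, step_s):
    """Return the sampling times in seconds, end point included (assumption)."""
    n_points = (stop_s - start_s) // step_s + 1
    return [start_s + k * step_s for k in range(n_points)]


times = time_grid(20, 180, 2)   # absorbance readings at 578 nm
n_samples = 31                  # calibration samples stated in the abstract

print(len(times))               # time points per kinetic profile
print((n_samples, len(times)))  # rows x columns of the calibration matrix
```

Under that assumption each kinetic profile has 81 points, so the calibration matrix is 31 x 81.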

6.
Two novel algorithms which employ the idea of stacked generalization or stacked regression, stacked partial least squares (SPLS) and stacked moving‐window partial least squares (SMWPLS) are reported in the present paper. The new algorithms establish parallel, conventional PLS models based on all intervals of a set of spectra to take advantage of the information from the whole spectrum by incorporating parallel models in a way to emphasize intervals highly related to the target property. It is theoretically and experimentally illustrated that the predictive ability of these two stacked methods combining all subsets or intervals of the whole spectrum is never poorer than that of a PLS model based only on the best interval. These two stacking algorithms generate more parsimonious regression models with better predictive power than conventional PLS, and perform best when the spectral information is neither isolated to a single, small region, nor spread uniformly over the response. A simulation data set is employed in this work not only to demonstrate this improvement, but also to demonstrate that stacked regressions have the potential capability of predicting property information from an outlier spectrum in the prediction set. Moisture, oil, protein and starch in Cargill corn samples have been successfully predicted by these new algorithms, as well as hydroxyl number for different instruments of terpolymer samples including and excluding an outlier spectrum. Copyright © 2009 John Wiley & Sons, Ltd.

7.
Optimized sample-weighted partial least squares   Cited by: 2 (self-citations: 0, cited by others: 2)
Lu Xu 《Talanta》2007,71(2):561-566
In ordinary multivariate calibration methods, once the calibration set has been chosen to build the model describing the relationship between the dependent variables and the predictor variables, each sample in the calibration set makes the same contribution to the model, and differences in representativeness between samples are ignored. In this paper, by introducing the concept of weighted sampling into partial least squares (PLS), a new multivariate regression method, optimized sample-weighted PLS (OSWPLS), is proposed. OSWPLS differs from PLS in that it builds a new calibration set in which each sample of the original calibration set is weighted differently to account for its representativeness, in order to improve the prediction ability of the algorithm. A recently suggested global optimization algorithm, particle swarm optimization (PSO), is used to search for the sample weights that optimize the calibration of the original training set and the prediction of an independent validation set. The proposed method is applied to two real data sets and compared with PLS. The most significant improvement is obtained for the meat data, where the root mean squared error of prediction (RMSEP) is reduced from 3.03 to 2.35. For the fuel data, OSWPLS performs slightly better than, or no worse than, PLS for the prediction of the four analytes. The stability and efficiency of OSWPLS are also studied; the results demonstrate that the proposed method can obtain desirable results within a moderate number of PSO cycles.

8.
The fluorescence spectrum, as well as its first and second derivative spectra, in the region of 220–900 nm was utilized to determine the concentration of triglyceride in human serum. Nonlinear partial least squares regression with cubic B‐spline‐function‐based nonlinear transformation was employed as the chemometric method. Window genetic algorithms partial least squares (WGAPLS) was proposed as a new wavelength selection method to find the optimized combination of spectral wavelengths. The study shows that when WGAPLS is applied within the optimized regions ascertained by changeable size moving window partial least squares (CSMWPLS) or searching combination moving window partial least squares (SCMWPLS), the calibration and prediction performance of the model can be further improved at a reasonable number of latent variables. SCMWPLS should start from the sub‐region found by CSMWPLS with the smallest root mean square error of calibration (RMSEC). In addition, WGAPLS should be utilized within the region of smallest RMSEC, whether it is the sub‐region found by CSMWPLS or the region combination found by SCMWPLS. Moreover, the prediction ability of the nonlinear models was significantly better than that of the linear models. The prediction performance of the three spectra was in the following order: second derivative spectrum < original spectrum < first derivative spectrum. Wavelengths within the regions of 300–367 nm and 386–392 nm in the first derivative of the original fluorescence spectrum were the optimized wavelength combination for the prediction model. Copyright © 2012 John Wiley & Sons, Ltd.

9.
In chemistry and many other scientific disciplines, non‐negativity‐constrained estimation of models is of practical importance. The time required for estimating true least squares non‐negativity‐constrained models is typically many times longer than that for estimating unconstrained models, which is why faster non‐negative least squares (NNLS) algorithms are needed. Very recently, the distance algorithm has been developed, and this algorithm can be adapted to solve the NNLS regression task faster (in some cases) than the conventional algorithms. In simulation studies, DA_NNLS was the fastest for small‐sized and medium‐sized linear regression tasks. The visualization (geometry) of the NNLS task being solved by our new algorithm is discussed as well. Besides linear algebra, we suggest investigating, using, and developing convex geometrical concepts and tools in chemometrics to exploit the geometry of chemometry. Copyright © 2014 John Wiley & Sons, Ltd.
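To make the NNLS task concrete, the sketch below minimizes ||Ax − b||² subject to x ≥ 0 with a generic projected coordinate descent. This is explicitly not the distance algorithm (DA_NNLS) from the abstract, whose details are not given there; the function name is hypothetical and the method is chosen only because it fits in a few lines of plain Python.

```python
def nnls_coordinate_descent(A, b, n_sweeps=200):
    """Approximate argmin ||Ax - b||^2 subject to x >= 0.

    Generic projected coordinate descent on the normal equations;
    a stand-in for illustration, not the DA_NNLS algorithm.
    """
    m, n = len(A), len(A[0])
    # H = A^T A and c = A^T b
    H = [[sum(A[i][j] * A[i][k] for i in range(m)) for k in range(n)]
         for j in range(n)]
    c = [sum(A[i][j] * b[i] for i in range(m)) for j in range(n)]
    x = [0.0] * n
    for _ in range(n_sweeps):
        for j in range(n):
            if H[j][j] == 0.0:
                continue  # all-zero column: coefficient stays at 0
            grad_j = sum(H[j][k] * x[k] for k in range(n)) - c[j]
            # Exact minimization along coordinate j, then clip at zero.
            x[j] = max(0.0, x[j] - grad_j / H[j][j])
    return x


# The unconstrained solution would be [2, -3]; the constraint clips the second
# coefficient to zero.
print(nnls_coordinate_descent([[1.0, 0.0], [0.0, 1.0]], [2.0, -3.0]))
```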

10.
Pure component selectivity analysis (PCSA) was successfully utilized to enhance the robustness of a partial least squares (PLS) model by examining the selectivity of a given component to other components. The samples used in this study were composed of NH4OH, H2O2 and H2O, a popular etchant solution in the electronic industry. Corresponding near-infrared (NIR) spectra (9000-7500 cm−1) were used to build PLS models. The selective determination of H2O2 without influences from NH4OH and H2O was a key issue, since its molecular structure is similar to that of H2O and NH4OH also has a hydroxyl functional group. The best spectral ranges for the determination of NH4OH and H2O2 were found with the use of moving window PLS (MW-PLS), and the corresponding selectivity was examined by pure component selectivity analysis. The PLS calibration for NH4OH was free from interferences from the other components due to the presence of its unique NH absorption bands. Since the spectral variation from H2O2 was broadly overlapping and much less distinct than that from NH4OH, the selectivity and prediction performance of the H2O2 calibration varied sensitively with the spectral ranges and number of factors used. PCSA, based on the comparison between regression vectors from PLS and the net analyte signal (NAS), was an effective method to prevent over-fitting of the H2O2 calibration. A robust H2O2 calibration model with minimal interferences from other components was developed. PCSA should be included as a standard method in PLS calibrations where prediction error only is the usual measure of performance.

11.
Near‐infrared spectroscopy has been used in nutritional metabolomics fingerprinting for the assessment of the intake of intervention breakfasts prepared with four different vegetable oils that were previously subjected to a deep frying process of 20 cycles for 5 min at 180°C. The target oils were an extra virgin olive oil and three varieties of refined sunflower oil. Of the latter three, one was used as such, another was spiked with a synthetic oxidation inhibitor (dimethylsiloxane) and the last was enriched with an extract of phenolic compounds from olive pomace, the antioxidant properties of which are well known. Urine sampled from individuals before intake and 2 and 4 h after intake was directly analyzed by NIRS to obtain fingerprint characteristics of the metabolome composition. The resulting urinary patterns were combined for statistical analysis by unsupervised and supervised approaches. Partial least squares‐class modeling enabled the development of class‐models for each intervention breakfast, thus achieving discrimination of urinary fingerprints from individuals after breakfast intake. The models were statistically characterized by estimation of sensitivity and specificity parameters for the training and evaluation (validation) steps. The application of the variable importance in projection algorithm enabled detection of the spectral regions with higher significance to explain the variability observed in the partial least squares class‐models. Quantitative differences in variable importance in projection scores discriminated among the different classes under study. Copyright © 2013 John Wiley & Sons, Ltd.

12.
Several approaches for investigating the relationships between two datasets where the individuals are structured into groups are discussed. These strategies fit within the framework of partial least squares (PLS) regression. Each strategy of analysis is introduced on the basis of a maximization criterion, which involves the covariances between components associated with the groups of individuals in each dataset. Thereafter, algorithms are proposed to solve these maximization problems. The strategies of analysis can be considered as extensions of multi‐group principal components analysis to the context of PLS regression. Copyright © 2014 John Wiley & Sons, Ltd.

13.
Multi-way partial least squares modeling of water quality data   Cited by: 1 (self-citations: 0, cited by others: 1)
A 10-year surface water quality data set pertaining to a polluted river was analyzed using partial least squares (PLS) regression models. Both the unfold-PLS and N-PLS (tri-PLS and quadri-PLS) models were calibrated through the leave-one-out cross-validation method. These were applied to the multivariate, multi-way data array with a view to assessing and comparing their predictive capabilities for biochemical oxygen demand (BOD) of river water in terms of their relative mean square error of cross-validation, prediction and variance captured. The sums of squares of residuals and the leverages were computed and analyzed to identify the sites, variables, years and months which may have influence on the constructed model. Both the tri- and quadri-PLS models yielded relatively low validation error as compared to unfold-PLS and captured high variance in the model. Moreover, both of these methods produced acceptable model precision and accuracy. In the case of tri-PLS the root mean square errors were 1.65 and 2.17 for calibration and prediction, respectively, whereas these were 2.58 and 1.09 for quadri-PLS. At a preliminary level it seems that BOD can be predicted but a different data arrangement is needed. Moreover, analysis of the scores and loadings plots of the N-PLS models could provide information on the time evolution of the river water quality.

14.
Partial least squares regression (PLS) is proposed for solving air pollution source apportionment problems as an alternative method to the frequently used chemical mass balance technique. A discriminant PLS is used to calculate linear mixing proportions for a synthetic ambient aerosol data set where the truth is known. Without sacrificing orthogonality of the source profiles, PLS can resolve the emission sources and accurately predict the emission source contributions. Further extensions of the PLS approach to environmental receptor modelling are discussed.

15.
16.
We propose a new data compression method for estimating optimal latent variables in multi‐variate classification and regression problems where more than one response variable is available. The latent variables are found according to a common innovative principle combining PLS methodology and canonical correlation analysis (CCA). The suggested method is able to extract predictive information for the latent variables more effectively than ordinary PLS approaches. Only simple modifications of existing PLS and PPLS algorithms are required to adopt the proposed method. Copyright © 2009 John Wiley & Sons, Ltd.

17.
The number of latent variables (LVs), or the factor number, is a key parameter in PLS modeling for obtaining a correct prediction. Although much work has been done on this issue, it is still difficult to determine a suitable LV number in practical use. A method named independent factor diagnostics (IFD) is proposed for investigating the contribution of each LV to the predicted results, on the basis of a discussion about the determination of the LV number in PLS modeling for near infrared (NIR) spectra of complex samples. The NIR spectra of three data sets of complex samples, including a public data set and two tobacco lamina ones, are investigated. It is shown that several high order LVs constitute the main contributions to the predicted results, although the contribution of the low order LVs should not be neglected in the PLS models. Therefore, in practical use of PLS for NIR spectral analysis of complex samples, it may be better to use a slightly larger LV number. Supported by the National Natural Science Foundation of China (Grant Nos. 20775036 & 20835002)
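One common way to read "the contribution of each LV to the predicted results" is the additive form of a PLS1 prediction. The abstract does not give the IFD definition, so the decomposition below is only the standard textbook identity for PLS1 with mutually orthogonal score vectors, with symbols ($\mathbf{t}_a$, $q_a$, $A$) introduced here for illustration.

```latex
% Centred PLS1 prediction with A latent variables:
% t_a = score vector of LV a, q_a = its y-loading (regression of y on t_a).
\hat{\mathbf{y}} \;=\; \sum_{a=1}^{A} \mathbf{t}_a \, q_a ,
\qquad
q_a \;=\; \frac{\mathbf{t}_a^{\mathsf{T}} \mathbf{y}}{\mathbf{t}_a^{\mathsf{T}} \mathbf{t}_a} .
```

Under this reading, the term $\mathbf{t}_a q_a$ is the part of the prediction attributable to latent variable $a$, so the terms can be inspected one at a time.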

18.
Pérez NF  Boqué R  Ferré J 《Talanta》2010,83(2):475-481
A novel method for establishing multivariate specifications of food commodities is proposed. The specifications are established for discriminant partial least squares (DPLS) by setting limits on the predictions of the DPLS model together with Hotelling's T2 and the squared prediction error (SPE). These limits can be tuned depending on whether type I error (i.e. a correct sample is declared out-of-specification) or type II error (i.e. an out-of-specification sample is declared within specifications) needs to be minimized. The methodology is illustrated with a set of NIR spectra of Italian olive oils corresponding to five regions, with Liguria as the class of interest. The results demonstrate the possibility of establishing multivariate specifications for olive oils from the Liguria region on the basis of spectral data, with type I and type II errors lower than 5%.
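The trade-off between the two error types can be seen by applying acceptance limits to a single statistic. The sketch below uses hypothetical predicted values and limits; the actual method combines limits on the DPLS prediction, Hotelling's T2 and SPE, which is not reproduced here.

```python
def error_rates(in_spec_values, out_of_spec_values, lower, upper):
    """Return (type I rate, type II rate) for acceptance limits [lower, upper].

    Type I: an in-specification sample falls outside the limits.
    Type II: an out-of-specification sample falls inside the limits.
    """
    def accepted(value):
        return lower <= value <= upper

    type_i = sum(not accepted(v) for v in in_spec_values) / len(in_spec_values)
    type_ii = sum(accepted(v) for v in out_of_spec_values) / len(out_of_spec_values)
    return type_i, type_ii


# Hypothetical DPLS predictions (class of interest coded as 1, others as 0).
in_spec = [0.9, 1.1, 0.7, 1.0]
out_of_spec = [0.1, 0.2, 0.6, -0.1]

print(error_rates(in_spec, out_of_spec, 0.5, 1.5))  # wide limits
print(error_rates(in_spec, out_of_spec, 0.8, 1.2))  # narrow limits
```

Widening the limits lowers the type I rate at the cost of a higher type II rate, and narrowing them does the opposite, which is the tuning the abstract refers to.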

19.
A partial least squares (PLS-1) calibration model based on kinetic-spectrophotometric measurement for the simultaneous determination of Cu(II), Ni(II) and Co(II) ions is described. The method was based on the difference in the rates of the reactions of Co(II), Ni(II) and Cu(II) ions with 1-(2-pyridylazo)-2-naphthol in a pH 5.8 buffer solution and in micellar media at 25°C. The absorption kinetic profiles of the solutions were monitored by measuring the absorbance at 570 nm at 2 s intervals during the time range of 0–10 min after initiation of the reaction. The experimental calibration matrix for the PLS-1 model was designed with 30 samples. The cross-validation method was used for selecting the number of factors. The results showed that simultaneous determination could be performed in the range 0.1-2 μg mL−1 for each cation. The proposed method was successfully applied to the simultaneous determination of Cu(II), Ni(II) and Co(II) ions in water and in synthetic alloy samples.

20.
Ghasemi J  Seifi S 《Talanta》2004,63(3):751-756
An error analysis of predicted values using spectral correction matrix and partial least squares (PLS) modeling is applied for the determination of Zn2+ and Pb2+ with methylthymol blue (MTB) as a metallochromic indicator. The concentration ranges for Pb2+ and Zn2+ in standard solution sets are 0.5-5.2 and 0.1-2.5 μg ml−1, respectively. The experimental calibration set was composed of 20 sample solutions using a random design for two component mixtures. The absorption spectra were recorded from 400 to 700 nm. The two wavelengths which give the minimum error in prediction of the two metal ion concentrations are chosen according to an error analysis of different pairs of wavelengths. The effect of pH on the sensitivity in the determination of Zn2+ and Pb2+ using MTB was studied in order to choose the optimum pH (pH=6) for determination. The values of root mean square difference (RMSD) for lead and zinc using β-correction partial least squares were 0.0977 and 0.1266, respectively. The effects of diverse ions and several experimental parameters were studied. The method was used for the determination of lead and zinc in alloy samples.
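For reference, a root mean square difference between predicted and reference concentrations can be computed as below. The abstract does not state which divisor was used, so dividing by the number of samples n is an assumption, and the concentrations shown are made up for illustration.

```python
import math


def rmsd(predicted, reference):
    """Root mean square difference between two equal-length sequences.

    Assumption: the divisor is n, the number of samples.
    """
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical predicted vs. reference concentrations (arbitrary units).
print(rmsd([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```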
