Similar literature
 Found 20 similar documents (search time: 15 ms)
1.
Changeable size moving window partial least squares (CSMWPLS) and searching combination moving window partial least squares (SCMWPLS) are proposed to search for an optimized spectral interval and an optimized combination of spectral regions from the informative regions obtained by a previously proposed spectral interval selection method, moving window partial least squares regression (MWPLSR) [Anal. Chem. 74 (2002) 3555]. Using informative regions aims to construct better PLS models than those based on all spectral points. The purpose of CSMWPLS and SCMWPLS is to optimize the informative regions and their combination to further improve the prediction ability of the PLS models. Their application to an open-path (OP)/FT-IR spectral data set shows that the proposed methods, especially SCMWPLS, can find an optimized combination that improves, often significantly, the performance of the corresponding PLS model, giving a lower root mean square error of prediction (RMSEP) with a reasonable number of latent variables (LVs) than models built on whole spectra or on a direct combination of the informative regions for a compound. The regions in the selected combinations can easily be explained by the IR absorption bands present in those spectral regions.
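The moving-window idea behind MWPLSR can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a fixed window width, fits only a one-latent-variable PLS1 model in each window, and scores each window by its calibration RMSE rather than by a validated prediction error. The function names `one_lv_pls_rmse` and `moving_window_scan` are hypothetical.

```python
import math

def one_lv_pls_rmse(X, y):
    """Fit a one-latent-variable PLS1 model on mean-centered data
    and return the calibration RMSE (assumption: no validation set)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: direction of X'y, scaled to unit length.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    if norm == 0.0:
        # The window carries no covariance with y; the model predicts the mean.
        return math.sqrt(sum(v * v for v in yc) / n)
    w = [v / norm for v in w]
    # Scores and the regression of y on the scores.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(t[i] * yc[i] for i in range(n)) / sum(v * v for v in t)
    resid = [yc[i] - q * t[i] for i in range(n)]
    return math.sqrt(sum(r * r for r in resid) / n)

def moving_window_scan(X, y, width):
    """Slide a window of `width` channels across the spectra and
    return a list of (start_index, rmse) pairs, one per window position."""
    p = len(X[0])
    results = []
    for start in range(p - width + 1):
        window = [row[start:start + width] for row in X]
        results.append((start, one_lv_pls_rmse(window, y)))
    return results
```

Windows with low error mark candidate informative regions; CSMWPLS and SCMWPLS, as described in the abstract, then vary the window size and combine regions, which this sketch does not attempt.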

2.
In the current study, robust boosting partial least squares (RBPLS) regression is proposed to model the activities of a series of 4H-1,2,4-triazoles as angiotensin II antagonists. RBPLS works by sequentially applying the PLS method to robustly reweighted versions of the training compounds and then combining the resulting predictors through a weighted median. In PLS modeling, an F-statistic is introduced to automatically determine the number of PLS components. The results obtained by RBPLS are compared with those of boosting partial least squares (BPLS) regression and partial least squares (PLS) regression, showing the good performance of RBPLS in improving the QSAR modeling. In addition, the interaction of angiotensin II antagonists is a complex one, including topological, spatial, thermodynamic and electronic effects.
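The weighted-median combination step mentioned above can be sketched as follows. This is only an illustration of a weighted median over the predictions of several models; how RBPLS derives the model weights is not given in the abstract, so the weights are treated here as hypothetical inputs, and the function names are assumptions.

```python
def weighted_median(values, weights):
    """Return the smallest value whose cumulative weight reaches
    half of the total weight (values are sorted first)."""
    pairs = sorted(zip(values, weights))
    half = 0.5 * sum(weights)
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    raise ValueError("values and weights must be non-empty")

def combine_predictions(predictions_per_model, model_weights):
    """predictions_per_model[m][i] is model m's prediction for sample i.
    Returns one weighted-median prediction per sample."""
    n_samples = len(predictions_per_model[0])
    combined = []
    for i in range(n_samples):
        sample_preds = [preds[i] for preds in predictions_per_model]
        combined.append(weighted_median(sample_preds, model_weights))
    return combined
```

Unlike a weighted mean, the weighted median is not pulled toward a single extreme prediction, which fits the robustness goal the abstract describes.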

3.
Kernel partial least squares (KPLS) has become a popular technique for regression and classification of complex data sets. It is a nonlinear extension of linear PLS in which training samples are transformed into a feature space via a nonlinear mapping; the PLS algorithm is then carried out in the feature space. In the present study, we attempt to develop a novel tree KPLS (TKPLS) classification algorithm by constructing an informative kernel on the basis of decision tree ensembles. The constructed tree kernel can effectively discover the similarities of samples and select informative features by variable importance ranking in the process of building the kernel. At the same time, such a kernel allows TKPLS to handle nonlinear relationships in structure–activity relationship data. Finally, three data sets related to different categorical bioactivities of compounds are used to evaluate the performance of TKPLS. The results show that the TKPLS algorithm can be regarded as an alternative and promising classification technique. Copyright © 2013 John Wiley & Sons, Ltd.

4.
Early typical chemometrical applications in oils and fats research concerned pattern recognition problems using multivariate analysis (principal component analysis, discriminant analysis, canonical variates). Various types of fish oils can now be quickly allocated with respect to their origin. Fuzzy set theory was used in a different approach to classification, applied to the allocation of yellow fat spreads into product categories using sensory attributes scored on a truth scale. The partial least-squares technique has found practical applications in problems of multivariate calibration, sensory analysis and quantitative structure-activity relationships. The theoretical aspects of PLS regression have also been investigated, in particular the underlying optimization criterion and the relation to other multivariate techniques. Mixed integer programming has been helpful in identifying and quantifying the oil composition of unknown fat blends from their fatty acid profiles, improving upon an earlier constrained regression technique using brute force all-possible subset selection.

5.
In the literature, much effort has been put into modeling dependence among variables and their interactions through nonlinear transformations of predictive variables. In this paper, we propose a nonlinear generalization of Partial Least Squares (PLS) using multivariate additive splines. We discuss the advantages and drawbacks of the proposed model, build it via the generalized cross validation (GCV) criterion, and show its performance on a real dataset and on simulated datasets in comparison to other methods based on splines. Copyright © 2009 John Wiley & Sons, Ltd.

6.
The estimation of the prediction region of partial least squares (PLS) is necessary in many engineering applications. However, research in this area focuses on the estimation of prediction intervals only. In this work, a new recursive formulation of PLS is proposed to facilitate the calculation of the Jacobian matrix of the estimated coefficient matrix. Furthermore, the computational complexity analysis indicates that the proposed algorithm is O(m²N + mpN + mpN² + mN³ + mpN⁴) per component. The prediction region of the multivariate PLS is obtained through local linearization. The new formulation provides one way to obtain the prediction region of the multivariate PLS. Simulation and near‐infrared spectra of corn case studies indicate the utility of the proposed method. Copyright © 2013 John Wiley & Sons, Ltd.

7.
The fluorescence spectrum, together with its first and second derivative spectra in the region of 220–900 nm, was used to determine the concentration of triglyceride in human serum. Nonlinear partial least squares regression with a cubic B‐spline‐function‐based nonlinear transformation was employed as the chemometric method. Window genetic algorithms partial least squares (WGAPLS) was proposed as a new wavelength selection method to find the optimized combination of spectral wavelengths. The study shows that when WGAPLS is applied within the optimized regions ascertained by changeable size moving window partial least squares (CSMWPLS) or searching combination moving window partial least squares (SCMWPLS), the calibration and prediction performance of the model can be further improved at a reasonable number of latent variables. SCMWPLS should start from the sub‐region found by CSMWPLS with the smallest root mean square error of calibration (RMSEC). In addition, WGAPLS should be applied within the region of smallest RMSEC, whether it is the sub‐region found by CSMWPLS or the region combination found by SCMWPLS. Moreover, the prediction ability of the nonlinear models was significantly better than that of the linear models. The prediction performance of the three spectra was in the following order: second derivative spectrum < original spectrum < first derivative spectrum. Wavelengths within the regions of 300–367 nm and 386–392 nm in the first derivative of the original fluorescence spectrum were the optimized wavelength combination for the prediction model. Copyright © 2012 John Wiley & Sons, Ltd.

8.
In this study we compared ordinary least squares and weighted least squares for calibrating a method for analyzing essential and toxic metals in human milk by ICP-OES, in order to avoid systematic errors in the measurements. Human milk samples were provided by the Odete Valadares maternity clinic and digested in a high-performance microwave (MW) oven. The short- and long-term stability of the plasma was evaluated using a solution of digested milk (1:50) with 2.0 mg L⁻¹ Mg in 2% (v/v) HNO3. The detection power was at or below the μg L⁻¹ level, while the precision, expressed as relative standard deviation (R.S.D.), was almost always equal to or better than 3.3%. The certified reference material Infant Formula (NIST SRM 1846) was used to assess the accuracy of the proposed method, which proved to be accurate and precise. Recovery rates were in the range of 83–117%. Aqueous calibration was carried out for each element under study.
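The difference between the two calibration approaches can be shown with a minimal straight-line sketch. It assumes the common weighting choice of 1/s², where s is the standard deviation of the replicate signals at each concentration level; the abstract does not state which weights the authors used, and the function name `fit_line` is hypothetical.

```python
def fit_line(x, y, weights=None):
    """Fit y = intercept + slope * x by weighted least squares.
    With weights=None every point gets weight 1, which is ordinary least squares."""
    if weights is None:
        weights = [1.0] * len(x)
    sw = sum(weights)
    # Weighted means of x and y.
    xbar = sum(w * xi for w, xi in zip(weights, x)) / sw
    ybar = sum(w * yi for w, yi in zip(weights, y)) / sw
    # Weighted sums of squares and cross-products about the means.
    sxx = sum(w * (xi - xbar) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - xbar) * (yi - ybar) for w, xi, yi in zip(weights, x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope

def inverse_variance_weights(std_devs):
    """Assumed weighting scheme: w_i = 1 / s_i**2 for each calibration level."""
    return [1.0 / (s * s) for s in std_devs]
```

When the signal scatter grows with concentration, weighting keeps the noisy high standards from dominating the fit, which mainly affects the intercept and therefore results near the detection limit.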

9.
The integration of multiple data sources has emerged as a pivotal aspect of assessing complex systems comprehensively. This new paradigm requires the ability to separate common and redundant information from specific and complementary information during the joint analysis of several data blocks. However, inherent problems encountered when analysing single tables are amplified with the generation of multiblock datasets. Finding the relationships between data layers of increasing complexity is therefore a challenging task. In the present work, an algorithm is proposed for the supervised analysis of multiblock data structures. It combines the interpretability of the orthogonal partial least squares (OPLS) framework with the ability of common component and specific weights analysis (CCSWA) to weight each data table individually, in order to grasp its specificities and handle efficiently the different sources of Y-orthogonal variation.

10.
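With the titration system adjusted to pH 2.0, the volume of titrant consumed in titrating with a standard base solution to specified pH values was taken as the measured quantity, and a titration data matrix for multicomponent organic acids was constructed. Multicomponent fitting was performed by principal component regression, partial least squares, and artificial neural networks. The results show that partial least squares gave the best fit; the relative root mean square errors of prediction for the total amounts of acetic, lactic, oxalic, succinic, citric and aconitic acids in the mixed system were, respectively, 5.80%, 8.88%...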

11.
Selecting the correct dimensionality is critical for obtaining partial least squares (PLS) regression models with good predictive ability. Although calibration and validation sets are best established using experimental designs, industrial laboratories cannot afford such an approach. Typically, samples are collected in a (formally) undesigned way, spread over time, and their measurements are included in routine measurement processes. This makes it hard to evaluate PLS model dimensionality. In this paper, classical criteria (leave-one-out cross-validation and adjusted Wold's criterion) are compared to recently proposed alternatives (smoothed PLS-PoLiSh and a randomization test) to seek out the optimum dimensionality of PLS models. Kerosene (jet fuel) samples were measured by attenuated total reflectance mid-IR spectrometry, and their spectra were used to predict eight important properties determined using reference methods that are time-consuming and prone to analytical errors. The alternative methods were shown to give reliable dimensionality predictions when compared to external validation. By contrast, the simpler methods seemed to be largely affected by the largest changes in the modeling capabilities of the first components.

12.
This paper presents a modified version of the NIPALS algorithm for PLS regression with a single response variable. This version, denoted CF‐PLS, provides significant advantages over standard PLS. First, it strongly reduces the over‐fit of the regression. Second, R2 under the null hypothesis follows a Beta distribution that is a function only of the number of observations, which allows a probabilistic framework to be used to test the validity of a component. Third, the models generated with CF‐PLS have comparable if not better prediction ability than the models fitted with NIPALS. Finally, the scores and loadings of CF‐PLS are directly related to R2, which makes the model and its interpretation more reliable. Copyright © 2011 John Wiley & Sons, Ltd.
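For orientation, the standard NIPALS-style PLS1 algorithm that CF-PLS modifies can be sketched as below. This is the textbook single-response procedure, not CF-PLS itself, whose modification is not described in the abstract. The function names and the stopping tolerance are assumptions.

```python
import math

def pls1_nipals(X, y, n_components):
    """Standard PLS1 with deflation of X and y.
    Returns (x_mean, y_mean, components), each component being (w, p, q)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(r[j] for r in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[r[j] - x_mean[j] for j in range(p)] for r in X]  # X residuals
    f = [v - y_mean for v in y]                            # y residuals
    components = []
    for _ in range(n_components):
        # Weights: direction of maximal covariance between E and f.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # assumed tolerance: nothing left to model
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate: remove what this component explains.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, load, q))
    return x_mean, y_mean, components

def pls1_predict(model, x):
    """Predict one sample by repeating the deflation steps on it."""
    x_mean, y_mean, components = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, load, q in components:
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * lj for ej, lj in zip(e, load)]
    return y_hat
```

Predicting by sequential deflation avoids forming the regression-coefficient vector explicitly, which keeps the sketch free of a matrix inverse.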

13.
14.
A simple, sensitive and selective spectrophotometric method for the simultaneous determination of tretinoin and minoxidil using partial least squares (PLS) calibration and the H-point standard addition method (HPSAM) is described. The results of the H-point standard addition method show that minoxidil and tretinoin can be determined simultaneously with the concentration ratio of tretinoin to minoxidil varying from 2:1 to 1:33 in mixed samples. A partial least squares multivariate calibration method for the analysis of binary mixtures of tretinoin and minoxidil was also developed. The total relative standard error for applying the PLS method to eleven synthetic samples in the concentration range of 0–10 μg mL⁻¹ tretinoin and 0–32 μg mL⁻¹ minoxidil was 2.59%. Both proposed methods (PLS and HPSAM) were also successfully applied to the determination of tretinoin and minoxidil in several synthetic pharmaceutical solutions.

15.
In developing partial least squares calibration models, selecting the number of latent variables used for their construction to minimize both model bias and model variance remains a challenge. Several metrics exist for incorporating these trade‐offs, but the cost of model parsimony and the potential for underfitting on achievable prediction errors are difficult to anticipate. We propose a metric that penalizes growing model variance against decreasing bias as additional latent variables are added. The magnitude of the penalty is scaled by a user‐defined parameter that is formulated to provide a constraint on the fractional increase in root mean square error of cross‐validation (RMSECV) when selecting a parsimonious model over the conventional minimum RMSECV solution. We evaluate this approach for quantification of four organic functional groups using 238 laboratory standards and 750 complex atmospheric organic aerosol mixtures with mid‐infrared spectroscopy. Parametric variation of this penalty demonstrates that increase in prediction errors due to underfitting is bounded by the magnitude of the penalty for samples similar to laboratory standards used for model training and validation. Imposing an ensemble of penalties corresponding to a 0–30% allowable increase in RMSECV through sum of ranking differences leads to the selection of a model that increases the actual RMSECV up to 20% for laboratory standards but achieves an 85% reduction in the mean error in predicted concentrations for environmental mixtures. Partial least squares models developed with laboratory mixtures can provide useful predictions in complex environmental samples, but may benefit from protection against overfitting. © 2015 The Authors. Journal of Chemometrics published by John Wiley & Sons Ltd.
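The constraint described above, a bounded fractional increase in RMSECV in exchange for fewer latent variables, can be illustrated with a simple threshold rule. This is a stand-in for the idea, not the penalty metric proposed in the paper, whose exact form the abstract does not give. The function name and the example RMSECV values are hypothetical.

```python
def select_parsimonious(rmsecv, allowed_increase):
    """rmsecv[k] is the cross-validation error of the model with k + 1 latent variables.
    Returns the smallest number of latent variables whose RMSECV is within
    (1 + allowed_increase) times the minimum RMSECV."""
    best = min(rmsecv)
    limit = best * (1.0 + allowed_increase)
    for k, err in enumerate(rmsecv):
        if err <= limit:
            return k + 1
    # Not reached for non-empty input: the minimum itself always satisfies the limit.
    raise ValueError("rmsecv must be non-empty")

# Hypothetical RMSECV curve for models with 1 to 6 latent variables.
example_curve = [0.50, 0.30, 0.22, 0.21, 0.20, 0.205]
conventional = select_parsimonious(example_curve, 0.0)    # minimum-RMSECV choice
parsimonious = select_parsimonious(example_curve, 0.15)   # allow a 15% increase
```

Setting `allowed_increase` to zero recovers the conventional minimum-RMSECV choice, so the parameter directly expresses how much cross-validation error one is willing to trade for a simpler model.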

16.
2D gel electrophoresis is a tool for measuring protein regulation, involving image analysis by dedicated software (PDQuest, Melanie, etc.). Here, partial least squares discriminant analysis was applied to improve the results obtained by classic image analysis and to identify the significant spots responsible for the differences between two datasets. A human colon cancer HCT116 cell line was analyzed with and without treatment with a new histone deacetylase inhibitor, RC307. The proteins regulated by RC307 were detected by analyzing the total lysates and nuclear proteome profiles. Some of the regulated spots were identified by tandem mass spectrometry. The preliminary data are encouraging, and the protein modulation reported is consistent with the antitumoral effect of RC307 on the HCT116 cell line. Partial least squares discriminant analysis coupled with backward elimination variable selection allowed the identification of a larger number of spots than classic PDQuest analysis. Moreover, it gives the best model performance in terms of prediction and therefore provides more robust and reliable results. From this point of view, the multivariate procedure applied can be considered a good alternative to standard differential analysis, one that also takes into account the interdependencies existing among the variables.

17.
The number of latent variables (LVs), or the factor number, is a key parameter in PLS modeling for obtaining a correct prediction. Although much work has been done on this issue, it is still difficult to determine a suitable LV number in practical use. A method named independent factor diagnostics (IFD) is proposed for investigating the contribution of each LV to the predicted results, on the basis of a discussion of how the LV number is determined in PLS modeling of near infrared (NIR) spectra of complex samples. The NIR spectra of three data sets of complex samples, including a public data set and two tobacco lamina data sets, are investigated. It is shown that several high order LVs make the main contributions to the predicted results, although the contribution of the low order LVs should not be neglected in the PLS models. Therefore, in practical use of PLS for NIR spectral analysis of complex samples, it may be better to use a slightly larger LV number. Supported by the National Natural Science Foundation of China (Grant Nos. 20775036 & 20835002)

18.
The partial least squares class model (PLSCM) was recently proposed for multivariate quality control based on a partial least squares (PLS) regression procedure. This paper presents a case study of quality control of peanut oils based on mid‐infrared (MIR) spectroscopy and class models, focusing mainly on the following aspects: (i) to explain the meanings of PLSCM components and make comparisons between PLSCM and soft independent modeling of class analogy (SIMCA); (ii) to correct the estimation of the original PLSCM confidence interval by considering a nonzero intercept term for center estimation; (iii) to investigate the potential of MIR spectroscopy combined with class models for identifying peanut oils with low doping concentrations of other edible oils. It is demonstrated that PLSCM is actually different from the ordinary PLS procedure, but it estimates the class center and class dispersion in the framework of a latent variable projection model. While SIMCA projects the original variables onto a few dimensions explaining most of the data variances, PLSCM components consider simultaneously the explained variances and the compactness of samples belonging to the same class. The analysis results indicate that PLSCM is an intuitive and easy‐to‐use tool for tackling one‐class problems and has comparable performance with SIMCA. The advantages of PLSCM might be attributed to the great success and well‐established foundations of PLS. For PLSCM, the optimization of model complexity and estimation of decision region can be performed as in multivariate calibration routines. Copyright © 2011 John Wiley & Sons, Ltd.

19.
Near-infrared (NIR) spectra in the region of 5000–4000 cm⁻¹, together with a chemometric method called searching combination moving window partial least squares (SCMWPLS), were employed to determine the concentrations of human serum albumin (HSA), γ-globulin, and glucose in control serum IIB (CS IIB) solutions of various concentrations. SCMWPLS is proposed to search for the optimized combinations of informative regions, which are spectral intervals considered to contain useful information for building partial least squares (PLS) models. The informative regions can easily be found by the moving window partial least squares regression (MWPLSR) method. PLS calibration models using the regions obtained by SCMWPLS were developed for HSA, γ-globulin, and glucose. These models showed good prediction, with the smallest root mean square errors of prediction (RMSEP), a relatively small number of PLS factors, and the highest correlation coefficients among the results achieved using the whole region and the MWPLSR method. The RMSEP values of HSA, γ-globulin, and glucose yielded by SCMWPLS were 0.0303, 0.0327, and 0.0195 g/dl, respectively. These results show that SCMWPLS can be successfully applied to determine simultaneously the concentrations of HSA, γ-globulin, and glucose in complicated biological fluids such as CS IIB solutions by using NIR spectroscopy.

20.
In this paper, a methodology is proposed to evaluate the probability of false non-compliance and false compliance for screening methods that give first- or second-order multivariate signals. For this task, 120 samples of 6 different kinds of milk were measured by excitation-emission fluorescence. The samples were spiked with different amounts of three sulfonamides (sulfadiazine, sulfamerazine and sulfamethazine). These substances are classified in group B1 (veterinary medicines and contaminants) of annex I of Directive 96/23/EC. The European Union (Commission Regulation EC no. 281/96) has set the maximum residue level (MRL) of total sulfonamides at 100 μg kg⁻¹ in muscle, liver, kidney and milk. The work shows that excitation-emission fluorescence together with the partial least squares class modeling (PLS-CM) procedure may be a suitable and cheap screening method for the total amount of sulfonamides in milk. Three PLS-CM models were built: for the emission and excitation spectra (first-order signals) and for the excitation-emission matrices (second-order signals). In all cases the probability of false compliance is below 5%, as required by Decision 2002/657/EC. With the same fluorescence signals, the total quantity of sulfonamide was calibrated using 2-PLS, 3-PLS and PARAFAC regressions. Using this quantitative approach, the capability of detection, CCβ, around the MRL was estimated to be between 114.3 and 115.1 μg kg⁻¹ for a probability of false non-compliance and false compliance equal to 5%.
