Similar literature
1.
Most multivariate calibration methods, such as partial least squares (PLS) or the Tikhonov regularization variant ridge regression (RR), require selection of tuning parameters. Tuning parameter values determine the direction and magnitude of the respective model vectors, thereby setting the resultant prediction abilities of the model vectors. Simultaneously, tuning parameter values establish the corresponding bias/variance and the underlying selectivity/sensitivity tradeoffs. Selection of the final tuning parameter is often accomplished through some form of cross-validation, and the resultant root mean square error of cross-validation (RMSECV) values are evaluated. However, selection of a “good” tuning parameter with this one model evaluation merit is almost impossible. Including additional model merits assists tuning parameter selection to provide better balanced models and allows a reasonable comparison between calibration methods. Using multiple merits requires decisions to be made on how to combine and weight the merits into an information criterion, and an abundance of options is possible. Presented in this paper is the sum of ranking differences (SRD) as a way to form an ensemble of model evaluation merits varying across tuning parameters. It is shown that the SRD consensus ranking of model tuning parameters allows automatic selection of the final model, or of a collection of models if so desired. Essentially, the user's preference for the degree of balance between bias and variance ultimately decides the merits used in SRD and hence the tuning parameter values ranked lowest by SRD for automatic selection. The SRD process is also shown to allow simultaneous comparison of different calibration methods for a particular data set in conjunction with tuning parameter selection. Because SRD evaluates consistency across multiple merits, decisions on how to combine and weight merits are avoided. To demonstrate the utility of SRD, a near infrared spectral data set and a quantitative structure activity relationship (QSAR) data set are evaluated using PLS and RR.

2.
With projection-based calibration approaches, such as partial least squares (PLS) and principal component regression (PCR), the calibration space is spanned by respective basis vectors (latent vectors). Up to rank k basis vectors are formed, where k ≤ min(m,n) with m and n denoting the number of calibration samples and measured variables, respectively. The user needs to decide how many and which of the respective basis vectors to use (the tuning parameters). To avoid the second issue, basis vectors are selected top-down, starting with the first and sequentially adding more until model criteria are satisfied. Ridge regression (RR) avoids both issues by using the full set of basis vectors. Another approach is to select a subset from the total available. The presented work develops a process based on the L1 vector norm to select basis vectors. Specifically, the L1 norm is used to select singular value decomposition (SVD) basis set vectors for PCR (LPCR). Because PCR, PLS, RR, and others can be expressed as linear combinations of the SVD basis vectors, the focus is on selection and comparison using the SVD basis set. Results based on respective tuning parameter selections and weights applied to the SVD basis vectors for LPCR, top-down PCR, correlation PCR (CPCR), PLS, and RR are compared for calibration and calibration updating using spectroscopic data sets. The methods are found to predict equivalently. In particular, the L1 norm produces results similar to those obtained by the well-studied CPCR process. Thus, the new method provides a different theoretical framework than CPCR for selecting basis vectors. Copyright © 2016 John Wiley & Sons, Ltd.
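The remark that PCR and RR can be written as linear combinations of the SVD basis vectors can be made concrete with the standard filter-factor form. This is a textbook sketch, not the paper's own notation: the weights $f_i$, the ridge parameter $\lambda$, the rank $r$ and the selected index set $S$ are symbols assumed here for illustration.

```latex
\begin{aligned}
X &= U \Sigma V^{\mathsf T} = \sum_{i=1}^{r} \sigma_i \, u_i v_i^{\mathsf T}, \\
\hat{b} &= \sum_{i=1}^{r} f_i \, \frac{u_i^{\mathsf T} y}{\sigma_i} \, v_i, \\
f_i^{\mathrm{PCR}} &=
  \begin{cases} 1, & i \in S \\ 0, & i \notin S \end{cases}
\qquad
f_i^{\mathrm{RR}} = \frac{\sigma_i^{2}}{\sigma_i^{2} + \lambda}.
\end{aligned}
```

Top-down PCR corresponds to $S = \{1, \dots, k\}$, whereas a subset-selection scheme may choose any $S$; ridge regression, minimizing $\lVert Xb - y \rVert_2^2 + \lambda \lVert b \rVert_2^2$, keeps every basis vector but shrinks those with small $\sigma_i$.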

3.
This paper describes the theoretical background, algorithm and validation of a recently developed novel method of ranking based on the sum of ranking differences [TrAC Trends Anal. Chem. 2010; 29: 101-109]. The ranking is intended to compare models, methods, analytical techniques, panel members, etc., and it is entirely general. First, the objects to be ranked are arranged in the rows and the variables (for example, model results) in the columns of an input matrix. Then, the results of each model for each object are ranked in order of increasing magnitude. The difference between the rank of the model results and the rank of the known, reference or standard results is then computed. (If the golden standard ranking is known, the rank differences can be completed easily.) Finally, the absolute values of the differences are summed for each model to be compared. The sum of ranking differences (SRD) arranges the models in a unique and unambiguous way. The closer the SRD value is to zero (i.e. the closer the ranking is to the golden standard), the better the model. Proximity of SRD values indicates similarity of the models, whereas large variation implies dissimilarity. Generally, the average can be accepted as the golden standard in the absence of known or reference results, even if bias is also present in the model results in addition to random error. Validation of the SRD method can be carried out by using simulated random numbers for comparison (permutation test). A recursive algorithm calculates the discrete distribution for a small number of objects (n < 14), whereas the normal distribution is used as a reasonable approximation if the number of objects is large. The theoretical distribution is visualized for random numbers and can be used to identify SRD values for models that are far from being random. The ranking and validation procedures are called Sum of Ranking Differences (SRD) and Comparison of Ranks by Random Numbers (CRNN), respectively. Copyright © 2010 John Wiley & Sons, Ltd.
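The ranking steps described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `ranks` and `srd` are hypothetical, ties are given average ranks (a choice the abstract does not specify), and the row average is used as the reference when none is supplied.

```python
def ranks(values):
    """Return 1-based ranks in increasing order; tied values share the mean rank
    (tie handling is an assumption, not stated in the abstract)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def srd(matrix, reference=None):
    """Sum of ranking differences for each column (model) of `matrix`.

    Rows are objects, columns are models. If `reference` is None, the row
    average is used as the reference ranking.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    if reference is None:
        reference = [sum(row) / n_cols for row in matrix]
    ref_ranks = ranks(reference)
    scores = []
    for c in range(n_cols):
        col_ranks = ranks([matrix[r][c] for r in range(n_rows)])
        scores.append(sum(abs(a - b) for a, b in zip(col_ranks, ref_ranks)))
    return scores


# Model 0 orders the objects exactly like the reference, model 1 reverses it.
data = [[1, 4], [2, 3], [3, 2], [4, 1]]
print(srd(data, reference=[1, 2, 3, 4]))
```

A smaller SRD means a ranking closer to the reference; the permutation-based validation mentioned in the abstract is not covered by this sketch.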

4.
The particle size distribution of a solid product can be a crucial parameter for its application in different kinds of processes. The influence of particle size on near infrared (NIR) spectra has been used to develop effective alternatives to the traditional methods for determining this parameter. In this work, we used the chemometric techniques partial least squares 2 (PLS2) and artificial neural networks (ANNs) to predict several variables simultaneously for the rapid construction of particle size distribution curves. The PLS2 algorithm relies on linear relations between variables, while the ANN technique can model non-linear systems. Samples were passed through sieves of different openings in order to separate several size fractions, which were used to construct two types of particle size distribution curves. The NIR spectra of the samples were recorded and used with PLS2 and ANN to develop two calibration models for each technique. The correlation coefficients and relative standard errors of prediction (RSEP) were used to assess the goodness of fit and the accuracy of the results. The four calibration models studied provided statistically identical results based on RSEP values. Therefore, the combined use of NIR spectroscopy and PLS2 or ANN calibration models allows particle size distributions to be determined accurately. The results obtained by ANN and PLS2 are statistically similar.

5.
This paper presents guidelines for interpreting the results of quantitative structure-retention relationship (QSRR) modeling, for comparing and assessing the established models, and for selecting the best and most consistent QSRR model. Various linear and non-linear chemometric regression techniques were used to build QSRR models for chromatographic lipophilicity prediction of a series of triazole, tetrazole, toluenesulfonylhydrazide, nitrile, dinitrile and dione steroid derivatives. Linear regression (LR) and multiple linear regression (MLR) were used as linear techniques, while artificial neural networks (ANNs) were applied as non-linear modeling techniques. The generated models were statistically evaluated by applying different approaches for model comparison and ranking. Two non-parametric methods, the generalized pair correlation method (GPCM) and the sum of ranking differences (SRD), were used for model ranking and for assessing the best model for chromatographic lipophilicity prediction, using experimentally obtained log k values and the row average as reference rankings. Both GPCM and SRD led to highly similar model choices despite their different theoretical backgrounds. These results are in agreement with the classical approach.

6.
Analytical Letters, 2012, 45(6): 1227-1251
In order to reduce data nonlinearity and overfitting with the multivariate calibration model y = Xb, a modified Tikhonov regularization (TR) algorithm is evaluated for selecting key variables from an X augmented with extra columns that contain the original measured variables ($x_{ij}$) as squared terms ($x_{ij}^2$) and other orders. The TR approach simultaneously develops the multivariate calibration model. The new generalized pair-correlation method (GPCM) is also studied for variable selection followed by partial least squares (PLS) for multivariate calibration. Results from synthetic spectral data are compared for the modified TR approach, GPCM, and PLS without variable selection. The GPCM usually performs slightly better than the TR approach on the tabulated bias and variance measures, in some cases at a sacrifice to parsimony. PLS without variable selection performs the worst. Using synthetic spectral data sets made it possible to study how the methods work. Thus, results from this study will aid investigators of real spectral data sets exhibiting nonlinear behavior.

7.
Han QJ, Wu HL, Cai CB, Xu L, Yu RQ. Analytica Chimica Acta, 2008, 612(2): 121-125
An improved method based on an ensemble of Monte Carlo uninformative variable elimination (EMCUVE) is presented for wavelength selection in multivariate calibration of spectral data. The proposed algorithm introduces a Monte Carlo (MC) strategy into uninformative variable elimination-PLS (UVE-PLS), in place of the leave-one-out strategy, for estimating the contribution of each wavelength variable to the PLS model. In EMCUVE, wavelength variables are evaluated by different Monte Carlo uninformative variable elimination (MCUVE) models. Moreover, fusing MCUVE with a vote rule improves on the original uninformative variable elimination method. Results obtained from simulated and real data sets demonstrate that EMCUVE can properly carry out wavelength selection in the course of data analysis and improve the predictive ability of multivariate calibration models.

8.
Fourier-transform mid-infrared (FT-MIR) spectroscopy, combined with partial least-squares (PLS) regression and IPW as the feature selection method, was used to develop reduced-spectrum calibration models based on a few IR bands. The models provide near-real-time predictions of two key parameters for the characterization of finished red wines that are essential from a quality assurance standpoint: total and volatile acidity. Separate PLS calibration models were developed, correlating IR data (considering only those regions showing a high signal-to-noise ratio) with each response studied. Wavenumber selection was also performed by applying IPW-PLS so that only significant predictors were taken into account, in an attempt to improve the quality of the final models. Using both PLS and IPW-PLS regression, the two responses were predicted with very high reliability, with RMSECV and RMSEP values on the order of 1% (comparable in accuracy to the results provided by the respective reference analysis methods). An important advantage of the IPW-PLS method was the low number of original variables needed to model both total acidity (22 significant wavenumbers) and volatile acidity (only 11 selected predictor variables), so that variable selection enhanced the stability and parsimony of the final calibration models. The high quality of the proposed calibration models supports the feasibility of implementing them as a fast and reliable tool in routine analysis for determining critical wine quality parameters.

9.
Data fusion in multivariate calibration transfer (total citations: 1; self-citations: 0; citations by others: 1)
We report the use of stacked partial least-squares (SPLS) regression and stacked dual-domain regression analysis with four commonly used techniques for calibration transfer to improve predictive performance from transferred multivariate calibration models. The predictive performance of three conventional calibration transfer methods that require standards measured on both instruments, namely piecewise direct standardization (PDS), orthogonal signal correction (OSC) and model updating (MUP), was significantly improved by data fusion, either by stacking of wavelet scales or by stacking of spectral intervals, as demonstrated by the transfer of calibrations developed on near-infrared spectra of synthetic gasoline. Stacking did not produce as significant an improvement for calibration transfer using a finite impulse response (FIR) filter, but applying SPLS regression to FIR-transferred spectra improves the predictive performance of the transferred model.

10.
Near-infrared spectroscopy (NIRS) has been widely used in the pharmaceutical field because of its ability to provide quality information about drugs in near-real time. In practice, however, the NIRS technique requires construction of multivariate models in order to correct for collinearity and the typically poor selectivity of NIR spectra. In this work, a new methodology for constructing simple NIR calibration models has been developed, based on the spectrum of the target analyte (usually the active pharmaceutical ingredient, API), which is compared with that of the sample in order to calculate a correlation coefficient. To this end, calibration samples are prepared spanning an adequate concentration range for the API and their spectra are recorded. The model is then obtained by relating the correlation coefficient to the sample concentration through least-squares regression. The API concentration in validation samples is predicted by interpolating their correlation coefficients on the straight calibration line previously obtained. The proposed method affords quantitation of the API in pharmaceuticals undergoing physical changes during their production process (e.g. granulates, and coated and non-coated tablets). The results obtained with the proposed methodology, based on correlation coefficients, were compared with the predictions of PLS1 calibration models, with which a different model is required for each type of sample. Error values lower than 1-2% were obtained in the analysis of three types of sample using the same model; these errors are similar to those obtained by applying three PLS models for granules, and non-coated and coated samples. Based on the outcome, the methodology is a straightforward choice for constructing calibration models affording expeditious prediction of new samples with varying physical properties. This makes it an effective alternative to multivariate calibration, which requires a different model for each type of sample, depending on its physical presentation.
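The correlation-coefficient calibration described in this abstract can be sketched as follows. This is only an illustration under assumptions: the function names are hypothetical, the Pearson coefficient is assumed as the correlation measure, and concentration is regressed on the correlation coefficient (the abstract does not say which variable is treated as dependent). The spectra are made-up toy vectors, not data from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fit_line(x, y):
    """Ordinary least-squares straight line y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def calibrate(api_spectrum, cal_spectra, cal_conc):
    """Relate correlation with the pure-analyte spectrum to concentration."""
    r_values = [pearson(api_spectrum, s) for s in cal_spectra]
    return fit_line(r_values, cal_conc)


def predict(api_spectrum, spectrum, slope, intercept):
    """Read a concentration off the calibration line for a new spectrum."""
    return slope * pearson(api_spectrum, spectrum) + intercept


# Toy example: the second calibration spectrum resembles the API more closely.
api = [0.0, 1.0, 0.0, 0.0]
cal_spectra = [[1.0, 1.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0]]
slope, intercept = calibrate(api, cal_spectra, [10.0, 20.0])
print(predict(api, [0.5, 1.5, 0.0, 0.0], slope, intercept))
```

Because only a single scalar per spectrum enters the regression, one line can in principle serve samples of different physical presentation, which is the advantage the abstract claims over separate PLS1 models.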

11.
Attenuated total reflectance-Fourier transform infrared spectrometry, in conjunction with multivariate calibration, was used to determine reducing sugars, humidity and acidity in bee honey samples. Multivariate calibration models were built using partial least squares (PLS) and were refined through interval variable selection (iPLS) and genetic algorithms. The calibration models show satisfactory results for all parameters, with average relative errors of 6% for acidity, 1% for reducing sugars and 2% for humidity. For acidity and reducing sugars, variable selection was irrelevant, but for humidity it was essential: the two variable selection techniques (intervals and genetic algorithm) had to be used together in order to obtain a satisfactory calibration model.

12.
The selection of an appropriate calibration set is a critical step in multivariate method development. This work discusses how the use of different calibration sets, based on a previous classification of unknown samples, affects the performance of partial least squares (PLS) regression models. As an example, attenuated total reflection (ATR) mid-infrared spectra of deep-fried vegetable oil samples from three botanical origins (olive, sunflower, and corn oil), with increasing polymerized triacylglyceride (PTG) content induced by a deep-frying process, were employed. The use of a one-class-classifier partial least squares-discriminant analysis (PLS-DA) and a rooted binary directed acyclic graph tree provided accurate oil classification. Oil samples fried without foodstuff could be classified correctly, independent of their PTG content. However, class separation of oil samples fried with foodstuff was less evident. Double-cross model validation combined with permutation testing was used to validate the obtained PLS-DA classification models, confirming the results. To assess the usefulness of selecting an appropriate PLS calibration set, the PTG content was determined by calculating a PLS model based on the previously selected classes. In comparison with a PLS model calculated using a pooled calibration set containing samples from all classes, the root mean square error of prediction could be improved significantly using PLS models based on the calibration sets selected by PLS-DA, ranging between 1.06 and 2.91% (w/w).

13.
A method for calibration and validation subset partitioning (total citations: 13; self-citations: 0; citations by others: 13)
This paper proposes a new method to divide a pool of samples into calibration and validation subsets for multivariate modelling. The proposed method is of value for analytical applications involving complex matrices, in which the composition variability of real samples cannot be easily reproduced by optimized experimental designs. A stepwise procedure is employed to select samples according to their differences in both x (instrumental responses) and y (predicted parameter) spaces. The proposed technique is illustrated in a case study involving the prediction of three quality parameters (specific mass and the distillation temperatures at which 10 and 90% of the sample has evaporated) of diesel by NIR spectrometry and PLS modelling. For comparison, PLS models are also constructed by full cross-validation, as well as by using the Kennard-Stone and random sampling methods for calibration and validation subset partitioning. The obtained models are compared in terms of prediction performance by employing an independent set of samples not used for calibration or validation. The results of F-tests at the 95% confidence level reveal that the proposed technique may be an advantageous alternative to the other three strategies.
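For orientation, the Kennard-Stone baseline mentioned in this abstract can be sketched as below. Note that this is the classic x-space algorithm used for comparison, not the proposed method, which also takes y-space differences into account in a way the abstract does not detail. The function name `kennard_stone` and the use of Euclidean distance are assumptions of this sketch.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def kennard_stone(samples, n_select):
    """Return indices of `n_select` samples chosen by the Kennard-Stone rule.

    Start with the two most distant samples, then repeatedly add the sample
    whose nearest already-selected neighbour is farthest away.
    """
    n = len(samples)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    d = [[dist(a, b) for b in samples] for a in samples]
    # Seed with the pair of samples separated by the largest distance.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: d[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_select:
        # Max-min criterion: maximise the distance to the closest selected sample.
        nxt = max(remaining, key=lambda k: min(d[k][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected


# Four one-dimensional "spectra"; three are picked for calibration.
points = [(0.0,), (1.0,), (5.0,), (10.0,)]
print(kennard_stone(points, 3))  # indices of the calibration samples
```

The samples left unselected would form the validation subset; a joint x/y variant would need a combined distance, which is left out here because its exact form is not given in the abstract.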

14.
This study reports the simultaneous determination of paracetamol, ibuprofen and caffeine in pharmaceuticals by chemometric approaches using UV spectrophotometry, as a simple alternative to using separate models for each component. Spectra of paracetamol, ibuprofen and caffeine were recorded at several concentrations within their linear ranges and were used to compute the calibration mixture between wavelengths 200 and 400 nm at an interval of 1 nm in methanol:0.1 HCl (3:1). Partial least squares regression (PLS), a genetic algorithm coupled with PLS (GA-PLS), and a principal component-artificial neural network (PC-ANN) were used for chemometric analysis of the data, and the parameters of the chemometric procedures were optimized. The analytical performance of these chemometric methods was characterized by relative prediction errors and recoveries (%), and the methods were compared with each other. GA-PLS is superior to the other multivariate methods applied because the genetic algorithm selects wavelengths for the PLS calibration without loss of prediction capacity. Although the components show an important degree of spectral overlap, they were determined simultaneously and rapidly, with no separation step required. The three methods were successfully applied to a pharmaceutical capsule formulation, with no interference from excipients, as indicated by the results of the recovery study. The proposed methods are simple and rapid and can easily be used in the quality control of drugs as alternative analysis tools.

15.
Coscione AR, de A, Poppi RJ. The Analyst, 2002, 127(1): 135-139
Real samples were used for the PLS model calibration and validation steps, showing that this approach can be of value in preventing deviations in the results caused by matrix effects in the simultaneous spectrophotometric determination of aluminum and iron in plant extracts. One hundred UV-vis spectra, obtained from samples of the 1997 to 2000 International Plant-Analytical Exchange (IPE) program (The Netherlands), were used for model development, with ICP-AES aluminum and iron determinations as reference values for model calculation. The plant extracts were analyzed both by ICP-AES and by the PLS models developed in this work, using calibrations with aqueous standard solutions and with real sample extracts. In addition, since smaller calibration sets could reduce both the cost and the time of analysis, sets with fewer calibration samples were also investigated, with the help of the Kennard and Stone algorithm for sample selection. The predictive ability of the best model obtained with each calibration set was compared using the ratio of the relative root mean square errors (%RMSEV) for samples in the validation set, for aluminum or iron determinations, and these ratios were compared against tabulated F-test values. For all the models developed with real samples, the differences in the %RMSEV values for the aluminum or iron determinations were found not to be statistically significant at a confidence level of 95%. Although the aluminum determinations, but not the iron determinations, obtained with the PLS 2 model prepared with aqueous standards tend to be slightly lower than the ICP-AES determinations, this model has good global prediction ability, as observed through the correlation curves presented, and can be used for screening determinations or for other agricultural purposes.

16.
Ni Xin, Qinghua Meng, Yizhen Li, Yuzhu Hu. Chinese Journal of Chemistry (中国化学), 2011, 29(11): 2533-2540
This paper demonstrates the possibility of using near infrared (NIR) spectral similarity as a rapid method to estimate the quality of Flos Lonicerae. Variable selection together with modelling techniques is used to select the representative variables from which the similarity is calculated. NIR is used to build calibration models to predict the bacteriostatic activity of Flos Lonicerae, which is determined by an in vitro experiment. Models are built for Gram-positive and for Gram-negative bacteria. A genetic algorithm combined with partial least squares regression (GA-PLS) is used to perform the calibration. The results of the GA-PLS models are compared with those of interval partial least squares (iPLS) models, full-spectrum PLS models and full-spectrum principal component regression (PCR) models. The variables in the two GA-PLS models are then combined and used to calculate the NIR spectral similarity of samples. Similarity based on the characteristic variables and similarity based on the full spectrum are each used to evaluate the fingerprints of Flos Lonicerae. The results show that the combination of variable selection, modelling techniques and similarity analysis might be a powerful tool for quality control of traditional Chinese medicine (TCM).

17.
This paper demonstrates the possibility of using near infrared (NIR) spectroscopy combined with PLS as a rapid method to estimate the quality of green tea. NIR is used to build calibration models to predict the content of caffeine, epigallocatechin gallate (EGCG) and epicatechin (EC), and to predict the total antioxidant capacity of green tea. The total antioxidant capacity is determined by the trolox equivalent antioxidant capacity (TEAC) method. Until now, the prediction of the antioxidant capacity as such by NIR has not been reported. For caffeine and TEAC, models are built for the whole green tea leaves and also for the ground leaves. For the polyphenols (EGCG and EC), only models for the whole leaves are investigated. A partial least squares (PLS) algorithm is used to perform the calibration. The number of PLS factors included in the model is chosen as the one giving the lowest root mean square error of cross-validation (RMSECV) for the training set. The correlation coefficient (r) between the predicted and the reference results for the test set is used as the evaluation parameter for the models. For TEAC, r = 0.90 is obtained for the model with the whole leaves and r = 0.86 for the model with the ground leaves. The caffeine prediction model has a correlation coefficient of r = 0.96 for the whole leaves and r = 0.93 for the ground leaves. The correlation coefficients for the EGCG and EC content models are 0.83 and 0.44, respectively.

18.
This paper demonstrates the possibility of using near infrared (NIR) spectroscopy as a rapid method to predict quantitatively the content of caffeine and total polyphenols in green tea. A partial least squares (PLS) algorithm is used to perform the calibration. The number of PLS factors included in the model is chosen according to the lowest root mean square error of cross-validation (RMSECV) in training. The correlation coefficient R between the NIR-predicted and the reference results for the test set is used as the evaluation parameter for the models. The correlation coefficients of the prediction models were R = 0.9688 for caffeine and R = 0.9299 for total polyphenols. The study demonstrates that NIR spectroscopy with multivariate calibration analysis can be successfully applied as a rapid method to determine the valid ingredients of tea for the control of industrial processes.

19.
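Genetic algorithms for variable selection in partial least squares modeling (total citations: 19; self-citations: 0; citations by others: 19)
A global search method, the genetic algorithm (GA), is used to select wavelength variables in near-infrared spectral analysis, after which partial least squares (PLS) is used to build the analytical calibration model. Application examples on the near-infrared spectra of two types of samples show that calibrating on selected variables not only simplifies and optimizes the model but also enhances its predictive ability. The approach is especially suitable for systems that are difficult to calibrate with PLS alone.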

20.
The development of reliable multivariate calibration models for spectroscopic instruments in on-line/in-line monitoring of chemical and biochemical processes is generally difficult, time-consuming and costly. Therefore, it is preferable if calibration models can be used for an extended period without the need to replace them. However, in many process applications, changes in the instrumental response (e.g. owing to a change of spectrometer) or variations in the measurement conditions (e.g. a change in temperature) can cause a multivariate calibration model to become invalid. In this contribution, a new method, systematic prediction error correction (SPEC), has been developed to maintain the predictive abilities of multivariate calibration models when, for example, the spectrometer or the measurement conditions are altered. The performance of the method has been tested on two NIR data sets (one with changes in instrumental responses, the other with variations in experimental conditions) and the outcomes compared with those of some popular methods, i.e. global PLS, univariate slope and bias correction (SBC) and piecewise direct standardization (PDS). The results show that SPEC achieves satisfactory analyte predictions with significantly lower RMSEP values than global PLS and SBC for both data sets, even when only a few standardization samples are used. Furthermore, SPEC is simple to implement and requires less information than PDS, which offers advantages for applications with limited data.
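Of the comparison methods named in this abstract, univariate slope and bias correction (SBC) is simple enough to sketch. The following is a minimal illustration of SBC only, not of the proposed SPEC method, whose details the abstract does not give. The function names and the numbers are hypothetical; the sketch assumes the usual formulation in which reference values of a few standardization samples are regressed on the predictions the old model makes under the new conditions.

```python
def fit_sbc(predicted, reference):
    """Fit reference = slope * predicted + bias on standardization samples."""
    n = len(predicted)
    mp, mr = sum(predicted) / n, sum(reference) / n
    sxx = sum((p - mp) ** 2 for p in predicted)
    sxy = sum((p - mp) * (r - mr) for p, r in zip(predicted, reference))
    slope = sxy / sxx
    bias = mr - slope * mp
    return slope, bias


def apply_sbc(predicted, slope, bias):
    """Correct new predictions made by the unchanged calibration model."""
    return [slope * p + bias for p in predicted]


# Hypothetical numbers: predictions drift after a change of spectrometer.
old_model_predictions = [1.0, 2.0, 3.0]
reference_values = [3.0, 5.0, 7.0]
slope, bias = fit_sbc(old_model_predictions, reference_values)
print(apply_sbc([4.0], slope, bias))
```

SBC only rescales the model output, so it can compensate for a systematic linear drift in predictions but not for more complex spectral changes, which is one reason it serves as a baseline in comparisons of this kind.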
