Similar literature
20 similar documents found (search time: 31 ms)
1.
New multivariate calibration methods and other processes are being developed that require selection of multiple tuning parameter (penalty) values to form the final model. With one or more tuning parameters, using only one measure of model quality to select final tuning parameter values is not sufficient. Optimization of several model quality measures is challenging. Thus, three fusion ranking methods are investigated for simultaneous assessment of multiple measures of model quality for selecting tuning parameter values. One is a supervised learning fusion rule named sum of ranking differences (SRD). The other two are non-supervised learning processes based on the sum and median operations. The effect of the number of models evaluated on the three fusion rules is also evaluated using three procedures. One procedure uses all models from all possible combinations of the tuning parameters. To reduce the number of models evaluated, an iterative process (only applicable to SRD) is applied, and thresholding a model quality measure before applying the fusion rules is also used. A near infrared pharmaceutical data set requiring model updating is used to evaluate the three fusion rules. In this case, calibration of the primary conditions is for the active pharmaceutical ingredient (API) of tablets produced in a laboratory. The secondary conditions for calibration updating are for tablets produced in the full batch setting. Two model updating processes requiring selection of two unique tuning parameter values are studied. One is based on Tikhonov regularization (TR) and the other is a variation of partial least squares (PLS). The three fusion methods are shown to provide equivalent and acceptable results allowing automatic selection of the tuning parameter values. Best tuning parameter values are selected when model quality measures used with the fusion rules are for the small secondary sample set used to form the updated models. In this model updating situation, evaluation of all possible models, thresholding, and iterative SRD performed equivalently for the three fusion rules with TR, and PLS performed worse. While the application is model updating, the fusion processes are applicable to other situations requiring selection of multiple tuning parameter values.
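The sum and median fusion rules mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes each model quality measure is first converted to ranks (lower value is better) and that the ranks are then combined per model by a sum or a median. The measure names (`rmse_secondary`, `model_norm`, `bias`) and the numbers are hypothetical.

```python
from statistics import median


def rank_values(values):
    """1-based ranks, smallest value gets rank 1; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def fuse(measures, how="sum"):
    """Combine several quality measures into one score per candidate model.

    measures: dict mapping a measure name to a list with one value per model
              (assumption: lower is better for every measure).
    how:      "sum" or "median" of the per-measure ranks.
    """
    per_measure_ranks = [rank_values(v) for v in measures.values()]
    combine = sum if how == "sum" else median
    n_models = len(per_measure_ranks[0])
    return [combine(r[m] for r in per_measure_ranks) for m in range(n_models)]


# Hypothetical example: three candidate tuning parameter pairs.
candidates = [(0.1, 1), (1.0, 1), (10.0, 1)]
measures = {
    "rmse_secondary": [0.30, 0.20, 0.25],
    "model_norm": [5.0, 3.0, 1.0],
    "bias": [0.02, 0.01, 0.03],
}
scores = fuse(measures, how="sum")
best = candidates[scores.index(min(scores))]
```

The candidate with the lowest fused score is taken as the selected tuning parameter pair; swapping `how="median"` gives the median rule on the same ranks.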

2.
With projection based calibration approaches, such as partial least squares (PLS) and principal component regression (PCR), the calibration space is spanned by respective basis vectors (latent vectors). Up to rank k basis vectors are formed where k ≤ min(m,n) with m and n denoting the number of calibration samples and measured variables. The user needs to decide how many and which respective basis vectors to use (tuning parameters). To avoid the second issue, basis vectors are selected top‐down starting with the first and sequentially adding until model criteria are satisfied. Ridge regression (RR) avoids the issues by using the full set of basis vectors. Another approach is to select a subset from the total available. The presented work develops a process based on the L1 vector norm to select basis vectors. Specifically, the L1 norm is used to select singular value decomposition (SVD) basis set vectors for PCR (LPCR). Because PCR, PLS, RR, and others can be expressed as linear combinations of the SVD basis vectors, the focus is on selection and comparison using the SVD basis set. Results based on respective tuning parameter selections and weights applied to the SVD basis vectors for LPCR, top‐down PCR, correlation PCR (CPCR), PLS, and RR are compared for calibration and calibration updating using spectroscopic data sets. The methods are found to predict equivalently. In particular, the L1 norm produces similar results to those obtained by the well‐studied CPCR process. Thus, the new method provides a different theoretical framework than CPCR for selecting basis vectors. Copyright © 2016 John Wiley & Sons, Ltd.

3.
A method for calibration and validation subset partitioning (total citations: 13; self-citations: 0; citations by others: 13)
This paper proposes a new method to divide a pool of samples into calibration and validation subsets for multivariate modelling. The proposed method is of value for analytical applications involving complex matrices, in which the composition variability of real samples cannot be easily reproduced by optimized experimental designs. A stepwise procedure is employed to select samples according to their differences in both x (instrumental responses) and y (predicted parameter) spaces. The proposed technique is illustrated in a case study involving the prediction of three quality parameters (specific mass and distillation temperatures at which 10 and 90% of the sample has evaporated) of diesel by NIR spectrometry and PLS modelling. For comparison, PLS models are also constructed by full cross-validation, as well as by using the Kennard-Stone and random sampling methods for calibration and validation subset partitioning. The obtained models are compared in terms of prediction performance by employing an independent set of samples not used for calibration or validation. The results of F-tests at 95% confidence level reveal that the proposed technique may be an advantageous alternative to the other three strategies.
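A stepwise selection on differences in both x and y spaces can be sketched as a Kennard-Stone style max-min procedure on a combined distance. The sketch below is only an illustration under stated assumptions: the abstract does not give the distance formula, so the choice to divide the x and y distances by their respective maxima before adding them is an assumption, as are the function names.

```python
import math


def euclid(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def joint_distance_matrix(X, y):
    """Pairwise distance combining x space and y space.

    Assumption: each distance is scaled by its maximum over all pairs
    so that x and y contribute on a comparable scale.
    """
    n = len(X)
    dx = [[euclid(X[i], X[j]) for j in range(n)] for i in range(n)]
    dy = [[abs(y[i] - y[j]) for j in range(n)] for i in range(n)]
    max_dx = max(max(row) for row in dx) or 1.0  # guard against all-zero distances
    max_dy = max(max(row) for row in dy) or 1.0
    return [[dx[i][j] / max_dx + dy[i][j] / max_dy for j in range(n)]
            for i in range(n)]


def select_calibration(X, y, n_select):
    """Pick n_select sample indices (n_select >= 2) by max-min distance."""
    d = joint_distance_matrix(X, y)
    n = len(X)
    # Start from the two samples that are farthest apart.
    i0, j0 = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                 key=lambda p: d[p[0]][p[1]])
    selected = [i0, j0]
    while len(selected) < n_select:
        remaining = [i for i in range(n) if i not in selected]
        # Next sample: the one whose nearest selected neighbour is farthest away.
        nxt = max(remaining, key=lambda i: min(d[i][s] for s in selected))
        selected.append(nxt)
    return selected


# Hypothetical toy data: one spectral variable and one property value per sample.
X = [[0.0], [1.0], [2.0], [10.0]]
y = [0.0, 1.0, 2.0, 10.0]
calibration_idx = select_calibration(X, y, 3)
validation_idx = [i for i in range(len(X)) if i not in calibration_idx]
```

Samples not selected for calibration form the validation subset. Setting the y term to zero in `joint_distance_matrix` would reduce this sketch to a plain Kennard-Stone selection on x alone.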

4.
Pefloxacin mesylate (PFLX), a broad-spectrum antibacterial fluoroquinolone, has been widely used in clinical practice, so it is important to be able to determine its concentration. In this research, near-infrared spectroscopy (NIRS) was applied to the quantitative analysis of 108 injection samples, which were randomly divided into a calibration set of 89 samples and a prediction set of 19 samples. Partial least squares (PLS) regression and principal component regression (PCR) were used to establish quantitative models. The process of establishing the models, the model parameters, and the prediction results are discussed in detail. For the PLS regression, the coefficient of determination (R²) and the root mean square error of cross-validation (RMSECV) are 0.9263 and 0.00119, respectively. For comparison, the PCR method gives R² and RMSECV values of 0.9685 and 0.00108, respectively. The standard errors of prediction (SEP) of the PLS and PCR models are 0.001480 and 0.001140, respectively. The results for the prediction set suggest that both quantitative models have excellent generalization ability and prediction precision. However, for these PFLX injection samples, the PCR model achieved more accurate results than the PLS model. The experimental results show that NIRS together with the PCR method provides rapid and accurate quantitative analysis of PFLX injection samples. Moreover, this study supplies technical support for the further analysis of other pharmaceutical injection samples.

5.
Sum of ranking differences (SRD) was applied for comparing multianalyte results obtained by several analytical methods used in one or in different laboratories, i.e., for ranking the overall performances of the methods (or laboratories) in simultaneous determination of the same set of analytes. The data sets for testing of the SRD applicability contained the results reported during one of the proficiency tests (PTs) organized by the EU Reference Laboratory for Polycyclic Aromatic Hydrocarbons (EU-RL-PAH). In this way, the SRD was also tested as a discriminant method alternative to existing average performance scores used to compare multianalyte PT results. SRD should be used along with the z scores, the most commonly used PT performance statistics. SRD was further developed to handle the same rankings (ties) among laboratories. Two benchmark concentration series were selected as reference: (a) the assigned PAH concentrations (determined precisely beforehand by the EU-RL-PAH) and (b) the averages of all individual PAH concentrations determined by each laboratory. Ranking relative to the assigned values and also to the average (or median) values pointed to the laboratories with the most extreme results, as well as revealed groups of laboratories with similar overall performances. SRD reveals differences between methods or laboratories even if classical test(s) cannot. The ranking was validated using comparison of ranks by random numbers (a randomization test) and using sevenfold cross-validation, which highlighted the similarities among the (methods used in) laboratories. Principal component analysis and hierarchical cluster analysis justified the findings based on SRD ranking/grouping. If the PAH concentrations are row-scaled (i.e., z scores are analyzed as input for ranking), SRD can still be used for checking the normality of errors. Moreover, cross-validation of SRD on z scores groups the laboratories similarly. The SRD technique is general in nature, i.e., it can be applied to any experimental problem in which multianalyte results obtained either by several analytical procedures, analysts, instruments, or laboratories need to be compared.
Figure: Sum of ranking differences (SRD) orders analytical methods or laboratories according to their overall (multianalyte) performances, using either the average (or median) or the assigned values as the reference for the ranking.
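The core SRD calculation can be sketched as follows: the analytes are ranked once by the reference values and once by each laboratory's results, and the absolute rank differences are summed. This is a minimal sketch, not the published procedure: giving tied values their average rank is an assumption about how ties might be handled, and the concentrations are made up for illustration. The randomization test and cross-validation described in the abstract are not included.

```python
def average_ranks(values):
    """1-based ranks, smallest value gets rank 1; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def srd(results, reference):
    """Sum of absolute differences between the ranks of results and of the reference."""
    r_res = average_ranks(results)
    r_ref = average_ranks(reference)
    return sum(abs(a - b) for a, b in zip(r_res, r_ref))


# Hypothetical data: four analytes, an assigned-value reference, two laboratories.
assigned = [1.0, 2.0, 3.0, 4.0]
lab_a = [1.1, 1.9, 3.2, 3.9]   # same ordering as the reference
lab_b = [2.5, 1.0, 4.2, 3.0]   # ordering differs from the reference

srd_a = srd(lab_a, assigned)
srd_b = srd(lab_b, assigned)
```

A smaller SRD means the laboratory orders the analytes more like the reference does; replacing `assigned` with the row-wise average (or median) of all laboratories gives the second reference option named in the abstract.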

6.
Coscione AR, de A, Poppi RJ. The Analyst, 2002, 127(1): 135-139
Real samples were used for PLS model calibration and validation steps, showing that this approach can be of value in preventing deviations in the results caused by the matrix effects for the simultaneous spectrophotometric determination of aluminum and iron in plant extracts. One hundred UV-vis spectra, obtained from samples of the 1997 to 2000 International Plant-Analytical Exchange (IPE) program (The Netherlands), were used for model development, with ICP-AES aluminum and iron determinations as reference values for model calculation. The plant extracts were analyzed both by ICP-AES and by the PLS models developed in this work, using calibrations with both aqueous standard solutions and with real sample extracts. In addition, since the use of smaller calibration sets could be of value in reducing both the cost and the time of analysis, sets with fewer calibration samples were also investigated, with the help of the Kennard and Stone algorithm for sample selection. The predictability of the best model obtained with each calibration set was compared using the ratio of their relative root mean square errors (%RMSEV) for samples in the validation set, for aluminum or iron determinations, and these ratios were compared against tabulated F-test values. For all the models developed with real samples, the differences in the %RMSEV values for the aluminum or iron determinations were found not to be statistically significant, at a confidence level of 95%. Although it was observed that the aluminum, but not the iron, determinations with the PLS 2 model prepared with aqueous standards tend to be slightly lower than the ICP-AES determinations, this model has a good global prediction ability, as observed through the correlation curves presented, and can be used for screening determinations or for other agricultural purposes.

7.
Linear and non-linear calibration methods (principal component regression (PCR), partial least squares regression (PLS), and neural networks (NN)) were applied to a slightly non-linear Raman data set. Because of the large size of this data set, recently introduced linear calibration methods, specifically optimised for speed, were also used. These fast methods achieve speed improvement by using the Lanczos decomposition for the singular value decomposition steps of the calibration procedures, and for some of their variants, by optimising the models without cross-validation (CV). Linear methods could deal with the slight non-linearity present in the data by including extra components, therefore, performing comparably to NNs. The fast methods performed as well as their classical equivalents in terms of precision in prediction, but the results were obtained considerably faster. It, however, appeared that CV remains the most appropriate method for model complexity estimation.

8.
Partial Least Squares (PLS) is by far the most popular regression method for building multivariate calibration models for spectroscopic data. However, the success of the conventional PLS approach depends on the availability of a ‘representative data set’ as the model needs to be trained for all expected variation at the prediction stage. When the concentration of the known interferents and their correlation with the analyte of interest change in a fashion which is not covered in the calibration set, the predictive performance of inverse calibration approaches such as conventional PLS can deteriorate. This underscores the need for calibration methods that are capable of building multivariate calibration models which can be robustified against the unexpected variation in the concentrations and the correlations of the known interferents in the test set. Several methods incorporating ‘a priori’ information such as pure component spectra of the analyte of interest and/or the known interferents have been proposed to build more robust calibration models. In the present study, four such calibration techniques have been benchmarked on two data sets with respect to their predictive ability and robustness: Net Analyte Preprocessing (NAP), Improved Direct Calibration (IDC), Science Based Calibration (SBC) and Augmented Classical Least Squares (ACLS) Calibration. For both data sets, the alternative calibration techniques were found to give good prediction performance even when the interferent structure in the test set was different from the one in the calibration set. The best results were obtained by the ACLS model incorporating both the pure component spectra of the analyte of interest and the interferents, resulting in a reduction of the RMSEP by a factor of 3 compared to conventional PLS for the situation when the test set had a different interferent structure than the one in the calibration set.

9.
This study examined the feasibility of using near infrared (NIR) spectroscopy as a rapid method for the qualitative and quantitative assessment of tea quality. NIR spectroscopy with the soft independent modeling of class analogy (SIMCA) method was proposed to rapidly identify tea varieties. In the experiment, four tea varieties, Longjing, Biluochun, Qihong and Tieguanyin, were studied. The best results were as follows: the identification rate was 90% for Longjing in the training set and 80% for Biluochun in the test set, while the remaining rates were 100%. A partial least squares (PLS) algorithm was used to predict the content of caffeine and total polyphenols in tea. The models were calibrated by cross-validation, and the best number of PLS factors was chosen according to the lowest root mean square error of cross-validation (RMSECV). The correlation coefficient (R) and the root mean square error of prediction (RMSEP) in the test set were used as the evaluation parameters for the models: R = 0.9688 and RMSEP = 0.0836% for caffeine; R = 0.9299 and RMSEP = 1.1138% for total polyphenols. The overall results demonstrate that NIR spectroscopy with multivariate calibration can be applied successfully as a rapid method not only to identify tea varieties but also to determine simultaneously the content of some chemical components in tea.

10.
Simultaneous multicomponent analysis is usually carried out using multivariate calibration models, such as the partial least squares (PLS) one, that utilize the full spectrum. It has been shown by both experimental and theoretical considerations that better results can be obtained by proper selection of the spectral range to be included in calculations. A genetic algorithm (GA) is one of the most popular methods for selecting variables for PLS calibration of mixtures with almost identical spectra without loss of predictive capability. In this work, a simple and precise method for rapid and accurate simultaneous determination of sulfide and sulfite ions based on the addition reaction of these ions with new fuchsin at pH 8 and 25°C using PLS regression and GA for variable selection is proposed. The concentrations of sulfide and sulfite ions varied between 0.05–2.50 and 0.15–2.00 μg/mL, respectively. A series of model solutions containing different concentrations of sulfide and sulfite were used to check the predictive ability of GA-PLS models. The root mean square error of prediction with PLS on the whole data set was 0.19 μg/mL for sulfide and 0.09 μg/mL for sulfite. After the application of GA, these values reduced to 0.04 and 0.03 μg/mL, respectively. The text was submitted by the authors in English.

11.
The study demonstrates an application of the front-face fluorescence spectroscopy combined with multivariate regression methods to the analysis of fluorescent beer components. Partial least-squares regressions (PLS1, PLS2, and N-way PLS) were utilized to develop calibration models between synchronous fluorescence spectra and excitation-emission matrices of beers, on one hand, and analytical concentrations of riboflavin and aromatic amino acids, on the other hand. The best results were obtained in the analysis of excitation-emission matrices using the N-way PLS2 method. The respective correlation coefficients, and the values of the root mean-square error of cross-validation (RMSECV), expressed as percentages of the respective mean analytic concentrations, were: 0.963 and 14% for riboflavin, 0.974 and 4% for tryptophan, 0.980 and 4% for tyrosine, and 0.982 and 19% for phenylalanine.

12.
This paper demonstrates the possibility of using near infrared (NIR) spectroscopy as a rapid method to predict quantitatively the content of caffeine and total polyphenols in green tea. A partial least squares (PLS) algorithm is used to perform the calibration. To decide upon the number of PLS factors included in the PLS model, the model is chosen according to the lowest root mean square error of cross-validation (RMSECV) in training. The correlation coefficient R between the NIR predicted and the reference results for the test set is used as an evaluation parameter for the models. The results showed that the correlation coefficients of the prediction models were R = 0.9688 for the caffeine and R = 0.9299 for total polyphenols. The study demonstrates that NIR spectroscopy technology with multivariate calibration analysis can be successfully applied as a rapid method to determine the valid ingredients of tea to control industrial processes.

13.
Based on a so-called ensemble strategy, an algorithm is proposed for near-infrared (NIR) spectral calibration of complex beverage samples. This algorithm is a combination of a novel training set/test set sample-selection procedure based on a Kohonen self-organizing map (SOM) with a simple procedure to calculate an average partial least-squares (PLS) calibration model, which is therefore named SOMEPLS. In order to verify the proposed SOMEPLS, two NIR beverage datasets involving the determination of sugar content are considered, and three kinds of reference algorithm, i.e., conventional PLS (CPLS), the Kennard-Stone (KS) algorithm in combination with PLS (KSPLS), and sample set partitioning based on the joint x-y distance (SPXY) algorithm in combination with PLS (SPXYPLS), are used. Of these, both KS and SPXY are well-known representative sample-selection algorithms. By comparison, it was found that when there is a training set of appropriate size, SOMEPLS can achieve better prediction accuracy than the three reference algorithms, but without increasing the complexity of the corresponding calibration model for the future application, indicating that SOMEPLS can serve as a promising tool for NIR spectral calibration.

14.
Proteins possess strong absorption features in the combination range (5000-4000 cm⁻¹) of the near infrared (NIR) spectrum. These features can be used for quantitative analysis. Partial least squares (PLS) regression was used to analyze NIR spectra of lysozyme with the leave-one-out, full cross-validation method. A strategy for spectral range optimization with cross-validation PLS calibration was presented. A five-factor PLS model based on the spectral range between 4720 and 4540 cm⁻¹ provided the best calibration model for lysozyme in aqueous solutions. For 47 samples ranging from 0.01 to 10 mg/mL, the root mean square error of prediction was 0.076 mg/mL. This result was compared with values reported in the literature for protein measurements by NIR absorption spectroscopy in human serum and animal cell culture supernatants.

15.
A spectrofluorometric method for the quantitative determination of flufenamic, mefenamic and meclofenamic acids in mixtures has been developed by recording emission fluorescence spectra between 370 and 550 nm with an excitation wavelength of 352 nm. The excitation–emission spectra of these compounds overlap strongly, which does not allow their direct determination without previous separation. The proposed method applies partial least squares (PLS) multivariate calibration to the resolution of this mixture using a set of wavelengths previously selected by Kohonen artificial neural networks (K-ANN). The linear calibration graphs used to construct the calibration matrix were selected in the ranges from 0.25 to 1.00 μg ml⁻¹ for flufenamic and meclofenamic acids, and from 1.00 to 4.00 μg ml⁻¹ for mefenamic acid. A cross-validation procedure was used to select the number of factors. The selected calibration model has been applied to the determination of these compounds in synthetic mixtures and pharmaceutical formulations.

16.
This paper evaluates analytical methods based on near infrared (NIR) and middle infrared (MIR) spectroscopy and multivariate calibration to monitor the stability of biodiesel. There was a focus on three parameters: oxidative stability index, acid number and water content. Ethylic and methylic biodiesel from different feedstocks were used in experiments of accelerated aging, in order to take into account the wide variety of oilseeds and feedstocks available in Brazil. Partial least squares (PLS) and multiple linear regression (MLR) models were developed. Different pre-processing techniques and spectral variable/region selection algorithms were evaluated. For MLR models, the successive projection algorithm (SPA) was employed. Interval PLS (iPLS) and selection of variables taking into account the significant regression coefficients were used for PLS models. Results showed that both the near and middle infrared regions, and all variable selection methods tested, were efficient for predicting these three important quality parameters of B100, the root mean square error of prediction (RMSEP) values being comparable to the reproducibility of the corresponding standard method for each property investigated.

17.
This study proposes an analytical method for the simultaneous near infrared (NIR) spectrometric determination of palmitic, oleic, linoleic and linolenic acids in sea buckthorn seed oil. For this purpose, four different combinations of multivariate calibration methods and variable selections were evaluated: partial least squares (PLS) with full spectrum; PLS with uninformative variable elimination (UVE); PLS with competitive adaptive reweighted sampling (CARS); and multiple linear regression (MLR) with uninformative variable elimination combined with successive projections algorithm (UVE-SPA). An independent set of samples was employed to evaluate the performance of the resulting models. The UVE-SPA-MLR model developed with a few spectral variables provided the best results for each parameter. The values of relative errors of prediction (REP) from the UVE-SPA-MLR model for palmitic, oleic, linoleic and linolenic acids are 1.77%, 1.20%, 1.02% and 1.40%, respectively. These results indicate that this method is a feasible and fast method for the determination of the fatty acid content of sea buckthorn seed oil.

18.
Metal ions such as Co(II), Ni(II), Cu(II), Fe(III) and Cr(III), which are commonly present in electroplating baths at high concentrations, were analysed simultaneously by a spectrophotometric method modified by the inclusion of the ethylenediaminetetraacetate (EDTA) solution as a chromogenic reagent. The prediction of the metal ion concentrations was facilitated by the use of an orthogonal array design to build a calibration data set consisting of absorption spectra collected in the 370-760 nm range from solution mixtures containing these five metal ions. With the aid of this data set, calibration models were built based on 10 different chemometrics methods such as classical least squares (CLS), principal component regression (PCR), partial least squares (PLS), artificial neural networks (ANN) and others. These were tested with the use of a validation data set constructed from synthetic solutions of the five metal ions. The analytical performance of these chemometrics methods was characterized by relative prediction errors and recoveries (%). On the basis of these results, the computational methods were ranked according to their performances using the multi-criteria decision making procedures preference ranking organization method for enrichment evaluation (PROMETHEE) and geometrical analysis for interactive aid (GAIA). PLS and PCR models applied to the spectral data matrix that used the first derivative pre-treatment (DPLS and DPCR) were the preferred methods. These, together with ANN-radial basis function (RBF) and PLS, were applied for analysis of results from some typical industrial samples analysed by the EDTA-spectrophotometric method described. The DPLS, DPCR and ANN-RBF chemometrics methods performed particularly well, especially when compared with some target values provided by industry.

19.
This study continues the development of a method, implicit calibration, for estimating kinetic parameters from on-line measurements of batch reactions. The basic idea of implicit calibration is to combine non-linear parameter estimation with the calibration of measured spectra with concentrations calculated by an assumed kinetic model. A new example is studied, an esterification reaction with a rather complicated kinetic mechanism, where activities, instead of concentrations, and NIR spectra are used as measurements. The emphasis in the study is on estimating the uncertainty of the kinetic parameters. Two approaches, linearization and bootstrap, are applied. In the case studied, the two approaches give closely similar estimates of the uncertainty. As well, a new way is introduced to control the rigidity of the implicit calibration, based on minimizing the lack of fit of the model. It is also shown that ‘mixed implicit calibration’, i.e. implicit calibration combined with a few off-line calibrated concentrations, greatly enhances the identifiability of the kinetic model. Copyright © 2003 John Wiley & Sons, Ltd.

20.
A novel strategy for the optimization of wavelet transforms with respect to the statistics of the data set in multivariate calibration problems is proposed. The optimization follows a linear semi-infinite programming formulation, which does not display local maxima problems and can be reproducibly solved with modest computational effort. After the optimization, a variable selection algorithm is employed to choose a subset of wavelet coefficients with minimal collinearity. The selection allows the building of a calibration model by direct multiple linear regression on the wavelet coefficients. In an illustrative application involving the simultaneous determination of Mn, Mo, Cr, Ni, and Fe in steel samples by ICP-AES, the proposed strategy yielded more accurate predictions than PCR, PLS, and nonoptimized wavelet regression.
