Similar documents
20 similar documents found
1.
Variable selection using a genetic algorithm is combined with partial least squares (PLS) for the prediction of additive concentrations in polymer films from Fourier transform infrared (FT-IR) spectral data. An approach using an iterative application of the genetic algorithm is proposed. This approach allows all variables to be considered while minimizing the risk of overfitting. We demonstrate that the variables selected by the genetic algorithm are consistent with expert knowledge. This result is convincing evidence that the algorithm can select the correct variables automatically.
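A minimal sketch of genetic-algorithm variable selection, assuming one binary mask per individual, tournament selection, uniform crossover, bit-flip mutation and elitism. To keep it short, the fitness is the leave-one-out RMSE of an ordinary least-squares fit rather than of a PLS model, and the iterative re-application of the GA described above is not reproduced; all function names, parameter values and the demo data are hypothetical.

```python
import numpy as np

def loo_rmse(X, y):
    """Leave-one-out RMSE of an ordinary least-squares fit with intercept."""
    n = len(y)
    errors = []
    for i in range(n):
        keep = np.arange(n) != i
        A = np.column_stack([np.ones(keep.sum()), X[keep]])
        coef, *_ = np.linalg.lstsq(A, y[keep], rcond=None)
        errors.append(y[i] - (coef[0] + X[i] @ coef[1:]))
    return float(np.sqrt(np.mean(np.square(errors))))

def ga_select(X, y, pop_size=20, n_gen=30, p_mut=0.05, seed=0):
    """Evolve boolean column masks; a lower LOO RMSE means a fitter mask."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    pop = rng.random((pop_size, p)) < 0.5

    def fitness(mask):
        return loo_rmse(X[:, mask], y) if mask.any() else np.inf

    scores = np.array([fitness(m) for m in pop])
    for _ in range(n_gen):
        children = []
        for _ in range(pop_size):
            # Tournament selection of two parents
            a, b = rng.choice(pop_size, 2, replace=False)
            p1 = pop[a] if scores[a] < scores[b] else pop[b]
            a, b = rng.choice(pop_size, 2, replace=False)
            p2 = pop[a] if scores[a] < scores[b] else pop[b]
            # Uniform crossover followed by bit-flip mutation
            child = np.where(rng.random(p) < 0.5, p1, p2)
            child ^= rng.random(p) < p_mut
            children.append(child)
        children = np.array(children)
        child_scores = np.array([fitness(m) for m in children])
        # Elitism: keep the best pop_size masks among parents and children
        merged = np.vstack([pop, children])
        merged_scores = np.concatenate([scores, child_scores])
        keep = np.argsort(merged_scores)[:pop_size]
        pop, scores = merged[keep], merged_scores[keep]
    best = int(np.argmin(scores))
    return pop[best], float(scores[best])

# Hypothetical demo: only columns 0 and 3 carry signal
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(30, 8))
y_demo = 2 * X_demo[:, 0] - X_demo[:, 3] + 0.05 * rng.normal(size=30)
best_mask, best_rmse = ga_select(X_demo, y_demo)
```

Because elitism never discards the best mask found so far, the returned mask is the best one evaluated; rerunning with different seeds, as an iterative scheme would, guards against chance correlations.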

2.
Near-infrared chemical imaging (NIR-CI) is now widely used in pharmaceutical analysis because it provides important surface information about samples. In this work, NIR-CI information at the pixel level was compared by calculating the similarity between concentration distribution maps obtained by different multivariate calibration approaches. Four multivariate methods (MCR, MLR, CLS and PLS) were compared in the analysis of carbamazepine pharmaceutical formulations. For global determination, all models gave RMSEP values below 1.9% (w/w) for the active pharmaceutical ingredient (API) and below 4.6% (w/w) for the excipients. The distribution maps obtained by PLS, CLS and MCR were highly similar for all compounds of the formulation and consistent with their concentrations in the tablets. However, the maps obtained by MLR showed lower similarity to those from the other chemometric tools. Thus, a well-fitted model does not by itself ensure that the resulting images are reliable or accurate. The paper also compares the concentration distribution maps of all constituents of the formulation with their respective micrographs.
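The abstract does not state which similarity measure was used between distribution maps. As one plausible choice (an assumption), the sketch below uses the Pearson correlation coefficient between the pixel values of two maps of the same component, together with the RMSEP used for global determination. The map values are hypothetical.

```python
import math

def pearson_similarity(map_a, map_b):
    """Pearson correlation between two equally sized 2-D concentration maps."""
    a = [v for row in map_a for v in row]
    b = [v for row in map_b for v in row]
    if len(a) != len(b):
        raise ValueError("maps must have the same number of pixels")
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)  # undefined for a constant map

def rmsep(y_ref, y_pred):
    """Root mean square error of prediction."""
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(y_ref, y_pred)) / len(y_ref))

# Hypothetical 2 x 3 API maps from two calibration methods
map_pls = [[0.20, 0.22, 0.19], [0.18, 0.25, 0.21]]
map_mcr = [[0.21, 0.23, 0.19], [0.18, 0.26, 0.22]]
similarity = pearson_similarity(map_pls, map_mcr)
```

A correlation near 1 means two methods place the component in the same pixels, even if their absolute scales differ slightly.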

3.
This study proposes an analytical method for the simultaneous near-infrared (NIR) spectrometric determination of palmitic, oleic, linoleic and linolenic acids in sea buckthorn seed oil. Four combinations of multivariate calibration and variable selection methods were evaluated: partial least squares (PLS) with the full spectrum; PLS with uninformative variable elimination (UVE); PLS with competitive adaptive reweighted sampling (CARS); and multiple linear regression (MLR) with uninformative variable elimination combined with the successive projections algorithm (UVE-SPA). An independent set of samples was used to evaluate the performance of the resulting models. The UVE-SPA-MLR model, built with only a few spectral variables, gave the best results for each parameter, with relative errors of prediction (REP) of 1.77%, 1.20%, 1.02% and 1.40% for palmitic, oleic, linoleic and linolenic acids, respectively. These results indicate that the method is a feasible and fast way to determine the fatty acid content of sea buckthorn seed oil.
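The relative error of prediction is commonly defined as the RMSEP expressed as a percentage of the mean reference value. Assuming the study used this usual definition (the abstract does not state it), REP can be computed as follows; the function name is illustrative.

```python
import math

def rep_percent(y_ref, y_pred):
    """REP (%) = 100 * RMSEP / mean(y_ref)."""
    n = len(y_ref)
    rmsep = math.sqrt(sum((p - r) ** 2 for r, p in zip(y_ref, y_pred)) / n)
    return 100.0 * rmsep / (sum(y_ref) / n)

# Hypothetical reference and predicted contents (%)
print(rep_percent([10.0, 10.0], [11.0, 9.0]))
```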

4.
Han QJ  Wu HL  Cai CB  Xu L  Yu RQ 《Analytica chimica acta》2008,612(2):121-125
An improved method based on an ensemble of Monte Carlo uninformative variable elimination (EMCUVE) models is presented for wavelength selection in multivariate calibration of spectral data. The proposed algorithm introduces a Monte Carlo (MC) strategy into uninformative variable elimination PLS (UVE-PLS), in place of the leave-one-out strategy, to estimate the contribution of each wavelength variable to the PLS model. In EMCUVE, wavelength variables are evaluated by several Monte Carlo uninformative variable elimination (MCUVE) models, and fusing these MCUVE models with a voting rule improves on the original uninformative variable elimination method. Results on simulated and real data sets demonstrate that EMCUVE performs wavelength selection properly and improves the predictive ability of the multivariate calibration model.
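A minimal sketch of the MCUVE idea with an ensemble vote, assuming the usual stability criterion: the mean of a variable's regression coefficient divided by its standard deviation over Monte Carlo sub-samples. For brevity, ridge regression stands in for the PLS coefficients used in the paper, and the voting rule (each MCUVE run votes for its n_keep most stable variables) is a hypothetical simplification, not the authors' exact fusion scheme.

```python
import numpy as np

def ridge_coef(X, y, lam=1e-2):
    """Ridge coefficients on mean-centred data (stand-in for PLS coefficients)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)

def mcuve_stability(X, y, n_runs=200, frac=0.8, seed=0):
    """Stability of each variable: mean / std of its coefficient over MC sub-samples."""
    rng = np.random.default_rng(seed)
    n = len(y)
    k = int(frac * n)
    coefs = []
    for _ in range(n_runs):
        idx = rng.choice(n, k, replace=False)  # random calibration subset
        coefs.append(ridge_coef(X[idx], y[idx]))
    coefs = np.array(coefs)
    return coefs.mean(axis=0) / coefs.std(axis=0)

def emcuve_votes(X, y, n_models=10, n_keep=5):
    """Each MCUVE model (own random seed) votes for its n_keep most stable variables."""
    votes = np.zeros(X.shape[1], dtype=int)
    for seed in range(n_models):
        stability = np.abs(mcuve_stability(X, y, seed=seed))
        votes[np.argsort(stability)[-n_keep:]] += 1
    return votes

# Hypothetical data: only variables 0 and 1 are informative
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(40, 20))
y_demo = 2.0 * X_demo[:, 0] - 1.5 * X_demo[:, 1] + 0.1 * rng.normal(size=40)
votes = emcuve_votes(X_demo, y_demo)
```

Informative variables have large, consistent coefficients across sub-samples and so collect a vote from every model, whereas noise variables rarely do so consistently.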

5.
Variable (wavelength or feature) selection has become a critical step in the analysis of datasets with a high number of variables and relatively few samples. In this study, a novel variable selection strategy, variable combination population analysis (VCPA), was proposed. The strategy consists of two crucial procedures. First, an exponentially decreasing function (EDF), which implements the simple and effective principle of ‘survival of the fittest’ from Darwin’s theory of natural evolution, is used to determine the number of variables to keep and to shrink the variable space continuously. Second, in each EDF run, a binary matrix sampling (BMS) strategy, which gives each variable the same chance of being selected and generates different variable combinations, is used to produce a population of variable subsets from which a population of sub-models is built. Model population analysis (MPA) is then used to find the variable subsets with the lowest root mean square error of cross-validation (RMSECV). The frequency with which each variable appears in the best 10% of sub-models is computed; the higher the frequency, the more important the variable. The performance of the proposed procedure was investigated on three real NIR datasets. The results indicate that VCPA is a good variable selection strategy compared with four high-performing methods: genetic algorithm–partial least squares (GA–PLS), Monte Carlo uninformative variable elimination by PLS (MC-UVE-PLS), competitive adaptive reweighted sampling (CARS) and iteratively retaining informative variables (IRIV). The MATLAB source code of VCPA is available for academic research at: http://www.mathworks.com/matlabcentral/fileexchange/authors/498750.
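The two VCPA building blocks can be sketched in plain Python. The EDF below is the form used in CARS, which shrinks the retained set from all p variables to 2 over n_runs runs; whether VCPA uses exactly these constants is an assumption. Binary matrix sampling gives every variable the same number of appearances (half of the rows), and the frequency step counts how often each variable appears in the best 10% of sub-models. The scores passed to selection_frequency would be sub-model RMSECV values, which are not computed here; all names are illustrative.

```python
import math
import random

def edf_counts(p, n_runs):
    """Variables kept at each run: p * a * exp(-k * i), from p down to 2 (CARS-style EDF)."""
    a = (p / 2) ** (1 / (n_runs - 1))
    k = math.log(p / 2) / (n_runs - 1)
    return [max(2, round(p * a * math.exp(-k * i))) for i in range(1, n_runs + 1)]

def binary_matrix_sampling(variables, n_subsets, rng):
    """Each variable appears in exactly n_subsets // 2 of the sampled subsets."""
    columns = []
    for _ in variables:
        col = [1] * (n_subsets // 2) + [0] * (n_subsets - n_subsets // 2)
        rng.shuffle(col)  # independent random placement per variable
        columns.append(col)
    return [[v for v, col in zip(variables, columns) if col[r]] for r in range(n_subsets)]

def selection_frequency(subsets, scores, top_frac=0.1):
    """Frequency of each variable in the best top_frac of subsets (lower score is better)."""
    order = sorted(range(len(subsets)), key=lambda i: scores[i])
    best = [subsets[i] for i in order[:max(1, int(top_frac * len(subsets)))]]
    freq = {}
    for subset in best:
        for v in subset:
            freq[v] = freq.get(v, 0) + 1
    return {v: c / len(best) for v, c in freq.items()}

print(edf_counts(100, 10))
```

In a full VCPA loop, each EDF run would draw BMS subsets only from the currently retained variables, score each subset by RMSECV, and keep the most frequent variables, up to the EDF count for that run, for the next run.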

6.
Multivariate calibration problems often involve identifying a meaningful subset of variables, from a vast number of candidates, for better prediction of output variables. A new graph-theoretic method based on partial correlations, the variable interaction network (VIN), is proposed. Its performance is investigated on many well-studied representative calibration datasets spanning different application domains, with partial least squares (PLS) regression models combined with existing variable selection techniques used as benchmarks. After VIN selection, subsets with different numbers of variables are retained for the final analysis, and progressive prediction accuracies are used for comparison. VIN-PLS shows significant improvements in prediction efficiency and variable subset optimization, achieving up to 45% improvement over existing methods with significantly fewer variables. The advantages of VIN-based variable selection are highlighted.
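A minimal sketch of the partial-correlation step, assuming the network is built by thresholding partial correlations obtained from the inverse sample covariance matrix. The threshold value, function names and demo data are hypothetical, and the published VIN method may differ in how edges are selected and how the network is used to rank variables.

```python
import numpy as np

def partial_correlations(X):
    """Partial correlation of each pair of columns given all other columns."""
    precision = np.linalg.inv(np.cov(X, rowvar=False))
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

def network_edges(X, threshold=0.3):
    """Edges (i, j) of a variable network: |partial correlation| >= threshold."""
    pcor = partial_correlations(X)
    p = pcor.shape[0]
    return [(i, j) for i in range(p) for j in range(i + 1, p)
            if abs(pcor[i, j]) >= threshold]

# Hypothetical chain x -> y -> z: x and z are linked only through y
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = x + 0.5 * rng.normal(size=2000)
z = y + 0.5 * rng.normal(size=2000)
data = np.column_stack([x, y, z])
edges = network_edges(data)
```

In the chain example, x and z are strongly correlated but conditionally independent given y, so no edge joins them; this is what lets a partial-correlation network separate direct from indirect associations.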

7.
Rodrigues LO  Cardoso JP  Menezes JC 《Talanta》2008,75(5):1203-1207
The use of near-infrared spectroscopy (NIRS) in downstream solvent-based processing steps of an active pharmaceutical ingredient (API) is reported. A single quantitative method was developed to assess API content in the organic phase of a liquid–liquid extraction process and in multiple process streams of the subsequent concentration and purification steps. A new methodology based on spectral combinations and variable selection by genetic algorithm effectively improved the prediction ability of the calibration model. A root mean square error of prediction (RMSEP) of 0.05 was achieved over the range 0.20–3.00% (w/w). With this method, the calibration data set can be balanced with spectra of the desired concentrations whenever acquiring new spectra is no longer possible or the model's accuracy must be improved for a specific range. Including artificial spectra before applying the genetic algorithm improved RMSEP by 10%, and the method gave a relative RMSEP improvement of 46% compared with standard full-spectrum PLS.
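The abstract does not specify how spectra were combined. Assuming linear (Beer–Lambert) additivity, an artificial spectrum at a desired concentration can be interpolated between two measured spectra, as in this hypothetical sketch; the function names and values are illustrative only.

```python
def combine_spectra(spec_a, conc_a, spec_b, conc_b, weight):
    """Weighted mix of two spectra and their concentrations (assumes additivity)."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    spectrum = [weight * a + (1 - weight) * b for a, b in zip(spec_a, spec_b)]
    return spectrum, weight * conc_a + (1 - weight) * conc_b

def fill_concentration(spec_a, conc_a, spec_b, conc_b, target):
    """Artificial spectrum at a target concentration between conc_b and conc_a."""
    weight = (target - conc_b) / (conc_a - conc_b)
    return combine_spectra(spec_a, conc_a, spec_b, conc_b, weight)

# Hypothetical two-point spectra at 3.0% and 0.2% (w/w) API
spectrum, conc = fill_concentration([1.0, 2.0], 3.0, [0.2, 0.4], 0.2, 1.6)
```

Interpolated spectra can only fill gaps between measured ones; they carry no new information about matrix effects, so their benefit should be confirmed on real test spectra.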

8.
A spectrofluorometric method has been developed for the quantitative determination of flufenamic, mefenamic and meclofenamic acids in mixtures, recording emission fluorescence spectra between 370 and 550 nm with an excitation wavelength of 352 nm. The excitation–emission spectra of these compounds overlap strongly, which prevents their direct determination without prior separation. The proposed method applies partial least squares (PLS) multivariate calibration to resolve the mixture, using a set of wavelengths previously selected by Kohonen artificial neural networks (K-ANN). The linear calibration ranges used to construct the calibration matrix were 0.25–1.00 μg ml⁻¹ for flufenamic and meclofenamic acids and 1.00–4.00 μg ml⁻¹ for mefenamic acid. The number of factors was selected by cross-validation. The selected calibration model was applied to the determination of these compounds in synthetic mixtures and pharmaceutical formulations.
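The abstract does not describe how the Kohonen network chose the wavelengths. One plausible scheme, sketched here as an assumption, treats each wavelength's standardized intensity profile across the calibration samples as an input vector, trains a one-dimensional self-organizing map on these profiles, and keeps the wavelength closest to each node, so that the retained set covers distinct spectral behaviours. The learning-rate and neighbourhood schedules, names and demo data are illustrative.

```python
import numpy as np

def train_som_1d(data, n_nodes=4, n_iter=500, lr0=0.5, sigma0=1.5, seed=0):
    """Train a 1-D Kohonen map on the rows of data; returns node weight vectors."""
    rng = np.random.default_rng(seed)
    weights = data[rng.choice(len(data), n_nodes, replace=False)].astype(float)
    grid = np.arange(n_nodes)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                    # decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)    # shrinking neighbourhood
        x = data[rng.integers(len(data))]
        bmu = int(np.argmin(((weights - x) ** 2).sum(axis=1)))  # best-matching unit
        h = np.exp(-((grid - bmu) ** 2) / (2.0 * sigma ** 2))
        weights += lr * h[:, None] * (x - weights)
    return weights

def som_select_wavelengths(X, n_nodes=4, seed=0):
    """Cluster wavelength profiles on a SOM and keep one representative per node."""
    profiles = X.T  # one row per wavelength: its intensities across samples
    profiles = (profiles - profiles.mean(axis=1, keepdims=True)) / (
        profiles.std(axis=1, keepdims=True) + 1e-12)
    weights = train_som_1d(profiles, n_nodes, seed=seed)
    dist = ((profiles[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    winner = dist.argmin(axis=1)
    chosen = []
    for node in range(n_nodes):
        members = np.flatnonzero(winner == node)
        if members.size:
            chosen.append(int(members[np.argmin(dist[members, node])]))
    return sorted(chosen)

# Hypothetical spectra: 15 samples x 30 wavelengths
X_demo = np.random.default_rng(3).normal(size=(15, 30))
selected = som_select_wavelengths(X_demo)
```

The selected wavelength indices would then form the reduced X matrix for PLS calibration.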

9.
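Near-infrared (NIR) spectroscopy was used for the rapid quantitative determination of trans fatty acid (TFA) content in edible vegetable oils, and the TFA prediction model was optimized through selection of the spectral band, preprocessing method, variable screening and modeling method. An Antaris II Fourier transform near-infrared spectrometer was used to collect NIR transmission spectra of 98 edible vegetable oil samples over 4000–10000 cm⁻¹, and the reference TFA content was then determined by gas chromatography. First, the spectral band and preprocessing method were optimized on the raw spectra. On this basis, competitive adaptive reweighted sampling (CARS) was used to select the important TFA-related variables, and prediction models for TFA content were then built by principal component regression, partial least squares and least squares support vector machine, respectively. The results show that determining TFA content in edible vegetable oils by NIR spectroscopy is feasible. For the optimized best model, R² of the calibration and prediction sets was 0.992 and 0.989, and RMSEC and RMSEP were 0.071% and 0.075%, respectively. The best model used only 26 variables, 0.854% of the full-spectrum variables. Moreover, compared with the full-spectrum partial least squares model, the prediction-set R² increased from 0.904 to 0.989 and RMSEP decreased from 0.230% to 0.075%. This shows that model optimization is necessary: CARS effectively selects the important TFA-related variables, greatly reducing the number of modeling variables and thus simplifying the prediction model, while substantially improving its accuracy and stability.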
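The figures of merit quoted above can be computed as follows. This sketch assumes R² is the coefficient of determination (1 − SS_res/SS_tot) and that RMSEC and RMSEP are plain root mean square errors over the calibration and prediction sets; some authors divide RMSEC by n − A − 1 instead of n, which this sketch does not do. Function names are illustrative.

```python
import math

def r2(y_ref, y_pred):
    """Coefficient of determination 1 - SS_res / SS_tot."""
    mean = sum(y_ref) / len(y_ref)
    ss_res = sum((r - p) ** 2 for r, p in zip(y_ref, y_pred))
    ss_tot = sum((r - mean) ** 2 for r in y_ref)
    return 1.0 - ss_res / ss_tot

def rmse(y_ref, y_pred):
    """RMSEC on the calibration set or RMSEP on the prediction set."""
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(y_ref, y_pred)) / len(y_ref))
```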

10.
Most multivariate calibration methods, such as partial least squares (PLS) or ridge regression (RR), a variant of Tikhonov regularization, require the selection of tuning parameters. Tuning parameter values determine the direction and magnitude of the respective model vectors and thereby set their prediction abilities. At the same time, they establish the corresponding bias/variance and underlying selectivity/sensitivity trade-offs. The final tuning parameter is often selected by some form of cross-validation, evaluating the resulting root mean square error of cross-validation (RMSECV) values. However, selecting a “good” tuning parameter with this single model evaluation merit is almost impossible. Including additional model merits helps tuning parameter selection produce better-balanced models and also allows a reasonable comparison between calibration methods. Using multiple merits, however, requires deciding how to combine and weight them into an information criterion, and an abundance of options is possible. This paper presents the sum of ranking differences (SRD) as a way to ensemble a collection of model evaluation merits varying across tuning parameters. It is shown that the SRD consensus ranking of model tuning parameters allows automatic selection of the final model, or of a collection of models if so desired. Essentially, the user’s preference for the balance between bias and variance decides which merits are used in SRD and hence which tuning parameter values SRD ranks lowest for automatic selection. SRD is also shown to allow different calibration methods to be compared simultaneously for a particular data set in conjunction with tuning parameter selection. Because SRD evaluates consistency across multiple merits, decisions on how to combine and weight merits are avoided.
To demonstrate the utility of SRD, a near-infrared spectral data set and a quantitative structure–activity relationship (QSAR) data set are evaluated using PLS and RR.
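A minimal sketch of SRD for tuning-parameter selection, assuming merits form the rows and candidate models (tuning-parameter values) form the columns, all merits are already scaled so that lower is better, and the reference ranking is the row minimum. Each model's SRD is the sum of absolute differences between its ranking of the merits and the reference ranking; the lowest SRD marks the preferred model. Ties receive average ranks; names and data are illustrative.

```python
def ranks(values):
    """1-based ranks; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def srd(merits, reference=None):
    """merits[i][j]: merit i for model j. Returns one SRD value per model."""
    if reference is None:
        reference = [min(row) for row in merits]  # ideal: best value of each merit
    ref_rank = ranks(reference)
    scores = []
    for j in range(len(merits[0])):
        col_rank = ranks([row[j] for row in merits])
        scores.append(sum(abs(a - b) for a, b in zip(col_rank, ref_rank)))
    return scores

# Hypothetical: 3 scaled merits x 2 candidate tuning-parameter values
print(srd([[1, 6], [2, 5], [3, 4]]))
```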

11.
A methodology was developed to determine the intrinsic viscosity of poly(ethylene terephthalate) (PET) using diffuse reflectance infrared Fourier transform spectroscopy (DRIFTS) and multivariate calibration (MVC) methods. Partial least squares calibration was applied to the spectra using mean centering and cross-validation. The results were correlated with intrinsic viscosities determined by the standard chemical method (ASTM D 4603-01), and a very good correlation was observed for values from 0.346 to 0.780 dL g⁻¹ (relative viscosities of ca. 1.185–1.449). Neither the spectrophotometer detector sensitivity nor the humidity of the samples influenced the results. The methodology is attractive because it produces no hazardous waste, avoids time-consuming chemical methods and can rapidly predict the intrinsic viscosity of PET samples over a wide range of values, including those of recycled materials.
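A minimal sketch of PLS1 calibration with mean centering and leave-one-out cross-validation, using the NIPALS algorithm; the number of latent variables is chosen as the one giving the lowest RMSECV. The study's exact cross-validation scheme is not given, so the LOO choice, function names and demo data are illustrative.

```python
import numpy as np

def pls1_fit(X, y, n_comp):
    """NIPALS PLS1 on mean-centred data; returns (x_mean, y_mean, coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = E.T @ f
        w /= np.linalg.norm(w)        # weight vector
        t = E @ w                     # scores
        tt = t @ t
        p = E.T @ t / tt              # X loadings
        qa = (f @ t) / tt             # y loading
        E = E - np.outer(t, p)        # deflate X
        f = f - qa * t                # deflate y
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)  # B = W (P^T W)^-1 q
    return x_mean, y_mean, coef

def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return (X - x_mean) @ coef + y_mean

def rmsecv_loo(X, y, n_comp):
    """Leave-one-out root mean square error of cross-validation."""
    n = len(y)
    residuals = []
    for i in range(n):
        keep = np.arange(n) != i
        model = pls1_fit(X[keep], y[keep], n_comp)
        residuals.append(y[i] - pls1_predict(model, X[i:i + 1])[0])
    return float(np.sqrt(np.mean(np.square(residuals))))

def choose_n_components(X, y, max_comp):
    scores = [rmsecv_loo(X, y, a) for a in range(1, max_comp + 1)]
    return int(np.argmin(scores)) + 1, scores
```

With as many components as the rank of X, PLS1 reproduces ordinary least squares; cross-validation is what stops the component count before noise is modelled.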

12.
In multivariate calibration of spectral datasets, variable selection is often applied to identify a relevant subset of variables, leading to improved prediction accuracy and easier interpretation of the selected fingerprint regions. Numerous variable selection methods have been proposed to date, but choosing among them is not trivial. Furthermore, in many cases the set of variables found by these methods may not be robust because of irreproducibility and uncertainty, which poses a great challenge to the reliability of variable selection. In this study, the reproducibility of five variable selection methods was investigated quantitatively to evaluate their performance. Reproducibility was quantified using Monte Carlo sub-sampling (MCS) together with a quantitative similarity measure designed for highly collinear spectral datasets. Investigating the reproducibility and prediction accuracy of these variable selection algorithms on two near-infrared (NIR) datasets showed that the methods varied widely in performance, especially in their ability to identify a consistent subset of variables. Thus, a thorough assessment of reproducibility together with predictive accuracy improves the statistical validity of, and confidence in, the selection outcome, which conventional evaluation schemes cannot address.
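The study used a similarity measure designed for highly collinear spectra, whose exact form is not given in the abstract. As a simpler stand-in, the sketch below scores reproducibility as the mean pairwise Jaccard index between the variable sets selected on different Monte Carlo sub-samples; unlike the authors' measure, it treats neighbouring wavelengths as entirely different variables. Names are illustrative.

```python
from itertools import combinations

def jaccard(a, b):
    """|A intersect B| / |A union B| for two selected-variable sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def mean_pairwise_jaccard(selections):
    """Average Jaccard similarity over all pairs of selections (1.0 = fully reproducible)."""
    pairs = list(combinations(selections, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

Each element of selections would be the variable set returned by one selection method on one Monte Carlo sub-sample; comparing the averages across methods ranks their reproducibility.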

13.
14.
This study assessed the feasibility of using near-infrared (NIR) spectroscopy as a rapid method for the qualitative and quantitative assessment of tea quality. NIR spectroscopy combined with soft independent modeling of class analogy (SIMCA) was proposed for the rapid identification of tea varieties. Four tea varieties (Longjing, Biluochun, Qihong and Tieguanyin) were studied. The identification rate was 100% in all cases except Longjing in the training set (90%) and Biluochun in the test set (80%). A partial least squares (PLS) algorithm was used to predict the caffeine and total polyphenol contents of tea. The models were calibrated by cross-validation, and the optimal number of PLS factors was chosen as the one giving the lowest root mean square error of cross-validation (RMSECV). The models were evaluated by the correlation coefficient (R) and root mean square error of prediction (RMSEP) of the test set: R = 0.9688 and RMSEP = 0.0836% for caffeine; R = 0.9299 and RMSEP = 1.1138% for total polyphenols. The overall results demonstrate that NIR spectroscopy with multivariate calibration can be applied as a rapid method both to identify tea varieties and to determine several chemical constituents of tea simultaneously.
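A simplified SIMCA sketch: one PCA model per class, and a new spectrum is assigned to the class whose model leaves the smallest residual relative to that class's training residual. Real SIMCA implementations use statistical critical limits (for example F-tests) and can reject a sample from every class or accept it into several; those steps are omitted, and the class names, number of components and demo data are hypothetical.

```python
import numpy as np

class SimcaClass:
    """PCA model of one class; distance = residual norm scaled by training residual."""

    def __init__(self, X, n_pc=1):
        self.mean = X.mean(axis=0)
        _, _, vt = np.linalg.svd(X - self.mean, full_matrices=False)
        self.loadings = vt[:n_pc].T
        self.s0 = np.sqrt(np.mean(self._residual(X) ** 2))

    def _residual(self, X):
        Xc = X - self.mean
        return np.linalg.norm(Xc - Xc @ self.loadings @ self.loadings.T, axis=1)

    def scaled_distance(self, X):
        return self._residual(X) / self.s0

def simca_predict(models, X):
    """Assign each row of X to the class with the smallest scaled residual."""
    names = list(models)
    dist = np.column_stack([models[name].scaled_distance(X) for name in names])
    return [names[i] for i in dist.argmin(axis=1)]

# Hypothetical 3-variable "spectra" for two varieties
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 3)) * [1.0, 0.1, 0.1]
B = rng.normal(size=(20, 3)) * [0.1, 1.0, 0.1] + 10.0
models = {"A": SimcaClass(A), "B": SimcaClass(B)}
labels = simca_predict(models, np.array([[0.5, 0.0, 0.0], [10.0, 11.0, 10.0]]))
```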

15.
A simple and reliable method for the rapid evaluation of mixtures of phenolic compounds (phenol/chlorophenol, catechol/phenol, cresol/chlorocresol and phenol/cresol) using a dual amperometric device is described. The approach exploits the difference between the sensitivities of laccase and tyrosinase towards different phenolic compounds. A multichannel potentiostat was used to monitor laccase- and tyrosinase-based biosensors simultaneously, and the data were treated with the partial least squares (PLS) chemometric algorithm. The system resolved the phenolic mixtures with excellent efficiency. For example, for the phenol/chlorophenol mixture, the individual species were determined over the concentration range 1.0×10⁻⁶ to 10.0×10⁻⁶ mol l⁻¹, with relative standard deviations of 3.5% and 3.1% for phenol and chlorophenol, respectively. The excellent agreement between estimated and actual concentrations is also reflected in the correlation coefficients (0.9958 and 0.9981 for phenol and chlorophenol, respectively). These results show that the proposed methodology can be successfully applied to the simultaneous determination of phenolic compounds in mixtures, even in more dilute solutions.
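The principle behind the dual-biosensor design can be illustrated with two sensors and two analytes: if the laccase and tyrosinase electrodes respond linearly with different sensitivities, the two currents form a solvable 2×2 linear system. The sensitivity values below are hypothetical, and the study itself used PLS calibration rather than this direct solution.

```python
def resolve_binary_mixture(signals, sensitivities):
    """Solve [[s11, s12], [s21, s22]] @ [c1, c2] = [i1, i2] by Cramer's rule.

    sensitivities[k][j]: response of biosensor k per unit concentration of analyte j.
    """
    (s11, s12), (s21, s22) = sensitivities
    i1, i2 = signals
    det = s11 * s22 - s12 * s21
    if abs(det) < 1e-12:
        raise ValueError("sensor responses are too similar to resolve the mixture")
    return (i1 * s22 - s12 * i2) / det, (s11 * i2 - i1 * s21) / det

# Hypothetical sensitivities (laccase row, tyrosinase row) and measured currents
c_phenol, c_chlorophenol = resolve_binary_mixture((3.0, 3.3), ((2.0, 0.5), (0.3, 1.5)))
```

The determinant shrinks as the two enzymes' selectivity patterns become alike, which is why the different substrate preferences of laccase and tyrosinase are essential to the approach.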

16.
Partial least squares regression has been applied to the simultaneous spectrophotometric determination of harmine, harmane, harmalol and harmaline in Peganum harmala L. (Zygophyllaceae) seeds. The pH was optimized using multivariate definitions of selectivity and sensitivity, and the best results were obtained in basic media (pH > 9). The number of latent variables in the calibration models was optimized by cross-validation. Determinations were made over the concentration range 0.15–10 μg mL⁻¹. The proposed method was validated by analysing the β-carbolines in synthetic quaternary mixtures at pH 9 and 11, with relative standard errors of prediction below 4% in most cases. Analysis of P. harmala seeds with the proposed models gave contents of 1.84%, 0.16%, 0.25% and 3.90% for harmine, harmane, harmaline and harmalol, respectively. The results were validated against an existing HPLC method, and no significant differences were observed between the two methods.
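The relative standard error of prediction is commonly computed as RSEP (%) = 100 × sqrt(Σ(ĉ − c)² / Σc²). Assuming the study used this usual definition (the abstract does not state it), a direct implementation is shown below; the function name is illustrative.

```python
import math

def rsep_percent(c_true, c_pred):
    """RSEP (%) = 100 * sqrt(sum((c_pred - c_true)^2) / sum(c_true^2))."""
    num = sum((p - t) ** 2 for t, p in zip(c_true, c_pred))
    den = sum(t ** 2 for t in c_true)
    return 100.0 * math.sqrt(num / den)
```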

17.
Near-infrared (NIR) spectroscopy based on effective wavelengths (EWs) and chemometrics was proposed to discriminate between varieties of fruit vinegar, including aloe, apple, lemon and peach vinegars. One hundred and eighty samples (45 per variety) were randomly selected for the calibration set, 60 samples (15 per variety) for the validation set and 24 samples (6 per variety) for the independent set. Partial least squares discriminant analysis (PLS-DA) and least squares support vector machine (LS-SVM) calibration models were built. The LS-SVM inputs were either latent variables (LVs) selected by explained variance or EWs selected by x-loading weights, regression coefficients, modeling power and independent component analysis (ICA). The LS-SVM models were developed with a grid search and an RBF kernel function. All LS-SVM models outperformed the PLS-DA model, and the optimal LS-SVM model was achieved with the EWs selected by regression coefficients (4021, 4058, 4264, 4400, 4853, 5070 and 5273 cm⁻¹). In the validation set, the coefficient of determination (R²), RMSEP and total recognition ratio (cutoff ±0.1) were 1.000, 0.025 and 100%, respectively. The overall results indicate that regression coefficients are an effective way to select effective wavelengths and that NIR spectroscopy combined with LS-SVM models can discriminate fruit vinegar varieties with high accuracy.
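A minimal numpy sketch of LS-SVM regression with an RBF kernel, a grid search over the regularization parameter gamma and kernel width sig2, and a recognition ratio with a ±0.1 cutoff. It assumes each vinegar variety is coded as a number (for example 1 to 4) and that a prediction counts as correct when it falls within 0.1 of the true code; the coding, grid values and function names are illustrative. The kernel is written as exp(−‖x − z‖² / sig2), one common convention.

```python
import numpy as np

def rbf_kernel(A, B, sig2):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / sig2)

def lssvm_fit(X, y, gamma, sig2):
    """Solve the LS-SVM dual system [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sig2) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
    return X, sol[0], sol[1:], sig2

def lssvm_predict(model, X_new):
    X, b, alpha, sig2 = model
    return rbf_kernel(X_new, X, sig2) @ alpha + b

def grid_search(X_tr, y_tr, X_val, y_val, gammas, sig2s):
    """Return (validation RMSE, gamma, sig2) of the best grid point."""
    best = None
    for g in gammas:
        for s in sig2s:
            pred = lssvm_predict(lssvm_fit(X_tr, y_tr, g, s), X_val)
            err = float(np.sqrt(np.mean((pred - y_val) ** 2)))
            if best is None or err < best[0]:
                best = (err, g, s)
    return best

def recognition_ratio(y_code, y_pred, cutoff=0.1):
    """Percentage of samples predicted within +/- cutoff of their class code."""
    hits = np.abs(np.asarray(y_pred) - np.asarray(y_code)) <= cutoff
    return 100.0 * hits.mean()
```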

18.
A new method is proposed for determining mixtures of the preservatives sorbic and benzoic acids in commercial juices. The PLS-2 model was built from 40 standard solutions prepared by adding sorbic and benzoic acids at known concentrations to filtered natural apple, lemon, orange and grapefruit juices. The analyte concentrations in commercial samples were then predicted from their UV spectra using this model. The PLS-2 method was validated against high-performance liquid chromatography (HPLC), with relative errors between the PLS-2 and HPLC results below 12% in all cases.
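The validation criterion above amounts to a per-sample relative error against the HPLC reference. A trivial sketch, assuming the error is taken relative to the HPLC value (the abstract does not say which method is the denominator); names and values are illustrative.

```python
def relative_errors(pls_values, hplc_values):
    """Percent relative error of each PLS-2 result against the HPLC reference."""
    return [100.0 * abs(p - h) / h for p, h in zip(pls_values, hplc_values)]

def all_within(pls_values, hplc_values, limit=12.0):
    """True when every sample's relative error is below the limit."""
    return all(e < limit for e in relative_errors(pls_values, hplc_values))
```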

19.
A differential spectrophotometric method has been developed for the simultaneous quantitative determination of glucose (GLU), fructose (FRU) and lactose (LAC) in food samples. It relies on the different rates at which the analytes are oxidized by potassium ferricyanide (K₃Fe(CN)₆). The reaction data were recorded at the analytical wavelength of the K₃Fe(CN)₆ spectrum (420 nm). Because the kinetic runs of glucose, fructose and lactose overlap severely, the condition number of the data matrix was calculated to help optimize the experimental conditions; a temperature of 80 °C and a sodium hydroxide (NaOH) concentration of 1.5 mol l⁻¹ were selected. Linear calibration graphs were obtained over 2.96–66.7, 3.21–67.1 and 4.66–101 mg l⁻¹ for glucose, fructose and lactose, respectively. Synthetic mixtures of the three reducing sugars were analysed, and the data were processed by chemometric methods, namely partial least squares (PLS), principal component regression (PCR), classical least squares (CLS), back-propagation artificial neural network (BP-ANN) and radial basis function artificial neural network (RBF-ANN), using both the normal and the first-derivative kinetic data. Calibrations based on first-derivative data gave better predictions of the analytes, and RBF-ANN gave the lowest prediction errors of the five chemometric methods. After validation, the proposed method was applied to determine the three reducing sugars in several commercial food samples, and the standard addition method yielded satisfactory recoveries in all instances.
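The role of the condition number can be illustrated with simulated kinetic curves. This sketch assumes pseudo-first-order conversion curves 1 − exp(−kt) for each sugar with hypothetical rate constants; the real ferricyanide kinetics are not given in the abstract. The closer the rate constants, the more collinear the columns and the larger the condition number; differentiating along time with numpy.gradient gives the first-derivative kinetic data.

```python
import numpy as np

def kinetic_profiles(times, rate_constants):
    """One column per analyte: first-order conversion curve 1 - exp(-k t)."""
    t = np.asarray(times, dtype=float)[:, None]
    k = np.asarray(rate_constants, dtype=float)[None, :]
    return 1.0 - np.exp(-k * t)

def condition_numbers(times, rate_constants):
    """Condition numbers of the raw and first-derivative kinetic matrices."""
    S = kinetic_profiles(times, rate_constants)
    dS = np.gradient(S, np.asarray(times, dtype=float), axis=0)
    return np.linalg.cond(S), np.linalg.cond(dS)

times = np.linspace(0.0, 300.0, 61)  # hypothetical sampling times in seconds
well_separated = condition_numbers(times, [0.050, 0.010, 0.002])
overlapping = condition_numbers(times, [0.020, 0.018, 0.016])
```

Comparing such condition numbers across candidate temperatures or NaOH concentrations is one way to pick the conditions that make the analytes' kinetics most distinguishable.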

20.
New multivariate calibration methods and other processes are being developed that require the selection of multiple tuning parameter (penalty) values to form the final model. With one or more tuning parameters, using only one measure of model quality to select the final tuning parameter values is not sufficient, yet optimizing several model quality measures is challenging. Thus, three fusion ranking methods are investigated for simultaneously assessing multiple measures of model quality when selecting tuning parameter values. One is a supervised learning fusion rule named sum of ranking differences (SRD); the other two are non-supervised processes based on the sum and median operations. The effect of the number of models evaluated on the three fusion rules is also assessed using three procedures. One procedure uses all models from all possible combinations of the tuning parameters. To reduce the number of models evaluated, an iterative process (applicable only to SRD) is applied, and a model quality measure is also thresholded before the fusion rules are applied. A near-infrared pharmaceutical data set requiring model updating is used to evaluate the three fusion rules. In this case, the primary calibration conditions are for the active pharmaceutical ingredient (API) of tablets produced in a laboratory, and the secondary conditions for calibration updating are for tablets produced in the full batch setting. Two model updating processes requiring selection of two unique tuning parameter values are studied: one based on Tikhonov regularization (TR) and the other a variation of partial least squares (PLS). The three fusion methods are shown to provide equivalent and acceptable results, allowing automatic selection of the tuning parameter values. The best tuning parameter values are selected when the model quality measures used with the fusion rules are computed on the small secondary sample set used to form the updated models.
In this model updating situation, evaluation of all possible models, thresholding, and iterative SRD performed equivalently for the three fusion rules with TR and PLS performed worse. While the application is model updating, the fusion processes are applicable to other situations requiring selection of multiple tuning parameter values.
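The two non-supervised fusion rules can be sketched as ranking every candidate model under each quality measure and then combining the ranks by sum or by median, optionally after discarding models whose first measure exceeds a threshold. This is an illustrative reading of the abstract: the measures, the threshold and the tie-breaking (by model order) are assumptions, and each candidate model would correspond to one combination of the two tuning parameter values.

```python
from statistics import median

def rank_models(values):
    """Rank of each model for one quality measure (1 = best; lower value is better)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, i in enumerate(order, start=1):
        ranks[i] = position
    return ranks

def fuse(measures, rule="sum", threshold=None):
    """measures[m][j]: quality measure m for model j. Returns the selected model index.

    If threshold is given, models whose first measure exceeds it are discarded first.
    """
    n_models = len(measures[0])
    candidates = [j for j in range(n_models)
                  if threshold is None or measures[0][j] <= threshold]
    if not candidates:
        raise ValueError("threshold removed every model")
    rank_table = [rank_models([row[j] for j in candidates]) for row in measures]
    combine = sum if rule == "sum" else median
    fused = [combine(r[k] for r in rank_table) for k in range(len(candidates))]
    best = min(range(len(candidates)), key=lambda k: fused[k])
    return candidates[best]
```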


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417