Similar literature
 20 similar documents found (search time: 15 ms)
1.
Multivariate calibration problems often involve identifying a meaningful subset of variables from a vast number of candidates in order to predict the output variables better. A new graph-theoretic method based on partial correlations (variable interaction network, VIN) is proposed. Many well-studied representative calibration datasets spanning different application domains are selected for investigating its performance. Partial least squares (PLS) regression models combined with variable selection techniques are employed for benchmarking. Variable subsets of different sizes are retained for the final analysis after VIN selection, and progressive prediction accuracies are used for comparison. VIN-PLS results show significant improvement in prediction efficiency and variable subset optimization. Improvements of up to 45% over existing methods, with significantly fewer variables, are achieved using the new method. Advantages of VIN-based variable selection are highlighted.

2.
In this study, different approaches to the multivariate calibration of the vapors of two refrigerants are reported. As the relationships between the time-resolved sensor signals and the concentrations of the analytes are nonlinear, the widely used partial least-squares regression (PLS) fails. Therefore, different methods are used that are known to be able to deal with nonlinearities present in data. First, the Box–Cox transformation, which transforms the dependent variables nonlinearly, was applied. The second approach, the implicit nonlinear PLS regression, tries to account for nonlinearities by adding squared terms of the independent variables to the original independent variables. The third approach, quadratic PLS (QPLS), uses a nonlinear quadratic inner relationship for the model instead of a linear relationship as in PLS. Tree algorithms are also used, which split a nonlinear problem into smaller subproblems that are modeled using linear methods or discrete values. Finally, neural networks are applied, which are able to model any relationship. Different special implementations, such as genetic algorithms with neural networks and growing neural networks, are also used to prevent overfitting. Among the fast and simpler algorithms, QPLS shows good results. Different implementations of neural networks show excellent results. Among the different implementations, the most sophisticated and computing-intensive algorithms (growing neural networks) show the best results. Thus, the optimal method for the data set presented is a compromise between quality of calibration and complexity of the algorithm. Electronic Supplementary Material: Supplementary material is available for this article.
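
The first two approaches in this abstract can be illustrated with a minimal sketch. It assumes the standard one-parameter Box–Cox form and a plain "append the squares" augmentation of the predictors; the function names `box_cox` and `add_squared_terms` are hypothetical and are not taken from the paper.

```python
import math


def box_cox(y, lam):
    """Standard one-parameter Box-Cox transform of a positive value y.

    Assumption: the textbook form (y**lam - 1) / lam, with log(y) at lam == 0.
    """
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


def add_squared_terms(x_row):
    """Return the original predictors followed by their squares.

    Illustrates the idea behind implicit nonlinear PLS: the augmented row
    would then be passed to an ordinary linear PLS fit (not shown here).
    """
    return list(x_row) + [v * v for v in x_row]


if __name__ == "__main__":
    print(box_cox(4.0, 0.5))
    print(add_squared_terms([1.0, 2.0, -3.0]))
```

The transform parameter `lam` would normally be tuned on the calibration data; the sketch leaves that choice to the caller.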

3.
Variable (wavelength or feature) selection techniques have become a critical step in the analysis of datasets with a high number of variables and relatively few samples. In this study, a novel variable selection strategy, variable combination population analysis (VCPA), was proposed. This strategy consists of two crucial procedures. First, the exponentially decreasing function (EDF), which embodies the simple and effective principle of ‘survival of the fittest’ from Darwin’s natural evolution theory, is employed to determine the number of variables to keep and to shrink the variable space continuously. Second, in each EDF run, a binary matrix sampling (BMS) strategy, which gives each variable the same chance of being selected and generates different variable combinations, is used to produce a population of subsets from which a population of sub-models is constructed. Then, model population analysis (MPA) is employed to find the variable subsets with the lower root mean square error of cross validation (RMSECV). The frequency with which each variable appears in the best 10% of sub-models is computed. The higher the frequency, the more important the variable. The performance of the proposed procedure was investigated using three real NIR datasets. The results indicate that VCPA is a good variable selection strategy when compared with four high-performing variable selection methods: genetic algorithm–partial least squares (GA–PLS), Monte Carlo uninformative variable elimination by PLS (MC-UVE-PLS), competitive adaptive reweighted sampling (CARS) and iteratively retaining informative variables (IRIV). The MATLAB source code of VCPA is available for academic research on the website: http://www.mathworks.com/matlabcentral/fileexchange/authors/498750.
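
The BMS and frequency-counting steps described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' MATLAB code: every column of the binary matrix is given the same number of ones and shuffled independently, and the RMSECV values are supplied by the caller rather than computed from PLS sub-models. The names `binary_matrix_sampling` and `selection_frequency` are hypothetical.

```python
import random


def binary_matrix_sampling(n_submodels, n_vars, ratio=0.5, seed=0):
    """Build an n_submodels x n_vars 0/1 matrix.

    Each column (variable) contains the same number of ones, shuffled
    independently, so every variable is sampled equally often.
    Row k lists the variables included in sub-model k.
    """
    rng = random.Random(seed)
    ones = round(ratio * n_submodels)
    columns = []
    for _ in range(n_vars):
        col = [1] * ones + [0] * (n_submodels - ones)
        rng.shuffle(col)
        columns.append(col)
    # Transpose so that rows correspond to sub-models.
    return [[columns[j][k] for j in range(n_vars)] for k in range(n_submodels)]


def selection_frequency(matrix, rmsecv, best_fraction=0.1):
    """Frequency of each variable among the sub-models with lowest RMSECV."""
    n_best = max(1, int(len(matrix) * best_fraction))
    best = sorted(range(len(matrix)), key=lambda k: rmsecv[k])[:n_best]
    n_vars = len(matrix[0])
    return [sum(matrix[k][j] for k in best) / n_best for j in range(n_vars)]


if __name__ == "__main__":
    m = binary_matrix_sampling(20, 5)
    fake_rmsecv = [random.random() for _ in m]  # placeholder scores only
    print(selection_frequency(m, fake_rmsecv))
```

In a real run the placeholder scores would be replaced by the cross-validated error of a PLS model fitted on each row's variable subset.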

4.
In multivariate calibration with spectral datasets, variable selection is often applied to identify a relevant subset of variables, leading to improved prediction accuracy and easier interpretation of the selected fingerprint regions. Numerous variable selection methods have been proposed to date, but a proper choice among them is not trivial. Furthermore, in many cases, a set of variables found by those methods might not be robust because of irreproducibility and uncertainty issues, which poses a great challenge to improving the reliability of variable selection. In this study, the reproducibility of five variable selection methods was investigated quantitatively in order to evaluate their performance. The reproducibility of variable selection was quantified by using Monte-Carlo sub-sampling (MCS) techniques together with a quantitative similarity measure designed for highly collinear spectral datasets. The investigation of the reproducibility and prediction accuracy of the variable selection algorithms with two different near-infrared (NIR) datasets showed that the different variable selection methods exhibited wide variability in their performance, especially in their ability to identify a consistent subset of variables from the spectral datasets. Thus, a thorough assessment of reproducibility together with the predictive accuracy of the identified variables improved the statistical validity and confidence of the selection outcome, which cannot be addressed by conventional evaluation schemes.

5.
Variable selection using a genetic algorithm is combined with partial least squares (PLS) for the prediction of additive concentrations in polymer films using Fourier transform-infrared (FT-IR) spectral data. An approach using an iterative application of the genetic algorithm is proposed. This approach allows all variables to be considered and at the same time minimizes the risk of overfitting. We demonstrate that the variables selected by the genetic algorithm are consistent with expert knowledge. This very exciting result is a convincing demonstration that the algorithm can select correct variables in an automated fashion.

6.
By employing the simple but effective principle ‘survival of the fittest’ on which Darwin's Evolution Theory is based, a novel strategy for selecting an optimal combination of key wavelengths of multi-component spectral data, named competitive adaptive reweighted sampling (CARS), is developed. Key wavelengths are defined as the wavelengths with large absolute coefficients in a multivariate linear regression model, such as partial least squares (PLS). In the present work, the absolute values of the regression coefficients of the PLS model are used as an index for evaluating the importance of each wavelength. Then, based on the importance level of each wavelength, CARS sequentially selects N subsets of wavelengths from N Monte Carlo (MC) sampling runs in an iterative and competitive manner. In each sampling run, a fixed ratio (e.g. 80%) of samples is first randomly selected to establish a calibration model. Next, based on the regression coefficients, a two-step procedure comprising exponentially decreasing function (EDF) based enforced wavelength selection and adaptive reweighted sampling (ARS) based competitive wavelength selection is adopted to select the key wavelengths. Finally, cross validation (CV) is applied to choose the subset with the lowest root mean square error of CV (RMSECV). The performance of the proposed procedure is evaluated using one simulated dataset together with one near infrared dataset of two properties. The results reveal an outstanding characteristic of CARS: it can usually locate an optimal combination of key wavelengths that are interpretable in terms of the chemical property of interest. Additionally, our study shows that CARS gives better prediction than full spectrum PLS modeling, Monte Carlo uninformative variable elimination (MC-UVE) and moving window partial least squares regression (MWPLSR).
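
The EDF-based enforced selection step can be sketched numerically. The abstract does not give the function itself, so the form below is an assumption: the retained ratio is r_i = a·exp(−k·i), with a and k chosen so that all p wavelengths are kept in the first run and only two remain in the N-th run. The name `edf_ratios` is hypothetical.

```python
import math


def edf_ratios(p, n_runs):
    """Ratio of wavelengths retained in runs 1..n_runs.

    Assumed form: r_i = a * exp(-k * i), constrained so that
    r_1 = 1 (all p wavelengths) and r_N = 2 / p (two wavelengths).
    Requires p > 2 and n_runs >= 2.
    """
    a = (p / 2.0) ** (1.0 / (n_runs - 1))
    k = math.log(p / 2.0) / (n_runs - 1)
    return [a * math.exp(-k * i) for i in range(1, n_runs + 1)]


if __name__ == "__main__":
    p = 100
    for run, r in enumerate(edf_ratios(p, 10), start=1):
        # Number of wavelengths forced to survive this run.
        print(run, round(r * p))
```

The ratio falls quickly in early runs and slowly later on, which gives a fast coarse elimination followed by a refined selection; the ARS step that chooses which wavelengths survive is not shown.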

7.
Most of the current expressions used to calculate figures of merit in multivariate calibration have been derived assuming independent and identically distributed (iid) measurement errors. However, it is well known that this condition is not always valid for real data sets, where many external factors can lead to correlated and/or heteroscedastic noise structures. In this report, the influence of deviations from the classical iid paradigm is analyzed in the context of error propagation theory. New expressions have been derived to calculate sample-dependent prediction standard errors under different scenarios. These expressions allow for a quantitative study of the influence of the different sources of instrumental error affecting the system under analysis. Significant differences are observed when the prediction error is estimated in each of the studied scenarios using the most popular first-order multivariate algorithms, under both simulated and experimental conditions.

8.
The determination of the contents of therapeutic drugs, metabolites and other important biomedical analytes in biological samples is usually performed by high-performance liquid chromatography (HPLC). Modern multivariate calibration methods constitute an attractive alternative, even when they are applied to intrinsically unselective spectroscopic or electrochemical signals. First-order (i.e., vectorized) data are conveniently analyzed with classical chemometric tools such as partial least-squares (PLS). Certain analytical problems require more sophisticated models, such as artificial neural networks (ANNs), which are especially able to cope with non-linearities in the data structure. Finally, models based on the acquisition and processing of second- or higher-order data (i.e., matrices or higher dimensional data arrays) present the phenomenon known as “second-order advantage”, which permits quantitation of calibrated analytes in the presence of interferents. The latter models show immense potential in the field of biomedical analysis. Pertinent literature examples are reviewed.

9.
Three multivariate calibration methods, partial least squares (PLS-1 and PLS-2) and principal component regression (PCR), were applied for the first time to the simultaneous determination of a mixture of six pesticides in vegetable samples by gas chromatography with mass spectrometric detection (GC-MS). The PLS-1 method showed better prediction ability than the PLS-2 and PCR methods. The GC-MS chromatograms obtained from vegetable samples spiked with the target pesticides were used to build the calibration matrix. The PLS-1 models were evaluated by predicting the concentrations of independent test samples. The proposed models were also successfully applied to the determination of these pesticides in vegetable samples after an extraction step with dichloromethane. Using the first derivative signals in PLS-1 models did not improve the simultaneous determination of the compounds.

10.
An analytical methodology based on differential pulse voltammetry (DPV) on a glassy carbon electrode and the partial least-squares (PLS-1) algorithm for the simultaneous determination of levodopa, carbidopa and benserazide in pharmaceutical formulations was developed and validated. Some sources of bi-linearity deviation for electrochemical data are discussed and analyzed. The multivariate model was developed as a ternary calibration model, and it was built and validated with an independent set of drug mixtures in the presence of excipients, according to manufacturer specifications. The proposed method was applied to both the assay and the content uniformity of two commercial formulations containing mixtures of levodopa-carbidopa (10:1) and levodopa-benserazide (4:1). The results were satisfactory and statistically comparable to those obtained by applying the reference Pharmacopoeia method based on high performance liquid chromatography. In conclusion, the proposed methodology based on DPV data processed with the PLS-1 algorithm was able to quantify levodopa, carbidopa and benserazide simultaneously in their pharmaceutical formulations using a ternary calibration model for these drugs in the presence of excipients. Furthermore, the model appears to be successful even in the presence of slight potential shifts in the processed data, which are accommodated by the flexible chemometric PLS-1 approach.

11.
Preprocessing of raw near-infrared (NIR) spectral data is indispensable in multivariate calibration when the measured spectra are subject to significant noise, baselines and other undesirable factors. However, owing to the lack of sufficient prior information and incomplete knowledge of the raw data, NIR spectra preprocessing in multivariate calibration is still a matter of trial and error. The choice of a proper method depends largely on both the nature of the data and the expertise and experience of the practitioners. This might limit the applications of multivariate calibration in many fields, where researchers are not very familiar with the characteristics of the many preprocessing methods unique to chemometrics and have difficulty selecting the most suitable ones. Another problem is that many preprocessing methods, when used alone, might degrade the data in certain aspects or lose some useful information while improving certain qualities of the data. In order to tackle these problems, this paper proposes a new concept of data preprocessing, the ensemble preprocessing method, in which partial least squares (PLS) models built on differently preprocessed data are combined by Monte Carlo cross validation (MCCV) stacked regression. Little or no prior information about the data or expertise is required. Moreover, fusion of complementary information obtained by different preprocessing methods often leads to a more stable and accurate calibration model. The investigation of two real data sets has demonstrated the advantages of the proposed method.

12.
Benzoic acid (BA), methylparaben (MP), propylparaben (PP) and sorbic acid (SA) are food preservatives, and they have well-defined UV spectra. However, their spectra overlap seriously, and it is difficult to determine them individually from their mixtures without preseparation. In this paper, seven different chemometric approaches were applied to resolve the overlapping spectra and to determine these compounds simultaneously. With respect to the criteria of % relative prediction error (RPE) and % recovery, principal component...

13.
In multivariate regression, it is often reported that wavelength selection can improve results. Improvement is often based solely on bias measures such as the root mean square error of calibration (RMSEC), the root mean square error of validation (RMSEV), R² for the calibration and validation, etc. In recent studies, it has been shown that when variance measures are included, Pareto optimal models can be determined. However, variance measures used to date do not provide the ability to choose wavelength subset models relative to full wavelength models when wavelength subset models may be the Pareto models. In this paper, simplex optimization is used with a more complete variance measure to generate Pareto optimal models. The standard basis set is used as well as a basis set that includes the range and null space of the calibration spectra. Results show that it is possible to identify Pareto optimal models and that, if a wavelength subset is best, these are the models found. Regression coefficients for non-essential wavelengths are zero to near zero.

14.
Cocchi M, Durante C, Foca G, Marchetti A, Tassi L, Ulrici A. Talanta, 2006, 68(5): 1505-1511
In the present work, we explored the possibility of using near-infrared spectroscopy to quantify the degree of adulteration of durum wheat flour with common bread wheat flour. The multivariate calibration techniques adopted for this purpose were PLS and a wavelet-based calibration algorithm, recently developed by some of us, called WILMA. Both techniques provided satisfactory results, the percentage of adulterant present in the samples being quantified with an uncertainty lower than that associated with the Italian official method. In particular, the WILMA algorithm, by performing feature selection, made signal pretreatment unnecessary and yielded more parsimonious models.

15.
A method is proposed for the simultaneous determination of albumin and immunoglobulin G (IgG1) with fluorescence spectroscopy and multivariate calibration with partial least squares regression (PLS). The influence of some instrumental parameters was investigated with two experimental designs comprising 19 and 11 experiments, respectively. The investigated parameters were excitation and emission slit, detection voltage and scan rate. Once a suitable instrumental setting had been found, a minor calibration and test set were analysed and evaluated. Thereafter, a larger calibration of albumin and IgG1 was built from 26 samples (0-42 μg ml−1 albumin and 0-12.7 μg ml−1 IgG1). This calibration was validated with a test set consisting of 14 samples in the same concentration range. The precision of the method was estimated by analysing two test set samples six times each. The scan modes tested were emission scan and synchronous scan Δ60 nm. The results showed that the method could be used for the determination of albumin and IgG1 (albumin: root mean square error of prediction (RMSEP) <2, relative standard error of prediction (RSEP) <6%; IgG1: RMSEP <1, RSEP <8%) in spite of the overlapping fluorescence of the two compounds. The estimated precision was a relative standard deviation (R.S.D.) <1.7%. The method was finally applied to the analysis of some sample fractions from an albumin standard used in affinity chromatography.
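
The two figures of merit quoted in this abstract can be computed as in the sketch below. The abstract does not state its formulas, so the definitions used here are assumptions: RMSEP as the root of the mean squared prediction error, and RSEP as the commonly used percentage form that normalises the summed squared error by the summed squared reference values. The function names are hypothetical.

```python
import math


def rmsep(y_true, y_pred):
    """Root mean square error of prediction over a test set."""
    n = len(y_true)
    sse = sum((p - t) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sse / n)


def rsep(y_true, y_pred):
    """Relative standard error of prediction, in percent.

    Assumed definition: 100 * sqrt(sum((pred - true)^2) / sum(true^2)).
    """
    sse = sum((p - t) ** 2 for t, p in zip(y_true, y_pred))
    ref = sum(t * t for t in y_true)
    return 100.0 * math.sqrt(sse / ref)


if __name__ == "__main__":
    reference = [10.0, 20.0, 30.0]   # illustrative values only
    predicted = [10.5, 19.0, 31.0]
    print(rmsep(reference, predicted), rsep(reference, predicted))
```

RMSEP carries the units of the concentration, whereas RSEP is dimensionless, which is why the abstract can compare the two analytes on the RSEP scale despite their different concentration ranges.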

16.
With projection-based calibration approaches, such as partial least squares (PLS) and principal component regression (PCR), the calibration space is spanned by respective basis vectors (latent vectors). Up to rank k basis vectors are formed, where k ≤ min(m,n) with m and n denoting the number of calibration samples and measured variables. The user needs to decide how many and which basis vectors to use (the tuning parameters). To avoid the second issue, basis vectors are selected top-down, starting with the first and sequentially adding more until model criteria are satisfied. Ridge regression (RR) avoids both issues by using the full set of basis vectors. Another approach is to select a subset from the total available. The presented work develops a process based on the L1 vector norm to select basis vectors. Specifically, the L1 norm is used to select singular value decomposition (SVD) basis set vectors for PCR (LPCR). Because PCR, PLS, RR, and others can be expressed as linear combinations of the SVD basis vectors, the focus is on selection and comparison using the SVD basis set. Results based on respective tuning parameter selections and weights applied to the SVD basis vectors for LPCR, top-down PCR, correlation PCR (CPCR), PLS, and RR are compared for calibration and calibration updating using spectroscopic data sets. The methods are found to predict equivalently. In particular, the L1 norm produces results similar to those obtained by the well-studied CPCR process. Thus, the new method provides a different theoretical framework from CPCR for selecting basis vectors. Copyright © 2016 John Wiley & Sons, Ltd.
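
The statement that these methods are linear combinations of the SVD basis vectors can be written out explicitly. The notation below (X for the m × n calibration matrix, y for the reference values, f_i for the weights, λ for the ridge parameter) is introduced here for illustration and is not taken from the paper; the forms shown for PCR and RR are the standard textbook ones.

```latex
X = U \Sigma V^{\mathsf{T}} = \sum_{i=1}^{k} \sigma_i \, u_i v_i^{\mathsf{T}},
\qquad
\hat{b} = \sum_{i=1}^{k} f_i \, \frac{u_i^{\mathsf{T}} y}{\sigma_i} \, v_i ,
\qquad
f_i =
\begin{cases}
1 \text{ if } i \le d, \; 0 \text{ otherwise} & \text{(top-down PCR with } d \text{ vectors)} \\[4pt]
1 \text{ if } i \in S, \; 0 \text{ otherwise} & \text{(subset PCR, } S \subseteq \{1,\dots,k\}\text{)} \\[4pt]
\dfrac{\sigma_i^{2}}{\sigma_i^{2} + \lambda} & \text{(ridge regression)}
\end{cases}
```

Under this reading, LPCR and CPCR differ only in how the subset S is chosen, while RR keeps every basis vector and shrinks its weight smoothly.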

17.
Oxidation stability is an important quality parameter for biodiesel. In general, the methods used to evaluate the oxidation stability of oils and biodiesels are time-consuming. This work reports the use of spectrofluorimetry, a fast analytical technique, combined with multivariate data analysis as a powerful analytical tool for predicting oxidation stability. The predicted oxidation stability showed good agreement with the results obtained by the EN14112 reference method (Rancimat). The models presented high correlation (0.99276 and 0.97951) between real and predicted values. The R² values of 0.98557 and 0.95943 indicated the accuracy of the models in predicting the oxidation stability of soy oil and soy biodiesel, respectively. The residual distribution does not follow a trend with respect to the predicted variables, indicating the good quality of the fits.

18.
The present article describes the spectrofluorimetric determination of galantamine, a widely used acetylcholinesterase inhibitor, through excitation-emission fluorescence matrices and second-order calibration. With the purpose of enhancing the fluorescence intensity of this substance, the effect of different organized assemblies was evaluated. Although the interaction of galantamine with different cyclodextrins is weak, it was corroborated that the fluorescence intensity of this pharmaceutical in the presence of α-cyclodextrin is increased by a factor of two. Among the studied micellar media, the anionic surfactant sodium dodecyl sulfate produced the largest signals for the compound of interest (sixfold enhancement), and was selected as the auxiliary reagent for the subsequent determinations. The developed approach enabled the determination of galantamine at the ng mL−1 level without separation steps, and in the presence of uncalibrated interferences. The applied second-order chemometric tools were parallel factor analysis (PARAFAC), unfolded partial least-squares coupled to residual bilinearization (U-PLS/RBL), and multidimensional partial least-squares coupled to residual bilinearization (N-PLS/RBL). The ability of U-PLS/RBL to successfully overcome spectral interference problems is demonstrated. The quality of the proposed method was established with the determination of galantamine in both artificial and natural water samples.

19.
This paper presents new methods for multivariate calibration. A unique aspect is that this approach uses rational functions with either Least Absolute Shrinkage and Selection Operator (LASSO) or Elastic Net (ENET), and builds parsimonious models in an automated way via cross-validation. Rational function modeling provides robustness, as will be briefly demonstrated. Interestingly, rational function models are also flexible, in that occasionally they are reduced to ordinary linear models based on cross-validation. Thus, model complexity is not forced to take the form of rational functions.

20.
The simplicity, sensitivity and expeditiousness of ion mobility spectrometry (IMS) make it especially useful for the determination of active principal ingredients (APIs) present at low concentrations in pharmaceuticals. However, the poor resolution of this technique precludes the identification and/or determination of substances with similar molecular weights, which also exhibit similar drift times and give overlapping peaks as a result. Oral contraceptives are pharmaceutical formulations containing two APIs of similar molecular weights at very low concentrations, which therefore give strongly overlapping peaks that hinder their determination by IMS. In this work, we assessed the potential of IMS for detecting and quantifying the contraceptives ethinylestradiol (ETE) and desogestrel (DES) in commercial tablets. To this end, we used various chemometric techniques, including a second-derivative (TN2D) algorithm and the more powerful Multivariate Curve Resolution (MCR), to improve the resolution of IMS and enable the determination of both APIs. Quantitation was based on PLS1 models for each API. The models constructed involve a single PLS factor with a Y-explained variance above 98.4%, giving an RMSEP of 0.34 and 0.63 for ETE and DES, respectively. The ensuing method, which was validated for use in routine analyses, is quite expeditious (analyses take less than 1 min) and uses very small amounts of sample (a few microliters). Based on the results, IMS has great potential for the qualitative and quantitative determination of APIs in low doses.
