Found 20 similar documents (search time: 0 ms)
1.
Yong-Huan Yun Wei-Ting Wang Bai-Chuan Deng Guang-Bi Lai Xin-bo Liu Da-Bing Ren Yi-Zeng Liang Wei Fan Qing-Song Xu 《Analytica chimica acta》2015
Variable (wavelength or feature) selection techniques have become a critical step in the analysis of datasets with a high number of variables and relatively few samples. In this study, a novel variable selection strategy, variable combination population analysis (VCPA), is proposed. This strategy consists of two crucial procedures. First, the exponentially decreasing function (EDF), which follows the simple and effective principle of ‘survival of the fittest’ from Darwin’s theory of natural evolution, is employed to determine the number of variables to keep and to continuously shrink the variable space. Second, in each EDF run, the binary matrix sampling (BMS) strategy, which gives each variable the same chance of being selected and generates different variable combinations, is used to produce a population of subsets from which a population of sub-models is constructed. Then, model population analysis (MPA) is employed to find the variable subsets with the lower root mean square error of cross-validation (RMSECV). The frequency with which each variable appears in the best 10% of sub-models is computed. The higher the frequency, the more important the variable. The performance of the proposed procedure was investigated using three real NIR datasets. The results indicate that VCPA is a good variable selection strategy when compared with four high-performing variable selection methods: genetic algorithm–partial least squares (GA–PLS), Monte Carlo uninformative variable elimination by PLS (MC-UVE-PLS), competitive adaptive reweighted sampling (CARS) and iteratively retaining informative variables (IRIV). The MATLAB source code of VCPA is available for academic research on the website: http://www.mathworks.com/matlabcentral/fileexchange/authors/498750.
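The two sampling ingredients named in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' MATLAB code: the decay form `n_vars * exp(-theta * i)` with `theta = ln(n_vars / n_final) / n_runs`, the function names, and every parameter are hypothetical choices made here; the abstract only says that the EDF shrinks the variable space and that BMS gives each variable the same chance of selection.

```python
import math
import numpy as np

def edf_schedule(n_vars, n_final, n_runs):
    """Number of variables kept after each EDF run (assumed exponential decay)."""
    # theta is chosen so that exactly n_final variables remain after n_runs runs.
    theta = math.log(n_vars / n_final) / n_runs
    return [round(n_vars * math.exp(-theta * i)) for i in range(1, n_runs + 1)]

def binary_matrix_sampling(n_submodels, n_vars, ratio, rng):
    """Binary matrix (sub-models x variables); 1 means the variable is in the sub-model.

    Every column holds the same number of ones and is permuted independently,
    so each variable is sampled equally often across the population.
    """
    n_ones = round(n_submodels * ratio)
    column = np.zeros(n_submodels, dtype=int)
    column[:n_ones] = 1
    return np.column_stack([rng.permutation(column) for _ in range(n_vars)])

def selection_frequency(mask, rmsecv, top_fraction=0.1):
    """Frequency of each variable among the best sub-models (lowest RMSECV)."""
    rmsecv = np.asarray(rmsecv)
    n_best = max(1, int(round(top_fraction * len(rmsecv))))
    best = np.argsort(rmsecv)[:n_best]
    return mask[best].mean(axis=0)
```

In a full run, each row of the mask would select the columns of the spectra used to fit one PLS sub-model, the cross-validated errors would be passed to `selection_frequency`, and the variables with the lowest frequencies would be dropped until the count given by `edf_schedule` is reached.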
2.
This paper presents a Bayesian approach to the development of spectroscopic calibration models. By formulating the linear regression in a probabilistic framework, a Bayesian linear regression model is derived, and a specific optimization method, i.e. Bayesian evidence approximation, is utilized to estimate the model “hyper-parameters”. The relation of the proposed approach to calibration models in the literature, including ridge regression and Gaussian process models, is discussed. The Bayesian model may be modified for the calibration of multivariate response variables. Furthermore, a variable selection strategy is implemented within the Bayesian framework, the motivation being that the predictive performance may be improved by selecting a subset of the most informative spectral variables. The Bayesian calibration models are applied to two spectroscopic data sets, and they demonstrate improved prediction results in comparison with the benchmark method of partial least squares.
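For orientation, the standard textbook form of Bayesian linear regression with the evidence approximation is sketched below. The notation is an assumption made here and may differ from the paper's: a design matrix \Phi with rows \phi_n^\top, targets t, an isotropic Gaussian prior with precision \alpha, and Gaussian noise with precision \beta.

```latex
% Assumed model: t = \Phi w + \varepsilon, with prior w \sim \mathcal{N}(0, \alpha^{-1} I)
% and noise \varepsilon \sim \mathcal{N}(0, \beta^{-1} I).
\begin{align}
  S_N^{-1} &= \alpha I + \beta\, \Phi^\top \Phi, &
  m_N &= \beta\, S_N \Phi^\top t, \\
  \gamma &= \sum_i \frac{\lambda_i}{\alpha + \lambda_i}, &
  \alpha &\leftarrow \frac{\gamma}{m_N^\top m_N}, \\
  \frac{1}{\beta} &\leftarrow \frac{1}{N - \gamma} \sum_{n=1}^{N} \bigl(t_n - m_N^\top \phi_n\bigr)^2 ,
\end{align}
% where \lambda_i are the eigenvalues of \beta\, \Phi^\top \Phi and the updates are iterated to convergence.
```

The link to ridge regression mentioned in the abstract is visible in the posterior mean: m_N = (\Phi^\top \Phi + (\alpha/\beta) I)^{-1} \Phi^\top t, which is the ridge solution with penalty \alpha/\beta.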
3.
In multivariate calibration with spectral datasets, variable selection is often applied to identify a relevant subset of variables, leading to improved prediction accuracy and easier interpretation of the selected fingerprint regions. Numerous variable selection methods have been proposed, but choosing properly among them is not trivial. Furthermore, in many cases, the set of variables found by these methods might not be robust because of irreproducibility and uncertainty issues, which poses a great challenge for improving the reliability of variable selection. In this study, the reproducibility of five variable selection methods was investigated quantitatively to evaluate their performance. The reproducibility of variable selection was quantified using Monte-Carlo sub-sampling (MCS) techniques together with a quantitative similarity measure designed for highly collinear spectral datasets. The investigation of the reproducibility and prediction accuracy of the variable selection algorithms with two different near-infrared (NIR) datasets showed that the methods varied widely in their performance, especially in their ability to identify a consistent subset of variables from the spectral datasets. Thus, a thorough assessment of reproducibility together with the predictive accuracy of the identified variables improved the statistical validity and confidence of the selection outcome, which cannot be addressed by conventional evaluation schemes.
4.
Recently we proposed a new variable selection algorithm based on the clustering of variables concept (CLoVA) for classification problems. Here, the same concept is applied to a regression problem, and the results are compared with conventional variable selection strategies for PLS. The basic idea behind the clustering of variables is that the instrument channels are grouped into different clusters via clustering algorithms. The spectral data of each cluster are then subjected to PLS regression. Different real data sets (Cargill corn, Biscuit dough, ACE QSAR, Soy, and Tablet) were used to evaluate the influence of the clustering of variables on the prediction performance of PLS. In almost all cases, the statistical parameters, especially the prediction error, show the superiority of CLoVA-PLS over other variable selection strategies. Finally, the synergy clustering of variables (sCLoVA-PLS), which uses a combination of clusters, is proposed as an efficient modification of the CLoVA algorithm. The obtained statistical parameters indicate that variable clustering can separate the useful part from the redundant ones, so that a stable model can be built on the informative clusters.
5.
We present a novel algorithm for linear multivariate calibration that can generate good prediction results. This is accomplished through the idea that test samples are mixtures of the calibration samples in proper proportions. The algorithm is based on the mixed model of samples and is therefore called the MMS algorithm. With both theoretical support and the analysis of two data sets, it is demonstrated that the MMS algorithm produces lower prediction errors than the partial least squares (PLS2) model and has prediction performance similar to PLS1. In the background anti-interference test, the MMS algorithm performs better than PLS2. When some component information is lacking, the MMS algorithm shows better robustness than PLS2.
6.
An ensemble of Monte Carlo uninformative variable elimination for wavelength selection (total citations: 1, self-citations: 0, citations by others: 1)
An improved method based on an ensemble of Monte Carlo uninformative variable elimination (EMCUVE) is presented for wavelength selection in multivariate calibration of spectral data. The proposed algorithm introduces a Monte Carlo (MC) strategy into uninformative variable elimination-PLS (UVE-PLS), in place of the leave-one-out strategy, for estimating the contribution of each wavelength variable in the PLS model. In EMCUVE, wavelength variables are evaluated by different Monte Carlo uninformative variable elimination (MCUVE) models. Moreover, a fusion of MCUVE and the vote rule can obtain an improvement over the original uninformative variable elimination method. Results obtained from simulated data and real data sets demonstrate that EMCUVE can properly carry out wavelength selection in the course of data analysis and improve the predictive ability of the multivariate calibration model.
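The Monte Carlo step described here can be illustrated with a small NumPy sketch. It is a generic MC-UVE-style stability calculation written for this listing, not the paper's EMCUVE implementation: the PLS1 routine, the 80% sub-sampling fraction, the number of runs, and all function names are assumptions, and the ensemble voting stage is not shown.

```python
import numpy as np

def pls1_coefficients(X, y, n_components):
    """Regression vector of a PLS1 model fitted with NIPALS on centered data."""
    Xk = X - X.mean(axis=0)
    yk = y - y.mean()
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                  # weight vector: covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                     # scores
        tt = t @ t
        p = Xk.T @ t / tt              # X loadings
        c = yk @ t / tt                # y loading
        Xk = Xk - np.outer(t, p)       # deflate X
        yk = yk - c * t                # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)

def mc_uve_stability(X, y, n_components, n_runs=200, fraction=0.8, seed=0):
    """Stability of each variable: mean / std of its PLS coefficient over MC subsets."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    size = int(fraction * n)
    coefs = []
    for _ in range(n_runs):
        idx = rng.choice(n, size=size, replace=False)  # random calibration subset
        coefs.append(pls1_coefficients(X[idx], y[idx], n_components))
    coefs = np.array(coefs)
    return coefs.mean(axis=0) / coefs.std(axis=0)
```

Variables whose absolute stability falls below a chosen cut-off would be treated as uninformative; an ensemble variant could repeat this with several settings and keep the variables that win a majority vote.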
7.
Most multivariate calibration methods require selection of tuning parameters, such as partial least squares (PLS) or the Tikhonov regularization variant ridge regression (RR). Tuning parameter values determine the direction and magnitude of respective model vectors thereby setting the resultant prediction abilities of the model vectors. Simultaneously, tuning parameter values establish the corresponding bias/variance and the underlying selectivity/sensitivity tradeoffs. Selection of the final tuning parameter is often accomplished through some form of cross-validation and the resultant root mean square error of cross-validation (RMSECV) values are evaluated. However, selection of a “good” tuning parameter with this one model evaluation merit is almost impossible. Including additional model merits assists tuning parameter selection to provide better balanced models as well as allowing for a reasonable comparison between calibration methods. Using multiple merits requires decisions to be made on how to combine and weight the merits into an information criterion. An abundance of options is possible. Presented in this paper is the sum of ranking differences (SRD) to ensemble a collection of model evaluation merits varying across tuning parameters. It is shown that the SRD consensus ranking of model tuning parameters allows automatic selection of the final model, or a collection of models if so desired. Essentially, the user’s preference for the degree of balance between bias and variance ultimately decides the merits used in SRD and hence, the tuning parameter values ranked lowest by SRD for automatic selection. The SRD process is also shown to allow simultaneous comparison of different calibration methods for a particular data set in conjunction with tuning parameter selection. Because SRD evaluates consistency across multiple merits, decisions on how to combine and weight merits are avoided.
To demonstrate the utility of SRD, a near infrared spectral data set and a quantitative structure activity relationship (QSAR) data set are evaluated using PLS and RR.
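The core SRD computation can be sketched as follows. This is a minimal illustration under assumptions made here rather than the paper's procedure: rows are taken to be model evaluation merits, columns are candidate tuning parameter values, the default reference is the row-wise mean, ties are broken by position instead of being averaged, and the function names are hypothetical. Merits on different scales would need to be scaled before a mean reference makes sense.

```python
import numpy as np

def ordinal_ranks(values):
    """Ranks 1..n of a 1-D array (ties broken by position, not averaged)."""
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values), dtype=int)
    ranks[order] = np.arange(1, len(values) + 1)
    return ranks

def sum_of_ranking_differences(merits, reference=None):
    """SRD score for each column of a (merits x candidates) matrix.

    Each column is ranked over the rows and compared with the ranking of the
    reference column; a lower score means closer agreement with the reference.
    """
    merits = np.asarray(merits, dtype=float)
    if reference is None:
        reference = merits.mean(axis=1)  # assumed consensus: row-wise mean
    ref_ranks = ordinal_ranks(np.asarray(reference, dtype=float))
    return np.array([np.abs(ordinal_ranks(col) - ref_ranks).sum() for col in merits.T])
```

With this layout, the candidate with the smallest score is the one whose merit profile ranks most like the reference, which matches the abstract's description of selecting the tuning parameter values ranked lowest by SRD.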
8.
A strategy that iteratively retains informative variables for selecting optimal variable subset in multivariate calibration (total citations: 4, self-citations: 0, citations by others: 4)
Yong-Huan Yun Wei-Ting Wang Min-Li Tan Yi-Zeng Liang Hong-Dong Li Dong-Sheng Cao Hong-Mei Lu Qing-Song Xu 《Analytica chimica acta》2014
Nowadays, with the high dimensionality of datasets, creating effective methods that can select an optimal variable subset is a great challenge. In this study, a strategy that considers the possible interaction effects among variables through random combinations is proposed, called iteratively retaining informative variables (IRIV). The variables are classified into four categories: strongly informative, weakly informative, uninformative and interfering. On this basis, IRIV retains both the strongly and weakly informative variables in every iterative round until no uninformative or interfering variables remain. Three datasets were employed to investigate the performance of IRIV coupled with partial least squares (PLS). The results show that IRIV is a good alternative variable selection strategy when compared with three outstanding and frequently used variable selection methods: genetic algorithm-PLS, Monte Carlo uninformative variable elimination by PLS (MC-UVE-PLS) and competitive adaptive reweighted sampling (CARS). The MATLAB source code of IRIV can be freely downloaded for academic research at the website: http://code.google.com/p/multivariate-calibration/downloads/list.
9.
Riccardo Leardi Randy J. Pell 《Analytica chimica acta》2002,461(2):189-200
Variable selection using a genetic algorithm is combined with partial least squares (PLS) for the prediction of additive concentrations in polymer films using Fourier transform-infrared (FT-IR) spectral data. An approach using an iterative application of the genetic algorithm is proposed. This approach allows for all variables to be considered and at the same time minimizes the risk of overfitting. We demonstrate that the variables selected by the genetic algorithm are consistent with expert knowledge. This very exciting result is a convincing demonstration that the algorithm can select correct variables in an automated fashion.
10.
Javier Moros 《Analytica chimica acta》2008,630(2):150-160
A new cut-off criterion has been proposed for the selection of uninformative variables prior to chemometric partial least squares (PLS) modelling. After variable elimination, PLS regressions were made and assessed by comparing the results with those obtained by PLS models based on the full spectral range. To assess the prediction capabilities, uninformative variable elimination (UVE)-PLS and PLS were applied to diffuse reflectance near-infrared spectra of heroin samples. The proposed new cut-off criterion, based on Student's t-distribution, gave PLS models with predictive capabilities similar to those obtained using the original criterion based on a quantile value. However, the repeatability of the number of selected variables was improved significantly.
11.
Jan Gerretzen Ewa Szymańska Jacob Bart Antony N. Davies Henk-Jan van Manen Edwin R. van den Heuvel Jeroen J. Jansen Lutgarde M.C. Buydens 《Analytica chimica acta》2016
The aim of data preprocessing is to remove data artifacts (such as a baseline, scatter effects or noise) and to enhance the contextually relevant information. Many preprocessing methods exist to deliver one or more of these benefits, but it is difficult to select which method or combination of methods should be used for the specific data being analyzed. Recently, we have shown that a preprocessing selection approach based on Design of Experiments (DoE) enables correct selection of highly appropriate preprocessing strategies within reasonable time frames.
12.
Hemmateenejad B Abbaspour A Maghami H Miri R Panjehshahin MR 《Analytica chimica acta》2006,575(2):290-299
The partial least squares regression method has been applied for simultaneous spectrophotometric determination of harmine, harmane, harmalol and harmaline in Peganum harmala L. (Zygophyllaceae) seeds. The effect of pH was optimized employing a multivariate definition of selectivity and sensitivity, and the best results were obtained in basic media (pH > 9). The calibration models were optimized for the number of latent variables by the cross-validation procedure. Determinations were made over the concentration range of 0.15–10 μg mL⁻¹. The proposed method was validated by applying it to the analysis of the β-carbolines in synthetic quaternary mixtures of media at pH 9 and 11. The relative standard errors of prediction were less than 4% in most cases. Analysis of P. harmala seeds by the proposed models for contents of the β-carboline derivatives resulted in 1.84%, 0.16%, 0.25% and 3.90% for harmine, harmane, harmaline and harmalol, respectively. The results were validated against an existing HPLC method, and no significant differences were observed between the results of the two methods.
13.
Halide and thiocyanate ions can be determined by a precipitation titration with silver nitrate as the titrant, and the end-point can be evaluated by a potentiometric method, in which generally a silver electrode is used as the indicator electrode and a double-junction Ag–AgCl electrode as the reference electrode. However, when mixtures of halide and thiocyanate are titrated, it is difficult to determine these components individually because there are overlapping steps in the potentiometric titration curves, especially when the concentrations of the components differ markedly. In this paper, the linear equation for the potentiometric precipitation titration of a mixture of halide and thiocyanate ions was developed and it was then used for determining the components in the mixtures simultaneously with the aid of multivariate calibration methods. By application of this model, 27 synthetic mixtures with three- and four-component combinations of chloride, bromide, iodide and thiocyanate with low concentration levels from 1.8×10⁻⁴ to 6.2×10⁻⁴ mol l⁻¹ were analyzed and acceptable results were obtained.
14.
Marengo E Robotti E Bobba M Milli A Campostrini N Righetti SC Cecconi D Righetti PG 《Analytical and bioanalytical chemistry》2008,390(5):1327-1342
2D gel electrophoresis is a tool for measuring protein regulation, involving image analysis by dedicated software (PDQuest, Melanie, etc.). Here, partial least squares discriminant analysis was applied to improve the results obtained by classic image analysis and to identify the significant spots responsible for the differences between two datasets. A human colon cancer HCT116 cell line was analyzed, treated and not treated with a new histone deacetylase inhibitor, RC307. The proteins regulated by RC307 were detected by analyzing the total lysates and nuclear proteome profiles. Some of the regulated spots were identified by tandem mass spectrometry. The preliminary data are encouraging and the protein modulation reported is consistent with the antitumoral effect of RC307 on the HCT116 cell line. Partial least squares discriminant analysis coupled with backward elimination variable selection allowed the identification of a larger number of spots than classic PDQuest analysis. Moreover, it allows the achievement of the best performances of the model in terms of prediction and provides therefore more robust and reliable results. From this point of view, the multivariate procedure applied can be considered a good alternative to standard differential analysis, also taking into account the interdependencies existing among the variables.
15.
In this study, an algorithm for growing neural networks is proposed. Starting with an empty network the algorithm reduces the error of prediction by subsequently inserting connections and neurons. The type of network element and the location where to insert the element is determined by the maximum reduction of the error of prediction. The algorithm builds non-uniform neural networks without any constraints of size and complexity. The algorithm is additionally implemented into two frameworks, which use a data set limited in size very efficiently, resulting in a more reproducible variable selection and network topology.
The algorithm is applied to a data set of binary mixtures of the refrigerants R22 and R134a, which were measured by a surface plasmon resonance (SPR) device in a time-resolved mode. Compared with common static neural networks all implementations of the growing neural networks show better generalization abilities resulting in low relative errors of prediction of 0.75% for R22 and 1.18% for R134a using unknown data.
16.
J.L Beltrán 《Analytica chimica acta》2004,501(2):137-141
This paper describes a new procedure for the determination of the quinolones ciprofloxacin and sarafloxacin in chicken muscle samples. It is based on a previously developed capillary zone electrophoresis (CZE) separation, in which all the quinolones regulated by EU Council Regulation number 2377/90 could be separated. However, as ciprofloxacin and sarafloxacin coelute in the CZE run and they have strongly overlapped spectra, separation between them is not possible. To overcome this problem, we have used a multivariate calibration procedure (partial least squares regression (PLS-2)), applied to the spectra obtained at the maximum of the electrophoretic peaks, by using a diode array detector. The method has been validated by a combination of pure standards and fortified blank chicken muscle extracts. The recoveries obtained in the validation set were 101±6 and 93±6% for sarafloxacin and ciprofloxacin, respectively. The method has also been applied to chicken muscle samples, fortified at concentration levels between 100 and 350 μg kg⁻¹, corresponding to values near the maximum residue level (MRL) regulated by the European Community.
17.
18.
Near-infrared spectroscopy and multivariate calibration for the quantitative determination of certain properties in the petrochemical industry (total citations: 3, self-citations: 0, citations by others: 3)
Near-infrared (NIR) spectroscopy in conjunction with chemometric techniques allows on-line monitoring in real time, which can be of considerable use in industry. If it is to be correctly used in industrial applications, generally some basic considerations need to be taken into account, although this does not always apply. This study discusses some of the considerations that would help evaluate the possibility of applying multivariate calibration in combination with NIR to properties of industrial interest. Examples of these considerations are whether there is a relation between the NIR spectrum and the property of interest, what the calibration constraints are and how a sample-specific error of prediction can be quantified. Various strategies for maintaining a multivariate model after it has been installed are also presented and discussed.
19.
J. J. Berzas Nevado J. Rodríguez Flores G. Castañeda Peñalvo 《Analytica chimica acta》1997,340(1-3):257-265
Two spectrophotometric methods for the determination of Ethinylestradiol (ETE) and Levonorgestrel (LEV) by using the multivariate calibration techniques of partial least squares (PLS) and principal component regression (PCR) are presented. In this study, PLS and PCR are successfully applied to quantify both hormones using the information contained in the absorption spectra of appropriate solutions. In order to do this, a calibration set of standard samples composed of different mixtures of both compounds has been designed. The results found by application of the PLS and PCR methods to the simultaneous determination of mixtures, containing 4–11 μg ml⁻¹ of ETE and 2–23 μg ml⁻¹ of LEV, are reported. Five different oral contraceptives were analyzed and the results were very similar to those obtained by a reference liquid chromatographic method.
20.
A novel alternative for the simultaneous determination of compounds with similar structure is described, using the whole chemiluminescence-time profiles, acquired by the stopped-flow technique, in combination with mathematical treatments of multivariate calibration. The proposed method is based on the chemiluminescent oxidation of morphine and naloxone by their reaction with potassium permanganate in an acidic medium, using formaldehyde as co-factor. The whole chemiluminescence-time profiles, acquired using the stopped-flow technique in a continuous-flow system, allowed the use of the time-resolved chemiluminescence (CL) data in combination with multivariate calibration techniques, such as partial least squares (PLS), for the quantitative determination of both opiate narcotics in binary mixtures. In order to achieve additivity of the CL profiles and, in addition, to obtain CL profiles for each drug that are as separated as possible in time, the optimum chemical conditions for the CL emission were investigated. The effect of common emission enhancers on the CL emission obtained in the oxidation reaction of these compounds in different acidic media was studied. The parameters selected were sulphuric acid 1.0 mol L⁻¹, permanganate 0.2 mmol L⁻¹ and formaldehyde 0.8 mol L⁻¹. A calibration set of standard samples was designed by combining a factorial design, with three levels for each factor, and a central composite design. Finally, with the aim of validating the proposed chemometric method, a prediction set of binary samples was prepared. Using the proposed multivariate calibration method, the analytes were determined in synthetic samples, obtaining recoveries of 97–109%.