Similar literature
1.
An efficient method for detecting malicious and accidental contamination of foods has been developed using a combined 1H nuclear magnetic resonance (NMR) and chemometrics approach. The method, demonstrated on a commercially available carbonated soft drink, can identify atypical products and contaminant resonances. Soft independent modelling of class analogy (SIMCA) was used to compare 1H NMR profiles of genuine products (obtained from the manufacturer) against retail products spiked in the laboratory with impurities. The benefits of using feature selection for extracting contaminant NMR frequencies were also assessed. NMR spectra of samples spiked with example impurities (paraquat, p-cresol and glyphosate) were analysed using multivariate methods, giving detection limits of approximately 0.075, 0.2 and 0.06 mM for p-cresol, paraquat and glyphosate, respectively. These detection limits are shown to be approximately 100-fold lower than the minimum lethal dose for paraquat. The methodology presented here is used to assess the composition of complex matrices for the presence of contaminating molecules without a priori knowledge of the nature of potential contaminants. The ability to detect whether a sample fails to fit the expected profile, without recourse to multiple targeted analyses, is a valuable tool for incident detection and forensic applications.

2.
The aim of data preprocessing is to remove data artifacts (such as a baseline, scatter effects or noise) and to enhance the contextually relevant information. Many preprocessing methods exist to deliver one or more of these benefits, but it is difficult to select which method or combination of methods to use for the specific data being analyzed. Recently, we have shown that a preprocessing selection approach based on Design of Experiments (DoE) enables correct selection of highly appropriate preprocessing strategies within reasonable time frames.

3.
This paper uses Mutual Information as an alternative variable selection method for quantitative structure-property relationship data. To evaluate the performance of this criterion, the enantioselectivity of 67 molecules, in three different chiral stationary phases, is modelled. Partial Least Squares together with three commonly used variable selection techniques was evaluated and then compared with the results obtained when using Mutual Information together with Support Vector Machines. The results show not only that variable selection is a necessary step in quantitative structure-property relationship modelling, but also that Mutual Information associated with Support Vector Machines is a valuable alternative to Partial Least Squares together with correlation between the explanatory and the response variables or Genetic Algorithms. This study also demonstrates that, by producing models that use a rather small set of variables, the interpretation can also be improved.

4.
Variable (wavelength or feature) selection techniques have become a critical step in the analysis of datasets with a high number of variables and relatively few samples. In this study, a novel variable selection strategy, variable combination population analysis (VCPA), is proposed. This strategy consists of two crucial procedures. First, the exponentially decreasing function (EDF), which embodies the simple and effective principle of 'survival of the fittest' from Darwin's theory of natural evolution, is employed to determine the number of variables to keep and to shrink the variable space continuously. Second, in each EDF run, the binary matrix sampling (BMS) strategy, which gives each variable the same chance of being selected and generates different variable combinations, is used to produce a population of subsets from which a population of sub-models is constructed. Then, model population analysis (MPA) is employed to find the variable subsets with the lower root mean square error of cross-validation (RMSECV). The frequency with which each variable appears in the best 10% of sub-models is computed. The higher the frequency, the more important the variable. The performance of the proposed procedure was investigated using three real NIR datasets. The results indicate that VCPA is a good variable selection strategy when compared with four high-performing variable selection methods: genetic algorithm–partial least squares (GA–PLS), Monte Carlo uninformative variable elimination by PLS (MC-UVE-PLS), competitive adaptive reweighted sampling (CARS) and iteratively retaining informative variables (IRIV). The MATLAB source code of VCPA is available for academic research on the website: http://www.mathworks.com/matlabcentral/fileexchange/authors/498750.
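
The two sampling ideas named in the VCPA abstract can be sketched as follows. This is only an illustration, not the authors' MATLAB code: the function names, the `ratio` parameter and the exact EDF form r = exp(-theta * run) are assumptions chosen so that a target number of variables remains after the last run.

```python
import math
import random


def binary_matrix_sampling(n_subsets, n_vars, ratio, rng):
    """Return an n_subsets x n_vars 0/1 matrix in which every column holds
    the same number of ones, so each variable is sampled equally often."""
    ones_per_col = max(1, round(n_subsets * ratio))
    columns = []
    for _ in range(n_vars):
        col = [1] * ones_per_col + [0] * (n_subsets - ones_per_col)
        rng.shuffle(col)  # permute each column independently
        columns.append(col)
    # Transpose so that each row is one candidate variable subset.
    return [list(row) for row in zip(*columns)]


def edf_keep_ratio(run, n_runs, n_vars, n_final):
    """Fraction of variables kept after a given run (1-based).
    Assumed form: exp(-theta * run), with theta set so that n_final
    variables remain after the last run."""
    theta = math.log(n_vars / n_final) / n_runs
    return math.exp(-theta * run)


rng = random.Random(0)
matrix = binary_matrix_sampling(n_subsets=10, n_vars=6, ratio=0.5, rng=rng)
column_sums = [sum(col) for col in zip(*matrix)]
# Number of variables kept after each of 5 runs, shrinking from 100 to 10.
kept = [round(100 * edf_keep_ratio(i, 5, 100, 10)) for i in range(1, 6)]
```

Each row of `matrix` would define one sub-model to be scored by cross-validation; that scoring step is omitted here because the abstract does not specify it beyond RMSECV.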

5.
Being able to predict the final product yield at all stages in long-running, industrial, mammalian cell culture processes is vital for operational efficiency, process consistency, and the implementation of quality by design (QbD) practices. Here we used Raman spectroscopy to monitor (in terms of glycoprotein yield prediction) a fed-batch fermentation from start to finish. Raman data were collected from 12 different time points in a Chinese hamster ovary (CHO) based manufacturing process and across 37 separate production runs. The samples comprised clarified bioprocess broths extracted from the CHO cell based process with varying amounts of fresh and spent cell culture media. Competitive adaptive reweighted sampling (CoAdReS) and ant colony optimization (ACO) variable selection methods were used to enhance the predictive ability of the chemometric models by removing unnecessary spectral information. Using CoAdReS, accurate prediction models (relative error of prediction between 2.1% and 3.3%) were built for the final glycoprotein yield at every stage of the bioprocess, from small scale up to the final 5000 L bioreactor. This result reinforces our previous studies, which indicate that media quality is one of the most significant factors determining the efficiency of industrial CHO-cell processes. This Raman based approach could thus be used to manage production by selecting which small-scale batches are progressed to large-scale manufacture, thus improving process efficiency significantly.

6.
Multivariate calibration problems often involve identifying a meaningful subset of variables, from a vast number of candidates, to better predict the output variables. A new graph theoretic method based on partial correlations (variable interaction network, VIN) is proposed. Many well-studied representative calibration datasets spanning different application domains are selected for investigating the performance. Partial least squares (PLS) regression models combined with variable selection techniques are employed for benchmarking the performance. Subsets with different numbers of variables are retained for the final analysis after VIN selection, and progressive prediction accuracies are used for comparison. VIN-PLS results show significant improvement in prediction efficiencies and variable subset optimization. Improvement of up to 45% over existing methods with significantly fewer variables is achieved using the new method. Advantages of VIN based variable selection are highlighted.

7.
In this study, an algorithm for growing neural networks is proposed. Starting with an empty network, the algorithm reduces the error of prediction by successively inserting connections and neurons. The type of network element and the location at which to insert it are determined by the maximum reduction of the error of prediction. The algorithm builds non-uniform neural networks without any constraints on size and complexity. The algorithm is additionally implemented in two frameworks, which use a data set limited in size very efficiently, resulting in a more reproducible variable selection and network topology.

The algorithm is applied to a data set of binary mixtures of the refrigerants R22 and R134a, which were measured by a surface plasmon resonance (SPR) device in a time-resolved mode. Compared with common static neural networks, all implementations of the growing neural networks show better generalization abilities, resulting in low relative errors of prediction of 0.75% for R22 and 1.18% for R134a using unknown data.


8.
A new variable selection algorithm is described, based on ant colony optimization (ACO). The aim of the algorithm is to choose, from a large number of available spectral wavelengths, those relevant to the estimation of analyte concentrations or sample properties when spectroscopic analysis is combined with multivariate calibration techniques such as partial least-squares (PLS) regression. The new algorithm employs the concept of cooperative pheromone accumulation, which is typical of ACO selection methods, and optimizes PLS models using a pre-defined number of variables, employing a Monte Carlo approach to discard irrelevant sensors. The performance has been tested on a simulated system, where it shows a significant superiority over other commonly employed selection methods, such as genetic algorithms. Several near infrared spectroscopic experimental data sets have been subjected to the present ACO algorithm, with PLS leading to improved analytical figures of merit upon wavelength selection. The method could be helpful in other chemometric activities such as classification or quantitative structure-activity relationship (QSAR) problems.

9.
A new heuristic and parallel simulated annealing algorithm is proposed for variable selection in near-infrared spectroscopy analysis. The algorithm employs a parallel mechanism to enhance the search efficiency, a heuristic mechanism to generate high-quality candidate solutions, and the Metropolis criterion to estimate the accuracy of the candidate solutions. Several near-infrared datasets have been evaluated with the proposed algorithm, with partial least squares leading to improved analytical figures of merit upon wavelength selection. More robust and predictive regression models were obtained by the new algorithm. The method could also be helpful in other chemometric activities such as classification or quantitative structure-activity relationship problems.
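
For readers unfamiliar with the Metropolis criterion mentioned in this abstract, a minimal sketch of its textbook form is given below. The abstract does not describe how the paper applies it, so the function name, the use of a prediction error as the "energy", and the example temperatures are all assumptions.

```python
import math
import random


def metropolis_accept(current_error, candidate_error, temperature, rng):
    """Always accept an improvement; accept a worse candidate with
    probability exp(-delta / temperature)."""
    delta = candidate_error - current_error
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / temperature)


rng = random.Random(1)
# A candidate variable subset with a lower error is always accepted.
always = metropolis_accept(0.50, 0.40, temperature=0.1, rng=rng)
# A worse candidate at a very low temperature: the acceptance probability
# is about exp(-1000), which underflows to 0.0, so it is rejected.
rarely = metropolis_accept(0.40, 0.50, temperature=1e-4, rng=rng)
```

At high temperatures worse subsets are accepted often, which lets the search escape local minima; as the temperature is lowered the rule approaches greedy selection.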

10.
This paper presents a Bayesian approach to the development of spectroscopic calibration models. By formulating the linear regression in a probabilistic framework, a Bayesian linear regression model is derived, and a specific optimization method, i.e. Bayesian evidence approximation, is utilized to estimate the model hyper-parameters. The relation of the proposed approach to calibration models in the literature, including ridge regression and Gaussian process models, is discussed. The Bayesian model may be modified for the calibration of multivariate response variables. Furthermore, a variable selection strategy is implemented within the Bayesian framework, the motivation being that the predictive performance may be improved by selecting a subset of the most informative spectral variables. The Bayesian calibration models are applied to two spectroscopic data sets, and they demonstrate improved prediction results in comparison with the benchmark method of partial least squares.
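
The relation to ridge regression mentioned in this abstract can be made explicit under one common set of assumptions: a zero-mean isotropic Gaussian prior on the regression weights with precision \alpha, and Gaussian noise with precision \beta. These symbols and the specific prior are not taken from the abstract. Under them, the posterior mean of the weights is the ridge solution with penalty \lambda = \alpha/\beta:

```latex
p(\mathbf{w}) = \mathcal{N}\!\left(\mathbf{w} \mid \mathbf{0},\, \alpha^{-1}\mathbf{I}\right),
\qquad
p(\mathbf{y} \mid \mathbf{X}, \mathbf{w}) = \mathcal{N}\!\left(\mathbf{y} \mid \mathbf{X}\mathbf{w},\, \beta^{-1}\mathbf{I}\right)

\mathbf{m}
  = \beta \left(\alpha\mathbf{I} + \beta\,\mathbf{X}^{\top}\mathbf{X}\right)^{-1} \mathbf{X}^{\top}\mathbf{y}
  = \left(\mathbf{X}^{\top}\mathbf{X} + \lambda\mathbf{I}\right)^{-1} \mathbf{X}^{\top}\mathbf{y},
\qquad
\lambda = \alpha / \beta
```

In this reading, estimating the hyper-parameters \alpha and \beta by evidence approximation plays the role that choosing the ridge penalty by cross-validation plays in the non-Bayesian setting.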

11.
In this study, a new variable selection method called bootstrapping soft shrinkage (BOSS) is developed. It is derived from the ideas of weighted bootstrap sampling (WBS) and model population analysis (MPA). The weights of variables are determined based on the absolute values of regression coefficients. WBS is applied according to the weights to generate sub-models, and MPA is used to analyze the sub-models to update the weights for variables. The optimization procedure follows the rule of soft shrinkage, in which less important variables are not eliminated directly but are assigned smaller weights. The algorithm runs iteratively and terminates when the number of variables reaches one. The optimal variable set with the lowest root mean squared error of cross-validation (RMSECV) is selected. The method was tested on three groups of near infrared (NIR) spectroscopic datasets, i.e. corn datasets, diesel fuels datasets and soy datasets. Three high-performing variable selection methods, i.e. Monte Carlo uninformative variable elimination (MCUVE), competitive adaptive reweighted sampling (CARS) and genetic algorithm partial least squares (GA-PLS), are used for comparison. The results show that BOSS is promising, with improved prediction performance. The Matlab codes for implementing BOSS are freely available on the website: http://www.mathworks.com/matlabcentral/fileexchange/52770-boss.
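
The weighted bootstrap sampling and weight update described in the BOSS abstract can be sketched roughly as below. This is not the published Matlab code: the function names, the dictionary representation of sub-model coefficients, and the normalisation of the weights are hypothetical choices made for the illustration.

```python
import random


def weighted_bootstrap_sample(weights, n_draws, rng):
    """Draw variable indices with replacement, with probability proportional
    to weight, and return the sorted unique indices as one sub-model."""
    indices = range(len(weights))
    drawn = rng.choices(indices, weights=weights, k=n_draws)
    return sorted(set(drawn))


def update_weights(best_submodels, n_vars):
    """Hypothetical update: new weight = normalised sum of absolute
    regression coefficients over the best sub-models."""
    totals = [0.0] * n_vars
    for coeffs in best_submodels:  # each maps variable index -> coefficient
        for idx, value in coeffs.items():
            totals[idx] += abs(value)
    scale = sum(totals)
    return [t / scale for t in totals]


rng = random.Random(0)
# Two made-up sub-models over four variables.
weights = update_weights([{0: 2.0, 1: -1.0}, {0: 1.0, 2: 0.0}], n_vars=4)
subset = weighted_bootstrap_sample(weights, n_draws=4, rng=rng)
```

Variables with small weights are still eligible in later iterations, only less likely to be drawn, which is the soft shrinkage the abstract contrasts with outright elimination.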

12.
This work proposes a modification to the successive projections algorithm (SPA) aimed at selecting spectral variables for multiple linear regression (MLR) in the presence of unknown interferents not included in the calibration data set. The modified algorithm favours the selection of variables in which the effect of the interferent is less pronounced. The proposed procedure can be regarded as an adaptive modelling technique, because the spectral features of the samples to be analysed are considered in the variable selection process. The advantages of this new approach are demonstrated in two analytical problems, namely (1) ultraviolet–visible spectrometric determination of tartrazine, allura red and sunset yellow in aqueous solutions under the interference of erythrosine, and (2) near-infrared spectrometric determination of ethanol in gasoline under the interference of toluene. In these case studies, the performance of conventional MLR-SPA models is substantially degraded by the presence of the interferent. This problem is circumvented by applying the proposed Adaptive MLR-SPA approach, which results in prediction errors smaller than those obtained by three other multivariate calibration techniques, namely stepwise regression, full-spectrum partial least squares (PLS) and PLS with variables selected by a genetic algorithm. An inspection of the variable selection results reveals that the Adaptive approach successfully avoids spectral regions in which the interference is more intense.

13.
14.
We developed the Chemometric Modeling Markup Language (CMML) to hold chemometric models within one document, converting binary data into strings with base64 encoding and decoding to solve the interoperability issue in sharing chemometric models. It provides base functionality for storing sampling, variable selection, pretreatment, outlier and modeling parameters and data. By transforming binary data into base64-encoded strings, CMML balances usability against file size. Owing to the advantages of Extensible Markup Language (XML), models stored in CMML can be easily reused in other software and programming languages, as long as the programming language has an XML parsing library. One can also use the XML Path Language (XPath) to select desired data from the CMML file effectively. The application of this language to near infrared spectroscopy model storage is implemented as a class in C++ and is available as open source software (http://code.google.com/p/cmml); implementations in other languages, such as MATLAB and R, are in progress.
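
The general idea of storing binary model data as base64 text inside XML, and retrieving it with an XPath-style query, can be sketched as follows. The element and attribute names (`model`, `coefficients`, `encoding`) are invented for this illustration and are not the actual CMML schema, and the sketch uses Python's standard library rather than the C++ class mentioned in the abstract.

```python
import base64
import struct
import xml.etree.ElementTree as ET


def encode_vector(values):
    """Pack floats as little-endian doubles and return a base64 string."""
    raw = struct.pack(f"<{len(values)}d", *values)
    return base64.b64encode(raw).decode("ascii")


def decode_vector(text):
    """Inverse of encode_vector: base64 string back to a list of floats."""
    raw = base64.b64decode(text)
    return list(struct.unpack(f"<{len(raw) // 8}d", raw))


coefficients = [0.25, -1.5, 3.0]

# Build a tiny document; the tag names are hypothetical, not CMML's.
root = ET.Element("model")
node = ET.SubElement(root, "coefficients", encoding="base64")
node.text = encode_vector(coefficients)
document = ET.tostring(root, encoding="unicode")

# Any XML parser can read the document back; ElementTree supports a
# limited XPath subset for locating the element.
parsed = ET.fromstring(document)
restored = decode_vector(parsed.find("./coefficients").text)
```

Because the doubles are stored bit-for-bit, the round trip is lossless, which a decimal text representation of the numbers would not guarantee without extra care.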

15.
Variable selection using a genetic algorithm is combined with partial least squares (PLS) for the prediction of additive concentrations in polymer films using Fourier transform-infrared (FT-IR) spectral data. An approach using an iterative application of the genetic algorithm is proposed. This approach allows all variables to be considered while minimizing the risk of overfitting. We demonstrate that the variables selected by the genetic algorithm are consistent with expert knowledge. This result is a convincing demonstration that the algorithm can select correct variables in an automated fashion.

16.
17.
2D gel electrophoresis is a tool for measuring protein regulation, involving image analysis by dedicated software (PDQuest, Melanie, etc.). Here, partial least squares discriminant analysis was applied to improve the results obtained by classic image analysis and to identify the significant spots responsible for the differences between two datasets. A human colon cancer cell line, HCT116, was analyzed with and without treatment with a new histone deacetylase inhibitor, RC307. The proteins regulated by RC307 were detected by analyzing the total lysates and nuclear proteome profiles. Some of the regulated spots were identified by tandem mass spectrometry. The preliminary data are encouraging, and the protein modulation reported is consistent with the antitumoral effect of RC307 on the HCT116 cell line. Partial least squares discriminant analysis coupled with backward elimination variable selection allowed the identification of a larger number of spots than classic PDQuest analysis. Moreover, it yields the best model performance in terms of prediction and therefore provides more robust and reliable results. From this point of view, the multivariate procedure applied can be considered a good alternative to standard differential analysis, as it also takes into account the interdependencies among the variables.

18.
A simple and environmentally friendly method was developed for determining the Malathion content of analytical and commercial insecticide samples with no special preparation. Attenuated total reflectance-Fourier transform infrared (ATR-FTIR) spectra were characterized, and the 1000-2000 cm⁻¹ region was selected for quantitative analysis using partial least squares (PLS) and two wavelength selection methods: (a) principal component regression (PCR) and (b) genetic algorithm (GA). The relative error of prediction (REP) for the PLS, PCR-PLS and GA-PLS methods was 3.536, 1.656 and 0.188, respectively. The proposed method is applicable to the quantification of Malathion in commercial-grade samples, as confirmed by results that are reliable in comparison with known methods.

19.
Investigations have shown that visible-near-infrared (VNIR) spectroscopy can accurately determine soil properties under laboratory conditions. In situ assessment of soil properties is of great benefit for several applications, as spectra can be acquired quickly and almost continuously. The present study used partial least squares (PLS) regression to establish a relationship between soil reflectance spectra measured under field conditions and the organic matter and clay content of the soil. Spectra were acquired with a field spectrometer in a recently reconstructed floodplain along the river Rhine in The Netherlands. Several spectral pre-processing methods were employed to improve the performance and robustness of the models. Results indicate that, under varying surface conditions, field spectroscopy in combination with multivariate calibration does result in a qualitative relation for organic matter (R² = 0.45) and clay content (R² = 0.43), while under laboratory conditions more accurate results are obtained (R² = 0.69 and 0.92, respectively). Soil moisture and vegetation cover had a negative influence on the prediction capabilities for both soil properties. Although predictions from spectra measured in situ are not as accurate as physical analysis, the accuracy obtained is useful for rapid soil characterisation and remote sensing applications.

20.