Similar literature
20 similar articles found (search time: 0 ms)
1.
Many commercially available software programs claim similar efficiency and accuracy as variable selection tools. Genetic algorithms are commonly used variable selection methods in which the most relevant variables can be differentiated from 'less important' variables using evolutionary computing techniques. However, different vendors offer several algorithms, and the puzzling question is: which one is the appropriate method of choice? In this study, several genetic algorithm tools (e.g. GFA from Cerius2, QuaSAR-Evolution from MOE and Partek's genetic algorithm) were compared. Stepwise multiple linear regression models were generated using the most relevant variables identified by the above genetic algorithms. This procedure led to the successful generation of Quantitative Structure–Activity Relationship (QSAR) models for (a) proprietary datasets and (b) the Selwood dataset.

2.
Many commercially available software programs claim similar efficiency and accuracy as variable selection tools. Genetic algorithms are commonly used variable selection methods where most relevant variables can be differentiated from ‘less important’ variables using evolutionary computing techniques. However, different vendors offer several algorithms, and the puzzling question is: which one is the appropriate method of choice? In this study, several genetic algorithm tools (e.g. GFA from Cerius2, QuaSAR-Evolution from MOE and Partek’s genetic algorithm) were compared. Stepwise multiple linear regression models were generated using the most relevant variables identified by the above genetic algorithms. This procedure led to the successful generation of Quantitative Structure–activity Relationship (QSAR) models for (a) proprietary datasets and (b) the Selwood dataset.
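To make the idea of GA-based variable selection concrete, here is a minimal Python sketch. It is not the algorithm of GFA, QuaSAR-Evolution or Partek; it assumes a bit-mask chromosome (one bit per descriptor), an ordinary least-squares fit as the model, and a fitness equal to the negative residual sum of squares plus a per-variable penalty. The names `fit_rss` and `ga_select` and all parameter values are hypothetical.

```python
import random

def fit_rss(X, y, cols):
    """Residual sum of squares of a least-squares fit of y on the chosen columns plus an intercept."""
    rows = [[1.0] + [x[c] for c in cols] for x in X]
    k = len(rows[0])
    # Augmented normal equations (A^T A | A^T y), with a tiny ridge term for numerical stability.
    A = [[sum(r[i] * r[j] for r in rows) + (1e-9 if i == j else 0.0) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for i in range(k):
        p = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        for r in range(k):
            if r != i:
                f = A[r][i] / A[i][i]
                A[r] = [a - f * b for a, b in zip(A[r], A[i])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2 for r, t in zip(rows, y))

def ga_select(X, y, n_gen=30, pop_size=20, p_mut=0.1, penalty=0.1, seed=0):
    """Return (best bit mask, best fitness per generation). Assumes at least two variables."""
    rng = random.Random(seed)
    n_vars = len(X[0])

    def fitness(mask):
        cols = [j for j, bit in enumerate(mask) if bit]
        # Assumed fitness: lower RSS is better, each extra variable costs `penalty`.
        return -(fit_rss(X, y, cols) + penalty * len(cols))

    pop = [[rng.randint(0, 1) for _ in range(n_vars)] for _ in range(pop_size)]
    history = []
    for _ in range(n_gen):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        parents = pop[: pop_size // 2]      # truncation selection
        children = [pop[0][:]]              # elitism: the best mask survives unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n_vars - 1)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = children
    return max(pop, key=fitness), history
```

The selected columns (the positions where the returned mask is 1) could then be handed to a stepwise regression routine, as the abstract describes. Because of elitism, the best fitness recorded in `history` never decreases from one generation to the next.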

3.
4.
Six different clones of 1-year-old loblolly pine (Pinus taeda L.) seedlings grown under standardized conditions in a greenhouse were used for sample preparation and further analysis. Three independent and complementary analytical techniques for metabolic profiling were applied in the present study: hydrophilic interaction chromatography (HILIC-LC/ESI-MS), reversed-phase liquid chromatography (RP-LC/ESI-MS), and gas chromatography (GC/TOF-MS), all coupled to mass spectrometry. Unsupervised methods, such as principal component analysis (PCA) and clustering, and supervised methods, such as classification, were used for data mining. Genetic algorithms (GA), a multivariate approach, were probed for selection of the smallest subsets of potentially discriminative classifiers. From more than 2000 peaks found in total, small subsets were selected by GA as high-potential classifiers allowing discrimination among the six investigated genotypes. Annotated GC/TOF-MS data allowed the generation of a small subset of identified metabolites. LC/ESI-MS data and small subsets require further annotation. The present study demonstrated that the combination of comprehensive metabolic profiling and advanced data mining techniques provides a powerful metabolomic approach for biomarker discovery among small molecules. Utilizing GA for feature selection allowed the generation of small subsets of potent classifiers.

5.
The non-linear regression technique known as the alternating conditional expectations (ACE) method is only applicable when the number of objects available for calibration is considerably greater than the number of considered predictors. Alternating conditional expectations regression with selection of significant predictors by genetic algorithms (GA-ACE), the non-linear regression technique presented here, is based on the ACE algorithm but introduces several modifications to resolve the applicability limitations of the original ACE method, thus facilitating the practical implementation of a very interesting calibration tool. In order to overcome the lack of reliability displayed by the original ACE algorithm when working on data sets characterized by too large a number of variables, and prior to the development of the non-linear regression model, GA-ACE applies genetic algorithms as a variable selection technique to select a reduced subset of significant predictors able to accurately model and predict a considered response variable. Furthermore, GA-ACE provides two alternative application approaches, since it allows either prior data compression, computing a number of principal components that are subsequently subjected to GA selection, or working directly on the original variables. In this study, GA-ACE was applied to two real calibration problems with a very low observation/variable ratio (NIR data), and the results were compared with those obtained by several commonly used linear regression techniques. When using the GA-ACE non-linear method, notably improved regression models were developed for the two response variables modeled, with root mean square errors of the residuals in external prediction (RMSEP) equal to 11.51 and 6.03% for moisture and lipid contents of roasted coffee samples, respectively. The improvement achieved by applying the new non-linear method is even more remarkable taking into account the results obtained with the best-performing linear method (IPW-PLS) applied to predict the studied responses (14.61 and 7.74% RMSEP, respectively).
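As a small worked example of the error measure quoted above, the following Python sketch computes an RMSEP from reference and predicted values and then the relative reduction implied by the figures in the abstract. The function names are hypothetical, and the plain (non-normalized) RMSEP definition used here is an assumption; the abstract reports its values in percent without stating how they were scaled.

```python
import math

def rmsep(y_true, y_pred):
    """Root mean square error of prediction over an external test set."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def relative_reduction(rmsep_reference, rmsep_new):
    """Fractional decrease of the error relative to the reference method."""
    return (rmsep_reference - rmsep_new) / rmsep_reference

# Figures quoted in the abstract: IPW-PLS (reference) versus GA-ACE.
moisture_gain = relative_reduction(14.61, 11.51)  # about 0.212
lipid_gain = relative_reduction(7.74, 6.03)       # about 0.221
```

On these numbers, GA-ACE lowers the RMSEP by roughly 21% for moisture and 22% for lipid content relative to IPW-PLS.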

6.
The applicability of genetic algorithms for solving multicomponent analyses is systematically examined. As a genetic algorithm (GA), the basic proposal of Goldberg is implemented in a straightforward manner to simulate multicomponent analyses in analogy to the well-established UV-vis or IR methods, especially multicomponent regression. The main focus of the study is to investigate the behavior of the genetic algorithm in order to compare it with the well-known behavior of multicomponent regression. A remarkable difference between the two methods is that the genetic algorithm method does not need any calibration procedure because of its pure searching characteristic. As important features of multicomponent systems, the degree of signal overlap (selectivity), the behavior of systems with known and unknown component numbers and qualities, and linear as well as nonlinear relationships between the analytical signal and concentration are varied within the simulations. According to multicomponent regression, recovering concentrations by a genetic algorithm is of limited applicability with the exception of systems at a low degree of signal overlap. On the other hand, the recovery of a probe spectrum in the analytical process always gives satisfactory results independent of the features of the probe system. The genetic algorithm obviously shows autoadaptive behavior in probe spectrum recovery. The quality and quantity of the resulting components may differ dramatically from the given probe, although the resulting spectrum is nearly the same. In such cases, the resulting component mixture can be interpreted as an imitation of the probe. As well as probe spectra, theoretically designed spectra can also be autoadapted by genetic algorithms. The only limitation is that the desired spectrum must, of course, be incorporated into the search space defined by the involved components. Furthermore, a spectral signal is only one single property of a chemical compound or mixture. Because of the nonlinear search characteristic of genetic algorithms, any other chemical or physical property can also be treated as a desired property. Therefore, the conclusion of the study is well-founded that an old challenge of applied chemistry, namely, the development of new chemical products with desired properties, seems to be reachable under the control of genetic algorithms.

7.
Simultaneous multicomponent analysis is usually carried out by multivariate calibration models such as partial least squares (PLS) that utilize the full spectrum. It has been demonstrated by both experimental and theoretical considerations that better results can be obtained by a proper selection of the spectral range to be included in calculations. A genetic algorithm is one of the most popular methods for selecting variables for PLS calibration of mixtures with almost identical spectra without loss of prediction capacity. In this work, a simple and precise method for rapid and accurate simultaneous determination of sulfide and sulfite ions based on the addition reaction of these ions with new fuchsin at pH 8 and 25°C by PLS regression and using a genetic algorithm (GA) for variable selection is proposed. The concentrations of sulfide and sulfite ions ranged over 0.05–2.50 and 0.15–2.00 μg/mL, respectively. A series of synthetic solutions containing different concentrations of sulfide and sulfite were used to check the prediction ability of GA-PLS models. The root mean square error of prediction with PLS on the whole data set was 0.19 μg/mL for sulfide and 0.09 μg/mL for sulfite. After the application of GA, these values were reduced to 0.04 and 0.03 μg/mL, respectively. The text was submitted by the authors in English.

8.
For a set of a priori given radionuclides, extracted from a general nuclide data library, the authors use median estimates of the gamma-peak areas and estimates of their errors to produce a list of possible radionuclides matching gamma-ray line(s) and some measure of the reliability of this assignment.

An a priori determined list of nuclides is obtained by searching for a match with the energy information of the database. This procedure is performed in an interactive graphic mode by markers that superimpose the energy information provided by a general gamma-ray data library on the spectral data. This library of experimental data includes approximately 17,000 gamma-energy lines related to 756 known gamma emitter radionuclides listed by ICRP.
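The energy-matching step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-nuclide `LIBRARY` stands in for the roughly 17,000-line library mentioned above, the matching tolerance `tol_kev` is arbitrary, and the score (the fraction of a nuclide's library lines that were found) is a hypothetical stand-in for the authors' reliability measure, which the abstract does not specify.

```python
# Tiny illustrative library: principal gamma-line energies in keV.
LIBRARY = {
    "Cs-137": [661.66],
    "Co-60": [1173.23, 1332.49],
    "K-40": [1460.82],
}

def match_nuclides(peaks_kev, library=LIBRARY, tol_kev=1.0):
    """Map each candidate nuclide to the fraction of its library lines found among the peaks."""
    result = {}
    for nuclide, lines in library.items():
        found = [e for e in lines if any(abs(e - p) <= tol_kev for p in peaks_kev)]
        if found:
            result[nuclide] = len(found) / len(lines)
    return result

candidates = match_nuclides([661.5, 1173.0])
# Cs-137 matches its only line; Co-60 matches one of its two lines; K-40 is absent.
```

A real assignment would also weigh peak areas and their uncertainties, as the abstract indicates, rather than line positions alone.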


9.
The insertion of random sequences into protein-encoding genes in combination with biological selection techniques has become a valuable tool in the design of molecules that have useful and possibly novel properties. By employing highly effective screening protocols, a functional and unique structure that had not been anticipated can be distinguished among a huge collection of inactive molecules that together represent all possible amino acid combinations. This technique is severely limited by its restriction to a library of manageable size. One approach for limiting the size of a mutant library relies on doping schemes, where subsets of amino acids are generated that reveal only certain combinations of amino acids in a protein sequence. Three mononucleotide mixtures for each codon concerned must be designed, such that the resulting codons that are assembled during chemical gene synthesis represent the desired amino acid mixture on the level of the translated protein. In this paper we present a doping algorithm that reverse translates a desired mixture of certain amino acids into three mixtures of mononucleotides. The algorithm is designed to optimally bias these mixtures towards the codons of choice. This approach combines a genetic algorithm with local optimization strategies based on the downhill simplex method. Disparate relative representations of all amino acids (and stop codons) within a target set can be generated. Optional weighting factors are employed to emphasize the frequencies of certain amino acids and their codon usage, and to compensate for reaction rates of different mononucleotide building blocks (synthons) during chemical DNA synthesis. The effect of statistical errors that accompany an experimental realization of calculated nucleotide mixtures on the generated mixtures of amino acids is simulated. These simulations show that the robustness of different optima with respect to small deviations from calculated values depends on their concomitant fitness. Furthermore, the calculations probe the fitness landscape locally and allow a preliminary assessment of its structure.
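The forward direction of this problem, from three mononucleotide mixtures to the resulting amino acid distribution, is easy to compute and is the kind of quantity an optimizer would have to evaluate repeatedly. The Python sketch below shows only that forward calculation, assuming the standard genetic code and independent incorporation of bases at each codon position; it is not the authors' doping algorithm, and the function name `amino_acid_distribution` is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order ('*' marks a stop codon).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def amino_acid_distribution(mix1, mix2, mix3):
    """Amino acid probabilities for a codon built from three base mixtures.

    Each mixture maps a base ('T', 'C', 'A', 'G') to its fraction at that codon position.
    """
    dist = {}
    for (b1, p1), (b2, p2), (b3, p3) in product(mix1.items(), mix2.items(), mix3.items()):
        aa = CODON_TABLE[b1 + b2 + b3]
        dist[aa] = dist.get(aa, 0.0) + p1 * p2 * p3
    return dist

# Equal parts A and G at position 1, then T, then G: codons ATG (Met) and GTG (Val).
example = amino_acid_distribution({"A": 0.5, "G": 0.5}, {"T": 1.0}, {"G": 1.0})

# The familiar NNK scheme: any base at positions 1 and 2, G or T at position 3.
n_mix = {b: 0.25 for b in BASES}
nnk = amino_acid_distribution(n_mix, n_mix, {"G": 0.5, "T": 0.5})
```

A search procedure such as the one described in the abstract would vary the three mixtures and score how closely the resulting distribution approaches the target set, optionally with weighting factors.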

10.
We demonstrate the possibility of designing molecules for specific tasks, using a fully automatic global optimization setup employing genetic algorithms. As an example, we tune the two excitation wavelengths of a molecular switch backbone to arbitrarily pre-set values, by an automatic optimization of the substituent pattern.

11.
The aim of data preprocessing is to remove data artifacts (such as a baseline, scatter effects or noise) and to enhance the contextually relevant information. Many preprocessing methods exist to deliver one or more of these benefits, but it is difficult to select which method or combination of methods should be used for the specific data being analyzed. Recently, we have shown that a preprocessing selection approach based on Design of Experiments (DoE) enables correct selection of highly appropriate preprocessing strategies within reasonable time frames.

12.
Multivariate curve resolution (MCR), and especially the orthogonal projection approach (OPA), can be applied to spectroscopic data and has proved suitable for process monitoring. To improve the quality of the on-line monitoring of batch processes, it is desirable to acquire as many spectra as possible in a given period of time. However, hardware limitations can make it impossible to acquire more than a certain number of spectra in this period. Wavelength selection could be a good way to limit this problem, since it decreases the size, and consequently the acquisition time, of each recorded spectrum. This paper details an industrial application of genetic algorithms (GA) coupled with a curve resolution method (OPA) for this purpose.

13.
Analytica Chimica Acta, 2002, 471(2): 173-186
An automated and versatile sequential injection spectrofluorimetric procedure for the simultaneous determination of multicomponent mixtures in micellar medium without prior separation processes is reported. The methodology is based upon the segmentation of a sample slug between two different buffer zones in order to attain both an improvement of sensitivity and residual minimization for all species. Resolution of overlapping fluorescence profiles is achieved using a variable angle scanning technique coupled to multivariate least-squares regression (MLR) algorithms at both sample edges. The potential of the described methodology is illustrated with the spectrofluorimetric determination of four widespread pesticides with different acid-base properties, viz. carbaryl (CBL) (1-naphthyl-N-methylcarbamate), fuberidazole (FBZ) (2-(2′-furyl)benzimidazole), thiabendazole (TBZ) (2-(4′-thiazolyl)benzimidazole) and warfarin (W) (3-(α-acetonylbenzyl)-4-hydroxycoumarin). Detection limits at the 3σ level were 3.9, 0.02, 0.03 and 10 μg l−1 for CBL, FBZ, TBZ and W, respectively, at the maximum sensitivity pH. Dynamic ranges of 13-720 μg l−1 CBL, 0.10-14 μg l−1 FBZ, 0.19-60 μg l−1 TBZ and 0.05-5 mg l−1 W were achieved. Relative standard deviations (n=10) were 0.2% for 100 μg l−1 CBL and 2.4 μg l−1 FBZ, 0.7% for 8 μg l−1 TBZ and 1.0% for 1 mg l−1 W. The proposed automated methodology, which handles 17 samples/h, was validated and applied to spiked real water samples with very satisfactory results.

14.
Science China Chemistry - Variable selection is a universal problem in building multivariate calibration models, such as quantitative structure-activity relationship (QSAR) and quantitative...

15.
Variable (wavelength or feature) selection techniques have become a critical step in the analysis of datasets with a high number of variables and relatively few samples. In this study, a novel variable selection strategy, variable combination population analysis (VCPA), was proposed. This strategy consists of two crucial procedures. First, the exponentially decreasing function (EDF), which is the simple and effective principle of ‘survival of the fittest’ from Darwin’s natural evolution theory, is employed to determine the number of variables to keep and continuously shrink the variable space. Second, in each EDF run, the binary matrix sampling (BMS) strategy, which gives each variable the same chance to be selected and generates different variable combinations, is used to produce a population of subsets to construct a population of sub-models. Then, model population analysis (MPA) is employed to find the variable subsets with the lower root mean square error of cross-validation (RMSECV). The frequency of each variable appearing in the best 10% of sub-models is computed. The higher the frequency is, the more important the variable is. The performance of the proposed procedure was investigated using three real NIR datasets. The results indicate that VCPA is a good variable selection strategy when compared with four high-performing variable selection methods: genetic algorithm–partial least squares (GA–PLS), Monte Carlo uninformative variable elimination by PLS (MC-UVE-PLS), competitive adaptive reweighted sampling (CARS) and iteratively retains informative variables (IRIV). The MATLAB source code of VCPA is available for academic research on the website: http://www.mathworks.com/matlabcentral/fileexchange/authors/498750.
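The building blocks named in this abstract can be illustrated with a short Python sketch. It is not the authors' MATLAB code: the exact form of the EDF is an assumption (a plain exponential decay from `n_start` to `n_end` variables over `n_runs` runs), the sub-model errors are supplied by the caller instead of being computed by cross-validated PLS, and all function names are hypothetical.

```python
import math
import random

def edf_schedule(n_start, n_end, n_runs):
    """Number of variables kept after each run, decaying exponentially from n_start to n_end."""
    theta = math.log(n_start / n_end) / n_runs
    return [round(n_start * math.exp(-theta * i)) for i in range(1, n_runs + 1)]

def binary_matrix_sampling(n_models, n_vars, ratio, rng):
    """Binary matrix (rows = sub-models) whose columns all hold the same number of ones,
    so every variable is sampled equally often."""
    ones = round(n_models * ratio)
    columns = []
    for _ in range(n_vars):
        col = [1] * ones + [0] * (n_models - ones)
        rng.shuffle(col)  # permute each column independently
        columns.append(col)
    return [[columns[j][i] for j in range(n_vars)] for i in range(n_models)]

def top_frequency(matrix, errors, top_fraction=0.1):
    """Frequency of each variable among the sub-models with the lowest error."""
    n_top = max(1, round(len(matrix) * top_fraction))
    best = sorted(range(len(matrix)), key=lambda i: errors[i])[:n_top]
    return [sum(matrix[i][j] for i in best) / n_top for j in range(len(matrix[0]))]

schedule = edf_schedule(100, 10, 5)  # [63, 40, 25, 16, 10]
```

In each run, the variables with the highest frequency among the best sub-models would be kept, with the count taken from the schedule, and the sampling repeated on the reduced variable space.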

16.
The recent discovery of riboswitches in diverse species of bacteria and a few eukaryotes added metabolite-responsive gene regulation to the growing list of RNA functions in biology. The natural riboswitches have inspired several designs of synthetic analogues capable of gene regulation in response to a small molecule trigger. In this work, we describe our efforts to engineer complex riboswitches capable of sensing and responding to two small molecules according to the Boolean logic operations AND and NAND. Two aptamers that recognize theophylline and thiamine pyrophosphate were embedded in tandem in the 5' UTR of bacterial mRNA, and riboswitches that function as logic gates were isolated by dual genetic selection. The diverse phenotypes of the engineered logic gates support the versatility of RNA-based gene regulation, which may have preceded the modern protein-based gene regulators. Additionally, our design strategy advances our ability to harness the versatile capacities of RNA to program complex behavior in bacteria without the use of engineered proteins.

17.
18.
A genetic algorithm has been developed for molecular mechanics calculations. It has proved to be a robust and efficient structure optimization technique. Because it uses randomly generated starting structures and stochastic operators, the resulting structures are not subject to the chemist's bias. © 1994 by John Wiley & Sons, Inc.

19.
Simultaneous multicomponent analysis is usually carried out using multivariate calibration models, such as partial least squares (PLS), that utilize the full spectrum. It has been shown by both experimental and theoretical considerations that better results can be obtained by proper selection of the spectral range to be included in calculations. A genetic algorithm (GA) is one of the most popular methods for selecting variables for PLS calibration of mixtures with almost identical spectra without loss of predictive capability. In this work, a simple and precise method for rapid and accurate simultaneous determination of sulfide and sulfite ions based on the addition reaction of these ions with new fuchsin at pH 8 and 25°C using PLS regression and GA for variable selection is proposed. The concentrations of sulfide and sulfite ions ranged over 0.05–2.50 and 0.15–2.00 μg/mL, respectively. A series of model solutions containing different concentrations of sulfide and sulfite were used to check the predictive ability of GA-PLS models. The root mean square error of prediction with PLS on the whole data set was 0.19 μg/mL for sulfide and 0.09 μg/mL for sulfite. After the application of GA, these values were reduced to 0.04 and 0.03 μg/mL, respectively. The text was submitted by the authors in English.

20.