Similar articles
20 similar articles found.
1.
Quantitative structure-activity relationship (QSAR) studies based on chemometric techniques are reviewed. Partial least squares (PLS) is introduced as a novel, robust method to replace classical methods such as multiple linear regression (MLR), and its advantages over MLR are illustrated with typical applications. The genetic algorithm (GA) is a novel optimization technique that can be used as a search engine in variable selection. A hybrid approach combining GA and PLS for variable selection (GAPLS), developed in our group, is described, as is a more advanced method for comparative molecular field analysis (CoMFA) modeling called GA-based region selection (GARGS). Applications of GAPLS and GARGS to QSAR and 3D-QSAR problems are shown with representative examples. GA can also be hybridized with nonlinear modeling methods such as artificial neural networks (ANN) to provide useful tools for chemometrics and QSAR.
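The GA-based variable selection described above couples a GA, searching over subsets of descriptors, with a model-quality score. The following is a minimal binary GA for variable selection, not the authors' GAPLS code. The scoring function `toy_fitness` and the set `RELEVANT` are hypothetical stand-ins: in GAPLS the score would come from a cross-validated PLS model (for example Q²), which is omitted here to keep the sketch self-contained.

```python
import random

def ga_select(n_vars, fitness, pop_size=30, generations=60, p_mut=0.05, seed=0):
    """Minimal binary GA for variable selection.

    A chromosome is a list of 0/1 flags, one per candidate variable
    (1 = variable kept). `fitness` scores a chromosome; higher is better.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_vars)] for _ in range(pop_size)]

    def tournament(scored):
        # binary tournament: the fitter of two random individuals wins
        (fa, a), (fb, b) = rng.sample(scored, 2)
        return a if fa >= fb else b

    for _ in range(generations):
        scored = sorted(((fitness(c), c) for c in pop),
                        key=lambda t: t[0], reverse=True)
        new_pop = [scored[0][1][:]]              # elitism: best survives unchanged
        while len(new_pop) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            # uniform crossover: each gene is copied from one of the parents
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
            # bit-flip mutation
            child = [1 - g if rng.random() < p_mut else g for g in child]
            new_pop.append(child)
        pop = new_pop
    return max(pop, key=fitness)

# Hypothetical stand-in for a cross-validated PLS score (e.g. Q^2):
# reward variables in RELEVANT, penalise every other variable that is kept.
RELEVANT = {1, 4, 7}

def toy_fitness(mask):
    return sum(1.0 if i in RELEVANT else -0.5 for i, g in enumerate(mask) if g)

best = ga_select(12, toy_fitness)
print([i for i, g in enumerate(best) if g])
```

Swapping `toy_fitness` for a function that fits a PLS model on the kept columns and returns its cross-validated statistic turns this into the GA-plus-PLS pattern the abstract describes.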

3.
Autocatalysis is a ubiquitous chemical process that drives a plethora of biological phenomena, including the self-propagation of prions, the etiological agents of Creutzfeldt-Jakob disease and bovine spongiform encephalopathy. To explain the dynamics of these systems, we have solved the chemical master equation for the irreversible autocatalytic reaction A + B → 2A. This solution is the first closed-form expression describing the probabilistic time evolution of the populations of autocatalytic and noncatalytic molecules from an arbitrary initial state. Grand probability distributions are likewise presented for autocatalysis in the equilibrium limit (A + B ⇌ 2A), allowing the first mechanistic comparison of this process with chemical isomerization (B ⇌ A) in small systems. Although the average population of autocatalytic (i.e., prion) molecules largely conforms to the predictions of the classical "rate law" approach in time and to the law of mass action at equilibrium, thermodynamic differences between the entropies of isomerization and autocatalysis are revealed, suggesting a "mechanism dependence" of state variables for chemical reaction processes. These results demonstrate the importance of chemical mechanism and molecularity in developing stochastic descriptions of chemical systems, as well as the relationship between the stochastic approach to chemical kinetics and nonequilibrium thermodynamics.
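The abstract solves the chemical master equation for A + B → 2A in closed form. A quick way to sample the same stochastic process numerically is Gillespie's direct method, sketched below as a swapped-in technique rather than the paper's analytical solution. Because every event converts one B into A, N = n_A + n_B is conserved and the waiting time in state n_A = n is exponential with rate k·n·(N − n); the mean time to full conversion is therefore the sum of the reciprocals of those rates, which `mean_completion_time` uses as a check. The stochastic rate constant `k` and the function names are assumptions for illustration.

```python
import math
import random

def gillespie_autocatalysis(n_a, n_b, k=1.0, seed=0):
    """One stochastic trajectory of A + B -> 2A (Gillespie direct method).

    With a single reaction channel the propensity is k * n_a * n_b, and
    every event converts one B into A, so n_a + n_b stays constant.
    Returns a list of (time, n_a) pairs, starting with (0.0, n_a).
    """
    rng = random.Random(seed)
    t = 0.0
    traj = [(t, n_a)]
    while n_a > 0 and n_b > 0:                 # without A there is no autocatalysis
        propensity = k * n_a * n_b
        t += -math.log(1.0 - rng.random()) / propensity   # exponential waiting time
        n_a, n_b = n_a + 1, n_b - 1
        traj.append((t, n_a))
    return traj

def mean_completion_time(n_a, n_b, k=1.0):
    """Exact mean time until all B is consumed (requires n_a >= 1): the sum
    of the mean waiting times 1 / (k * n * (N - n)) for n = n_a, ..., N - 1."""
    total = n_a + n_b
    return sum(1.0 / (k * n * (total - n)) for n in range(n_a, total))

traj = gillespie_autocatalysis(1, 20, seed=42)
finish_times = [gillespie_autocatalysis(1, 20, seed=s)[-1][0] for s in range(2000)]
sample_mean = sum(finish_times) / len(finish_times)
print(traj[-1][1], sample_mean, mean_completion_time(1, 20))
```

Every trajectory ends with all 21 molecules as A; the spread of `finish_times` is the kind of fluctuation that a mean-field rate law does not capture.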

4.
A genetic algorithm (GA) is a stochastic optimization technique based on the mechanisms of biological evolution. GAs have been applied successfully in many fields to solve a variety of complex nonlinear problems. Although they have been used with some success in chemical problems such as fitting spectroscopic and kinetic data, many researchers have avoided them because the fitting process is unconstrained. In engineering, this problem is now being addressed by incorporating adaptive penalty functions, but their transfer to other fields has been slow. This study updates the Nanakorn adaptive penalty function theory, extending its validity from maximization problems to minimization problems as well. The extended theory, implemented as a hybrid genetic algorithm with an adaptive penalty function, was applied to analyze variable-temperature, variable-field magnetic circular dichroism (VTVH MCD) spectroscopic data collected on exchange-coupled Fe(II)Fe(II) enzyme active sites. These data are described by a complex, nonlinear, multimodal solution space with at least 6 to 13 interdependent variables and are costly to search efficiently. The hybrid GA is shown to improve the probability of finding the global optimum, and it provides large gains in computational and user efficiency. The method allows a full search of a multimodal solution space, greatly improving the quality of, and confidence in, the final solution, and can be applied to other complex systems such as fitting other spectroscopic or kinetic data.
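To illustrate what an adaptive penalty does inside a GA, here is a small real-coded GA minimizing a toy quadratic subject to a linear inequality constraint. The penalty coefficient `lam` is adapted with a simple feedback rule: it is increased when the best individual has been infeasible for a whole observation window and relaxed when it has been feasible throughout. This generic scheme is an assumption chosen for illustration; it is not the Nanakorn adaptive penalty function, and the objective `f`, the constraint `violation` and all GA settings are hypothetical. The constrained optimum of the toy problem is (1.5, 0.5), where f = 0.5.

```python
import random

def f(x):
    """Toy objective to minimise (hypothetical)."""
    return (x[0] - 2.0) ** 2 + (x[1] - 1.0) ** 2

def violation(x):
    """How far x breaks the toy constraint x0 + x1 <= 2 (0 if feasible)."""
    return max(0.0, x[0] + x[1] - 2.0)

def adaptive_penalty_ga(pop_size=40, generations=200, window=5, seed=1):
    rng = random.Random(seed)
    lam = 0.1            # penalty coefficient, adapted during the run
    history = []         # feasibility of the best individual in the current window
    best = None          # archive: best feasible point seen so far
    pop = [[rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # rank by the penalised objective f(x) + lam * violation(x)
        pop.sort(key=lambda x: f(x) + lam * violation(x))
        for x in pop:
            if violation(x) == 0.0 and (best is None or f(x) < f(best)):
                best = list(x)
        history.append(violation(pop[0]) == 0.0)
        if len(history) == window:
            if not any(history):       # best stayed infeasible: penalise harder
                lam *= 2.0
            elif all(history):         # best stayed feasible: relax the penalty
                lam /= 1.5
            history = []               # start a new observation window
        parents = pop[: pop_size // 2]             # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            w = rng.random()
            # blend crossover followed by a small Gaussian mutation
            children.append([w * ai + (1.0 - w) * bi + rng.gauss(0.0, 0.05)
                             for ai, bi in zip(a, b)])
        pop = parents + children
    return best

best = adaptive_penalty_ga()
print(best, f(best))
```

Keeping an archive of the best feasible point is a common safeguard, because with an adaptive coefficient the population can drift slightly outside the feasible region whenever the penalty is relaxed.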

5.
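Genetic algorithm for homomorphism studies of chemical structure graphs   (Total citations: 4; self-citations: 0; citations by others: 4)
A genetic algorithm using integer-string encoding and node-based gene exchange is proposed and applied to the study of homomorphism between chemical structure graphs. The algorithm progressively optimizes a set of randomly generated integer strings, each representing a mapping between the nodes of the target structure and those of the query structure, until a mapping that matches the query structure is found. This achieves homomorphic matching of chemical structure graphs, including multiple matches.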
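A minimal sketch of the integer-string encoding described above: gene i of a chromosome is the index of the target-structure atom to which query atom i is mapped, crossover exchanges genes node by node, and the fitness counts query bonds whose images are bonds in the target. Treating "homomorphic matching" as an element-preserving, bond-preserving node mapping is an assumption (adding an injectivity check would turn it into substructure isomorphism), and the function name, operators and parameters are hypothetical.

```python
import random

def ga_homomorphism(q_labels, q_edges, t_labels, t_edges,
                    pop_size=40, generations=500, p_mut=0.1, seed=0):
    """Search for an element-preserving homomorphism from a query graph into
    a target graph with an integer-string GA; returns a mapping or None."""
    rng = random.Random(seed)
    t_bonds = {frozenset(e) for e in t_edges}
    # allowed images of each query atom: target atoms of the same element
    allowed = [[j for j, lab in enumerate(t_labels) if lab == q_lab]
               for q_lab in q_labels]
    if any(not opts for opts in allowed):
        return None                    # some element is missing from the target

    def fitness(ch):
        # number of query bonds mapped onto target bonds
        return sum(frozenset((ch[u], ch[v])) in t_bonds for u, v in q_edges)

    pop = [[rng.choice(opts) for opts in allowed] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        if fitness(pop[0]) == len(q_edges):
            return pop[0]              # every query bond is preserved
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            # node-based gene exchange: each atom's image comes from one parent
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            # mutation: re-map an atom to another target atom of the same element
            child = [rng.choice(allowed[i]) if rng.random() < p_mut else g
                     for i, g in enumerate(child)]
            children.append(child)
        pop = parents + children
    return None

# Query fragment C-C-O; target: heavy-atom skeleton of 1-propanol, C-C-C-O.
mapping = ga_homomorphism(["C", "C", "O"], [(0, 1), (1, 2)],
                          ["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])
print(mapping)
```

For this pair the only valid mapping is [1, 2, 3]; with larger targets, rerunning with different seeds and collecting distinct results is one simple way to enumerate multiple matches.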

7.
Many important problems in chemistry require knowledge of the 3-D conformation of a molecule. A commonly used computational approach is to search for a variety of low-energy conformations. Here, we study the behavior of the genetic algorithm (GA) method as a global search technique for finding these low-energy conformations. Our test molecule is cyclic hexaglycine. The goal of this study is to determine how best to use GAs to find low-energy populations of conformations given a fixed amount of CPU time. Two measures are presented that help monitor the improvement in the GA populations and their loss of diversity. Different hybrid methods that combine coarse GA global search with local gradient minimization are evaluated. We present several specific recommendations about trade-offs when choosing GA parameters such as population size, number of generations, rate of interaction between subpopulations, and combinations of GA and gradient minimization. In particular, our results illustrate why approaches that emphasize convergence of the GA can actually decrease its effectiveness as a global conformation search method. © John Wiley & Sons, Inc.
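One form of the GA/gradient hybrid evaluated above can be sketched on a toy torsional energy: a threefold term per torsion plus a penalty for unlike neighbouring torsions. This energy is an assumed stand-in for a real force field on cyclic hexaglycine, and the step sizes and GA settings are hypothetical. Each child produced by crossover and mutation is refined with a few steps of steepest descent before it joins the population, so the GA only has to choose between basins.

```python
import math
import random

def energy(phi):
    """Toy torsional energy (assumed stand-in for a force field). Its global
    minimum, 0, has all torsions equal (mod 2*pi) at a minimum of cos(3*phi)."""
    e = sum(1.0 + math.cos(3.0 * p) for p in phi)
    e += 0.5 * sum(1.0 - math.cos(a - b) for a, b in zip(phi, phi[1:]))
    return e

def local_minimise(phi, steps=30, lr=0.05, h=1e-5):
    """Steepest descent using a central-difference gradient."""
    phi = list(phi)
    for _ in range(steps):
        grad = []
        for i in range(len(phi)):
            up, down = phi[:], phi[:]
            up[i] += h
            down[i] -= h
            grad.append((energy(up) - energy(down)) / (2.0 * h))
        phi = [p - lr * g for p, g in zip(phi, grad)]
    return phi

def hybrid_ga(n_torsions=6, pop_size=20, generations=30, p_mut=0.3, seed=0):
    rng = random.Random(seed)
    pop = [local_minimise([rng.uniform(-math.pi, math.pi) for _ in range(n_torsions)])
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=energy)
        parents = pop[: pop_size // 2]            # keep the lower-energy half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_torsions)    # coarse global step: one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < p_mut:              # occasional large torsion jump
                child[rng.randrange(n_torsions)] = rng.uniform(-math.pi, math.pi)
            children.append(local_minimise(child))   # local gradient refinement
        pop = parents + children
    return min(pop, key=energy)

best = hybrid_ga()
print(energy(best))
```

Raising `steps` makes each refinement more thorough but leaves less CPU time for global exploration, which is the kind of trade-off the study examines.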

9.
In this study, chemometric predictive models were developed from near-infrared (NIR) spectra for the quantitative determination of saturates, aromatics, resins and asphaltenes (SARA) in heavy petroleum products. Models were optimised through adequate pre-processing and/or variable selection. In addition to classical methods, the potential of genetic algorithm (GA) optimisation, which allows pre-processing methods and variable selection to be co-optimised, was evaluated. The prediction results of the different models were compared, and the statistical significance of the differences was assessed with a randomization t-test. The final root mean square errors of prediction, in %(w/w) with the corresponding concentration range in parentheses, are 1.51 (14.1–99.1) for saturates, 1.59 (0.7–61.1) for aromatics, 0.77 (0–34.5) for resins and 1.26 (0–14.7) for asphaltenes. The usefulness of the proposed optimisation method for global interpretation is also shown, in accordance with the known chemical composition of the SARA fractions.
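The randomization t-test used to compare models can be sketched as follows. One common form, assumed here, works on the per-sample differences of squared prediction errors of two models evaluated on the same test samples: under the null hypothesis of equal accuracy each difference is equally likely to have either sign, so random sign flips generate the reference distribution of the mean difference. The function name and defaults are illustrative.

```python
import random

def randomization_t_test(err_a, err_b, n_perm=9999, seed=0):
    """Two-sided sign-flip randomization test on d_i = e_a,i^2 - e_b,i^2.

    err_a, err_b: prediction errors of two models on the same test samples.
    Returns a p-value; small values suggest the models differ in accuracy.
    """
    rng = random.Random(seed)
    d = [a * a - b * b for a, b in zip(err_a, err_b)]
    observed = abs(sum(d)) / len(d)
    count = 0
    for _ in range(n_perm):
        # under H0 each difference keeps or flips its sign with probability 1/2
        t = abs(sum(x if rng.random() < 0.5 else -x for x in d)) / len(d)
        if t >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)

print(randomization_t_test([1.0] * 30, [0.1] * 30))
```

Adding one to both numerator and denominator keeps the p-value strictly positive, since the observed statistic is itself one possible outcome of the sign-flip distribution.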

10.
The solution dependence of gas-phase unfolding of ubiquitin [M + 7H]⁷⁺ ions has been studied by ion mobility spectrometry-mass spectrometry (IMS-MS). Different acidic water:methanol solutions are used to favor the native (N), more helical (A), or unfolded (U) solution states of ubiquitin. Gas-phase ubiquitin ions are unfolded by collisional heating, and the newly formed structures are examined by IMS. At an activation voltage of 100 V, a selected distribution of compact structures unfolds to form three resolvable elongated states (E1–E3). The relative populations of these elongated structures depend strongly on solution composition. Activating compact ions from aqueous solutions known to favor N-state ubiquitin produces mostly the E1 elongated state, whereas activating compact ions from methanol-containing solutions that populate A-state ubiquitin favors the E3 elongated state. Presumably, this difference arises from differences in the structures of the precursor ions emerging from solution. Thus, it appears that information about solution populations can be retained through ionization, selection, and activation to produce the elongated states. These data, as well as others, are discussed.

13.
Near-infrared (NIR) spectroscopy is widely used in quantitative and qualitative food analysis, and with the development of chemometrics, variable selection has become a critical step in spectral modeling. In this study, a novel variable selection strategy, automatic weighting variable combination population analysis (AWVCPA), is proposed. First, a binary matrix sampling (BMS) strategy, which gives each variable the same chance of being selected and generates different variable combinations, is used to produce a population of variable subsets from which a population of sub-models is built. Next, two information vectors (IVs), the variable frequency (Fre) and the partial least squares regression vector (Reg), are weighted to obtain the contribution of each spectral variable, so that the influence of both Fre and Reg is taken into account for every variable. Finally, an exponentially decreasing function (EDF) is used to remove low-contribution wavelengths and thereby select the characteristic variables. Using NIR spectra of beer and corn, partial least squares (PLS) prediction models for yeast and oil concentration are established. Compared with other variable selection methods under the same conditions, AWVCPA performs best. On the beer dataset, AWVCPA-PLS improves on PLS by 72.7%, with the root mean square error of prediction (RMSEP) decreasing from 0.5348 to 0.1457; on the corn dataset, the improvement is 64.7%, with RMSEP decreasing from 0.0702 to 0.0248.
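Two building blocks of AWVCPA can be sketched from the description above. In binary matrix sampling, each column of a 0/1 matrix holds the same number of ones, so every variable has an equal chance of appearing in a sub-model; the exponentially decreasing function sets the fraction of variables kept at each iteration. The EDF parameterization below (ratio 1 at the first run and 2/p at the last, as used in CARS) is an assumption, and the weighting of Fre and Reg is not reproduced because the abstract does not give its formula.

```python
import math
import random

def binary_matrix_sampling(n_vars, n_models, seed=0):
    """Binary matrix sampling (BMS): every column (variable) contains exactly
    n_models // 2 ones placed in random rows, so all variables have the same
    chance of entering a sub-model. Row k is the variable subset of sub-model k."""
    rng = random.Random(seed)
    cols = []
    for _ in range(n_vars):
        col = [1] * (n_models // 2) + [0] * (n_models - n_models // 2)
        rng.shuffle(col)
        cols.append(col)
    return [[cols[j][k] for j in range(n_vars)] for k in range(n_models)]

def edf_ratios(n_vars, n_runs):
    """Exponentially decreasing function r_i = a * exp(-k * i), i = 1..n_runs,
    scaled so the retained fraction is 1 at i = 1 and 2 / n_vars at i = n_runs."""
    a = (n_vars / 2.0) ** (1.0 / (n_runs - 1))
    k = math.log(n_vars / 2.0) / (n_runs - 1)
    return [a * math.exp(-k * i) for i in range(1, n_runs + 1)]

subsets = binary_matrix_sampling(10, 50)
ratios = edf_ratios(100, 50)
# number of variables kept at each run, e.g. 100, ..., 2
kept = [max(2, round(r * 100)) for r in ratios]
print(kept[:5], kept[-1])
```

In a full implementation, each row of `subsets` would be used to fit a PLS sub-model, and the per-variable contributions computed from those sub-models would decide which `kept[i]` variables survive run i.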

14.
Czekaj T, Wu W, Walczak B. Talanta, 2008, 76(3): 564–574
Feature selection in genomic data sets is of particular interest, not only for improving classification (diagnostics) but also for data interpretability. Multivariate feature selection approaches allow an efficient reduction of data dimensionality, but, as demonstrated in our study, the sets of selected variables depend on the objective function of the classifier. Because genes are correlated, different subsets of genes can be selected for classification, and these subsets should be interpreted with caution.

16.
Poly(vinyl alcohol) (PVA) membranes crosslinked with glutaraldehyde (GA) were prepared by a solution method for the pervaporation separation of acetic acid-water mixtures. In the solution method, dry PVA films were crosslinked by immersion for 2 days at 40°C in reaction solutions containing different amounts of GA, acetone and a catalyst, HCl. To fabricate crosslinked PVA membranes that are stable in aqueous solutions, acetone was used as the reaction medium instead of the aqueous inorganic salt solutions commonly used for the PVA crosslinking reaction. The crosslinking reaction between the hydroxyl groups of PVA and the aldehyde groups of GA was characterized by IR spectroscopy. Swelling measurements were carried out in both water and acetic acid to investigate the swelling behaviour of the membranes. The swelling behaviour of membranes fabricated at different GA contents in the reaction solution depended on the crosslinking density and on the chemical functional groups created by the reaction between PVA and GA, such as acetal groups, ether linkages and unreacted pendent aldehydes. Pervaporation separation of acetic acid-water mixtures was performed over a range of 70–90 wt% acetic acid in the feed at temperatures from 35 to 50°C to examine the separation performance of the PVA membranes. Permeation behaviour through the membranes was analyzed using pervaporation activation energies calculated from Arrhenius plots of permeation rates.
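The pervaporation activation energy comes from the Arrhenius relation J = A·exp(−E_p/(RT)): a plot of ln J against 1/T is a straight line with slope −E_p/R. A minimal least-squares sketch follows. The fluxes in the usage example are synthetic, generated from an assumed E_p of 30 kJ/mol over the 35–50°C range mentioned above, and are not data from the study.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1

def activation_energy(temps_c, fluxes):
    """Least-squares fit of ln J = ln A - Ep / (R T); returns Ep in kJ/mol.

    temps_c: temperatures in degrees Celsius; fluxes: permeation rates in any
    consistent unit (the unit only shifts ln A, not the slope)."""
    xs = [1.0 / (t + 273.15) for t in temps_c]
    ys = [math.log(j) for j in fluxes]
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    return -slope * R / 1000.0

# Synthetic fluxes generated with Ep = 30 kJ/mol (illustrative values only).
temps = [35.0, 40.0, 45.0, 50.0]
fluxes = [1.0e3 * math.exp(-30000.0 / (R * (t + 273.15))) for t in temps]
print(activation_energy(temps, fluxes))
```

With real measurements the same fit is applied separately to the total flux and to each component's partial flux, giving one activation energy per quantity.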

17.
This study presents an analytical method for determining interfacial tension and relative density in insulating oils using near-infrared spectrometry (NIR). Five regression strategies were evaluated: partial least squares (PLS) with significant regression coefficients selected by the jack-knife algorithm; interval PLS (iPLS); and multiple linear regression (MLR) with variable selection by genetic algorithm (MLR/GA), by the successive projections algorithm (MLR/SPA), and by a stepwise strategy (SR/MLR). The overall results point to MLR/SPA as the best modeling strategy: it is simpler and uses fewer spectral variables.
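The successive projections algorithm behind MLR/SPA picks spectral variables with minimal collinearity: starting from one column, it repeatedly projects the columns onto the subspace orthogonal to the last selected column and takes the column with the largest projected norm. The sketch below, assuming NumPy, covers only this projection step; a full SPA also tries different starting columns and chooses the number of variables from MLR validation error.

```python
import numpy as np

def spa(X, k0, n_select):
    """Successive projections algorithm (projection step only).

    X: (n_samples, n_vars) calibration matrix; k0: index of the starting
    column; n_select: number of variables to return (in selection order)."""
    Xp = np.array(X, dtype=float)
    selected = [k0]
    for _ in range(n_select - 1):
        xk = Xp[:, selected[-1]].copy()
        # project every column onto the subspace orthogonal to the last pick
        Xp = Xp - np.outer(xk, xk @ Xp) / (xk @ xk)
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0          # never re-select a column
        selected.append(int(np.argmax(norms)))
    return selected

# Columns 0 and 1 are nearly collinear; column 2 is orthogonal to column 0.
X = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.1, 1.0],
              [1.0, 0.9, -1.0]])
print(spa(X, 0, 2))
```

Starting from column 0, the nearly collinear column 1 has almost nothing left after projection, so the algorithm picks column 2, which is the behaviour that keeps MLR models well conditioned with few variables.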

18.
An extended-system molecular dynamics method for the isomolar semigrand ensemble (fixed number of particles, pressure, temperature, and fugacity fraction) is developed and applied to the calculation of liquid-liquid equilibria (LLE) for two Lennard-Jones mixtures. The method uses an extended-system variable to control the fugacity fraction ξ of the mixture dynamically by gradually transforming the identity of particles in the system. Two approaches are used to compute coexistence points. The first uses multiple-histogram reweighting techniques to determine the coexistence ξ and the composition of each phase at temperatures near the upper critical solution temperature. The second, useful when there is no critical solution temperature, is based on the principles of small-system thermodynamics. In this case a coexistence point is found by running NPTξ simulations at a common temperature and pressure and varying the fugacity fraction to map out the chemical potential difference between species A and B, μA − μB, as a function of composition. Once this curve is known, the equal-distance/equal-area criterion is used to determine the coexistence point. Both approaches give results comparable to those of previous Monte Carlo (MC) simulations. Because the approach is formulated in a molecular dynamics framework, it should make it easier to compute the LLE of complex molecules whose intramolecular degrees of freedom are often difficult to sample properly with MC techniques.
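The equal-area step can be illustrated on an analytic curve. For a binary mixture, the derivative of the molar Gibbs energy with respect to x_B is μB − μA, so the coexistence value Δμ* is the horizontal line that cuts the loop in Δμ(x) into two equal areas, which is equivalent to the common-tangent condition. The sketch uses a symmetric regular-solution model, βΔμ(x) = ln[x/(1 − x)] + χ(1 − 2x), as an assumed stand-in for the simulated curve. With χ > 2 this curve has a loop, and by symmetry Δμ* = 0.

```python
import math

def dmu(x, chi):
    """beta * (mu_B - mu_A) versus x = x_B for a symmetric regular-solution
    model (an assumed stand-in for the simulated curve)."""
    return math.log(x / (1.0 - x)) + chi * (1.0 - 2.0 * x)

def _grid(n):
    # n + 1 points symmetric about 0.5, avoiding the endpoints 0 and 1
    return [1e-6 + (1.0 - 2e-6) * i / n for i in range(n + 1)]

def _bisect(g, lo, hi, tol=1e-12):
    """Root of g on [lo, hi], assuming g(lo) and g(hi) differ in sign."""
    g_lo = g(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def outer_roots(curve, c, n=4000):
    """Smallest and largest x in (0, 1) where curve(x) == c."""
    xs = _grid(n)
    vals = [curve(x) - c for x in xs]
    roots = [_bisect(lambda x: curve(x) - c, a, b)
             for a, b, va, vb in zip(xs, xs[1:], vals, vals[1:])
             if (va > 0) != (vb > 0)]
    return roots[0], roots[-1]

def area(curve, c, m=2000):
    """Trapezoid estimate of the integral of curve(x) - c between the outer roots."""
    x1, x3 = outer_roots(curve, c)
    h = (x3 - x1) / m
    total = 0.5 * ((curve(x1) - c) + (curve(x3) - c))
    total += sum(curve(x1 + i * h) - c for i in range(1, m))
    return total * h

def equal_area(curve, n=4000):
    """Coexistence value c* and the two coexisting compositions."""
    xs = _grid(n)
    ys = [curve(x) for x in xs]
    # the loop's local maximum and minimum bracket c* (ValueError if no loop)
    hi = max(ys[i] for i in range(1, n) if ys[i - 1] < ys[i] > ys[i + 1])
    lo = min(ys[i] for i in range(1, n) if ys[i - 1] > ys[i] < ys[i + 1])
    margin = 0.01 * (hi - lo)
    # the signed area decreases monotonically as the line c moves up
    c_star = _bisect(lambda c: area(curve, c), lo + margin, hi - margin, tol=1e-10)
    return c_star, outer_roots(curve, c_star)

c_star, (x1, x3) = equal_area(lambda x: dmu(x, 3.0))
print(c_star, x1, x3)
```

With simulation data, `curve` would instead interpolate the measured (composition, μA − μB) points; the construction itself does not depend on the model.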

19.
Feature selection is a key step in Quantitative Structure-Activity Relationship (QSAR) analysis. Chance correlations and multicollinearity are two major problems often encountered when attempting to find generalized QSAR models for use in drug design. Optimal QSAR models require an objective variable-relevance analysis step to produce robust classifiers with low complexity and good predictive accuracy. Genetic algorithms coupled with information-theoretic approaches such as mutual information have been used to find near-optimal solutions to such multicriteria optimization problems. In this paper, we describe a novel approach for analyzing QSAR data based on these methods. Our experiments with the Thrombin dataset, previously studied as part of the KDD (Knowledge Discovery and Data Mining) Cup 2001, demonstrate the feasibility of this approach. We found that it is important to take into account the data distribution, the "interestingness" of rules, and the need for more invariant and monotonic measures of feature selection.
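Mutual information is a natural relevance score for binary descriptors such as those in the Thrombin data. A minimal plug-in estimator for two discrete sequences follows. How the paper combines it with a GA is not described in the abstract, so using it here as a per-feature filter score is an assumption, and the small descriptor matrix is hypothetical.

```python
import math
from collections import Counter

def mutual_information(feature, labels):
    """Plug-in estimate of I(F; Y) in bits from two equal-length sequences
    of discrete values (e.g. a binary descriptor and an active/inactive label)."""
    n = len(feature)
    joint = Counter(zip(feature, labels))
    count_f = Counter(feature)
    count_y = Counter(labels)
    # sum over observed (f, y) of p(f,y) * log2(p(f,y) / (p(f) * p(y)))
    return sum((c / n) * math.log2(c * n / (count_f[f] * count_y[y]))
               for (f, y), c in joint.items())

# Rank the columns of a small hypothetical binary descriptor matrix by relevance.
X = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0]]
y = [1, 1, 0, 0]
columns = list(zip(*X))
ranking = sorted(range(len(columns)), key=lambda j: -mutual_information(columns[j], y))
print(ranking)
```

Column 0 reproduces the labels exactly (1 bit), while columns 1 and 2 are independent of them (0 bits); on sparse, highly imbalanced data such as Thrombin, plug-in estimates like this one can be biased, which is one reason the data distribution matters.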

20.
The European consortium "High-throughput analysis of single nucleotide polymorphisms for the forensic identification of persons" (SNPforID) has selected candidate Y-chromosome single nucleotide polymorphisms (SNPs) for making inferences about the geographic origin of an unknown sample. From more than 200 SNPs compiled in the phylogenetic tree published by the Y-Chromosome Consortium, and taking into account previously published population studies, a package of 29 SNPs was selected to identify the major population haplogroups. A "Major Y-chromosome haplogroup typing kit" was developed that allows multiplex amplification of all 29 SNPs in a single reaction. Alleles were genotyped with a single base extension reaction (minisequencing) detected by capillary electrophoresis (CE). The multiplex was validated on a total of 1126 unrelated males from 12 worldwide populations. The approach takes advantage of the specific geographic distribution of Y-chromosome haplogroups and demonstrates the utility of binary polymorphisms for inferring the origin of a male lineage.
