Similar literature
Found 20 similar documents (search time: 31 ms)
1.
The requirement to align each individual molecule in a data set severely limits the types of molecules that can be analysed with traditional structure-activity relationship (SAR) methods. Inductive logic programming (ILP) solves this problem by using relations between objects. Another advantage of this methodology is its ability to include background knowledge as first-order logic. However, previous molecular ILP representations have not been effective in describing the electronic structure of molecules. We present a more unified and comprehensive representation based on Richard Bader's quantum topological atoms in molecules (AIM) theory, where critical points in the electron density are connected through a network. AIM theory provides a wealth of chemical information about individual atoms and their bond connections, enabling a more flexible and chemically relevant representation. To obtain even more relevant rules with higher coverage, we apply manual postprocessing and interpretation of ILP rules. We have tested the usefulness of the new representation in SAR modelling by classifying compounds of low/high mutagenicity and a set of factor Xa inhibitors of high and low affinity.

2.
One of the largest available data sets for developing a quantitative structure-activity relationship (QSAR), the inhibition of dihydrofolate reductase (DHFR) by 2,4-diamino-6,6-dimethyl-5-phenyl-dihydrotriazine derivatives, has been used for a sixfold cross-validation trial of neural networks, inductive logic programming (ILP) and linear regression. No statistically significant difference was found between the predictive capabilities of the methods. However, the representation of molecules by attributes, which is integral to the ILP approach, provides understandable rules about drug-receptor interactions.
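
The sixfold cross-validation named in this abstract can be illustrated with a minimal sketch. The abstract does not say how the data were partitioned, so the random shuffling, the seed, and the function name `k_fold_indices` are assumptions made purely for illustration, not the authors' procedure.

```python
import random


def k_fold_indices(n_items, k=6, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Hypothetical helper: items are shuffled once with a fixed seed and
    dealt into k folds; each fold serves as the test set exactly once.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    for train, test in k_fold_indices(20, k=6):
        print(len(train), len(test))
```

Every method under comparison would be trained and evaluated on the same splits, so that differences in predictive performance are not caused by differences in the partitioning.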

3.
Neural networks and inductive logic programming (ILP) have been compared to linear regression for modelling the QSAR of the inhibition of E. coli dihydrofolate reductase (DHFR) by 2,4-diamino-5-(substituted benzyl)pyrimidines, and, in the subsequent paper [Hirst, J.D., King, R.D. and Sternberg, M.J.E., J. Comput.-Aided Mol. Design, 8 (1994) 421], the inhibition of rodent DHFR by 2,4-diamino-6,6-dimethyl-5-phenyl-dihydrotriazines. Cross-validation trials provide a statistically rigorous assessment of the predictive capabilities of the methods, with training and testing data selected randomly and all the methods developed using identical training data. For the ILP analysis, molecules are represented by attributes other than Hansch parameters. Neural networks and ILP perform better than linear regression using the attribute representation, but the difference is not statistically significant. The major benefit of the ILP analysis is the formulation of understandable rules relating the activity of the inhibitors to their chemical structure.

4.
In chemoinformatics, searching for compounds which are structurally diverse and share a biological activity is called scaffold hopping. Scaffold hopping is important since it can be used to obtain alternative structures when the compound under development has unexpected side-effects. Pharmaceutical companies use scaffold hopping when they wish to circumvent prior patents for targets of interest. We propose a new method for scaffold hopping using inductive logic programming (ILP). ILP uses the observed spatial relationships between pharmacophore types in pretested active and inactive compounds and learns human-readable rules describing the diverse structures of active compounds. The ILP-based scaffold hopping method is compared to two previous algorithms (chemically advanced template search, CATS, and CATS3D) on 10 data sets with diverse scaffolds. The comparison shows that the ILP-based method is significantly better than random selection while the other two algorithms are not. In addition, the ILP-based method retrieves new active scaffolds which were not found by CATS and CATS3D. The results show that the ILP-based method is at least as good as the other methods in this study. ILP produces human-readable rules, which makes it possible to identify the three-dimensional features that lead to scaffold hopping. A minor variant of a rule learnt by ILP for scaffold hopping was subsequently found to cover an inhibitor identified by an independent study. This provides a successful result in a blind trial of the effectiveness of ILP to generate rules for scaffold hopping. We conclude that ILP provides a valuable new approach for scaffold hopping.

5.
Many databases contain information on genes that is useful as background information in machine learning analysis of microarray data. The gene ontology and gene ontology annotation projects are among the most comprehensive of these. We demonstrate how inductive logic programming (ILP) can be used to build classification rules for microarray data that naturally incorporate the gene ontology and its annotations as background knowledge without removing the inherent graph structure of the ontology. The ILP rules generated are parsimonious and easy to interpret. Copyright © 2010 John Wiley & Sons, Ltd.

6.
We investigate the classification performance of circular fingerprints in combination with the Naive Bayes Classifier (MP2D), Inductive Logic Programming (ILP) and Support Vector Inductive Logic Programming (SVILP) on a standard molecular benchmark dataset comprising 11 activity classes and about 102,000 structures. The Naive Bayes Classifier treats features independently, while ILP combines structural fragments and then creates new features with higher predictive power. SVILP is a very recently presented method which adds a support vector machine after common ILP procedures. The performance of the methods is evaluated via a number of statistical measures, namely recall, specificity, precision, F-measure, Matthews Correlation Coefficient, area under the Receiver Operating Characteristic (ROC) curve and enrichment factor (EF). According to the F-measure, which takes both recall and precision into account, SVILP is the superior method for seven of the 11 classes. The results show that the Bayes Classifier gives the best recall performance for eight of the 11 targets, but has a much lower precision, specificity and F-measure. The SVILP model, on the other hand, has the highest recall for only three of the 11 classes, but generally far superior specificity and precision. To evaluate the statistical significance of the SVILP superiority, we employ McNemar's test, which shows that SVILP performs significantly (p < 5%) better than both other methods for six out of 11 activity classes, while being superior with less significance for three of the remaining classes. While previously the Bayes Classifier was shown to perform very well in molecular classification studies, these results suggest that SVILP is able to extract additional knowledge from the data, thus improving classification results further.
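
Several of the measures listed in this abstract follow directly from the four confusion-matrix counts. The sketch below uses the standard textbook definitions; the function name `classification_metrics`, the zero-denominator convention, and the example counts are illustrative assumptions and are not taken from the study.

```python
from math import sqrt


def classification_metrics(tp, fp, tn, fn):
    """Compute recall, specificity, precision, F-measure and MCC.

    Hypothetical helper using the standard definitions; any ratio whose
    denominator is zero is reported as 0.0 (an assumed convention).
    """
    def ratio(num, den):
        return num / den if den else 0.0

    recall = ratio(tp, tp + fn)          # true positive rate
    specificity = ratio(tn, tn + fp)     # true negative rate
    precision = ratio(tp, tp + fp)       # positive predictive value
    # F-measure: harmonic mean of precision and recall.
    f_measure = ratio(2 * precision * recall, precision + recall)
    # Matthews Correlation Coefficient.
    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = ratio(tp * tn - fp * fn, mcc_den)
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(classification_metrics(tp=40, fp=10, tn=45, fn=5))
```

A classifier can have high recall with low precision, which is why the abstract reports the F-measure and specificity alongside recall when comparing the methods.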

7.
Traditional 3D-quantitative structure–activity relationship (QSAR)/structure–activity relationship (SAR) methodologies are sensitive to the quality of an alignment step which is required to make molecular structures comparable. Even though many methods have been proposed to solve this problem, they often result in a loss of model interpretability. The requirement of alignment is a restriction imposed by traditional regression methods due to their failure to represent relations between data objects directly. Inductive logic programming (ILP) is a class of machine-learning methods able to describe relational data directly. We propose a new methodology which is aimed at using the richness in molecular interaction fields (MIFs) without being restricted by any alignment procedure. A set of MIFs is computed and further compressed by finding their minima corresponding to the sites of strongest interaction between a molecule and the applied test probe. ILP uses these minima to build easily interpretable rules about activity expressed as pharmacophore rules in the powerful language of first-order logic. We use a set of previously published inhibitors of factor Xa of the benzamidine family to discuss the problems, requirements and advantages of the new methodology. Copyright © 2007 John Wiley & Sons, Ltd.

8.
9.
Humans are exposed to thousands of environmental chemicals for which no developmental toxicity information is available. Structure-activity relationships (SARs) are models that could be used to efficiently predict the biological activity of potential developmental toxicants. However, at this time, no adequate SAR models of developmental toxicity are available for risk assessment. In the present study, a new developmental database was compiled by combining toxicity information from the Teratogen Information System (TERIS) and the Food and Drug Administration (FDA) guidelines. We implemented a decision tree modeling procedure, using Classification and Regression Tree software and a model ensemble approach termed bagging. We then assessed the empirical distributions of the prediction accuracy measures of the single and ensemble-based models, achieved by repeating our modeling experiment many times by repeated random partitioning of the working database. The decision tree developmental SAR models exhibited modest prediction accuracy. Bagging tended to enhance the accuracy of prediction. Also, the model ensemble approach reduced the variability of prediction measures compared to the single model approach. Further research with data derived from animal species- and endpoint-specific components of an extended and refined FDA/TERIS database has the potential to derive SAR models that would be useful in the developmental risk assessment of the thousands of untested chemicals.

10.


11.
An ab initio method has been developed to predict beta architectures in polypeptides. The approach predicts the topology of beta-sheets and disulfide bridges through a novel superstructure-based mathematical framework originally established for chemical process synthesis problems. Two types of superstructure are introduced, both of which emanate from the principle that hydrophobic interactions drive the formation of a beta-structure. The mathematical formulation of the problem results in a set of integer linear programming (ILP) problems that can be solved to global optimality to identify the optimal beta-configuration. These ILP models can also predict a rank-ordered list of the best, second-best, third-best, etc., topologies of beta-sheets and disulfide bridges. The approach is shown to perform very well for several benchmark polypeptide systems, as well as polypeptides exhibiting challenging nonsequential beta-sheet topologies (56 to 187 amino acids).

12.
Warmr: a data mining tool for chemical data (total citations: 5; self-citations: 0; citations by others: 5)

13.
14.
15.
In silico methods are a valid tool for analysing the properties of chemical compounds and interest in computational modelling techniques to predict the activity of chemicals is constantly growing. Many computational methods can be used to analyse the toxicity or biological activity of chemicals, particularly as regards their interactions with biological macromolecules (e.g. receptors) and other physico-chemical properties. An overview of these methods is provided in this tutorial review, with some examples of their application to predict oestrogen receptor (ER)-mediated effects. Nuclear receptors, particularly ER, have been studied with in silico tools since concern is growing about substances, called endocrine disrupters, that can interfere with hormone regulation. Molecular modelling techniques such as Quantitative Structure-Activity Relationships (QSAR), related methods like 3D-QSAR, and virtual docking have been used to investigate these phenomena and are described here. Implications about regulatory acceptance and use of these methods and the resulting models for identifying hazards and setting priorities are also addressed.

16.
Using a perturbative approach to simple model systems, we derive useful propensity rules for inelastic electron tunneling spectroscopy (IETS) of molecular wire junctions. We examine the circumstances under which this spectroscopy (that has no rigorous selection rules) obeys well defined propensity rules based on the molecular symmetry and on the topology of the molecule in the junction. Focusing on conjugated molecules of C(2h) symmetry, semiquantitative arguments suggest that the IETS is dominated by a(g) vibrations in the high energy region and by out of plane modes (a(u) and b(g)) in the low energy region. Realistic computations verify that the proposed propensity rules are strictly obeyed by medium to large-sized conjugated molecules but are subject to some exceptions when small molecules are considered. The propensity rules facilitate the use of IETS to help characterize the molecular geometry within the junction.

17.
18.
The TOPological Sub-Structural MOlecular DEsign (TOPS-MODE) approach (Estrada, E. SAR QSAR Environ. Res. 2000, 11, 55–73) has been introduced to the study of toxicological properties. The toxicity of 42 nitrobenzenes was studied with this approach, yielding a good quantitative structure–toxicity model. For the first time we compare the use of eight different weights in the diagonal entries of the bond matrix for selecting the best TOPS-MODE model. TOPS-MODE was used to derive the contribution of different fragments to the toxicity of the studied compounds. These contributions were applied to calculate toxicity substituent constants for the groups present in the nitrobenzenes studied.

19.
20.
The identification of interactions between drugs and target proteins plays a key role in the process of genomic drug discovery. It is both time-consuming and costly to determine drug–target interactions by experiments alone. Therefore, there is an urgent need to develop new in silico prediction approaches capable of identifying these potential drug–target interactions in a timely manner. In this article, we aim to extend current structure–activity relationship (SAR) methodology to fulfill such requirements. In some sense, a drug–target interaction can be regarded as an event or property triggered by many influencing factors from drugs and target proteins. Thus, each interaction pair can be represented theoretically by using these factors, which are based on the structural and physicochemical properties of both drugs and proteins. To realize this, drug molecules are encoded with MACCS substructure fingerprints representing the existence of certain functional groups or fragments, and proteins are encoded with some biochemical and physicochemical properties. Four classes of drug–target interaction networks in humans, involving enzymes, ion channels, G-protein-coupled receptors (GPCRs) and nuclear receptors, are independently used for establishing predictive models with support vector machines (SVMs). The SVM models gave prediction accuracies of 90.31%, 88.91%, 84.68% and 83.74% for the four datasets, respectively. In conclusion, the results demonstrate the ability of our proposed method to predict drug–target interactions, and show a general compatibility between the new scheme and current SAR methodology. They open the way to a host of new investigations on the diversity analysis and prediction of drug–target interactions.
