Similar literature
 20 similar documents found (search time: 15 ms)
1.
A new machine learning method is presented for extracting interpretable structure-activity relationships from screening data. The method is based on an evolutionary algorithm and reduced graphs and aims to evolve a reduced graph query (subgraph) that is present within the active compounds and absent from the inactives. The reduced graph representation enables heterogeneous compounds, such as those found in high-throughput screening data, to be captured in a single representation, with the resulting query encoding structure-activity information in a form that is readily interpretable by a chemist. The application of the method is illustrated using data sets extracted from the well-known MDDR data set and GSK in-house screening data. Queries are evolved that are consistent with the known SARs, and they are also shown to be robust when applied to independent sets that were not used in training.

2.
Rapid overlay of chemical structures (ROCS) is a method that aligns molecules based on shape and/or chemical similarity. It is often used in 3D ligand-based virtual screening. Given a query consisting of a single conformation of an active molecule, ROCS can generate highly enriched hit lists. Typically the chosen query conformation is a minimum energy structure. Can better enrichment be obtained using conformations other than the minimum energy structure? To answer this question a methodology has been developed called CORAL (COnformational analysis, Rocs ALignment). For a given set of molecule conformations it computes optimized conformations for ROCS screening. It does so by clustering all conformations of a chosen molecule set using pairwise ROCS combo scores. The best representative conformation is that which has the highest average overlap with the rest of the conformations in the cluster. It is these best representative conformations that are then used for virtual screening. CORAL was tested by performing virtual screening experiments with the 40 DUD (Directory of Useful Decoys) data sets. Both CORAL and minimum energy queries were used. The recognition capability of each query was quantified as the area under the ROC curve (AUC). Results show that the CORAL AUC values are on average larger than the minimum energy AUC values. This demonstrates that one can indeed obtain better ROCS enrichments with conformations other than the minimum energy structure. As a result, CORAL analysis can be a valuable first step in virtual screening workflows using ROCS.
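
The two computational steps named in this abstract, picking the cluster member with the highest average overlap and scoring a query by ROC AUC, can be sketched as follows. This is a minimal illustration, not the CORAL implementation: the function names and the score matrix are hypothetical, and the pairwise scores would in practice come from ROCS rather than being typed in by hand.

```python
def best_representative(scores, cluster):
    """Return the cluster member with the highest mean score to the other members.

    `scores` is a symmetric matrix of pairwise overlap scores (assumed input);
    `cluster` is a list of conformation indices belonging to one cluster.
    """
    if len(cluster) == 1:
        return cluster[0]

    def mean_overlap(i):
        others = [j for j in cluster if j != i]
        return sum(scores[i][j] for j in others) / len(others)

    return max(cluster, key=mean_overlap)


def roc_auc(active_scores, decoy_scores):
    """AUC as the probability that a random active outscores a random decoy.

    Ties count as half a win, which matches the usual rank-based definition.
    """
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))


# Hypothetical pairwise scores for four conformations (not real ROCS output).
scores = [
    [2.0, 1.6, 1.4, 0.5],
    [1.6, 2.0, 1.5, 0.6],
    [1.4, 1.5, 2.0, 0.4],
    [0.5, 0.6, 0.4, 2.0],
]
representative = best_representative(scores, [0, 1, 2])
auc = roc_auc([0.9, 0.8, 0.4], [0.7, 0.3])
```

In this toy input, conformation 1 has the highest mean score against the other two members of its cluster, so it is returned as the representative.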

3.
4.
5.
A new method for analyzing a structure-activity relationship is proposed. By use of a simple quantitative index, one can readily identify "structure-activity cliffs": pairs of molecules which are most similar but have the largest change in activity. We show how this provides a graphical representation of the entire SAR, in a way that allows the salient features of the SAR to be quickly grasped. In addition, the approach allows us to view the SARs in a data set at different levels of detail. The method is tested on two data sets that highlight its ability to easily extract SAR information. Finally, we demonstrate that this method is robust using a variety of computational control experiments and discuss possible applications of this technique to QSAR model evaluation.
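
The abstract does not spell out the index. One plausible form, assumed here purely for illustration, divides the activity difference of a pair by its structural dissimilarity, so that similar pairs with a large activity gap score highest. The fingerprints, activities, and function names below are hypothetical.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def cliff_index(act_i, act_j, sim, eps=1e-6):
    """Assumed cliff index: |activity difference| / (1 - similarity).

    `eps` avoids division by zero for identical fingerprints.
    """
    return abs(act_i - act_j) / max(1.0 - sim, eps)


def rank_cliffs(compounds):
    """Return (index, name_i, name_j) for every pair, steepest cliff first."""
    pairs = []
    for (n1, fp1, a1), (n2, fp2, a2) in combinations(compounds, 2):
        pairs.append((cliff_index(a1, a2, tanimoto(fp1, fp2)), n1, n2))
    return sorted(pairs, reverse=True)


# Hypothetical compounds: (name, fingerprint on-bits, activity such as pIC50).
compounds = [
    ("A", frozenset({1, 2, 3, 4}), 8.5),
    ("B", frozenset({1, 2, 3, 5}), 5.0),
    ("C", frozenset({7, 8, 9}), 5.2),
]
ranked = rank_cliffs(compounds)
```

Here A and B share most of their bits but differ by 3.5 activity units, so they form the steepest cliff; A and C differ almost as much in activity but are structurally unrelated and rank lower.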

6.
The characterization of structure-activity relationship (SAR) features of large compound data sets has been a hot topic in recent years, and different methods for large-scale SAR analysis have been introduced. The exploration of local SAR components and prioritization of compound subsets have thus far mostly relied on graphical analysis methods that capture similarity and potency relationships in a systematic manner. A currently unsolved problem in large-scale SAR analysis is how to automatically select those compound subsets from large data sets that carry most SAR information. For this purpose, we introduce a numerical optimization scheme that is based on particle swarm optimization guided by an SAR scoring function. The methodology is applied to four large compound sets. We demonstrate that compound subsets representing the most discontinuous local SARs are consistently selected through particle swarm optimization.

7.
We systematically compare X-ray structures of inhibitor complexes of four well-known enzymes and correlate two- and three-dimensional (2D and 3D) similarity of inhibitors with their potency. The analysis reveals the presence of unexpected systematic relationships between molecular similarity and potency. These findings explain why apparently inconsistent structure-activity relationships (SARs) can coexist in different targets, and they have general implications for compound screening and optimization. The results suggest that (1) even for active sites with significant binding constraints, there is a high probability that structurally diverse ligands with similar activity can be identified, (2) different types of SARs are not mutually exclusive, and (3) the chemical nature of ligands is comparable in importance for SARs to the features of active sites. These insights aid in the understanding of target-specific SARs and their intrinsic degree of variability.

8.
Query expansion is the process of reformulating an original query to improve retrieval performance in information retrieval systems. Relevance feedback is one of the most useful query modification techniques in information retrieval systems. In this paper, we introduce query expansion into ligand-based virtual screening (LBVS) using the relevance feedback technique. In this approach, a few high-ranking molecules of unknown activity are filtered from the outputs of a Bayesian inference network based on a single ligand molecule to form a set of ligand molecules. This set of ligand molecules is used to form a new ligand molecule. Simulated virtual screening experiments with the MDL Drug Data Report and maximum unbiased validation data sets show that the use of ligand expansion provides a very simple way of improving the LBVS, especially when the active molecules being sought have a high degree of structural heterogeneity. However, the effectiveness of the ligand expansion is slightly lower when structurally homogeneous sets of actives are being sought.
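
The feedback loop described here (search with one ligand, treat the top hits as additional references, search again) can be sketched as below. This is not the paper's method: it swaps the Bayesian inference network for plain Tanimoto similarity and combines references by taking the maximum similarity, both of which are assumptions made to keep the example short. All names and fingerprints are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def rank(database, references):
    """Rank database molecules by their best similarity to any reference."""
    best = {
        name: max(tanimoto(fp, ref) for ref in references)
        for name, fp in database.items()
    }
    # Highest score first; ties are broken by name to keep the order stable.
    return sorted(best, key=lambda name: (-best[name], name))


def expanded_search(database, query_fp, top_k=1):
    """Two-pass search: the top_k first-pass hits join the reference set."""
    first_pass = rank(database, [query_fp])
    references = [query_fp] + [database[name] for name in first_pass[:top_k]]
    return first_pass, rank(database, references)


# Hypothetical fingerprints as sets of on-bits.
database = {
    "m1": frozenset({1, 2, 3, 4}),
    "m2": frozenset({3, 4, 5, 6}),
    "m3": frozenset({1, 7, 8}),
}
first_pass, second_pass = expanded_search(database, frozenset({1, 2, 3}))
```

In this toy case m2 is only weakly similar to the original query but resembles the top hit m1, so it moves ahead of m3 after expansion. That is the behaviour one would hope for when actives are structurally heterogeneous.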

9.
We developed a novel approach called SHAFTS (SHApe-FeaTure Similarity) for 3D molecular similarity calculation and ligand-based virtual screening. SHAFTS adopts a hybrid similarity metric that combines molecular shape with colored (labeled) chemistry groups annotated by pharmacophore features for 3D similarity calculation and ranking, and is designed to integrate the strengths of pharmacophore matching and volumetric overlay approaches. A feature triplet hashing method is used for fast enumeration of molecular alignment poses, and the optimal superposition between the target and the query molecules can be prioritized by calculating corresponding "hybrid similarities". SHAFTS is suitable for large-scale virtual screening with single or multiple bioactive compounds as the query "templates" regardless of whether corresponding experimentally determined conformations are available. Two public test sets (DUD and Jain's sets) including active and decoy molecules from a panel of useful drug targets were adopted to evaluate the virtual screening performance. SHAFTS outperformed several other widely used virtual screening methods in terms of enrichment of known active compounds as well as novel chemotypes, thereby indicating its robustness in hit compound identification and its potential for scaffold hopping in virtual screening.

10.
11.
12.
The two-year rodent bioassay represents the gold standard for evaluating the carcinogenicity of chemicals. Because of practical and ethical reasons, alternative approaches have been investigated for many years. Among these approaches, the (quantitative) structure-activity relationships [(Q)SARs] offer promising perspectives for quickly screening a large number of chemicals. To increase the acceptance of (Q)SARs among the regulators, their predictive power needs to be scientifically validated. In this article, we tested the capacity of the DEREKfW expert system to qualitatively predict the rodent carcinogenicity and the genotoxic potential of 60 pesticides recently registered in Switzerland. The percentage of false negatives was found to be 31% for carcinogenicity. The associated sensitivity of 69% indicates that most of the pesticides with positive rodent bioassay results were detected by DEREKfW. On the other hand, the low specificity of 47% indicates that many pesticides may be flagged as carcinogenic while rodent bioassays would not confirm this potential. This may lead to unnecessary testing or the unnecessary restriction of a chemical.
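
The reported figures are linked by simple arithmetic: sensitivity is one minus the false-negative rate (1 - 0.31 = 0.69), and a specificity of 47% implies that 53% of bioassay-negative compounds are flagged. The sketch below shows the definitions. The counts are illustrative values per 100 compounds, chosen only to reproduce the percentages; they are not the counts from the study.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of bioassay-positive compounds that the model flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of bioassay-negative compounds that the model clears."""
    return true_neg / (true_neg + false_pos)


# Illustrative counts per 100 compounds in each class (NOT the study's data).
sens = sensitivity(true_pos=69, false_neg=31)
spec = specificity(true_neg=47, false_pos=53)

false_negative_rate = 1.0 - sens  # share of true carcinogens that are missed
false_positive_rate = 1.0 - spec  # share of non-carcinogens that are flagged
```

The false-positive rate is the quantity behind the abstract's concern about unnecessary testing or restriction.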

13.
The development of structure-activity relationships (SARs) relating to the function of a biological protein is often a long and protracted undertaking when using an iterative medicinal chemistry approach. High throughput screening of ECLiPS (Encoded Combinatorial Libraries on Polymeric Support) libraries can be used to simplify this process. In this paper, we illustrate how a large ECLiPS library of 26,908 compounds, based on a tricyclic core structure, was used to define a multitude of SARs for the oncogenic target, farnesyltransferase (FTase). This library, FT-2, was prepared using a split-and-pool approach in which small molecules are constructed on resin that contains tag/linker constructs to track the synthetic process [1-5]. Highly defined SARs were produced from this screen that enhanced our understanding of FTase binding site interactions. The pivotal compounds culled from this library were potent in both cell-free and cell-based FTase assays, selective over the closely related enzyme, geranylgeranyltransferase I (GGTase I), and inhibited the adherent-independent growth of a transformed cell line.

14.
We introduce the QuanSA method for inducing physically meaningful field-based models of ligand binding pockets based on structure-activity data alone. The method is closely related to the QMOD approach, substituting a learned scoring field for a pocket constructed of molecular fragments. The problem of mutual ligand alignment is addressed in a general way, and optimal model parameters and ligand poses are identified through multiple-instance machine learning. We provide algorithmic details along with performance results on sixteen structure-activity data sets covering many pharmaceutically relevant targets. In particular, we show how models initially induced from small data sets can extrapolatively identify potent new ligands with novel underlying scaffolds with very high specificity. Further, we show that combining predictions from QuanSA models with those from physics-based simulation approaches is synergistic. QuanSA predictions yield binding affinities, explicit estimates of ligand strain, associated ligand pose families, and estimates of structural novelty and confidence. The method is applicable for fine-grained lead optimization as well as potent new lead identification.

15.
Modulation of protein-protein interactions (PPI) has emerged as a new concept in rational drug design. Here, we present a computational protocol for identifying potential PPI inhibitors. Relevant regions of interfaces (epitopes) are predicted for three-dimensional protein models and serve as queries for virtual compound screening. We present a computational screening protocol that incorporates two different pharmacophore models. One model is based on the mathematical concept of autocorrelation vectors and the other utilizes fuzzy labeled graphs. In a proof-of-concept study, we were able to identify serine protease inhibitors using a predicted trypsin epitope as query. Our virtual screening framework may be suited for rapid identification of PPI inhibitors and suggesting bioactive tool compounds.

16.
Under the current chemicals legislation, the regulatory use of structure-activity relationships (SARs) and quantitative structure-activity relationships (QSARs), collectively referred to as (Q)SARs, for the assessment of chemicals is limited, partly due to concerns about the extent to which (Q)SAR estimates can be relied upon. On 29 October 2003, the European Commission adopted a legislative proposal that foresees the introduction of a new regulatory system for chemicals called REACH (Registration, Evaluation, and Authorisation of Chemicals), which will impose equivalent information requirements on both new and existing chemicals. For reasons of practicality, cost-effectiveness and animal welfare, it is envisaged that (Q)SARs will play an important role in the assessment of some 30,000 existing chemicals for which further information may be required under the REACH system. It will therefore be essential that the (Q)SAR models used will produce reliable estimates. To overcome the barriers in the acceptance of (Q)SARs for regulatory purposes, it is widely acknowledged that there needs to be international agreement on the principles of (Q)SAR validation, and that the process of (Q)SAR validation should be managed by independent organisations, with a view to providing independent advice to the regulators who make decisions on the acceptability of (Q)SARs. The European Centre for the Validation of Alternative Methods (ECVAM), which is part of the European Commission's Joint Research Centre (JRC), has a well-established role in providing independent scientific and technical advice to European policy makers. This paper describes progress made at an international level regarding the principles of validation, and explains the role of ECVAM regarding the practical validation of (Q)SARs.

17.
18.
Traditional quantitative structure-activity relationship (QSAR) models aim to capture global structure-activity trends present in a data set. In many situations, there may be groups of molecules which exhibit a specific set of features which relate to their activity or inactivity. Such a group of features can be said to represent a local structure-activity relationship. Traditional QSAR models may not recognize such local relationships. In this work, we investigate the use of local lazy regression (LLR), which obtains a prediction for a query molecule using its local neighborhood, rather than considering the whole data set. This modeling approach is especially useful for very large data sets because no a priori model need be built. We applied the technique to three biological data sets. In the first case, the root-mean-square error (RMSE) for an external prediction set was 0.94 log units versus 0.92 log units for the global model. However, LLR was able to characterize a specific group of anomalous molecules with much better accuracy (0.64 log units versus 0.70 log units for the global model). For the second data set, the LLR technique resulted in a decrease in RMSE from 0.36 log units to 0.31 log units for the external prediction set. In the third case, we obtained an RMSE of 2.01 log units versus 2.16 log units for the global model. In all cases, LLR led to a few observations being poorly predicted compared to the global model. We present an analysis of why this was observed and possible improvements to the local regression approach.
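
The idea of local lazy regression (no model is built up front; a small regression is fitted on the query's neighbours at prediction time) can be sketched as follows. This is a simplified illustration, not the authors' code: it assumes a single numeric descriptor per molecule, absolute difference as the distance, and an ordinary least-squares line on the k nearest neighbours. The data and names are hypothetical.

```python
import math


def local_lazy_predict(train_x, train_y, query_x, k=3):
    """Fit a least-squares line on the k nearest training points and evaluate it."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query_x))
    nearest = order[:k]
    xs = [train_x[i] for i in nearest]
    ys = [train_y[i] for i in nearest]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        # All neighbours share one descriptor value: fall back to their mean.
        return mean_y
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y + slope * (query_x - mean_x)


def rmse(predicted, observed):
    """Root-mean-square error between two equally long sequences."""
    total = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(total / len(observed))


# Hypothetical data with two local trends: rising for x <= 2, falling for x >= 3.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [0.0, 1.0, 2.0, 4.0, 2.0, 0.0]
prediction = local_lazy_predict(train_x, train_y, query_x=4.5, k=3)
```

A single global line through all six points would blur the two trends, whereas the local fit around x = 4.5 uses only the falling segment. The flip side, consistent with the abstract's closing remark, is that a poorly chosen neighbourhood can give a few badly predicted observations.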

19.
20.
Pharmacophore multiplets are useful tools for 3D database searching, with the queries used ordinarily being derived from ensembles of random conformations of active ligands. It seems reasonable to expect that their usefulness can be augmented by instead using queries derived from single ligand conformations obtained from aligned ligands. Comparisons of pharmacophore multiplet searching using random conformations with multiplet searching using single conformations derived from GALAHAD (a genetic algorithm with linear assignment for hypermolecular alignment of datasets) models do indeed show that, while query hypotheses based on random conformations are quite effective, hypotheses based on aligned conformations do a better job of discriminating between active and inactive compounds. In particular, the hypothesis created from a neuraminidase inhibitor model was more similar to half of 18 known actives than all but 0.2% of the compounds in a structurally diverse subset of the World Drug Index. Similarly, a model developed from five angiotensin II antagonists yielded hypotheses that placed 65 known antagonists within the top 0.1–1% of decoy databases. The differences in discriminating power ranged from 2 to 20-fold, depending on the protein target and the type of pharmacophore multiplet used.
