Similar Articles
20 similar articles found.
1.
A multiobjective evolutionary algorithm (MOEA) is described for evolving multiple structure-activity relationships (SARs). The SARs are encoded as easy-to-interpret reduced graph queries that describe features preferentially present in active compounds compared with inactives. The MOEA addresses a limitation of many machine learning methods: the inherent tradeoff between recall and precision, which is usually handled by combining the two objectives into a single measure, with a consequent loss of control. By optimizing recall and precision simultaneously, the MOEA generates a family of SARs that lie on the precision-recall (PR) curve. The user can then select a query with an appropriate balance between the two objectives: for example, a low-recall, high-precision query may be preferred when establishing the SAR, whereas a high-recall, low-precision query may be more appropriate in a virtual screening context. Each query on the PR curve aims to capture the structure-activity information in a single representation, and each can be considered an alternative, equally valid solution. We then investigate combining individual queries into teams with the aim of capturing the multiple SARs that may exist in a data set, as is common in high-throughput screening data sets. Team formation is carried out iteratively as a postprocessing step after the individual queries have been evolved. Including uniqueness as a third objective within the MOEA provides an effective way of ensuring that the queries are complementary in the active compounds they describe. Substantial improvements in both recall and precision are seen for some data sets. Furthermore, the resulting queries provide more detailed structure-activity information than is present in a single query.
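The family of queries "on the PR curve" is the set of non-dominated solutions for the two objectives. A minimal sketch of that selection step follows, assuming each candidate query has already been scored by recall and precision against a labelled data set. The `Query` type and all function names are hypothetical; the reduced-graph encoding, the evolutionary operators, and the uniqueness objective of the published MOEA are not reproduced.

```python
from typing import NamedTuple


class Query(NamedTuple):
    """A candidate SAR query scored against a labelled data set (hypothetical type)."""
    name: str
    recall: float
    precision: float


def recall_precision(hits: set[str], actives: set[str]) -> tuple[float, float]:
    """Recall = TP / |actives|; precision = TP / |hits| (TP = actives retrieved)."""
    tp = len(hits & actives)
    recall = tp / len(actives) if actives else 0.0
    precision = tp / len(hits) if hits else 0.0
    return recall, precision


def dominates(a: Query, b: Query) -> bool:
    """a dominates b if it is no worse on both objectives and strictly better on one."""
    no_worse = a.recall >= b.recall and a.precision >= b.precision
    better = a.recall > b.recall or a.precision > b.precision
    return no_worse and better


def pareto_front(queries: list[Query]) -> list[Query]:
    """Non-dominated queries (the PR-curve family), ordered by increasing recall."""
    front = [q for q in queries if not any(dominates(o, q) for o in queries)]
    return sorted(front, key=lambda q: q.recall)
```

For candidates scored (recall, precision) = (0.2, 0.9), (0.6, 0.5), (0.5, 0.4) and (0.9, 0.2), the third is dominated by the second and dropped; the remaining three span the tradeoff from high-precision to high-recall queries, from which a user would pick one.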

3.
Similarity-based methods for virtual screening are widely used. However, conventional searching with 2D chemical fingerprints or 2D graphs may retrieve only compounds that are structurally very similar to the original target molecule. Of particular current interest, therefore, is scaffold hopping: the ability to identify molecules that belong to different chemical series but could form the same interactions with a receptor. Reduced graphs provide summary representations of chemical structures and therefore offer the potential to retrieve compounds that are similar in terms of their gross features rather than at the atom-bond level. Using only a fingerprint representation of such graphs, we have previously shown that the actives retrieved were more diverse than those found using Daylight fingerprints. Maximum common substructures give an intuitively reasonable view of the similarity between two molecules; however, computing them with graph-matching techniques is too time-consuming for practical similarity searching in larger data sets. In this work, we exploit the low cardinality (small number of nodes) of the reduced graph in graph-based similarity searching. We reinterpret the reduced graph as a fully connected graph using the bond-distance information of the original graph. We describe searches on the fully connected reduced graphs using both the maximum common induced subgraph and the maximum common edge subgraph formulations, and compare the results with those obtained using both conventional chemical fingerprints and reduced graph fingerprints. We show that graph matching using fully connected reduced graphs is an effective retrieval method and that the actives it retrieves are likely to be topologically different from those retrieved using conventional 2D methods.
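The step of turning a reduced graph into a fully connected graph can be sketched as follows. This assumes each reduced-graph node maps to a set of atoms in the original molecule, that the molecule is a single connected component, and that the edge label between two nodes is the minimum bond count between any of their atoms. That distance definition is an assumption for illustration (the published method may measure it differently), and the function names are hypothetical. The maximum common subgraph search itself is not shown.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj: dict[int, list[int]], start: int) -> dict[int, int]:
    """Topological (bond-count) distance from one atom to every reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nbr in adj[atom]:
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                queue.append(nbr)
    return dist


def fully_connected_reduced_graph(
    adj: dict[int, list[int]], groups: dict[str, set[int]]
) -> dict[tuple[str, str], int]:
    """Label every pair of reduced-graph nodes with the minimum bond distance
    between any atom of one node and any atom of the other (assumed definition)."""
    all_dist = {atom: bfs_distances(adj, atom) for atom in adj}
    edges = {}
    for u, v in combinations(sorted(groups), 2):
        edges[(u, v)] = min(all_dist[a][b] for a in groups[u] for b in groups[v])
    return edges
```

For a six-atom chain 0-1-2-3-4-5 with reduced nodes A = {0, 1}, B = {3} and C = {5}, every node pair receives an edge (A-B = 2, A-C = 4, B-C = 2), including pairs that were not adjacent in the original reduced graph. Because reduced graphs have few nodes, exhaustive all-pairs BFS stays cheap.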

4.
We developed a novel approach called SHAFTS (SHApe-FeaTure Similarity) for 3D molecular similarity calculation and ligand-based virtual screening. SHAFTS adopts a hybrid similarity metric that combines molecular shape with colored (labeled) chemistry groups annotated by pharmacophore features for 3D similarity calculation and ranking; it is designed to integrate the strengths of pharmacophore matching and volumetric overlay approaches. A feature triplet hashing method is used for fast enumeration of molecular alignment poses, and the optimal superposition between the target and query molecules is prioritized by calculating the corresponding "hybrid similarities". SHAFTS is suitable for large-scale virtual screening with single or multiple bioactive compounds as query "templates", regardless of whether experimentally determined conformations are available. Two public test sets (DUD and Jain's sets), which include active and decoy molecules for a panel of useful drug targets, were used to evaluate virtual screening performance. SHAFTS outperformed several other widely used virtual screening methods in enrichment of both known active compounds and novel chemotypes, indicating its robustness in identifying hit compounds and its potential for scaffold hopping in virtual screening.

5.
An activity landscape model of a compound data set can be rationalized as a graphical representation that integrates molecular similarity and potency relationships. Activity landscape representations of different designs are used to aid the analysis of structure-activity relationships and the selection of informative compounds. Activity landscape models reported thus far focus on a single target (i.e., a single biological activity) or at most two targets, the latter giving rise to selectivity landscapes. For compounds active against more than two targets, landscapes representing multitarget activities are difficult to conceptualize and have not yet been reported. Herein, we present a first activity landscape design that integrates compound potency relationships across multiple targets in a formally consistent manner. These multitarget activity landscapes are based on a general activity cliff classification scheme and are visualized as graphs in which activity cliffs are represented as edges. Furthermore, the contributions of individual compounds to structure-activity relationship discontinuity across multiple targets are monitored. The methodology has been applied to derive multitarget activity landscapes for compound data sets active against different target families. The resulting landscapes identify single-, dual-, and triple-target activity cliffs and reveal hierarchical cliff distributions. From these multitarget activity landscapes, compounds forming complex activity cliffs can be readily selected.
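The graph construction described above (cliffs as edges, classified by how many targets they span) can be sketched under simplifying assumptions. Here a pair counts as a cliff on a target if its precomputed similarity meets `sim_cut` and its potency difference on that target meets `diff_cut`. Both thresholds, their default values, and the function names are hypothetical choices for illustration, not the published classification scheme, which may also distinguish the direction of potency changes.

```python
from itertools import combinations


def multitarget_cliffs(
    potency: dict[str, dict[str, float]],
    similarity: dict[tuple[str, str], float],
    sim_cut: float = 0.8,   # hypothetical similarity threshold
    diff_cut: float = 2.0,  # hypothetical potency-difference threshold (log units)
) -> dict[tuple[str, str], set[str]]:
    """Edges of the landscape graph: pair -> set of targets on which it forms a cliff.

    potency maps compound -> {target: potency value}; similarity is keyed by
    (i, j) with i < j in sorted order.
    """
    edges = {}
    for i, j in combinations(sorted(potency), 2):
        if similarity.get((i, j), 0.0) < sim_cut:
            continue
        shared = potency[i].keys() & potency[j].keys()
        targets = {t for t in shared if abs(potency[i][t] - potency[j][t]) >= diff_cut}
        if targets:
            edges[(i, j)] = targets
    return edges


def cliff_degree(edges: dict[tuple[str, str], set[str]]) -> dict[str, int]:
    """Number of cliff edges each compound takes part in (its contribution to SAR discontinuity)."""
    degree: dict[str, int] = {}
    for i, j in edges:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    return degree
```

The size of each edge's target set gives the cliff class (one target: single-target cliff, two: dual-target, three: triple-target), and compounds with high degree are candidates for the "complex activity cliffs" mentioned above.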

8.
Among the many learning algorithms that can be employed for deriving quantitative structure-activity relationships, regression trees have the advantage of being able to handle large data sets, perform key feature selection dynamically, and yield readily interpretable models. A conventional method of building a regression tree model is recursive partitioning, a fast greedy algorithm that works well in many, but not all, cases. This work introduces a novel method of data partitioning based on artificial ants. This method is shown to perform better than recursive partitioning on three well-studied data sets.
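For reference, the conventional recursive-partitioning baseline can be sketched as a greedy search for the single (feature, threshold) split that minimizes the total sum of squared errors, applied recursively. This is the greedy method the ant-based approach is compared against, not the ant-based method itself, which is not reproduced here. The stopping rules (`max_depth`, `min_size`) and all names are assumptions for illustration.

```python
def sse(values: list[float]) -> float:
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(X: list[list[float]], y: list[float]):
    """Greedy search for (feature, threshold, cost) minimizing SSE(left) + SSE(right)."""
    best = None
    for f in range(len(X[0])):
        # Exclude the largest value so that both sides are always non-empty.
        for threshold in sorted({row[f] for row in X})[:-1]:
            left = [yi for row, yi in zip(X, y) if row[f] <= threshold]
            right = [yi for row, yi in zip(X, y) if row[f] > threshold]
            cost = sse(left) + sse(right)
            if best is None or cost < best[2]:
                best = (f, threshold, cost)
    return best


def build_tree(X, y, depth=0, max_depth=3, min_size=2):
    """Recursive partitioning; a leaf is the mean activity, an inner node is (f, t, left, right)."""
    if depth >= max_depth or len(y) < 2 * min_size or sse(y) == 0.0:
        return sum(y) / len(y)
    split = best_split(X, y)
    if split is None:  # every feature is constant in this node
        return sum(y) / len(y)
    f, t, _ = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth, min_size),
            build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth, min_size))


def predict(tree, row: list[float]) -> float:
    """Follow splits from the root until a leaf value is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree
```

Each split is chosen only for its immediate SSE reduction, which is what makes recursive partitioning fast and also why it can miss better partitions that a global search, such as the ant-based method, might find.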

14.
Fragment-based searching and abstract representation of molecular features through reduced graphs have separately been used for virtual screening. Here, we combine these two approaches and apply the algorithm RedFrag to virtual screens both retrospectively and prospectively. It uses a new type of reduced graph that does not suffer from information loss during its construction and bypasses the need for feature definitions. Built upon chemical epitopes resulting from molecule fragmentation, the reduced graph embodies the physicochemical and 2D-structural properties of a molecule. Reduced graphs are compared with a maximal common subgraph algorithm driven by a continuous similarity distance, which calculates similarity at the fragmental and topological levels. The performance of the algorithm is evaluated by retrieval experiments using precompiled validation sets. The method is also assessed in a prospective setting by predicting and experimentally testing ligands for endothiapepsin, a challenging model protease; here, we identified five novel ligands with affinities as low as 2.08 μM. © 2015 Wiley Periodicals, Inc.

15.
On the basis of the recently introduced reduced graph concept of ErG (extended reduced graphs), a straightforward weighting approach (wErG) is proposed for including additional knowledge (e.g., structural or SAR information) in similarity searching procedures for virtual screening. This simple procedure is exemplified with three data sets, for which interaction patterns available from X-ray structures of native or peptidomimetic ligands bound to their target protein are used to significantly improve retrieval rates of known actives from the MDL Drug Data Report (MDDR) database. The results are compared with those of other virtual screening techniques such as Daylight fingerprints, FTrees, UNITY, and various FlexX docking protocols. wErG is shown to perform very well and consistently, independent of the target structure. On the basis of this (and the fact that ErG retrieves structurally more dissimilar compounds owing to its potential for scaffold hopping), the combination of wErG and FlexX is successfully explored. Overall, wErG is not only an easily applicable weighting procedure that efficiently identifies actives in large data sets but is also straightforward for both medicinal and computational chemists to understand, and it can therefore be driven by several kinds of project-related knowledge (e.g., X-ray, NMR, SAR, and site-directed mutagenesis data) at a very early stage of the hit identification process.
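The idea of weighting descriptor elements by prior knowledge can be illustrated with a generic weighted Tanimoto coefficient over sparse count vectors, where each key stands for a pharmacophore pair at a given topological distance. This is an assumed stand-in chosen for illustration, not the published ErG or wErG similarity formula; the key format, the `default` weight, and the function name are hypothetical.

```python
def weighted_tanimoto(
    x: dict[tuple, float],
    y: dict[tuple, float],
    weights: dict[tuple, float],
    default: float = 1.0,
) -> float:
    """sum_k w_k * min(x_k, y_k) / sum_k w_k * max(x_k, y_k) over all keys in x or y.

    Keys absent from `weights` get the `default` weight; absent counts are 0.
    """
    keys = x.keys() | y.keys()
    num = sum(weights.get(k, default) * min(x.get(k, 0.0), y.get(k, 0.0)) for k in keys)
    den = sum(weights.get(k, default) * max(x.get(k, 0.0), y.get(k, 0.0)) for k in keys)
    return num / den if den else 0.0
```

If a donor-acceptor pair at three bonds corresponds to an interaction seen in an X-ray complex, giving that key weight 3 raises the score of a compound that shares it with the query (in a small example, from 1/3 unweighted to 0.6), while unweighted pairs keep their usual influence.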

17.
A new method for analyzing structure-activity relationships is proposed. Using a simple quantitative index, one can readily identify "structure-activity cliffs": pairs of molecules that are most similar but have the largest change in activity. We show how this provides a graphical representation of the entire SAR that allows its salient features to be grasped quickly. In addition, the approach allows us to view the SARs in a data set at different levels of detail. The method is tested on two data sets that highlight its ability to extract SAR information easily. Finally, we demonstrate that the method is robust using a variety of computational control experiments and discuss possible applications of this technique to QSAR model evaluation.
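A minimal sketch of a pairwise cliff index, assuming it takes the ratio form |A_i - A_j| / (1 - sim(i, j)), so that large activity differences between highly similar molecules produce the largest values. Similarity is computed here as the Tanimoto coefficient on fingerprint bit sets; the fingerprint representation, the handling of identical fingerprints, and the function names are assumptions for illustration.

```python
import math
from itertools import combinations


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def cliff_index(act_i: float, act_j: float, sim: float) -> float:
    """|A_i - A_j| / (1 - sim); infinite when different activities share an identical fingerprint."""
    diff = abs(act_i - act_j)
    if sim >= 1.0:
        return math.inf if diff > 0 else 0.0
    return diff / (1.0 - sim)


def ranked_cliffs(fps: dict[str, set[int]], acts: dict[str, float]):
    """All compound pairs, ordered from the largest to the smallest index value."""
    scores = [((i, j), cliff_index(acts[i], acts[j], tanimoto(fps[i], fps[j])))
              for i, j in combinations(sorted(fps), 2)]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

Keeping only pairs above a chosen cutoff and drawing them as edges gives a graph view of the SAR; lowering the cutoff adds more pairs, which is one way to view the same data set at different levels of detail.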

18.
Chemical libraries contain thousands of compounds that need screening, which increases the need for computational methods that can rank or prioritize compounds. Virtual screening tools are widely used to improve the cost-effectiveness of lead discovery programs by ranking compound databases in order of decreasing probability of biological activity, following the probability ranking principle (PRP). In this paper, we developed a novel ranking approach for molecular compounds inspired by quantum mechanics, called the quantum probability ranking principle (QPRP). The QPRP ranking criteria draw an analogy between a physical experiment and the process of ranking molecular structures represented by 2D fingerprints in ligand-based virtual screening (LBVS). QPRP applies quantum concepts at three levels. First, at the representation level, the model develops a new framework for molecular representation by connecting molecular compounds with a mathematical quantum space. Second, the similarity between chemical libraries and reference compounds is estimated using a quantum-based similarity searching method. Finally, the molecules are ranked using the QPRP approach. Simulated virtual screening experiments with MDL Drug Data Report (MDDR) data sets showed that QPRP outperformed the classical PRP for ranking molecular compounds.
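The classical PRP baseline that QPRP is compared against can be sketched as sorting the database by decreasing score. Here the probability of activity is approximated by the maximum Tanimoto similarity of each compound's 2D fingerprint to any reference active; that proxy and the function names are assumptions for illustration. The quantum representation and the QPRP ranking step are not reproduced.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def prp_rank(database: dict[str, set[int]], references: list[set[int]]) -> list[str]:
    """Rank compound IDs by decreasing score (max similarity to any reference).

    Ties are broken by compound ID so the ranking is deterministic.
    """
    scores = {cid: max(tanimoto(fp, ref) for ref in references)
              for cid, fp in database.items()}
    return sorted(scores, key=lambda cid: (-scores[cid], cid))
```

Under PRP each compound is scored independently of the others already ranked, which is the assumption a quantum-inspired ranking principle relaxes.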

20.
As high-throughput screening systems become more routine in the drug discovery process, there is an increasing need for fast and reliable analysis of the massive amounts of data they produce. At the forefront of the methods used is data reduction, often assisted by cluster analysis. Activity thresholds reduce the data set under investigation to a manageable size, while clustering enables the detection of natural groups in that reduced subset, thereby revealing families of compounds that exhibit increased activity toward a specific biological target. This process, designed primarily for data sets much smaller than those currently produced by high-throughput screening systems, has become one of the main bottlenecks of the modern drug discovery process. In addition to being fragmented and heavily dependent on human experts, it ignores all screening information for compounds with activity below the chosen threshold and thus, at best, can discover only a subset of the knowledge available in the screening data sets. To address these deficiencies of the current screening data analysis process, the authors have developed a new method that thoroughly analyzes large screening data sets. In this report we describe the new approach in detail and present its main differences from the methods currently in use. Further, we analyze a well-known, publicly available data set using the proposed method. Our experimental results show that the proposed method can significantly improve both the ease of extraction and the amount of knowledge discovered from screening data sets.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (京ICP备09084417号)