Similar literature
Found 20 similar articles; search took 187 ms
1.
A hierarchical clustering algorithm, NIPALSTREE, was developed that can analyze large data sets in high-dimensional space. The result can be displayed as a dendrogram. At each tree level the algorithm projects a data set via principal component analysis onto one dimension. The data set is sorted according to this one dimension and split at the median position. To avoid distortion of clusters at the median position, the algorithm identifies a potentially better-suited split point to the left or right of the median. The procedure is recursively applied on the resulting subsets until the maximal distance between cluster members exceeds a user-defined threshold. The approach was validated in a retrospective screening study for angiotensin converting enzyme (ACE) inhibitors. The resulting clusters were assessed for their purity and enrichment in actives belonging to this ligand class. Enrichment was observed in individual branches of the dendrogram. In further retrospective virtual screening studies employing the MDL Drug Data Report (MDDR), COBRA, and the SPECS catalog, NIPALSTREE was compared with the hierarchical k-means clustering approach. Results show that both algorithms can be used in the context of virtual screening. Intersecting the result lists obtained with both algorithms improved enrichment factors while losing only a few chemotypes.
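The recursive project, sort, and split loop described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the published NIPALSTREE code: it uses a NIPALS-style power iteration for the first principal component, always splits exactly at the median (the split-point refinement is omitted), and stops on a hypothetical `min_size` parameter instead of the distance threshold mentioned in the abstract. All function names are assumptions made for the example.

```python
import math

def first_pc_scores(rows, iters=100, tol=1e-10):
    """Scores of each row on the first principal component (NIPALS-style iteration)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    X = [[r[j] - means[j] for j in range(d)] for r in rows]  # mean-centred data
    # Start from the column with the largest sum of squares.
    start = max(range(d), key=lambda j: sum(x[j] ** 2 for x in X))
    t = [x[start] for x in X]
    for _ in range(iters):
        tt = sum(v * v for v in t)
        if tt == 0:
            break  # all points identical along the current direction
        # Loading vector p = X^T t / (t^T t), then normalised to unit length.
        p = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(d)]
        norm = math.sqrt(sum(v * v for v in p))
        if norm == 0:
            break
        p = [v / norm for v in p]
        # New scores t = X p.
        t_new = [sum(X[i][j] * p[j] for j in range(d)) for i in range(n)]
        diff = sum((a - b) ** 2 for a, b in zip(t, t_new))
        t = t_new
        if diff < tol:
            break
    return t

def split_tree(rows, ids=None, min_size=2):
    """Recursively split row indices at the median of their first-PC scores.

    Returns a nested list: leaves are lists of row indices.
    min_size is an assumed stopping rule, not the paper's distance criterion.
    """
    if ids is None:
        ids = list(range(len(rows)))
    if len(ids) <= max(min_size, 1):
        return ids
    scores = first_pc_scores([rows[i] for i in ids])
    order = [i for _, i in sorted(zip(scores, ids))]
    mid = len(order) // 2  # plain median split, no refinement
    return [split_tree(rows, order[:mid], min_size),
            split_tree(rows, order[mid:], min_size)]

# Two well-separated groups of two points each.
points = [(0.0, 0.0), (0.1, 0.2), (10.0, 10.0), (10.2, 9.9)]
tree = split_tree(points, min_size=2)
```

With these four points the first split separates the two groups, so `tree` holds two leaves of two indices each.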

2.
3.
4.
5.
Small molecule aggregators non‐specifically inhibit multiple unrelated proteins, rendering them therapeutically useless. They frequently appear as false hits and thus need to be eliminated in high‐throughput screening campaigns. Computational methods have been explored for identifying aggregators, but these have not been tested in screening large compound libraries. We used 1319 aggregators and 128,325 non‐aggregators to develop a support vector machine (SVM) aggregator identification model, which was tested by four methods. The first is five‐fold cross‐validation, which showed comparable aggregator and significantly improved non‐aggregator identification rates compared with earlier studies. The second is an independent test of 17 aggregators discovered independently from the training aggregators, 71% of which were correctly identified. The third is retrospective screening of 13M PUBCHEM and 168K MDDR compounds, which predicted 97.9% and 98.7% of the PUBCHEM and MDDR compounds, respectively, as non‐aggregators. The fourth is retrospective screening of 5527 MDDR compounds similar to the known aggregators, 1.14% of which were predicted as aggregators. SVM showed slightly better overall performance than two other machine learning methods in five‐fold cross‐validation studies with the same settings. Molecular features of aggregation, extracted by a feature selection method, are consistent with published profiles. SVM showed substantial capability in identifying aggregators from large libraries at low false‐hit rates. © 2009 Wiley Periodicals, Inc. J Comput Chem, 2010

6.
Virtual screening benchmarking studies were carried out on 11 targets to evaluate the performance of three commonly used approaches: 2D ligand similarity (Daylight, TOPOSIM), 3D ligand similarity (SQW, ROCS), and protein structure-based docking (FLOG, FRED, Glide). Active and decoy compound sets were assembled from both the MDDR and the Merck compound databases. Averaged over multiple targets, ligand-based methods outperformed docking algorithms. This was true for 3D ligand-based methods only when chemical typing was included. Using mean enrichment factor as a performance metric, Glide appears to be the best of the three docking methods, with FRED a close second. Results for all virtual screening methods are database dependent and can vary greatly for particular targets.

7.
To identify novel chemical classes of factor Xa inhibitors, five scoring functions (FlexX, DOCK, GOLD, ChemScore and PMF) were used to evaluate the multiple docking poses generated by FlexX. The compound collection was composed of confirmed potent factor Xa inhibitors and a subset of the LeadQuest screening compound library. All scoring functions except PMF succeeded in reproducing the crystal complex (PDB code: 1FAX). During virtual screening the highest hit rate (80%) was achieved by FlexX at an energy cutoff of -40 kJ/mol, which is about 40-fold higher than random screening (2.06%). Limited results suggest that presenting more poses of a single molecule to the scoring functions could reduce their enrichment factors. A series of promising scaffolds with favorable binding scores was retrieved from LeadQuest. Consensus scoring by pair-wise intersection failed to improve on the hit rate obtained with single scoring functions (i.e. FlexX). We note that reported successes of consensus scoring in hit rate enrichment could be artifacts, because the comparisons were based on a selected subset of single scoring and a markedly reduced subset of double or triple scoring. The findings presented in this report are based upon a single biological system and support further studies.
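The "about 40-fold" figure follows directly from the two hit rates quoted in this abstract, and consensus scoring by pair-wise intersection is a plain set intersection of two hit lists. The sketch below works through both; the compound identifiers and function names are hypothetical and chosen only for illustration.

```python
def enrichment_factor(selected_hit_rate, random_hit_rate):
    """Ratio of the hit rate in the selected subset to the random hit rate."""
    return selected_hit_rate / random_hit_rate

# Hit rates quoted in the abstract: 80% for FlexX at the cutoff, 2.06% at random.
ef_flexx = enrichment_factor(0.80, 0.0206)  # roughly 38.8, i.e. "about 40-fold"

def consensus_hits(hits_a, hits_b):
    """Pair-wise consensus: compounds selected by both scoring functions."""
    return set(hits_a) & set(hits_b)

def hit_rate(selected, actives):
    """Fraction of the selected compounds that are known actives."""
    selected = set(selected)
    return len(selected & set(actives)) / len(selected) if selected else 0.0

# Hypothetical toy data: identifiers are invented for the example.
actives = {"c1", "c2", "c3"}
picked_by_a = {"c1", "c2", "c4", "c5"}
picked_by_b = {"c2", "c3", "c5", "c6"}
both = consensus_hits(picked_by_a, picked_by_b)  # {"c2", "c5"}
```

In this toy case the consensus list is half the size of either single list while its hit rate stays at 50%, which illustrates the abstract's caution: comparing a small consensus subset against a differently sized single-scoring subset can make an apparent gain hard to interpret.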

8.
A virtual screening method is presented that is grounded on a receptor-derived pharmacophore model termed "virtual ligand" or "pseudo-ligand". The model represents an idealized constellation of potential ligand sites that interact with residues of the binding pocket. For rapid virtual screening of compound libraries the potential pharmacophore points of the virtual ligand are encoded as an alignment-free correlation vector, avoiding spatial alignment of pharmacophore features between the pharmacophore query (i.e., the virtual ligand) and the candidate molecule. The method was successfully applied to retrieving factor Xa inhibitors from a Ugi three-component combinatorial library, and yielded high enrichment of actives in a retrospective search for cyclooxygenase-2 (COX-2) inhibitors. The approach provides a concept for "de-orphanizing" potential drug targets and identifying ligands for hitherto unexplored or allosteric binding pockets.
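One common way to make a pharmacophore encoding alignment-free is to count pairs of feature types at binned inter-point distances, since distances do not change under rotation or translation. The sketch below shows that generic idea; the abstract does not specify the actual encoding, so the feature types, bin settings, and function name here are all assumptions.

```python
import math
from itertools import combinations

# Hypothetical feature types; the real model may use a different set.
TYPES = ("donor", "acceptor", "hydrophobic")

def correlation_vector(points, n_bins=5, bin_width=2.0):
    """Count feature-type pairs per distance bin.

    points: list of (feature_type, (x, y, z)).
    Returns a flat list indexed by (unordered type pair, distance bin).
    """
    pair_types = [(a, b) for i, a in enumerate(TYPES) for b in TYPES[i:]]
    index = {pt: k for k, pt in enumerate(pair_types)}
    vec = [0] * (len(pair_types) * n_bins)
    for (t1, p1), (t2, p2) in combinations(points, 2):
        b = int(math.dist(p1, p2) // bin_width)
        if b >= n_bins:
            continue  # pair is farther apart than the last bin covers
        key = (t1, t2) if (t1, t2) in index else (t2, t1)
        vec[index[key] * n_bins + b] += 1
    return vec

query = [("donor", (0.0, 0.0, 0.0)),
         ("acceptor", (3.0, 0.0, 0.0)),
         ("hydrophobic", (0.0, 5.0, 0.0))]

# Same constellation after a translation and a cyclic axis permutation (a rotation).
moved = [(t, (y + 10.0, z - 2.0, x + 7.0)) for t, (x, y, z) in query]

v_query = correlation_vector(query)
v_moved = correlation_vector(moved)
```

Because only inter-point distances enter the vector, `v_query` and `v_moved` are identical, so two such vectors can be compared directly without superimposing the molecules.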

9.
Annotation efforts in the biosciences have in past years focused mainly on the annotation of genomic sequences. Only very limited effort has been put into annotation schemes for pharmaceutical ligands. Here we propose annotation schemes for the ligands of four major target classes: enzymes, G protein-coupled receptors (GPCRs), nuclear receptors (NRs), and ligand-gated ion channels (LGICs), and outline their use for in silico screening and combinatorial library design. The proposed schemes cover ligand functionality and hierarchical levels of target classification. The classification schemes are based on those established by the EC, GPCRDB, NuclearDB, and LGICDB. The ligands of the MDL Drug Data Report (MDDR) database serve as a reference data set of known pharmacologically active compounds. All ligands were annotated according to the schemes when attribution was possible based on the activity classification provided by the reference database. The purpose of the ligand-target classification schemes is to allow annotation-based searching of the ligand database. In addition, the biological sequence information of the target is directly linkable to the ligand, thereby allowing sequence similarity-based identification of ligands of the next homologous receptors. Ligands of specified levels can easily be retrieved to serve as comprehensive reference sets for cheminformatics-based similarity searches and for the design of target class focused compound libraries. Retrospective in silico screening experiments within the MDDR01.1 database, searching for structures binding to dopamine D2, all dopamine receptors, and all amine-binding class A GPCRs using known dopamine D2 binding compounds as a reference set, have shown that such reference sets are particularly useful for identifying ligands that bind to receptors closely related to the reference system. The potential for ligand identification drops with increasing phylogenetic distance. The analysis of the focus of a tertiary amine-based combinatorial library compared to known amine-binding class A GPCRs, peptide-binding class A GPCRs, and LGIC ligands constitutes a second application scenario, which illustrates how the focus of a combinatorial library can be treated quantitatively. The provided annotation schemes, which bridge chem- and bioinformatics by linking ligands to sequences, are expected to be of key utility for further systematic chemogenomics exploration of previously well explored target families.

10.
As a result of recent developments in high-throughput screening for drug discovery, the number of available screening compounds has been growing rapidly. Chemical vendors provide millions of compounds; however, these compounds are highly redundant. Clustering analysis, a technique that groups similar compounds into families, can be used to analyze such redundancy. Many available clustering methods focus on accurate classification of compounds; they are slow and are not suitable for very large compound libraries. A fast clustering method based on an incremental clustering algorithm and the 2D fingerprints of compounds is described here. This method can cluster a very large data set with millions of compounds in hours on a single computer. A program implemented with this method, called cd-hit-fp, is available from http://chemspace.org.
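A greedy incremental scheme of the kind named in this abstract can be sketched as follows: each compound is compared only with the current cluster representatives and either joins the first sufficiently similar cluster or founds a new one. This is an illustration under stated assumptions, not the cd-hit-fp implementation; fingerprints are modelled as sets of on-bit positions, Tanimoto similarity is assumed as the measure, and the 0.7 threshold is arbitrary.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def incremental_cluster(fingerprints, threshold=0.7):
    """Greedy single-pass clustering.

    fingerprints: dict mapping compound id -> set of on bits, processed in order.
    Returns a dict mapping each representative id -> list of member ids.
    """
    representatives = []  # (representative id, fingerprint) pairs
    clusters = {}
    for cid, fp in fingerprints.items():
        for rep_id, rep_fp in representatives:
            if tanimoto(fp, rep_fp) >= threshold:
                clusters[rep_id].append(cid)  # join the first matching cluster
                break
        else:
            # No representative is similar enough: start a new cluster.
            representatives.append((cid, fp))
            clusters[cid] = [cid]
    return clusters

# Hypothetical toy fingerprints.
fps = {
    "a": {1, 2, 3, 4},
    "b": {1, 2, 3, 4, 5},    # Tanimoto 0.8 to "a"
    "c": {10, 11, 12},
    "d": {10, 11, 12, 13},   # Tanimoto 0.75 to "c"
}
clusters = incremental_cluster(fps)
```

The speed of this style of algorithm comes from comparing each compound with representatives only, rather than with every other compound; the price is that the result depends on input order.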

11.
12.
Polypharmacology describes the binding of a ligand to multiple protein targets (a promiscuous ligand) or multiple diverse ligands binding to a given target (a promiscuous target). Pharmaceutical companies are discovering increasing numbers of both promiscuous drugs and drug targets. Hence, polypharmacology is now recognized as an important aspect of drug design. Here, we describe a new and fast way to predict polypharmacological relationships between drug classes quantitatively, which we call Gaussian Ensemble Screening (GES). This approach represents a cluster of molecules with similar spherical harmonic surface shapes as a Gaussian distribution with respect to a selected center molecule. Calculating the Gaussian overlap between pairs of such clusters allows the similarity between drug classes to be calculated analytically without requiring thousands of bootstrap comparisons, as in current promiscuity prediction approaches. We find that such cluster similarity scores also follow a Gaussian distribution. Hence, a cluster similarity score may be transformed into a probability value, or "p-value", in order to quantify the relationships between drug classes. We present results obtained when using the GES approach to predict relationships between drug classes in a subset of the MDL Drug Data Report (MDDR) database. Our results indicate that GES is a useful way to study polypharmacology relationships, and it could provide a novel way to propose new targets for drug repositioning.
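The two analytic steps mentioned in this abstract, a closed-form overlap between Gaussians and the conversion of a Gaussian-distributed score into a p-value, can be illustrated in one dimension. The abstract gives no formulas, so the sketch below is only a simplified analogue: it uses the standard overlap integral of two normalized 1D Gaussians and an upper-tail p-value, and the choice of tail and all names are assumptions.

```python
import math

def gaussian_overlap(mu1, sigma1, mu2, sigma2):
    """Integral over x of N(x; mu1, sigma1^2) * N(x; mu2, sigma2^2).

    The product of two Gaussians integrates to a Gaussian in (mu1 - mu2)
    with variance sigma1^2 + sigma2^2, so no sampling is needed.
    """
    var = sigma1 ** 2 + sigma2 ** 2
    return math.exp(-((mu1 - mu2) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def upper_tail_p_value(score, background_mean, background_sd):
    """Probability of a score at least this large under a Gaussian background."""
    z = (score - background_mean) / background_sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))

same_centre = gaussian_overlap(0.0, 1.0, 0.0, 1.0)   # equals 1 / sqrt(4 * pi)
far_apart = gaussian_overlap(0.0, 1.0, 3.0, 1.0)     # smaller overlap
```

The overlap shrinks as the cluster centres move apart or the clusters become broader, and a score equal to the background mean maps to a p-value of 0.5.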

13.
A spectral clustering method is presented and applied to two-dimensional molecular structures, where it has been found particularly useful in the analysis of screening data. The method provides a means to quantify (1) the degree of intermolecular similarity within a cluster and (2) the contribution that the features of a molecule make to a cluster. In an application of the spectral clustering method to an example data set of 125 COX-2 inhibitors, these two criteria were used to place the molecules into clusters of chemically related two-dimensional structures.

14.
15.
The papain-family cathepsins are cysteine proteases that are emerging as promising therapeutic targets for a number of human disease conditions ranging from osteoporosis to cancer. Relatively few selective inhibitors for this family exist, and the in vivo selectivity of most existing compounds is unclear. We present here the synthesis of focused libraries of epoxysuccinyl-based inhibitors and their screening in crude tissue extracts. We identified a number of potent inhibitors that display selectivity for endogenous cathepsin targets both in vitro and in vivo. Importantly, the selectivity patterns observed in crude extracts were generally retained in vivo, as assessed by active-site labeling of tissues from treated animals. Overall, this study identifies several important compound classes and highlights the use of activity-based probes to assess pharmacodynamic properties of small-molecule inhibitors in vivo.

16.
17.
A computational method to rapidly assess and visualize the diversity in molecular shape associated with a given compound set has been developed. Normalized ratios of principal moments of inertia are plotted into two-dimensional triangular graphs and then used to compare the shape space covered by different compound sets, such as combinatorial libraries of varying size and composition. We have further developed a computational method to analyze interset similarity in terms of shape space coverage, which allows the shape redundancy between the different subsets of a given compound collection to be analyzed in a quantitative way. The shape space coverage has been found to originate mainly from the nature and the 3D-geometry (but not the size) of the central scaffold, while the number and nature of the peripheral substituents and conformational aspects were shown to be of minor importance. Substantial shape space coverage has been correlated with broad biological activity by applying the same shape analysis to collections of known bioactive compounds, such as MDDR and the GOLD-set. The aggregate of our results corroborates the intuitive notion that molecular shape is intimately linked to biological activity and that a high degree of shape (hence scaffold) diversity in screening collections will increase the odds of addressing a broad range of biological targets.

18.
The scaffold diversity of seven representative commercial and proprietary compound libraries is explored for the first time using both Murcko frameworks and Scaffold Trees. We show that Level 1 of the Scaffold Tree is useful for the characterization of scaffold diversity in compound libraries and offers advantages over the use of Murcko frameworks. This analysis also demonstrates that the majority of compounds in the libraries we analyzed contain only a small number of well-represented scaffolds and that a high percentage of singleton scaffolds represent the remaining compounds. We use Tree Maps to clearly visualize the scaffold space of representative compound libraries, for example, to display highly populated scaffolds and clusters of structurally similar scaffolds. This study further highlights the need for diversification of compound libraries used in hit discovery by focusing library enrichment on the synthesis of compounds with novel or underrepresented scaffolds.

19.
A molecular equivalence number (meqnum) classifies a molecule with respect to a class of structural features or topological shapes such as its cyclic system or its set of functional groups. Meqnums can be used to organize molecular structures into nonoverlapping, yet highly relatable classes. We illustrate the construction of some different types of meqnums and present via examples some methods of comparing diverse chemical libraries based on meqnums. In the examples we compare a library which is a random sample from the MDL Drug Data Report (MDDR) with a library which is a random sample from the Available Chemical Directory (ACD). In our analyses, we discover some interesting features of the topological shape of a molecule and its set of functional groups that are strongly linked with compounds occurring in the MDDR but not in the ACD. We also illustrate the utility of molecular equivalence indices in delineating the structural domain over which an SAR conclusion is valid.

20.