Similar literature (20 results)
1.
For a systematic exploration of structural relationships between molecular scaffolds, ~24,000 unique scaffolds were extracted from 458 different target sets. Substructure relationships between these scaffolds were systematically determined. The scaffold tree data structure was utilized to study structural relationships between original scaffolds and derivative scaffolds obtained by rule-based decomposition. Leaf-to-root substructure relationships that resulted from rule-based decomposition were compared to leaf-to-leaf relationships between original scaffolds, most of which were not part of the scaffold tree hierarchy. Decomposed scaffolds not contained in active target set compounds were prioritized on the basis of hierarchical scaffold patterns and additional substructure relationships. For high-priority virtual scaffolds, activity predictions were carried out, and these scaffolds were often found in external test compounds having the predicted activity. Taken together, our results suggest that leaf-to-root substructure relationships in scaffold trees are best complemented with additional substructure relationships to determine high-priority virtual scaffolds for activity prediction.

2.
High-throughput screening (HTS) campaigns in pharmaceutical companies have accumulated a large amount of data for several million compounds over a couple of hundred assays. Despite the general awareness that rich information is hidden inside the vast amount of data, little has been reported on systematic data mining methods that can reliably extract relevant knowledge of interest for chemists and biologists. We developed a data mining approach based on an algorithm called ontology-based pattern identification (OPI) and applied it to our in-house HTS database. We identified nearly 1500 scaffold families with statistically significant structure-HTS activity profile relationships. Among them, dozens of scaffolds were characterized as leading to artifactual results stemming from the screening technology employed, such as assay format and/or readout. Four types of compound scaffolds can be characterized based on this data mining effort: tumor cytotoxic, general toxic, potential reporter gene assay artifact, and target family specific. The OPI-based data mining approach can reliably identify compounds that are not only structurally similar but also share statistically significant biological activity profiles. Statistical tests such as the Kruskal-Wallis test and analysis of variance (ANOVA) can then be applied to the discovered scaffolds for effective assignment of relevant biological information. The scaffolds identified by our HTS data mining efforts are an invaluable resource for designing SAR-robust diversity libraries, generating in silico biological annotations of compounds on a scaffold basis, and providing novel target family specific scaffolds for focused compound library design.

3.
The evaluation of the scaffold hopping potential of computational methods is of high relevance for virtual screening. For benchmark calculations, classes of known active compounds are utilized. Ideally, such classes should have a well-defined content of structurally diverse scaffolds. However, in reported benchmark investigations, the choice of activity classes is often difficult to rationalize. To provide a compendium of well-characterized test cases for the assessment of scaffold hopping potential, structural distances between scaffolds were systematically calculated for compound classes available in the ChEMBL database. Nearly seven million scaffold pairs were evaluated. On the basis of the global scaffold distance distribution, a threshold value for large scaffold distances was determined. Compound data sets were ranked based on the proportion of scaffold pairs with large distances they contained, taking additional criteria into account that are relevant for virtual screening. A set of 50 activity classes is provided that represent attractive test cases for scaffold hopping analysis and benchmark calculations.

4.

An activity cliff (AC) is formed by a pair of structurally similar compounds with a large difference in potency. Accordingly, ACs reveal structure–activity relationship (SAR) discontinuity and provide SAR information for compound optimization. Herein, we have investigated whether ACs could be predicted from image data. To this end, pairs of structural analogs that formed or did not form ACs were extracted from different compound activity classes. From these compound pairs, consistently formatted images were generated. Image sets were used to train and test convolutional neural network (CNN) models to systematically distinguish between ACs and non-ACs. The CNN models were found to predict ACs with overall high accuracy, as assessed using alternative performance measures, hence establishing proof-of-principle. Moreover, gradient weights from convolutional layers were mapped to test compounds and identified characteristic structural features that contributed to successful predictions. Weight-based feature visualization revealed the ability of CNN models to learn chemistry from images at a high level of resolution and aided in the interpretation of model decisions with intrinsic black box character.
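The activity cliff definition in this abstract (similar structures, large potency difference) can be illustrated with a minimal Python sketch. The cutoffs `sim_cutoff=0.8` and `potency_gap=2.0` (log units), the function name, and the toy data are assumptions made for illustration; the abstract does not state the criteria the authors used.

```python
from itertools import combinations

def find_activity_cliffs(potency, similarity, sim_cutoff=0.8, potency_gap=2.0):
    """Return compound pairs that are similar yet differ strongly in potency.

    potency:    dict mapping compound id -> potency on a log scale (e.g. pKi)
    similarity: dict mapping (id_a, id_b) -> precomputed similarity in [0, 1]
    Both cutoffs are illustrative assumptions, not values from the abstract.
    """
    cliffs = []
    for a, b in combinations(sorted(potency), 2):
        # Accept the pair key in either order; a missing pair counts as dissimilar.
        sim = similarity.get((a, b), similarity.get((b, a), 0.0))
        gap = abs(potency[a] - potency[b])
        if sim >= sim_cutoff and gap >= potency_gap:
            cliffs.append((a, b))
    return cliffs

# Invented toy data: cpd1 and cpd2 are similar but differ by 2.5 log units.
potency = {"cpd1": 8.5, "cpd2": 6.0, "cpd3": 8.3}
similarity = {("cpd1", "cpd2"): 0.9, ("cpd1", "cpd3"): 0.85, ("cpd2", "cpd3"): 0.4}
cliffs = find_activity_cliffs(potency, similarity)
print(cliffs)
```

Pairs that pass the similarity cutoff but not the potency cutoff (here `cpd1` and `cpd3`) would be the non-AC examples in a classification setting of the kind the abstract describes.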


5.
Computational scaffold hopping aims to identify core structure replacements in active compounds. To evaluate scaffold hopping potential from a principal point of view, regardless of the computational methods that are applied, a global analysis of conventional scaffolds in analog series from compound activity classes was carried out. The majority of analog series was found to contain multiple scaffolds, thus enabling the detection of intra-series scaffold hops among closely related compounds. More than 1000 activity classes were found to contain increasing proportions of multi-scaffold analog series. Thus, using such activity classes for scaffold hopping analysis is likely to overestimate the scaffold hopping (core structure replacement) potential of computational methods, due to an abundance of artificial scaffold hops that are possible within analog series.

6.
Dual and triple activity-difference (DAD/TAD) maps are tools for the systematic characterization of structure-activity relationships (SAR) of compound data sets screened against two or three targets. DAD and TAD maps are two- and three-dimensional representations of the pairwise activity differences of compound data sets, respectively. Adding pairwise structural similarity information into these maps readily reveals activity cliff regions in the SAR for one, two, or three targets. In addition, pairs of compounds in the smooth regions of the SAR and scaffold hops are also easily identified in these maps. Herein, DAD and TAD maps are employed for the systematic characterization of the SAR of a benchmark set of 299 compounds screened against dopamine, norepinephrine, and serotonin transporters. To reduce the well-known dependence of the activity landscape on the structural representation, five selected 2D and 3D structure representations were used to characterize the SAR. Systematic analysis of the DAD and TAD maps reveals regions in the landscape with similar SAR for two or all three targets as well as regions with inverse SAR, i.e., changes in structure that increase activity for one target but decrease activity for the other target. Focusing the analysis on pairs of compounds with high structure similarity revealed the presence of single-, dual-, and triple-target activity cliffs, i.e., small changes in structure with large changes in potency for one, two, or all three targets, respectively. Triple-target scaffold hops are also discussed. Activity cliffs and scaffold hops were also quantified and represented using two recently proposed approaches, namely mean Structure Activity Landscape Index (mean SALI) and Consensus Structure-Activity Similarity (SAS) maps.
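As a rough illustration of the quantities named in this abstract, the sketch below computes the coordinates of one compound pair in a DAD map and the pairwise Structure Activity Landscape Index, commonly defined as SALI(i, j) = |A_i - A_j| / (1 - sim(i, j)). The function names and the toy numbers are assumptions; the averaging over targets behind "mean SALI" and the consensus SAS maps are not reproduced here.

```python
import math

def sali(act_i, act_j, sim):
    """Pairwise SALI: potency difference scaled by structural dissimilarity.

    Identical representations (sim == 1) give an undefined ratio, so this
    sketch returns infinity for that case.
    """
    if sim >= 1.0:
        return math.inf
    return abs(act_i - act_j) / (1.0 - sim)

def dad_point(cpd_i, cpd_j):
    """Coordinates of one compound pair in a dual activity-difference map.

    Each compound is a tuple (activity_target1, activity_target2); the result
    is the signed activity difference for each target.
    """
    return (cpd_i[0] - cpd_j[0], cpd_i[1] - cpd_j[1])

# Invented toy values: potency rises for target 1 and falls for target 2,
# which corresponds to an "inverse SAR" region of the map.
point = dad_point((8.0, 5.0), (6.0, 7.0))
print(point, sali(8.0, 6.0, 0.5))
```

Points near the diagonal of such a map change in the same direction for both targets, while points near the anti-diagonal correspond to the inverse SAR the abstract mentions.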

7.
The extraction of SAR information from structurally diverse compound data sets is a challenging task. One of the focal points of systematic SAR analysis is the search for activity cliffs, that is, structurally similar compounds having large potency differences, from which SAR determinants can be deduced. The assessment of SAR information is usually based on pairwise similarity and potency comparisons of data set compounds. As a consequence, activity cliffs are mostly evaluated at a compound pair level. Here, we present an extension of the activity cliff concept by introducing "activity ridges" that are formed by overlapping "combinatorial" activity cliffs between participating compounds, giving rise to ridge-like structures in activity landscapes. Activity ridges are rich in SAR information. In a systematic analysis of 242 compound data sets, we have identified well-defined activity ridges in 71 different sets. In addition, an information-theoretic approach has been devised to characterize the structural composition of activity ridges. Taken together, our results show that activity ridges frequently occur in sets of active compounds and that different categories of ridges can be distinguished on the basis of their structural content. The computational identification of activity ridges provides access to compound subsets having high priority for SAR analysis.

8.
The scaffold diversity of 7 representative commercial and proprietary compound libraries is explored for the first time using both Murcko frameworks and Scaffold Trees. We show that Level 1 of the Scaffold Tree is useful for the characterization of scaffold diversity in compound libraries and offers advantages over the use of Murcko frameworks. This analysis also demonstrates that the majority of compounds in the libraries we analyzed contain only a small number of well represented scaffolds and that a high percentage of singleton scaffolds represent the remaining compounds. We use Tree Maps to clearly visualize the scaffold space of representative compound libraries, for example, to display highly populated scaffolds and clusters of structurally similar scaffolds. This study further highlights the need for diversification of compound libraries used in hit discovery by focusing library enrichment on the synthesis of compounds with novel or underrepresented scaffolds.

9.
From a medicinal chemistry point of view, one of the primary goals of high throughput screening (HTS) hit list assessment is the identification of chemotypes with an informative structure-activity relationship (SAR). Such chemotypes may enable optimization of the primary potency, as well as selectivity and pharmacokinetic properties. A common way to prioritize them is molecular clustering of the hits. Typical clustering techniques, however, rely on a general notion of chemical similarity or standard rules of scaffold decomposition and are thus insensitive to molecular features that are enriched in biologically active compounds. This hinders SAR analysis, because compounds sharing the same pharmacophore might not end up in the same cluster and thus are not directly compared to each other by the medicinal chemist. Similarly, common chemotypes that are not related to activity may contaminate clusters, distracting from important chemical motifs. We combined molecular similarity and Bayesian models and introduce (I) a robust, activity-aware clustering approach and (II) a feature mapping method for the elucidation of distinct SAR determinants in polypharmacologic compounds. We evaluated the method on 462 dose-response assays from the PubChem BioAssay repository. Activity-aware clustering grouped compounds sharing molecular cores that were specific for the target or pathway at hand, rather than grouping inactive scaffolds commonly found in compound series. Many of these core structures were also found in literature that discussed SARs of the respective targets. A numerical comparison of cores allowed for identification of the structural prerequisites for polypharmacology, i.e., distinct bioactive regions within a single compound, and pointed toward selectivity-conferring medchem strategies. The method presented here is generally applicable to any type of activity data and may help bridge the gap between hit list assessment and designing a medchem strategy.

10.
We introduce a method to determine a structural distance between any pair of molecular scaffolds. The development of this approach was motivated by the need to accurately evaluate scaffold hopping studies in virtual screening and medicinal chemistry and assess the degree of difficulty involved in facilitating a transition from one structure to another. In order to consistently derive structural distances, scaffolds of different composition and topology are subjected to molecular editing procedures that abstract from original scaffolds in a defined manner until compositional and topological equivalence can be established. Pairs of corresponding scaffold representations are transformed into one-dimensional atom sequences that are aligned using approaches adapted from biological sequence comparison. From best scoring atom sequence alignments, interscaffold distances are derived. The algorithm is evaluated at different levels including the analysis of a series of model scaffolds with defined chemical changes, a scaffold library, and scaffolds from reference compounds and hits of successful virtual screening applications. It is demonstrated that chemically intuitive scaffold distances are obtained for pairs of scaffolds with varying composition and topology. Distance threshold values for close and remote structural relationships between scaffolds are also determined. The methodology is made publicly available in order to provide a basis for a consistent assessment of scaffold hopping ability and to aid in the evaluation and comparison of virtual screening methods.
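The alignment step mentioned in this abstract can be pictured with a generic global alignment (Needleman-Wunsch) over atom sequences, sketched below in Python. This is not the authors' algorithm: the scoring values (`match=1`, `mismatch=-1`, `gap=-1`), the use of element symbols as sequence characters, and the function name are all assumptions, and the molecular editing steps and the conversion of alignment scores into distances are not shown.

```python
def global_alignment_score(seq_a, seq_b, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of two atom sequences (Needleman-Wunsch).

    Uses a rolling row so memory stays linear in len(seq_b). The scoring
    scheme is a placeholder, not the one used in the abstract's method.
    """
    # Row 0: aligning an empty prefix of seq_a against prefixes of seq_b.
    prev = [j * gap for j in range(len(seq_b) + 1)]
    for i, a in enumerate(seq_a, start=1):
        curr = [i * gap]
        for j, b in enumerate(seq_b, start=1):
            diag = prev[j - 1] + (match if a == b else mismatch)
            up = prev[j] + gap        # gap in seq_b
            left = curr[j - 1] + gap  # gap in seq_a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]

# Hypothetical atom sequences written as element symbols.
same = global_alignment_score("CCNCC", "CCNCC")
swap = global_alignment_score("CCNCC", "CCOCC")
print(same, swap)
```

A heteroatom exchange lowers the score relative to the identical pair, which is the kind of signal a distance measure derived from alignment scores could build on.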

11.
Publicly available compound activity data have been analyzed to distinguish between compounds for which single or multiple potency measurements were available and gain insight into data confidence levels. Different potency measurements with defined end points and alternative ways to represent multiple potency values for active compounds have been evaluated in the context of SAR analysis. Approximately 78% of all compounds with multiple potency measurements were found to represent high-confidence data, which corresponded to ~10% of all activity data. The use of different types of potency measurements and alternative representations of multiple potency values changed the SAR information content of compound data sets and resulted in different activity cliff distributions. Thus, the types of activity measurements that were available and how they were used substantially impacted SAR analysis. Compounds with multiple Ki measurements provided the most reliable basis for SAR exploration.

12.
We have aimed to systematically extract analog series with related core structures from multi-target activity space to explore target promiscuity of closely related analogs. To this end, a previously introduced SAR matrix structure was adapted and further extended for large-scale data mining. These matrices organize analog series with related yet distinct core structures in a consistent manner. High-confidence compound activity data yielded more than 2,300 non-redundant matrices capturing 5,821 analog series that included 4,288 series with multi-target and 735 series with multi-family activities. Many matrices captured more than three analog series with activity against more than five targets. The matrices revealed a variety of promiscuity patterns. Compound series matrices also contain virtual compounds, which provide suggestions for compound design focusing on desired activity profiles.

13.
In pharmaceutical research, collections of active compounds directed against specific therapeutic targets usually evolve over time. Small molecule discovery is an iterative process. New compounds are discovered, alternative compound series explored, some series discontinued, and others prioritized. The design of new compounds usually takes into consideration prior chemical and structure-activity relationship (SAR) knowledge. Hence, historically grown compound collections represent a viable source of chemical and SAR information that might be utilized to retrospectively analyze roadblocks in compound optimization and further guide discovery projects. However, SAR analysis of large and heterogeneous sets of active compounds is inherently complicated. We have subjected evolving compound data sets to SAR monitoring using activity landscape models in order to evaluate how composition and SAR characteristics might change over time. Chemotype and potency distributions in evolving data sets directed against different therapeutic targets were analyzed and alternative activity landscape representations generated at different points in time to monitor the progression of global and local SAR features. Our results show that the evolving data sets studied here have predominantly grown around seed clusters of active compounds that often emerged early on, while other SAR islands remained largely unexplored. Moreover, increasing scaffold diversity in evolving data sets did not necessarily yield new SAR patterns, indicating a rather significant influence of "me-too-ism" (i.e., introducing new chemotypes that are similar to already known ones) on the composition and SAR information content of the data sets.

14.
The ongoing coronavirus pandemic has been a burden on the worldwide population, with mass fatalities and devastating socioeconomic consequences. It has particularly drawn attention to the lack of approved small-molecule drugs to inhibit SARS coronaviruses. Importantly, lessons learned from the SARS outbreak of 2002–2004, caused by severe acute respiratory syndrome coronavirus 1 (SARS-CoV-1), can be applied to current drug discovery ventures. SARS-CoV-1 and SARS-CoV-2 both possess two cysteine proteases, the main protease (Mpro) and the papain-like protease (PLpro), which play a significant role in facilitating viral replication, and are important drug targets. The non-covalent inhibitor, GRL-0617, which was found to inhibit replication of SARS-CoV-1, and more recently SARS-CoV-2, is the only PLpro inhibitor co-crystallised with the recently solved SARS-CoV-2 PLpro crystal structure. Therefore, the GRL-0617 structural template and pharmacophore features are instrumental in the design and development of more potent PLpro inhibitors. In this work, we conducted scaffold hopping using GRL-0617 as a reference to screen over 339,000 ligands in the chemical space using the ChemDiv, MayBridge, and Enamine screening libraries. Twenty-four distinct scaffolds with structural and electrostatic similarity to GRL-0617 were obtained. These proceeded to molecular docking against PLpro using the AutoDock tools. Of two compounds that showed the most favourable predicted binding affinities to the target site, as well as comparable protein-ligand interactions to GRL-0617, one was chosen for further analogue-based work. Twenty-seven analogues of this compound were further docked against the PLpro, which resulted in two additional hits with promising docking profiles. 
Our in silico pipeline consisted of an integrative four-step approach: (1) ligand-based virtual screening (scaffold hopping), (2) molecular docking, (3) an analogue search, and (4) evaluation of scaffold drug-likeness, to identify promising scaffolds and eliminate those with undesirable properties. Overall, we present four novel, lipophilic scaffolds obtained from an exhaustive search of diverse and uncharted regions of chemical space, which may be further explored in vitro through structure-activity relationship (SAR) studies in the search for more potent inhibitors. Furthermore, these scaffolds were predicted to have fewer off-target interactions than GRL-0617. Lastly, to our knowledge, this work contains the largest ligand-based virtual screen performed against GRL-0617.

15.
BACKGROUND: The trypanosomal diseases including Chagas' disease, African sleeping sickness and Nagana have a substantial impact on human and animal health worldwide. Classes of effective therapeutics are needed owing to the emergence of drug resistance as well as the toxicity of existing agents. The cysteine proteases of two trypanosomes, Trypanosoma cruzi (cruzain) and Trypanosoma brucei (rhodesain), have been targeted for a structure-based drug design program, as mechanistic inhibitors that target these enzymes are effective in cell-based and animal models of trypanosomal infection. RESULTS: We have used computational methods to identify new lead scaffolds for non-covalent inhibitors of cruzain and rhodesain, have demonstrated the efficacy of these compounds in cell-based and animal assays, and have synthesized analogs to explore structure-activity relationships. Nine compounds with varied scaffolds identified by DOCK4.0.1 were found to be active at concentrations below 10 µM against cruzain and rhodesain in enzymatic studies. All hits were calculated to have substantial hydrophobic interactions with cruzain. Two of the scaffolds, the urea scaffold and the aroyl thiourea scaffold, exhibited activity against T. cruzi in vivo and both enzymes in vitro. They also have predicted pharmacokinetic properties that meet Lipinski's 'rule of 5'. These scaffolds are synthetically tractable and lend themselves to combinatorial chemistry efforts. One of the compounds, 5'(1-methyl-3-trifluoromethylpyrazol-5-yl)-thiophene 3'-trifluoromethylphenyl urea (D16), showed a 3.1 µM IC50 against cruzain and a 3 µM IC50 against rhodesain. Infected cells treated with D16 survived 22 days in culture compared with 6 days for their untreated counterparts. The mechanism of the inhibitors of these two scaffolds is confirmed to be competitive and reversible. CONCLUSIONS: The urea scaffold and the thiourea scaffold are promising leads for the development of new effective chemotherapy for trypanosomal diseases. Libraries of compounds of both scaffolds need to be synthesized and screened against a series of homologous parasitic cysteine proteases to optimize the potency of the initial leads.
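The 'rule of 5' mentioned in this abstract is usually stated as four thresholds: molecular weight not above 500 Da, logP not above 5, at most 5 hydrogen-bond donors, and at most 10 hydrogen-bond acceptors. The Python sketch below counts violations from precomputed descriptor values. The function name and the example numbers are invented for illustration and are not properties of D16 or any compound in the study.

```python
def lipinski_violations(mol_weight, logp, h_donors, h_acceptors):
    """Count how many of the four rule-of-5 thresholds a compound exceeds.

    Descriptor values are expected to be computed elsewhere (for example by
    a cheminformatics toolkit); this function only applies the thresholds.
    """
    checks = [
        mol_weight > 500,   # molecular weight in Da
        logp > 5,           # octanol-water partition coefficient
        h_donors > 5,       # hydrogen-bond donors
        h_acceptors > 10,   # hydrogen-bond acceptors
    ]
    return sum(checks)

# Invented descriptor values for two hypothetical compounds.
ok = lipinski_violations(mol_weight=430.0, logp=4.2, h_donors=2, h_acceptors=5)
bad = lipinski_violations(mol_weight=650.0, logp=6.1, h_donors=2, h_acceptors=5)
print(ok, bad)
```

In practice a compound is often still accepted with a single violation, so a caller might test `lipinski_violations(...) <= 1` rather than requiring zero.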

16.
This article describes how the substituents on the purine scaffold of adenosine triphosphate (ATP) were further extensively varied, and how human anti-platelet aggregation activity changed as a result, in order to explore structure–activity relationships (SAR). A series of newly designed 6-alkylamino-2-alkylthio-9-hydroxyalkyl(carbalkoxy) purine derivatives was synthesized via a modified procedure, and their human anti-platelet aggregation activities were evaluated. The SAR of these compounds was analyzed in detail, and the resulting structural requirements for substituents that improve potency may provide a basis for the development of potent P2Y12 antagonists.

17.
This publication describes processes for the selection of chemical compounds for the building of a high-throughput screening (HTS) collection for drug discovery, using the currently implemented process in the Discovery Technologies Unit of the Novartis Institute for Biomedical Research, Basel, Switzerland, as reference. More generally, the currently existing compound acquisition models and practices are discussed. Our informatics-, chemistry-, and biology-driven compound selection consists of two steps: 1) The individual compounds are filtered and grouped into three priority classes on the basis of their individual structural properties. Substructure filters are used to eliminate or penalize compounds based on unwanted structural properties. The similarity of the structures to reference ligands of the main proven druggable target families is computed, and drug-similar compounds are prioritized for the following diversity analysis. 2) The compounds are compared to the archive compounds and a diversity analysis is performed. This is done separately for the prioritized, regular, and penalized compounds with increasingly stringent dissimilarity criteria. The process includes collecting vendor catalogues and monitoring the availability of samples together with the selection and purchase decision points. The development of a corporate vendor catalogue database is described. In addition to the selection methods on a per-molecule basis, selection criteria for scaffold and combinatorial chemistry projects in collaboration with compound vendors are discussed.

18.
It is well appreciated that the results of ligand-based virtual screening (LBVS) are much influenced by methodological details, given the generally strong compound class dependence of LBVS methods. It is less well understood to what extent structure-activity relationship (SAR) characteristics might influence the outcome of LBVS. We have assessed the hypothesis that the success of prospective LBVS depends on the SAR tolerance of screening targets, in addition to methodological aspects. In this context, SAR tolerance is rationalized as the ability of a target protein to specifically interact with series of structurally diverse active compounds. In compound data sets, SAR tolerance articulates itself as SAR continuity, i.e., the presence of structurally diverse compounds having similar potency. In order to analyze the role of SAR tolerance for LBVS, activity landscape representations of compounds active against 16 different target proteins were generated for which successful LBVS applications were reported. In all instances, the activity landscapes of known active compounds contained multiple regions of local SAR continuity. When analyzing the location of newly identified LBVS hits and their SAR environments, we found that these hits almost exclusively mapped to regions of distinct local SAR continuity. Taken together, these findings indicate the presence of a close link between SAR tolerance at the target level, SAR continuity at the ligand level, and the probability of LBVS success.

19.
A large-scale similarity search investigation has been carried out on 266 well-defined compound activity classes extracted from the ChEMBL database. The analysis was performed using two widely applied two-dimensional (2D) fingerprints that mark opposite ends of the current performance spectrum of these types of fingerprints, i.e., MACCS structural keys and the extended connectivity fingerprint with bond diameter four (ECFP4). For each fingerprint, three nearest neighbor search strategies were applied. On the basis of these search calculations, a similarity search profile of the ChEMBL database was generated. Overall, the fingerprint search campaign was surprisingly successful. In 203 of 266 test cases (~76%), a compound recovery rate of at least 50% was observed with at least the better performing fingerprint and one search strategy. The similarity search profile also revealed several general trends. For example, fingerprint searching was often characterized by an early enrichment of active compounds in database selection sets. In addition, compound activity classes have been categorized according to different similarity search performance levels, which helps to put the results of benchmark calculations into perspective. Therefore, a compendium of activity classes falling into different search performance categories is provided. On the basis of our large-scale investigation, the performance range of state-of-the-art 2D fingerprinting has been delineated for compound data sets directed against a wide spectrum of pharmaceutical targets.
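A fingerprint-based nearest neighbor search of the kind this abstract refers to can be sketched in Python with fingerprints modelled as sets of on-bit indices and compared by the Tanimoto coefficient. The abstract does not say which three nearest neighbor strategies were applied, so the `k`-nearest-neighbor averaging below, the function names, and the toy bit sets are assumptions; real MACCS or ECFP4 fingerprints would come from a cheminformatics toolkit.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0

def rank_by_nearest_neighbors(references, database, k=1):
    """Rank database compounds by mean similarity to their k nearest references.

    references: list of fingerprints of known active compounds
    database:   dict mapping compound name -> fingerprint
    With k=1 each compound is scored by its single most similar reference.
    """
    if not references:
        raise ValueError("at least one reference fingerprint is required")
    scored = []
    for name, fp in database.items():
        sims = sorted((tanimoto(fp, ref) for ref in references), reverse=True)
        top = sims[:k]
        scored.append((sum(top) / len(top), name))
    # Highest score first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored]

# Invented toy fingerprints.
references = [{1, 2, 3, 4}, {2, 3, 5}]
database = {"a": {1, 2, 3}, "b": {7, 8}, "c": {2, 3, 5, 6}}
ranking = rank_by_nearest_neighbors(references, database, k=1)
print(ranking)
```

A compound recovery rate like the one reported in the abstract would then be the fraction of known actives found within the top-ranked portion of such a list.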

20.
Protein-ligand interaction fingerprints have been used to postprocess docking poses of three ligand data sets: a set of 40 low-molecular-weight compounds from the Protein Data Bank, a collection of 40 scaffolds from pharmaceutically relevant protein ligands, and a database of 19 scaffolds extracted from true cdk2 inhibitors seeded in 2230 scaffold decoys. Four popular docking tools (FlexX, Glide, Gold, and Surflex) were used to generate poses for ligands of the three data sets. In all cases, scoring by the similarity of interaction fingerprints to a given reference was statistically superior to conventional scoring functions in posing low-molecular-weight fragments, predicting protein-bound scaffold coordinates according to the known binding mode of related ligands, and screening a scaffold library to enrich a hit list in true cdk2-targeted scaffolds.
