
1.

Mini-fingerprints for virtual screening: Design principles and generation of novel prototypes based on information theory

L. Xue J.W. Godden J. Bajorath 《SAR and QSAR in environmental research》2003,14(1):27-40


2.

Mini-fingerprints for virtual screening: design principles and generation of novel prototypes based on information theory

Xue L Godden JW Bajorath J 《SAR and QSAR in environmental research》2003,14(1):27-40


3.

Exploration of 3D activity cliffs on the basis of compound binding modes and comparison of 2D and 3D cliffs

Hu Y Bajorath J 《Journal of chemical information and modeling》2012,52(3):670-677

Activity cliffs are formed by pairs or groups of structurally similar compounds having large differences in potency and are focal points of structure-activity relationship (SAR) analysis. The choice of molecular representations is a critically important aspect of activity cliff analysis. Thus far, activity cliffs have predominantly been defined on the basis of molecular graph or fingerprint representations. Herein we introduce 3D activity cliffs derived from comparisons of experimentally determined compound binding modes. The analysis of 3D activity cliffs is generally applicable to target proteins for which structures of multiple ligand complexes are available. For two popular targets, β-secretase 1 (BACE1) and factor Xa (FXa), public domain X-ray structures with bound inhibitors were collected. Crystallographic binding modes of inhibitors were systematically compared using a 3D similarity method taking conformational, positional, and atomic property differences into account. In addition, standard 2D similarity relationships were also determined. SAR information associated with individual compounds substantially changed when either bioactive conformations or 2D molecular graphs were used for similarity evaluation. 3D activity cliffs were identified for BACE1 and FXa inhibitor sets and systematically compared to 2D cliffs. It was found that less than 40% of 3D activity cliffs were conserved when 2D similarity was applied. The limited conservation of 3D and 2D cliffs provides further evidence for the strong molecule representation dependence of activity cliffs. Moreover, 3D cliffs represent a new class of activity cliffs that convey SAR information in ways that differ from graph-based similarity measures. In cases where sufficient structural information is available, the comparison of 3D and 2D cliffs is expected to aid in SAR analysis and mapping of critical binding determinants.
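The cliff definition in this abstract (structurally similar compounds with a large potency difference) can be illustrated with a minimal Python sketch. It covers only the standard 2D fingerprint case, not the 3D binding-mode comparison the paper introduces. The function names, the toy fingerprints, and both thresholds (`sim_min`, `dpot_min`) are assumptions chosen for illustration and are not taken from the paper.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def activity_cliffs(compounds, sim_min=0.55, dpot_min=2.0):
    """Return name pairs that are similar (>= sim_min) yet differ in
    potency by at least dpot_min log units. Both cutoffs are illustrative."""
    cliffs = []
    for (n1, fp1, p1), (n2, fp2, p2) in combinations(compounds, 2):
        if tanimoto(fp1, fp2) >= sim_min and abs(p1 - p2) >= dpot_min:
            cliffs.append((n1, n2))
    return cliffs


# Toy data: (name, on-bits of a hypothetical fingerprint, potency as -log10 of molar value)
toy = [
    ("A", frozenset({1, 2, 3, 4}), 8.5),
    ("B", frozenset({1, 2, 3, 5}), 5.9),
    ("C", frozenset({7, 8, 9}), 8.4),
]
print(activity_cliffs(toy))  # [('A', 'B')]
```

A and B share three of five distinct bits (Tanimoto 0.6) and differ by 2.6 log units, so they form the only cliff. Swapping in a different similarity function changes which pairs qualify, which is the representation dependence the abstract emphasizes.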

4.

Drug discovery using very large numbers of patents. General strategy with extensive use of match and edit operations

Robson B Li J Dettinger R Peters A Boyer SK 《Journal of computer-aided molecular design》2011,25(5):427-441

A patent database of 6.7 million compounds generated by a very high performance computer (Blue Gene) requires new techniques for exploitation when extensive use of chemical similarity is involved. Such exploitation includes the taxonomic classification of chemical themes, and data mining to assess mutual information between themes and companies. Importantly, we also launch candidates that evolve by “natural selection”, judged by their failure to partially match the patent database and by their ability to bind to the protein target appropriately, by simulation on Blue Gene. An unusual feature of our method is that algorithms and workflows rely on dynamic interaction between match-and-edit instructions, which in practice are regular expressions. Similarity testing by these uses SMILES strings and, less frequently, graph or connectivity representations. Examining how this performs in high throughput, we note that chemical similarity and novelty are human concepts that largely have meaning by utility in specific contexts. For some purposes, mutual information involving chemical themes might be a better concept.
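The idea of match-and-edit instructions expressed as regular expressions over SMILES strings can be sketched in Python as follows. The two rules and the helper names are hypothetical and are not taken from the paper. Matching at the string level is also only approximate, because one molecule can be written as many different SMILES strings.

```python
import re


def matches(smiles: str, pattern: str) -> bool:
    """'Match' instruction: does the regular expression occur in the SMILES string?"""
    return re.search(pattern, smiles) is not None


def edit(smiles: str, pattern: str, replacement: str) -> str:
    """'Edit' instruction: rewrite every occurrence of the pattern."""
    return re.sub(pattern, replacement, smiles)


# Hypothetical rules, written for this example only.
CARBOXYLIC_ACID = r"C\(=O\)O$"  # acid group spelled at the end of the string
CHLORINE = r"Cl"                # chlorine atom symbol

candidate = "Clc1ccccc1C(=O)O"  # 2-chlorobenzoic acid, one of several valid spellings
if matches(candidate, CARBOXYLIC_ACID):
    # Halogen swap: propose the fluoro analogue as a new candidate.
    candidate = edit(candidate, CHLORINE, "F")
print(candidate)  # Fc1ccccc1C(=O)O
```

A production workflow would likely canonicalize the SMILES strings first so that equivalent spellings of the same structure match the same rule.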

5.

Exploiting structural information in patent specifications for key compound prediction

Tyrchan C Boström J Giordanetto F Winter J Muresan S 《Journal of chemical information and modeling》2012,52(6):1480-1489

Patent specifications are one of many information sources needed to progress drug discovery projects. Understanding compound prior art and novelty checking, validation of biological assays, and identification of new starting points for chemical explorations are a few areas where patent analysis is an important component. Cheminformatics methods can be used to facilitate the identification of so-called key compounds in patent specifications. Such methods, relying on structural information extracted from documents by expert curation or text mining, can complement or in some cases replace the traditional manual approach of searching for clues in the text. This paper describes and compares three different methods for the automatic prediction of key compounds in patent specifications using structural information alone. For this data set, the cluster seed analysis described by Hattori et al. (Hattori, K.; Wakabayashi, H.; Tamaki, K. Predicting key example compounds in competitors' patent applications using structural information alone. J. Chem. Inf. Model. 2008, 48, 135-142) is superior in terms of prediction accuracy with 26 out of 48 drugs (54%) correctly predicted from their corresponding patents. Nevertheless, the two new methods, based on frequency of R-groups (FOG) and maximum common substructure (MCS) similarity measures, show significant advantages due to their inherent ability to visualize relevant structural features. The results of the FOG method can be enhanced by manual selection of the scaffolds used in the analysis. Finally, a successful example of applying FOG analysis for designing potent ATP-competitive AXL kinase inhibitors with improved properties is described.

6.

Application of belief theory to similarity data fusion for use in analog searching and lead hopping

Muchmore SW Debe DA Metz JT Brown SP Martin YC Hajduk PJ 《Journal of chemical information and modeling》2008,48(5):941-948

A wide variety of computational algorithms have been developed that strive to capture the chemical similarity between two compounds for use in virtual screening and lead discovery. One limitation of such approaches is that, while a returned similarity value reflects the perceived degree of relatedness between any two compounds, there is no direct correlation between this value and the expectation or confidence that any two molecules will in fact be equally active. A lack of a common framework for interpretation of similarity measures also confounds the reliable fusion of information from different algorithms. Here, we present a probabilistic framework for interpreting similarity measures that directly correlates the similarity value to a quantitative expectation that two molecules will in fact be equipotent. The approach is based on extensive benchmarking of 10 different similarity methods (MACCS keys, Daylight fingerprints, maximum common subgraphs, rapid overlay of chemical structures (ROCS) shape similarity, and six connectivity-based fingerprints) against a database of more than 150,000 compounds with activity data against 23 protein targets. Given this unified and probabilistic framework for interpreting chemical similarity, principles derived from decision theory can then be applied to combine the evidence from different similarity measures in such a way that both capitalizes on the strengths of the individual approaches and maintains a quantitative estimate of the likelihood that any two molecules will exhibit similar biological activity.

7.

Design of an activity landscape view taking compound-based feature probabilities into account

Bijun Zhang Martin Vogt Jürgen Bajorath 《Journal of computer-aided molecular design》2014,28(9):919-926

Activity landscapes (ALs) of compound data sets are rationalized as graphical representations that integrate similarity and potency relationships between active compounds. ALs enable the visualization of structure–activity relationship (SAR) information and are thus computational tools of interest for medicinal chemistry. For AL generation, similarity and potency relationships are typically evaluated in a pairwise manner and major AL features are assessed at the level of compound pairs. In this study, we add a conditional probability formalism to AL design that makes it possible to quantify the probability of individual compounds to contribute to characteristic AL features. Making this information graphically accessible in a molecular network-based AL representation is shown to further increase AL information content and helps to quickly focus on SAR-informative compound subsets. This feature probability-based AL variant extends the current spectrum of AL representations for medicinal chemistry applications.

8.

Congenericity of Claimed Compounds in Patent Applications

Maria J. Falaguera Jordi Mestres 《Molecules (Basel, Switzerland)》2021,26(17)

A method is presented to analyze quantitatively the degree of congenericity of claimed compounds in patent applications. The approach successfully differentiates patents exemplified with highly congeneric compounds of a structurally compact and well-defined chemical series from patents containing a more diverse set of compounds around a more vaguely described patent claim. An application to 750 common patents available in SureChEMBL, SureChEMBLccs and ChEMBL is presented, and the congenericity of patent compounds in those different sources is discussed.

9.

Consensus models of activity landscapes with multiple chemical, conformer, and property representations

Yongye AB Byler K Santos R Martínez-Mayorga K Maggiora GM Medina-Franco JL 《Journal of chemical information and modeling》2011,51(6):1259-1270


10.

Design of multitarget activity landscapes that capture hierarchical activity cliff distributions

Dimova D Wawer M Wassermann AM Bajorath J 《Journal of chemical information and modeling》2011,51(2):258-266

An activity landscape model of a compound data set can be rationalized as a graphical representation that integrates molecular similarity and potency relationships. Activity landscape representations of different design are utilized to aid in the analysis of structure-activity relationships and the selection of informative compounds. Activity landscape models reported thus far focus on a single target (i.e., a single biological activity) or at most two targets, giving rise to selectivity landscapes. For compounds active against more than two targets, landscapes representing multitarget activities are difficult to conceptualize and have not yet been reported. Herein, we present a first activity landscape design that integrates compound potency relationships across multiple targets in a formally consistent manner. These multitarget activity landscapes are based on a general activity cliff classification scheme and are visualized in graph representations, where activity cliffs are represented as edges. Furthermore, the contributions of individual compounds to structure-activity relationship discontinuity across multiple targets are monitored. The methodology has been applied to derive multitarget activity landscapes for compound data sets active against different target families. The resulting landscapes identify single-, dual-, and triple-target activity cliffs and reveal the presence of hierarchical cliff distributions. From these multitarget activity landscapes, compounds forming complex activity cliffs can be readily selected.

11.

Predicting key example compounds in competitors' patent applications using structural information alone

Hattori K Wakabayashi H Tamaki K 《Journal of chemical information and modeling》2008,48(1):135-142

In drug discovery programs, predicting key example compounds in competitors' patent applications is important work for scientists working in the same or in related research areas. In general, medicinal chemists are responsible for this work, and they attempt to guess the identity of key compounds based on information provided in patent applications, such as biological data, scale of reaction, and/or optimization of the salt form for a particular compound. However, this is sometimes made difficult by the lack of such information. This paper describes a method for predicting key compounds in competitors' patent applications by using only structural information of example compounds. Based on the assumption that medicinal chemists usually carry out extensive structure-activity relationship (SAR) studies around key compounds, the method identifies compounds located at the centers of densely populated regions in the patent examples' chemical space, as represented by Extended Connectivity Fingerprints (ECFPs). For the validation of the method, a total of 30 patents containing structures of launched drugs were selected to test whether or not the method is able to predict key compounds (the launched drugs). In 17 out of the 30 patents (57%), the method was able to successfully predict the key compounds. The result indicates that our method could provide an alternative approach to predicting key compounds in cases where the conventional medicinal chemist's approach does not work well. This method could also be used as a complement to the traditional medicinal chemist's approach.
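The central idea of this abstract, that key compounds sit at the centers of densely populated regions of the patent's chemical space, can be approximated by counting near neighbors. The Python sketch below is a simplified stand-in and not the authors' published procedure: the neighbor threshold, the function names, and the toy bit sets (which stand in for ECFPs) are all assumptions.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_by_neighbor_count(fingerprints: dict, threshold: float = 0.5):
    """Rank patent examples by how many other examples lie within the
    similarity threshold; the top entries are key-compound candidates."""
    counts = {}
    for name, fp in fingerprints.items():
        counts[name] = sum(
            1
            for other, other_fp in fingerprints.items()
            if other != name and tanimoto(fp, other_fp) >= threshold
        )
    # Highest neighbor count first; ties broken alphabetically for determinism.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy patent examples; the bit sets are hypothetical stand-ins for ECFPs.
examples = {
    "ex1": frozenset({1, 2, 3, 4}),
    "ex2": frozenset({1, 2, 3, 5}),
    "ex3": frozenset({1, 2, 3, 6}),
    "ex4": frozenset({1, 3, 4, 7}),
    "ex5": frozenset({10, 11, 12}),
}
ranking = rank_by_neighbor_count(examples)
print(ranking[0])  # ('ex1', 3)
```

Here `ex1` has three neighbors at or above the threshold, more than any other example, so it would be proposed as the most likely key compound.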

12.

Xue L Godden JW Stahura FL Bajorath J 《Journal of chemical information and computer sciences》2004,44(4):1275-1281

An analysis method termed similarity search profiling has been developed to evaluate fingerprint-based virtual screening calculations. The analysis is based on systematic similarity search calculations using multiple template compounds over the entire value range of a similarity coefficient. In graphical representations, numbers of correctly identified hits and other detected database compounds are separately monitored. The resulting profiles make it possible to determine whether a virtual screening trial can in principle succeed for a given compound class, search tool, similarity metric, and selection criterion. As a test case, we have analyzed virtual screening calculations using a recently designed fingerprint on 23 different biological activity classes in a compound source database containing approximately 1.3 million molecules. Based on our predefined selection criteria, we found that virtual screening analysis was successful for 19 of 23 compound classes. Profile analysis also makes it possible to determine compound class-specific similarity threshold values for similarity searching.
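A similarity search profile of the kind described here can be sketched in Python by sweeping a similarity threshold and counting, at each value, the known actives and the other database compounds that pass it. The function name, the precomputed scores, and the threshold grid are assumptions for illustration. The paper's own selection criteria are not given in the abstract and are not reproduced.

```python
def search_profile(scores, is_active, thresholds):
    """For each threshold, count correctly identified hits (actives) and other
    detected database compounds whose similarity to the template is >= threshold."""
    profile = []
    for t in thresholds:
        hits = sum(1 for s, a in zip(scores, is_active) if s >= t and a)
        others = sum(1 for s, a in zip(scores, is_active) if s >= t and not a)
        profile.append((t, hits, others))
    return profile


# Hypothetical similarity scores of six database compounds to one template,
# with flags marking which compounds are known actives.
scores = [0.9, 0.8, 0.6, 0.5, 0.3, 0.2]
is_active = [True, True, False, True, False, False]

for row in search_profile(scores, is_active, [0.2, 0.4, 0.6, 0.8]):
    print(row)
# (0.2, 3, 3)
# (0.4, 3, 1)
# (0.6, 2, 1)
# (0.8, 2, 0)
```

Reading down the profile shows the trade-off: raising the threshold from 0.4 to 0.8 removes the last false positive at the cost of one active, which is how a class-specific threshold could be chosen.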

13.

Polypharmacology directed compound data mining: identification of promiscuous chemotypes with different activity profiles and comparison to approved drugs

Hu Y Bajorath J 《Journal of chemical information and modeling》2010,50(12):2112-2118

Increasing evidence that many pharmaceutically relevant compounds elicit their effects through binding to multiple targets, so-called polypharmacology, is beginning to change conventional drug discovery and design strategies. In light of this paradigm shift, we have mined publicly available compound and bioactivity data for promiscuous chemotypes. For this purpose, a hierarchy of active compounds, atomic property based scaffolds, and unique molecular topologies were generated, and activity annotations were analyzed using this framework. Starting from ~35,000 compounds active against human targets with at least 1 μM potency, 33 chemotypes with distinct topology were identified that represented molecules active against at least 3 different target families. Network representations were utilized to study scaffold-target family relationships and activity profiles of scaffolds corresponding to promiscuous chemotypes. A subset of promiscuous chemotypes displayed a significant enrichment in drugs over bioactive compounds. A total of 190 drugs were identified that had on average only 2 known target annotations but belonged to the 7 most promiscuous chemotypes that were active against 8-15 target families. These drugs should be attractive candidates for polypharmacological profiling.

14.

Phytochemical databases of Chinese herbal constituents and bioactive plant compounds with known target specificities

Ehrman TM Barlow DJ Hylands PJ 《Journal of chemical information and modeling》2007,47(2):254-263

Two databases have been constructed to facilitate applications of cheminformatics and molecular modeling to medicinal plants. The first contains data on known chemical constituents of 240 commonly used Chinese herbs; the other contains information on target specificities of bioactive plant compounds. Structures are available for all compounds. In the case of the Chinese herbal constituents database, further details include trivial and systematic names, compound class and skeletal type, botanical and Chinese (pinyin) names of associated herb(s), CAS registry number, chirality, pharmacological and toxicological information, and chemical references. For the bioactive plant compounds database, details of molecular target(s), IC50 and related measures, and associated botanical species are given. For Chinese herbs, approximately 7000 unique compounds are listed, though some are found in more than one herb, the total number for all herbs being 8264. For bioactive plant compounds, 2597 compounds active against 78 molecular targets are covered. Statistical relationships within and between the two databases are explored.

15.

Multitarget structure-activity relationships characterized by activity-difference maps and consensus similarity measure

Medina-Franco JL Yongye AB Pérez-Villanueva J Houghten RA Martínez-Mayorga K 《Journal of chemical information and modeling》2011,51(9):2427-2439

Dual and triple activity-difference (DAD/TAD) maps are tools for the systematic characterization of structure-activity relationships (SAR) of compound data sets screened against two or three targets. DAD and TAD maps are two- and three-dimensional representations of the pairwise activity differences of compound data sets, respectively. Adding pairwise structural similarity information into these maps readily reveals activity cliff regions in the SAR for one, two, or three targets. In addition, pairs of compounds in the smooth regions of the SAR and scaffold hops are also easily identified in these maps. Herein, DAD and TAD maps are employed for the systematic characterization of the SAR of a benchmark set of 299 compounds screened against dopamine, norepinephrine, and serotonin transporters. To reduce the well-known dependence of the activity landscape on the structural representation, five selected 2D and 3D structure representations were used to characterize the SAR. Systematic analysis of the DAD and TAD maps reveals regions in the landscape with similar SAR for two or the three targets as well as regions with inverse SAR, i.e., changes in structure that increase activity for one target, but decrease activity for the other target. Focusing the analysis on pairs of compounds with high structure similarity revealed the presence of single-, dual-, and triple-target activity cliffs, i.e., small changes in structure with high changes in potency for one, two, or the three targets, respectively. Triple-target scaffold hops are also discussed. Activity cliffs and scaffold hops were also quantified and represented using two recently proposed approaches, namely, mean Structure Activity Landscape Index (mean SALI) and Consensus Structure-Activity Similarity (SAS) maps.
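The dual activity-difference map described above places each compound pair at the point given by its activity difference for target 1 and for target 2. The Python sketch below computes those coordinates and labels each pair. The one-log-unit cutoff, the region labels, and the toy activities are assumptions for illustration and are not the paper's exact conventions.

```python
from itertools import combinations


def dad_points(activities: dict) -> dict:
    """Map each compound pair to (difference for target 1, difference for target 2).
    activities: name -> (potency against target 1, potency against target 2)."""
    points = {}
    for a, b in combinations(sorted(activities), 2):
        d1 = activities[a][0] - activities[b][0]
        d2 = activities[a][1] - activities[b][1]
        points[(a, b)] = (d1, d2)
    return points


def sar_region(d1: float, d2: float, cutoff: float = 1.0) -> str:
    """Label a pair by where it falls in the map; the cutoff is illustrative."""
    big1, big2 = abs(d1) >= cutoff, abs(d2) >= cutoff
    if big1 and big2:
        # Same sign: the change helps or hurts both targets; opposite sign: inverse SAR.
        return "similar SAR" if d1 * d2 > 0 else "inverse SAR"
    if big1 or big2:
        return "single-target change"
    return "no large change"


# Hypothetical potencies (-log10 of molar value) against two targets.
toy = {"X": (8.0, 6.0), "Y": (6.0, 8.0), "Z": (6.5, 4.0)}
for pair, (d1, d2) in dad_points(toy).items():
    print(pair, (d1, d2), sar_region(d1, d2))
# ('X', 'Y') (2.0, -2.0) inverse SAR
# ('X', 'Z') (1.5, 2.0) similar SAR
# ('Y', 'Z') (-0.5, 4.0) single-target change
```

Restricting the labeled pairs to those with high structural similarity would single out activity cliffs, while low-similarity pairs with small differences would be scaffold-hop candidates. A triple map adds a third difference in the same way.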

16.

Molecular topology analysis of the differences between drugs, clinical candidate compounds, and bioactive molecules

Chen H Yang Y Engkvist O 《Journal of chemical information and modeling》2010,50(12):2141-2150

A new method to decompose molecules is proposed and used to analyze drugs, clinical candidate compounds and bioactive molecules. The method classifies a set of molecules into a few well-defined classes based on their molecular framework. It is then possible to use these classes to investigate differences between drugs, clinical candidates and bioactive molecules. The analysis shows that in comparison with clinical candidates and bioactive compounds, drugs have a higher fraction of compounds with only one ring system. This conclusion is still valid after correcting for lipophilicity (ClogP) and molecular size, as well as any potential protein target bias in the data sets. Furthermore, the molecular bridge part of compounds in the drug set has on average fewer ring systems than molecules from the other sets. The ring system complexity (RSC) was also investigated, and for most topological classes drugs have a lower RSC than the clinical candidates and bioactive molecules. Hence, this study highlights differences in topology between drugs, clinical candidate compounds and bioactive molecules.

17.

Comparison of ranking methods for virtual screening in lead-discovery programs

Wilton D Willett P Lawson K Mullier G 《Journal of chemical information and computer sciences》2003,43(2):469-474

This paper discusses the use of several rank-based virtual screening methods for prioritizing compounds in lead-discovery programs, given a training set for which both structural and bioactivity data are available. Structures from the NCI AIDS data set and from the Syngenta corporate database were represented by two types of fragment bit-string and by sets of high-level molecular features. These representations were processed using binary kernel discrimination, similarity searching, substructural analysis, support vector machine, and trend vector analysis, with the effectiveness of the methods being judged by the extent to which active test set molecules were clustered toward the top of the resultant rankings. The binary kernel discrimination approach yielded consistently superior rankings and would appear to have considerable potential for chemical screening applications.

18.

Target family-directed exploration of scaffolds with different SAR profiles

Hu Y Bajorath J 《Journal of chemical information and modeling》2011,51(12):3138-3148

The scaffold concept is widely applied in chemoinformatics and medicinal chemistry to organize bioactive compounds according to common core structures or associate compound classes with specific biological activities. A variety of scaffold analyses have been carried out to derive statistics for scaffold distributions, generate structural organization schemes, or identify scaffolds that preferentially occur in given compound activity classes. Herein we further extend scaffold analysis by identifying scaffolds that display defined SAR profiles consisting of multiple properties. A structural relationship-based scaffold network has been designed as the basic data structure underlying our analysis. From network representations of scaffolds extracted from compounds active against 32 different target families, scaffolds with different SAR profiles have been extracted on the basis of decision trees that capture structural and functional characteristics of scaffolds in different ways. More than 600 scaffolds and 100 scaffold clusters were assigned to 10 SAR profiles. These scaffold sets represent different activity and target selectivity profiles and are provided for further SAR investigations including, for example, the exploration of alternative analog series for a given target or target family or the design of novel compounds on the basis of scaffold(s) with desired SAR profiles.

19.

Scaffold hopping using clique detection applied to reduced graphs

Barker EJ Buttar D Cosgrove DA Gardiner EJ Kitts P Willett P Gillet VJ 《Journal of chemical information and modeling》2006,46(2):503-511

Similarity-based methods for virtual screening are widely used. However, conventional searching using 2D chemical fingerprints or 2D graphs may retrieve only compounds which are structurally very similar to the original target molecule. Of particular current interest then is scaffold hopping, that is, the ability to identify molecules that belong to different chemical series but which could form the same interactions with a receptor. Reduced graphs provide summary representations of chemical structures and, therefore, offer the potential to retrieve compounds that are similar in terms of their gross features rather than at the atom-bond level. Using only a fingerprint representation of such graphs, we have previously shown that actives retrieved were more diverse than those found using Daylight fingerprints. Maximum common substructures give an intuitively reasonable view of the similarity between two molecules. However, their calculation using graph-matching techniques is too time-consuming for use in practical similarity searching in larger data sets. In this work, we exploit the low cardinality of the reduced graph in graph-based similarity searching. We reinterpret the reduced graph as a fully connected graph using the bond-distance information of the original graph. We describe searches, using both the maximum common induced subgraph and maximum common edge subgraph formulations, on the fully connected reduced graphs and compare the results with those obtained using both conventional chemical and reduced graph fingerprints. We show that graph matching using fully connected reduced graphs is an effective retrieval method and that the actives retrieved are likely to be topologically different from those retrieved using conventional 2D methods.

20.

Intuitive patent Markush structure visualization tool for medicinal chemists

Deng W Berthel SJ So WV 《Journal of chemical information and modeling》2011,51(3):511-520

A Markush, or generic structure, is a widely used convention in chemical and pharmaceutical patents. The flexibility and complexity of this format, however, preclude an easy understanding and analysis of chemical space. In this paper, an application package called MarVis (Markush Visualization) is introduced to help chemists visualize Markush structures in chemical patents. MarVis can output a report with the Markush structure showing the query substructure and also an R-group table of all the possible R-groups described in the patent. MarVis also has a unique interactive interface that allows chemists to explore and zoom in the chemical space to find a subset of interest. SMILES, with minimal extensions, was used to facilitate a variety of patent Markush structure studies.