Similar Literature
 20 similar documents found (search time: 140 ms)
1.
Benchmark calculations are essential for the evaluation of virtual screening (VS) methods. Typically, classes of known active compounds taken from the medicinal chemistry literature are divided into reference molecules (search templates) and potential hits that are added to background databases assumed to consist of compounds not sharing this activity. VS calculations are then carried out, and the recall of known active compounds is determined. However, conventional benchmarking suffers from a number of problems that reduce its value for method evaluation. In addition to often insufficient statistical validation and the lack of generally accepted evaluation standards, the artificial nature of typical benchmark settings is frequently criticized. Retrospective benchmark calculations generally overestimate the potential of VS methods, and their results do not scale with performance in prospective applications. To provide additional benchmarking opportunities that more closely resemble practical VS conditions, we have designed a publicly available compound database of reproducible virtual screens (REPROVIS-DB) that organizes information from successful ligand-based VS applications, including reference compounds, screening databases, compound selection criteria, and experimentally confirmed hits. Using the currently available 25 hand-selected compound data sets, one can attempt to reproduce successful virtual screens with methods other than those originally applied and assess their potential for practical applications.
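The recall metric at the heart of such benchmarks can be sketched in a few lines of plain Python; the compound identifiers, ranking, and cutoff below are hypothetical, standing in for a real VS ranking over a background database:

```python
def recall_at_n(ranked_ids, known_actives, n):
    """Fraction of the known active compounds recovered in the top n
    positions of a virtual screening ranking."""
    return len(set(ranked_ids[:n]) & known_actives) / len(known_actives)

# Toy ranking of ten hypothetical database compounds; c1, c3, and c9 are
# the known actives hidden in the background database.
ranking = [f"c{i}" for i in range(1, 11)]
actives = {"c1", "c3", "c9"}
print(round(recall_at_n(ranking, actives, 5), 3))  # two of three actives rank in the top 5
```

In a reproducibility setting such as REPROVIS-DB, the same recall computation would be repeated with alternative search methods against the archived screening database.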

2.
Fingerprint scaling is a method to increase the performance of similarity search calculations. It is based on the detection of bit patterns in keyed fingerprints that are signatures of specific compound classes. Applying scaling factors to consensus bits (bits that are set in most reference compounds) emphasizes signature bit patterns during similarity searching and has been shown to improve search results for different fingerprints. Similarity search profiling has recently been introduced as a method to analyze similarity search calculations. Profiles separately monitor correctly identified hits and other detected database compounds as a function of similarity threshold values and make it possible to estimate whether virtual screening calculations can be successful, or to evaluate why they fail. This profiling technique has been applied here to study fingerprint scaling in detail and to better understand the effects responsible for its performance. In particular, we have focused on the qualitative and quantitative analysis of similarity search profiles under scaling conditions. To this end, we carried out systematic similarity search calculations for 23 biological activity classes over a wide range of scaling factors in a compound database containing approximately 1.3 million molecules and monitored these calculations in similarity search profiles. Analysis of these profiles confirmed increases in hit rates as a consequence of scaling and revealed that scaling influences similarity search calculations in different ways. Based on scaled similarity search profiles, compound sets could be divided into different categories. In a number of cases, increases in search performance under scaling conditions were due to a larger relative increase in correctly identified hits than in detected false positives. This was also consistent with the finding that preferred similarity threshold values increased under fingerprint scaling, which was well illustrated by similarity search profiling.
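The scaling idea can be illustrated with a minimal sketch that represents fingerprints as sets of on-bits, derives consensus bits from reference compounds, and up-weights them in a weighted Tanimoto coefficient. The 90% consensus fraction and the scaling factor of 3 are illustrative assumptions, not values taken from the study:

```python
def consensus_bits(reference_fps, fraction=0.9):
    """Bits set in at least `fraction` of the reference fingerprints
    (a signature bit pattern of the compound class)."""
    counts = {}
    for fp in reference_fps:
        for bit in fp:
            counts[bit] = counts.get(bit, 0) + 1
    return {bit for bit, c in counts.items() if c / len(reference_fps) >= fraction}

def scaled_tanimoto(fp_a, fp_b, consensus, factor):
    """Tanimoto coefficient in which consensus bits contribute `factor`
    times the weight of ordinary bits (factor 1.0 gives plain Tanimoto)."""
    weight = lambda bit: factor if bit in consensus else 1.0
    union = sum(weight(b) for b in fp_a | fp_b)
    return sum(weight(b) for b in fp_a & fp_b) / union if union else 0.0

refs = [{1, 2, 3}, {1, 2, 4}, {1, 2, 5}]   # hypothetical reference fingerprints
signature = consensus_bits(refs)            # bits 1 and 2 are consensus bits
```

A database compound sharing the signature bits then scores higher under scaling than under the unweighted coefficient, which is the mechanism behind the shifted threshold values noted above.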

3.
In this review, we discuss a number of computational methods that have been developed or adapted for molecule classification and virtual screening (VS) of compound databases. In particular, we focus on approaches that are complementary to high-throughput screening (HTS). The discussion is limited to VS methods that operate at the small-molecule level, often called ligand-based VS (LBVS), and does not cover docking algorithms or other structure-based screening tools. We describe areas that benefit greatly from combining virtual and biological screening and discuss the computational methods best suited to contribute to the integration of screening technologies. Relevant approaches range from established methods such as clustering or similarity searching to techniques only recently introduced for LBVS applications, such as statistical methods or support vector machines. Finally, we discuss a number of representative applications at the interface between VS and HTS.

4.
Ideally, a team of biologists, medicinal chemists, and information specialists will evaluate the hits from high-throughput screening (HTS). In practice, it often falls to nonmedicinal chemists to make the initial evaluation of HTS hits. Chemical genetics and high-content screening both rely on screening in cells or animals, where the biological target may not be known, so there is a need to place active compounds into a context that suggests potential biological mechanisms. Our idea is to build an operating environment that helps the biologist make the initial evaluation of HTS data. To this end, the operating environment provides viewing of compound structure files, computation of basic biologically relevant chemical properties, and searching against biologically annotated chemical structure databases. The benefit is to help the nonmedicinal chemist, biologist, and statistician put compounds into a potentially informative biological context. Although several similar public and private programs are used in the pharmaceutical industry to help evaluate hits, these programs are often built for computational chemists. Our program is designed for use by biologists and statisticians.
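A minimal sketch of the property-based triage such an environment might offer. The property names, values, and rule-of-five style thresholds here are conventional illustrative assumptions, not the tool's actual filters:

```python
def passes_property_filter(props, mw_max=500.0, logp_max=5.0,
                           hbd_max=5, hba_max=10):
    """Rule-of-five style check on precomputed molecular properties
    (molecular weight, logP, H-bond donors/acceptors). Thresholds are
    conventional defaults chosen for illustration."""
    return (props["mw"] <= mw_max and props["logp"] <= logp_max and
            props["hbd"] <= hbd_max and props["hba"] <= hba_max)

hit = {"mw": 342.4, "logp": 2.1, "hbd": 2, "hba": 6}   # hypothetical HTS hit
print(passes_property_filter(hit))  # -> True
```

A nonchemist can read such a check at a glance, which is the point of exposing properties rather than raw structures alone.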

5.
6.
Integration of flexible data-analysis tools with cheminformatics methods is a prerequisite for successful identification and validation of “hits” in high-throughput screening (HTS) campaigns. We have designed, developed, and implemented a suite of robust yet flexible cheminformatics tools to support HTS activities at the Broad Institute, three of which are described here. The “hit-calling” tool allows a researcher to set a hit threshold that can be varied during downstream analysis; the results of a hit-calling exercise are reported to a database for record keeping and further analysis. The “cherry-picking” tool creates an optimized list of hits for confirmatory and follow-up assays from an HTS hit list. It allows filtering by computed chemical property and by substructure; in addition, similarity searches can be performed on hits of interest and sets of related compounds can be selected. The third tool, an “S/SAR viewer,” has been designed specifically for the Broad Institute’s diversity-oriented synthesis (DOS) collection. The compounds in this collection are rich in chiral centers, and the full complement of all possible stereoisomers of a given compound is present in the collection. The S/SAR viewer allows rapid identification of both structure/activity relationships and stereo-structure/activity relationships in HTS data from the DOS collection. Together, these tools enable the prioritization and analysis of hits from diverse compound collections and support informed decisions for follow-up biology and chemistry efforts.
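The adjustable-threshold hit calling described for the first tool reduces to a one-line filter; the compound identifiers, readout values, and the percent-inhibition interpretation below are hypothetical:

```python
def call_hits(activities, threshold, higher_is_active=True):
    """Hit calling with an adjustable threshold: return the compound ids
    whose activity readout crosses it. The threshold can be varied and the
    call repeated during downstream analysis."""
    if higher_is_active:
        return sorted(cid for cid, a in activities.items() if a >= threshold)
    return sorted(cid for cid, a in activities.items() if a <= threshold)

readouts = {"c1": 92.0, "c2": 47.5, "c3": 78.1}   # e.g. percent inhibition
print(call_hits(readouts, threshold=75.0))         # -> ['c1', 'c3']
```

Lowering the threshold during reanalysis simply re-runs the same call over the stored readouts, which is why the tool records results in a database rather than freezing a single hit list.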

7.
An analysis method termed similarity search profiling has been developed to evaluate fingerprint-based virtual screening calculations. The analysis is based on systematic similarity search calculations using multiple template compounds over the entire value range of a similarity coefficient. In graphical representations, numbers of correctly identified hits and other detected database compounds are separately monitored. The resulting profiles make it possible to determine whether a virtual screening trial can in principle succeed for a given compound class, search tool, similarity metric, and selection criterion. As a test case, we have analyzed virtual screening calculations using a recently designed fingerprint on 23 different biological activity classes in a compound source database containing approximately 1.3 million molecules. Based on our predefined selection criteria, we found that virtual screening analysis was successful for 19 of 23 compound classes. Profile analysis also makes it possible to determine compound class-specific similarity threshold values for similarity searching.
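The two curves of such a profile (correctly identified hits versus other retrieved database compounds, as a function of the similarity threshold) can be computed as follows; the similarity values and thresholds are invented for illustration:

```python
def similarity_search_profile(scores, known_hits, thresholds):
    """For each similarity threshold, count correctly identified hits and
    other retrieved database compounds separately; the two counts per
    threshold correspond to the profile's two monitored curves."""
    profile = []
    for t in thresholds:
        retrieved = {cid for cid, s in scores.items() if s >= t}
        n_hits = len(retrieved & known_hits)
        profile.append((t, n_hits, len(retrieved) - n_hits))
    return profile

scores = {"a": 0.9, "b": 0.8, "c": 0.4, "d": 0.3}   # hypothetical Tc values
print(similarity_search_profile(scores, {"a", "c"}, [0.5, 0.25]))
```

A class-specific threshold is then simply the value at which the hit curve stays high while the other-compound curve drops off.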

8.
9.
In this investigation, we describe the discovery of novel potent Pim-1 inhibitors using a proposed hierarchical multistage virtual screening (VS) approach that combines support vector machine-based VS (SB-VS), pharmacophore-based VS (PB-VS), and docking-based VS (DB-VS). In this approach, the three VS methods are applied in increasing order of complexity, so that the first filter (SB-VS) is fast and simple, while the successive ones (PB-VS and DB-VS) are more time-consuming but are applied only to a small subset of the entire database. Evaluation of this approach indicates that it can screen a large chemical library rapidly with a high hit rate and a high enrichment factor. The approach was then applied to screen several large chemical libraries, including PubChem, Specs, and Enamine, as well as an in-house database. From the final hits, 47 compounds were selected for an in vitro Pim-1 inhibitory assay, and 15 compounds showed nanomolar-level or low-micromolar inhibition potency against Pim-1. In particular, four of them were found to have new scaffolds with potential for the chemical development of Pim-1 inhibitors.
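The cascade structure (cheap filters first, expensive filters only on survivors) can be sketched generically; the numeric predicates below are trivial stand-ins for the SVM, pharmacophore, and docking stages:

```python
def multistage_screen(library, stages):
    """Apply filter stages in increasing order of cost; each stage sees
    only the survivors of the previous one. Returns the final hits and
    the funnel of survivor counts after each stage."""
    survivors = list(library)
    counts = [len(survivors)]
    for stage in stages:
        survivors = [c for c in survivors if stage(c)]
        counts.append(len(survivors))
    return survivors, counts

# Hypothetical stand-ins for the three filters (real ones would call an
# SVM classifier, a pharmacophore match, and a docking run, respectively).
cheap    = lambda c: c % 2 == 0
moderate = lambda c: c % 3 == 0
costly   = lambda c: c > 10

hits, funnel = multistage_screen(range(1, 31), [cheap, moderate, costly])
print(hits, funnel)  # -> [12, 18, 24, 30] [30, 15, 5, 4]
```

The funnel makes the economics explicit: the costly final stage here runs on 5 candidates rather than 30, which is the whole point of ordering the filters by complexity.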

10.
The main goal of high-throughput screening (HTS) is to identify active chemical series rather than just individual active compounds. In light of this goal, a new method, called compound set enrichment, is proposed to identify active chemical series from primary screening data. The method employs the scaffold tree compound classification in conjunction with the Kolmogorov-Smirnov statistic to assess the overall activity of a compound scaffold. The application of this method to seven PubChem data sets (containing between 9,389 and 263,679 molecules) is presented, and the ability of the method to identify compound classes containing only weakly active compounds (potentially latent hits) is demonstrated. The analysis presented here shows how methods based on an activity cutoff can distort activity information, leading to the incorrect activity assignment of compound series. These results suggest that the method might have utility in the rational selection of active classes of compounds (and not just individual active compounds) for follow-up and validation.
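The Kolmogorov-Smirnov comparison underlying compound set enrichment is just the maximum gap between two empirical CDFs; the activity readouts below are invented, showing a scaffold whose members are consistently but only weakly active, exactly the case an activity cutoff would miss:

```python
import bisect

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap
    between the two empirical cumulative distribution functions."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

# Hypothetical percent-activity readouts: scaffold members cluster at
# modest activity, above most of the screening background.
scaffold_members = [42.0, 47.0, 51.0, 55.0]
background       = [5.0, 8.0, 12.0, 15.0, 20.0, 49.0]
print(ks_statistic(scaffold_members, background))
```

No individual member here needs to clear a hit cutoff for the scaffold to score a large statistic, which is how the method surfaces latent hit series.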

11.
A process for objective identification and filtering of undesirable compounds that contribute to high-throughput screening (HTS) deck promiscuity is described. Two methods of mapping hit promiscuity have been developed, linking SMARTS-based structural queries with historical primary HTS data. The first compares an expected assay hit rate to actual hit rates; the second examines the propensity of an individual compound to hit multiple assays. Statistical evaluation of the data indicates a correlation between the resultant functional group filters and compound promiscuity. These data corroborate a number of commonly applied filters as well as produce some unexpected results. Application of these models to HTS collection triage reduced the number of in-house compounds considered for screening by 12%. The implications of these findings are further discussed in the context of HTS screening set and combinatorial library design, as well as compound acquisition.
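The second mapping (a compound's propensity to hit multiple assays) amounts to a per-compound hit fraction over the historical assay panel; the matrix layout, compound names, and the 0.5 flagging cutoff are all hypothetical:

```python
def promiscuity_index(hit_matrix):
    """Per-compound promiscuity: the fraction of historical assays in
    which the compound registered as a hit. `hit_matrix` maps compound
    id -> {assay id: hit?}; its structure is assumed for illustration."""
    return {cid: sum(assays.values()) / len(assays)
            for cid, assays in hit_matrix.items()}

history = {
    "frequent_hitter": {"a1": True,  "a2": True,  "a3": True,  "a4": False},
    "clean_compound":  {"a1": False, "a2": True,  "a3": False, "a4": False},
}
flagged = {cid for cid, rate in promiscuity_index(history).items() if rate > 0.5}
print(flagged)  # -> {'frequent_hitter'}
```

Correlating such indices with SMARTS-matched functional groups is then what turns per-compound promiscuity into reusable structural filters.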

12.
13.
A methodology is introduced to assign energy-based scores to two-dimensional (2D) structural features based on three-dimensional (3D) ligand-target interaction information and utilize interaction-annotated features in virtual screening. Database molecules containing such fragments are assigned cumulative scores that serve as a measure of similarity to active reference compounds. The Interaction Annotated Structural Features (IASF) method is applied to mine five high-throughput screening (HTS) data sets and often identifies more hits than conventional fragment-based similarity searching or ligand-protein docking.
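The cumulative-scoring step can be sketched directly: sum the interaction-derived scores of the fragments each database molecule contains, then rank. The fragment names and score values below are invented for illustration, not taken from the IASF study:

```python
def cumulative_fragment_score(molecule_fragments, fragment_scores):
    """Cumulative score of a database molecule: the sum of the
    interaction-derived scores of the annotated fragments it contains;
    unannotated fragments contribute nothing."""
    return sum(fragment_scores.get(f, 0.0) for f in molecule_fragments)

scores = {"amide": 1.2, "pyridine": 0.8, "phenyl": 0.3}   # hypothetical
db = {"m1": {"amide", "pyridine"}, "m2": {"phenyl"}, "m3": {"pyridine", "phenyl"}}
ranking = sorted(db, key=lambda m: cumulative_fragment_score(db[m], scores),
                 reverse=True)
print(ranking)  # -> ['m1', 'm3', 'm2']
```

The ranking plays the same role as a similarity ranking, except that fragment weights come from 3D interaction information rather than from bit overlap with a reference compound.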

14.
High-throughput screening (HTS) of large compound collections typically results in numerous small molecule hits that must be carefully evaluated to identify valid drug leads. Although several filtering mechanisms and other tools exist that can assist the chemist in this process, it is often the case that costly synthetic resources are expended in pursuing false positives. We report here a rapid and reliable NMR-based method for identifying reactive false positives including those that oxidize or alkylate a protein target. Importantly, the reactive species need not be the parent compound, as both reactive impurities and breakdown products can be detected. The assay is called ALARM NMR (a La assay to detect reactive molecules by nuclear magnetic resonance) and is based on monitoring DTT-dependent (13)C chemical shift changes of the human La antigen in the presence of a test compound or mixture. Extensive validation has been performed to demonstrate the reliability and utility of using ALARM NMR to assess thiol reactivity. This included comparing ALARM NMR to a glutathione-based fluorescence assay, as well as testing a collection of more than 3500 compounds containing HTS hits from 23 drug targets. The data show that current in silico filtering tools fail to identify more than half of the compounds that can act via reactive mechanisms. Significantly, we show how ALARM NMR data has been critical in identifying reactive compounds that would otherwise have been prioritized for lead optimization. In addition, a new filtering tool has been developed on the basis of the ALARM NMR data that can augment current in silico programs for identifying nuisance compounds and improving the process of hit triage.

15.
High-throughput ligand-based NMR screening with competition binding experiments is extended to (19)F detection. Fluorine is a favorable nucleus for these experiments because of the significant contribution of chemical shift anisotropy (CSA) to the (19)F transverse relaxation of the ligand signal when bound to a macromolecular target. A low- to moderate-affinity ligand containing a fluorine atom is used as a reference molecule for the detection and characterization of new ligands. Titration NMR experiments with the selected reference compound are performed to find the optimal setup conditions for HTS and to derive the binding constants of the identified NMR hits. Rapid HTS of large chemical mixtures and of plant or fungal extracts against the receptor of interest is possible owing to the high sensitivity of the (19)F nucleus and the absence of overlap with the signals of the mixtures to be screened. Finally, a novel approach to HTS using a reference molecule in combination with a control molecule is presented.

16.
While many large publicly accessible databases provide excellent annotation for biological macromolecules, the same is not true for small chemical compounds. Commercial data sources also fail to provide an annotation interface for large numbers of compounds and tend to be too costly to be widely available to biomedical researchers. As a result, annotation information is presently used for the selection of lead compounds from modern high-throughput screening (HTS) campaigns only on a very limited scale. The recent rapid expansion of the NIH PubChem database provides an opportunity to link existing biological databases with compound catalogs and supplies relevant information that could improve the information garnered from large-scale screening efforts. Using the 2.5 million compound collection at the Genomics Institute of the Novartis Research Foundation (GNF) as a model, we determined that approximately 4% of the library contained compounds with potential annotation in databases such as PubChem and the World Drug Index (WDI), as well as related databases such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) and ChemIDplus. Furthermore, exact structure match analysis showed that 32% of GNF compounds can be linked to third-party databases via PubChem. We also showed that annotations such as MeSH (medical subject headings) terms can be applied to in-house HTS databases to identify signature biological inhibition profiles of interest and to expedite the assay validation process. Automated annotation of thousands of screening hits in batch is becoming feasible and has the potential to play an essential role in the hit-to-lead decision making process.

17.
Non-specific chemical modification of protein thiol groups continues to be a significant source of false positive hits from high-throughput screening campaigns and can even plague certain protein targets and chemical series well into lead optimization. While experimental tools exist to assess the risk and promiscuity associated with the chemical reactivity of existing compounds, computational tools are desired that can reliably identify substructures that are associated with chemical reactivity to aid in triage of HTS hit lists, external compound purchases, and library design. Here we describe a Bayesian classification model derived from more than 8,800 compounds that have been experimentally assessed for their potential to covalently modify protein targets. The resulting model can be implemented in the large-scale assessment of compound libraries for purchase or design. In addition, the individual substructures identified as highly reactive in the model can be used as look-up tables to guide chemists during hit-to-lead and lead optimization campaigns.
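A Bayesian classifier over binary substructure flags can be sketched as a Laplace-smoothed Bernoulli naive Bayes model. This is a generic illustration of the model family, not the paper's actual model; the substructure names and the four training examples are invented:

```python
import math

def train_reactivity_nb(feature_sets, labels, vocabulary, alpha=1.0):
    """Laplace-smoothed Bernoulli naive Bayes over binary substructure
    flags: per class, a log prior and per-feature presence probabilities."""
    model = {}
    for cls in set(labels):
        members = [fs for fs, y in zip(feature_sets, labels) if y == cls]
        log_prior = math.log(len(members) / len(labels))
        probs = {f: (sum(f in fs for fs in members) + alpha) /
                    (len(members) + 2 * alpha)
                 for f in vocabulary}
        model[cls] = (log_prior, probs)
    return model

def classify(model, features, vocabulary):
    """Pick the class with the highest log posterior for the given flags."""
    def log_post(cls):
        log_prior, probs = model[cls]
        return log_prior + sum(
            math.log(probs[f] if f in features else 1.0 - probs[f])
            for f in vocabulary)
    return max(model, key=log_post)

vocab = {"michael_acceptor", "epoxide", "amide"}     # hypothetical flags
train = [{"michael_acceptor"}, {"epoxide", "amide"}, {"amide"}, set()]
y     = ["reactive",          "reactive",           "clean",   "clean"]
model = train_reactivity_nb(train, y, vocab)
print(classify(model, {"michael_acceptor"}, vocab))  # -> 'reactive'
```

The per-feature probabilities double as the look-up table mentioned above: a chemist can read off which substructures push a compound toward the reactive class.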

18.
In the last decade, mass screening strategies have become the main source of leads in drug discovery settings. Although high-throughput screening (HTS) and virtual screening (VS) realize the same concept, the different nature of these lead discovery strategies (experimental vs. theoretical) means that they are typically applied separately. The majority of drug leads are still identified by hit-to-lead optimization of screening hits. Structural information on the target, as well as on bound ligands, however, makes structure-based and ligand-based virtual screening available for the identification of alternative chemical starting points. Although the two techniques have rarely been used together on the same target, here we review the existing prominent studies of their true integration. Various approaches have been shown to combine HTS and VS and to make better use of them in lead generation. Although several attempts at integration have been considered only at a conceptual level, numerous applications underline its relevance and show that early-stage pharmaceutical drug research could benefit from a combined approach.

19.
A statistical approach named the conditional correlated Bernoulli model is introduced for modeling similarity scores and predicting the potential of fingerprint search calculations to identify active compounds. Fingerprint features are treated as dependent Bernoulli variables, and conditional distributions of Tanimoto similarity values of database compounds given a reference molecule are assessed. The conditional correlated Bernoulli model is utilized in the context of virtual screening to estimate the position in a database ranking of a compound obtaining a certain similarity value. By generating receiver operating characteristic curves from cumulative distribution functions of conditional similarity values for known active and random database compounds, one can predict how successful a fingerprint search is likely to be. The comparison of curves for different fingerprints makes it possible to identify the fingerprints most likely to identify new active molecules in a database search, given a set of known reference molecules.
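The ROC comparison at the end reduces to a standard identity: the area under the ROC curve equals the probability that a randomly chosen active receives a higher similarity value than a randomly chosen database compound. A minimal sketch, with invented Tanimoto values standing in for the model's conditional distributions:

```python
def roc_auc(active_scores, decoy_scores):
    """Area under the ROC curve via the rank-sum identity: the probability
    that a random active outscores a random decoy (ties count half)."""
    wins = sum(1.0 if a > d else 0.5 if a == d else 0.0
               for a in active_scores for d in decoy_scores)
    return wins / (len(active_scores) * len(decoy_scores))

# Hypothetical Tanimoto values for known actives vs. random database compounds.
actives = [0.82, 0.74, 0.66]
decoys  = [0.71, 0.40, 0.35, 0.30]
print(round(roc_auc(actives, decoys), 3))
```

Computing this AUC per fingerprint for the same reference set gives exactly the curve comparison the abstract describes: the fingerprint with the larger area is the one more likely to retrieve new actives.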

20.
Using a data set comprising literature compounds and structure-activity data for cyclin-dependent kinase 2 (CDK2), several pharmacophore hypotheses were generated with Catalyst and evaluated against several criteria. The two best were used in retrospective searches of 10 three-dimensional databases containing over 1,000,000 proprietary compounds. The results were then analyzed for how efficiently the hypotheses performed in compound prioritization, library prioritization, and library design. First, as a test of their compound prioritization capabilities, the pharmacophore models were used to search combinatorial libraries known to contain CDK-active compounds, to see whether the models could selectively choose the active compounds over the inactive ones. Second, as a test of their utility in library design, the pharmacophore models were again used to search the active combinatorial libraries, to see whether the key synthons were overrepresented in the hits from the pharmacophore searches. Finally, as a test of their ability to prioritize combinatorial libraries, several inactive libraries were searched in addition to the active libraries, to see whether the active libraries produced significantly more hits than the inactive ones. The pharmacophore models showed potential in all three areas. For compound prioritization, one of the models selected active compounds at a rate nearly 11 times that of random compound selection, though in other cases models missed the active compounds entirely. For library design, most of the key fragments were overrepresented in the hits from at least one of the searches, though again some key fragments were missed. Finally, for library prioritization, the two active libraries both produced a significant number of hits with both pharmacophore models, whereas none of the eight inactive libraries produced a significant number of hits for both models.
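The "11 times random selection" figure is an enrichment factor, a standard ratio of hit rates. A sketch with invented counts chosen to mirror an 11-fold case (the actual library sizes were not reported in the abstract):

```python
def enrichment_factor(hits_selected, n_selected, hits_total, n_total):
    """Hit rate in the selected subset divided by the hit rate expected
    from random selection of the same number of compounds."""
    return (hits_selected / n_selected) / (hits_total / n_total)

# Hypothetical numbers: 100 actives hidden among 10,000 compounds; a
# pharmacophore search selecting 100 compounds that include 11 actives
# is enriched 11-fold over random picking.
print(round(enrichment_factor(11, 100, 100, 10_000), 6))  # -> 11.0
```

An enrichment factor of 1.0 means the search does no better than random, which is the implicit baseline behind both the library prioritization and library design tests above.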


Copyright © Beijing Qinyun Technology Development Co., Ltd. (京ICP备09084417号)