Similar literature
20 similar documents found.
1.
Virtual screening is becoming an important tool for drug discovery. However, the application of virtual screening has been limited by the lack of accurate scoring functions. Here, we present a novel scoring function, MedusaScore, for evaluating protein-ligand binding. MedusaScore is based on models of physical interactions that include van der Waals, solvation, and hydrogen bonding energies. To ensure the best transferability of the scoring function, we do not use any protein-ligand experimental data for parameter training. We then test the MedusaScore for docking decoy recognition and binding affinity prediction and find superior performance compared to other widely used scoring functions. Statistical analysis indicates that one source of inaccuracy of MedusaScore may arise from the unaccounted entropic loss upon ligand binding, which suggests avenues of approach for further MedusaScore improvement.

2.
Poor performance of scoring functions is a well-known bottleneck in structure-based virtual screening (VS), which is most frequently manifested in the scoring functions' inability to discriminate between true ligands and known nonbinders (therefore designated as binding decoys). This deficiency leads to a large number of false positive hits resulting from VS. We have hypothesized that filtering out or penalizing docking poses recognized as non-native (i.e., pose decoys) should improve the performance of VS in terms of improved identification of true binders. Using several concepts from the field of cheminformatics, we have developed a novel approach to identifying pose decoys from an ensemble of poses generated by computational docking procedures. We demonstrate that the use of a target-specific pose (scoring) filter in combination with a physical force field-based scoring function (MedusaScore) leads to significant improvement of hit rates in VS studies for 12 of the 13 benchmark sets from the clustered version of the Database of Useful Decoys (DUD). This new hybrid scoring function outperforms several conventional structure-based scoring functions, including XSCORE::HMSCORE, ChemScore, PLP, and Chemgauss3, in 6 out of 13 data sets at the early stage of VS (up to 1% decoys of the screening database). We compare our hybrid method with several novel VS methods that were recently reported to have good performance on the same DUD data sets. We find that the retrieved ligands using our method are chemically more diverse in comparison with two ligand-based methods (FieldScreen and FLAP::LBX). We also compare our method with FLAP::RBLB, a high-performance VS method that also utilizes both the receptor and the cognate ligand structures. Interestingly, we find that the top ligands retrieved using our method are highly complementary to those retrieved using FLAP::RBLB, hinting at effective directions for best VS applications. We suggest that this integrative VS approach combining cheminformatics and molecular mechanics methodologies may be applied to a broad variety of protein targets to improve the outcome of structure-based drug discovery studies.

3.
4.

Drug Design Data Resource (D3R) Grand Challenge 4 (GC4) offered a unique opportunity for designing and testing novel methodology for accurate docking and affinity prediction of ligands in an open and blinded manner. We participated in the beta-secretase 1 (BACE) Subchallenge, which comprised cross-docking and redocking of 20 macrocyclic ligands to BACE and predicting binding affinity for 154 macrocyclic ligands. For this challenge, we developed machine learning models trained specifically on BACE. We developed a deep neural network (DNN) model that used a combination of structure- and ligand-based features and outperformed simpler machine learning models. According to the results released by D3R, we achieved a Spearman's rank correlation coefficient of 0.43(7) for predicting the affinity of 154 ligands. We describe the formulation of our machine learning strategy in detail. We compared the performance of the DNN with linear regression, random forest, and support vector machines using ligand-based features, structure-based features, and a combination of both. We compared different structures for our DNN and found that performance was highly dependent on fine optimization of the L2 regularization hyperparameter, alpha. We also developed a novel metric of ligand three-dimensional similarity inspired by crystallographic difference density maps to match ligands without crystal structures to similar ligands with known crystal structures. This report demonstrates that detailed parameterization, careful data training and implementation, and extensive feature analysis are necessary to obtain strong performance with more complex machine learning methods. Post hoc analysis shows that scoring functions based only on ligand features are competitive with those also using structural features. Our DNN approach tied for fifth in predicting BACE-ligand binding affinities.


5.
Fourteen popular scoring functions, i.e., X-Score, DrugScore, five scoring functions in the Sybyl software (D-Score, PMF-Score, G-Score, ChemScore, and F-Score), four scoring functions in the Cerius2 software (LigScore, PLP, PMF, and LUDI), two scoring functions in the GOLD program (GoldScore and ChemScore), and HINT, were tested on the refined set of the PDBbind database, a set of 800 diverse protein-ligand complexes with high-resolution crystal structures and experimentally determined Ki or Kd values. The focus of our study was to assess the ability of these scoring functions to predict binding affinities based on the experimentally determined high-resolution crystal structures of proteins in complex with their ligands. The quantitative correlation between the binding scores produced by each scoring function and the known binding constants of the 800 complexes was computed. X-Score, DrugScore, Sybyl::ChemScore, and Cerius2::PLP provided better correlations than the other scoring functions with standard deviations of 1.8-2.0 log units. These four scoring functions were also found to be robust enough to carry out computation directly on unaltered crystal structures. To examine how well scoring functions predict the binding affinities for ligands bound to the same target protein, the performance of these 14 scoring functions was evaluated on three subsets of protein-ligand complexes from the test set: HIV-1 protease complexes (82 entries), trypsin complexes (45 entries), and carbonic anhydrase II complexes (40 entries). Although the results for the HIV-1 protease subset are less than desirable, several scoring functions are able to satisfactorily predict the binding affinities for the trypsin and the carbonic anhydrase II subsets with standard deviation as low as 1.0 log unit (corresponding to 1.3-1.4 kcal/mol at room temperature). Our results demonstrate the strengths as well as the weaknesses of current scoring functions for binding affinity prediction.
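
The correspondence quoted above between 1.0 log unit and 1.3-1.4 kcal/mol follows from the standard relation ΔG = RT ln K, so an error of one log10 unit in Ki or Kd equals RT ln(10) in free energy. The short Python sketch below reproduces that arithmetic; the function name and the choice of 298.15 K for "room temperature" are assumptions made for illustration and do not come from the abstract.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def log_units_to_kcal_per_mol(delta_log_k, temperature_k=298.15):
    """Convert an error in log10(K) units to a free-energy error in kcal/mol.

    Uses dG = RT * ln(K), so one log10 unit corresponds to RT * ln(10).
    The default temperature (298.15 K) is an assumed room temperature.
    """
    return R_KCAL * temperature_k * math.log(10) * delta_log_k


if __name__ == "__main__":
    # One log unit at the assumed room temperature.
    print(f"{log_units_to_kcal_per_mol(1.0):.2f} kcal/mol")
    # The 1.8-2.0 log-unit range reported for the full 800-complex set.
    print(f"{log_units_to_kcal_per_mol(1.8):.2f} kcal/mol")
    print(f"{log_units_to_kcal_per_mol(2.0):.2f} kcal/mol")
```

With the same relation, the 1.8-2.0 log-unit standard deviations reported for the full set translate to roughly 2.5-2.7 kcal/mol.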

6.
We have developed an iterative knowledge-based scoring function (ITScore) to describe protein-ligand interactions. Here, we assess ITScore through extensive tests on native structure identification, binding affinity prediction, and virtual database screening. Specifically, ITScore was first applied to a test set of 100 protein-ligand complexes constructed by Wang et al. (J Med Chem 2003, 46, 2287), and compared with 14 other scoring functions. The results show that ITScore yielded a high success rate of 82% on identifying native-like binding modes under the criterion of RMSD ≤ 2 Å for each top-ranked ligand conformation. The success rate increased to 98% if the top five conformations were considered for each ligand. In the case of binding affinity prediction, ITScore also obtained a good correlation for this test set (R = 0.65). Next, ITScore was used to predict binding affinities of a second diverse test set of 77 protein-ligand complexes prepared by Muegge and Martin (J Med Chem 1999, 42, 791), and compared with four other widely used knowledge-based scoring functions. ITScore yielded a high correlation of R² = 0.65 (or R = 0.81) in the affinity prediction. Finally, enrichment tests were performed with ITScore against four target proteins using the compound databases constructed by Jacobsson et al. (J Med Chem 2003, 46, 5781). The results were compared with those of eight other scoring functions. ITScore yielded high enrichments in all four database screening tests. ITScore can be easily combined with the existing docking programs for the use of structure-based drug design.

7.
Ordinary least-squares (OLS) regression has been used widely for constructing the scoring functions for protein-ligand interactions. However, OLS is very sensitive to the existence of outliers, and models constructed using it are easily affected by the outliers or even the choice of the data set. On the other hand, determination of atomic charges is regarded as of central importance, because the electrostatic interaction is known to be a key contributing factor for biomolecular association. In the development of the AutoDock4 scoring function, only OLS was conducted, and the simple Gasteiger method was adopted. It is therefore of considerable interest to see whether more rigorous charge models could improve the statistical performance of the AutoDock4 scoring function. In this study, we have employed two well-established quantum chemical approaches, namely the restrained electrostatic potential (RESP) and the Austin-model 1-bond charge correction (AM1-BCC) methods, to obtain atomic partial charges, and we have compared how different charge models affect the performance of AutoDock4 scoring functions. In combination with robust regression analysis and outlier exclusion, our new protein-ligand free energy regression model with AM1-BCC charges for ligands and Amber99SB charges for proteins achieves the lowest root-mean-squared error of 1.637 kcal/mol for the training set of 147 complexes and 2.176 kcal/mol for the external test set of 1427 complexes. The assessment for binding pose prediction with the 100 external decoy sets indicates a very high success rate of 87% with the criterion of a predicted root-mean-squared deviation of less than 2 Å. The success rates and statistical performance of our robust scoring functions are only weakly class-dependent (hydrophobic, hydrophilic, or mixed).

8.
Fast and accurate prediction of the binding affinities of large sets of diverse protein-ligand complexes is an important, yet extremely challenging, task in drug discovery. The development of knowledge-based scoring functions exploiting structural information of known protein-ligand complexes represents a valuable contribution to such a computational prediction. In this study, we report a scoring function named IPMF that integrates additional experimental binding affinity information into the extracted potentials, on the assumption that a scoring function with the "enriched" knowledge base may achieve increased accuracy in binding affinity prediction. In our approach, the functions and atom types of PMF04 were inherited to implicitly capture binding effects that are hard to model explicitly, and a novel iteration device was designed to gradually tailor the initial potentials. We evaluated the performance of the resultant IPMF with a diverse set of 219 protein-ligand complexes and compared it with seven scoring functions commonly used in computer-aided drug design, including GLIDE, AutoDock4, VINA, PLP, LUDI, PMF, and PMF04. While the IPMF is only moderately successful in ranking native or near native conformations, it yields the lowest mean error of 1.41 log Ki/Kd units from measured inhibition affinities and the highest Pearson's correlation coefficient, Rp², of 0.40 for the test set. These results corroborate our initial supposition about the role of the "enriched" knowledge base. With the rapidly growing volume of high-quality structural and interaction data in the public domain, this work marks a positive step toward improving the accuracy of knowledge-based scoring functions in binding affinity prediction.

9.
Improving the scoring functions for small molecule-protein docking is a highly challenging task in current computational drug design. Here we present a novel consensus scoring concept for the prediction of binding modes for multiple known active ligands. Similar ligands are generally believed to bind to their receptor in a similar fashion. The presumption of our approach was that the true binding modes of similar ligands should be more similar to each other compared to false positive binding modes. The number of conserved (consensus) interactions between similar ligands was used as a docking score. Patterns of interactions were modeled using ligand receptor interaction fingerprints. Our approach was evaluated for four different data sets of known cocrystal structures (CDK-2, dihydrofolate reductase, HIV-1 protease, and thrombin). Docking poses were generated with FlexX and rescored by our approach. For comparison, the CScore scoring functions from Sybyl were used, and consensus scores were calculated from them. Our approach performed better than individual scoring functions and was comparable to consensus scoring. Analysis of the distribution of docking poses by self-organizing maps (SOM) and interaction fingerprints confirmed that clusters of docking poses composed of multiple ligands were preferentially observed near the native binding mode. Being conceptually unrelated to commonly used docking scoring functions, our approach provides a powerful method to complement and improve computational docking experiments.
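
The idea of scoring a pose by the number of interactions it shares with poses of other similar ligands can be illustrated with a small Python sketch. This is not the authors' implementation: the fingerprint encoding (sets of made-up interaction labels such as "HB:ASP86"), the function name `consensus_scores`, and the rule of taking the best-matching pose of each other ligand are all assumptions chosen only to make the concept concrete.

```python
def consensus_scores(poses_by_ligand):
    """Score each pose by interactions shared with poses of other ligands.

    poses_by_ligand maps a ligand name to a list of interaction
    fingerprints, each a set of labels (hypothetical encoding).
    For every pose, the score is the sum, over all other ligands, of the
    largest fingerprint overlap with any pose of that ligand (an assumed
    aggregation rule, not taken from the abstract).
    """
    scores = {}
    for ligand, poses in poses_by_ligand.items():
        for idx, fingerprint in enumerate(poses):
            total = 0
            for other, other_poses in poses_by_ligand.items():
                if other == ligand:
                    continue
                # Best agreement with any pose of the other ligand.
                total += max(
                    (len(fingerprint & other_fp) for other_fp in other_poses),
                    default=0,
                )
            scores[(ligand, idx)] = total
    return scores


if __name__ == "__main__":
    # Toy data: two ligands, two poses each, with invented interaction labels.
    poses = {
        "lig_a": [{"HB:ASP86", "HYD:LEU83"}, {"HYD:PHE80"}],
        "lig_b": [{"HB:ASP86", "HYD:LEU83", "HYD:ILE10"}, {"HB:GLU12"}],
    }
    for key, value in sorted(consensus_scores(poses).items()):
        print(key, value)
```

In this toy input the first pose of each ligand shares two interactions with the other ligand and therefore ranks above the second pose, which shares none.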

10.
A recently introduced new methodology based on ultrashort (50-100 ps) molecular dynamics simulations with a quantum-refined force-field (QRFF-MD) is here evaluated in its ability both to predict protein-ligand binding affinities and to discriminate active compounds from inactive ones. Physically based scoring functions are derived from this approach, and their performance is compared to that of several standard knowledge-based scoring functions. About 40 inhibitors of cyclin-dependent kinase 2 (CDK2) representing a broad chemical diversity were considered. The QRFF-MD method achieves a correlation coefficient, R², of 0.55, which is significantly better than that obtained by a number of traditional approaches in virtual screening but only slightly better than that obtained by consensus scoring (R² = 0.50). Compounds from the Available Chemical Directory, along with the known active compounds, were docked into the ATP binding site of CDK2 using the program Glide, and the 650 ligands from the top scored poses were considered for a QRFF-MD analysis. Combined with structural information extracted from the simulations, the QRFF-MD methodology results in similar enrichment of known actives compared to consensus scoring. Moreover, a new scoring function is introduced that combines a QRFF-MD based scoring function with consensus scoring, which results in substantial improvement on the enrichment profile.

11.
12.
13.
An extensive evaluation of the linear interaction energy (LIE) method for the prediction of binding affinity of docked compounds has been performed, with an emphasis on its applicability in lead optimization. An automated setup is presented, which allows for the use of the method in an industrial setting. Calculations are performed for four realistic examples, retinoic acid receptor gamma, matrix metalloprotease 3, estrogen receptor alpha, and dihydrofolate reductase, focusing on different aspects of the procedure. The obtained LIE models are evaluated in terms of the root-mean-square (RMS) errors from experimental binding free energies and the ability to rank compounds appropriately. The results are compared to the best empirical scoring function, selected from a set of 10 scoring functions. In all cases, good LIE models can be obtained in terms of free-energy RMS errors, although reasonable ranking of the ligands of dihydrofolate reductase proves difficult for both the LIE method and scoring functions. For the other proteins, the LIE model results in better predictions than the best performing scoring function. These results indicate that the LIE approach, as a tool to evaluate docking results, can be a valuable asset in computational lead optimization programs.

14.
Accurate in silico models for the quantitative prediction of the activity of G protein-coupled receptor (GPCR) ligands would greatly facilitate the process of drug discovery and development. Several methodologies have been developed based on the properties of the ligands, the direct study of the receptor-ligand interactions, or a combination of both approaches. Ligand-based three-dimensional quantitative structure-activity relationships (3D-QSAR) techniques, not requiring knowledge of the receptor structure, have been historically the first to be applied to the prediction of the activity of GPCR ligands. They are generally endowed with robustness and good ranking ability; however, they are highly dependent on training sets. Structure-based techniques generally do not provide the level of accuracy necessary to yield meaningful rankings when applied to GPCR homology models. However, they are essentially independent from training sets and have a sufficient level of accuracy to allow an effective discrimination between binders and nonbinders, thus qualifying as viable lead discovery tools. The combination of ligand and structure-based methodologies in the form of receptor-based 3D-QSAR and ligand and structure-based consensus models results in robust and accurate quantitative predictions. The contribution of the structure-based component to these combined approaches is expected to become more substantial and effective in the future, as more sophisticated scoring functions are developed and more detailed structural information on GPCRs is gathered.

15.
We continued prospective assessments of the Wilma–solvated interaction energy (SIE) platform for pose prediction, binding affinity prediction, and virtual screening on the challenging SAMPL4 data sets including the HIV-integrase inhibitor and two host–guest systems. New features of the docking algorithm and scoring function are tested here prospectively for the first time. Wilma–SIE provides good correlations with actual binding affinities over a wide range of binding affinities that includes strong binders as in the case of SAMPL4 host–guest systems. Absolute binding affinities are also reproduced with appropriate training of the scoring function on available data sets or from comparative estimation of the change in target’s vibrational entropy. Even when binding modes are known, SIE predictions lack correlation with experimental affinities within dynamic ranges below 2 kcal/mol as in the case of HIV-integrase ligands, but they correctly signaled the narrowness of the dynamic range. Using a common protein structure for all ligands can reduce the noise, while incorporating a more sophisticated solvation treatment improves absolute predictions. The HIV-integrase virtual screening data set consists of promiscuous weak binders with relatively high flexibility and thus it falls outside of the applicability domain of the Wilma–SIE docking platform. Despite these difficulties, unbiased docking around three known binding sites of the enzyme resulted in over a third of ligands being docked within 2 Å from their actual poses and over half of the ligands docked in the correct site, leading to better-than-random virtual screening results.

16.
The Drug Design Data Resource (D3R) Grand Challenges are blind contests organized to assess the accuracy of state-of-the-art methods in predicting binding modes and relative binding free energies of experimentally validated ligands for a given target. The second stage of the D3R Grand Challenge 2 (GC2) was focused on ranking 102 compounds according to their predicted affinity for Farnesoid X Receptor. In this task, our workflow was ranked 5th out of the 77 submissions in the structure-based category. Our strategy consisted of (1) a combination of molecular docking using AutoDock 4.2 and manual editing of available structures for binding pose generation using SeeSAR, (2) the use of HYDE scoring for pose selection, and (3) a hierarchical ranking using HYDE and MM/GBSA. In this report, we detail our pose generation and ligand ranking protocols and provide guidelines to be used in a prospective computer-aided drug design program.

17.
We present the performance of HADDOCK, our information-driven docking software, in the second edition of the D3R Grand Challenge. In this blind experiment, participants were requested to predict the structures and binding affinities of complexes between the Farnesoid X nuclear receptor and 102 different ligands. The models obtained in Stage 1 with HADDOCK and a ligand-specific protocol show an average ligand RMSD of 5.1 Å from the crystal structure. Only 6/35 targets were within 2.5 Å RMSD from the reference, which prompted us to investigate the limiting factors and revise our protocol for Stage 2. The choice of the receptor conformation appeared to have the strongest influence on the results. Our Stage 2 models were of higher quality (13 out of 35 were within 2.5 Å), with an average RMSD of 4.1 Å. The docking protocol was applied to all 102 ligands to generate poses for binding affinity prediction. We developed a modified version of our contact-based binding affinity predictor PRODIGY, using the number of interatomic contacts classified by their type and the intermolecular electrostatic energy. This simple structure-based binding affinity predictor shows a Kendall’s Tau correlation of 0.37 in ranking the ligands (7th best out of 77 methods, 5th/25 groups). Those results were obtained from the average prediction over the top 10 poses, irrespective of their similarity/correctness, underscoring the robustness of our simple predictor. This results in an enrichment factor of 2.5 compared to a random predictor for ranking ligands within the top 25%, making it a promising approach to identify lead compounds in virtual screening.
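
An enrichment factor of the kind quoted above compares the fraction of true binders found in the top-ranked slice with the fraction expected from a random ordering. The Python sketch below uses the common definition EF = (actives in top / size of top) / (total actives / total compounds); the function name, the convention that a higher predicted value means a stronger binder, and the toy data are assumptions made for illustration and are not taken from the abstract.

```python
def enrichment_factor(predicted, is_active, fraction=0.25):
    """Enrichment factor of a ranking within the top `fraction` of compounds.

    predicted: predicted affinities, where higher is assumed to mean a
        stronger binder (an assumed convention).
    is_active: booleans marking which compounds are true actives.
    A value of 1.0 corresponds to a random predictor.
    """
    if len(predicted) != len(is_active):
        raise ValueError("predicted and is_active must have the same length")
    n = len(predicted)
    total_actives = sum(is_active)
    if n == 0 or total_actives == 0:
        raise ValueError("need at least one compound and one active")
    n_top = max(1, round(n * fraction))
    # Rank compound indices from strongest to weakest predicted binder.
    order = sorted(range(n), key=lambda i: predicted[i], reverse=True)
    hits_top = sum(is_active[i] for i in order[:n_top])
    return (hits_top / n_top) / (total_actives / n)


if __name__ == "__main__":
    # Toy data: 8 compounds, 4 of them active.
    scores = [9.1, 8.7, 7.5, 7.0, 6.2, 5.9, 5.1, 4.3]
    actives = [True, True, False, True, False, False, True, False]
    print(enrichment_factor(scores, actives, fraction=0.25))
```

In the toy data both compounds in the top 25% are active while half of all compounds are active, giving an enrichment factor of 2.0.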

18.
To understand the activity and cross reactivity of ligands and G protein-coupled receptors, we take stock of relevant existing receptor mutation, sequence, and structural data to develop a statistically robust and transparent scoring system. Our method evaluates the viability of binding of any ligand for any GPCR sequence of amino acids. This enabled us to explore the binding repertoire of both receptors and ligands, relying solely on correlations between carefully identified receptor features and without requiring any chemical information about ligands. This study suggests that sequence similarity at specific binding pockets can predict relative affinity of ligands, enabling recovery of over 80% of known ligands for a withheld receptor and almost 80% of known receptors for a ligand. The method enables qualitative prediction of ligand binding for all nonredundant human G protein-coupled receptors.

19.
A central problem in de novo drug design is determining the binding affinity of a ligand with a receptor. A new scoring algorithm is presented that estimates the binding affinity of a protein-ligand complex given a three-dimensional structure. The method, LISA (Ligand Identification Scoring Algorithm), uses an empirical scoring function to describe the binding free energy. Interaction terms have been designed to account for van der Waals (VDW) contacts, hydrogen bonding, desolvation effects, and metal chelation to model the dissociation equilibrium constants using a linear model. Atom types have been introduced to differentiate the parameters for VDW, H-bonding interactions, and metal chelation between different atom pairs. A training set of 492 protein-ligand complexes was selected for the fitting process. Different test sets have been examined to evaluate its ability to predict experimentally measured binding affinities. By comparing with other well-known scoring functions, the results show that LISA has advantages over many existing scoring functions in simulating protein-ligand binding affinity, especially metalloprotein-ligand binding affinity. Artificial Neural Network (ANN) was also used in order to demonstrate that the energy terms in LISA are well designed and do not require extra cross terms.

20.
We present a novel scoring function for docking of small molecules to protein binding sites. The scoring function is based on a combination of two main approaches used in the field, the empirical and knowledge-based approaches. To calibrate the scoring function we used an iterative procedure in which a ligand's position and its score were determined self-consistently at each iteration. The scoring function demonstrated superiority in prediction of ligand positions in docking tests against the commonly used Dock, FlexX and Gold docking programs. It also demonstrated good accuracy of binding affinity prediction for the docked ligands.
