Similar literature
20 similar articles found
1.
Drug discovery research often relies on the use of virtual screening via molecular docking to identify active hits in compound libraries. An area for improvement among many state-of-the-art docking methods is the accuracy of the scoring functions used to differentiate active from nonactive ligands. Many contemporary scoring functions are influenced by the physical properties of the docked molecule. This bias can cause molecules with certain physical properties to incorrectly score better than others. Since variation in physical properties is inevitable in large screening libraries, it is desirable to account for this bias. In this paper, we present a method of normalizing docking scores using virtually generated decoy sets with matched physical properties. First, our method generates a set of property-matched decoys for every molecule in the screening library. Each library molecule and its decoy set are docked using a state-of-the-art method, producing a set of raw docking scores. Next, the raw docking score of each library molecule is normalized against the scores of its decoys. The normalized score represents the probability that the raw docking score was drawn from the background distribution of nonactive property-matched decoys. Assuming that the distribution of scores of active molecules differs from the nonactive score distribution, we expect that the score of an active compound will have a low probability of having been drawn from the nonactive score distribution. In addition to the use of decoys in normalizing docking scores, we suggest that decoy sets may be a useful tool to evaluate, improve, or develop scoring functions. We show that by analyzing docking scores of library molecules with respect to the docking scores of their virtually generated property-matched decoys, one can gain insight into the advantages, limitations, and reliability of scoring functions.
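The normalization step described in this abstract can be illustrated with a small sketch. The abstract does not say how the probability is estimated, so the version below assumes a simple rank-based (empirical) estimate and assumes that lower docking scores are better; the function name `normalized_score` and the add-one correction are illustrative choices, not the authors' method.

```python
from typing import Sequence


def normalized_score(raw_score: float, decoy_scores: Sequence[float]) -> float:
    """Empirical probability that a property-matched decoy scores at least as well.

    Assumption: lower (more negative) docking scores are better.
    """
    if not decoy_scores:
        raise ValueError("need at least one decoy score")
    # Count decoys whose score is as good as or better than the library molecule's.
    as_good = sum(1 for s in decoy_scores if s <= raw_score)
    # Add-one correction (an assumption) keeps the estimate strictly above zero.
    return (as_good + 1) / (len(decoy_scores) + 1)
```

A molecule that outscores all of its decoys receives a small value, which under the abstract's assumption makes it less likely to belong to the nonactive background.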

2.
3.
We developed a new method to improve the accuracy of molecular interaction data using a molecular interaction matrix. This method was applied to enhance the database enrichment of in silico drug screening and in silico target protein screening using a protein-compound affinity matrix calculated by protein-compound docking software. Our assumption was that the protein-compound binding free energy of a compound could be improved by a linear combination of its docking scores with many different proteins. We proposed two approaches to determine the coefficients of the linear combination. The first approach is based on similarity among the proteins, and the second is a machine-learning approach based on the known active compounds. These methods were applied to in silico screening of the active compounds of several target proteins and in silico target protein screening.
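The central idea, a linear combination of one compound's docking scores against several proteins, can be sketched as follows. The abstract does not give the formula for the coefficients, so `similarity_weights` below is a hypothetical stand-in for the similarity-based approach: it simply normalizes protein-to-target similarity values so that they sum to one.

```python
from typing import List, Sequence


def similarity_weights(similarities: Sequence[float]) -> List[float]:
    """Hypothetical weighting: scale protein-to-target similarities to sum to 1."""
    total = sum(similarities)
    if total <= 0:
        raise ValueError("similarities must have a positive sum")
    return [s / total for s in similarities]


def combined_score(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Linear combination of one compound's docking scores across proteins."""
    if len(scores) != len(weights):
        raise ValueError("one weight per protein is required")
    return sum(w * s for w, s in zip(weights, scores))
```

For example, with similarities `[1.0, 0.5, 0.5]` the target itself receives weight 0.5 and each related protein 0.25, so one row of the affinity matrix collapses into a single corrected score.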

4.
Virtual screening benchmarking studies were carried out on 11 targets to evaluate the performance of three commonly used approaches: 2D ligand similarity (Daylight, TOPOSIM), 3D ligand similarity (SQW, ROCS), and protein structure-based docking (FLOG, FRED, Glide). Active and decoy compound sets were assembled from both the MDDR and the Merck compound databases. Averaged over multiple targets, ligand-based methods outperformed docking algorithms. This was true for 3D ligand-based methods only when chemical typing was included. Using mean enrichment factor as a performance metric, Glide appears to be the best docking method among the three with FRED a close second. Results for all virtual screening methods are database dependent and can vary greatly for particular targets.
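For readers unfamiliar with the metric, the enrichment factor can be computed as in the sketch below. It uses the common textbook definition (hit rate in the top-ranked fraction divided by the hit rate in the whole database); the abstract does not state which cutoff or variant the study used, so the `fraction` parameter and the rounding of the cutoff are assumptions.

```python
from typing import Sequence


def enrichment_factor(ranked_is_active: Sequence[bool], fraction: float) -> float:
    """Enrichment factor for a list sorted from best to worst score.

    ranked_is_active[i] is True when the compound at rank i is a known active.
    """
    n = len(ranked_is_active)
    total_actives = sum(ranked_is_active)
    if n == 0 or total_actives == 0:
        raise ValueError("need a non-empty list containing at least one active")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Number of top-ranked compounds inspected (at least one).
    n_top = max(1, int(round(n * fraction)))
    hits = sum(ranked_is_active[:n_top])
    return (hits / n_top) / (total_actives / n)
```

A value of 1 corresponds to random selection; averaging this quantity over targets gives a mean enrichment factor of the kind the abstract refers to.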

5.
Due to the large number of different docking programs and scoring functions available, researchers are faced with the problem of selecting the most suitable one when starting a structure-based drug discovery project. To guide the decision process, several studies comparing different docking and scoring approaches have been published. In the context of comparing scoring function performance, it is common practice to use a predefined, computer-generated set of ligand poses (decoys) and to reevaluate their score using the set of scoring functions to be compared. But are predefined decoy sets able to unambiguously evaluate and rank different scoring functions with respect to pose prediction performance? This question arose when the pose prediction performance of our piecewise linear potential derived scoring functions (Korb et al. in J Chem Inf Model 49:84–96, 2009) was assessed on a standard decoy set (Cheng et al. in J Chem Inf Model 49:1079–1093, 2009). While they showed excellent pose identification performance when they were used for rescoring of the predefined decoy conformations, a pronounced degradation in performance could be observed when they were directly applied in docking calculations using the same test set. This implies that on a discrete set of ligand poses only the rescoring performance can be evaluated. For comparing the pose prediction performance in a more rigorous manner, the search space of each scoring function has to be sampled extensively as done in the docking calculations performed here. We were able to identify relative strengths and weaknesses of three scoring functions (ChemPLP, GoldScore, and Astex Statistical Potential) by analyzing the performance for subsets of the complexes grouped by different properties of the active site. However, reasons for the overall poor performance of all three functions on this test set compared to other test sets of similar size could not be identified.

6.
The performance of all four GOLD scoring functions has been evaluated for pose prediction and virtual screening under the standardized conditions of the comparative docking and scoring experiment reported in this Edition. Excellent pose prediction and good virtual screening performance were demonstrated using unmodified protein models and default parameter settings. The best performing scoring function for both pose prediction and virtual screening was demonstrated to be the recently introduced scoring function ChemPLP. We conclude that existing docking programs already perform close to optimally in the cognate pose prediction experiments currently carried out and that more stringent pose prediction tests should be used in the future. These should employ cross-docking sets. Evaluation of virtual screening performance remains problematic and much remains to be done to improve the usefulness of publicly available active and decoy sets for virtual screening. Finally, we suggest that, for certain target/scoring function combinations, good enrichment may sometimes be a consequence of 2D property recognition rather than a modelling of the correct 3D interactions.

7.
Benchmarks for molecular docking have historically focused on re-docking the cognate ligand of a well-determined protein-ligand complex to measure geometric pose prediction accuracy, and measurement of virtual screening performance has been focused on increasingly large and diverse sets of target protein structures, cognate ligands, and various types of decoy sets. Here, pose prediction is reported on the Astex Diverse set of 85 protein-ligand complexes, and virtual screening performance is reported on the DUD set of 40 protein targets. In both cases, prepared structures of targets and ligands were provided by symposium organizers. The re-prepared data sets yielded results not significantly different from previous reports of Surflex-Dock on the two benchmarks. Minor changes to protein coordinates resulting from complex pre-optimization had large effects on observed performance, highlighting the limitations of cognate ligand re-docking for pose prediction assessment. Docking protocols developed for cross-docking, which address protein flexibility and produce discrete families of predicted poses, produced substantially better performance for pose prediction. Virtual screening performance was shown to benefit from employing and combining multiple screening methods: docking, 2D molecular similarity, and 3D molecular similarity. In addition, use of multiple protein conformations significantly improved screening enrichment.

8.
We developed a new protocol for in silico drug screening for G-protein-coupled receptors (GPCRs) using a set of "universal active probes" (UAPs) with an ensemble docking procedure. UAPs are drug-like compounds, which are actual active compounds of a variety of known proteins. The current targets were nine human GPCRs whose three-dimensional (3D) structures are unknown, plus three GPCRs, namely β2-adrenergic receptor (ADRB2), A2A adenosine receptor (A2A), and dopamine D3 receptor (D3), whose 3D structures are known. Homology-based models of the GPCRs were constructed based on the crystal structures with careful sequence inspection. After subsequent molecular dynamics (MD) simulation taking into account the explicit lipid membrane molecules with periodic boundary conditions, we obtained multiple model structures of the GPCRs. For each target structure, docking-screening calculations were carried out via the ensemble docking procedure, using both true active compounds of the target proteins and the UAPs with the multiple target screening (MTS) method. Consequently, the multiple model structures showed various screening results with both poor and high hit ratios, the latter of which could be identified as promising for use in in silico screening to find candidate compounds to interact with the proteins. We found that the hit ratio of true active compounds correlated positively with that of the UAPs. Thus, we could retrieve appropriate target structures from the GPCR models by applying the UAPs, even if no active compound is known for the GPCRs. That is, the screening result that showed a high hit ratio for the UAPs could be used to identify actual hit compounds for the target GPCRs.

9.
We compiled a G protein-coupled receptor (GPCR) ligand library (GLL) for 147 targets, selecting for each ligand 39 decoy molecules, collected in the GPCR Decoy Database (GDD). Decoys were chosen ensuring a ligand-decoy similarity of six physical properties, while enforcing ligand-decoy chemical dissimilarity. The performance in docking of the GDD was evaluated on 19 GPCRs, showing a marked decrease in enrichment compared to bias-uncorrected decoy sets. Both the GLL and GDD are freely available for the scientific community.
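The selection rule in this abstract (match physical properties, enforce chemical dissimilarity, keep 39 decoys per ligand) can be sketched as a simple filter. Everything beyond those three facts is an assumption here: the `Molecule` type, the per-property tolerances, the use of Tanimoto similarity on fingerprint on-bits, and the 0.75 threshold are all hypothetical and are not taken from the GDD protocol.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence, Tuple


@dataclass(frozen=True)
class Molecule:
    name: str
    props: Tuple[float, ...]      # physical properties, same order for every molecule
    fingerprint: FrozenSet[int]   # on-bits of some structural fingerprint


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity between two sets of fingerprint on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def pick_decoys(ligand: Molecule, candidates: Sequence[Molecule],
                tolerances: Sequence[float], max_tanimoto: float = 0.75,
                n_decoys: int = 39) -> List[Molecule]:
    """Keep candidates that match the ligand's properties but differ chemically."""
    decoys: List[Molecule] = []
    for cand in candidates:
        matched = all(abs(p - q) <= tol
                      for p, q, tol in zip(ligand.props, cand.props, tolerances))
        dissimilar = tanimoto(ligand.fingerprint, cand.fingerprint) < max_tanimoto
        if matched and dissimilar:
            decoys.append(cand)
            if len(decoys) == n_decoys:
                break
    return decoys
```

In practice the property tuple would hold the six physical properties mentioned in the abstract, which are not named there and so are left unspecified in this sketch.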

10.
A possible way of tackling the molecular docking problem arising in computer-aided drug design is the use of the incremental construction method. This method consists of three steps: the selection of a part of a molecule, a so-called base fragment, the placement of the base fragment into the active site of a protein, and the subsequent reconstruction of the complete drug molecule. Assuming that a part of a drug molecule is known, which is specific enough to be a good base fragment, the method is proven to be successful for a large set of docking examples. In addition, it leads to the fastest algorithms for flexible docking published so far. In most real-world applications of docking, large sets of ligands have to be tested for affinity to a given protein. Thus, manual selection of a base fragment is not practical. On the other hand, the selection of a base fragment is critical in that only a few selections lead to a low-energy structure. We overcome this limitation by selecting a representative set of base fragments instead of a single one. In this paper, we present a set of rules and algorithms to automate this selection. In addition, we extend the incremental construction method to deal with multiple fragmentations of the drug molecule. Our results show that with multiple automated base selection, the quality of the docking predictions is almost as good as with one manually preselected base fragment. In addition, the set of solutions is more diverse and alternative binding modes with low scores are found. Although the run time of the overall algorithm increases, the method remains fast enough to search through large ligand data sets.

11.
We present a method called MOPED for optimizing energetic and structural parameters in computational models, including all-atom energy functions, when native structures and decoys are given. The present method goes beyond previous approaches in treating energy functions that are nonlinear in the parameters and continuous in the degrees of freedom. We illustrate the method by improving solvation parameters in the energy function EEF1, which consists of the CHARMM19 polar hydrogen force field augmented by a Gaussian solvation term. Although the published parameters for EEF1 correctly discriminate the native from decoys in the decoy sets of Levitt et al., they fail on several of the more difficult decoy sets of Baker et al. MOPED successfully finds improved parameters that allow EEF1 to discriminate native from decoy structures on all protein structures that do not have metals or prosthetic groups.

12.
Empirical scoring functions used in protein-ligand docking calculations are typically trained on a dataset of complexes with known affinities with the aim of generalizing across different docking applications. We report a novel method of scoring-function optimization that supports the use of additional information to constrain scoring function parameters, which can be used to focus a scoring function’s training towards a particular application, such as screening enrichment. The approach combines multiple instance learning, positive data in the form of ligands of protein binding sites of known and unknown affinity and binding geometry, and negative (decoy) data of ligands thought not to bind particular protein binding sites or known not to bind in particular geometries. Performance of the method for the Surflex-Dock scoring function is shown in cross-validation studies and in eight blind test cases. Tuned functions optimized with a sufficient amount of data exhibited either improved or undiminished screening performance relative to the original function across all eight complexes. Analysis of the changes to the scoring function suggests that modifications can be learned that are related to protein-specific features such as active-site mobility.

13.
Structure-based screening using fully flexible docking is still too slow for large molecular libraries. High quality docking of a million molecule library can take days even on a cluster with hundreds of CPUs. This performance issue prohibits the use of fully flexible docking in the design of large combinatorial libraries. We have developed a fast structure-based screening method, which utilizes docking of a limited number of compounds to build a 2D QSAR model used to rapidly score the rest of the database. We compare here a model based on radial basis functions and a Bayesian categorization model. The number of compounds that need to be actually docked depends on the number of docking hits found. In our case studies, models of reasonable quality are built after docking enough molecules to yield 50 docking hits. The rest of the library is screened by the QSAR model. Optionally a fraction of the QSAR-prioritized library can be docked in order to find the true docking hits. The quality of the model depends only on the training set size, not on the size of the library to be screened. Therefore, for larger libraries the method yields a higher gain in speed with no change in performance. Prioritizing a large library with these models provides a significant enrichment with docking hits: enrichment reaches values of 13 and 35 at the beginning of the score-sorted libraries in our two case studies, screening of the NCI collection and of a combinatorial library on the CDK2 kinase structure. With such enrichments, only a fraction of the database must actually be docked to find many of the true hits. The throughput of the method allows its use in screening of large compound collections and in the design of large combinatorial libraries. The strategy proposed has an important effect on efficiency but does not affect retrieval of actives, the latter being determined by the quality of the docking method itself.
Electronic supplementary material is available at http://dx.doi.org/10.1007/s10822-005-9002-6.

14.
In the current pandemic, finding an effective drug to prevent or treat the infection is the highest priority. A rapid and safe approach to counteract COVID-19 is in silico drug repurposing. The SARS-CoV-2 PLpro promotes viral replication and modulates the host immune system, resulting in inhibition of the host antiviral innate immune response, and therefore is an attractive drug target. In this study, we used a combined in silico virtual screening approach to find candidate SARS-CoV-2 PLpro protease inhibitors. We used the Informational spectrum method applied for Small Molecules to search the DrugBank database, followed by molecular docking. After in silico screening of drug space, we identified 44 drugs as potential SARS-CoV-2 PLpro inhibitors that we propose for further experimental testing.

15.
The ability to accurately predict biological affinity on the basis of in silico docking to a protein target remains a challenging goal in the CADD arena. Typically, "standard" scoring functions have been employed that use the calculated docking result and a set of empirical parameters to calculate a predicted binding affinity. To improve on this, we are exploring novel strategies for rapidly developing and tuning "customized" scoring functions tailored to a specific need. In the present work, three such customized scoring functions were developed using a set of 129 high-resolution protein-ligand crystal structures with measured Ki values. The functions were parametrized using N-PLS (N-way partial least squares), a multivariate technique well-known in the 3D quantitative structure-activity relationship field. A modest correlation between observed and calculated pKi values using a standard scoring function (r² = 0.5) could be improved to 0.8 when a customized scoring function was applied. To mimic a more realistic scenario, a second scoring function was developed, not based on crystal structures but exclusively on several binding poses generated with the Flo+ docking program. Finally, a validation study was conducted by generating a third scoring function with 99 randomly selected complexes from the 129 as a training set and predicting pKi values for a test set that comprised the remaining 30 complexes. Training and test set r² values were 0.77 and 0.78, respectively. These results indicate that, even without direct structural information, predictive customized scoring functions can be developed using N-PLS, and this approach holds significant potential as a general procedure for predicting binding affinity on the basis of in silico docking.

16.
Docking scoring functions are notoriously weak predictors of binding affinity. They typically assign a common set of weights to the individual energy terms that contribute to the overall energy score; however, these weights should be gene family dependent. In addition, they incorrectly assume that individual interactions contribute toward the total binding affinity in an additive manner. In reality, noncovalent interactions often depend on one another in a nonlinear manner. In this paper, we show how the use of support vector machines (SVMs), trained by associating sets of individual energy terms retrieved from molecular docking with the known binding affinity of each compound from high-throughput screening experiments, can be used to improve the correlation between known binding affinities and those predicted by the docking program eHiTS. We construct two prediction models: a regression model trained using IC50 values from BindingDB, and a classification model trained using active and decoy compounds from the Directory of Useful Decoys (DUD). Moreover, to address the issue of overrepresentation of negative data in high-throughput screening data sets, we have designed a multiple-planar SVM training procedure for the classification model. The increased performance that both SVMs give when compared with the original eHiTS scoring function highlights the potential for using nonlinear methods when deriving overall energy scores from their individual components. We apply the above methodology to train a new scoring function for direct inhibitors of Mycobacterium tuberculosis (M.tb) InhA. By combining ligand binding site comparison with the new scoring function, we propose that phosphodiesterase inhibitors can potentially be repurposed to target M.tb InhA. Our methodology may be applied to other gene families for which target structures and activity data are available, as demonstrated in the work presented here.

17.
Performance of small molecule automated docking programs has conceptually been divided into docking, scoring, ranking, and screening power, which focus on the crystal pose prediction, affinity prediction, ligand ranking, and database screening capabilities of the docking program, respectively. Benchmarks show that different docking programs can excel in individual benchmarks, which suggests that the scoring function employed by the programs can be optimized for a particular task. Here the scoring function of Smina is re-optimized towards enhancing the docking power using a supervised machine learning approach and a manually curated database of ligands and cross-docking receptor pairs. The optimization method does not need associated binding data for the receptor-ligand examples used in the data set and works with small training sets. The re-optimization of the weights for the scoring function yields similar docking power on a cross-docking test set. A ligand decoy based benchmark indicates a better discrimination between poses with high and low RMSD. The reported parameters for Smina are compatible with AutoDock Vina and represent ready-to-use alternative parameters for researchers who aim at pose prediction rather than affinity prediction.

18.
Structure-based virtual screening (SBVS) utilizing docking algorithms has become an essential tool in the drug discovery process, and significant progress has been made in successfully applying the technique to a wide range of receptor targets. In silico validation of virtual screening protocols before application to a receptor target using a corporate or commercially available compound collection is key to establishing a successful process. Ultimately, retrieval of a set of active compounds from a database of inactives is required, and the metric of enrichment (E) is habitually used to discern the quality of separation of the two. Numerous reports have addressed the performance of docking algorithms with regard to the quality of binding mode prediction and the issue of postprocessing "hit lists" of docked ligands. However, the impact of ligand database preprocessing has yet to be examined in the context of virtual screening and prioritization of compounds for biological evaluation. We provide an insight into the implications of cheminformatic preprocessing of a validation database of compounds where multiple protonated, tautomeric, stereochemical, and conformational states have been enumerated. Several commonly used methods for the generation of ligand conformations and conformational ensembles are examined, paired with an exhaustive rigid-body algorithm for the docking of different "multimeric" compound representations to the ligand binding site of the human estrogen receptor alpha. Chemgauss, a shape-based Gaussian scoring function with intrinsic chemical knowledge, was combined with PLP as a consensus-scoring scheme to rank output from the docking protocol, and enrichment rates were calculated for each screen. The overheads of CPU consumption and the effect on relative database size (disk requirement) for each of the protocols employed are considered.
Assessment of these parameters indicates that SBVS enrichments are highly dependent on the initial cheminformatic treatment(s) used in database construction. The interplay of SMILES representations, stereochemical information, protonation state enumeration, and ligand conformation ensembles is critical in achieving optimum enrichment rates in such screening.

19.
We present ElectroShape, a novel ligand-based virtual screening method that combines shape and electrostatic information into a single, unified framework. Building on the ultra-fast shape recognition (USR) approach for fast non-superpositional shape-based virtual screening, it extends the method by representing partial charge information as a fourth dimension. It also incorporates the chiral shape recognition (CSR) method, which distinguishes enantiomers. It has been validated using release 2 of the Directory of Useful Decoys (DUD), and shows a near doubling in enrichment ratio at 1% over USR and CSR, and improvements as measured by Receiver Operating Characteristic curves. These improvements persisted even after taking into account the chemotype redundancy in the sets of active ligands in DUD. During the course of its development, ElectroShape revealed a difference in the charge allocation of the DUD ligand and decoy sets, leading to several new versions of DUD being generated as a result. ElectroShape provides a significant addition to the family of ultra-fast ligand-based virtual screening methods, and its higher-dimensional shape recognition approach has great potential for extension and generalisation.
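The idea of treating partial charge as a fourth coordinate can be illustrated with a simplified USR-style descriptor. This is a sketch under stated assumptions, not the published ElectroShape definition: it reuses four USR-like reference points (centroid, atom closest to it, atom farthest from it, atom farthest from that one) in four dimensions, summarizes each distance distribution by three moments, and uses a `charge_scale` value chosen arbitrarily for illustration. The CSR chirality term mentioned in the abstract is omitted.

```python
import math
from typing import List, Sequence, Tuple


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _moments(values: Sequence[float]) -> List[float]:
    """Mean, standard deviation, and signed cube root of the third central moment."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    return [mean, math.sqrt(var), math.copysign(abs(third) ** (1.0 / 3.0), third)]


def descriptors(coords: Sequence[Tuple[float, float, float]],
                charges: Sequence[float],
                charge_scale: float = 25.0) -> List[float]:
    """12 shape-plus-charge descriptors; charge_scale is an illustrative assumption."""
    # Append the scaled partial charge as a fourth coordinate of every atom.
    pts = [tuple(c) + (charge_scale * q,) for c, q in zip(coords, charges)]
    dim = len(pts[0])
    centroid = tuple(sum(p[i] for p in pts) / len(pts) for i in range(dim))
    closest = min(pts, key=lambda p: _dist(p, centroid))
    farthest = max(pts, key=lambda p: _dist(p, centroid))
    far_from_far = max(pts, key=lambda p: _dist(p, farthest))
    out: List[float] = []
    for ref in (centroid, closest, farthest, far_from_far):
        out.extend(_moments([_dist(p, ref) for p in pts]))
    return out


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """USR-style score: 1 for identical descriptors, approaching 0 as they diverge."""
    return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(a, b)) / len(a))
```

Because comparison reduces to an element-wise difference of short vectors, no molecular superposition is required, which is what makes this family of methods fast. Two molecules with the same geometry but different charge distributions receive different descriptors and therefore a similarity below 1.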

20.
The success of ligand docking calculations typically depends on the quality of the receptor structure. Given improvements in protein structure prediction approaches, approximate protein models now can be routinely obtained for the majority of gene products in a given proteome. Structure‐based virtual screening of large combinatorial libraries of lead candidates against theoretically modeled receptor structures requires fast and reliable docking techniques capable of dealing with structural inaccuracies in protein models. Here, we present Q‐DockLHM, a method for low‐resolution refinement of binding poses provided by FINDSITELHM, a ligand homology modeling approach. We compare its performance to that of classical ligand docking approaches in ligand docking against a representative set of experimental (both holo and apo) as well as theoretically modeled receptor structures. Docking benchmarks reveal that unlike all‐atom docking, Q‐DockLHM exhibits the desired tolerance to the receptor's structure deformation. Our results suggest that the use of an evolution‐based approach to ligand homology modeling followed by fast low‐resolution refinement is capable of achieving satisfactory performance in ligand‐binding pose prediction with promising applicability to proteome‐scale applications. © 2009 Wiley Periodicals, Inc. J Comput Chem, 2010
