Similar articles
Found 20 similar articles (search time: 31 ms)
1.
In drug discovery applications, high-throughput virtual screening exercises are routinely performed to determine an initial set of candidate molecules referred to as “hits”. In such an experiment, each molecule from a large small-molecule drug library is evaluated in terms of physical properties such as the docking score against a target receptor. In real-life drug discovery experiments, drug libraries are extremely large yet still represent only a small part of the essentially infinite chemical space, and evaluating physical properties for every molecule in the library is not computationally feasible. In the current study, a novel Machine learning framework for Enhanced MolEcular Screening (MEMES) based on Bayesian optimization is proposed for efficient sampling of the chemical space. The proposed framework is demonstrated to identify 90% of the top-1000 molecules from a molecular library of about 100 million compounds, while calculating the docking score for only about 6% of the complete library. We believe that such a framework would greatly reduce the computational effort not only in drug discovery but also in other areas that require such high-throughput experiments.

A novel machine learning framework based on Bayesian optimization for efficient sampling of chemical space. The framework is able to identify 90% of top-1000 hits by only sampling 6% of the complete dataset containing ∼100 million compounds.
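The sampling idea in this abstract can be illustrated with a small sketch. It is not the MEMES implementation: the surrogate (nearest scored neighbour), the distance-based uncertainty term, the synthetic `expensive_score` function, and all parameter names and values are assumptions chosen only to show the loop of scoring a small batch, updating a cheap model, and choosing the next batch.

```python
import random


def expensive_score(x):
    """Synthetic stand-in for a docking score (lower is better). Hypothetical."""
    return (x - 0.3) ** 2


def screen(library, n_init=20, batch=20, rounds=5, kappa=0.05, seed=0):
    """Iteratively score small batches chosen by a crude acquisition rule."""
    rng = random.Random(seed)
    # Start from a random subset that is scored with the expensive function.
    scored = {i: expensive_score(library[i])
              for i in rng.sample(range(len(library)), n_init)}

    for _ in range(rounds):
        def acquisition(i):
            # Prediction: score of the nearest already-scored molecule.
            j = min(scored, key=lambda k: abs(library[k] - library[i]))
            # Distance to that neighbour acts as a rough uncertainty bonus.
            return scored[j] - kappa * abs(library[j] - library[i])

        candidates = [i for i in range(len(library)) if i not in scored]
        candidates.sort(key=acquisition)  # most promising (lowest) first
        for i in candidates[:batch]:
            scored[i] = expensive_score(library[i])
    return scored


# Toy "library": one numeric descriptor per molecule (an assumption).
lib_rng = random.Random(1)
library = [lib_rng.random() for _ in range(2000)]
scored = screen(library)
fraction = len(scored) / len(library)  # share of the library actually scored
```

With these toy settings only 120 of 2000 entries are ever passed to the expensive function; how many of the true top scorers are recovered depends entirely on the surrogate, which here is deliberately simplistic.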

2.
Recent explosive growth of ‘make-on-demand’ chemical libraries has brought unprecedented opportunities but also significant challenges to the field of computer-aided drug discovery. To address this expansion of the accessible chemical universe, molecular docking needs to accurately rank billions of chemical structures, calling for the development of automated hit-selecting protocols to minimize human intervention and error. Herein, we report the development of an artificial intelligence-driven virtual screening pipeline that utilizes Deep Docking with Autodock GPU, Glide SP, FRED, ICM and QuickVina2 programs to screen 40 billion molecules against SARS-CoV-2 main protease (Mpro). This campaign returned a significant number of experimentally confirmed inhibitors of the Mpro enzyme, and also enabled us to benchmark the performance of twenty-eight hit-selecting strategies of various degrees of stringency and automation. These findings provide new starting scaffolds for hit-to-lead optimization campaigns against Mpro and encourage the development of fully automated end-to-end drug discovery protocols integrating machine learning and human expertise.

Deep learning-accelerated docking coupled with computational hit selection strategies enables the identification of inhibitors for the SARS-CoV-2 main protease from a chemical library of 40 billion small molecules.

3.
We present an end-to-end computational system for autonomous materials discovery. The system aims for cost-effective optimization in large, high-dimensional search spaces of materials by adopting a sequential, agent-based approach to deciding which experiments to carry out. In choosing next experiments, agents can make use of past knowledge, surrogate models, logic, thermodynamic or other physical constructs, heuristic rules, and different exploration–exploitation strategies. We show a series of examples for (i) how the discovery campaigns for finding materials satisfying a relative stability objective can be simulated to design new agents, and (ii) how those agents can be deployed in real discovery campaigns to control experiments run externally, such as the cloud-based density functional theory simulations in this work. In a sample set of 16 campaigns covering a range of binary and ternary chemistries including metal oxides, phosphides, sulfides and alloys, this autonomous platform found 383 new stable or nearly stable materials with no intervention by the researchers.

We present an end-to-end computational system for autonomous materials discovery.

4.
Intrinsically disordered proteins or intrinsically disordered regions (IDPs) have gained much attention in recent years due to their vital roles in biology and prevalence in various human diseases. Although IDPs are perceived as attractive therapeutic targets, rational drug design targeting IDPs remains challenging because of their conformational heterogeneity. Here, we propose a hierarchical computational strategy for IDP drug virtual screening (IDPDVS) and apply it to the discovery of p53 transactivation domain I (TAD1) binding compounds. IDPDVS starts from conformation sampling of the IDP target, then combines stepwise conformational clustering with druggability evaluation to identify potential ligand binding pockets, followed by multiple docking screening runs and selection of compounds that can bind multiple conformations. p53 is an important tumor suppressor and restoration of its function provides an opportunity to inhibit cancer cell growth. TAD1 is located at the N-terminus of p53 and plays key roles in regulating p53 function. No compounds that directly bind to TAD1 have been reported, owing to its highly disordered structure. We successfully used IDPDVS to identify two compounds that bind p53 TAD1 and restore wild-type p53 function in cancer cells. Our study demonstrates that IDPDVS is an efficient strategy for IDP drug discovery and that p53 TAD1 can be directly targeted by small molecules.

A hierarchical computational strategy for IDP drug virtual screening (IDPDVS) was proposed and successfully applied to identify compounds that bind p53 TAD1 and restore wild-type p53 function in cancer cells.

5.
DNA-encoded library technology (DELT) employs DNA as a barcode to track the sequence of chemical reactions and enables the design and synthesis of libraries with billions of small molecules through combinatorial expansion. This powerful technology platform has been successfully demonstrated for hit identification and target validation for many types of diseases. As a highly integrated technology platform, DEL is capable of accelerating the translation of synthetic chemistry by using on-DNA compatible reactions or off-DNA scaffold synthesis. Herein, we report the development of a series of novel on-DNA transformations based on oxindole scaffolds for the design and synthesis of diversity-oriented DNA-encoded libraries for screening. Specifically, we have developed 1,3-dipolar cyclizations, cyclopropanations, ring-opening reactions of aziridines and Claisen–Schmidt condensations to construct diverse oxindole derivatives. The majority of these transformations enable a diversity-oriented synthesis of DNA-encoded oxindole libraries, which have been used in the successful hit identification for three protein targets. We have demonstrated that a diversified strategy for DEL synthesis could accelerate the application of synthetic chemistry for drug discovery.

Constructing DNA-encoded oxindole libraries by a diversified strategy.

6.
Macrocycles provide an attractive modality for drug development, but generating ligands for new targets is hampered by the limited availability of large macrocycle libraries. We have established a solution-phase macrocycle synthesis strategy in which three building blocks are coupled sequentially in efficient alkylation reactions that eliminate the need for product purification. We demonstrate the power of the approach by combinatorially reacting 15 bromoacetamide-activated tripeptides, 42 amines, and 6 bis-electrophile cyclization linkers to generate a 3780-compound library with minimal effort. Screening against thrombin yielded a potent and selective inhibitor (Ki = 4.2 ± 0.8 nM) that efficiently blocked blood coagulation in human plasma. Structure–activity relationship and X-ray crystallography analysis revealed that two of the three building blocks acted synergistically and underscored the importance of combinatorial screening in macrocycle development. The three-component library synthesis approach is general and offers a promising avenue to generate macrocycle ligands to other targets.

Combination of three efficient chemical reactions allows for solution-phase synthesis of 3780 macrocycles and identification of a potent thrombin inhibitor.
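The library size quoted above follows directly from the three building-block counts (15 × 42 × 6 = 3780). A minimal sketch of that enumeration is given below; the building-block labels are placeholders invented for illustration, not the compounds used in the study.

```python
from itertools import product

# Placeholder labels; only the counts (15, 42, 6) come from the abstract.
tripeptides = [f"tripeptide_{i}" for i in range(1, 16)]  # 15 bromoacetamide-activated tripeptides
amines = [f"amine_{i}" for i in range(1, 43)]            # 42 amines
linkers = [f"linker_{i}" for i in range(1, 7)]           # 6 bis-electrophile linkers

# Every combination of one block from each set is one library member.
macrocycle_library = list(product(tripeptides, amines, linkers))
library_size = len(macrocycle_library)
```

Here `library_size` equals 3780, matching the figure reported in the abstract.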

7.
Organic synthesis underpins the evolution of weak fragment hits into potent lead compounds. Deficiencies within current screening collections often result in the requirement of significant synthetic investment to enable multidirectional fragment growth, limiting the efficiency of the hit evolution process. Diversity-oriented synthesis (DOS)-derived fragment libraries are constructed in an efficient and modular fashion and thus are well-suited to address this challenge. To demonstrate the effective nature of such libraries within fragment-based drug discovery, we herein describe the screening of a 40-member DOS library against three functionally distinct biological targets using X-Ray crystallography. Firstly, we demonstrate the importance for diversity in aiding hit identification with four fragment binders resulting from these efforts. Moreover, we also exemplify the ability to readily access a library of analogues from cheap commercially available materials, which ultimately enabled the exploration of a minimum of four synthetic vectors from each molecule. In total, 10–14 analogues of each hit were rapidly accessed in three to six synthetic steps. Thus, we showcase how DOS-derived fragment libraries enable efficient hit derivatisation and can be utilised to remove the synthetic limitations encountered in early stage fragment-based drug discovery.

Fragment-based screening of a shape-diverse collection yielded four hits against three proteins. Up to 14 analogues of each hit were rapidly generated, enabling four fragment growth vectors to be explored using inexpensive materials and reliable synthetic transformations.

8.
Developing high-performance advanced materials requires a deeper insight and search into the chemical space. Until recently, exploration of materials space using chemical intuitions built upon existing materials has been the general strategy, but this direct design approach is often time and resource consuming and poses a significant bottleneck to solve the materials challenges of future sustainability in a timely manner. To accelerate this conventional design process, inverse design, which outputs materials with pre-defined target properties, has emerged as a significant materials informatics platform in recent years by leveraging hidden knowledge obtained from materials data. Here, we summarize the latest progress in machine-enabled inverse materials design categorized into three strategies: high-throughput virtual screening, global optimization, and generative models. We analyze challenges for each approach and discuss gaps to be bridged for further accelerated and rational data-driven materials design.

The grand challenge of materials science, discovery of novel materials with target properties, can be greatly accelerated by machine-learned inverse design strategies.

9.
Machine learning has been increasingly applied to the field of computer-aided drug discovery in recent years, leading to notable advances in binding-affinity prediction, virtual screening, and QSAR. Surprisingly, it is less often applied to lead optimization, the process of identifying chemical fragments that might be added to a known ligand to improve its binding affinity. We here describe a deep convolutional neural network that predicts appropriate fragments given the structure of a receptor/ligand complex. In an independent benchmark of known ligands with missing (deleted) fragments, our DeepFrag model selected the known (correct) fragment from a set of over 6500 about 58% of the time. Even when the known/correct fragment was not selected, the top fragment was often chemically similar and may well represent a valid substitution. We release our trained DeepFrag model and associated software under the terms of the Apache License, Version 2.0.

DeepFrag is a machine-learning model designed to assist with lead optimization. It recommends appropriate fragment additions given the 3D structures of a protein receptor and bound small-molecule ligand.

10.
As part of the SAMPL4 blind challenge, filtered AutoDock Vina ligand docking predictions and large scale binding energy distribution analysis method binding free energy calculations have been applied to the virtual screening of a focused library of candidate binders to the LEDGF site of the HIV integrase protein. The computational protocol leveraged docking and high level atomistic models to improve enrichment. The enrichment factor of our blind predictions ranked best among all of the computational submissions, and second best overall. This work represents to our knowledge the first example of the application of an all-atom physics-based binding free energy model to large scale virtual screening. A total of 285 parallel Hamiltonian replica exchange molecular dynamics absolute protein-ligand binding free energy simulations were conducted starting from docked poses. The setup of the simulations was fully automated, calculations were distributed on multiple computing resources and were completed in a 6-week period. The accuracy of the docked poses and the inclusion of intramolecular strain and entropic losses in the binding free energy estimates were the major factors behind the success of the method. Lack of sufficient time and computing resources to investigate additional protonation states of the ligands was a major cause of mispredictions. The experiment demonstrated the applicability of binding free energy modeling to improve hit rates in challenging virtual screening of focused ligand libraries during lead optimization.

11.
Proteins interact with small molecules through specific molecular recognition, which is central to essential biological functions in living systems. Therefore, understanding such interactions is crucial for basic sciences and drug discovery. Here, we present Structure template-based ab initio ligand design solution (Stalis), a knowledge-based approach that uses structure templates from the Protein Data Bank libraries of whole ligands and their fragments and generates a set of molecules (virtual ligands) whose structures represent the pocket shape and chemical features of a given target binding site. Our benchmark performance evaluation shows that ligand structure-based virtual screening using virtual ligands from Stalis outperforms a receptor structure-based virtual screening using AutoDock Vina, demonstrating reliable overall screening performance applicable to computational high-throughput screening. However, virtual ligands from Stalis are worse than crystal ligands at recognizing active compounds in the small top fraction of a rank-ordered list of screened library compounds, due to the low resolution of the virtual ligand structures. In conclusion, Stalis can facilitate drug discovery research by designing virtual ligands that can be used for fast ligand structure-based virtual screening. Moreover, Stalis provides actual three-dimensional ligand structures that likely bind to a target protein, enabling users to gain structural insight into potential ligands. Stalis can be an efficient computational platform for high-throughput ligand design for fundamental biological study and drug discovery research at the proteomic level. © 2019 Wiley Periodicals, Inc.

12.
Non-specific chemical modification of protein thiol groups continues to be a significant source of false positive hits from high-throughput screening campaigns and can even plague certain protein targets and chemical series well into lead optimization. While experimental tools exist to assess the risk and promiscuity associated with the chemical reactivity of existing compounds, computational tools are desired that can reliably identify substructures that are associated with chemical reactivity to aid in triage of HTS hit lists, external compound purchases, and library design. Here we describe a Bayesian classification model derived from more than 8,800 compounds that have been experimentally assessed for their potential to covalently modify protein targets. The resulting model can be implemented in the large-scale assessment of compound libraries for purchase or design. In addition, the individual substructures identified as highly reactive in the model can be used as look-up tables to guide chemists during hit-to-lead and lead optimization campaigns.

13.
Structure‐based virtual screening usually involves docking of a library of chemical compounds onto the functional pocket of the target receptor so as to discover novel classes of ligands. However, the overall success rate remains low and screening a large library is computationally intensive. An alternative to this “ab initio” approach is virtual screening by binding homology search. In this approach, potential ligands are predicted based on similar interaction pairs (similarity in receptors and ligands). SPOT‐Ligand is an approach that integrates ligand similarity by Tanimoto coefficient and receptor similarity by protein structure alignment program SPalign. The method was found to yield a consistent performance in DUD and DUD‐E docking benchmarks even if model structures were employed. It improves over docking methods (DOCK6 and AUTODOCK Vina) and has a performance comparable to or better than other binding‐homology methods (FINDsite and PoLi) with higher computational efficiency. The server is available at http://sparks-lab.org . © 2016 Wiley Periodicals, Inc.
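For readers unfamiliar with the ligand-similarity measure named in this abstract, the Tanimoto coefficient of two fingerprints A and B is |A ∩ B| / |A ∪ B|. The sketch below computes it on sets of "on" bit indices and ranks a toy library against a query. The fingerprints and compound names are made up, and the receptor-similarity half of SPOT-Ligand (SPalign) is not represented here.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bits."""
    union = fp_a | fp_b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(fp_a & fp_b) / len(union)


# Hypothetical fingerprints: each set lists the indices of bits that are set.
query = {1, 2, 3}
toy_library = {
    "cmpd_A": {2, 3, 4},      # shares 2 of 4 distinct bits with the query
    "cmpd_B": {1, 2, 3},      # identical to the query
    "cmpd_C": {7, 8, 9, 10},  # no bits in common
}

# Rank library compounds from most to least similar to the query.
ranking = sorted(toy_library,
                 key=lambda name: tanimoto(query, toy_library[name]),
                 reverse=True)
```

In this toy case `ranking` is `["cmpd_B", "cmpd_A", "cmpd_C"]`, with similarities 1.0, 0.5 and 0.0 respectively.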

14.
15.
Summary: Structure-based screening using fully flexible docking is still too slow for large molecular libraries. High quality docking of a million-molecule library can take days even on a cluster with hundreds of CPUs. This performance issue prohibits the use of fully flexible docking in the design of large combinatorial libraries. We have developed a fast structure-based screening method, which utilizes docking of a limited number of compounds to build a 2D QSAR model used to rapidly score the rest of the database. We compare here a model based on radial basis functions and a Bayesian categorization model. The number of compounds that need to be actually docked depends on the number of docking hits found. In our case studies, models of reasonable quality are built once enough molecules have been docked to yield 50 docking hits. The rest of the library is screened by the QSAR model. Optionally a fraction of the QSAR-prioritized library can be docked in order to find the true docking hits. The quality of the model only depends on the training set size, not on the size of the library to be screened. Therefore, for larger libraries the method yields a higher gain in speed with no change in performance. Prioritizing a large library with these models provides a significant enrichment with docking hits: the enrichment attains values of 13 and 35 at the beginning of the score-sorted libraries in our two case studies, namely screening of the NCI collection and of a combinatorial library on the CDK2 kinase structure. With such enrichments, only a fraction of the database must actually be docked to find many of the true hits. The throughput of the method allows its use in screening of large compound collections and in the design of large combinatorial libraries. The strategy proposed has an important effect on efficiency but does not affect retrieval of actives, the latter being determined by the quality of the docking method itself.
Electronic supplementary material is available at http://dx.doi.org/10.1007/s10822-005-9002-6.
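The enrichment values quoted in this abstract can be read against the commonly used enrichment-factor definition: the hit rate in the top fraction of a ranked list divided by the hit rate in the whole library. The abstract does not spell out its exact formula, so the sketch below assumes that standard definition; the ranked list is synthetic.

```python
def enrichment_factor(ranked_is_hit, fraction):
    """Hit rate in the top `fraction` of a ranked list over the overall hit rate."""
    n_total = len(ranked_is_hit)
    n_top = max(1, round(n_total * fraction))  # size of the top slice
    hits_top = sum(ranked_is_hit[:n_top])
    hits_total = sum(ranked_is_hit)
    if hits_total == 0:
        raise ValueError("no hits in the library; enrichment is undefined")
    return (hits_top / n_top) / (hits_total / n_total)


# Synthetic ranking of 100 compounds (1 = docking hit, 0 = non-hit),
# ordered by a hypothetical QSAR score; 10 hits overall, 5 in the top 10.
ranked = [1] * 5 + [0] * 5 + [1] * 5 + [0] * 85
ef_top10pct = enrichment_factor(ranked, 0.10)
```

For this synthetic list the top 10% has a hit rate of 0.5 against an overall rate of 0.1, so `ef_top10pct` is 5; a value of 1 would mean the ranking is no better than random selection.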

16.
Accurate and efficient calculations of absorption spectra of molecules and materials are essential for the understanding and rational design of broad classes of systems. Solving the Bethe–Salpeter equation (BSE) for electron–hole pairs usually yields accurate predictions of absorption spectra, but it is computationally expensive, especially if thermal averages of spectra computed for multiple configurations are required. We present a method based on machine learning to evaluate a key quantity entering the definition of absorption spectra: the dielectric screening. We show that our approach yields a model for the screening that is transferable between multiple configurations sampled during first principles molecular dynamics simulations; hence it leads to a substantial improvement in the efficiency of calculations of finite temperature spectra. We obtained computational gains of one to two orders of magnitude for systems with 50 to 500 atoms, including liquids, solids, nanostructures, and solid/liquid interfaces. Importantly, the models of dielectric screening derived here may be used not only in the solution of the BSE but also in developing functionals for time-dependent density functional theory (TDDFT) calculations of homogeneous and heterogeneous systems. Overall, our work provides a strategy to combine machine learning with electronic structure calculations to accelerate first principles simulations of excited-state properties.

Machine learning can circumvent explicit calculation of dielectric response in first principles methods and accelerate simulations of optical properties of complex materials at finite temperature.

17.
Predicting relative protein–ligand binding affinities is a central pillar of lead optimization efforts in structure-based drug design. The site identification by ligand competitive saturation (SILCS) methodology is based on functional group affinity patterns in the form of free energy maps that may be used to compute protein–ligand binding poses and affinities. Presented are results obtained from the SILCS methodology for a set of eight target proteins as reported originally in Wang et al. (J. Am. Chem. Soc., 2015, 137, 2695–2703) using free energy perturbation (FEP) methods in conjunction with enhanced sampling and cycle closure corrections. These eight targets have been subsequently studied by many other authors to compare the efficacy of their method while comparing with the outcomes of Wang et al. In this work, we present results for a total of 407 ligands on the eight targets and include specific analysis on the subset of 199 ligands considered previously. Using the SILCS methodology we can achieve an average accuracy of up to 77% and 74% when considering the eight targets with their 199 and 407 ligands, respectively, for rank-ordering ligand affinities as calculated by the percent correct metric. This accuracy increases to 82% and 80%, respectively, when the SILCS atomic free energy contributions are optimized using a Bayesian Markov-chain Monte Carlo approach. We also report other metrics including Pearson's correlation coefficient, Pearlman's predictive index, mean unsigned error, and root mean square error for both sets of ligands. The results obtained for the 199 ligands are compared with the outcomes of Wang et al. and other published works.
Overall, the SILCS methodology yields similar or better-quality predictions without a priori need for known ligand orientations in terms of the different metrics when compared to current FEP approaches with significant computational savings while additionally offering quantitative estimates of individual atomic contributions to binding free energies. These results further validate the SILCS methodology as an accurate, computationally efficient tool to support lead optimization and drug discovery.

Predicting relative protein–ligand binding affinities is a central pillar of lead optimization efforts in structure-based drug design.

18.
Fragment-based drug discovery (FBDD) is a powerful strategy for the identification of new bioactive molecules. FBDD relies on fragment libraries, generally of modest size, but of high chemical diversity. Although good chemical diversity in FBDD libraries has been achieved in many respects, achieving shape diversity – particularly fragments with three-dimensional (3D) structures – has remained challenging. A recent analysis revealed that >75% of all conventional, organic fragments are predominantly 1D or 2D in shape. However, 3D fragments are desired because molecular shape is one of the most important factors in molecular recognition by a biomolecule. To address this challenge, the use of inert metal complexes, so-called ‘metallofragments’ (mFs), to construct a 3D fragment library is introduced. A modest library of 71 compounds has been prepared with rich shape diversity as gauged by normalized principal moment of inertia (PMI) analysis. PMI analysis shows that these metallofragments occupy an area of fragment space that is unique and highly underrepresented when compared to conventional organic fragment libraries that comprise orders of magnitude more molecules. The potential value of this metallofragment library is demonstrated by screening against several different types of proteins, including an antiviral, an antibacterial, and an anticancer target. The suitability of the metallofragments for future hit-to-lead development was validated through the determination of IC50 and thermal shift values for select fragments against several proteins. These findings demonstrate the utility of metallofragment libraries as a means of accessing underutilized 3D fragment space for FBDD against a variety of protein targets.

Fragment-based drug discovery (FBDD) using 3-dimensional metallofragments is a new strategy for the identification of bioactive molecules.

19.
Endolysins are bacteriophage-encoded peptidoglycan hydrolases targeting the cell wall of host bacteria via their cell wall-binding domains (CBDs). The molecular basis for selective recognition of surface carbohydrate ligands by CBDs remains elusive. Here, we describe, in atomic detail, the interaction between the Listeria phage endolysin domain CBD500 and its cell wall teichoic acid (WTA) ligands. We show that 3′O-acetylated GlcNAc residues integrated into the WTA polymer chain are the key epitope recognized by a CBD binding cavity located at the interface of tandem copies of beta-barrel, pseudo-symmetric SH3b-like repeats. This cavity consists of multiple aromatic residues making extensive interactions with two GlcNAc acetyl groups via hydrogen bonds and van der Waals contacts, while permitting the docking of the diastereomorphic ligands. Our multidisciplinary approach tackled an extremely challenging protein–glycopolymer complex and delineated a previously unknown recognition mechanism by which a phage endolysin specifically recognizes and targets WTA, suggesting an adaptable model for regulation of endolysin specificity.

Combining genetic, biochemical and computational approaches, we elucidated the molecular mechanisms underlying the recognition of Listeria wall teichoic acid by bacteriophage-encoded SH3b repeats.

20.
Computer aided synthesis planning (CASP) is part of a suite of artificial intelligence (AI) based tools that are able to propose synthesis routes to a wide range of compounds. However, at present they are too slow to be used to screen the synthetic feasibility of millions of generated or enumerated compounds before identification of potential bioactivity by virtual screening (VS) workflows. Herein we report a machine learning (ML) based method capable of classifying whether or not a synthetic route can be identified for a particular compound by the CASP tool AiZynthFinder. The resulting ML models return a retrosynthetic accessibility score (RAscore) for any molecule of interest, and compute at least 4500 times faster than retrosynthetic analysis performed by the underlying CASP tool. The RAscore should be useful for pre-screening millions of virtual molecules from enumerated databases or generative models for synthetic accessibility and for producing higher quality databases for virtual screening of biological activity.

The retrosynthetic accessibility score (RAscore) is based on AI-driven retrosynthetic planning, and is useful for rapid scoring of synthetic feasibility and pre-screening of large datasets of virtual/generated molecules.
