Similar Articles
1.
In this era of artificial intelligence, there is an urgent need to optimize current materials design methods into a more efficient and more accurate closed-loop system. Such a system requires at least three components: high-throughput screening, an automated synthesis platform, and machine learning algorithms. Fortunately, each of these techniques has developed substantially. We first introduce common machine learning algorithms and then discuss several machine-learning-based designs of carbon-based electrocatalysts. We also try to illustrate research norms for studies involving machine learning. Other aspects of paper structure and detail are discussed as well.

2.
Virtual screening, which predicts which compounds within a specified compound library bind to a target molecule (typically a protein), is a fundamental task in drug discovery. Doing virtual screening well provides tangible practical benefits, including reduced drug development costs, faster time to therapeutic viability, and fewer unforeseen side effects. As with most applied computational tasks, the algorithms currently used for virtual screening involve inherent tradeoffs between speed and accuracy. Furthermore, even theoretically rigorous, computationally intensive methods may fail to account for important effects relevant to whether a given compound will ultimately be usable as a drug. Here we investigate the virtual screening performance of the recently released Gnina molecular docking software, which uses deep convolutional networks to score protein-ligand structures. We find that, on average, Gnina outperforms conventional empirical scoring. The default scoring in Gnina outperforms the empirical AutoDock Vina scoring function on 89 of the 117 targets of the DUD-E and LIT-PCBA virtual screening benchmarks, with a median 1% early enrichment factor more than twice that of Vina. However, we also find that issues of bias linger in these sets, even when they are not used directly to train models. This bias obscures the extent to which machine learning models achieve their performance through a sophisticated interpretation of molecular interactions rather than by fitting to non-informative, simplistic property distributions.
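The 1% early enrichment factor mentioned above compares two hit rates: the rate among the top-ranked 1% of the library and the rate in the library as a whole. The following is a minimal Python sketch that assumes higher scores mean "more likely active" and breaks ties arbitrarily. The function name is hypothetical, and the benchmarks' exact windowing and tie conventions may differ.

```python
# Hypothetical helper for illustration; not the benchmarks' reference code.
def enrichment_factor(scores, labels, fraction=0.01):
    """Enrichment factor (EF) in the top `fraction` of a score-ranked library.

    scores: one number per compound; higher means predicted more likely active.
    labels: 1 for a known active, 0 for a decoy/inactive.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_total = len(labels)
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("at least one active is required")
    # Size of the early-recognition window (at least one compound).
    n_top = max(1, round(fraction * n_total))
    # Rank by score, best first; ties keep input order (an arbitrary choice).
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    actives_in_top = sum(label for _, label in ranked[:n_top])
    # EF = hit rate in the window / hit rate in the whole library.
    return (actives_in_top / n_top) / (n_actives / n_total)
```

Suppose 2 actives among 200 compounds are both ranked at the very top. Then EF1% = (2/2) / (2/200) = 100, the maximum possible for that library.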

3.
Finding novel lead molecules is one of the primary goals in the early phases of drug discovery projects. However, structurally dissimilar compounds may exhibit similar biological activity, and finding new, structurally diverse lead compounds is difficult for computer algorithms. Molecular energy fields are well suited to finding structurally novel molecules, but they are computationally demanding to calculate, which limits their usefulness in virtual screening of large chemical databases. In our approach, energy fields are computed only once per superposition. A simple interpolation scheme then allows coarse energy field lattices with fewer grid points to be used without any significant loss of accuracy. The resulting processing speed of about 0.25 s per conformation on a 2.4 GHz Intel Pentium processor allows the method to be used for virtual screening on commonly available desktop machines. Moreover, the results indicate that grid-based superposition methods can be used efficiently for virtual screening of compound libraries.
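To illustrate how a field sampled on a coarse lattice can be evaluated at arbitrary atom positions, here is trilinear interpolation in Python/NumPy. This is a common textbook scheme, assumed here for illustration only: the abstract does not say which interpolation the authors used. The function name and the equal-spacing lattice layout are also hypothetical.

```python
import numpy as np

# Hypothetical helper: trilinear interpolation, assumed for illustration only.
def trilinear(grid, origin, spacing, point):
    """Interpolate a scalar field sampled on a regular 3D lattice.

    grid: array of shape (nx, ny, nz), each dimension >= 2, with field values at nodes.
    origin: Cartesian coordinates of grid[0, 0, 0].
    spacing: lattice spacing, assumed equal along all three axes.
    point: Cartesian coordinates at which to evaluate the field.
    """
    # Fractional lattice coordinates of the query point.
    f = (np.asarray(point, float) - np.asarray(origin, float)) / spacing
    i0 = np.floor(f).astype(int)
    # Clamp so the 2x2x2 cell stays inside the lattice (constant outside it).
    i0 = np.clip(i0, 0, np.array(grid.shape) - 2)
    t = np.clip(f - i0, 0.0, 1.0)  # position inside the cell, each in [0, 1]
    x, y, z = i0
    c = grid[x:x + 2, y:y + 2, z:z + 2]  # the 8 surrounding lattice nodes
    # Collapse one axis at a time: x, then y, then z.
    c = c[0] * (1 - t[0]) + c[1] * t[0]  # shape (2, 2), indexed [y, z]
    c = c[0] * (1 - t[1]) + c[1] * t[1]  # shape (2,), indexed [z]
    return float(c[0] * (1 - t[2]) + c[1] * t[2])
```

Trilinear interpolation reproduces linear fields exactly, so its error on a coarse grid comes only from the field's curvature within each cell. This is one reason a coarser lattice can remain accurate for smooth fields.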

4.
5.
Domain-aware artificial intelligence (AI) has been increasingly adopted in recent years to expedite molecular design in various applications, including drug design and discovery. Recent advances in areas such as physics-informed machine learning (ML) and reasoning, software engineering, high-end hardware development, and computing infrastructure provide opportunities to build scalable and explainable AI systems for molecular discovery. Such systems could:
- refine design hypotheses through feedback analysis;
- integrate data to provide a basis for end-to-end automation of compound discovery and optimization;
- enable more intelligent searches of chemical space.

Several state-of-the-art ML architectures are used, predominantly and independently, to predict the properties of small molecules, to guide their high-throughput synthesis and screening, and to iteratively identify and optimize lead therapeutic candidates. However, these deep learning and ML approaches also raise considerable conceptual, technical, scalability, and end-to-end error quantification challenges, as well as skepticism about the current AI hype around automated tools. Combining these individual components synergistically in a closed loop, together with robust quantum-physics-based molecular representation and data generation tools, holds enormous promise for accelerated therapeutic design. This article critically analyzes the opportunities and challenges for the more widespread application of such systems. It identifies the most recent technologies and breakthroughs achieved for each component. It also discusses how these autonomous AI and ML workflows can be integrated to radically accelerate probe design based on a protein target or disease model, with iterative experimental validation. Taken together, this could significantly shorten the timeline for end-to-end therapeutic discovery and optimization upon the arrival of any novel zoonotic transmission event.
Our article serves as a guide for the medicinal chemistry, computational chemistry and biology, analytical chemistry, and ML communities to practice autonomous molecular design in precision medicine and drug discovery.

6.
Drug discovery efforts rely increasingly on the identification of quality lead compounds through high-throughput synthesis and screening. However, large-scale random libraries have yielded only a small number of quality lead molecules. To address this shortcoming, researchers have paid more attention to the concept of the "drug-likeness" of molecules in combinatorial and screening libraries. Database profiling and analysis methods have been employed to identify the structural features of known drug molecules. Neural networks and machine learning methods help to distinguish between drugs and nondrugs. More recently, database-independent pharmacophore filters have been introduced that provide simple, intuitive rules to classify potential drugs.
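The abstract does not spell out the pharmacophore filter rules. As a stand-in for the kind of simple, intuitive rule it describes, the sketch below applies Lipinski's rule of five to precomputed descriptors. Note that the rule of five is a property-based rule derived from profiling known drugs, not one of the pharmacophore filters themselves. The `Descriptors` class and both function names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical container: descriptors are assumed to be computed elsewhere.
@dataclass
class Descriptors:
    mol_weight: float   # molecular weight, Da
    clogp: float        # calculated octanol/water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors

def rule_of_five_violations(d: Descriptors) -> int:
    """Count violations of Lipinski's rule of five (a stand-in rule, see text)."""
    return sum([
        d.mol_weight > 500,
        d.clogp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])

def passes_filter(d: Descriptors, max_violations: int = 1) -> bool:
    # Lipinski's guideline flags compounds with more than one violation.
    return rule_of_five_violations(d) <= max_violations
```

Rules like this are cheap enough to apply to entire combinatorial libraries before any synthesis or screening.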

7.
In this paper, we study the classification of unbalanced drug data sets, using a data set of cytochrome P450 2D6 inhibitors as an example. The human cytochrome P450 2D6 isoform plays a key role in the metabolism of many drugs and is an important consideration in the preclinical drug discovery process. We collected a data set from annotated public data and calculated physicochemical properties with chemoinformatics methods. On top of these data, we built classifiers based on machine learning methods. When class distributions are unbalanced, conventional machine learning methods tend to be biased toward the larger class. To overcome this problem, we combine machine learning and feature selection methods with techniques that address unbalanced classification, such as oversampling and threshold moving. This yields classifiers that are sensitive as well as accurate. We used our own implementation of a support vector machine algorithm as well as the maximum entropy method. Our feature selection is based on the unsupervised McCabe method. The classification results on our test set are compared structurally with compounds from the training set. We show that the applied algorithms enable effective high-throughput in silico classification of potential drug candidates.
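The two imbalance remedies named above can be sketched in plain Python:
- **Random oversampling** duplicates minority-class examples before training.
- **Threshold moving** shifts the decision cutoff on a trained classifier's scores.

Both functions are hypothetical helpers for binary 0/1 labels. Using balanced accuracy as the threshold criterion is an assumption, not necessarily the criterion the authors used.

```python
import random

# Hypothetical helpers for illustration; not the authors' implementation.
def random_oversample(samples, labels, seed=0):
    """Duplicate random minority-class samples until both classes are equal in size.

    Assumes binary labels (0/1); returns new lists and leaves the inputs unchanged.
    """
    rng = random.Random(seed)
    pos = [s for s, y in zip(samples, labels) if y == 1]
    neg = [s for s, y in zip(samples, labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    minority_label = 1 if minority is pos else 0
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    out_x = pos + neg + extra
    out_y = [1] * len(pos) + [0] * len(neg) + [minority_label] * len(extra)
    return out_x, out_y

def best_threshold(scores, labels):
    """Threshold moving: pick the score cutoff that maximizes balanced accuracy.

    Predicts class 1 when score >= threshold; needs both classes present.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_t, best_bacc = 0.5, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        bacc = 0.5 * (tp / n_pos + tn / n_neg)  # mean of sensitivity and specificity
        if bacc > best_bacc:
            best_t, best_bacc = t, bacc
    return best_t, best_bacc
```

In the test scores used here (0.1, 0.2, 0.3, 0.35, 0.4, with the last two active), a default cutoff of 0.5 would label every compound inactive. The moved threshold of 0.35 separates the classes perfectly.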

8.
Chemical structure searching based on databases and machine learning has recently attracted great attention for rapidly screening materials with target functionalities. To this end, we established a high-performance chemical structure database based on the MySQL engine, named MYDB. More than 160,000 metal-organic frameworks (MOFs) have been collected and stored, with new retrieval algorithms for efficient searching and recommendation. The evaluation results show that MYDB achieves fast, efficient keyword searching over millions of records and provides real-time recommendations of similar structures. Combining machine learning with the materials database, we developed an adsorption model to determine the adsorption capacity of MOFs toward argon and hydrogen under given conditions. We expect MYDB, together with the developed machine learning techniques, to support large-scale, low-cost, and highly convenient structural research. This should accelerate the discovery of materials with target functionalities in computational materials research.

9.
Analyses of known protein–ligand interactions play an important role in designing novel and efficient drugs, contributing to drug discovery and development. Recently, machine learning methods have proven useful in the design of novel drugs. These methods predict the outcome of unknown protein–ligand interactions by learning from the physical and geometrical properties of known ones. The aim of this study is to work through a specific example of a novel computational method, compressed images for affinity prediction (CIFAP). CIFAP predicts binding affinities for structurally related ligands in complexes with human checkpoint kinase 1 (CHK1). The CIFAP algorithm presented here relates the published pIC50 values of 57 compounds, derived from a thienopyridine pharmacophore, in complexes with CHK1 to their two-dimensional (2D) electrostatic potential images compressed in orthogonal dimensions. Patterns obtained from the 2D images are then used as inputs to regression and learning algorithms to model the experimental pIC50 values. These algorithms include support vector regression (SVR) and the adaptive neuro-fuzzy inference system (ANFIS). This study revealed that 2D image pixels in the vicinity of bound ligand surfaces provide more relevant information for correlation with the empirical pIC50 values. Compared with ANFIS, SVR gave the lowest root-mean-square errors and the greatest correlations. This suggests that SVR could be a plausible choice of machine learning method for predicting binding affinities by CIFAP. Copyright © 2013 John Wiley & Sons, Ltd.
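One plausible reading of "compressed in orthogonal dimensions" is projecting each 2D image onto its two axes, that is, summing the pixels along rows and along columns. The NumPy sketch below implements that interpretation. Both the interpretation and the function name are assumptions for illustration, not the published CIFAP procedure.

```python
import numpy as np

# Hypothetical helper: one possible "orthogonal compression" of a 2D image.
def compress_orthogonal(image):
    """Collapse a 2D image into two 1D profiles (row sums, then column sums)."""
    img = np.asarray(image, dtype=float)
    if img.ndim != 2:
        raise ValueError("expected a 2D image")
    row_profile = img.sum(axis=1)  # one value per image row
    col_profile = img.sum(axis=0)  # one value per image column
    return np.concatenate([row_profile, col_profile])
```

Stacking the vectors for all 57 ligands (for example with `np.stack`) would give a feature matrix. That matrix could then be passed to a regressor such as scikit-learn's `sklearn.svm.SVR`.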

10.
In silico methods play an essential role in modern drug discovery. Virtual screening, an in silico method, is used to narrow down the chemical space on which actual wet-lab experiments need to be conducted. Ligand-based virtual screening is a computational strategy in which a model of the target protein is built from knowledge of the ligands that bind successfully to it. This model is then used to predict whether a new molecule is likely to bind to the target. The support vector machine (SVM), a supervised learning algorithm used for classification, can be applied to virtual screening of ligand data and can produce interesting results for this purpose. However, because ligand data sets are huge, training an SVM model takes considerably longer than training other learning algorithms. Parallelizing these algorithms on multi-core processors can readily speed up such discoveries. This paper proposes and benchmarks GpuSVMScreen, a GPU-based ligand-based virtual screening tool that uses SVM. This data-parallel tool achieves high throughput with short run times, and it can also successfully screen very large numbers of molecules (billions). The source code of this tool is available at http://ccc.nitc.ac.in/project/GPUSVMSCREEN.
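The data-parallel idea can be illustrated on a CPU with NumPy. Once an RBF-kernel SVM is trained, scoring a molecule only requires its kernel values against the support vectors. Many molecules can therefore be scored in one matrix operation, and the library can be split into independent batches. This is a hypothetical sketch of the scoring step only, not GpuSVMScreen's code. The argument names follow the usual dual form f(x) = sum_i alpha_i y_i K(s_i, x) + b.

```python
import numpy as np

# Hypothetical sketch of batched SVM scoring; not GpuSVMScreen's implementation.
def rbf_decision_function(X, support_vectors, dual_coef, intercept, gamma):
    """Score many molecules at once with a trained RBF-kernel SVM.

    X: (n_molecules, n_features) descriptor or fingerprint matrix.
    support_vectors: (n_sv, n_features).
    dual_coef: (n_sv,) products alpha_i * y_i from training.
    Returns (n_molecules,) decision values; > 0 means predicted active.
    """
    X = np.asarray(X, float)
    sv = np.asarray(support_vectors, float)
    # All molecule/support-vector squared distances in one shot:
    # ||x - s||^2 = ||x||^2 + ||s||^2 - 2 x.s
    sq = (X ** 2).sum(1)[:, None] + (sv ** 2).sum(1)[None, :] - 2.0 * X @ sv.T
    K = np.exp(-gamma * np.maximum(sq, 0.0))  # clamp tiny negative round-off
    return K @ np.asarray(dual_coef, float) + intercept

def screen_in_batches(X, batch_size, **model):
    # Each batch is an independent job, which is what makes the task data-parallel.
    X = np.asarray(X, float)
    return np.concatenate([rbf_decision_function(X[i:i + batch_size], **model)
                           for i in range(0, len(X), batch_size)])
```

Batching bounds memory use, because the kernel matrix for a billion-molecule library would not fit in memory at once. On a GPU, the same matrix products would run on device arrays.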

11.
Background: Identification of potential drug-target interaction pairs is very important for pharmaceutical innovation and drug discovery. Numerous machine-learning-based and network-based algorithms have been developed for predicting drug-target interactions. However, the large-scale pharmacological, genomic, and chemical data that have emerged recently provide new opportunities to further improve the accuracy of drug-target interaction prediction. Results: In this work, we developed a novel computational method (MKLC-BiRW) to predict new drug-target interactions. It is based on the assumption that similar drugs tend to interact with similar proteins and vice versa. MKLC-BiRW integrates diverse heterogeneous drug-related and target-related information sources using multiple kernel learning and clustering methods. This generates drug and target similarity matrices, in which low-similarity elements are set to zero to build drug and target similarity correction networks. MKLC-BiRW combines these similarity correction networks with the known drug-target interaction bipartite graph to construct a heterogeneous network. A bi-random walk algorithm is then applied on this network to infer potential drug-target interactions. Conclusions: Compared with other existing state-of-the-art methods, MKLC-BiRW achieves the best performance in terms of AUC and AUPR. MKLC-BiRW can effectively predict potential drug-target interactions.
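A simplified bi-random walk can be sketched with NumPy:
1. The interaction matrix is propagated over the drug similarity network (left walk) and over the target similarity network (right walk).
2. Each step restarts toward the known interactions.
3. The two results are averaged.

This is a generic sketch, not the exact MKLC-BiRW formulation, and it omits the multiple kernel learning and clustering steps. The threshold `tau`, restart weight `alpha`, equal step counts for both walks, and symmetric normalization are all hypothetical choices.

```python
import numpy as np

# Hypothetical, simplified bi-random walk; not the published MKLC-BiRW code.
def sparsify(sim, tau):
    """Zero out similarities below tau, keeping only strong edges."""
    s = np.asarray(sim, float)
    return np.where(s >= tau, s, 0.0)

def normalize(sim):
    """Symmetric normalization D^-1/2 S D^-1/2 (zero-degree rows stay zero)."""
    d = sim.sum(axis=1)
    inv_sqrt = np.zeros_like(d)
    nz = d > 0
    inv_sqrt[nz] = 1.0 / np.sqrt(d[nz])
    return inv_sqrt[:, None] * sim * inv_sqrt[None, :]

def bi_random_walk(A, drug_sim, target_sim, alpha=0.8, steps=4, tau=0.3):
    """Score drug-target pairs by walking on both similarity networks.

    A: (n_drugs, n_targets) known interaction matrix (1 = known interaction).
    Returns an (n_drugs, n_targets) score matrix; higher = more likely.
    """
    A = np.asarray(A, float)
    Sd = normalize(sparsify(drug_sim, tau))
    St = normalize(sparsify(target_sim, tau))
    A0 = A / A.sum() if A.sum() > 0 else A.copy()  # restart distribution
    R = A0.copy()
    for _ in range(steps):
        left = alpha * Sd @ R + (1 - alpha) * A0   # walk on the drug network
        right = alpha * R @ St + (1 - alpha) * A0  # walk on the target network
        R = 0.5 * (left + right)
    return R
```

A drug with no known interactions still receives scores through its similar neighbours. This is how the "similar drugs interact with similar proteins" assumption is put into practice.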

12.
Molecular latent representations derived from autoencoders (AEs) have been widely used for drug and material discovery over the past couple of years. In particular, a variety of machine learning methods based on latent representations have shown excellent performance in quantitative structure–activity relationship (QSAR) modeling. However, the sequential features of these representations have not been considered in most cases. In addition, data scarcity remains the main obstacle for deep learning strategies, especially for bioactivity datasets. In this study, we propose the convolutional recurrent neural network and transfer learning (CRNNTL) method, inspired by applications in polyphonic sound detection and electrocardiogram classification. Our model takes advantage of both convolutional and recurrent neural networks for feature extraction, as well as a data augmentation method. In QSAR modeling on 27 datasets, CRNNTL can outperform or compete with state-of-the-art methods for both drug and material properties. In addition, its performance on an isomer-based dataset indicates why it performs well: it improves global feature extraction while maintaining local feature extraction. Next, the transfer learning results show that CRNNTL can overcome data scarcity when related source datasets are chosen. Finally, the high versatility of our model is shown by using latent representations from other types of AEs as inputs.

13.
The completion of the human genome project has opened novel scientific avenues in functional genomics, structural genomics and proteomics. These areas have a common goal: the identification of all the proteins acting and cross-talking in a single cell at a defined moment of its lifecycle. The expansion of these areas in bioscience has been facilitated by the rapid development of high throughput screening (HTS) methods. This, in turn, has attracted the business community to invest in this novel segment of biotechnology. By using these HTS methods, the hope is that novel targets will be validated much more rapidly, speeding up the development of novel drugs. Numerous techniques and tools have emerged over the past decade for the identification of small target-specific molecular ligands. They exploit a common feature: the exploration of molecular diversity using combinatorial methods. While chemists developed new methods for rapidly and efficiently synthesising and screening large collections of small molecules, biologists used recombinant DNA techniques for selecting displayed repertoires. To this end, the discovery of new low molecular weight peptides is becoming increasingly important, not only as molecular tools for understanding protein-protein interactions but also for the generation of lead compounds.

14.
15.
The discovery and development of novel drug candidates have changed dramatically over the last two decades. Traditional methods for identifying lead compounds are not suited to screening the large libraries generated by combinatorial chemistry techniques. High-throughput screening (HTS) has become indispensable, and hundreds of different approaches have been described. Assays based on purified components are complemented by whole-cell-based assays. These use reporter genes to monitor, directly or indirectly, the influence of a chemical on the metabolism of living cells. The most convenient and widely used reporters for real-time measurements are luciferases, light-emitting enzymes from evolutionarily distant organisms. Autofluorescent proteins have also been extensively employed but have proved more suitable for other uses:
- end-point measurements;
- in situ applications, such as localizing fusion proteins in specific subcellular compartments;
- environmental studies on microbial populations.

The trend toward miniaturization and technical advances in detection and liquid-handling systems will make ultra-high-throughput screening (uHTS) possible, with 100,000 compounds routinely screened each day. Here we show how similar approaches may also be applied to the search for new and potent antimicrobial agents.

16.
The rapid development of new machine learning techniques has led to significant progress in computer-aided drug design. However, despite their enormous predictive power, the new methods lack explainability and are often used as black boxes. The most important decisions in drug discovery are still made by human experts, who rely on intuition and simplified representations of the field. We used D3R Grand Challenge 4 to model the contributions of human experts to two tasks: predicting protein–ligand complex structures and predicting binding affinities for ligand series. Both were studied in the context of either absent or abundant experimental data. We demonstrated that human decisions show a series of biases. Experts tend to focus on easily identifiable protein–ligand interactions such as hydrogen bonds and to neglect more distributed and complex electrostatic interactions and solvation effects. While these biases still allow human experts to compete with blind algorithms in some areas, the underutilization of information leads to significantly worse performance in data-rich tasks such as binding affinity prediction.

17.
In the last decade, mass screening strategies have become the main source of leads in drug discovery settings. High-throughput screening (HTS) and virtual screening (VS) implement the same concept. However, the different nature of these lead discovery strategies (experimental vs. theoretical) means that they are typically applied separately. The majority of drug leads are still identified by hit-to-lead optimization of screening hits. Structural information on the target and on bound ligands, however, makes structure-based and ligand-based virtual screening available for identifying alternative chemical starting points. The two techniques have rarely been used together on the same target, but here we review the prominent existing studies on their true integration. Various approaches have been shown to combine HTS and VS and to make better use of them in lead generation. Several attempts at integration have only been considered at a conceptual level. Nonetheless, numerous applications underline that early-stage pharmaceutical drug research could benefit from a combined approach.

18.
19.
Over the last few years, machine learning has gradually become an essential approach for investigating heterogeneous catalysis. As an important class of catalysts, binary alloys have attracted extensive attention in the screening of bifunctional catalysts. Here we present a holistic machine learning framework to rapidly predict adsorption energies on the surfaces of metals and binary alloys. We evaluate different machine learning methods to understand their applicability to the problem. We then combine a tree-ensemble method with a compressed-sensing method to construct decision trees for about 60,000 adsorption data points. Compared with linear scaling relations, our approach makes more accurate predictions, lowering the predictive root-mean-square error by a factor of two. It is also more general, predicting adsorption energies of various adsorbates on thousands of binary alloy surfaces, thus paving the way for the discovery of novel bimetallic catalysts.
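For context, the linear scaling relations used as the baseline above express the adsorption energy of a hydrogenated adsorbate as a linear function of that of its central atom: E(AHx) ≈ gamma · E(A) + xi. Below is a minimal NumPy sketch of fitting such a relation and computing its RMSE. It assumes paired energies in eV, and the function name is hypothetical. It shows only the baseline, not the authors' tree-ensemble model.

```python
import numpy as np

# Hypothetical helper: least-squares fit of a linear scaling relation.
def fit_scaling_relation(e_descriptor, e_target):
    """Fit e_target = gamma * e_descriptor + xi and report the fit RMSE.

    e_descriptor: adsorption energies of the central atom (e.g. O*), in eV.
    e_target: adsorption energies of a related adsorbate (e.g. OH*), in eV.
    Returns (gamma, xi, rmse).
    """
    x = np.asarray(e_descriptor, float)
    y = np.asarray(e_target, float)
    design = np.column_stack([x, np.ones_like(x)])  # columns: slope, intercept
    (gamma, xi), *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - (gamma * x + xi)
    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    return float(gamma), float(xi), rmse
```

The RMSE returned here plays the role of the baseline error that the abstract reports halving. For a fair comparison, it should be computed on held-out data rather than on the fitting set.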

20.
Over the years, numerous papers have presented the effectiveness of various machine learning methods in analyzing biological screening data from drug discovery. The predictive performance of models developed with these methods has traditionally been evaluated against a randomly selected holdout portion of the data. In our experience, such assessments, while widely practiced, are overly optimistic. This paper describes the development of a series of ensemble-based decision tree models and shares our experience at various stages of the model development process. It also presents the impact of these models when they are applied to vendor offerings and the forecasted compounds are acquired and screened in the relevant assays. We have seen that well-developed models can significantly increase the hit rates observed in HTS campaigns.
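The abstract reports that random holdout tends to give optimistic estimates. One common, stricter alternative is to hold out whole groups, such as chemical series or clusters, so that near-duplicates of test compounds never appear in training. This is not necessarily what the authors did; they evaluated models prospectively on acquired compounds. Below is a hypothetical Python sketch in which group labels are assumed to be assigned beforehand.

```python
import random
from collections import defaultdict

# Hypothetical helper: group-wise holdout as an alternative to random holdout.
def group_holdout(groups, test_fraction=0.2, seed=0):
    """Split indices so each group lands entirely in train or entirely in test.

    groups: one group label per compound (e.g. a series or cluster ID).
    Returns (train_idx, test_idx) as sorted lists of indices.
    """
    by_group = defaultdict(list)
    for idx, g in enumerate(groups):
        by_group[g].append(idx)
    keys = list(by_group)
    random.Random(seed).shuffle(keys)
    target = test_fraction * len(groups)
    test_idx = []
    for g in keys:
        if len(test_idx) >= target:
            break
        test_idx.extend(by_group[g])  # whole group goes to the test set
    test_set = set(test_idx)
    train_idx = [i for i in range(len(groups)) if i not in test_set]
    return train_idx, sorted(test_idx)
```

Because whole groups are moved at once, the test set can be slightly larger than `test_fraction` of the data.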


Copyright © Beijing Qinyun Technology Development Co., Ltd. (Beijing ICP Registration No. 09084417)