Similar articles
 Found 20 similar articles (search time: 296 ms)
1.
2.
The goal of structure-based drug discovery is to find small molecules that bind to a given target protein. Deep learning has been used to generate drug-like molecules with certain cheminformatic properties, but has not yet been applied to generating 3D molecules predicted to bind to proteins by sampling the conditional distribution of protein–ligand binding interactions. In this work, we describe for the first time a deep learning system for generating 3D molecular structures conditioned on a receptor binding site. We approach the problem using a conditional variational autoencoder trained on an atomic density grid representation of cross-docked protein–ligand structures. We apply atom fitting and bond inference procedures to construct valid molecular conformations from generated atomic densities. We evaluate the properties of the generated molecules and demonstrate that they change significantly when conditioned on mutated receptors. We also explore the latent space learned by our generative model using sampling and interpolation techniques. This work opens the door for end-to-end prediction of stable bioactive molecules from protein structures with deep learning.

We generate 3D molecules conditioned on receptor binding sites by training a deep generative model on protein–ligand complexes. Our model uses the conditional receptor information to make chemically relevant changes to the generated molecules.

3.
Machine learning (ML) methods have great potential to transform chemical discovery by accelerating the exploration of chemical space and drawing scientific insights from data. However, modern chemical reaction ML models, such as those based on graph neural networks (GNNs), must be trained on a large amount of labelled data to avoid overfitting, which results in low accuracy and poor transferability. In this work, we propose a strategy to leverage unlabelled data to learn accurate ML models for small labelled chemical reaction data. We focus on an old and prominent problem, classifying reactions into distinct families, and build a GNN model for this task. We first pretrain the model on unlabelled reaction data using unsupervised contrastive learning and then fine-tune it on a small number of labelled reactions. The contrastive pretraining learns by making the representations of two augmented versions of a reaction similar to each other but distinct from other reactions. We propose chemically consistent reaction augmentation methods that protect the reaction center and find they are the key for the model to extract relevant information from unlabelled data to aid the reaction classification task. The transfer learned model outperforms a supervised model trained from scratch by a large margin. Further, it consistently performs better than models based on traditional rule-driven reaction fingerprints, which have long been the default choice for small datasets, as well as those based on reaction fingerprints derived from masked language modelling. In addition to reaction classification, the effectiveness of the strategy is tested on regression datasets; the learned GNN-based reaction fingerprints can also be used to navigate the chemical reaction space, which we demonstrate by querying for similar reactions. The strategy can be readily applied to other predictive reaction problems to uncover the power of unlabelled data for learning better models with a limited supply of labels.

Contrastive pretraining of chemical reactions by matching augmented reaction representations to improve machine learning performance on small reaction datasets.
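
The contrastive objective described in this abstract (pull two augmented views of one reaction together, push other reactions away) can be illustrated with a minimal sketch. The abstract does not state which loss the authors use; the InfoNCE-style form, the temperature value, the function names and the 2-D embeddings below are all assumptions chosen for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def contrastive_loss(anchor, positive, negatives, temperature=0.5):
    """InfoNCE-style loss for one anchor (assumed form, not taken from the paper).

    The loss is low when the anchor is close to its augmented twin (positive)
    and far from the embeddings of other reactions (negatives).
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))

# Hypothetical 2-D embeddings, purely for illustration.
anchor = [1.0, 0.0]
good_twin = [0.9, 0.1]               # augmented view that stays close to the anchor
bad_twin = [0.0, 1.0]                # augmented view that drifted away
others = [[-1.0, 0.0], [0.0, -1.0]]  # embeddings of unrelated reactions

loss_aligned = contrastive_loss(anchor, good_twin, others)
loss_misaligned = contrastive_loss(anchor, bad_twin, others)
```

In this toy setup the aligned pair yields the smaller loss, which is the signal a pretraining step of this kind would minimize.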

4.
Deep generative models are attracting much attention in the field of de novo molecule design. Compared to traditional methods, deep generative models can be trained in a fully data-driven way with little requirement for expert knowledge. Although many models have been developed to generate 1D and 2D molecular structures, 3D molecule generation is less explored, and the direct design of drug-like molecules inside target binding sites remains challenging. In this work, we introduce DeepLigBuilder, a novel deep learning-based method for de novo drug design that generates 3D molecular structures in the binding sites of target proteins. We first developed Ligand Neural Network (L-Net), a novel graph generative model for the end-to-end design of chemically and conformationally valid 3D molecules with high drug-likeness. Then, we combined L-Net with Monte Carlo tree search to perform structure-based de novo drug design tasks. In the case study of inhibitor design for the main protease of SARS-CoV-2, DeepLigBuilder suggested a list of drug-like compounds with novel chemical structures, high predicted affinity, and similar binding features to those of known inhibitors. The current version of L-Net was trained on drug-like compounds from ChEMBL, which could be easily extended to other molecular datasets with desired properties based on users' demands and applied in functional molecule generation. Merging deep generative models with atomic-level interaction evaluation, DeepLigBuilder provides a state-of-the-art model for structure-based de novo drug design and lead optimization.

DeepLigBuilder, a novel deep generative model for structure-based de novo drug design, directly generates 3D structures of drug-like compounds in the target binding site.

5.
Various computational methods have been developed for quantitative modeling of organic chemical reactions; however, the lack of universality as well as the requirement of large amounts of experimental data limit their broad applications. Here, we present DeepReac+, an efficient and universal computational framework for prediction of chemical reaction outcomes and identification of optimal reaction conditions based on deep active learning. Under this framework, DeepReac is designed as a graph-neural-network-based model, which directly takes 2D molecular structures as inputs and automatically adapts to different prediction tasks. In addition, carefully-designed active learning strategies are incorporated to substantially reduce the number of necessary experiments for model training. We demonstrate the universality and high efficiency of DeepReac+ by achieving the state-of-the-art results with a minimum of labeled data on three diverse chemical reaction datasets in several scenarios. Collectively, DeepReac+ has great potential and utility in the development of AI-aided chemical synthesis. DeepReac+ is freely accessible at https://github.com/bm2-lab/DeepReac.

Based on GNNs and active learning, DeepReac+ is designed as a universal framework for quantitative modeling of chemical reactions. It takes molecular structures as inputs directly and adapts to various prediction tasks with fewer training data.

6.
In drug discovery applications, high throughput virtual screening exercises are routinely performed to determine an initial set of candidate molecules referred to as “hits”. In such an experiment, each molecule from a large small-molecule drug library is evaluated in terms of physical properties such as the docking score against a target receptor. In real-life drug discovery experiments, drug libraries are extremely large yet still represent only a small fraction of the essentially infinite chemical space, and evaluating physical properties for every molecule in the library is not computationally feasible. In the current study, a novel Machine learning framework for Enhanced MolEcular Screening (MEMES) based on Bayesian optimization is proposed for efficient sampling of the chemical space. The proposed framework is demonstrated to identify 90% of the top-1000 molecules from a molecular library of size about 100 million, while calculating the docking score only for about 6% of the complete library. We believe that such a framework would greatly help reduce the computational effort not only in drug discovery but also in other areas that require such high-throughput experiments.

A novel machine learning framework based on Bayesian optimization for efficient sampling of chemical space. The framework is able to identify 90% of top-1000 hits by only sampling 6% of the complete dataset containing ∼100 million compounds.
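
To make the idea of Bayesian-optimization-driven screening more concrete, the sketch below ranks not-yet-docked molecules by an acquisition function so that only the most promising ones are passed to the expensive docking step. The abstract does not say which acquisition function or surrogate model is used; expected improvement, the Gaussian surrogate predictions, the molecule names and all numbers here are assumptions for illustration only.

```python
import math

def expected_improvement(mean, std, best_so_far):
    """Expected improvement for minimization under a Gaussian surrogate prediction.

    mean, std   : surrogate's predicted docking score and its uncertainty
    best_so_far : lowest (best) docking score actually computed so far
    """
    if std <= 0.0:
        return max(best_so_far - mean, 0.0)
    z = (best_so_far - mean) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best_so_far - mean) * cdf + std * pdf

# Hypothetical surrogate predictions (mean, std) of docking scores; lower is better.
candidates = {
    "mol_a": (-9.5, 0.2),   # predicted slightly better than the best, low uncertainty
    "mol_b": (-8.0, 2.0),   # predicted worse, but highly uncertain
    "mol_c": (-7.0, 0.1),   # predicted clearly worse, low uncertainty
}
best_docked = -9.0

# Rank candidates by acquisition value; the top one would be docked next.
ranked = sorted(
    candidates,
    key=lambda name: expected_improvement(*candidates[name], best_docked),
    reverse=True,
)
next_to_dock = ranked[0]
```

In a full loop the docked score would be added to the training set, the surrogate refitted, and the ranking repeated, which is how such a scheme could avoid scoring most of the library.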

7.
Machine learning (ML) consists of the recognition of patterns from training data and offers the opportunity to exploit large structure–activity databases for drug design. In the area of peptide drugs, ML is mostly being tested to design antimicrobial peptides (AMPs), a class of biomolecules potentially useful to fight multidrug-resistant bacteria. ML models have successfully identified membrane-disruptive amphiphilic AMPs, but mostly without addressing the associated toxicity to human red blood cells. Here we trained recurrent neural networks (RNN) with data from DBAASP (Database of Antimicrobial Activity and Structure of Peptides) to design short non-hemolytic AMPs. Synthesis and testing of 28 generated peptides, each at least 5 mutations away from training data, allowed us to identify eight new non-hemolytic AMPs against Pseudomonas aeruginosa, Acinetobacter baumannii, and methicillin-resistant Staphylococcus aureus (MRSA). These results show that ML can be used to design new non-hemolytic AMPs.

Machine learning models trained with experimental data for antimicrobial activity and hemolysis are shown to produce new non-hemolytic antimicrobial peptides active against multidrug-resistant bacteria.

8.
Inputting molecules into chemistry software, such as quantum chemistry packages, currently requires domain expertise, expensive software and/or cumbersome procedures. Leveraging recent breakthroughs in machine learning, we develop ChemPix: an offline, hand-drawn hydrocarbon structure recognition tool designed to remove these barriers. A neural image captioning approach consisting of a convolutional neural network (CNN) encoder and a long short-term memory (LSTM) decoder learned a mapping from photographs of hand-drawn hydrocarbon structures to machine-readable SMILES representations. We generated a large auxiliary training dataset, based on RDKit molecular images, by combining image augmentation, image degradation and background addition. Additionally, a small dataset of ∼600 hand-drawn hydrocarbon chemical structures was crowd-sourced using a phone web application. These datasets were used to train the image-to-SMILES neural network with the goal of maximizing the hand-drawn hydrocarbon recognition accuracy. By forming a committee of the trained neural networks where each network casts one vote for the predicted molecule, we achieved a nearly 10 percentage point improvement of the molecule recognition accuracy and were able to assign a confidence value for the prediction based on the number of agreeing votes. The ensemble model achieved an accuracy of 76% on hand-drawn hydrocarbons, increasing to 86% if the top 3 predictions were considered.

Offline recognition of hand-drawn hydrocarbon structures is learned using an image-to-SMILES neural network through the application of synthetic data generation and ensemble learning.
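
The committee scheme mentioned in this abstract, where each trained network casts one vote and the number of agreeing votes gives a confidence value, might look roughly like the sketch below. The function name, the five example predictions and the choice to report confidence as a fraction of votes are assumptions; the abstract only says confidence is based on the number of agreeing votes.

```python
from collections import Counter

def committee_vote(predictions):
    """Return the most-voted SMILES string and the fraction of networks that agree.

    predictions: one predicted SMILES string per committee member.
    """
    counts = Counter(predictions)
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(predictions)

# Hypothetical outputs from five trained networks for a single hand-drawn image.
votes = ["CCC", "CCC", "CC=C", "CCC", "CCCC"]
smiles, confidence = committee_vote(votes)
```

A low agreement fraction could be used to flag a drawing for manual checking rather than trusting the top prediction.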

9.
Recent explosive growth of ‘make-on-demand’ chemical libraries has brought unprecedented opportunities but also significant challenges to the field of computer-aided drug discovery. To address this expansion of the accessible chemical universe, molecular docking needs to accurately rank billions of chemical structures, calling for the development of automated hit-selecting protocols to minimize human intervention and error. Herein, we report the development of an artificial intelligence-driven virtual screening pipeline that utilizes Deep Docking with Autodock GPU, Glide SP, FRED, ICM and QuickVina2 programs to screen 40 billion molecules against SARS-CoV-2 main protease (Mpro). This campaign returned a significant number of experimentally confirmed inhibitors of the Mpro enzyme, and also enabled benchmarking of the performance of twenty-eight hit-selecting strategies of various degrees of stringency and automation. These findings provide new starting scaffolds for hit-to-lead optimization campaigns against Mpro and encourage the development of fully automated end-to-end drug discovery protocols integrating machine learning and human expertise.

Deep learning-accelerated docking coupled with computational hit selection strategies enables the identification of inhibitors for the SARS-CoV-2 main protease from a chemical library of 40 billion small molecules.

10.
A general approach to a new generation of spirocyclic molecules – oxa-spirocycles – was developed. The key synthetic step was iodocyclization. More than 150 oxa-spirocyclic compounds were prepared. Incorporation of an oxygen atom into the spirocyclic unit dramatically improved water solubility (by up to 40 times) and lowered lipophilicity. More potent oxa-spirocyclic analogues of antihypertensive drug terazosin were synthesized and studied in vivo.

A general practical approach to a new generation of spirocyclic molecules – oxa-spirocycles – is developed.

11.
We disclose herein the first example of merging photoredox catalysis and copper catalysis for radical 1,4-carbocyanations of 1,3-enynes. Alkyl N-hydroxyphthalimide esters are utilized as radical precursors, and the reported mild and redox-neutral protocol has broad substrate scope and remarkable functional group tolerance. This strategy allows for the synthesis of diverse multi-substituted allenes with high chemo- and regio-selectivities, also permitting late stage allenylation of natural products and drug molecules.

An efficient synthesis of multi-substituted allenes by metallaphotoredox-catalyzed decarboxylative 1,4-carbocyanation of 1,3-enynes is described.

12.
An outstanding challenge in deep learning in chemistry is its lack of interpretability. The inability to explain why a neural network makes a prediction is a major barrier to deployment of AI models. This not only dissuades chemists from using deep learning predictions, but also has led to neural networks learning spurious correlations that are difficult to notice. Counterfactuals are a category of explanations that provide a rationale behind a model prediction with satisfying properties like providing chemical structure insights. Yet, counterfactuals have been previously limited to specific model architectures or required reinforcement learning as a separate process. In this work, we show a universal model-agnostic approach that can explain any black-box model prediction. We demonstrate this method on random forest models, sequence models, and graph neural networks in both classification and regression.

Generating model agnostic molecular counterfactual explanations to explain model predictions.

13.
Predicting potentially dangerous chemical reactions is a critical task for laboratory safety. However, a traditional experimental investigation of reaction conditions for possible hazardous or explosive byproducts entails substantial time and cost; machine learning prediction could accelerate the process and support detailed experimental investigations. Several machine learning models have been developed which allow the prediction of major chemical reaction products with reasonable accuracy. However, these methods may not be sufficiently accurate for the prediction of hazardous products, which particularly requires a low false negative rate so that no dangerous reactions are missed. In this work, we propose an explainable artificial intelligence model that can predict the formation of hazardous reaction products in a binary classification fashion. The reactant molecules are transformed into substructure-encoded fingerprints and then fed into a convolutional neural network to make the binary decision of the chemical reaction. The proposed model shows a false negative rate of 0.09, which can be compared with 0.47–0.66 using the existing main product prediction models. To provide explanations for what substructures of the given reactant molecules are important to make a decision for target hazardous product formation, we apply an input attribution method, layer-wise relevance propagation, which computes the contributions of individual inputs for each input sample. The computed attributions indeed match some of the existing chemical intuitions and mechanisms, and also offer a way to analyze possible data-imbalance issues of the current predictions based on relatively small positive datasets. We expect that the proposed hazardous product prediction model will be complementary to existing main product prediction models and experimental investigations.

An explainable neural network model is developed to predict the formation of hazardous products for chemical reactions. An input attribution method, layer-wise relevance propagation, is used to explain the decision-making process.
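
The false negative rate quoted in this abstract is the standard ratio FN / (FN + TP), that is, the share of truly hazardous reactions the classifier fails to flag. The short sketch below shows the computation on made-up labels; the function name and the data are illustrative assumptions and do not reproduce the paper's results.

```python
def false_negative_rate(y_true, y_pred):
    """FN / (FN + TP): the share of truly positive cases predicted as negative."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    if fn + tp == 0:
        raise ValueError("y_true contains no positive examples")
    return fn / (fn + tp)

# Made-up labels: 1 = hazardous product forms, 0 = it does not.
y_true = [1, 1, 1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 1, 0, 1]
fnr = false_negative_rate(y_true, y_pred)
```

Note that the false positive at index 5 does not affect this metric, which is why a safety-oriented model can trade some false alarms for fewer missed hazards.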

14.
Quantum-chemistry simulations based on potential energy surfaces of molecules provide invaluable insight into the physicochemical processes at the atomistic level and yield such important observables as reaction rates and spectra. Machine learning potentials promise to significantly reduce the computational cost and hence enable otherwise unfeasible simulations. However, the surging number of such potentials begs the question of which one to choose or whether we still need to develop yet another one. Here, we address this question by evaluating the performance of popular machine learning potentials in terms of accuracy and computational cost. In addition, we deliver structured information for non-specialists in machine learning to guide them through the maze of acronyms, recognize each potential's main features, and judge what they could expect from each one.

This article provides a lifeline for those lost in the sea of the molecular machine learning potentials by providing a balanced overview and evaluation of popular potentials.

15.
Mass spectrometry imaging (MSI) is widely used for the label-free molecular mapping of biological samples. The identification of co-localized molecules in MSI data is crucial to the understanding of biochemical pathways. One of the key challenges in molecular colocalization is that complex MSI data are too large for manual annotation but too small for training deep neural networks. Herein, we introduce a self-supervised clustering approach based on contrastive learning, which shows excellent performance in clustering MSI data. We train a deep convolutional neural network (CNN) using MSI data from a single experiment without manual annotations to effectively learn high-level spatial features from ion images and classify them based on molecular colocalizations. We demonstrate that contrastive learning generates ion image representations that form well-resolved clusters. Subsequent self-labeling is used to fine-tune both the CNN encoder and linear classifier based on confidently classified ion images. This new approach enables autonomous and high-throughput identification of co-localized species in MSI data, which will dramatically expand the application of spatial lipidomics, metabolomics, and proteomics in biological research.

Contrastive learning is used to train a deep convolutional neural network to identify high-level features in mass spectrometry imaging data. These features enable self-supervised clustering of ion images without manual annotation.

16.
Modern functional materials consist of large molecular building blocks with significant chemical complexity which limits spectroscopic property prediction with accurate first-principles methods. Consequently, a targeted design of materials with tailored optoelectronic properties by high-throughput screening is bound to fail without efficient methods to predict molecular excited-state properties across chemical space. In this work, we present a deep neural network that predicts charged quasiparticle excitations for large and complex organic molecules with a rich elemental diversity and a size well out of reach of accurate many body perturbation theory calculations. The model exploits the fundamental underlying physics of molecular resonances as eigenvalues of a latent Hamiltonian matrix and is thus able to accurately describe multiple resonances simultaneously. The performance of this model is demonstrated for a range of organic molecules across chemical composition space and configuration space. We further showcase the model capabilities by predicting photoemission spectra at the level of the GW approximation for previously unseen conjugated molecules.

A physically-inspired machine learning model for orbital energies is developed that can be augmented with delta learning to obtain photoemission spectra, ionization potentials, and electron affinities with experimental accuracy.

17.
The field of mechanically interlocked molecules that employ a handcuff component is reviewed. The variety of rotaxane and catenane structures that use the handcuff motif to interlock different components are discussed and a new nomenclature, distilling diverse terminologies to a single approach, is proposed. By unifying the interpretation of this class of molecules we identify new opportunities for employing this structural unit for new architectures.

Mechanically interlocked molecules that employ a handcuff component provide a pathway to highly unusual structures; a new nomenclature is proposed which helps to identify opportunities for employing this structural unit for new architectures.

18.
Light signal transduction pathways are the central components of mechanisms that regulate plant development, in which photoreceptors receive light and participate in light signal transduction. Chemical systems can be designed to mimic these biological processes that have potential applications in smart sensing, drug delivery and synthetic biology. Here, we synthesized a series of simple photoresponsive molecules for use as photoreceptors in artificial light signal transduction. The hydrophobic structures of these molecules facilitate their insertion into vesicular lipid bilayers, and reversible photoisomerization initiates the reciprocating translocation of molecules in the membrane, thus activating or deactivating the hydrolysis reaction of a precatalyst in the transducer for an encapsulated substrate, resulting in a light controllable output signal. This study represents the first example of using simplified synthetic molecules to simulate light signal transduction performed by complex biomolecules.

Photoisomerization chemistry was used to simulate light signal transduction, in which the light-controlled reciprocating translocation of molecules in lipids activates or deactivates the hydrolysis reaction for an encapsulated substrate.

19.
The addition of sulfonyl radicals to alkenes and alkynes is a valuable method for constructing useful highly functionalized sulfonyl compounds. The underexplored alkoxy- and fluorosulfonyl radicals are easily accessed by CF3 radical addition to readily available allylsulfonic acid derivatives and then β-fragmentation. These substituted sulfonyl radicals add to aryl alkyl alkynes to give vinyl radicals that are trapped by trifluoromethyl transfer to provide tetra-substituted alkenes bearing the privileged alkoxy- or fluorosulfonyl group on one carbon and a trifluoromethyl group on the other. This process exhibits broad functional group compatibility and allows for the late-stage functionalization of drug molecules, demonstrating its potential in drug discovery and chemical biology.

An unprecedented method for vicinal addition of alkoxysulfonyl/fluorosulfonyl and trifluoromethyl groups to aryl alkyl alkynes has been developed to afford useful alkenylsulfonate esters and alkenylsulfonyl fluorides.

20.
Predicting drug–target affinity (DTA) is beneficial for accelerating drug discovery. Graph neural networks (GNNs) have been widely used in DTA prediction. However, existing shallow GNNs are insufficient to capture the global structure of compounds. Besides, the interpretability of the graph-based DTA models highly relies on the graph attention mechanism, which cannot reveal the global relationship between each atom of a molecule. In this study, we proposed a deep multiscale graph neural network based on chemical intuition for DTA prediction (MGraphDTA). We introduced a dense connection into the GNN and built a super-deep GNN with 27 graph convolutional layers to capture the local and global structure of the compound simultaneously. We also developed a novel visual explanation method, gradient-weighted affinity activation mapping (Grad-AAM), to analyze a deep learning model from the chemical perspective. We evaluated our approach using seven benchmark datasets and compared the proposed method to the state-of-the-art deep learning (DL) models. MGraphDTA outperforms other DL-based approaches significantly on various datasets. Moreover, we show that Grad-AAM creates explanations that are consistent with pharmacologists, which may help us gain chemical insights directly from data beyond human perception. These advantages demonstrate that the proposed method improves the generalization and interpretation capability of DTA prediction modeling.

MGraphDTA is designed to capture the local and global structure of a compound simultaneously for drug–target affinity prediction and can provide explanations that are consistent with pharmacologists.
