
1.

Combining machine learning and high-throughput experimentation to discover photocatalytically active organic molecules

Xiaobo Li Phillip M. Maffettone Yu Che Tao Liu Linjiang Chen Andrew I. Cooper 《Chemical science》2021,12(32):10742


2.

Generating 3D molecules conditional on receptor binding sites with deep generative models

Matthew Ragoza Tomohide Masuda David Ryan Koes 《Chemical science》2022,13(9):2701

The goal of structure-based drug discovery is to find small molecules that bind to a given target protein. Deep learning has been used to generate drug-like molecules with certain cheminformatic properties, but has not yet been applied to generating 3D molecules predicted to bind to proteins by sampling the conditional distribution of protein–ligand binding interactions. In this work, we describe for the first time a deep learning system for generating 3D molecular structures conditioned on a receptor binding site. We approach the problem using a conditional variational autoencoder trained on an atomic density grid representation of cross-docked protein–ligand structures. We apply atom fitting and bond inference procedures to construct valid molecular conformations from generated atomic densities. We evaluate the properties of the generated molecules and demonstrate that they change significantly when conditioned on mutated receptors. We also explore the latent space learned by our generative model using sampling and interpolation techniques. This work opens the door for end-to-end prediction of stable bioactive molecules from protein structures with deep learning.

We generate 3D molecules conditioned on receptor binding sites by training a deep generative model on protein–ligand complexes. Our model uses the conditional receptor information to make chemically relevant changes to the generated molecules.

3.

Improving machine learning performance on small chemical reaction data with unsupervised contrastive pretraining

Mingjian Wen Samuel M. Blau Xiaowei Xie Shyam Dwaraknath Kristin A. Persson 《Chemical science》2022,13(5):1446

Machine learning (ML) methods have great potential to transform chemical discovery by accelerating the exploration of chemical space and drawing scientific insights from data. However, modern chemical reaction ML models, such as those based on graph neural networks (GNNs), must be trained on a large amount of labelled data in order to avoid overfitting the data and thus possessing low accuracy and transferability. In this work, we propose a strategy to leverage unlabelled data to learn accurate ML models for small labelled chemical reaction data. We focus on an old and prominent problem—classifying reactions into distinct families—and build a GNN model for this task. We first pretrain the model on unlabelled reaction data using unsupervised contrastive learning and then fine-tune it on a small number of labelled reactions. The contrastive pretraining learns by making the representations of two augmented versions of a reaction similar to each other but distinct from other reactions. We propose chemically consistent reaction augmentation methods that protect the reaction center and find they are the key for the model to extract relevant information from unlabelled data to aid the reaction classification task. The transfer learned model outperforms a supervised model trained from scratch by a large margin. Further, it consistently performs better than models based on traditional rule-driven reaction fingerprints, which have long been the default choice for small datasets, as well as those based on reaction fingerprints derived from masked language modelling. In addition to reaction classification, the effectiveness of the strategy is tested on regression datasets; the learned GNN-based reaction fingerprints can also be used to navigate the chemical reaction space, which we demonstrate by querying for similar reactions. The strategy can be readily applied to other predictive reaction problems to uncover the power of unlabelled data for learning better models with a limited supply of labels.
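
The contrastive objective described in this abstract (two augmented views of the same reaction pulled together, other reactions pushed apart) can be illustrated with a small sketch. The abstract does not state the exact loss, so the NT-Xent (normalized temperature-scaled cross-entropy) form below, the function names and the temperature value are all assumptions made for illustration, not the authors' implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nt_xent_loss(view_a, view_b, temperature=0.5):
    """Average NT-Xent loss over a batch of paired embeddings.

    view_a[i] and view_b[i] are (hypothetical) embeddings of two
    augmented versions of reaction i. The temperature is an assumed value.
    """
    embeddings = view_a + view_b
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # the other view of the same reaction
        # Similarities to every other embedding in the batch (self excluded).
        logits = [cosine(embeddings[i], embeddings[j]) / temperature
                  for j in range(2 * n) if j != i]
        pos_logit = cosine(embeddings[i], embeddings[positive]) / temperature
        log_denominator = math.log(sum(math.exp(x) for x in logits))
        total += log_denominator - pos_logit
    return total / (2 * n)

# Toy 2D embeddings: matching views give a lower loss than mismatched ones.
aligned = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
mismatched = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In an actual pretraining run the embeddings would come from the GNN encoder and the loss would be minimized by gradient descent; the sketch only shows how the loss rewards agreement between views of the same reaction.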

Contrastive pretraining of chemical reactions by matching augmented reaction representations to improve machine learning performance on small reaction datasets.

4.

Structure-based de novo drug design using 3D deep generative models

Yibo Li Jianfeng Pei Luhua Lai 《Chemical science》2021,12(41):13664

Deep generative models are attracting much attention in the field of de novo molecule design. Compared to traditional methods, deep generative models can be trained in a fully data-driven way with little requirement for expert knowledge. Although many models have been developed to generate 1D and 2D molecular structures, 3D molecule generation is less explored, and the direct design of drug-like molecules inside target binding sites remains challenging. In this work, we introduce DeepLigBuilder, a novel deep learning-based method for de novo drug design that generates 3D molecular structures in the binding sites of target proteins. We first developed Ligand Neural Network (L-Net), a novel graph generative model for the end-to-end design of chemically and conformationally valid 3D molecules with high drug-likeness. Then, we combined L-Net with Monte Carlo tree search to perform structure-based de novo drug design tasks. In the case study of inhibitor design for the main protease of SARS-CoV-2, DeepLigBuilder suggested a list of drug-like compounds with novel chemical structures, high predicted affinity, and similar binding features to those of known inhibitors. The current version of L-Net was trained on drug-like compounds from ChEMBL, which could be easily extended to other molecular datasets with desired properties based on users' demands and applied in functional molecule generation. Merging deep generative models with atomic-level interaction evaluation, DeepLigBuilder provides a state-of-the-art model for structure-based de novo drug design and lead optimization.

DeepLigBuilder, a novel deep generative model for structure-based de novo drug design, directly generates 3D structures of drug-like compounds in the target binding site.

5.

DeepReac+: deep active learning for quantitative modeling of organic chemical reactions

Yukang Gong Dongyu Xue Guohui Chuai Jing Yu Qi Liu 《Chemical science》2021,12(43):14459

Various computational methods have been developed for quantitative modeling of organic chemical reactions; however, the lack of universality as well as the requirement of large amounts of experimental data limit their broad applications. Here, we present DeepReac+, an efficient and universal computational framework for prediction of chemical reaction outcomes and identification of optimal reaction conditions based on deep active learning. Under this framework, DeepReac is designed as a graph-neural-network-based model, which directly takes 2D molecular structures as inputs and automatically adapts to different prediction tasks. In addition, carefully-designed active learning strategies are incorporated to substantially reduce the number of necessary experiments for model training. We demonstrate the universality and high efficiency of DeepReac+ by achieving the state-of-the-art results with a minimum of labeled data on three diverse chemical reaction datasets in several scenarios. Collectively, DeepReac+ has great potential and utility in the development of AI-aided chemical synthesis. DeepReac+ is freely accessible at https://github.com/bm2-lab/DeepReac.

Based on GNNs and active learning, DeepReac+ is designed as a universal framework for quantitative modeling of chemical reactions. It takes molecular structures as inputs directly and adapts to various prediction tasks with fewer training data.

6.

MEMES: Machine learning framework for Enhanced MolEcular Screening

Sarvesh Mehta Siddhartha Laghuvarapu Yashaswi Pathak Aaftaab Sethi Mallika Alvala U. Deva Priyakumar 《Chemical science》2021,12(35):11710

In drug discovery applications, high-throughput virtual screening exercises are routinely performed to determine an initial set of candidate molecules referred to as “hits”. In such an experiment, each molecule from a large small-molecule drug library is evaluated in terms of physical properties such as the docking score against a target receptor. In real-life drug discovery experiments, drug libraries are extremely large yet still represent only a small fraction of the essentially infinite chemical space, and evaluating physical properties for every molecule in the library is not computationally feasible. In the current study, a novel Machine learning framework for Enhanced MolEcular Screening (MEMES) based on Bayesian optimization is proposed for efficient sampling of the chemical space. The proposed framework is demonstrated to identify 90% of the top-1000 molecules from a molecular library of about 100 million compounds, while calculating the docking score for only about 6% of the complete library. We believe that such a framework would help tremendously to reduce the computational effort not only in drug discovery but also in other areas that require such high-throughput experiments.

A novel machine learning framework based on Bayesian optimization for efficient sampling of chemical space. The framework is able to identify 90% of top-1000 hits by only sampling 6% of the complete dataset containing ∼100 million compounds.

7.

Machine learning designs non-hemolytic antimicrobial peptides

Alice Capecchi Xingguang Cai Hippolyte Personne Thilo Köhler Christian van Delden Jean-Louis Reymond 《Chemical science》2021,12(26):9221

Machine learning (ML) consists of the recognition of patterns from training data and offers the opportunity to exploit large structure–activity databases for drug design. In the area of peptide drugs, ML is mostly being tested to design antimicrobial peptides (AMPs), a class of biomolecules potentially useful to fight multidrug-resistant bacteria. ML models have successfully identified membrane disruptive amphiphilic AMPs, however mostly without addressing the associated toxicity to human red blood cells. Here we trained recurrent neural networks (RNN) with data from DBAASP (Database of Antimicrobial Activity and Structure of Peptides) to design short non-hemolytic AMPs. Synthesis and testing of 28 generated peptides, each at least 5 mutations away from training data, allowed us to identify eight new non-hemolytic AMPs against Pseudomonas aeruginosa, Acinetobacter baumannii, and methicillin-resistant Staphylococcus aureus (MRSA). These results show that machine learning (ML) can be used to design new non-hemolytic AMPs.
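
The novelty criterion mentioned above (generated peptides "at least 5 mutations away from training data") can be sketched as a distance filter. The abstract does not say how mutations are counted, so the Levenshtein edit distance used here is an assumption, and the function names and example sequences are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-residue
    insertions, deletions or substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]

def novel_sequences(generated, training, min_distance=5):
    """Keep generated sequences that are at least min_distance edits
    away from every training sequence."""
    return [seq for seq in generated
            if all(edit_distance(seq, ref) >= min_distance for ref in training)]

# Hypothetical sequences, for illustration only.
training_set = ["KLLKLLKKLLKLLK"]
candidates = ["KLLKLLKKLLKLLR", "GIGAVLKVLTTGLP"]
kept = novel_sequences(candidates, training_set)
```

The first candidate differs from the training sequence by a single substitution and is discarded; the second shares few residues with it and is kept.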

Machine learning models trained with experimental data for antimicrobial activity and hemolysis are shown to produce new non-hemolytic antimicrobial peptides active against multidrug-resistant bacteria.

8.

ChemPix: automated recognition of hand-drawn hydrocarbon structures using deep learning

Hayley Weir Keiran Thompson Amelia Woodward Benjamin Choi Augustin Braun Todd J. Martínez 《Chemical science》2021,12(31):10622

Inputting molecules into chemistry software, such as quantum chemistry packages, currently requires domain expertise, expensive software and/or cumbersome procedures. Leveraging recent breakthroughs in machine learning, we develop ChemPix: an offline, hand-drawn hydrocarbon structure recognition tool designed to remove these barriers. A neural image captioning approach consisting of a convolutional neural network (CNN) encoder and a long short-term memory (LSTM) decoder learned a mapping from photographs of hand-drawn hydrocarbon structures to machine-readable SMILES representations. We generated a large auxiliary training dataset, based on RDKit molecular images, by combining image augmentation, image degradation and background addition. Additionally, a small dataset of ∼600 hand-drawn hydrocarbon chemical structures was crowd-sourced using a phone web application. These datasets were used to train the image-to-SMILES neural network with the goal of maximizing the hand-drawn hydrocarbon recognition accuracy. By forming a committee of the trained neural networks where each network casts one vote for the predicted molecule, we achieved a nearly 10 percentage point improvement of the molecule recognition accuracy and were able to assign a confidence value for the prediction based on the number of agreeing votes. The ensemble model achieved an accuracy of 76% on hand-drawn hydrocarbons, increasing to 86% if the top 3 predictions were considered.
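
The committee scheme described above, in which each trained network casts one vote and the number of agreeing votes gives a confidence value, can be sketched as follows. The function name, the choice to report confidence as a fraction of votes, and the example SMILES strings are assumptions for illustration; the abstract does not specify how ties are broken.

```python
from collections import Counter

def committee_predict(predictions):
    """Combine one SMILES prediction per committee member.

    Returns the most-voted SMILES, the fraction of members that agree
    with it, and the full ranking of (smiles, votes) pairs.
    Ties are resolved in favour of the prediction seen first.
    """
    votes = Counter(predictions)
    ranked = votes.most_common()
    top_smiles, top_votes = ranked[0]
    confidence = top_votes / len(predictions)
    return top_smiles, confidence, ranked

# Five hypothetical networks looking at the same hand-drawn image.
member_outputs = ["C1CCCCC1", "C1CCCCC1", "CCCCCC", "C1CCCCC1", "C1CCCC1"]
best, confidence, ranking = committee_predict(member_outputs)
```

The ranking also gives a natural way to read off top-k candidates, which is how a "top 3 predictions" accuracy could be evaluated under this sketch.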

Offline recognition of hand-drawn hydrocarbon structures is learned using an image-to-SMILES neural network through the application of synthetic data generation and ensemble learning.

9.

Automated discovery of noncovalent inhibitors of SARS-CoV-2 main protease by consensus Deep Docking of 40 billion small molecules

Francesco Gentile Michael Fernandez Fuqiang Ban Anh-Tien Ton Hazem Mslati Carl F. Perez Eric Leblanc Jean Charle Yaacoub James Gleave Abraham Stern Bill Wong François Jean Natalie Strynadka Artem Cherkasov 《Chemical science》2021,12(48):15960

Recent explosive growth of ‘make-on-demand’ chemical libraries brought unprecedented opportunities but also significant challenges to the field of computer-aided drug discovery. To address this expansion of the accessible chemical universe, molecular docking needs to accurately rank billions of chemical structures, calling for the development of automated hit-selecting protocols to minimize human intervention and error. Herein, we report the development of an artificial intelligence-driven virtual screening pipeline that utilizes Deep Docking with Autodock GPU, Glide SP, FRED, ICM and QuickVina2 programs to screen 40 billion molecules against SARS-CoV-2 main protease (Mpro). This campaign returned a significant number of experimentally confirmed inhibitors of the Mpro enzyme, and also made it possible to benchmark the performance of twenty-eight hit-selecting strategies of various degrees of stringency and automation. These findings provide new starting scaffolds for hit-to-lead optimization campaigns against Mpro and encourage the development of fully automated end-to-end drug discovery protocols integrating machine learning and human expertise.

Deep learning-accelerated docking coupled with computational hit selection strategies enables the identification of inhibitors for the SARS-CoV-2 main protease from a chemical library of 40 billion small molecules.

10.

Oxa-spirocycles: synthesis, properties and applications

Kateryna Fominova Taras Diachuk Dmitry Granat Taras Savchuk Vladyslav Vilchynskyi Oleksiy Svitlychnyi Vladyslav Meliantsev Igor Kovalchuk Eduard Litskan Vadym V. Levterov Valentyn R. Badlo Ruslan I. Vaskevych Alla I. Vaskevych Andrii V. Bolbut Volodymyr V. Semeno Rustam Iminov Kostiantyn Shvydenko Anastasiia S. Kuznetsova Yurii V. Dmytriv Daniil Vysochyn Vasyl Ripenko Andrei A. Tolmachev Olexandra Pavlova Halyna Kuznietsova Iryna Pishel Petro Borysko Pavel K. Mykhailiuk 《Chemical science》2021,12(34):11294

A general approach to a new generation of spirocyclic molecules – oxa-spirocycles – was developed. The key synthetic step was iodocyclization. More than 150 oxa-spirocyclic compounds were prepared. Incorporation of an oxygen atom into the spirocyclic unit dramatically improved water solubility (by up to 40 times) and lowered lipophilicity. More potent oxa-spirocyclic analogues of antihypertensive drug terazosin were synthesized and studied in vivo.

A general practical approach to a new generation of spirocyclic molecules – oxa-spirocycles – is developed.

11.

Decarboxylative 1,4-carbocyanation of 1,3-enynes to access tetra-substituted allenes via copper/photoredox dual catalysis

Ya Chen Junjie Wang Yixin Lu 《Chemical science》2021,12(34):11316

We disclose herein the first example of merging photoredox catalysis and copper catalysis for radical 1,4-carbocyanations of 1,3-enynes. Alkyl N-hydroxyphthalimide esters are utilized as radical precursors, and the reported mild and redox-neutral protocol has broad substrate scope and remarkable functional group tolerance. This strategy allows for the synthesis of diverse multi-substituted allenes with high chemo- and regio-selectivities, also permitting late stage allenylation of natural products and drug molecules.

An efficient synthesis of multi-substituted allenes by metallaphotoredox-catalyzed decarboxylative 1,4-carbocyanation of 1,3-enynes is described.

12.

Model agnostic generation of counterfactual explanations for molecules

Geemi P. Wellawatte Aditi Seshadri Andrew D. White 《Chemical science》2022,13(13):3697

An outstanding challenge in deep learning in chemistry is its lack of interpretability. The inability to explain why a neural network makes a prediction is a major barrier to deployment of AI models. This not only dissuades chemists from using deep learning predictions, but also has led to neural networks learning spurious correlations that are difficult to notice. Counterfactuals are a category of explanations that provide a rationale behind a model prediction with satisfying properties like providing chemical structure insights. Yet, counterfactuals have been previously limited to specific model architectures or required reinforcement learning as a separate process. In this work, we show a universal model-agnostic approach that can explain any black-box model prediction. We demonstrate this method on random forest models, sequence models, and graph neural networks in both classification and regression.

Generating model agnostic molecular counterfactual explanations to explain model predictions.

13.

Predicting potentially hazardous chemical reactions using an explainable neural network

Juhwan Kim Geun Ho Gu Juhwan Noh Seongun Kim Suji Gim Jaesik Choi Yousung Jung 《Chemical science》2021,12(33):11028

Predicting potentially dangerous chemical reactions is a critical task for laboratory safety. However, a traditional experimental investigation of reaction conditions for possible hazardous or explosive byproducts entails substantial time and cost, for which machine learning prediction could accelerate the process and help detailed experimental investigations. Several machine learning models have been developed which allow the prediction of major chemical reaction products with reasonable accuracy. However, these methods may not present sufficiently high accuracy for the prediction of hazardous products which particularly requires a low false negative result for laboratory safety in order not to miss any dangerous reactions. In this work, we propose an explainable artificial intelligence model that can predict the formation of hazardous reaction products in a binary classification fashion. The reactant molecules are transformed into substructure-encoded fingerprints and then fed into a convolutional neural network to make the binary decision of the chemical reaction. The proposed model shows a false negative rate of 0.09, which can be compared with 0.47–0.66 using the existing main product prediction models. To provide explanations for what substructures of the given reactant molecules are important to make a decision for target hazardous product formation, we apply an input attribution method, layer-wise relevance propagation, which computes the contributions of individual inputs per input data. The computed attributions indeed match some of the existing chemical intuitions and mechanisms, and also offer a way to analyze possible data-imbalance issues of the current predictions based on relatively small positive datasets. We expect that the proposed hazardous product prediction model will be complementary to existing main product prediction models and experimental investigations.
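
The false negative rate quoted above is the fraction of truly hazardous reactions that a classifier labels as safe, FN / (FN + TP). A minimal sketch of that calculation follows; the function name and the toy labels are hypothetical and are not data from the paper.

```python
def false_negative_rate(y_true, y_pred):
    """Fraction of positive (hazardous, label 1) examples predicted as 0."""
    false_negatives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    positives = false_negatives + true_positives
    if positives == 0:
        raise ValueError("false negative rate is undefined without positive examples")
    return false_negatives / positives

# Toy labels: 1 = hazardous product forms, 0 = it does not.
labels = [1, 1, 1, 1, 0, 0]
predicted = [1, 1, 1, 0, 0, 1]
fnr = false_negative_rate(labels, predicted)
```

Here one of the four hazardous reactions is missed, so the rate is 0.25; the false positive in the last position does not affect it, which is why this metric is the relevant one when the priority is not to miss dangerous reactions.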

An explainable neural network model is developed to predict the formation of hazardous products for chemical reactions. An input attribution method, layer-wise relevance propagation, is used to explain the decision-making process.

14.

Choosing the right molecular machine learning potential

Max Pinheiro Jr Fuchun Ge Nicolas Ferré Pavlo O. Dral Mario Barbatti 《Chemical science》2021,12(43):14396

Quantum-chemistry simulations based on potential energy surfaces of molecules provide invaluable insight into the physicochemical processes at the atomistic level and yield such important observables as reaction rates and spectra. Machine learning potentials promise to significantly reduce the computational cost and hence enable otherwise unfeasible simulations. However, the surging number of such potentials begs the question of which one to choose or whether we still need to develop yet another one. Here, we address this question by evaluating the performance of popular machine learning potentials in terms of accuracy and computational cost. In addition, we deliver structured information for non-specialists in machine learning to guide them through the maze of acronyms, recognize each potential's main features, and judge what they could expect from each one.

This article provides a lifeline for those lost in the sea of the molecular machine learning potentials by providing a balanced overview and evaluation of popular potentials.

15.

Self-supervised clustering of mass spectrometry imaging data using contrastive learning

Hang Hu Jyothsna Padmakumar Bindu Julia Laskin 《Chemical science》2022,13(1):90

Mass spectrometry imaging (MSI) is widely used for the label-free molecular mapping of biological samples. The identification of co-localized molecules in MSI data is crucial to the understanding of biochemical pathways. One of the key challenges in molecular colocalization is that complex MSI data are too large for manual annotation but too small for training deep neural networks. Herein, we introduce a self-supervised clustering approach based on contrastive learning, which shows excellent performance in clustering of MSI data. We train a deep convolutional neural network (CNN) using MSI data from a single experiment without manual annotations to effectively learn high-level spatial features from ion images and classify them based on molecular colocalizations. We demonstrate that contrastive learning generates ion image representations that form well-resolved clusters. Subsequent self-labeling is used to fine-tune both the CNN encoder and linear classifier based on confidently classified ion images. This new approach enables autonomous and high-throughput identification of co-localized species in MSI data, which will dramatically expand the application of spatial lipidomics, metabolomics, and proteomics in biological research.

Contrastive learning is used to train a deep convolutional neural network to identify high-level features in mass spectrometry imaging data. These features enable self-supervised clustering of ion images without manual annotation.

16.

Physically inspired deep learning of molecular excitations and photoemission spectra

Julia Westermayr Reinhard J. Maurer 《Chemical science》2021,12(32):10755

Modern functional materials consist of large molecular building blocks with significant chemical complexity which limits spectroscopic property prediction with accurate first-principles methods. Consequently, a targeted design of materials with tailored optoelectronic properties by high-throughput screening is bound to fail without efficient methods to predict molecular excited-state properties across chemical space. In this work, we present a deep neural network that predicts charged quasiparticle excitations for large and complex organic molecules with a rich elemental diversity and a size well out of reach of accurate many body perturbation theory calculations. The model exploits the fundamental underlying physics of molecular resonances as eigenvalues of a latent Hamiltonian matrix and is thus able to accurately describe multiple resonances simultaneously. The performance of this model is demonstrated for a range of organic molecules across chemical composition space and configuration space. We further showcase the model capabilities by predicting photoemission spectra at the level of the GW approximation for previously unseen conjugated molecules.

A physically-inspired machine learning model for orbital energies is developed that can be augmented with delta learning to obtain photoemission spectra, ionization potentials, and electron affinities with experimental accuracy.

17.

Mechanically interlocked molecular handcuffs

Nicholas Pearce Marysia Tarnowska Nathan J. Andersen Alexander Wahrhaftig-Lewis Ben S. Pilgrim Neil R. Champness 《Chemical science》2022,13(14):3915

The field of mechanically interlocked molecules that employ a handcuff component is reviewed. The variety of rotaxane and catenane structures that use the handcuff motif to interlock different components is discussed and a new nomenclature, distilling diverse terminologies to a single approach, is proposed. By unifying the interpretation of this class of molecules we identify new opportunities for employing this structural unit for new architectures.

Mechanically interlocked molecules that employ a handcuff component provide a pathway to highly unusual structures. A new nomenclature is proposed which helps to identify opportunities for employing this structural unit for new architectures.

18.

A system for artificial light signal transduction via molecular translocation in a lipid membrane

Huiting Yang Shengjie Du Zhicheng Ye Xuebin Wang Zexin Yan Cheng Lian Chunyan Bao Linyong Zhu 《Chemical science》2022,13(8):2487

Light signal transduction pathways are the central components of mechanisms that regulate plant development, in which photoreceptors receive light and participate in light signal transduction. Chemical systems can be designed to mimic these biological processes that have potential applications in smart sensing, drug delivery and synthetic biology. Here, we synthesized a series of simple photoresponsive molecules for use as photoreceptors in artificial light signal transduction. The hydrophobic structures of these molecules facilitate their insertion into vesicular lipid bilayers, and reversible photoisomerization initiates the reciprocating translocation of molecules in the membrane, thus activating or deactivating the hydrolysis reaction of a precatalyst in the transducer for an encapsulated substrate, resulting in a light controllable output signal. This study represents the first example of using simplified synthetic molecules to simulate light signal transduction performed by complex biomolecules.

Photoisomerization chemistry was used to simulate light signal transduction, in which the light-controlled reciprocating translocation of molecules in lipids activates or deactivates the hydrolysis reaction for an encapsulated substrate.

19.

Radical-mediated vicinal addition of alkoxysulfonyl/fluorosulfonyl and trifluoromethyl groups to aryl alkyl alkynes

Xinrui Dong Wenhua Jiang Dexiang Hua Xiaohui Wang Liang Xu Xiaoxing Wu 《Chemical science》2021,12(35):11762

The addition of sulfonyl radicals to alkenes and alkynes is a valuable method for constructing useful highly functionalized sulfonyl compounds. The underexplored alkoxy- and fluorosulfonyl radicals are easily accessed by CF₃ radical addition to readily available allylsulfonic acid derivatives and then β-fragmentation. These substituted sulfonyl radicals add to aryl alkyl alkynes to give vinyl radicals that are trapped by trifluoromethyl transfer to provide tetra-substituted alkenes bearing the privileged alkoxy- or fluorosulfonyl group on one carbon and a trifluoromethyl group on the other. This process exhibits broad functional group compatibility and allows for the late-stage functionalization of drug molecules, demonstrating its potential in drug discovery and chemical biology.

An unprecedented method for vicinal addition of alkoxysulfonyl/fluorosulfonyl and trifluoromethyl groups to aryl alkyl alkynes has been developed to afford useful alkenylsulfonate esters and alkenylsulfonyl fluorides.

20.

MGraphDTA: deep multiscale graph neural network for explainable drug–target binding affinity prediction

Ziduo Yang Weihe Zhong Lu Zhao Calvin Yu-Chian Chen 《Chemical science》2022,13(3):816

Predicting drug–target affinity (DTA) is beneficial for accelerating drug discovery. Graph neural networks (GNNs) have been widely used in DTA prediction. However, existing shallow GNNs are insufficient to capture the global structure of compounds. Besides, the interpretability of the graph-based DTA models highly relies on the graph attention mechanism, which cannot reveal the global relationship between each atom of a molecule. In this study, we proposed a deep multiscale graph neural network based on chemical intuition for DTA prediction (MGraphDTA). We introduced a dense connection into the GNN and built a super-deep GNN with 27 graph convolutional layers to capture the local and global structure of the compound simultaneously. We also developed a novel visual explanation method, gradient-weighted affinity activation mapping (Grad-AAM), to analyze a deep learning model from the chemical perspective. We evaluated our approach using seven benchmark datasets and compared the proposed method to the state-of-the-art deep learning (DL) models. MGraphDTA outperforms other DL-based approaches significantly on various datasets. Moreover, we show that Grad-AAM creates explanations that are consistent with pharmacologists, which may help us gain chemical insights directly from data beyond human perception. These advantages demonstrate that the proposed method improves the generalization and interpretation capability of DTA prediction modeling.

MGraphDTA is designed to capture the local and global structure of a compound simultaneously for drug–target affinity prediction and can provide explanations that are consistent with pharmacologists.