Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
Methods to automate structure elucidation that can be applied broadly across chemical structure space have the potential to greatly accelerate chemical discovery. NMR spectroscopy is the most widely used and arguably the most powerful method for elucidating structures of organic molecules. Here we introduce a machine learning (ML) framework that provides a quantitative probabilistic ranking of the most likely structural connectivity of an unknown compound when given routine, experimental one-dimensional ¹H and/or ¹³C NMR spectra. In particular, our ML-based algorithm takes input NMR spectra and (i) predicts the presence of specific substructures out of hundreds of substructures it has learned to identify; (ii) annotates the spectrum to label peaks with predicted substructures; and (iii) uses the substructures to construct candidate constitutional isomers and assign to them a probabilistic ranking. Using experimental spectra and molecular formulae for molecules containing up to 10 non-hydrogen atoms, the correct constitutional isomer was the highest-ranking prediction made by our model in 67.4% of the cases and one of the top-ten predictions in 95.8% of the cases. This advance will aid in solving the structure of unknown compounds, and thus further the development of automated structure elucidation tools that could enable the creation of fully autonomous reaction discovery platforms.

A machine learning model and graph generator were able to accurately predict the presence of nearly 1000 substructures and the connectivity of small organic molecules from experimental 1D NMR data.
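The "highest-ranking" and "top-ten" figures quoted in the abstract above are top-k accuracies over ranked candidate lists. The following is a minimal sketch of how such a metric could be computed; it is not the authors' code. The function name `top_k_accuracy`, the toy candidate lists, and the assumption that structures are compared as canonical SMILES strings are all illustrative.

```python
from typing import Sequence


def top_k_accuracy(rankings: Sequence[Sequence[str]],
                   truths: Sequence[str],
                   k: int) -> float:
    """Fraction of cases whose true structure is among the first k candidates.

    Each entry of `rankings` is a best-first list of candidate structures.
    Structures are compared by plain string equality, so they are assumed
    to be written in one canonical form (an assumption of this sketch).
    """
    if len(rankings) != len(truths):
        raise ValueError("rankings and truths must have the same length")
    if not truths:
        raise ValueError("at least one case is required")
    hits = sum(1 for ranked, truth in zip(rankings, truths)
               if truth in ranked[:k])
    return hits / len(truths)


# Hypothetical toy data (not taken from the paper).
rankings = [
    ["CCO", "COC"],             # true structure ranked 1st
    ["COC", "CCO"],             # true structure ranked 2nd
    ["CC=O", "C1CO1", "C=CO"],  # true structure ranked 3rd
    ["CCN", "CNC"],             # true structure not proposed at all
]
truths = ["CCO", "CCO", "C=CO", "CCCl"]

top1 = top_k_accuracy(rankings, truths, k=1)
top3 = top_k_accuracy(rankings, truths, k=3)
```

On this toy data the top-1 accuracy is 0.25 and the top-3 accuracy is 0.75; the paper's 67.4% and 95.8% correspond to k = 1 and k = 10 on its own test set.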

2.
Fast and accurate simulation of complex chemical systems in environments such as solutions is a long-standing challenge in theoretical chemistry. In recent years, machine learning has extended the boundaries of quantum chemistry by providing highly accurate and efficient surrogate models of electronic structure theory, which previously have been out of reach for conventional approaches. Those models have long been restricted to closed molecular systems without accounting for environmental influences, such as external electric and magnetic fields or solvent effects. Here, we introduce the deep neural network FieldSchNet for modeling the interaction of molecules with arbitrary external fields. FieldSchNet offers access to a wealth of molecular response properties, enabling it to simulate a wide range of molecular spectra, such as infrared, Raman and nuclear magnetic resonance. Beyond that, it is able to describe implicit and explicit molecular environments, operating as a polarizable continuum model for solvation or in a quantum mechanics/molecular mechanics setup. We employ FieldSchNet to study the influence of solvent effects on molecular spectra and a Claisen rearrangement reaction. Based on these results, we use FieldSchNet to design an external environment capable of lowering the activation barrier of the rearrangement reaction significantly, demonstrating promising avenues for inverse chemical design.

A machine learning approach for modeling the influence of external environments and fields on molecules has been developed, which allows the prediction of various types of molecular spectra in vacuum and under implicit and explicit solvation.

3.
The use of machine learning techniques in computational chemistry has gained significant momentum since large molecular databases are now readily available. Predictions of molecular properties using machine learning have advantages over traditional quantum mechanics calculations because they can be computationally cheaper without losing accuracy. We present a new extrapolatable and explainable molecular representation based on bonds, angles and dihedrals that can be used to train machine learning models. The trained models can accurately predict the electronic energy and the free energy of small organic molecules with atom types C, H, N and O, with a mean absolute error of 1.2 kcal mol⁻¹. The models can be extrapolated to larger organic molecules with an average error of less than 3.7 kcal mol⁻¹ for 10 or fewer heavy atoms, which represent a chemical space two orders of magnitude larger. Rapid energy predictions for multiple molecules, up to 7 times faster than previous ML models of similar accuracy, have been achieved by sampling geometries around the potential energy surface minima. Therefore, the input geometries do not have to be located precisely on the minima, and we show that accurate density functional theory energy predictions can be made from force-field optimised geometries with a mean absolute error of 2.5 kcal mol⁻¹.

New representations and machine learning calculate DFT minima from force field geometries.

4.
Predictive molecular simulations require fast, accurate and reactive interatomic potentials. Machine learning offers a promising approach to construct such potentials by fitting energies and forces to high-level quantum-mechanical data, but doing so typically requires considerable human intervention and data volume. Here we show that, by leveraging hierarchical and active learning, accurate Gaussian Approximation Potential (GAP) models can be developed for diverse chemical systems in an autonomous manner, requiring only hundreds to a few thousand energy and gradient evaluations on a reference potential-energy surface. The approach uses separate intra- and inter-molecular fits and employs a prospective error metric to assess the accuracy of the potentials. We demonstrate applications to a range of molecular systems with relevance to computational organic chemistry: ranging from bulk solvents, a solvated metal ion and a metallocage onwards to chemical reactivity, including a bifurcating Diels–Alder reaction in the gas phase and non-equilibrium dynamics (a model SN2 reaction) in explicit solvent. The method provides a route to routinely generating machine-learned force fields for reactive molecular systems.

An efficient strategy for training Gaussian Approximation Potential (GAP) models to study chemical reactions using hierarchical and active learning.

5.
Synthetic polymers are versatile and widely used materials. Similar to small organic molecules, a large chemical space of such materials is hypothetically accessible. Computational property prediction and virtual screening can accelerate polymer design by prioritizing candidates expected to have favorable properties. However, in contrast to organic molecules, polymers are often not well-defined single structures but an ensemble of similar molecules, which poses unique challenges to traditional chemical representations and machine learning approaches. Here, we introduce a graph representation of molecular ensembles and an associated graph neural network architecture that is tailored to polymer property prediction. We demonstrate that this approach captures critical features of polymeric materials, like chain architecture, monomer stoichiometry, and degree of polymerization, and achieves superior accuracy to off-the-shelf cheminformatics methodologies. While doing so, we built a dataset of simulated electron affinity and ionization potential values for >40k polymers with varying monomer composition, stoichiometry, and chain architecture, which may be used in the development of other tailored machine learning approaches. The dataset and machine learning models presented in this work pave the path toward new classes of algorithms for polymer informatics and, more broadly, introduce a framework for the modeling of molecular ensembles.

A graph representation that captures critical features of polymeric materials and an associated graph neural network achieve superior accuracy to off-the-shelf cheminformatics methodologies.

6.
7.
Subtle variations in the lipid composition of mitochondrial membranes can have a profound impact on mitochondrial function. The inner mitochondrial membrane contains the phospholipid cardiolipin, which has been demonstrated to act as a biomarker for a number of diverse pathologies. Small molecule dyes capable of selectively partitioning into cardiolipin membranes enable visualization and quantification of the cardiolipin content. Here we present a data-driven approach that combines a deep learning-enabled active learning workflow with coarse-grained molecular dynamics simulations and alchemical free energy calculations to discover small organic compounds able to selectively permeate cardiolipin-containing membranes. By employing transferable coarse-grained models we efficiently navigate the all-atom design space corresponding to small organic molecules with molecular weight less than ≈500 Da. After direct simulation of only 0.42% of our coarse-grained search space we identify molecules with considerably increased levels of cardiolipin selectivity compared to a widely used cardiolipin probe 10-N-nonyl acridine orange. Our accumulated simulation data enables us to derive interpretable design rules linking coarse-grained structure to cardiolipin selectivity. The findings are corroborated by fluorescence anisotropy measurements of two compounds conforming to our defined design rules. Our findings highlight the potential of coarse-grained representations and multiscale modelling for materials discovery and design.

We present a data-driven approach combining deep learning-enabled active learning with coarse-grained simulations and alchemical free energy calculations to discover small molecules to selectively permeate cardiolipin membranes.

8.
The automatic recognition of the molecular content of a molecule's graphical depiction is an extremely challenging problem that remains largely unsolved despite decades of research. Recent advances in neural machine translation enable the auto-encoding of molecular structures in a continuous vector space of fixed size (latent representation) with low reconstruction errors. In this paper, we present a fast and accurate model combining deep convolutional neural network learning from molecule depictions and a pre-trained decoder that translates the latent representation into the SMILES representation of the molecules. This combination allows us to precisely infer a molecular structure from an image. Our rigorous evaluation shows that Img2Mol is able to correctly translate up to 88% of the molecular depictions into their SMILES representation. A pretrained version of Img2Mol is made publicly available on GitHub for non-commercial users.

The automatic recognition of the molecular content of a molecule's graphical depiction is an extremely challenging problem that remains largely unsolved despite decades of research.

9.
The goal of structure-based drug discovery is to find small molecules that bind to a given target protein. Deep learning has been used to generate drug-like molecules with certain cheminformatic properties, but has not yet been applied to generating 3D molecules predicted to bind to proteins by sampling the conditional distribution of protein–ligand binding interactions. In this work, we describe for the first time a deep learning system for generating 3D molecular structures conditioned on a receptor binding site. We approach the problem using a conditional variational autoencoder trained on an atomic density grid representation of cross-docked protein–ligand structures. We apply atom fitting and bond inference procedures to construct valid molecular conformations from generated atomic densities. We evaluate the properties of the generated molecules and demonstrate that they change significantly when conditioned on mutated receptors. We also explore the latent space learned by our generative model using sampling and interpolation techniques. This work opens the door for end-to-end prediction of stable bioactive molecules from protein structures with deep learning.

We generate 3D molecules conditioned on receptor binding sites by training a deep generative model on protein–ligand complexes. Our model uses the conditional receptor information to make chemically relevant changes to the generated molecules.

10.
Various computational methods have been developed for quantitative modeling of organic chemical reactions; however, their lack of universality and their need for large amounts of experimental data limit their broad application. Here, we present DeepReac+, an efficient and universal computational framework for prediction of chemical reaction outcomes and identification of optimal reaction conditions based on deep active learning. Under this framework, DeepReac is designed as a graph-neural-network-based model, which directly takes 2D molecular structures as inputs and automatically adapts to different prediction tasks. In addition, carefully designed active learning strategies are incorporated to substantially reduce the number of necessary experiments for model training. We demonstrate the universality and high efficiency of DeepReac+ by achieving state-of-the-art results with a minimum of labeled data on three diverse chemical reaction datasets in several scenarios. Collectively, DeepReac+ has great potential and utility in the development of AI-aided chemical synthesis. DeepReac+ is freely accessible at https://github.com/bm2-lab/DeepReac.

Based on GNNs and active learning, DeepReac+ is designed as a universal framework for quantitative modeling of chemical reactions. It takes molecular structures as inputs directly and adapts to various prediction tasks with fewer training data.

11.
12.
This perspective highlights our recent efforts to develop interactive resources in chemical education for worldwide usage. First, we highlight online tutorials that connect organic chemistry to medicine and popular culture, along with game-like resources for active learning. Next, we describe efforts to aid students in learning to visualize chemical structures in three dimensions. Finally, we present recent approaches toward engaging children and the general population through organic chemistry coloring and activity books. Collectively, these tools have benefited hundreds of thousands of users worldwide. We hope this perspective promotes a spirit of innovation in chemical education and spurs the development of additional free, interactive, and widely accessible chemical education resources.

This perspective highlights the development of interactive chemical education resources for worldwide usage. We hope to promote a spirit of innovation in chemical education and spur the development of new chemical education resources.

13.
In drug discovery applications, high-throughput virtual screening exercises are routinely performed to determine an initial set of candidate molecules referred to as “hits”. In such an experiment, each molecule from a large small-molecule drug library is evaluated in terms of physical properties such as the docking score against a target receptor. In real-life drug discovery experiments, drug libraries are extremely large yet still cover only a small fraction of the essentially infinite chemical space, and evaluating physical properties for every molecule in the library is not computationally feasible. In the current study, a novel Machine learning framework for Enhanced MolEcular Screening (MEMES) based on Bayesian optimization is proposed for efficient sampling of the chemical space. The proposed framework is demonstrated to identify 90% of the top-1000 molecules from a molecular library of about 100 million molecules, while calculating the docking score for only about 6% of the complete library. We believe that such a framework would help tremendously to reduce the computational effort not only in drug discovery but also in other areas that require such high-throughput experiments.

A novel machine learning framework based on Bayesian optimization for efficient sampling of chemical space. The framework is able to identify 90% of top-1000 hits by only sampling 6% of the complete dataset containing ∼100 million compounds.
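The screening idea in the abstract above (score a small subset, fit a surrogate, then let an acquisition function choose what to score next) can be sketched roughly as follows. This is not the MEMES implementation: a crude nearest-neighbour surrogate stands in for a proper probabilistic model such as a Gaussian process, molecules are reduced to a single hypothetical descriptor value, and `screen`, `toy_score`, and all parameter names are made up for illustration.

```python
import random
from typing import Callable, Dict, List


def screen(library: List[float],
           score: Callable[[float], float],
           n_init: int = 10,
           batch: int = 10,
           rounds: int = 5,
           kappa: float = 1.0,
           seed: int = 0) -> Dict[int, float]:
    """Return {library index: score} for the molecules actually evaluated.

    Lower scores are better, as with docking scores.
    """
    rng = random.Random(seed)
    # Start from a random subset of the library.
    evaluated = {i: score(library[i])
                 for i in rng.sample(range(len(library)), n_init)}

    def acquisition(i: int) -> float:
        # Stand-in surrogate: predict the score of the nearest evaluated
        # molecule and treat the distance to it as the uncertainty.
        j = min(evaluated, key=lambda e: abs(library[e] - library[i]))
        uncertainty = abs(library[j] - library[i])
        # Lower confidence bound: favour low predictions and unexplored regions.
        return evaluated[j] - kappa * uncertainty

    for _ in range(rounds):
        pool = [i for i in range(len(library)) if i not in evaluated]
        pool.sort(key=acquisition)
        # Run the expensive evaluation only on the most promising batch.
        for i in pool[:batch]:
            evaluated[i] = score(library[i])
    return evaluated


# Hypothetical toy problem: 500 "molecules", each a 1-D descriptor.
library = [i / 100 for i in range(500)]


def toy_score(x: float) -> float:
    """Hypothetical stand-in for an expensive docking calculation."""
    return (x - 3.2) ** 2


evaluated = screen(library, toy_score)
true_top = set(sorted(range(len(library)),
                      key=lambda i: toy_score(library[i]))[:10])
recall = len(true_top & set(evaluated)) / len(true_top)
```

Here only 60 of the 500 toy molecules are scored, and `recall` reports what fraction of the true top 10 was found, which is the same kind of figure as the paper's "90% of the top-1000 with about 6% of the library". The actual numbers depend entirely on the surrogate, the acquisition function, and the data.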

14.
15.
Short aliphatic groups are prevalent in bioactive small molecules and play an essential role in regulating physicochemistry and molecular recognition phenomena. Delineating their biological origins and significance has resulted in landmark developments in synthetic organic chemistry: Arigoni's venerable synthesis of the chiral methyl group is a personal favourite. Whilst radioisotopes allow the steric footprint of the native group to be preserved, this strategy was never intended for therapeutic chemotype development. In contrast, leveraging H → F bioisosterism provides scope to complement the chiral, radioactive bioisostere portfolio and to reach unexplored areas of chiral chemical space for small molecule drug discovery. Accelerated by advances in I(I)/I(III) catalysis, the current arsenal of achiral 2D and 3D drug discovery modules is rapidly expanding to include chiral units with unprecedented topologies and van der Waals volumes. This Perspective surveys key developments in the design and synthesis of short multivicinal fluoroalkanes under the auspices of main group catalysis paradigms.

Short aliphatic groups are prevalent in bioactive small molecules and play an essential role in regulating physicochemistry and molecular recognition phenomena.

16.
Quantum-chemistry simulations based on potential energy surfaces of molecules provide invaluable insight into the physicochemical processes at the atomistic level and yield such important observables as reaction rates and spectra. Machine learning potentials promise to significantly reduce the computational cost and hence enable otherwise unfeasible simulations. However, the surging number of such potentials begs the question of which one to choose or whether we still need to develop yet another one. Here, we address this question by evaluating the performance of popular machine learning potentials in terms of accuracy and computational cost. In addition, we deliver structured information for non-specialists in machine learning to guide them through the maze of acronyms, recognize each potential's main features, and judge what they could expect from each one.

This article provides a lifeline for those lost in the sea of the molecular machine learning potentials by providing a balanced overview and evaluation of popular potentials.

17.
How molecules pack has vital ramifications for their applications as functional molecular materials. Small changes in a molecule's functionality can lead to large, non-intuitive changes in its global solid-state packing, resulting in difficulty in targeted design. Predicting the crystal structure of organic molecules from only their molecular structure is a well-known problem plaguing crystal engineering. Although relevant to the properties of many organic molecules, the packing behaviour of modular porous materials, such as porous organic cages (POCs), greatly impacts the properties of the material. We present a novel way of predicting the solid-state phase behaviour of POCs by using a simplistic model containing the dominant degrees of freedom driving crystalline phase formation. We employ coarse-grained simulations to systematically study how chemical functionality of pseudo-octahedral cages can be used to manipulate the solid-state phase formation of POCs. Our results support those of experimentally reported structures, showing that for cages which pack via their windows forming a porous network, only one phase is formed, whereas when cages pack via their windows and arenes, the phase behaviour is more complex. While presenting a lower computational cost route for predicting molecular crystal packing, coarse-grained models also allow for the development of design rules which we start to formulate through our results.

This work presents a novel method for predicting molecular crystal structure formation using coarse-grained modelling, enabling the development of design rules.

18.
Computational methods, including crystal structure and property prediction, have the potential to accelerate the materials discovery process by enabling structure prediction and screening of possible molecular building blocks prior to their synthesis. However, the discovery of new functional molecular materials is still limited by the need to identify promising molecules from a vast chemical space. We describe an evolutionary method which explores a user-specified region of chemical space to identify promising molecules, which are subsequently evaluated using crystal structure prediction. We demonstrate the method for the exploration of aza-substituted pentacenes with the aim of finding small molecule organic semiconductors with high charge carrier mobilities, where the space of possible substitution patterns is too large to exhaustively search using a high-throughput approach. The method efficiently explores this large space, typically requiring calculations on only ∼1% of molecules during a search. The results reveal two promising structural motifs: aza-substituted naphtho[1,2-a]anthracenes with reorganisation energies as low as pentacene and a series of pyridazine-based molecules having both low reorganisation energies and high electron affinities.

Evolutionary optimisation and crystal structure prediction are used to explore chemical space for molecular organic semiconductors.

19.
The self-assembled inclusion of molecules into two-dimensional (2D) porous networks on surfaces has been extensively studied because 2D functional materials consisting of organic molecules have become an important research topic. However, the isolation of a single molecular thiol remains a challenging goal. Here, we report a method of planting and isolating organothiols onto a 2D patterned organic adlayer at an electrochemical interface. In situ scanning tunneling microscopy revealed that the phase transition of an ovalene adlayer is electrochemically induced and that the gap site created by three ovalene molecules serves as a 2D molecular template to isolate thiol molecules and to standardize the distance between them via the formation of precise selective open spaces, suggesting that electrochemical “molecular planting” opens applications for 2D patterns of isolated single organothiol molecules.

Gap sites electrochemically created in the ovalene adlayer can accept a single thiol.

20.
Chemical reaction prediction, involving forward synthesis and retrosynthesis prediction, is a fundamental problem in organic synthesis. A popular computational paradigm formulates synthesis prediction as a sequence-to-sequence translation problem, where the typical SMILES is adopted for molecule representations. However, the general-purpose SMILES neglects the characteristics of chemical reactions, where the molecular graph topology is largely unaltered from reactants to products, resulting in the suboptimal performance of SMILES if straightforwardly applied. In this article, we propose the root-aligned SMILES (R-SMILES), which specifies a tightly aligned one-to-one mapping between the product and the reactant SMILES for more efficient synthesis prediction. Due to the strict one-to-one mapping and reduced edit distance, the computational model is largely relieved from learning the complex syntax and dedicated to learning the chemical knowledge for reactions. We compare the proposed R-SMILES with various state-of-the-art baselines and show that it significantly outperforms them all, demonstrating the superiority of the proposed method.

We propose the root-aligned SMILES (R-SMILES), which specifies a tightly aligned one-to-one mapping between the product and the reactant SMILES for more efficient sequence-based synthesis prediction.
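The "reduced edit distance" argument in the abstract above can be illustrated with a plain Levenshtein distance between a product SMILES and two different writings of the same reactants. This is only a sketch under stated assumptions: the strings below are hand-written for an esterification (ethanol plus acetic acid giving ethyl acetate), not generated by the R-SMILES code, and the distance is computed per character rather than per SMILES token.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


# Hypothetical, hand-written example strings.
product = "CCOC(C)=O"              # ethyl acetate
reactants_aligned = "CCO.OC(C)=O"  # same atom order as the product
reactants_other = "CC(=O)O.CCO"    # same molecules, different writing

d_aligned = levenshtein(product, reactants_aligned)
d_other = levenshtein(product, reactants_other)
```

With the aligned writing, the reactant string differs from the product only by the inserted `.O`, so `d_aligned` is 2, while the other writing of the same two molecules is further from the product. A sequence-to-sequence model therefore has to change less of the string when the two sides are aligned.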
