Similar literature
20 similar articles found (search time: 31 ms)
1.
Characterizing molecular species by suitable graph invariants (graph paths, in particular) provides a quantitative means of encoding structure; the technique is complementary to more common approaches to quantitative structure-activity relationship studies. Graph path encoding is applied here to quantitative studies of relationships between molecular structure and biological activity; the examples are the rates of various substrate reactions with hexokinase and the potential opiate-like activity of enkephalin analogs.
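
As a rough illustration of graph path encoding, the sketch below counts the simple paths of each length (measured in bonds) in a hydrogen-suppressed molecular graph. The function name `path_counts`, the adjacency-dict input format, and the two butane isomers are assumptions chosen for illustration; the abstract does not state which path invariants the authors actually used.

```python
from collections import defaultdict


def path_counts(adjacency, max_length):
    """Count simple paths of length 1..max_length (in bonds) in an undirected graph.

    `adjacency` maps each atom index to a list of bonded atom indices
    (a hypothetical input format, not taken from the cited work).
    """
    counts = defaultdict(int)

    def extend(path):
        length = len(path) - 1  # number of bonds traversed so far
        if length > 0:
            counts[length] += 1
        if length == max_length:
            return
        for neighbor in adjacency[path[-1]]:
            if neighbor not in path:  # simple paths only: no atom revisited
                extend(path + [neighbor])

    for start in adjacency:
        extend([start])

    # Every undirected path was discovered once from each end, so halve.
    return [counts[k] // 2 for k in range(1, max_length + 1)]


# Carbon skeletons of two C4H10 isomers (hydrogens suppressed).
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}

n_butane_code = path_counts(n_butane, 3)
isobutane_code = path_counts(isobutane, 3)
```

The two isomers share the same number of bonds but differ in their counts of longer paths, which is the kind of structural distinction a path-based code can feed into a regression against measured activity.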

2.
Hundreds of catalytic methods are developed each year to meet the demand for high-purity chiral compounds. The computational design of enantioselective organocatalysts remains a significant challenge, as catalysts are typically discovered through experimental screening. Recent advances in combining quantum chemical computations and machine learning (ML) hold great potential to propel the next leap forward in asymmetric catalysis. Within the context of quantum chemical machine learning (QML, or atomistic ML), the ML representations used to encode the three-dimensional structure of molecules and evaluate their similarity cannot easily capture the subtle energy differences that govern enantioselectivity. Here, we present a general strategy for improving molecular representations within an atomistic machine learning model to predict the DFT-computed enantiomeric excess of asymmetric propargylation organocatalysts solely from the structure of catalytic cycle intermediates. Mean absolute errors as low as 0.25 kcal mol−1 were achieved in predictions of the activation energy with respect to DFT computations. By virtue of its design, this strategy is generalisable to other ML models, to experimental data and to any catalytic asymmetric reaction, enabling the rapid screening of structurally diverse organocatalysts from available structural information.

A machine learning model for enantioselectivity prediction using reaction-based molecular representations.

3.
4.
Electronic structure methods based on quantum mechanics (QM) are widely employed in computational predictions of the molecular and optoelectronic properties of molecular materials. The computational costs of these QM methods, ranging from density functional theory (DFT) or time-dependent DFT (TDDFT) to wave-function theory (WFT), usually increase sharply with system size, causing the curse of dimensionality and hindering QM calculations for large systems such as long polymer oligomers and complex molecular aggregates. In recent years, low-scaling QM methods and machine learning (ML) techniques have therefore been adopted to reduce computational costs and thus assist computational and data-driven molecular material design. In this review, we illustrate low-scaling ground-state and excited-state QM approaches and their applications to long oligomers, self-assembled supramolecular complexes, stimuli-responsive materials, mechanically interlocked molecules, and excited-state processes in molecular aggregates. Variable electrostatic parameters were also introduced in the modified force fields with the polarization model. On the basis of QM computational or experimental datasets, several ML algorithms, including explainable models, deep learning, and on-line learning methods, have been employed to predict the molecular energies, forces, electronic structure properties, and optical or electrical properties of materials. Low-scaling algorithms with periodic boundary conditions are expected to be further applicable to functional materials, perhaps in combination with machine learning, to rapidly predict the lattice energy, crystal structures, and spectroscopic properties of periodic functional materials.

Low-scaling quantum mechanics calculations and machine learning can be employed to efficiently predict the molecular energies, forces, and optical and electrical properties of molecular materials and their aggregates.

5.
Feature extraction is essential for estimating the chemical properties of molecules using machine learning. Recently, graph neural networks have attracted attention for feature extraction from molecules. However, existing methods focus only on specific structural information, such as node relationships. In this paper, we propose a novel graph convolutional neural network that performs feature extraction while simultaneously considering multiple structures. Specifically, we propose feature extraction paths specialized in node, edge, and three-dimensional structures. Moreover, we propose an attention mechanism to aggregate the features extracted by the paths. The attention aggregation enables us to select useful features dynamically. The experimental results showed that the proposed method outperformed previous methods.
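
One common way to realize attention aggregation over several feature paths can be sketched as follows: each path's feature vector is scored against a query vector, the scores are normalized with a softmax, and the weighted sum is returned. This is only a generic illustration; the dot-product scoring, the `query` vector, and the function name `attention_aggregate` are assumptions and are not described in the abstract.

```python
import math


def attention_aggregate(path_features, query):
    """Softmax-weighted sum of per-path feature vectors.

    `path_features` is a list of equal-length vectors (for example one each
    from hypothetical node, edge, and 3D paths); `query` is an assumed
    learnable vector of the same length.
    """
    # Dot-product relevance score for each path.
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in path_features]

    # Numerically stable softmax over the scores.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the path features, one dimension at a time.
    dim = len(query)
    aggregated = [
        sum(w * feat[i] for w, feat in zip(weights, path_features))
        for i in range(dim)
    ]
    return aggregated, weights


# Toy features standing in for the node, edge, and 3D paths.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
aggregated, weights = attention_aggregate(features, query=[2.0, 0.5])
```

Because the weights depend on the input features, the path that matches the query best dominates the aggregate, which is one way to read "selecting useful features dynamically".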

6.
Establishing a unified framework for describing the structures of molecular and periodic systems is a long-standing challenge in physics, chemistry, and materials science. With the rise of machine learning methods in these fields, there is a growing need for such a framework. This perspective discusses the development and use of three promising approaches (topological, atom-density, and symmetry-based) for the prediction and rationalization of physical, chemical, and mechanical properties of atomistic systems across different scales and compositions.

7.
8.
9.
10.
There exist many databases containing information on genes that are useful as background information in machine learning analysis of microarray data. The gene ontology and gene ontology annotation projects are among the most comprehensive of these. We demonstrate how inductive logic programming (ILP) can be used to build classification rules for microarray data which naturally incorporate the gene ontology and its annotations as background knowledge without removing the inherent graph structure of the ontology. The ILP rules generated are parsimonious and easy to interpret. Copyright © 2010 John Wiley & Sons, Ltd.

11.
Monte Carlo (MC) methods are important computational tools for molecular structure optimization and prediction. When solvent effects are explicitly considered, MC methods become very expensive due to the large number of degrees of freedom associated with the water molecules and mobile ions. Alternatively, implicit-solvent MC can greatly reduce the computational cost by applying a mean-field approximation to solvent effects while maintaining the atomic detail of the target molecule. The two most popular implicit-solvent models are the Poisson-Boltzmann (PB) model and the Generalized Born (GB) model; the GB model is an approximation to the PB model but is much faster in simulation time. In this work, we develop a machine learning-based implicit-solvent Monte Carlo (MLIMC) method that combines the advantages of both implicit-solvent models in accuracy and efficiency. Specifically, the MLIMC method uses a fast and accurate PB-based machine learning (PBML) scheme to compute the electrostatic solvation free energy at each step. We validate our MLIMC method using a benzene-water system and a protein-water system. We show that the proposed MLIMC method has great advantages in speed and accuracy for molecular structure optimization and prediction.
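
The role of a fast energy model inside an MC loop can be sketched with a generic Metropolis step, in which the energy function is a pluggable callable. This is a minimal sketch and not the MLIMC method itself: the names `metropolis_step`, `energy_fn`, and `propose_fn` are hypothetical, and the quadratic toy energy merely stands in for a learned solvation free energy.

```python
import math
import random


def metropolis_step(state, energy_fn, propose_fn, kT, rng):
    """One Metropolis move: accept downhill moves, uphill with Boltzmann probability.

    `energy_fn` is any callable returning an energy for a state; in an
    ML-accelerated scheme it would wrap a trained surrogate model (assumed here).
    """
    candidate = propose_fn(state, rng)
    delta = energy_fn(candidate) - energy_fn(state)
    if delta <= 0 or rng.random() < math.exp(-delta / kT):
        return candidate, True
    return state, False


def toy_energy(x):
    """Stand-in for a surrogate energy model: a simple quadratic well."""
    return x * x


def toy_propose(x, rng):
    """Random displacement of the single coordinate."""
    return x + rng.uniform(-0.5, 0.5)


rng = random.Random(0)
state = 3.0
for _ in range(1000):
    state, _accepted = metropolis_step(state, toy_energy, toy_propose, kT=0.1, rng=rng)
```

Since the energy is evaluated at every step, the cost of `energy_fn` dominates the run time, which is why swapping an expensive solver for a cheaper surrogate matters in this setting.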

12.
13.
The chromosome aberration test is frequently used to assess the potential of chemicals and drugs to elicit genetic damage in mammalian cells in vitro. Due to the limitations of experimental genotoxicity testing in early drug discovery phases, a model that predicts the outcome of the chromosome aberration test with high accuracy and provides guidance for structure optimization is urgently needed. In this paper, we describe a machine learning approach for predicting the outcome of this assay based on the structure of the investigated compound. The novelty of the proposed method consists in combining a maximum common subgraph kernel for measuring the similarity of two chemical graphs with the potential support vector machine for classification. In contrast to standard support vector machine classifiers, the proposed approach does not provide a black-box model but rather allows visualization of structural elements with a high positive or negative contribution to the class decision. In order to compare the performance of different methods for predicting the outcome of the chromosome aberration test, we compiled a large data set exhibiting high quality, reliability, and consistency from public sources and configured a fixed cross-validation protocol, which we make publicly available. In a comparison with standard methods currently used in the pharmaceutical industry as well as with other graph kernel approaches, the proposed method achieved significantly better performance.

14.
15.
16.
In the genomic era, DNA sequencing is increasing our knowledge of the molecular structure of genetic codes, from bacteria to man, at a hyperbolic rate. Billions of nucleotides and millions of amino acids already fill the electronic files of the presently available databases, which contain a tremendous amount of information on the most biologically relevant macromolecules, such as DNA, RNA and proteins. The most urgent problem originates from the need to single out the relevant information amidst a wealth of general features. Intelligent tools are therefore needed to optimise the search. Data mining for sequence analysis in biotechnology has been substantially aided by the development of powerful new methods borrowed from machine learning. In this paper we discuss the application of artificial feedforward neural networks to some fundamental problems tied to the folding process and the structure-function relationship in proteins.

17.
18.
Methods to automate structure elucidation that can be applied broadly across chemical structure space have the potential to greatly accelerate chemical discovery. NMR spectroscopy is the most widely used and arguably the most powerful method for elucidating structures of organic molecules. Here we introduce a machine learning (ML) framework that provides a quantitative probabilistic ranking of the most likely structural connectivity of an unknown compound when given routine, experimental one dimensional 1H and/or 13C NMR spectra. In particular, our ML-based algorithm takes input NMR spectra and (i) predicts the presence of specific substructures out of hundreds of substructures it has learned to identify; (ii) annotates the spectrum to label peaks with predicted substructures; and (iii) uses the substructures to construct candidate constitutional isomers and assign to them a probabilistic ranking. Using experimental spectra and molecular formulae for molecules containing up to 10 non-hydrogen atoms, the correct constitutional isomer was the highest-ranking prediction made by our model in 67.4% of the cases and one of the top-ten predictions in 95.8% of the cases. This advance will aid in solving the structure of unknown compounds, and thus further the development of automated structure elucidation tools that could enable the creation of fully autonomous reaction discovery platforms.

A machine learning model and graph generator were able to accurately predict the presence of nearly 1000 substructures and the connectivity of small organic molecules from experimental 1D NMR data.

19.
Transition states are among the most important molecular structures in chemistry, critical to a variety of fields such as reaction kinetics, catalyst design, and the study of protein function. However, transition states are very unstable, typically existing only on the order of femtoseconds. The transient nature of these structures makes them incredibly difficult to study, so chemists often turn to simulation. Unfortunately, computer simulation of transition states is also challenging, as they are first-order saddle points on high-dimensional mathematical surfaces. Locating these points is resource intensive and unreliable, resulting in methods that can take a very long time to converge. Machine learning, a relatively novel class of algorithm, has led to radical changes in several fields of computation, including computer vision and natural language processing, due to its aptitude for highly accurate function approximation. While machine learning has been widely adopted throughout computational chemistry as a lightweight alternative to costly quantum mechanical calculations, little research has been pursued that utilizes machine learning for transition state structure optimization. In this paper TSNet is presented, a new end-to-end Siamese message-passing neural network based on tensor field networks, shown to be capable of predicting transition state geometries. Also presented is a small dataset of SN2 reactions which includes transition state structures, the first of its kind built specifically for machine learning. Finally, transfer learning, a low-data remedial technique, is explored to understand whether pretraining TSNet on widely available chemical data may provide better starting points during training, faster convergence, and lower loss values. Aspects of the new dataset and model are discussed in detail, along with motivations and a general outlook on the future of machine learning-based transition state prediction.

20.
The recent availability of large publicly accessible databases of chemical compounds and their biological activities (PubChem, ChEMBL) has inspired us to develop a web‐based tool for structure-activity relationship and quantitative structure-activity relationship modeling to add to the services provided by CHARMMing ( www.charmming.org ). This new module implements some of the most recent advances in modern machine learning algorithms: Random Forest, Support Vector Machine, Stochastic Gradient Descent, Gradient Tree Boosting, and so forth. A user can import training data from PubChem BioAssay data collections directly from our interface or upload his or her own SD files, which contain structures and activity information, to create new models (either categorical or numerical). A user can then track the model generation process and run models on new data to predict activity. © 2014 Wiley Periodicals, Inc.
