Similar literature
 20 similar articles found (search time: 46 ms)
1.
Hundreds of catalytic methods are developed each year to meet the demand for high-purity chiral compounds. The computational design of enantioselective organocatalysts remains a significant challenge, as catalysts are typically discovered through experimental screening. Recent advances in combining quantum chemical computations and machine learning (ML) hold great potential to propel the next leap forward in asymmetric catalysis. Within the context of quantum chemical machine learning (QML, or atomistic ML), the ML representations used to encode the three-dimensional structure of molecules and evaluate their similarity cannot easily capture the subtle energy differences that govern enantioselectivity. Here, we present a general strategy for improving molecular representations within an atomistic machine learning model to predict the DFT-computed enantiomeric excess of asymmetric propargylation organocatalysts solely from the structure of catalytic cycle intermediates. Mean absolute errors as low as 0.25 kcal mol−1 were achieved in predictions of the activation energy with respect to DFT computations. By virtue of its design, this strategy is generalisable to other ML models, to experimental data and to any catalytic asymmetric reaction, enabling the rapid screening of structurally diverse organocatalysts from available structural information.

A machine learning model for enantioselectivity prediction using reaction-based molecular representations.
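
The abstract above reports errors in activation energies but targets enantiomeric excess (ee). The link between the two can be sketched with the standard transition-state-theory relation ee = tanh(ΔΔG‡ / 2RT). This is a minimal sketch, assuming kinetic control with two competing transition states and a temperature of 298.15 K; neither the formula nor the temperature is stated in the abstract, and the function names are hypothetical.

```python
import math

# Gas constant in kcal mol^-1 K^-1.
R_KCAL = 1.987204e-3


def ee_from_ddg(ddg_kcal: float, temperature_k: float = 298.15) -> float:
    """Enantiomeric excess (as a fraction in [-1, 1]) from the difference
    in activation free energies of the two competing transition states.

    Assumes the enantiomer ratio is exp(ddG / RT), which gives
    ee = (er - 1) / (er + 1) = tanh(ddG / 2RT).
    """
    return math.tanh(ddg_kcal / (2.0 * R_KCAL * temperature_k))


def ddg_from_ee(ee: float, temperature_k: float = 298.15) -> float:
    """Inverse relation: ddG in kcal/mol for a given ee with |ee| < 1."""
    return 2.0 * R_KCAL * temperature_k * math.atanh(ee)


if __name__ == "__main__":
    # Show how an energy error of 0.25 kcal/mol shifts the predicted ee.
    for ddg in (1.0, 1.25, 2.0, 2.25):
        print(f"ddG = {ddg:.2f} kcal/mol -> ee = {100 * ee_from_ddg(ddg):.1f}%")
```

Because tanh saturates, the same energy error changes the ee much less for highly selective catalysts than for weakly selective ones.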

2.
3.
4.
Predicting potentially dangerous chemical reactions is a critical task for laboratory safety. However, a traditional experimental investigation of reaction conditions for possible hazardous or explosive byproducts entails substantial time and cost, for which machine learning prediction could accelerate the process and help detailed experimental investigations. Several machine learning models have been developed which allow the prediction of major chemical reaction products with reasonable accuracy. However, these methods may not present sufficiently high accuracy for the prediction of hazardous products which particularly requires a low false negative result for laboratory safety in order not to miss any dangerous reactions. In this work, we propose an explainable artificial intelligence model that can predict the formation of hazardous reaction products in a binary classification fashion. The reactant molecules are transformed into substructure-encoded fingerprints and then fed into a convolutional neural network to make the binary decision of the chemical reaction. The proposed model shows a false negative rate of 0.09, which can be compared with 0.47–0.66 using the existing main product prediction models. To provide explanations for what substructures of the given reactant molecules are important to make a decision for target hazardous product formation, we apply an input attribution method, layer-wise relevance propagation, which computes the contributions of individual inputs per input data. The computed attributions indeed match some of the existing chemical intuitions and mechanisms, and also offer a way to analyze possible data-imbalance issues of the current predictions based on relatively small positive datasets. We expect that the proposed hazardous product prediction model will be complementary to existing main product prediction models and experimental investigations.

An explainable neural network model is developed to predict the formation of hazardous products for chemical reactions. An input attribution method, layer-wise relevance propagation, is used to explain the decision-making process.
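
The false negative rate quoted in the abstract above is the fraction of truly hazardous reactions that a classifier labels as safe, FN / (FN + TP). A minimal sketch of that metric follows; the label encoding (1 = hazardous, 0 = not hazardous) and the function name are assumptions made for illustration.

```python
from typing import Sequence


def false_negative_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Return FN / (FN + TP) for binary labels (1 = hazardous, 0 = safe)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Hazardous reactions the model missed.
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Hazardous reactions the model caught.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    if fn + tp == 0:
        raise ValueError("no positive examples; the rate is undefined")
    return fn / (fn + tp)


if __name__ == "__main__":
    truth = [1, 1, 1, 1, 0, 0]
    guess = [1, 1, 1, 0, 0, 1]
    print(false_negative_rate(truth, guess))
```

False positives do not enter this metric, which is why it suits a safety screen where a missed hazard costs more than a false alarm.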

5.
Molecular “fingerprints” encoding structural information are the workhorse of cheminformatics and machine learning in drug discovery applications. However, fingerprint representations necessarily emphasize particular aspects of the molecular structure while ignoring others, rather than allowing the model to make data-driven decisions. We describe molecular graph convolutions, a machine learning architecture for learning from undirected graphs, specifically small molecules. Graph convolutions use a simple encoding of the molecular graph (atoms, bonds, distances, etc.), which allows the model to take greater advantage of information in the graph structure. Although graph convolutions do not outperform all fingerprint-based methods, they (along with other graph-based methods) represent a new paradigm in ligand-based virtual screening with exciting opportunities for future improvement.
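
The idea of learning from the molecular graph itself can be illustrated with one neighbor-aggregation step over atoms and bonds. This is a deliberately simplified sketch, not the architecture described in the abstract: it has no learned weights, bond features, or distances, and the feature layout and function name are hypothetical.

```python
from typing import List, Sequence, Tuple


def aggregate_neighbors(
    atom_features: Sequence[Sequence[float]],
    bonds: Sequence[Tuple[int, int]],
) -> List[List[float]]:
    """One unweighted message-passing step on an undirected molecular graph.

    Each atom's new feature vector is its own vector plus the sum of the
    vectors of its bonded neighbors.
    """
    updated = [list(vec) for vec in atom_features]
    for a, b in bonds:
        # Bonds are undirected, so information flows both ways.
        for i, value in enumerate(atom_features[b]):
            updated[a][i] += value
        for i, value in enumerate(atom_features[a]):
            updated[b][i] += value
    return updated


if __name__ == "__main__":
    # Heavy-atom graph C-C-O with one-hot features [is_carbon, is_oxygen].
    features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(aggregate_neighbors(features, [(0, 1), (1, 2)]))
```

The update treats every atom identically and depends only on connectivity, so renumbering the atoms permutes the output rows without changing their values.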

6.
7.
Learning to predict chemical reactions   (Total citations: 1; self-citations: 0; citations by others: 1)

8.
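An ultra-high-performance liquid chromatography-mass spectrometry method was established for the simultaneous quantification of 16 flavonoids and 4 terpene lactones in Shuxuening injection, and the method was used to determine the contents of these 20 compounds in Shuxuening injection products from different manufacturers. Injection samples were diluted with methanol-water (1:1, v/v) and analyzed on an Acquity UPLC BEH Shield RP18 column (2.1 mm × 100 mm, 1.7 μm) with gradient elution, using 0.1% (v/v) aqueous formic acid and acetonitrile as the mobile phase. Mass spectrometric detection used an electrospray ionization source in negative-ion multiple reaction monitoring (MRM) mode. The 20 compounds were separated and analyzed within 10 min. The limits of detection and lower limits of quantification were 0.02 to 1.59 ng/mL and 0.07 to 5.30 ng/mL, respectively. The 16 flavonoids and 4 terpene lactones showed good linearity within their respective linear ranges, and recoveries at low, medium, and high spiking levels were 85.9% to 109%. The method requires only simple sample preparation, is fast and efficient, and is highly accurate, providing a reference for the quality control of Shuxuening injection.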

9.
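This paper establishes a new "one method for all" approach to species identification of traditional Chinese medicines (TCM), based on multi-dimensional, multi-information fingerprints combined with artificial intelligence recognition. The same medicinal material is first processed in different ways to obtain chemical composition information of differing properties. A generally applicable acquisition scheme for multi-dimensional, multi-information fingerprints is then constructed that integrates reversed-phase chromatography, hydrophilic chromatography, and size-exclusion chromatography, achieving comprehensive characterization of the low-polarity small molecules, high-polarity small molecules, and macromolecular compounds in the material. The acquired fingerprints are standardized, and a convolutional neural network is used to identify different TCM species, yielding a recognition model with an accuracy of 92%. The method identifies TCM species rapidly, accurately, and efficiently, overcomes the subjectivity of traditional TCM species identification, and gives more objective and precise results.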

10.
A broad collection of technologies, including e.g. drug metabolism, biofuel combustion, photochemical decontamination of water, and interfacial passivation in energy production/storage systems rely on chemical processes that involve bond-breaking molecular reactions. In this context, a fundamental thermodynamic property of interest is the bond dissociation energy (BDE) which measures the strength of a chemical bond. Fast and accurate prediction of BDEs for arbitrary molecules would lay the groundwork for data-driven projections of complex reaction cascades and hence a deeper understanding of these critical chemical processes and, ultimately, how to reverse design them. In this paper, we propose a chemically inspired graph neural network machine learning model, BonDNet, for the rapid and accurate prediction of BDEs. BonDNet maps the difference between the molecular representations of the reactants and products to the reaction BDE. Because of the use of this difference representation and the introduction of global features, including molecular charge, it is the first machine learning model capable of predicting both homolytic and heterolytic BDEs for molecules of any charge. To test the model, we have constructed a dataset of both homolytic and heterolytic BDEs for neutral and charged (−1 and +1) molecules. BonDNet achieves a mean absolute error (MAE) of 0.022 eV for unseen test data, significantly below chemical accuracy (0.043 eV). Besides the ability to handle complex bond dissociation reactions that no previous model could consider, BonDNet distinguishes itself even in only predicting homolytic BDEs for neutral molecules; it achieves an MAE of 0.020 eV on the PubChem BDE dataset, a 20% improvement over the previous best performing model. 
We gain additional insight into the model's predictions by analyzing the patterns in the features representing the molecules and the bond dissociation reactions, which are qualitatively consistent with chemical rules and intuition. BonDNet is just one application of our general approach to representing and learning chemical reactivity, and it could be easily extended to the prediction of other reaction properties in the future.

Prediction of bond dissociation energies for charged molecules with a graph neural network enabled by global molecular features and reaction difference features between products and reactants.
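
The difference representation described above (product features minus reactant features) can be sketched in a few lines, along with the unit conversion behind the quoted chemical-accuracy threshold. This is an illustration only: in BonDNet the molecular features are learned by a graph neural network, whereas the vectors here are hypothetical placeholders, and the function name is an assumption.

```python
from typing import List, Sequence

# 1 eV per particle expressed in kcal/mol.
EV_TO_KCAL_PER_MOL = 23.0605


def reaction_difference(
    reactants: Sequence[Sequence[float]],
    products: Sequence[Sequence[float]],
) -> List[float]:
    """Sum of product feature vectors minus sum of reactant feature vectors.

    All vectors must share one length; they stand in for learned
    molecule-level representations.
    """
    dim = len(reactants[0])
    total_reactants = [sum(vec[i] for vec in reactants) for i in range(dim)]
    total_products = [sum(vec[i] for vec in products) for i in range(dim)]
    return [p - r for p, r in zip(total_products, total_reactants)]


if __name__ == "__main__":
    # One reactant dissociating into two fragments (placeholder features).
    print(reaction_difference([[1.0, 2.0]], [[0.5, 1.0], [0.25, 0.5]]))
    # Chemical accuracy of 0.043 eV is roughly 1 kcal/mol.
    print(0.043 * EV_TO_KCAL_PER_MOL)
```

Any feature that is unchanged between reactants and products cancels in the difference, so the resulting vector emphasizes what the bond cleavage alters.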

11.
In organic chemistry, Comparative Molecular Field Analysis (CoMFA) can be defined as a regression analysis between reaction outcomes and molecular fields, wherein we can extract and visualize important structural information from the coefficients of the constructed regression models. In CoMFA, partial least-squares (PLS) regression, which determines all coefficients in the model, is used for fitting the regression models. However, in organic reactions, steric effects are observed only near the reactive site, indicating that a large number of regression coefficients in the CoMFA of organic reactions should be assigned as 0. The regularized regression method, LASSO/Elastic Net, allows us to fit the regression model while assigning 0 values to unimportant coefficients. Although LASSO/Elastic Net should be suitable for CoMFA, there is no example of its use for organic reaction analysis. Herein, we examine the performance of LASSO/Elastic Net for the quantification of steric effects in CoMFA. We employ digitized molecular structures (the indicator field) as molecular fields that represent steric effects. LASSO/Elastic Net regressions provide highly interpretable models that include less noise than those from PLS regression. © 2017 Wiley Periodicals, Inc.
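
How LASSO sets unimportant coefficients exactly to 0 can be seen in a small coordinate-descent sketch built on the soft-thresholding operator. It assumes the common objective (1/2n)·||y − Xw||² + α·||w||₁ and a binary indicator-field matrix with made-up values; it is not the procedure used in the paper, and the function names are hypothetical.

```python
from typing import List, Sequence


def soft_threshold(rho: float, alpha: float) -> float:
    """Shrink rho toward zero by alpha; values within [-alpha, alpha] become 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_coordinate_descent(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    alpha: float,
    n_iter: int = 100,
) -> List[float]:
    """Minimize (1/2n)||y - Xw||^2 + alpha * ||w||_1 by cyclic coordinate descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iter):
        for j in range(d):
            rho = 0.0
            z = 0.0
            for i in range(n):
                # Prediction without the contribution of coefficient j.
                partial = sum(X[i][k] * w[k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            w[j] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
    return w


if __name__ == "__main__":
    # Two indicator-field grid points: the first matters, the second barely does.
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
    y = [2.0, 0.1, 2.0, 0.1]
    print(lasso_coordinate_descent(X, y, alpha=0.1))
```

A least-squares fit on the same data would give the second grid point a small nonzero coefficient; the L1 penalty removes it entirely, which is the sparsity property the abstract relies on for interpretability.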

12.
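Nested prediction models for discriminating the molecular shape of metabolites and the sites of metabolic reactions were constructed on the basis of the support vector machine (SVM) and KStar methods, respectively. The molecular shape discrimination model was built from 272 molecules, for which 1280 molecular descriptors were calculated, including molecular topology, two-dimensional autocorrelation, and geometric structure descriptors. The accuracy of classification models built with four machine learning methods (support vector machine, decision tree, Bayesian network, and k-nearest neighbor) was examined. The support vector machine outperformed the other methods, and this model can be used to predict whether a molecule can undergo O-dealkylation catalyzed by cytochrome P450 enzymes. The metabolic site discrimination model was built from 538 O-dealkylation metabolic sites, for which 26 quantum chemical features characterizing atomic energy, valence state, charge, and so on were calculated. The accuracies of decision tree, Bayesian network, KStar, and artificial neural network models were compared. The KStar model achieved accuracy, sensitivity, and specificity all above 90%; for molecules selected by the molecular shape discrimination model, it can reliably determine which C-O bond is cleaved. Fifteen traditional Chinese medicine molecules with well-established metabolic reactions were used as a validation set to verify the models. The results show that the nested prediction model based on SVM and KStar has a certain degree of accuracy and can support research on predicting the O-dealkylation metabolites of traditional Chinese medicine molecules.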

13.
14.
15.
Determining reaction mechanisms and kinetic models, which can be used for chemical reaction engineering and design, from atomistic simulation is highly challenging. In this study, we develop a novel methodology to solve this problem. Our approach has three components: (1) a procedure for precisely identifying chemical species and elementary reactions and statistically calculating the reaction rate constants; (2) a reduction method to simplify the complex reaction network into a skeletal network which can be used directly for kinetic modeling; and (3) a deterministic method for validating the derived full and skeletal kinetic models. The methodology is demonstrated by analyzing simulation data of hydrogen combustion. The full reaction network comprises 69 species and 256 reactions, which is reduced into a skeletal network of 9 species and 30 reactions. The kinetic models of both the full and skeletal networks represent the simulation data well. In addition, the essential elementary reactions and their rate constants agree favorably with those obtained experimentally. © 2019 Wiley Periodicals, Inc.

16.
Modern digital methods and powerful computers make it possible to simulate the time behavior of chemical reactions. These calculations can be performed on systems containing an almost unlimited number of elementary reactions. Generally, however, the reaction models used should contain only those elementary reactions which describe the bulk of the conversion. Such a reaction model may be obtained by reduction of the complete set of elementary reactions. Another possibility is analysis of the chemical system starting from conditions ensuring a simple chemistry, which is generally the case at low temperatures and low conversions. The reaction model may then be extended into the range of the reaction variables (temperature, time) of interest. Mathematical simulations may be helpful during the development of the reaction model, and sometimes even decisive. These methods were applied to the pyrolysis of ethylbenzene and n-hexane, and to CO oxidation. They yield information on the reaction paths, the importance of particular elementary reactions, and reaction stability. Furthermore, quantitative data can be obtained concerning the influence of single elementary reactions on the product distribution. The sensitivity matrix shows, e.g., whether the determination of kinetic parameters of an elementary reaction from kinetic data of the overall reaction is possible in principle, and how high the accuracy of the rate constants should be for simulation of the reaction. Both results are important for modeling chemical reactions.

17.
This study unites six popular machine learning approaches to enhance the prediction of molecular binding affinity between receptors (large protein molecules) and ligands (small organic molecules). Here we examine a scheme in which the affinity of ligands is predicted against a single receptor, human thrombin; the models therefore consider ligand features only. However, the suggested approach can be repurposed for other receptors. The methods include Support Vector Machine, Random Forest, CatBoost, feed-forward neural network, graph neural network, and Bidirectional Encoder Representations from Transformers. The first five methods use input features based on physico-chemical properties of molecules, while the last one is based on textual molecular representations. None of the approaches relies on atomic spatial coordinates, which avoids a potential bias from known structures and allows generalization to compounds with unknown conformations. Within each of the methods, we have trained two models that solve classification and regression tasks. Then, all models are grouped into a pipeline of two subsequent ensembles. The first ensemble aggregates six classification models which vote on whether a ligand binds to a receptor or not. If a ligand is classified as active (i.e., binds), the second ensemble predicts its binding affinity in terms of the inhibition constant Ki.
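
The two-stage pipeline described above can be sketched as a vote followed by a regression ensemble. The abstract does not say how votes or regression outputs are combined, so the strict-majority rule and the arithmetic mean below are assumptions, and the models are hypothetical callables standing in for the trained classifiers and regressors.

```python
from statistics import mean
from typing import Callable, Optional, Sequence

Features = Sequence[float]


def predict_affinity(
    features: Features,
    classifiers: Sequence[Callable[[Features], int]],
    regressors: Sequence[Callable[[Features], float]],
) -> Optional[float]:
    """Return an ensemble affinity estimate, or None if the ligand is voted inactive.

    Stage 1: each classifier returns 1 (binds) or 0 (does not bind);
             a strict majority is required (assumed rule).
    Stage 2: the regressors' outputs are averaged (assumed rule).
    """
    votes = [clf(features) for clf in classifiers]
    if 2 * sum(votes) <= len(votes):
        return None
    return mean([reg(features) for reg in regressors])


if __name__ == "__main__":
    # Stand-in models: four of six classifiers vote "active".
    clfs = [lambda x: 1] * 4 + [lambda x: 0] * 2
    regs = [lambda x: 1.0, lambda x: 3.0]
    print(predict_affinity([0.1, 0.2], clfs, regs))
```

Gating the regressors behind the vote means they are only asked about compounds resembling the actives they were trained on.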

18.
19.
20.
Reduced graphs provide summary representations of chemical structures. Here, a variety of different types of reduced graphs are compared in similarity searches. The reduced graphs are found to give comparable performance to Daylight fingerprints in terms of the number of active compounds retrieved. However, no one type of reduced graph is found to be consistently superior across a variety of different data sets. Consequently, a representative set of reduced graphs was chosen and used together with Daylight fingerprints in data fusion experiments. The results show improved performance in 10 out of 11 data sets compared to using Daylight fingerprints alone. Finally, the potential of using reduced graphs to build SAR models is demonstrated using recursive partitioning. An SAR model consistent with a published model is found following just two splits in the decision tree.
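
Data fusion of similarity searches can be illustrated by scoring each database compound under several representations and merging the scores. The abstract does not name the similarity coefficient or the fusion rule, so the Tanimoto coefficient and the MAX rule below are assumptions, and the compound identifiers and scores are invented.

```python
from typing import Dict, List, Sequence, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def max_fusion(score_tables: Sequence[Dict[str, float]]) -> List[str]:
    """Rank compounds by their best score across all representations.

    Each table maps a compound identifier to its similarity to the query
    under one representation (e.g. one reduced-graph type or a fingerprint).
    """
    fused: Dict[str, float] = {}
    for table in score_tables:
        for compound, score in table.items():
            fused[compound] = max(score, fused.get(compound, float("-inf")))
    return sorted(fused, key=lambda c: fused[c], reverse=True)


if __name__ == "__main__":
    print(tanimoto({1, 2, 3}, {2, 3, 4}))
    fingerprint_scores = {"a": 0.2, "b": 0.9}
    reduced_graph_scores = {"a": 0.95, "c": 0.1}
    print(max_fusion([fingerprint_scores, reduced_graph_scores]))
```

With the MAX rule, a compound ranked highly by any single representation stays near the top of the fused list, which is one way complementary representations can retrieve actives that a fingerprint alone would miss.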
