Similar articles
 20 similar articles found; search time 421 ms
1.
The selection of coarse-grained (CG) mapping operators is a critical step in CG molecular dynamics (MD) simulation. What makes a mapping optimal is still an open question, and theory is needed. The current state of the art is for experts to select mapping operators manually. In this work, we demonstrate an automated approach by treating the problem as supervised learning, in which we seek to reproduce the mapping operators produced by experts. We present a graph neural network-based CG mapping predictor, the Deep Supervised Graph Partitioning Model (DSGPM), which treats the choice of mapping operators as a graph segmentation problem. DSGPM is trained on a novel dataset, Human-annotated Mappings (HAM), consisting of 1180 molecules with expert-annotated mapping operators. HAM can be used to facilitate further research in this area. Our model uses a novel metric learning objective to produce high-quality atomic features that are used in spectral clustering. The results show that DSGPM outperforms state-of-the-art methods in the field of graph segmentation. Finally, we find that predicted CG mapping operators indeed result in good CG MD models when used in simulation.

We propose a scalable graph neural network-based method for automating coarse-grained mapping prediction for molecules.

2.
This study unites six popular machine learning approaches to enhance the prediction of molecular binding affinity between receptors (large protein molecules) and ligands (small organic molecules). Here we examine a scheme in which the affinity of ligands is predicted against a single receptor, human thrombin, so the models consider ligand features only. However, the suggested approach can be repurposed for other receptors. The methods include Support Vector Machine, Random Forest, CatBoost, feed-forward neural network, graph neural network, and Bidirectional Encoder Representations from Transformers. The first five methods use input features based on physico-chemical properties of molecules, while the last one is based on textual molecular representations. None of the approaches relies on atomic spatial coordinates, which avoids a potential bias from known structures, and all are capable of generalizing to compounds with unknown conformations. Within each of the methods, we have trained two models, one for a classification task and one for a regression task. Then, all models are grouped into a pipeline of two successive ensembles. The first ensemble aggregates six classification models, which vote on whether a ligand binds to a receptor or not. If a ligand is classified as active (i.e., it binds), the second ensemble predicts its binding affinity in terms of the inhibition constant Ki.
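
The two-stage pipeline described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `predict_affinity`, the strict-majority voting rule, the unweighted averaging of the regressors, and the toy models are all assumptions made here, since the abstract does not specify how votes or regression outputs are aggregated.

```python
from statistics import mean
from typing import Callable, Optional, Sequence

Features = Sequence[float]
Classifier = Callable[[Features], bool]  # True means "predicted to bind"
Regressor = Callable[[Features], float]  # returns a predicted affinity value


def predict_affinity(
    ligand: Features,
    classifiers: Sequence[Classifier],
    regressors: Sequence[Regressor],
) -> Optional[float]:
    """Stage 1: vote on activity. Stage 2: aggregate the regression models."""
    votes = sum(1 for clf in classifiers if clf(ligand))
    # Assumption: a strict majority is required; ties count as inactive.
    if votes * 2 <= len(classifiers):
        return None  # classified as inactive, so no affinity is predicted
    # Assumption: unweighted mean of the regressors' outputs.
    return mean(reg(ligand) for reg in regressors)


# Toy stand-ins for trained models (hypothetical thresholds and coefficients).
toy_classifiers = [lambda x, t=t: x[0] > t for t in (0.1, 0.2, 0.3, 0.4, 0.5, 0.6)]
toy_regressors = [lambda x, k=k: k * x[0] for k in (1.0, 2.0, 3.0)]
```

With these toy models, a ligand whose single feature is 0.45 collects four of six votes and is passed to the regressors, while one with feature 0.25 collects only two votes and gets `None`.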

3.
Inferring molecular structure from Nuclear Magnetic Resonance (NMR) measurements requires an accurate forward model that can predict chemical shifts from 3D structure. Current forward models are limited to specific classes of molecules, such as proteins, and state-of-the-art models are not differentiable. They therefore cannot be used with gradient-based methods like biased molecular dynamics. Here we use graph neural networks (GNNs) for NMR chemical shift prediction. Our GNN can model chemical shifts accurately, capture important phenomena like the hydrogen bonding-induced downfield shift between multiple proteins and secondary structure effects, and predict shifts of organic molecules. Previous empirical models of protein NMR have relied on careful feature engineering with domain expertise. These GNNs are trained from data alone with no feature engineering, yet they are as accurate and can work on arbitrary molecular structures. The models are also efficient, able to compute one million chemical shifts in about 5 seconds. This work enables a new category of NMR models that have multiple interacting types of macromolecules.

This model can predict chemical shifts on proteins and small molecules purely from atom elements and coordinates. It can capture important phenomena like hydrogen bonding induced downfield shift, thus can be used to infer intermolecular interactions.

4.
5.
RNA molecules participate in many important biological processes, and they need to fold into well-defined secondary and tertiary structures to realize their functions. Like the well-known protein folding problem, there is also an RNA folding problem. The folding problem includes two aspects: structure prediction and folding mechanism. Although the former has been widely studied, the latter is still not well understood. Here we present a deep reinforcement learning algorithm, 2dRNA-Fold, to study the fastest folding paths of RNA secondary structure. 2dRNA-Fold uses a neural network combined with Monte Carlo tree search to select residue pairings step by step for a given RNA sequence until the final secondary structure is formed. We apply 2dRNA-Fold to several short RNA molecules and one longer RNA, 1Y26, and find that their fastest folding paths show some interesting features. 2dRNA-Fold is further trained using a set of RNA molecules from the bpRNA dataset and is used to predict RNA secondary structure. Since the scoring that determines the next step in 2dRNA-Fold is based on possible base pairings, the learned or predicted fastest folding path may not agree with the actual folding paths determined by free energy according to physical laws.

6.
7.
Dimension reduction is a crucial technique in machine learning and data mining, and it is widely used in medicine, bioinformatics, and genetics. In this paper, we propose a two-stage local dimension reduction approach for classification on microarray data. In the first stage, a new L1-regularized feature selection method is defined to remove irrelevant and redundant features and to select the important features (biomarkers). In the next stage, PLS-based feature extraction is applied to the selected features to extract synthetic features that best reflect the discriminating characteristics for classification. The suitability of the proposal is demonstrated in an empirical study on ten widely used microarray datasets, and the results show its effectiveness and competitiveness compared with four state-of-the-art methods. The experimental results on the St Jude dataset show that our method can be effectively applied to microarray data analysis for subtype prediction and the discovery of gene coexpression.

8.
Drug discovery processes require drug-target interaction (DTI) prediction by virtual screening with high accuracy. Compared with traditional methods, deep learning requires less time and domain expertise while achieving higher accuracy. However, there is still room to improve performance with simpler architectures. Meanwhile, the field is calling for multi-task models that solve different tasks. Here we report GanDTI, an end-to-end deep learning model for both interaction classification and binding affinity prediction. The model uses the compound graph and protein sequence data. It consists only of a graph neural network, an attention module, and a multi-layer perceptron, yet it outperforms state-of-the-art methods at predicting binding affinity and interaction classification on the DUD-E, human, and bindingDB benchmark datasets. This demonstrates that our refined model is highly effective and efficient for DTI prediction and provides a new strategy for performance improvement.

9.
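Mass spectrometry imaging can obtain, in a single label-free experiment, molecular information from a sample surface together with its spatial distribution, and it is currently a hot topic in mass spectrometry. The resulting data are large and complex, which makes feature extraction difficult. Multivariate statistical methods, principal component analysis (PCA) in particular, have been applied to compress mass spectrometry imaging data and extract features. However, PCA often produces negative values, whose meaning is hard to interpret and which are not easily decomposed into individual features. This study develops a feature extraction method for mass spectrometry imaging data based on non-negative factorization. It extracts individual molecular features and their distributions over the sample, and it overlays several individual feature distributions in red, green, and blue to give a combined feature map with clearly visible contours. Applied to mass spectrometry imaging data from mouse brain tissue sections, the method separates the gray matter, white matter, and background regions, and the result is more intuitive and easier to interpret than that of PCA. Applied to imaging data from sections of human bladder cancer tissue and adjacent non-cancerous tissue mounted on the same sample target, the method shows the differences between cancerous and non-cancerous tissue clearly. The mass spectrometry imaging software designed in this study is available at http://www.msimaging.net.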

10.
Ring information is a large part of the structural topology used to identify and characterize molecular structures. It is hence of crucial importance to obtain this information for a variety of tasks in computational chemistry. Many different approaches for "ring perception", i.e., the extraction of cycles from a molecular graph, have been described. The chemistry literature on this topic, however, reports a surprisingly large number of incorrect statements about the properties of chemically relevant ring sets and, in particular, about the mutual relationships of different sets of cycles in a graph. In part these problems seem to have arisen from a sometimes rather idiosyncratic terminology for notions that are fairly standard in graph theory. In this contribution we translate the definitions of concepts such as the Smallest Set of Smallest Rings, Essential Set of Essential Rings, Extended Set of Smallest Rings, Set of Smallest Cycles at Edges, Set of Elementary Rings, K-rings, and beta-rings into a more widely used mathematical language. We then outline the basic properties of different cycle sets and provide numerous counterexamples to incorrect claims in the published literature. These counterexamples may have a serious practical impact because at least some of them are molecular graphs of well-known molecules. As a consequence, we propose a catalog of desirable properties for chemically useful sets of rings.
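
One standard graph-theoretic quantity behind several of these ring sets is the cyclomatic number, |E| - |V| + c (edges minus vertices plus connected components), which is the size of every cycle basis and therefore of any Smallest Set of Smallest Rings. The sketch below only counts rings in this sense; it does not enumerate them, and it is not taken from the paper. The function name and the atom-index encoding of bonds are assumptions made for illustration.

```python
def cyclomatic_number(num_atoms, bonds):
    """Return |E| - |V| + c for a molecular graph.

    Atoms are numbered 0..num_atoms-1; `bonds` lists each bond once as a
    pair of atom indices. Components are counted with a union-find.
    """
    parent = list(range(num_atoms))

    def find(x):
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = num_atoms
    for a, b in bonds:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            components -= 1
    return len(bonds) - num_atoms + components


# Carbon skeletons only (hydrogens omitted).
benzene = [(i, (i + 1) % 6) for i in range(6)]
cubane = (
    [(i, (i + 1) % 4) for i in range(4)]            # bottom face
    + [(4 + i, 4 + (i + 1) % 4) for i in range(4)]  # top face
    + [(i, i + 4) for i in range(4)]                # vertical edges
)
```

For the cubane skeleton the function returns 5, although the cube has six equivalent four-membered faces. Any choice of five faces breaks the symmetry of the molecule, which is the kind of ambiguity in "smallest" ring sets that the abstract alludes to.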

11.
Similarity-based methods for virtual screening are widely used. However, conventional searching using 2D chemical fingerprints or 2D graphs may retrieve only compounds which are structurally very similar to the original target molecule. Of particular current interest then is scaffold hopping, that is, the ability to identify molecules that belong to different chemical series but which could form the same interactions with a receptor. Reduced graphs provide summary representations of chemical structures and, therefore, offer the potential to retrieve compounds that are similar in terms of their gross features rather than at the atom-bond level. Using only a fingerprint representation of such graphs, we have previously shown that actives retrieved were more diverse than those found using Daylight fingerprints. Maximum common substructures give an intuitively reasonable view of the similarity between two molecules. However, their calculation using graph-matching techniques is too time-consuming for use in practical similarity searching in larger data sets. In this work, we exploit the low cardinality of the reduced graph in graph-based similarity searching. We reinterpret the reduced graph as a fully connected graph using the bond-distance information of the original graph. We describe searches, using both the maximum common induced subgraph and maximum common edge subgraph formulations, on the fully connected reduced graphs and compare the results with those obtained using both conventional chemical and reduced graph fingerprints. We show that graph matching using fully connected reduced graphs is an effective retrieval method and that the actives retrieved are likely to be topologically different from those retrieved using conventional 2D methods.

12.
13.
14.
As several structural proteomics projects are producing an increasing number of protein structures with unknown function, methods that can reliably predict protein function from protein structure are urgently needed. In this paper, we present a method that explores the clustering patterns of amino acids in 3-dimensional space for protein function prediction. First, amino acid residues on a protein structure are clustered into spatial groups using hierarchical agglomerative clustering, based on the distance between them. Second, the protein structure is represented as a graph, where each node denotes a cluster of amino acids. The nodes are labeled with an evolutionary profile derived from the multiple alignment of homologous sequences. Then, a shortest-path graph kernel is used to calculate similarities between the graphs. Finally, a support vector machine using this graph kernel is used to train classifiers for protein function prediction. We applied the proposed method to two separate problems, namely, prediction of enzymes and prediction of DNA-binding proteins. In both cases, the results showed that the proposed method outperformed other state-of-the-art methods.
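
The shortest-path graph kernel step can be illustrated with a simplified sketch. The assumptions here are significant and are not from the paper: edges are unweighted, node labels are discrete strings rather than evolutionary profiles, and two shortest paths count as matching only when their lengths and endpoint labels are identical. The function names are hypothetical.

```python
from collections import Counter, deque


def shortest_path_histogram(adjacency, labels):
    """Count (label, label, distance) triples over all connected node pairs.

    `adjacency` maps integer node ids to lists of neighbours; `labels`
    maps node ids to discrete labels. Distances come from BFS.
    """
    hist = Counter()
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for target, d in dist.items():
            if source < target:  # count each unordered pair once
                ends = tuple(sorted((labels[source], labels[target])))
                hist[ends + (d,)] += 1
    return hist


def shortest_path_kernel(graph_a, graph_b):
    """Number of matching shortest-path pairs between two graphs."""
    hist_a = shortest_path_histogram(*graph_a)
    hist_b = shortest_path_histogram(*graph_b)
    return sum(count * hist_b[key] for key, count in hist_a.items())


# Two tiny example graphs: a 3-node path and a triangle, all labelled "A".
path3 = ({0: [1], 1: [0, 2], 2: [1]}, {0: "A", 1: "A", 2: "A"})
triangle = ({0: [1, 2], 1: [0, 2], 2: [0, 1]}, {0: "A", 1: "A", 2: "A"})
```

Because the kernel is an inner product of path histograms, the resulting Gram matrix is positive semidefinite and can be passed to a support vector machine as a precomputed kernel.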

15.
Information extraction in the medical field is an important method for structuring medical knowledge and discovering new knowledge. Traditional methods handle this task as a pipeline, treating entity recognition and relation extraction as two separate sub-tasks, which neglects the connection between them. In recent years, research on joint extraction models has achieved encouraging results in the general domain, yet work on joint extraction applied to the medical field remains scarce. In this paper, we construct a joint extraction model based on a tagging scheme for Chinese medical texts. First, we design a series of preprocessing procedures for Chinese medical data to obtain effective Chinese word sequences. Then, we propose the BIOH12D1D2 tagging scheme to convert the joint extraction task into a tagging problem and to solve the overlapping entity problem. After that, we use an encoder-decoder model to obtain the predicted tag sequence. In the decoding layer, the BERT pre-trained model is adopted to extract token features and strengthen the feature representation of our model. Finally, the joint extraction model achieves an F1 score of 0.7 on CHIP-2020, an increase of 0.364 over the baseline.

16.
While the concept of the graph center is unambiguous (and quite old) in the case of acyclic graphs, an attempt has been made recently to extend the concept to polycyclic structures using the distance matrix of a graph as the basis. In this work we continue exploring such generalizations, considering, in addition to the distance matrix, self-avoiding walks or paths as graph invariants of potential interest for discriminating distinctive vertex environments in graphs of polycyclic structures. A hierarchy of criteria is suggested that offers a systematic approach to vertex discrimination and, in most cases, eventually establishes the graph center as a single vertex, a single bond (edge), or a single group of equivalent vertices. Some applications and the significance of the concept of the graph center are presented.
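
The classical, distance-based notion of a graph center that this abstract starts from can be sketched as below: the center is the set of vertices whose largest distance to any other vertex (the eccentricity) is smallest. This covers only that first criterion. The paper's further hierarchy of criteria, including the path-based ones, is not reproduced, and the function names and the assumption of a connected graph are choices made for this illustration.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Topological distances (bond counts) from `source` to every vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def graph_center(adjacency):
    """Vertices of minimum eccentricity; assumes a connected graph."""
    eccentricity = {
        v: max(bfs_distances(adjacency, v).values()) for v in adjacency
    }
    smallest = min(eccentricity.values())
    return sorted(v for v, e in eccentricity.items() if e == smallest)


# Carbon skeletons: n-pentane, n-butane, and a six-membered ring.
pentane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
ring6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
```

The three examples match the three outcomes named in the abstract: the pentane chain has a single central vertex, the butane chain has a central bond (two adjacent vertices), and in the ring all six equivalent vertices tie, which is where additional discriminating criteria become necessary for less symmetric polycyclic structures.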

17.
18.
A broad collection of technologies, including e.g. drug metabolism, biofuel combustion, photochemical decontamination of water, and interfacial passivation in energy production/storage systems rely on chemical processes that involve bond-breaking molecular reactions. In this context, a fundamental thermodynamic property of interest is the bond dissociation energy (BDE) which measures the strength of a chemical bond. Fast and accurate prediction of BDEs for arbitrary molecules would lay the groundwork for data-driven projections of complex reaction cascades and hence a deeper understanding of these critical chemical processes and, ultimately, how to reverse design them. In this paper, we propose a chemically inspired graph neural network machine learning model, BonDNet, for the rapid and accurate prediction of BDEs. BonDNet maps the difference between the molecular representations of the reactants and products to the reaction BDE. Because of the use of this difference representation and the introduction of global features, including molecular charge, it is the first machine learning model capable of predicting both homolytic and heterolytic BDEs for molecules of any charge. To test the model, we have constructed a dataset of both homolytic and heterolytic BDEs for neutral and charged (−1 and +1) molecules. BonDNet achieves a mean absolute error (MAE) of 0.022 eV for unseen test data, significantly below chemical accuracy (0.043 eV). Besides the ability to handle complex bond dissociation reactions that no previous model could consider, BonDNet distinguishes itself even in only predicting homolytic BDEs for neutral molecules; it achieves an MAE of 0.020 eV on the PubChem BDE dataset, a 20% improvement over the previous best performing model. 
We gain additional insight into the model's predictions by analyzing the patterns in the features representing the molecules and the bond dissociation reactions, which are qualitatively consistent with chemical rules and intuition. BonDNet is just one application of our general approach to representing and learning chemical reactivity, and it could be easily extended to the prediction of other reaction properties in the future.

Prediction of bond dissociation energies for charged molecules with a graph neural network enabled by global molecular features and reaction difference features between products and reactants.
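
The errors in this abstract are quoted in eV, whereas chemical accuracy is more commonly stated as 1 kcal/mol. The short conversion below makes the comparison concrete. It uses the standard factor 1 eV ≈ 23.0605 kcal/mol; the function name is a hypothetical helper, not part of BonDNet.

```python
# 1 eV per particle corresponds to about 23.0605 kcal/mol.
KCAL_PER_MOL_PER_EV = 23.060548


def ev_to_kcal_per_mol(energy_ev):
    """Convert an energy from eV to kcal/mol."""
    return energy_ev * KCAL_PER_MOL_PER_EV


for name, value_ev in [
    ("chemical accuracy", 0.043),
    ("MAE, unseen test data", 0.022),
    ("MAE, PubChem BDE dataset", 0.020),
]:
    print(f"{name}: {value_ev:.3f} eV = {ev_to_kcal_per_mol(value_ev):.2f} kcal/mol")
```

On this scale the quoted threshold of 0.043 eV is about 0.99 kcal/mol, and the two reported errors are roughly 0.51 and 0.46 kcal/mol.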

19.
20.