Similar literature
1.
Allostery is a process by which proteins transmit the effect of a perturbation at one site to a distal functional site. As an intrinsically global effect of protein dynamics, allostery is difficult to associate with individual residues, which hinders the effective selection of key residues for mutagenesis studies. Machine learning models, including decision tree (DT) and artificial neural network (ANN) models, were applied to develop classification models for a cell signaling allosteric protein with two states that show extremely similar tertiary structures in both crystallographic structures and molecular dynamics simulations. The DT and ANN models achieved prediction accuracies of 75% and 80%, respectively. Good agreement between the machine learning models and previous experimental and computational studies of the same protein validates this approach as an alternative way to analyze protein dynamics simulations and allostery. In addition, the difference in the distributions of key features between the two allosteric states also underlies the population shift hypothesis of the dynamics-driven allostery model. © 2018 Wiley Periodicals, Inc.

2.
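Proteins are the material basis of all living organisms and the main agents of life activities, participating in the regulation of various physiological functions. Designing proteins with specific functions is of great significance in protein engineering, biomedicine, materials science, and other fields. The goal of protein sequence design is to design amino acid sequences that fold into a desired structure and have the corresponding function; it is the core problem of all rational protein engineering and has extremely important research and application potential. With the exponential growth of protein sequence data and...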

3.
Transition metal (TM) complexes exhibit diverse structural and electronic properties. The properties of a TM complex can be tuned by modulating the ligand field strength (LFS) exerted by its ligands. Current quantification of the LFS of a ligand is mainly derived from experimental measurements on a subset of highly symmetrical TM complexes and is limited in ligand scope. Herein, we report a data-driven method to quantify the LFS of ligands assigned from experimental crystal structures of TM complexes. We first show that the experimental metal–ligand bond lengths of over 4,000 mononuclear Fe, Co, and Mn complexes form bimodal distributions. Using Gaussian fits on the bimodal distributions, each TM complex is assigned a spin state (SS) label. These SS labels can then be used to calculate the LFS of the ligands of the complexes. Using the obtained data-driven LFS metric, we establish that a semi-supervised deep generative model, junction tree variational autoencoder (JTVAE), can be employed to predict LFS values. Our model exhibits a mean absolute error (MAE) of 0.047 and root mean squared error of 0.072 on the training set. The model also allows the generation of novel ligands with desirable LFS values.
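
The labeling step described in this abstract (Gaussian fits on a bimodal bond-length distribution, then one spin-state label per complex) can be illustrated with a small sketch. It assumes a one-dimensional two-component Gaussian mixture fitted by expectation-maximization, which the abstract does not specify, and it assumes that the shorter-bond mode corresponds to the low-spin state. The bond lengths are synthetic and the function names are hypothetical.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(values, n_iter=200):
    """Fit a 1-D two-component Gaussian mixture with plain EM.

    Returns (weights, means, stds), each a list of length 2.
    """
    lo, hi = min(values), max(values)
    means = [lo + 0.25 * (hi - lo), lo + 0.75 * (hi - lo)]
    stds = [(hi - lo) / 4.0 or 1.0] * 2
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in values:
            p = [weights[k] * gaussian_pdf(x, means[k], stds[k]) for k in range(2)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weight, mean, and spread of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            stds[k] = max(math.sqrt(var), 1e-6)
    return weights, means, stds


def assign_spin_state(x, weights, means, stds):
    """Label a bond length by the more probable mixture component.

    Assumption: the component with the shorter mean bond length is low-spin.
    """
    short = 0 if means[0] <= means[1] else 1
    p = [weights[k] * gaussian_pdf(x, means[k], stds[k]) for k in range(2)]
    return "low-spin" if p[short] >= p[1 - short] else "high-spin"


# Synthetic metal-ligand bond lengths in angstroms (illustrative values only).
rng = random.Random(0)
bond_lengths = [rng.gauss(1.95, 0.03) for _ in range(300)]
bond_lengths += [rng.gauss(2.18, 0.04) for _ in range(300)]

weights, means, stds = fit_two_gaussians(bond_lengths)
labels = [assign_spin_state(x, weights, means, stds) for x in bond_lengths]
```

With real data, one mixture would presumably be fitted per metal (and possibly per donor atom type), since the two modes sit at different bond lengths for Fe, Co, and Mn.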

4.
Deep generative models are attracting much attention in the field of de novo molecule design. Compared to traditional methods, deep generative models can be trained in a fully data-driven way with little requirement for expert knowledge. Although many models have been developed to generate 1D and 2D molecular structures, 3D molecule generation is less explored, and the direct design of drug-like molecules inside target binding sites remains challenging. In this work, we introduce DeepLigBuilder, a novel deep learning-based method for de novo drug design that generates 3D molecular structures in the binding sites of target proteins. We first developed Ligand Neural Network (L-Net), a novel graph generative model for the end-to-end design of chemically and conformationally valid 3D molecules with high drug-likeness. Then, we combined L-Net with Monte Carlo tree search to perform structure-based de novo drug design tasks. In the case study of inhibitor design for the main protease of SARS-CoV-2, DeepLigBuilder suggested a list of drug-like compounds with novel chemical structures, high predicted affinity, and similar binding features to those of known inhibitors. The current version of L-Net was trained on drug-like compounds from ChEMBL, which could be easily extended to other molecular datasets with desired properties based on users' demands and applied in functional molecule generation. Merging deep generative models with atomic-level interaction evaluation, DeepLigBuilder provides a state-of-the-art model for structure-based de novo drug design and lead optimization.

DeepLigBuilder, a novel deep generative model for structure-based de novo drug design, directly generates 3D structures of drug-like compounds in the target binding site.

5.
RNA secondary structure prediction is a key technology in RNA bioinformatics. Most algorithms for RNA secondary structure prediction use probabilistic models, in which the model parameters are trained with reliable RNA secondary structures. Because of the difficulty of determining RNA secondary structures by experimental procedures, such as NMR or X-ray crystal structural analyses, there are still many RNA sequences that could be useful for training whose secondary structures have not been experimentally determined. In this paper, we introduce a novel semi-supervised learning approach for training parameters in a probabilistic model of RNA secondary structures in which we employ not only RNA sequences with annotated secondary structures but also ones with unknown secondary structures. Our model is based on a hybrid of generative (stochastic context-free grammars) and discriminative models (conditional random fields) that has been successfully applied to natural language processing. Computational experiments indicate that the accuracy of secondary structure prediction is improved by incorporating RNA sequences with unknown secondary structures into training. To our knowledge, this is the first study of a semi-supervised learning approach for RNA secondary structure prediction. This technique will be useful when the number of reliable structures is limited.
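
To make "accuracy of secondary structure prediction" concrete, the sketch below compares predicted and reference base pairs. It assumes structures written in dot-bracket notation without pseudoknots and uses the F1 score over base pairs; the abstract does not say which notation or metric the authors used, so both are assumptions, and the function names are hypothetical.

```python
def base_pairs(dot_bracket):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            pairs.add((stack.pop(), i))
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return pairs


def pair_f1(reference, predicted):
    """F1 score of predicted base pairs against the reference pairs."""
    ref, pred = base_pairs(reference), base_pairs(predicted)
    true_positives = len(ref & pred)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)


# Toy structures: the prediction misses the outermost pair of the hairpin.
reference = "((((...))))"
predicted = ".(((...)))."
score = pair_f1(reference, predicted)
```

Here the prediction recovers three of the four reference pairs with no false positives, so precision is 1, recall is 0.75, and the F1 score is 6/7.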

6.
Deep learning methods for RNA secondary structure prediction have shown higher performance than traditional methods, but there is still much room to improve. It is known that the lengths of RNAs are very different, as are their secondary structures. However, the current deep learning methods all use length-independent models, so it is difficult for these models to learn very different secondary structures. Here, we propose a length-dependent model that is obtained by further training the length-independent model for different length ranges of RNAs through transfer learning. 2dRNA, a coupled deep learning neural network for RNA secondary structure prediction, is used to do this. Benchmarking shows that the length-dependent model performs better than the usual length-independent model.

7.
8.
Inverse design allows the generation of molecules with desirable physical quantities using property optimization. Deep generative models have recently been applied to tackle inverse design, as they possess the ability to optimize molecular properties directly through structure modification using gradients. While the ability to carry out direct property optimizations is promising, the use of generative deep learning models to solve practical problems requires large amounts of data and is very time-consuming. In this work, we propose STONED – a simple and efficient algorithm to perform interpolation and exploration in the chemical space, comparable to deep generative models. STONED bypasses the need for large amounts of data and training times by using string modifications in the SELFIES molecular representation. First, we achieve non-trivial performance on typical benchmarks for generative models without any training. Additionally, we demonstrate applications in high-throughput virtual screening for the design of drugs, photovoltaics, and the construction of chemical paths, allowing for both property and structure-based interpolation in the chemical space. Overall, we anticipate our results to be a stepping stone for developing more sophisticated inverse design models and benchmarking tools, ultimately helping generative models achieve wider adoption.

Interpolation and exploration within the chemical space for inverse design.
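
The idea of exploring chemical space through string modifications can be sketched as random point edits on a tokenized SELFIES string. This is only an illustration under stated assumptions: the three edit types (replace, insert, delete) and the small token alphabet are chosen here for the example, the function names are hypothetical, and no SELFIES library is called, so the resulting strings are not decoded or checked chemically.

```python
import random
import re

# Hypothetical toy alphabet of SELFIES-style tokens (assumption, not the full set).
TOY_ALPHABET = ["[C]", "[N]", "[O]", "[=C]", "[F]"]


def split_tokens(selfies_string):
    """Split a SELFIES-style string into its bracketed tokens."""
    return re.findall(r"\[[^\]]*\]", selfies_string)


def mutate(selfies_string, rng):
    """Apply one random point edit (replace, insert, or delete) to the tokens."""
    tokens = split_tokens(selfies_string)
    op = rng.choice(["replace", "insert", "delete"])
    if op == "delete" and len(tokens) > 1:
        del tokens[rng.randrange(len(tokens))]
    elif op == "replace" and tokens:
        tokens[rng.randrange(len(tokens))] = rng.choice(TOY_ALPHABET)
    else:
        # Insert; also the fallback when a delete would empty the string.
        tokens.insert(rng.randrange(len(tokens) + 1), rng.choice(TOY_ALPHABET))
    return "".join(tokens)


rng = random.Random(0)
seed_string = "[C][C][O]"  # SELFIES-style string for ethanol
neighbours = [mutate(seed_string, rng) for _ in range(100)]
```

Because the edits act on tokens rather than characters, every output is still a well-formed token sequence; no model training is involved at any point.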

9.
Motivation: Primary and secondary active transport are two types of active transport that use energy to move substances. Active transport mechanisms use proteins to assist in transport and play essential roles in regulating the traffic of ions or small molecules across a cell membrane against the concentration gradient. In this study, the two main types of proteins involved in such transport are classified from transmembrane transport proteins. We propose a Support Vector Machine (SVM) with contextualized word embeddings from Bidirectional Encoder Representations from Transformers (BERT) to represent protein sequences. BERT, a deep learning language representation model developed by Google, is a powerful model for transfer learning and one of the highest-performing pre-trained models for Natural Language Processing (NLP) tasks. The idea of transfer learning with a pre-trained BERT model is applied to extract fixed feature vectors from the hidden layers and learn contextual relations between amino acids in the protein sequence. The contextualized word representations of proteins are thus introduced to effectively model complex structures of amino acids in the sequence and the variations of these amino acids in context. By generating context information, we capture multiple meanings for the same amino acid to reveal the importance of specific residues in the protein sequence. Results: The performance of the proposed method is evaluated using five-fold cross-validation and an independent test. The proposed method achieves accuracies of 85.44%, 88.74%, and 92.84% for Class-1, Class-2, and Class-3, respectively. Experimental results show that this approach, by using context information, can outperform other feature extraction methods, effectively classify the two types of active transport, and improve overall performance.
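
The five-fold cross-validation used for evaluation can be sketched as an index-splitting routine. This is a generic illustration, not the authors' code: it shuffles sample indices with a fixed seed and does not stratify by class, and the function name and sample count are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


# Hypothetical data set of 23 protein sequences.
splits = list(k_fold_indices(23, k=5))
```

Each sample lands in exactly one test fold, so averaging a score over the five folds uses every sample once for testing and four times for training.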

10.
Recent studies suggest that protein folding should be revisited as the emergent property of a complex system and that nature allows only a very limited number of folds that seem to be strongly influenced by geometrical properties. In this work we explore the principles underlying this new view and show how helical protein conformations can be obtained starting from simple geometric considerations. We generated a large data set of C-alpha traces made of 65 points by computationally solving a backbone model that takes into account only topological features of the all-alpha proteins; then, we built corresponding tertiary structures by using the sequences associated with the crystallographic structures of four small globular all-alpha proteins from the PDB, and analysed them in terms of structural and energetic properties. In this way we obtained four poorly populated sets of structures that are reasonably similar to the conformational states typical of the experimental PDB structures. These results show that our computational approach can capture the native topology of all-alpha proteins; furthermore, it generates backbone folds without the influence of the side chains and uses the protein sequence to select a specific fold among the generated folds. This agrees with the recent view that the backbone plays an important role in the protein folding process and that the amino acid sequence chooses its own fold within a limited total number of folds.

11.
With the increasing application of deep-learning-based generative models for de novo molecule design, the quantitative estimation of molecular synthetic accessibility (SA) has become a crucial factor for prioritizing the structures generated from generative models. It is also useful for helping in the prioritization of hit/lead compounds and guiding retrosynthesis analysis. In this study, based on the USPTO and Pistachio reaction datasets, a chemical reaction network was constructed for the identification of the shortest reaction paths (SRP) needed to synthesize compounds, and different SRP cut-offs were then used as the threshold to classify an organic compound as either easy-to-synthesize (ES) or hard-to-synthesize (HS). Two synthesis accessibility models (DNN-ECFP model and graph-based CMPNN model) were built using deep learning/machine learning algorithms. Compared to other existing synthesis accessibility scoring schemes, such as SYBA, SCScore, and SAScore, our results show that CMPNN (ROC AUC: 0.791) performs better than SYBA (ROC AUC: 0.76), albeit marginally, and outperforms SAScore and SCScore. Our prediction models based on historical reaction knowledge could be a potential tool for estimating molecule SA.
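
The shortest-reaction-path labeling can be sketched as a breadth-first search over a reaction network. The sketch makes simplifying assumptions that are not in the abstract: each reaction is reduced to directed reactant-to-product edges (real reactions with several reactants form a hypergraph), unreachable compounds are labeled HS, and the network, compound names, and cut-off value are hypothetical.

```python
from collections import deque


def shortest_reaction_path(network, starting_materials, target):
    """Number of reaction steps from any starting material to the target.

    `network` maps a compound to the compounds it can be converted into.
    Returns None when the target cannot be reached.
    """
    dist = {compound: 0 for compound in starting_materials}
    queue = deque(starting_materials)
    while queue:
        compound = queue.popleft()
        if compound == target:
            return dist[compound]
        for product in network.get(compound, ()):
            if product not in dist:
                dist[product] = dist[compound] + 1
                queue.append(product)
    return None


def classify(srp, cutoff):
    """Label a compound ES or HS from its SRP (unreachable counts as HS here)."""
    if srp is None or srp > cutoff:
        return "HS"
    return "ES"


# Hypothetical toy network: A is purchasable, everything else is made from it.
toy_network = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": ["F"], "E": []}
srp_f = shortest_reaction_path(toy_network, ["A"], "F")
srp_e = shortest_reaction_path(toy_network, ["A"], "E")
srp_z = shortest_reaction_path(toy_network, ["A"], "Z")
label_f = classify(srp_f, cutoff=2)
label_e = classify(srp_e, cutoff=2)
```

Labels produced this way would serve as training targets for a classifier; varying the cut-off changes which compounds fall into each class.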

12.
Many RNA structures are composed of simple secondary structure elements linked by a few critical tertiary interactions. SHAPE chemistry has made interrogation of RNA dynamics at single-nucleotide resolution straightforward. However, de novo identification of nucleotides involved in tertiary interactions remains a challenge. Here we show that nucleotides that form noncanonical or tertiary contacts can be detected by comparing information obtained using two SHAPE reagents, N-methylisatoic anhydride (NMIA) and 1-methyl-6-nitroisatoic anhydride (1M6). Nucleotides that react preferentially with NMIA exhibit slow local nucleotide dynamics and usually adopt the less common C2'-endo ribose conformation. Experiments and first-principles calculations show that 1M6 reacts preferentially with nucleotides in which one face of the nucleobase allows an unhindered stacking interaction with the reagent. Differential SHAPE reactivities were used to detect noncanonical and tertiary interactions in four RNAs with diverse structures and to identify preformed noncanonical interactions in partially folded RNAs. Differential SHAPE reactivity analysis will enable experimentally concise, large-scale identification of tertiary structure elements and ligand binding sites in complex RNAs and in diverse biological environments.

13.
HNO can interact with numerous heme proteins, but atomic-level structures are largely unknown. In this work, various structural models for the first stable HNO heme protein complex, MbHNO (Mb, myoglobin), were examined by quantum chemical calculations. This investigation led to the discovery of two novel structural models that reproduce numerous experimental spectroscopic properties very well. They are also the first atomic-level structures that can account for the experimentally observed high stabilities. These two models involve two distal His conformations, as reported previously for MbCNR and MbNO. However, the unique dual hydrogen bonding feature of HNO binding has not been reported before in heme protein complexes with other small molecules such as CO, NO, and O2. These results should facilitate investigations of HNO binding in other heme proteins.

14.
Topology-based interaction potentials are simplified models that use the native contacts in the folded structure of a protein to define an energetically unfrustrated folding funnel. They have been widely used to analyze the folding transition and pathways of different proteins through computer simulations. Obviously, they need a reliable, experimentally determined folded structure to define the model interactions. In structures elucidated through NMR spectroscopy, a complex treatment of the raw experimental data usually provides a series of models, a set of different conformations compatible with the available experimental data. Here, we use an efficient coarse-grained simulation technique to independently consider the contact maps from every different NMR model in a protein whose structure has been resolved by the use of NMR spectroscopy. For lambda-Cro repressor, a homodimeric protein, we have analyzed its folding characteristics with a topology-based model. We have focused on the competition between the folding of the individual chains and their binding to form the final quaternary structure. From 20 different NMR models, we find a predominant three-state folding behavior, in agreement with experimental data on the folding pathway for this protein. Individual NMR models, however, show distinct characteristics, which are analyzed both at the level of the interplay between tertiary/quaternary structure formation and also regarding the thermal stability of the tertiary structure of every individual chain.
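
The native contact maps that define a topology-based model can be sketched as a distance test between residue positions. The sketch assumes one C-alpha coordinate per residue, a distance cut-off of 8 angstroms, and a minimum sequence separation of 3; none of these values come from the abstract, and the two "NMR models" are hypothetical five-residue toy conformations.

```python
import math


def contact_map(coords, cutoff=8.0, min_separation=3):
    """Return the set of residue pairs (i, j) that count as native contacts.

    Two residues are in contact when they are at least `min_separation`
    positions apart in sequence and closer than `cutoff` in space.
    """
    contacts = set()
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) < cutoff:
                contacts.add((i, j))
    return contacts


# Two hypothetical conformations of the same five-residue chain (angstroms).
model_a = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (7.6, 3.8, 0), (3.8, 3.8, 0)]
model_b = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (11.4, 0, 0), (15.2, 0, 0)]

# One contact map per model, kept separate rather than averaged.
maps = [contact_map(model) for model in (model_a, model_b)]
```

The bent conformation has two contacts and the extended one has none, which shows how different models of the same protein can define different interaction sets.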

15.
Drug discovery processes require drug-target interaction (DTI) prediction by virtual screening with high accuracy. Compared with traditional methods, the deep learning method requires less time and domain expertise, while achieving higher accuracy. However, there is still room for improvement for higher performance with simplified structures. Meanwhile, this field is calling for multi-task models to solve different tasks. Here we report GanDTI, an end-to-end deep learning model for both interaction classification and binding affinity prediction tasks. This model employs the compound graph and protein sequence data. It consists only of a graph neural network, an attention module, and a multiple-layer perceptron, yet outperforms the state-of-the-art methods in predicting binding affinity and interaction classification on the DUD-E, human, and bindingDB benchmark datasets. This demonstrates that our refined model is highly effective and efficient for DTI prediction and provides a new strategy for performance improvement.

16.
The goal of structure-based drug discovery is to find small molecules that bind to a given target protein. Deep learning has been used to generate drug-like molecules with certain cheminformatic properties, but has not yet been applied to generating 3D molecules predicted to bind to proteins by sampling the conditional distribution of protein–ligand binding interactions. In this work, we describe for the first time a deep learning system for generating 3D molecular structures conditioned on a receptor binding site. We approach the problem using a conditional variational autoencoder trained on an atomic density grid representation of cross-docked protein–ligand structures. We apply atom fitting and bond inference procedures to construct valid molecular conformations from generated atomic densities. We evaluate the properties of the generated molecules and demonstrate that they change significantly when conditioned on mutated receptors. We also explore the latent space learned by our generative model using sampling and interpolation techniques. This work opens the door for end-to-end prediction of stable bioactive molecules from protein structures with deep learning.

We generate 3D molecules conditioned on receptor binding sites by training a deep generative model on protein–ligand complexes. Our model uses the conditional receptor information to make chemically relevant changes to the generated molecules.

17.
Evaluation of ligand-binding affinity using the atomic coordinates of a protein-ligand complex is a challenge from the computational point of view. The availability of crystallographic structures of complexes with binding affinity data opens the possibility to create machine-learning models targeted to a specific protein system. Here, we describe a new methodology that combines a mass-spring system approach with supervised machine-learning techniques to predict the binding affinity of protein-ligand complexes. The combination of these techniques allows exploring the scoring function space, generating a model targeted to a protein system of interest. The new model shows superior predictive performance when compared with classical scoring functions implemented in the programs Molegro Virtual Docker, AutoDock4, and AutoDock Vina. We implemented this methodology in a new program named Taba. Taba is implemented in Python and available to download under the GNU license at https://github.com/azevedolab/taba . © 2019 Wiley Periodicals, Inc.

18.
Deep machine learning is expanding the conceptual framework and capacity of computational compound design, enabling new applications through generative modeling. We have explored the systematic design of covalent protein kinase inhibitors by learning from kinome-relevant chemical space, followed by focusing on an exemplary kinase of interest. Covalent inhibitors are experiencing a renaissance in drug discovery, especially for targeting protein kinases. However, the computational design of this class of inhibitors has thus far been little investigated. To this end, we have devised a computational approach combining fragment-based design and deep generative modeling augmented by three-dimensional pharmacophore screening. This approach is thought to be particularly relevant for medicinal chemistry applications because it combines knowledge-based elements with deep learning and is chemically intuitive. As an exemplary application, we report for Bruton’s tyrosine kinase (BTK), a major drug target for the treatment of inflammatory diseases and leukemia, the generation of novel candidate inhibitors with a specific chemically reactive group for covalent modification, requiring only little target-specific compound information to guide the design efforts. Newly generated compounds include known inhibitors, characteristic substructures, and many novel candidates, thus lending credence to the computational approach, which is readily applicable to other targets.

19.
As important functional structures, RNA pseudoknots provide excellent models for studying the interplay between secondary and tertiary structures and the roles of triplexes, noncanonical interactions, and coaxial stacking in the folding/unfolding process. Here we report a first atomistic and statistical analysis of the unfolding of the pseudoknot within gene 32 mRNA of bacteriophage T2. Multiple unfolding pathways, diverse transition states, and various intermediate structures were observed. Water molecules were found to be coupled with the unfolding process via the expulsion or concurrent mechanism.

20.
The adenosine monophosphate activated protein kinase (AMPK) is critical in the regulation of important cellular functions such as lipid, glucose, and protein metabolism; mitochondrial biogenesis and autophagy; and cellular growth. In many diseases, such as metabolic syndrome, obesity, diabetes, and also cancer, activation of AMPK is beneficial. Therefore, there is growing interest in AMPK activators that act either by direct action on the enzyme itself or by indirect activation of upstream regulators. Many natural compounds have been described that activate AMPK indirectly. These compounds are usually contained in mixtures with a variety of other, structurally different compounds, which in turn can also alter the activity of AMPK via one or more pathways. For these compounds, experiments are complicated, since the required pure substances are often not yet isolated and/or therefore not sufficiently available. Therefore, our goal was to develop a screening tool that could handle the profound heterogeneity in activation pathways of AMPK. Since machine learning algorithms can model complex (unknown) relationships and patterns, some of these methods (random forest, support vector machines, stochastic gradient boosting, logistic regression, and deep neural network) were applied and validated using a database comprising 904 activating and 799 neutral or inhibiting compounds identified by an extensive PubMed literature search and the PubChem Bioassay database. All models showed unexpectedly high classification accuracy in training, but more importantly in predicting the unseen test data. These models are therefore suitable tools for rapid in silico screening of established substances or multicomponent mixtures and can be used to identify compounds of interest for further testing.
