Similar articles
1.
Allostery is the process by which proteins transmit the effect of a perturbation at one site to a distal functional site. As an intrinsically global effect of protein dynamics, allostery is difficult to associate with individual residues, which hinders the effective selection of key residues for mutagenesis studies. Machine learning models, including decision tree (DT) and artificial neural network (ANN) models, were applied to develop classification models for a cell-signaling allosteric protein whose two states show extremely similar tertiary structures in both crystallographic structures and molecular dynamics simulations. The DT and ANN models achieved prediction accuracies of 75% and 80%, respectively. Good agreement between the machine learning models and previous experimental and computational studies of the same protein validates this approach as an alternative way to analyze protein dynamics simulations and allostery. In addition, the difference in the distributions of key features between the two allosteric states supports the population shift hypothesis of the dynamics-driven allostery model. © 2018 Wiley Periodicals, Inc.

2.
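Proteins are the material basis of all living organisms and the main carriers of life activities, taking part in the regulation of a wide range of physiological functions. Designing proteins with specific functions is of great significance in protein engineering, biomedicine, materials science, and other fields. The goal of protein sequence design is to design amino acid sequences that fold into a desired structure and carry the corresponding function; it is the core problem of all rational protein engineering and has great research and application potential. With the exponential growth of protein sequence data and...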

3.
Transition metal (TM) complexes exhibit diverse structural and electronic properties. The properties of a TM complex can be tuned by modulating the ligand field strength (LFS) exerted by its ligands. Current quantification of the LFS of a ligand is mainly derived from experimental measurements on a subset of highly symmetrical TM complexes and is limited in ligand scope. Herein, we report a data-driven method to quantify the LFS of ligands assigned from experimental crystal structures of TM complexes. We first show that the experimental metal–ligand bond lengths of over 4,000 mononuclear Fe, Co, and Mn complexes form bimodal distributions. Using Gaussian fits on the bimodal distributions, each TM complex is assigned a spin state (SS) label. These SS labels can then be used to calculate the LFS of the ligands of the complexes. Using the obtained data-driven LFS metric, we establish that a semi-supervised deep generative model, junction tree variational autoencoder (JTVAE), can be employed to predict LFS values. Our model exhibits a mean absolute error (MAE) of 0.047 and a root mean squared error of 0.072 on the training set. The model also allows the generation of novel ligands with desirable LFS values.
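The abstract above labels each complex by fitting Gaussians to a bimodal bond-length distribution. The following is a minimal sketch of that idea, not the authors' code: it fits a two-component one-dimensional Gaussian mixture with plain expectation-maximization. The function names, the synthetic bond lengths, and the rule that the shorter-bond mode is called "low-spin" are all assumptions made for illustration.

```python
import math
import random


def gaussian_pdf(x, mu, var):
    """Density of a normal distribution with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_two_gaussians(xs, iters=100):
    """Fit a two-component 1D Gaussian mixture with expectation-maximization."""
    xs = sorted(xs)
    n = len(xs)
    half = n // 2
    mean_all = sum(xs) / n
    var_all = sum((x - mean_all) ** 2 for x in xs) / n
    # Initialise the two means from the lower and upper halves of the data.
    mu = [sum(xs[:half]) / half, sum(xs[half:]) / (n - half)]
    var = [var_all, var_all]
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w[k] * gaussian_pdf(x, mu[k], var[k]) for k in range(2)]
            total = p[0] + p[1]
            resp.append([p[0] / total, p[1] / total])
        # M-step: re-estimate weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            spread = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            var[k] = max(spread, 1e-8)  # guard against a collapsing component
    return w, mu, var


def assign_label(x, w, mu, var):
    """Label a bond length by its more probable component.

    Assumption: the component with the shorter mean bond length is low-spin.
    """
    p = [w[k] * gaussian_pdf(x, mu[k], var[k]) for k in range(2)]
    short = 0 if mu[0] < mu[1] else 1
    return "low-spin" if p[short] >= p[1 - short] else "high-spin"


# Synthetic stand-in data (NOT from the paper): two clusters of bond lengths.
rng = random.Random(0)
lengths = [rng.gauss(1.95, 0.03) for _ in range(200)]
lengths += [rng.gauss(2.15, 0.03) for _ in range(200)]
weights, means, variances = fit_two_gaussians(lengths)
print(assign_label(1.93, weights, means, variances))
```

With real data, the choice of which bond lengths to pool per metal and oxidation state would matter far more than the fitting routine itself; the abstract does not give those details.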

4.
Deep generative models are attracting much attention in the field of de novo molecule design. Compared to traditional methods, deep generative models can be trained in a fully data-driven way with little requirement for expert knowledge. Although many models have been developed to generate 1D and 2D molecular structures, 3D molecule generation is less explored, and the direct design of drug-like molecules inside target binding sites remains challenging. In this work, we introduce DeepLigBuilder, a novel deep learning-based method for de novo drug design that generates 3D molecular structures in the binding sites of target proteins. We first developed Ligand Neural Network (L-Net), a novel graph generative model for the end-to-end design of chemically and conformationally valid 3D molecules with high drug-likeness. Then, we combined L-Net with Monte Carlo tree search to perform structure-based de novo drug design tasks. In the case study of inhibitor design for the main protease of SARS-CoV-2, DeepLigBuilder suggested a list of drug-like compounds with novel chemical structures, high predicted affinity, and similar binding features to those of known inhibitors. The current version of L-Net was trained on drug-like compounds from ChEMBL; it could easily be extended to other molecular datasets with desired properties, based on users' demands, and applied to functional molecule generation. Merging deep generative models with atomic-level interaction evaluation, DeepLigBuilder provides a state-of-the-art model for structure-based de novo drug design and lead optimization.

DeepLigBuilder, a novel deep generative model for structure-based de novo drug design, directly generates 3D structures of drug-like compounds in the target binding site.

5.
RNA secondary structure prediction is a key technology in RNA bioinformatics. Most algorithms for RNA secondary structure prediction use probabilistic models, in which the model parameters are trained with reliable RNA secondary structures. Because of the difficulty of determining RNA secondary structures by experimental procedures, such as NMR or X-ray crystal structural analyses, there are still many RNA sequences that could be useful for training whose secondary structures have not been experimentally determined. In this paper, we introduce a novel semi-supervised learning approach for training parameters in a probabilistic model of RNA secondary structures in which we employ not only RNA sequences with annotated secondary structures but also ones with unknown secondary structures. Our model is based on a hybrid of generative (stochastic context-free grammars) and discriminative models (conditional random fields) that has been successfully applied to natural language processing. Computational experiments indicate that the accuracy of secondary structure prediction is improved by incorporating RNA sequences with unknown secondary structures into training. To our knowledge, this is the first study of a semi-supervised learning approach for RNA secondary structure prediction. This technique will be useful when the number of reliable structures is limited.

6.
7.
Inverse design allows the generation of molecules with desirable physical quantities using property optimization. Deep generative models have recently been applied to tackle inverse design, as they possess the ability to optimize molecular properties directly through structure modification using gradients. While the ability to carry out direct property optimizations is promising, the use of generative deep learning models to solve practical problems requires large amounts of data and is very time-consuming. In this work, we propose STONED – a simple and efficient algorithm to perform interpolation and exploration in the chemical space, comparable to deep generative models. STONED bypasses the need for large amounts of data and training times by using string modifications in the SELFIES molecular representation. First, we achieve non-trivial performance on typical benchmarks for generative models without any training. Additionally, we demonstrate applications in high-throughput virtual screening for the design of drugs, photovoltaics, and the construction of chemical paths, allowing for both property and structure-based interpolation in the chemical space. Overall, we anticipate our results to be a stepping stone for developing more sophisticated inverse design models and benchmarking tools, ultimately helping generative models achieve wider adoption.

Interpolation and exploration within the chemical space for inverse design.

8.
Motivation: Primary and secondary active transport are two types of active transport that use energy to move substances. Active transport mechanisms use proteins to assist in transport and play essential roles in regulating the traffic of ions or small molecules across a cell membrane against the concentration gradient. In this study, the two main types of proteins involved in such transport are classified from among transmembrane transport proteins. We propose a Support Vector Machine (SVM) with contextualized word embeddings from Bidirectional Encoder Representations from Transformers (BERT) to represent protein sequences. BERT is a powerful model in transfer learning, a deep learning language representation model developed by Google and one of the highest-performing pre-trained models for Natural Language Processing (NLP) tasks. The idea of transfer learning with a pre-trained BERT model is applied to extract fixed feature vectors from the hidden layers and learn contextual relations between amino acids in the protein sequence. The contextualized word representations of proteins are thus introduced to effectively model complex structures of amino acids in the sequence and the variations of these amino acids in context. By generating context information, we capture multiple meanings for the same amino acid to reveal the importance of specific residues in the protein sequence. Results: The performance of the proposed method is evaluated using five-fold cross-validation and an independent test. The proposed method achieves accuracies of 85.44%, 88.74%, and 92.84% for Class-1, Class-2, and Class-3, respectively. Experimental results show that this approach can outperform other feature extraction methods by using context information, effectively classify two types of active transport, and improve the overall performance.
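The evaluation above relies on five-fold cross-validation. As a rough illustration of how such splits can be produced, and not the authors' pipeline, the sketch below shuffles sample indices and yields train and test index lists for each fold. The function name `k_fold_indices` and the `seed` parameter are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


# Each sample lands in the test set of exactly one fold.
for train, test in k_fold_indices(23, k=5, seed=1):
    print(len(train), len(test))
```

For imbalanced protein classes, a stratified split that preserves class ratios per fold is often preferred; the abstract does not say which variant was used.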

9.
Many RNA structures are composed of simple secondary structure elements linked by a few critical tertiary interactions. SHAPE chemistry has made interrogation of RNA dynamics at single-nucleotide resolution straightforward. However, de novo identification of nucleotides involved in tertiary interactions remains a challenge. Here we show that nucleotides that form noncanonical or tertiary contacts can be detected by comparing information obtained using two SHAPE reagents, N-methylisatoic anhydride (NMIA) and 1-methyl-6-nitroisatoic anhydride (1M6). Nucleotides that react preferentially with NMIA exhibit slow local nucleotide dynamics and usually adopt the less common C2'-endo ribose conformation. Experiments and first-principles calculations show that 1M6 reacts preferentially with nucleotides in which one face of the nucleobase allows an unhindered stacking interaction with the reagent. Differential SHAPE reactivities were used to detect noncanonical and tertiary interactions in four RNAs with diverse structures and to identify preformed noncanonical interactions in partially folded RNAs. Differential SHAPE reactivity analysis will enable experimentally concise, large-scale identification of tertiary structure elements and ligand binding sites in complex RNAs and in diverse biological environments.

10.
Recent studies suggest that protein folding should be revisited as the emergent property of a complex system and that nature allows only a very limited number of folds, which seem to be strongly influenced by geometrical properties. In this work we explore the principles underlying this new view and show how helical protein conformations can be obtained starting from simple geometric considerations. We generated a large data set of C-alpha traces made of 65 points by computationally solving a backbone model that takes into account only topological features of the all-alpha proteins; we then built the corresponding tertiary structures, using the sequences associated with the crystallographic structures of four small globular all-alpha proteins from the PDB, and analysed them in terms of structural and energetic properties. In this way we obtained four poorly populated sets of structures that are reasonably similar to the conformational states typical of the experimental PDB structures. These results show that our computational approach can capture the native topology of all-alpha proteins; furthermore, it generates backbone folds without the influence of the side chains and uses the protein sequence to select a specific fold among the generated folds. This agrees with the recent view that the backbone plays an important role in the protein folding process and that the amino acid sequence chooses its own fold within a limited total number of folds.

11.
HNO can interact with numerous heme proteins, but atomic-level structures are largely unknown. In this work, various structural models for the first stable HNO heme protein complex, MbHNO (Mb, myoglobin), were examined by quantum chemical calculations. This investigation led to the discovery of two novel structural models that reproduce numerous experimental spectroscopic properties very well. They are also the first atomic-level structures that can account for the experimentally observed high stabilities. These two models involve two distal His conformations, as reported previously for MbCNR and MbNO. However, the unique dual hydrogen bonding feature of HNO binding had not been reported before in heme protein complexes with other small molecules such as CO, NO, and O2. These results should facilitate investigations of HNO binding in other heme proteins.

12.
Topology-based interaction potentials are simplified models that use the native contacts in the folded structure of a protein to define an energetically unfrustrated folding funnel. They have been widely used to analyze the folding transition and pathways of different proteins through computer simulations. Obviously, they need a reliable, experimentally determined folded structure to define the model interactions. In structures elucidated through NMR spectroscopy, a complex treatment of the raw experimental data usually provides a series of models, a set of different conformations compatible with the available experimental data. Here, we use an efficient coarse-grained simulation technique to independently consider the contact maps from every different NMR model in a protein whose structure has been resolved by the use of NMR spectroscopy. For lambda-Cro repressor, a homodimeric protein, we have analyzed its folding characteristics with a topology-based model. We have focused on the competition between the folding of the individual chains and their binding to form the final quaternary structure. From 20 different NMR models, we find a predominant three-state folding behavior, in agreement with experimental data on the folding pathway for this protein. Individual NMR models, however, show distinct characteristics, which are analyzed both at the level of the interplay between tertiary/quaternary structure formation and also regarding the thermal stability of the tertiary structure of every individual chain.
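Topology-based models of this kind start from a contact map of the folded structure. The sketch below shows one common way to build such a map from C-alpha coordinates and to measure the fraction of native contacts in another conformation. It is illustrative only: the 8 Å cutoff, the minimum sequence separation of 3, and the function names are assumptions, not values taken from the abstract.

```python
import math


def native_contacts(coords, cutoff=8.0, min_separation=3):
    """Return residue pairs (i, j) whose distance is within the cutoff.

    coords is a list of (x, y, z) tuples, one per residue, in angstroms.
    Pairs closer than min_separation along the chain are skipped.
    """
    contacts = set()
    n = len(coords)
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                contacts.add((i, j))
    return contacts


def fraction_native(coords, native, cutoff=8.0):
    """Fraction of the native contact set that is formed in coords."""
    if not native:
        return 0.0
    formed = sum(
        1 for i, j in native if math.dist(coords[i], coords[j]) <= cutoff
    )
    return formed / len(native)


# Toy six-residue hairpin and a fully extended chain (made-up coordinates).
hairpin = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0),
           (7.6, 3.8, 0), (3.8, 3.8, 0), (0, 3.8, 0)]
extended = [(3.8 * i, 0, 0) for i in range(6)]
native = native_contacts(hairpin)
print(sorted(native), fraction_native(extended, native))
```

Applied to each NMR model in turn, a routine like this would give one contact map per model, which is the per-model input the abstract describes.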

13.
Drug discovery processes require high-accuracy drug-target interaction (DTI) prediction by virtual screening. Compared with traditional methods, deep learning methods require less time and domain expertise while achieving higher accuracy. However, there is still room to improve performance with simpler architectures. Meanwhile, the field is calling for multi-task models that solve different tasks. Here we report GanDTI, an end-to-end deep learning model for both interaction classification and binding affinity prediction tasks. This model employs compound graph and protein sequence data. It consists only of a graph neural network, an attention module, and a multilayer perceptron, yet it outperforms state-of-the-art methods at predicting binding affinity and interaction classification on the DUD-E, human, and bindingDB benchmark datasets. This demonstrates that our refined model is highly effective and efficient for DTI prediction and provides a new strategy for performance improvement.

14.
Evaluation of ligand-binding affinity using the atomic coordinates of a protein-ligand complex is a challenge from the computational point of view. The availability of crystallographic structures of complexes with binding affinity data opens the possibility to create machine-learning models targeted to a specific protein system. Here, we describe a new methodology that combines a mass-spring system approach with supervised machine-learning techniques to predict the binding affinity of protein-ligand complexes. The combination of these techniques allows exploring the scoring function space, generating a model targeted to a protein system of interest. The new model shows superior predictive performance when compared with classical scoring functions implemented in the programs Molegro Virtual Docker, AutoDock4, and AutoDock Vina. We implemented this methodology in a new program named Taba. Taba is implemented in Python and available to download under the GNU license at https://github.com/azevedolab/taba . © 2019 Wiley Periodicals, Inc.

15.
The adenosine monophosphate-activated protein kinase (AMPK) is critical in the regulation of important cellular functions such as lipid, glucose, and protein metabolism; mitochondrial biogenesis and autophagy; and cellular growth. In many diseases, such as metabolic syndrome, obesity, diabetes, and also cancer, activation of AMPK is beneficial. Therefore, there is growing interest in AMPK activators that act either directly on the enzyme itself or indirectly by activating upstream regulators. Many natural compounds have been described that activate AMPK indirectly. These compounds usually occur in mixtures with a variety of structurally different compounds, which in turn can also alter the activity of AMPK via one or more pathways. For these compounds, experiments are complicated, since the required pure substances have often not yet been isolated and are therefore not sufficiently available. Our goal was therefore to develop a screening tool that could handle the profound heterogeneity in the activation pathways of AMPK. Since machine learning algorithms can model complex (unknown) relationships and patterns, several of these methods (random forest, support vector machines, stochastic gradient boosting, logistic regression, and deep neural network) were applied and validated using a database comprising 904 activating and 799 neutral or inhibiting compounds identified by an extensive PubMed literature search and from the PubChem Bioassay database. All models showed unexpectedly high classification accuracy in training and, more importantly, in predicting the unseen test data. These models are therefore suitable tools for rapid in silico screening of established substances or multicomponent mixtures and can be used to identify compounds of interest for further testing.

16.
As important functional structures, RNA pseudoknots provide excellent models for studying the interplay between secondary and tertiary structures and the roles of triplexes, noncanonical interactions, and coaxial stacking in the folding/unfolding process. Here we report a first atomistic and statistical analysis of the unfolding of the pseudoknot within gene 32 mRNA of bacteriophage T2. Multiple unfolding pathways, diverse transition states, and various intermediate structures were observed. Water molecules were found to be coupled with the unfolding process via the expulsion or concurrent mechanism.

17.
The validity and accuracy of a proposed tertiary structure of a protein can be assessed in several ways. Scoring such a structure by a knowledge-based potential is a well-known approach in molecular biophysics, an important task in structure prediction and refinement, and a key step in several experiments on protein structures. Although several parameterizations for such models have been derived over time, improvements in accuracy by explicitly using continuous distance information have not yet been suggested. We close this methodological gap by formulating the parameterization of a protein structure model as a linear program. Optimization of the parameters was performed using amino acid distances calculated for the residues in 2830 topology-rich protein structures. We show the capability of our derived model to discriminate between native structures and decoys for a diverse set of proteins. In addition, we discuss the effect of reduced amino acid alphabets on the model. In contrast to studies focusing on binary contact schemes (which do not consider distance dependencies and propose five symbols as the optimal alphabet size), we find that an accurate protein alphabet must contain at least five symbols, preferably more, to assure a satisfactory fold recognition capability. © 2012 Wiley Periodicals, Inc.

18.
Genetic algorithms constitute a powerful optimization method that has already been used in the study of the protein folding problem. However, they often suffer from a lack of convergence in a reasonably short time for complex fitness functions. Here, we propose an evolutionary strategy that can reproducibly find structures close to the minimum of a potential function for a simplified protein model in an efficient way. The model reduces the number of degrees of freedom of the system by treating the protein structure as composed of rigid fragments. The search incorporates a double encoding procedure and a merging operation from subpopulations that evolve independently of one another, both contributing to the good performance of the full algorithm. We have tested it with protein structures of different degrees of complexity, and present our conclusions related to its possible application as an efficient tool for the analysis of folding potentials.

19.
Mechref Y. Electrophoresis, 2011, 32(24): 3467-3481
The high structural variation of glycans derived from glycoconjugates, which increases substantially with the molecular size of a protein, contributes to the complexity of the glycosylation patterns commonly associated with glycoconjugates. In the case of glycoproteins, such variation originates from the multiple glycosylation sites of proteins and the number of glycan structures associated with each site (microheterogeneity). Comprehensively characterizing highly complex mixtures of glycans has been analytically stimulating and challenging. Although the most powerful MS and MS/MS techniques can provide a wealth of structural information, they are still not able to readily identify isomeric glycan structures without high-order MS/MS (MS^n). The analysis of isomeric glycan structures has been achieved using several separation methods, including high-pH anion-exchange chromatography, hydrophilic interaction chromatography, and GC. However, CE and microfluidic CE (MCE) offer high separation efficiency and resolution, allowing the separation of closely related glycan structures. Therefore, interfacing CE and MCE to MS is a powerful analytical approach, potentially allowing comprehensive and sensitive analysis of complex glycan samples. This review describes and discusses the utility of different CE and MCE approaches in the structural characterization of glycoproteins and the feasibility of interfacing these approaches to MS.

20.
In recent years, deep learning has become a modern machine learning technique used in a variety of fields with state-of-the-art performance. Using deep learning to enhance performance is therefore also an important direction for the current bioinformatics field. In this study, we use deep learning, via convolutional neural networks and position-specific scoring matrices, to identify electron transport proteins, which carry out an important molecular function among transmembrane proteins. Our deep learning method yields a precise model for identifying electron transport proteins, achieving a sensitivity of 80.3%, a specificity of 94.4%, an accuracy of 92.3%, and an MCC of 0.71 on an independent dataset. The proposed technique can serve as a powerful tool for identifying electron transport proteins and can help biologists understand the function of these proteins. Moreover, this study provides a basis for further research that can enrich the field of applying deep learning in bioinformatics. © 2017 Wiley Periodicals, Inc.
