Similar literature
20 similar articles found (search time: 469 ms)
1.
The structural class is an important feature widely used to characterize the overall folding type of a protein. How to improve the prediction quality for protein structural classification by effectively incorporating sequence-order effects is an important and challenging problem. Based on the concept of the pseudo amino acid composition [Chou, K. C. Proteins Struct Funct Genet 2001, 43, 246; Erratum: Proteins Struct Funct Genet 2001, 44, 60], a novel approach for measuring the complexity of a protein sequence was introduced. The advantage of incorporating the complexity measure factor into the pseudo amino acid composition as one of its components is that it can capture the essence of the overall sequence pattern of a protein and hence reflect its sequence-order effects more effectively. The jackknife cross-validation test demonstrated that the overall success rate of the new approach was significantly higher than those of the other approaches. It has not escaped our notice that the complexity measure factor can also be used to improve the prediction quality for, among many other protein attributes, subcellular localization, enzyme family class, membrane protein type, and G-protein-coupled receptor type.

2.
G-protein coupled receptors (GPCRs) play a key role in different biological processes, such as regulation of growth, death and metabolism of cells. They are major therapeutic targets of numerous prescribed drugs. However, the ligand specificity of many receptors is unknown and there is little structural information available. Bioinformatics may offer one approach to bridge the gap between sequence data and functional knowledge of a receptor. In this paper, we use a bagging classification tree algorithm to predict the type of the receptor based on its amino acid composition. The prediction is performed for GPCRs at the sub-family and sub-sub-family level. In a cross-validation test, we achieved an overall predictive accuracy of 91.1% for GPCR sub-family classification, and 82.4% for sub-sub-family classification. These results demonstrate the applicability of this relatively simple method and its potential for improving prediction accuracy.
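The amino acid composition used as the classifier input in this abstract can be sketched as a 20-dimensional frequency vector. This is a minimal illustration only: the function name `amino_acid_composition` is hypothetical, the handling of non-standard residues (they are skipped) is an assumption, and the bagging classification tree itself is not shown.

```python
from collections import Counter

# The 20 standard residues as one-letter codes (alphabetical order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> list[float]:
    """Return the fraction of each standard amino acid in `sequence`.

    Assumption: characters outside the 20 standard codes are ignored.
    """
    counts = Counter(res for res in sequence.upper() if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    # One entry per residue type, in the order of AMINO_ACIDS.
    return [counts[aa] / total for aa in AMINO_ACIDS]
```

Each sequence then becomes one fixed-length row of a feature matrix, whatever its length, which is what makes composition convenient as input to a tree ensemble.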

3.
Intrinsically disordered proteins (IDPs) are involved in diverse cellular functions. Many IDPs can interact with multiple binding partners, resulting in their folding into alternative ligand‐specific functional structures. For such multi‐structural IDPs, a key question is whether these multiple structures are fully encoded in the protein sequence, as is the case in many globular proteins. To answer this question, here we employed a combination of single‐molecule and ensemble techniques to compare ligand‐induced and osmolyte‐forced folding of α‐synuclein. Our results reveal context‐dependent modulation of the protein's folding landscape, suggesting that the codes for the protein's native folds are partially encoded in its primary sequence, and are completed only upon interaction with binding partners. Our findings suggest a critical role for cellular interactions in expanding the repertoire of folds and functions available to disordered proteins.

4.
Structural DNA profiles use the structural properties of the constituent octamers either to observe any characteristics of a single sequence that are unusual (a single sequence query) or to visualize a pattern common to a set of sequences (a multiple sequence query). They are an aid in understanding structural reasons for functional DNA activity. Profiles that answer single sequence queries are introduced and Profile Manager (a software application developed to automate profile generation) is presented. Two sequences that are similar by their nucleotide composition but are known to be very different by structure are analyzed, resulting in useful illustrations that agree with the experimental nuclear magnetic resonance structures.

5.
RNA-binding proteins (RBPs) perform fundamental and diverse functions within the cell. Approximately 15% of protein sequences are annotated as RNA-binding, but with a significant number of proteins without functional annotation, many RBPs are yet to be identified. A percentage of uncharacterised proteins can be annotated by transferring functional information from proteins sharing significant sequence homology. However, genomes contain a significant number of orphan open reading frames (ORFs) that do not share significant sequence similarity to other ORFs, but correspond to functional proteins. Hence methods for protein function annotation that go beyond sequence homology are essential. One method of annotation is the identification of ligands that bind to proteins, through the characterisation of binding site residues. In the current work RNA-binding residues (RBRs) are characterised in terms of their evolutionary conservation and the patterns they form in sequence space. The potential for such characteristics to be used to identify RBPs from sequence is then evaluated. In the current work the conservation of residues in 261 RBPs is compared for (a) RBRs vs. non-RBR surface residues, and for (b) specific and non-specific RBRs. The analysis shows that RBRs are more conserved than other surface residues, and RBRs hydrogen-bonded to the RNA backbone are more conserved than those making hydrogen bonds to RNA bases. This observed conservation of RBRs was then used to inform the construction of RBR sequence patterns from known protein–RNA structures. A series of RBR patterns were generated for a case study protein, aspartyl-tRNA synthetase bound to tRNA, and used to differentiate between RNA-binding and non-RNA-binding protein sequences. Six sequence patterns performed with high precision values of >80% and recall values 7 times that of a homology search.
When the method was expanded to the complete dataset of 261 proteins, many patterns were of poor predictive value, as they had not been manipulated on a family-specific basis. However, two patterns with precision values ≥85% were used to make function predictions for a set of hypothetical proteins. This revealed a number of potential RBPs that require experimental verification.

6.
A protein sequence encodes information on both function and stability, which makes it difficult to disentangle the two contributions. However, the identification of residues critical for function and stability has important implications for the mapping of proteome interactions, as well as for many pharmaceutical applications, e.g. the identification of ligand binding regions for targeted pharmaceutical protein design. In this work, we propose a computational method to identify residues critical for protein function and stability and to further categorise them into strictly functional, structural and intermediate classes. We evaluate single site conservation and use Direct Coupling Analysis (DCA) to identify co-evolved residues in both natural and artificial evolution processes. We reproduce artificial evolution using protein design and base our approach on the hypothesis that artificial evolution in the absence of any functional constraint would exclusively lead to site conservation and co-evolution events of the structural type. Conversely, natural evolution intrinsically embeds both functional and structural information. By comparing the lists of conserved and co-evolved residues obtained from the analyses of natural and artificial evolution, we identify the functional residues without the need for any a priori knowledge of the biological role of the analysed protein.

7.
Protein structure prediction and design often involve discrete modeling of side‐chain conformations on structural templates. Introducing backbone flexibility into such models has proven important in many different applications. Backbone flexibility improves model accuracy and provides access to larger sequence spaces in computational design, although at a cost in complexity and time. Here, we show that the influence of backbone flexibility on protein conformational energetics can be treated implicitly, at the level of sequence, using the technique of cluster expansion. Cluster expansion provides a way to convert structure‐based energies into functions of sequence alone. It leads to dramatic speed‐ups in energy evaluation and provides a convenient functional form for the analysis and optimization of sequence‐structure relationships. We show that it can be applied effectively to flexible‐backbone structural models using four proteins: α‐helical coiled‐coil dimers and trimers, zinc fingers, and Bcl‐xL/peptide complexes. For each of these, low errors for the sequence‐based models when compared with structure‐based evaluations show that this new way of treating backbone flexibility has considerable promise, particularly for protein design. © 2009 Wiley Periodicals, Inc. J Comput Chem, 2009

8.
Protein modeling tools utilize many kinds of structural information that may be predicted from the amino acid sequence of a target protein or obtained from experiments. Such data provide geometrical constraints in a modeling process. The main aim is to generate the best possible consensus structure. The quality of models strictly depends on the imposed conditions. In this work we present an algorithm that predicts short-range distances between Cα atoms as well as a set of short structural fragments that possibly share structural similarity with a query sequence. The only input of the method is a query sequence profile. The algorithm searches for short protein fragments with high sequence similarity. As a result, statistics of the distances observed in the similar fragments are returned. The method can also be used as a scoring function or a short-range knowledge-based potential based on the computed statistics.

9.
Electron transfer dissociation (ETD)-based top-down mass spectrometry (MS) is the method of choice for in-depth structure characterization of large peptides, small- and medium-sized proteins, and non-covalent protein complexes. Here, we describe the performance of this approach for structural analysis of intact proteins as large as the 80 kDa serotransferrin. Current time-of-flight (TOF) MS technologies ensure adequate resolution and mass accuracy to simultaneously analyze intact 30–80 kDa protein ions and the complex mixture of their ETD product ions. Here, we show that ETD TOF MS is efficient and may provide extensive sequence information for unfolded and highly charged (around 1 charge/kDa) proteins of ~30 kDa and structural motifs embedded in larger proteins. Sequence regions protected by disulfide bonds within intact non-reduced proteins oftentimes remain uncharacterized due to the low efficiency of their fragmentation by ETD. For serotransferrin, reduction of S–S bonds leads to a significantly different ETD fragmentation pattern with higher sequence coverage of N- and C-terminal regions, providing structural information complementary to top-down analysis of its oxidized form.
Figure caption: ETD TOF MS provides extensive sequence information for unfolded and highly charged proteins of ~30 kDa and above. In addition to charge number and distribution along the protein, disulfide bonds direct ETD fragmentation. For intact non-reduced 80 kDa serotransferrin, sequence regions protected by disulfide bonds oftentimes remain uncharacterized. Reduction of disulfide bonds of serotransferrin increases ETD sequence coverage of its N- and C-terminal regions, providing structural information complementary to the top-down analysis of its oxidized form.

10.
It is known that in the three-dimensional structure of a protein, certain amino acids can interact with each other in order to provide structural integrity or aid in its catalytic function. If these positions are mutated, the loss of this interaction usually leads to a non-functional protein. Directed evolution experiments, which probe the sequence space of a protein through mutations in search of an improved variant, frequently result in such inactive sequences. In this work, we address the use of machine learning algorithms, Boolean learning and support vector machines (SVMs), to find such pairs of amino acid positions. The recombination method of imparting mutations was simulated to create in silico sequences that were used as training data for the algorithms. The two algorithms were combined to develop an approach that weighs the structural risk as well as the empirical risk to solve the problem. This strategy was adapted to a multi-round framework of experiments where the data generated in the present round is used to design experiments for the next round to improve the generated library, as well as the estimation of the interacting positions. It is observed that this strategy can greatly improve the number of functional variants that are generated as well as the average number of mutations that can be made in the library.

11.
Mini-proteins, polypeptides containing fewer than 100 amino acids (such as animal toxins, protease inhibitors, knottins, zinc fingers, etc.), represent successful structural solutions to the need to express a specific binding activity in different biological contexts. Artificial mini-proteins have also been designed de novo, representing simplified versions of natural folds and containing natural or artificial connectivities. Both systems have been used as structural scaffolds in the engineering of novel binding activities, according to three main approaches: i) incorporation of functional protein epitopes into structurally compatible regions of mini-protein scaffolds; ii) random mutagenesis and functional selection of particular structural regions of mini-protein scaffolds; iii) minimization of protein domains by the use of sequence randomization and functional selection, combined with structural information, in an iterative process. These newly engineered mini-proteins, with specific and high binding affinities within a small size and well-defined three-dimensional structure, represent novel tools in biology, biotechnology and medical sciences. In addition, some of them can also be directly used in therapy or present high potential to serve as drugs. In all cases, they represent precious structural intermediates useful to identify frameworks for peptidomimetic design or directly lead to new small organic structures, representing novel drug candidates. The engineering of novel functional mini-proteins has the potential to become a fundamental step towards the conversion of a protein functional epitope or a flexible peptide lead into a classical pharmaceutical.

12.
Relay stations play a significant role in long-range charge hopping transfer in proteins. Although studies have clarified that many more protein structural motifs can function as relays in charge hopping transfers by acting as intermediate charge carriers, the relaying properties are still poorly understood. In this work, taking a β-turn oligopeptide as an example, we report the dynamic character of a relay with tunable relaying properties using density functional theory calculations. Our main finding is that a β-turn peptide can serve as an effective electron relay in facilitating long-range electron migration and that its relay properties are vibration-tunable. The vibration-induced transient structural distortions remarkably affect the lowest unoccupied molecular orbital (LUMO) energy, vertical electron affinity and electron-binding mode of the β-turn oligopeptide and the singly occupied molecular orbital (SOMO) energy of the corresponding electron adduct, and thus the relaying properties. Different vibration modes lead to different structural distortions and thus have different effects on the relaying properties and ability of the β-turn peptide. For the relaying properties, there is an approximately linear negative correlation of electron affinity with the LUMO energy of the β-turn or the SOMO energy of its electron adduct. Besides, such relaying properties also vary in the vibration evolution process, and the electron-binding modes may be tunable. As an important addition to the known static charge relaying properties occurring in various protein structural motifs, this work reports the dynamic electron-relaying characteristics of a β-turn oligopeptide with variable relaying properties governed by molecular vibrations, which can be applied to different proteins in mediating long-range charge transfers.
Clearly, this work reveals molecular vibration effects on the electron relaying properties of protein structural motifs and provides new insights into the dynamics of long-range charge transfers in proteins. © 2018 Wiley Periodicals, Inc.

13.
Protein deposits are associated with many devastating diseases, and fluorescent ligands able to visualize these pathological entities are essential. Here, we report the synthesis of thiophene-based donor–acceptor–donor heptameric ligands that can be utilized for spectral assignment of distinct amyloid-β (Aβ) aggregates, one of the pathological hallmarks in Alzheimer's disease. The ability of the ligands to selectively distinguish Aβ deposits was abolished when the chemical composition of the ligands was altered. Our findings provide the structural and functional basis for the development of new fluorescent ligands that can distinguish between aggregated proteinaceous species consisting of the same peptide or protein. In addition, such ligands might aid in interpreting the potential role of polymorphic Aβ deposits in the pathogenesis of Alzheimer's disease.

14.
Protein chains are generally long and consist of multiple domains. Domains are distinct structural units of a protein that can evolve and function independently. The accurate and reliable prediction of protein domain linkers and boundaries is often considered to be the initial step of protein tertiary structure and function predictions. In this paper, we introduce CISA as a method for predicting inter-domain linker regions solely from the amino acid sequence information. The method first computes the amino acid compositional index from the protein sequence dataset of domain-linker segments and the amino acid composition. A preference profile is then generated by calculating the average compositional index values along the amino acid sequence using a sliding window. Finally, the protein sequence is segmented into intervals and a simulated annealing algorithm is employed to enhance the prediction by finding the optimal threshold value for each segment that separates domains from inter-domain linkers. The method was tested on two standard protein datasets and showed considerable improvement over the state-of-the-art domain linker prediction methods.
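The sliding-window step described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the CISA implementation: the `index` mapping of per-residue compositional index values is a hypothetical input, the window is assumed to be centred and truncated at the chain ends, residues missing from `index` are scored 0.0, and the simulated annealing threshold search is omitted.

```python
def preference_profile(sequence, index, window=5):
    """Average per-residue index values over a centred sliding window.

    `index` maps one-letter residue codes to compositional index values
    (hypothetical values supplied by the caller).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    values = [index.get(res, 0.0) for res in sequence]
    half = window // 2
    profile = []
    for i in range(len(values)):
        # Truncate the window at the sequence ends instead of padding.
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        segment = values[lo:hi]
        profile.append(sum(segment) / len(segment))
    return profile
```

Positions whose smoothed value crosses a chosen threshold would then be candidate linker residues; per the abstract, the published method tunes that threshold separately for each sequence segment.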

15.
Given an uncharacterized protein sequence, how can we identify whether it is a G‐protein–coupled receptor (GPCR) or not? If it is, which functional family class does it belong to? It is important to address these questions because GPCRs are among the most frequent targets of therapeutic drugs and the information thus obtained is very useful for “comparative and evolutionary pharmacology,” a technique often used for drug development. Here, we present a web‐server predictor called “GPCR‐CA,” where “CA” stands for “Cellular Automaton” (Wolfram, S. Nature 1984, 311, 419), meaning that the CA images have been utilized to reveal the pattern features hidden in piles of long and complicated protein sequences. Meanwhile, the gray‐level co‐occurrence matrix factors extracted from the CA images are used to represent the samples of proteins through their pseudo amino acid composition (Chou, K.C. Proteins 2001, 43, 246). GPCR‐CA is a two‐layer predictor: the first‐layer prediction engine identifies a query protein as GPCR or non‐GPCR; if it is a GPCR protein, the process continues automatically with the second‐layer prediction engine to further identify its type among the following six functional classes: (a) rhodopsin‐like, (b) secretin‐like, (c) metabotropic/glutamate/pheromone, (d) fungal pheromone, (e) cAMP receptor, and (f) frizzled/smoothened family. The overall success rates of the predictor for the first and second layers are over 91% and 83%, respectively, obtained through rigorous jackknife cross‐validation tests on a newly constructed stringent benchmark dataset in which none of the proteins has ≥40% pairwise sequence identity to any other in the same subset. GPCR‐CA is freely accessible at http://218.65.61.89:8080/bioinfo/GPCR‐CA , by which one can get the desired two‐layer results for a query protein sequence within about 20 seconds. © 2008 Wiley Periodicals, Inc. J Comput Chem 2009

16.
It is an experimental fact that gross topological parameters of the native structure of small proteins presenting two-state kinetics, such as the relative contact order chi, correlate with the logarithm of their respective folding rate constant kappa(f). However, reported results show specific cases for which the (chi, log kappa(f)) dependence does not follow the overall trend of the entire collection of experimental data. Therefore, an interesting point to be clarified is to what extent the native topology alone can explain these exceptional data. In this work, the structural determinants of the folding kinetics are investigated by means of a 27-mer lattice model, in which each native structure is represented by a compact self-avoiding (CSA) configuration. The hydrophobic effect and steric constraints are taken as basic ingredients of the folding mechanism, and each CSA configuration is characterized according to its composition of specific patterns (resembling basic structural elements such as loops, sheets, and helices). Our results suggest that (i) folding rate constants are largely influenced by topological details of the native structure, such as configurational pattern types and their combinations, and (ii) global parameters, such as the relative contact order, may not be effective in detecting them. Distinct pattern types and their combinations are determinants of what we call here the "content of secondary-type" structure (sigma) of the native: high sigma implies a large kappa(f). The largest part of all CSA configurations presents a mix of distinct structural patterns, which determine the linear dependence of log kappa(f) on chi: those structures not presenting a proper chi-dependent balance of patterns have their folding kinetics affected with respect to the presumed linear correlation between chi and log kappa(f).
The basic physical mechanism relating sigma and kappa(f) involves the concept of cooperativity: if the native is composed of patterns producing a spatial order rich in effective short-range contacts, a properly designed sequence undertakes a fast folding process. On the other hand, the presence of some structural patterns, such as long loops, may substantially reduce the folding performance. This fact is illustrated through natives having a very similar topology but presenting a distinct folding rate kappa(f), and by analyzing structures having the same chi but different sigma.
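The relative contact order chi discussed in this abstract is not defined in the text. A common definition, assumed here, is the mean sequence separation of residue pairs in contact divided by the chain length. The sketch below uses that definition; the function name and the contact-list input format are hypothetical.

```python
def relative_contact_order(contacts, chain_length):
    """Mean sequence separation of contacts, normalised by chain length.

    `contacts` is an iterable of (i, j) residue-index pairs in contact.
    Assumed definition: chi = sum(|i - j|) / (chain_length * n_contacts).
    """
    contacts = list(contacts)
    if chain_length <= 0:
        raise ValueError("chain_length must be positive")
    if not contacts:
        raise ValueError("at least one contact is required")
    total_separation = sum(abs(i - j) for i, j in contacts)
    return total_separation / (chain_length * len(contacts))
```

Because chi averages over all contacts, two structures with different mixes of local and long-range patterns can share the same value, which is consistent with the abstract's point that a global parameter may miss topological detail.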

17.
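Aoneng Cao, Acta Physico-Chimica Sinica (《物理化学学报》) 2020, 36(1): 1907002-0
The protein folding problem, known as the second genetic code, remains undeciphered; the cryptic text of protein sequences is still a case of "not knowing where the sentences break, and so not resolving the puzzles". Building on our recent work, we propose the "confined lowest-energy structural fragments" hypothesis of protein structure. The hypothesis states that proteins contain certain key sites of strong long-range interactions; these sites act like punctuation marks, turning the cryptic text of a protein sequence into readable sentences (polypeptide fragments). The native structure of each fragment is the lowest-energy state under the confinement imposed by these strong long-range interaction sites. The complete protein structure is assembled from these confined lowest-energy structural fragments, and the overall protein structure is not necessarily the global energy minimum. During protein folding, the native-structure propensity of local fragments provides a mainly enthalpic driving force for the formation of the strong long-range interactions, while the formation of the native strong long-range interactions provides mainly entropic stabilization of the native structures of the local fragments. Early in protein evolution there may have been a "Stone Age", in which polypeptide fragments stabilized by confinement at various interfaces (such as rocks) evolved first, and proteins later evolved step by step from these fragments (including by assembly).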

18.
Proteins are the macromolecules responsible for almost all biological processes in a cell. With the availability of a large number of protein sequences from different sequencing projects, the challenge for scientists is to characterize their functions. As wet lab methods are time consuming and expensive, many computational methods such as FASTA, PSI-BLAST, DNA microarray clustering, and nearest-neighbor classification on protein–protein interaction networks have been proposed. The support vector machine (SVM) is one such method that has been used successfully for several problems such as protein fold recognition and protein structure prediction. Cai et al. (2003) used SVM to classify proteins into different functional classes and to predict their function. They used the physico-chemical properties of proteins to represent the protein sequences. In this paper, a model comprising feature subset selection followed by a multiclass support vector machine is proposed to determine the functional class of a newly generated protein sequence. To train and test the model, 32 physico-chemical properties of enzymes from 6 enzyme classes are considered. To determine the features that contribute significantly to functional classification, the Sequential Forward Floating Selection (SFFS), Orthogonal Forward Selection (OFS), and SVM Recursive Feature Elimination (SVM-RFE) algorithms are used, and it is observed that of the 32 properties considered initially, only 20 features are sufficient to classify the proteins into their functional classes with an accuracy ranging from 91% to 94%. On comparison, OFS followed by SVM performs better than the other methods. Our model generalizes the existing model to include multiclass classification and to identify the most significant features affecting protein function.

19.
Due to advances in structural biology, an increasing number of protein structures of unknown function have been deposited in the Protein Data Bank (PDB). These proteins are usually characterized by novel structures and sequences. Conventional comparative methodology (such as sequence alignment, structure comparison, or template search) is unable to determine their function. Thus, it is important to identify a protein's function directly from its structure, but this is not an easy task. One of the strategies used is to analyze whether there are distinctive structure-derived features associated with functional residues. If so, one may be able to identify the functional residues directly from a single structure. Recently, we have shown that the protein weighted contact number is related to atomic thermal fluctuations and can be used to derive motional correlations in proteins. In this report, we analyze the weighted contact-number profiles of both catalytic residues and non-catalytic residues for a dataset of 760 structures. We found that catalytic residues have distributions of weighted contact numbers distinct from those of non-catalytic residues. Using this feature, we are able to effectively differentiate catalytic residues from other residues with a single optimized threshold value. Our method is simple to implement and compares favourably with other more sophisticated methods. In addition, we discuss the physics behind the relationship between catalytic residues and their contact numbers as well as other features (such as residue centrality or B-factors) associated with catalytic residues.
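The abstract does not spell out how the weighted contact number is computed. A frequently used form, assumed here, sums the inverse squared distances from one residue's Cα atom to all others. The sketch below follows that assumption; the function names, the coordinate input format, and the direction of the threshold test (flagging residues at or above the threshold) are all hypothetical choices for illustration.

```python
import math


def weighted_contact_numbers(coords):
    """Return one weighted contact number per residue.

    `coords` is a list of (x, y, z) C-alpha positions. Assumed definition:
    w_i = sum over j != i of 1 / r_ij**2. Coincident atoms would divide by
    zero, so distinct positions are assumed.
    """
    wcn = []
    for i, a in enumerate(coords):
        total = 0.0
        for j, b in enumerate(coords):
            if i != j:
                total += 1.0 / math.dist(a, b) ** 2
        wcn.append(total)
    return wcn


def flag_residues(wcn, threshold):
    """Indices whose weighted contact number is at or above `threshold`."""
    return [i for i, w in enumerate(wcn) if w >= threshold]
```

The double loop is quadratic in chain length, which is usually acceptable for single structures. The threshold itself would have to be fitted on annotated data, as the abstract describes; no value is suggested here.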

20.
Rohlff C. Electrophoresis 2000, 21(6): 1227-1234
Bodily fluids such as cerebrospinal fluid (CSF) and serum can be analysed at the time of presentation and throughout the course of the disease. Changes in the protein composition of CSF may be indicative of an altered CNS protein expression pattern with a causative or diagnostic disease link. These findings can be strengthened through subsequent proteomic analysis of specific brain areas implicated in the pathology. New isolation strategies for clinically relevant cellular material, such as laser capture microdissection, protein enrichment procedures and proteomic approaches to neuropeptide and neurotransmitter analysis, give us the opportunity to map out complex cellular interactions at an unprecedented level of detail. In neurological disorders, multiple underlying pathogenic mechanisms as well as acute and chronic CNS disease components may require a selective repertoire of molecular targets and biomarkers, rather than an individual protein, to better define a complex disease. The resulting proteome database bypasses many ambiguities of experimental models and may facilitate preclinical and clinical development of more specific disease markers and new selective fast-acting therapeutics.
