Similar literature
20 similar articles found (search time: 703 ms)
1.
Protein function prediction is one of the central problems in computational biology. We present a novel automated protein structure-based function prediction method using libraries of local residue packing patterns that are common to most proteins in a known functional family. Critical to this approach is the representation of a protein structure as a graph where residue vertices (residue name used as a vertex label) are connected by geometrical proximity edges. The approach employs two steps. First, it uses a fast subgraph mining algorithm to find all occurrences of family-specific labeled subgraphs for all well-characterized protein structural and functional families. Second, it queries a new structure for occurrences of a set of motifs characteristic of a known family, using a graph index to speed up Ullmann’s subgraph isomorphism algorithm. The confidence of function inference from structure depends on the number of family-specific motifs found in the query structure compared with their distribution in a large non-redundant database of proteins. This method can assign a new structure to a specific functional family in cases where sequence alignments, sequence patterns, structural superposition and active site templates fail to provide accurate annotation.
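
The graph representation described in this abstract can be illustrated with a minimal sketch. It assumes each residue is reduced to a name plus one representative 3D coordinate, and that a proximity edge is a simple distance cutoff; the function name `contact_graph`, the 8.0 angstrom cutoff, and the toy coordinates are all hypothetical and are not taken from the paper.

```python
import math

def contact_graph(residues, cutoff=8.0):
    """Build a labeled graph: vertices are residues (labeled by residue name),
    edges join residues whose coordinates lie within `cutoff` angstroms.
    The cutoff value is an assumption made for illustration only."""
    labels = {i: name for i, (name, _) in enumerate(residues)}
    edges = set()
    for i in range(len(residues)):
        for j in range(i + 1, len(residues)):
            if math.dist(residues[i][1], residues[j][1]) <= cutoff:
                edges.add((i, j))
    return labels, edges

# Hypothetical toy input: (residue name, representative 3D coordinate).
toy = [
    ("HIS", (0.0, 0.0, 0.0)),
    ("ASP", (3.0, 4.0, 0.0)),   # 5 angstroms from HIS, so an edge is created
    ("SER", (20.0, 0.0, 0.0)),  # too far from both, so it stays isolated
]
labels, edges = contact_graph(toy)
print(labels, edges)
```

Subgraph mining and the indexed isomorphism search would then operate on such `(labels, edges)` pairs; those steps are not sketched here.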

2.
As several structural proteomic projects are producing an increasing number of protein structures with unknown function, methods that can reliably predict protein functions from protein structures are urgently needed. In this paper, we present a method to explore the clustering patterns of amino acids in three-dimensional space for protein function prediction. First, amino acid residues on a protein structure are clustered into spatial groups using hierarchical agglomerative clustering, based on the distance between them. Second, the protein structure is represented using a graph, where each node denotes a cluster of amino acids. The nodes are labeled with an evolutionary profile derived from the multiple alignment of homologous sequences. Then, a shortest-path graph kernel is used to calculate similarities between the graphs. Finally, a support vector machine using this graph kernel is used to train classifiers for protein function prediction. We applied the proposed method to two separate problems, namely, prediction of enzymes and prediction of DNA-binding proteins. In both cases, the results showed that the proposed method outperformed other state-of-the-art methods.
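
A shortest-path graph kernel of the kind mentioned in this abstract can be sketched as follows. This is a simplified variant under stated assumptions: node labels are discrete strings rather than the evolutionary profiles used in the paper, graphs are unweighted, and the kernel simply counts pairs of shortest paths that agree in endpoint labels and length. The names `shortest_paths`, `sp_features`, and `sp_kernel` are hypothetical.

```python
from collections import Counter
from itertools import product

def shortest_paths(n, edges):
    """All-pairs shortest path lengths (Floyd-Warshall) for an unweighted graph."""
    inf = float("inf")
    d = [[0 if i == j else inf for j in range(n)] for i in range(n)]
    for u, v in edges:
        d[u][v] = d[v][u] = 1
    # product() varies the last index fastest, so k is the outermost loop.
    for k, i, j in product(range(n), repeat=3):
        if d[i][k] + d[k][j] < d[i][j]:
            d[i][j] = d[i][k] + d[k][j]
    return d

def sp_features(labels, edges):
    """Count (label, label, path length) triples over all connected vertex pairs."""
    n = len(labels)
    d = shortest_paths(n, edges)
    feats = Counter()
    for i in range(n):
        for j in range(i + 1, n):
            if d[i][j] != float("inf"):
                a, b = sorted((labels[i], labels[j]))
                feats[(a, b, d[i][j])] += 1
    return feats

def sp_kernel(g1, g2):
    """Dot product of the two shortest-path feature vectors."""
    f1, f2 = sp_features(*g1), sp_features(*g2)
    return sum(f1[key] * f2[key] for key in f1)

# Toy graphs: (node labels, edge list). Labels are placeholders.
g = (["A", "B", "A"], [(0, 1), (1, 2)])
h = (["A", "B"], [(0, 1)])
print(sp_kernel(g, g), sp_kernel(g, h))  # prints "5 2"
```

A matrix of such kernel values between all pairs of training graphs could then be passed to a support vector machine that accepts precomputed kernels.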

3.
Prediction of protein accessibility from sequence, like prediction of protein secondary structure, is an intermediate step toward predicting the structures and consequently the functions of proteins. Most of the currently used methods are based on single-residue prediction, either by statistical means or from evolutionary information, and predict the accessibility state of the central residue in a window. Taking advantage of the expansion of databases of proteins with known 3D structures, we extracted information on pairwise residue types and the conformational states of pairs simultaneously. To resolve the ambiguity in state prediction that arises when the window slides by one residue, we used a dynamic programming algorithm to find the path with the maximum score. The three-state overall per-residue accuracy, Q3, of this method in a jackknife test on a dataset of known proteins is more than 65%, which is an improvement on the results of methods based on evolutionary information.
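
The dynamic programming step can be illustrated with a Viterbi-style sketch that finds the highest-scoring sequence of states. The scoring functions, the two state names, and the toy numbers below are assumptions for illustration; the paper's actual pairwise scores and its three accessibility states are not reproduced.

```python
def best_state_path(n_positions, states, site_score, pair_score):
    """Return (path, score) maximizing the sum of per-position scores and
    scores between adjacent positions. Assumes n_positions >= 1."""
    best = {s: site_score(0, s) for s in states}
    back = []
    for pos in range(1, n_positions):
        new_best, pointers = {}, {}
        for s in states:
            # Best predecessor state for ending at state s in this position.
            prev = max(states, key=lambda p: best[p] + pair_score(p, s))
            new_best[s] = best[prev] + pair_score(prev, s) + site_score(pos, s)
            pointers[s] = prev
        best = new_best
        back.append(pointers)
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1], best[last]

# Hypothetical scores: "B" = buried, "E" = exposed.
site = [{"B": 2, "E": 0}, {"B": 0, "E": 1}, {"B": 2, "E": 0}]
path, score = best_state_path(
    3, ["B", "E"],
    site_score=lambda pos, s: site[pos][s],
    pair_score=lambda a, b: 1.5 if a == b else 0.0,
)
print(path, score)  # prints "['B', 'B', 'B'] 7.0"
```

In this toy case the middle position alone would favor "E", but the pairwise term makes the all-buried path the global optimum, which is the kind of ambiguity a single sliding window cannot resolve.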

4.
An abundance of protein structures emerging from structural genomics and the Protein Structure Initiative (PSI) are not amenable to ready functional assignment because of a lack of sequence and structural homology to proteins of known function. We describe a high-throughput NMR methodology (FAST-NMR) to annotate the biological function of novel proteins through the structural and sequence analysis of protein-ligand interactions. This is based on basic tenets of biochemistry where proteins with similar functions will have similar active sites and exhibit similar ligand binding interactions, despite global differences in sequence and structure. Protein-ligand interactions are determined through a tiered NMR screen using a library composed of compounds with known biological activity. A rapid co-structure is determined by combining the experimental identification of the ligand binding site from NMR chemical shift perturbations with the protein-ligand docking program AutoDock. Our CPASS (Comparison of Protein Active Site Structures) software and database are then used to compare this active site with proteins of known function. The methodology is demonstrated using unannotated protein SAV1430 from Staphylococcus aureus.

5.
Interaction between ATP, a multifunctional and ubiquitous nucleotide, and proteins initiates phosphorylation, polypeptide synthesis and ATP hydrolysis, which supplies energy for metabolism. However, current knowledge concerning the mechanisms through which ATP is recognized by proteins is incomplete, scattered, and inaccurate. We systematically investigate sequence and structural motifs of proteins that recognize ATP. We identified three novel motifs and refined the known P-loop and class II aminoacyl-tRNA synthetase motifs. The five motifs define five distinct ATP–protein interaction modes, which concern over 5% of known protein structures. We demonstrate that although these motifs share a common GXG tripeptide, they recognize ATP through different functional groups. The P-loop motif recognizes ATP through phosphates, the class II aminoacyl-tRNA synthetase motif targets adenosine, and the other three motifs recognize both phosphates and adenosine. We show that some motifs are shared by different enzyme types. Statistical tests demonstrate that the five sequence motifs are significantly associated with the nucleotide binding proteins. A large-scale test on the PDB reveals that about 98% of proteins that include one of the structural motifs are confirmed to bind ATP.
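
Locating the shared GXG tripeptide in a sequence is a simple pattern search, sketched below. The function name `find_gxg` and the example sequence are hypothetical, and the five full motifs from the paper are not reproduced here; only the tripeptide they have in common is matched.

```python
import re

def find_gxg(sequence):
    """Return (0-based start, tripeptide) for every G-X-G occurrence.
    The lookahead lets overlapping occurrences be reported too."""
    return [(m.start(), m.group(1))
            for m in re.finditer(r"(?=(G[A-Z]G))", sequence)]

# Made-up sequence fragment in one-letter amino acid code.
hits = find_gxg("MGSGKGA")
print(hits)  # prints "[(1, 'GSG'), (3, 'GKG')]"
```

Because GXG alone is short and common, a hit of this kind would only be a starting point; the abstract's claim concerns the longer motifs that contain it.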

6.
RNA-binding proteins (RBPs) perform fundamental and diverse functions within the cell. Approximately 15% of protein sequences are annotated as RNA-binding, but with a significant number of proteins without functional annotation, many RBPs are yet to be identified. A percentage of uncharacterised proteins can be annotated by transferring functional information from proteins sharing significant sequence homology. However, genomes contain a significant number of orphan open reading frames (ORFs) that do not share significant sequence similarity to other ORFs, but correspond to functional proteins. Hence methods for protein function annotation that go beyond sequence homology are essential. One method of annotation is the identification of ligands that bind to proteins, through the characterisation of binding site residues. In the current work RNA-binding residues (RBRs) are characterised in terms of their evolutionary conservation and the patterns they form in sequence space. The potential for such characteristics to be used to identify RBPs from sequence is then evaluated. In the current work the conservation of residues in 261 RBPs is compared for (a) RBRs vs. non-RBR surface residues, and for (b) specific and non-specific RBRs. The analysis shows that RBRs are more conserved than other surface residues, and RBRs hydrogen-bonded to the RNA backbone are more conserved than those making hydrogen bonds to RNA bases. This observed conservation of RBRs was then used to inform the construction of RBR sequence patterns from known protein–RNA structures. A series of RBR patterns were generated for a case study protein, aspartyl-tRNA synthetase bound to tRNA, and used to differentiate between RNA-binding and non-RNA-binding protein sequences. Six sequence patterns performed with high precision values of >80% and recall values 7 times that of a homology search.
When the method was expanded to the complete dataset of 261 proteins, many patterns were of poor predictive value, as they had not been manipulated on a family-specific basis. However, two patterns with precision values ≥85% were used to make function predictions for a set of hypothetical proteins. This revealed a number of potential RBPs that require experimental verification.
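
The precision and recall figures quoted for sequence patterns can be made concrete with a short sketch. It assumes a pattern is expressed as a regular expression and that labeled sets of binding and non-binding sequences are available; the pattern `R.K` and all sequences below are invented placeholders, not patterns or data from the study.

```python
import re

def precision_recall(pattern, positives, negatives):
    """Score a sequence pattern against labeled sequences.
    precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    regex = re.compile(pattern)
    tp = sum(1 for s in positives if regex.search(s))
    fp = sum(1 for s in negatives if regex.search(s))
    fn = len(positives) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical toy data.
rna_binding = ["ARAKA", "RGKL", "MRSKT", "AAAA"]
non_binding = ["RQK", "GGGG", "LLLL"]
precision, recall = precision_recall(r"R.K", rna_binding, non_binding)
print(precision, recall)  # prints "0.75 0.75"
```

A pattern with high precision but low recall matches few sequences yet is rarely wrong, which is why the abstract reports the two values separately.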

7.
Members of the Really Interesting New Gene (RING) family of proteins are found throughout the cells of eukaryotes and function in processes as diverse as development, oncogenesis, viral replication and apoptosis. There are over 200 members of the RING family where membership is based on the presence of a consensus sequence of zinc binding residues. Outside of these residues there is little sequence homology; however, there are conserved structural features. Current evidence strongly suggests that RINGs are protein interaction domains. We examine the features of RING binding motifs in terms of individual cases and the potential for finding a universal consensus sequence for RING binding domains (FRODOs). This review examines known and potential functions of RINGs, and attempts to develop a framework within which their seemingly multivalent cellular roles can be consistently understood in their structural and biochemical context. Interestingly, some RINGs can self-associate as well as bind other RINGs. The ability to self-associate is typically translated into the annoying propensity of these domains to aggregate during biochemical characterization. The RINGs of PML, BRCA1, RAG1, KAP1/TIF1beta, Polycomb proteins, TRAFs and the viral protein Z have been well characterized in terms of both biochemical studies and functional data and so will serve as focal points for discussion. We suggest physiological functions for the oligomeric properties of these domains, such as their role in formation of macromolecular assemblages which function in an intricate interplay of coupled metal binding, folding and aggregation, and participate in diverse functions: epigenetic regulation of gene expression, RNA transport, cell cycle control, ubiquitination, signal transduction and organelle assembly.

8.
This paper describes the program ASSAM, which has been developed to search for patterns of amino acid side-chains in the 3D structures in the Protein Data Bank. ASSAM represents an amino acid by a vector drawn from the main chain towards the functional part of the amino acid and then computes a graph representation of a protein in which the individual side-chain vectors are the nodes and the intervector distances are the edges. The presence of a query pattern in a Protein Data Bank structure can then be searched for by means of a subgraph isomorphism algorithm. Recent enhancements to ASSAM allow searches to include the following: the main-chain structure in addition to the side-chains; the secondary structure and solvent accessibility of side-chains; allowable distances from a known binding-site; disulfide bridges; and improved generic and wild-card queries. The effectiveness of these approaches is demonstrated by extensive searches of the Protein Data Bank for typical 3D query patterns.
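
The idea of matching a 3D query pattern by subgraph isomorphism can be sketched with a small backtracking search. This is not ASSAM's algorithm: it simplifies each side-chain vector to a single point, uses plain backtracking rather than any optimized procedure, and the function name `find_pattern`, the 1.0 angstrom tolerance, and the coordinates are hypothetical.

```python
import math

def find_pattern(query, target, tol=1.0):
    """Yield tuples of target indices whose residue labels match the query
    and whose pairwise distances agree with the query's within `tol`."""
    def extend(assigned):
        k = len(assigned)
        if k == len(query):
            yield tuple(assigned)
            return
        q_label, q_xyz = query[k]
        for t, (t_label, t_xyz) in enumerate(target):
            if t in assigned or t_label != q_label:
                continue
            # Every distance to an already matched node must be preserved.
            if all(abs(math.dist(q_xyz, query[i][1])
                       - math.dist(t_xyz, target[a][1])) <= tol
                   for i, a in enumerate(assigned)):
                yield from extend(assigned + [t])
    yield from extend([])

# Hypothetical three-residue query and a five-residue target structure.
query = [("SER", (0, 0, 0)), ("HIS", (3, 0, 0)), ("ASP", (3, 4, 0))]
target = [("GLY", (9, 9, 9)), ("ASP", (10, 4, 3)), ("HIS", (10, 0, 3)),
          ("SER", (10, 0, 0)), ("HIS", (30, 0, 0))]
matches = list(find_pattern(query, target))
print(matches)  # prints "[(3, 2, 1)]"
```

The second histidine in the target carries the right label but sits 20 angstroms from the serine, so the distance check prunes it before the search goes any deeper.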

9.
Internal repeats in protein sequences play a significant role in the evolution of protein structure and function. Applications of different bioinformatics tools help in the identification and characterization of these repeats. In the present study, we analyzed sequence repeats in a non-redundant set of proteins available in the Protein Data Bank (PDB). We used RADAR for detecting internal repeats in a protein, PDBeFOLD for assessing structural similarity, PDBsum for finding functional involvement and Pfam for domain assignment of the repeats in a protein. Through the analysis of sequence repeats, we found that the identity of the sequence repeats falls in the range of 20–40% and that the superimposed structures of most of the sequence repeats maintain a similar overall fold. Analysis of sequence repeats at the functional level reveals that most of the sequence repeats are involved in the function of the protein through functionally involved residues in the repeat regions. We also found that sequence repeats in single- and two-domain proteins often contained conserved sequence motifs for the function of the domain.

10.
Exhaustive and nonredundant generation of stereoisomers of a chemical compound with a specified constitution is an important tool for molecular structure elucidation and molecular design. It is known that many chemical compounds have outerplanar graph structures. In this paper we deal with chemical compounds composed of carbon, hydrogen, oxygen, and nitrogen atoms whose graphical structures are outerplanar and consider stereoisomers caused only by asymmetry around carbon atoms. Based on dynamic programming, we propose an algorithm for generating all stereoisomers without duplication. We treat a given outerplanar graph as a graph rooted at its structural center. Our algorithm first recursively computes the number of stereoisomers of the subgraph induced by the descendants of each vertex and then constructs each stereoisomer by backtracking the process of computing the numbers of stereoisomers. Our algorithm correctly counts the number of stereoisomers in O(n) time and space and correctly enumerates all of the stereoisomers in O(n^3) time per stereoisomer on average and in O(n) space, where n is the number of atoms in a given structure.

11.
Protein fold recognition   Total citations: 4, self-citations: 0, citations by others: 4
Summary: An important, yet seemingly unattainable, goal in structural molecular biology is to be able to predict the native three-dimensional structure of a protein entirely from its amino acid sequence. Prediction methods based on rigorous energy calculations have not yet been successful, and best results have been obtained from homology modelling and statistical secondary structure prediction. Homology modelling is limited to cases where significant sequence similarity is shared between a protein of known structure and the unknown. Secondary structure prediction methods are not only unreliable, but also do not offer any obvious route to the full tertiary structure. Recently, methods have been developed whereby entire protein folds are recognized from sequence, even where little or no sequence similarity is shared between the proteins under consideration. In this paper we review the current methods, including our own, and in particular offer a historical background to their development. In addition, we also discuss the future of these methods and outline the developments under investigation in our laboratory.

12.
13.
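Using machine learning methods to accurately predict the changes in protein stability caused by single amino acid mutations is of great value for research on protein structure and function, and offers guidance for designing new proteins and for protein engineering. A study of the topological features of protein networks found that these features achieve relatively high accuracy in capturing the effect of mutations on protein stability. A random forest algorithm based on protein network topological features performs well on protein single-point mut...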

14.
Recent investigations on the stability of proteins have demonstrated various structural factors, but few have considered sequence factors such as protein motifs. These motifs represent highly conserved regions and describe critical regions that may only exist on proteins that remain functional at high temperatures. This investigation presents a method for identifying and comparing corresponding mesophilic and thermophilic sequence motifs between protein families. Discriminative motifs that are conserved only in the mesophilic or thermophilic subfamily are identified. Analysis of the results shows that, although the subfamilies of most protein families share similar motifs, some discriminative motifs are present in particular thermophilic/mesophilic subfamilies. The thermophilic discriminative motifs are conserved only in thermophilic organisms, revealing that physicochemical principles support thermostability.

15.
The generating function of the sequence counting the number of graph vertices at a given distance from the root is called the spherical growth function of the rooted graph. The vertices farthest from the root form an induced subgraph called the distance-residual graph. These mathematical notions are applied to benzenoid graphs which are used in graph theory to represent benzenoid hydrocarbons. An algorithm for calculating the growth in catacondensed benzenoids is presented, followed by some examples.
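
The sequence behind the spherical growth function can be computed for any rooted graph with a breadth-first search, as in the sketch below. This is the generic definition applied directly, not the specialized algorithm for catacondensed benzenoids that the paper presents; the function name `growth_sequence` is hypothetical, and the example graph is a single hexagon (a 6-cycle).

```python
from collections import deque

def growth_sequence(adjacency, root):
    """Return [a_0, a_1, ...], where a_k is the number of vertices at
    distance k from `root`. The spherical growth function is then
    a_0 + a_1*x + a_2*x^2 + ..."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    counts = [0] * (max(dist.values()) + 1)
    for d in dist.values():
        counts[d] += 1
    return counts

# A single hexagon: vertex i is adjacent to its two ring neighbors.
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
seq = growth_sequence(hexagon, 0)
print(seq)  # prints "[1, 2, 2, 1]"
```

For this hexagon rooted at vertex 0 the growth function is 1 + 2x + 2x^2 + x^3, and the distance-residual graph is the single vertex opposite the root.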

16.
Summary: A new database of conserved amino acid residues is derived from the multiple sequence alignment of over 84 families of protein sequences that have been reported in the literature. This database contains sequences of conserved hydrophobic core patterns which are probably important for structure and function, since they are conserved for most sequences in that family. This database differs from other single-motif or signature databases reported previously, since it contains multiple patterns for each family. The new database is used to align a new sequence with the conserved regions of a family. This is analogous to reports in the literature where multiple sequence alignments are used to improve a sequence alignment. A program called Homology-Plot (suitable for IBM or compatible computers) uses this database to find homology of a new sequence to a family of protein sequences. There are several advantages to using multiple patterns. First, the program correctly identifies a new sequence as a member of a known family. Second, the search of the entire database is rapid and requires less than one minute. This is similar to performing a multiple sequence alignment of a new sequence to all of the known protein family sequences. Third, the alignment of a new sequence to family members is reliable and can reproduce the alignment of conserved regions already described in the literature. The speed and efficiency of this method are enhanced, since there is no need to score for insertions or deletions as is done in the more commonly used sequence alignment methods. In this method only the patterns are aligned. HomologyPlot also provides general information on each family, as well as a listing of patterns in a family.

17.
A new algorithm to predict protein-protein binding sites using conservation of both protein surface structure and physical-chemical properties in structurally similar proteins is developed. Binding-site residues in proteins are known to be more conserved than the rest of the surface, and finding local surface similarities by comparing a protein to its structural neighbors can potentially reveal the location of binding sites on this protein. This approach, which has previously been used to predict binding sites for small ligands, is now extended to predict protein-protein binding sites. Examples of binding-site predictions for a set of proteins, which have previously been studied for sequence conservation in protein-protein interfaces, are given. The predicted binding sites and the actual binding sites are in good agreement. Our algorithm for finding conserved surface structures in a set of similar proteins is a useful tool for the prediction of protein-protein binding sites.

18.
The ability of FT-ICR MS to resolve isotopic variants of intact proteins for each of the charge states formed by electrospray ionization offers a sensitive, rapid method for detecting "low mass" heterogeneity, where this is defined as the presence of structural variants differing in mass by 2 Da or less. Such heterogeneity may reflect biological or chemical modifications of structure or may result from the coexpression of related proteins from a multi-gene family. In the analytical approach described here, comparisons are made between observed isotopic distributions and those expected for predicted protein sequences. Close agreement is demonstrated for a homogeneous model protein, and the utility of the method has been evaluated in the study of mouse major urinary proteins (MUPs), a group of closely related sequences. Divergence of the experimental isotopic distribution from distributions predicted for known MUP sequences can be explained, in quantitative terms, by the coexpression of closely related sequences. This approach provides a facile method for the assessment of protein homogeneity and for the detection of structural variants, without recourse to proteolytic digestion and analysis of the resulting products.

19.
We have been investigating the creation of novel proteins by means of block shuffling, where the term block refers to an amino acid sequence that corresponds to particular features of proteins, such as secondary structures, modules, functional motifs, and so on. Block shuffling makes it possible to explore the global sequence space, which is not feasible with conventional methods, such as DNA shuffling or family shuffling. To investigate what properties are required for the building blocks, we have analyzed the foldability and enzymatic activity of barnase mutants obtained by permutation of modules or secondary structure units. This reconstructive approach indicated that secondary structure units with mutual long-range interactions are more suitable than modules as building blocks, at least in the case of barnase. The results also suggested that proteins in evolutionarily intermediate states are created by block shuffling, and such proteins have the potential to be evolved into mature globular proteins. For the construction of combinatorial protein libraries, we have developed random multi-recombinant PCR (RM-PCR), which can combine different DNA fragments without homologous sequences. The libraries can be utilized for in vitro selection using in vitro virus (mRNA display) or STABLE (DNA display), which have also been developed in our laboratory. In this review article, we summarize our strategy to create novel proteins by block shuffling and review key literature in the field. Possible applications of the block shuffling strategy are also discussed.

20.
It has been shown that protein primary sequence encodes quaternary structure information. In the present work, the function of degree of disagreement (FDOD), a new measure of information discrepancy, is applied to discriminating between homodimers and other homooligomeric proteins from the primary structure. This new approach is based on subsequence distributions of the primary sequences, so the effect of residue order on protein structure is taken into account. When the length of the subsequence is 4, the overall accuracy of the 10-fold cross-validation test reaches 82.5%, which is much better than that of the previous method on the same data set. Our tests demonstrate that the residue order along protein sequences plays an important role in the prediction of homooligomers. In addition, our results suggest that the FDOD measure is a simple and powerful tool for the prediction of protein multimeric states.
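
The subsequence distributions on which this approach is based can be sketched as relative frequencies of overlapping length-k windows. The FDOD measure itself is not reproduced here, since the abstract does not state its formula; the function name `subsequence_distribution` and the toy sequence are hypothetical.

```python
from collections import Counter

def subsequence_distribution(sequence, k=4):
    """Relative frequencies of all overlapping length-k subsequences.
    Returns an empty dict when the sequence is shorter than k."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}

# Toy example with k=2: the windows are "AB", "BA", "AB".
dist = subsequence_distribution("ABAB", k=2)
print(dist)
```

Because the windows overlap and preserve order, two sequences with the same amino acid composition but different residue order give different distributions, which is the property the abstract emphasizes.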

