Similar literature
 Found 20 similar articles; search took 31 ms
1.
Studies on protein–protein interaction are important in proteome research. Building more effective models from sequence information, structure information and physicochemical characteristics is the key technology in protein–protein interface prediction. In this paper, we study the protein–protein interface prediction problem. We propose a novel method, based on hexagon structure similarity, for identifying interface residues in an input protein with both sequence and 3D structure information. Experiments show that our method achieves better results than some state-of-the-art methods for identifying protein–protein interfaces. Compared with existing methods, our approach improves the F-measure by at least 0.03. On a common dataset consisting of 41 complexes, our method has overall precision and recall values of 63% and 57%. On Benchmark v4.0, it has overall precision and recall values of 55% and 56%. On CAPRI targets, it has overall precision and recall values of 52% and 55%.
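The abstract reports precision and recall alongside an F-measure gain, so a short sketch of how the two combine may help. It assumes the F-measure is the usual balanced F1 score (the harmonic mean of precision and recall) and that it can be computed from the overall values quoted above; the abstract states neither, and the function and variable names are illustrative.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta = 1 gives the harmonic mean of precision and recall."""
    if precision <= 0 or recall <= 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Overall precision/recall pairs quoted in the abstract above.
reported = {
    "41-complex dataset": (0.63, 0.57),
    "Benchmark v4.0": (0.55, 0.56),
    "CAPRI targets": (0.52, 0.55),
}

# Assumption: F1 is computed directly from the overall values.
f_scores = {name: f_measure(p, r) for name, (p, r) in reported.items()}

for name, score in f_scores.items():
    print(f"{name}: F1 = {score:.3f}")  # roughly 0.599, 0.555 and 0.535
```

Under that assumption, an improvement of 0.03 is an absolute gain on a score of roughly 0.53 to 0.60.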

2.
All currently leading protein secondary structure prediction methods use a multiple protein sequence alignment to predict the secondary structure of the top sequence. In most of these methods, alignment positions showing a gap in the top sequence are deleted prior to prediction, which shrinks the alignment and loses position-specific information. In this paper we investigate the effect of this removal of information on secondary structure prediction accuracy. To this end, we have designed SymSSP, an algorithm that post-processes the predicted secondary structure of all sequences in a multiple sequence alignment by (i) making use of the alignment's evolutionary information and (ii) re-introducing most of the information that would otherwise be lost. The post-processed information is then given to a new dynamic programming routine that produces an optimally segmented consensus secondary structure for each of the multiple alignment sequences. We have tested our method on the state-of-the-art secondary structure prediction methods PHD, PROFsec, SSPro2 and JNET using the HOMSTRAD database of reference alignments. Our consensus-deriving dynamic programming strategy is consistently better at improving the segmentation quality of the predictions than the commonly used majority voting technique. In addition, we have applied several weighting schemes from the literature to our novel consensus-deriving dynamic programming routine. Finally, we have investigated the level of noise introduced by prediction errors into the consensus and show that predictions of the edges of helices and strands are wrong half of the time for all four tested prediction methods.
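For context, the majority voting baseline mentioned in the abstract can be sketched as a column-wise vote over aligned predictions. This is a minimal illustration of that baseline only, not of SymSSP itself; the three-state alphabet (H, E, C), the gap symbol and the tie-breaking order are assumptions made for the example.

```python
from collections import Counter

# Assumed tie-break order: helix before strand before coil (arbitrary choice).
PRIORITY = {"H": 0, "E": 1, "C": 2}


def majority_vote(predictions: list[str], gap: str = "-") -> str:
    """Column-wise majority vote over aligned secondary-structure strings."""
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("aligned predictions must be non-empty and equally long")
    consensus = []
    for column in zip(*predictions):
        counts = Counter(state for state in column if state != gap)
        if not counts:  # every sequence has a gap at this position
            consensus.append(gap)
            continue
        # Highest count wins; ties fall back to the assumed priority order.
        best = max(counts, key=lambda s: (counts[s], -PRIORITY.get(s, 3)))
        consensus.append(best)
    return "".join(consensus)


aligned = ["HHHEC", "HHCEC", "HCCE-"]
print(majority_vote(aligned))  # HHCEC
```

Because each column is decided independently, a vote of this kind can produce fragmented segments, which is the segmentation problem the abstract's dynamic programming routine is meant to address.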

3.
In an era that has been dominated by Structural Biology for the last 30-40 years, a dramatic change of focus towards sequence analysis has been spurred by the advent of the genome projects and the resultant diverging sequence/structure deficit. The central challenge of Computational Structural Biology is therefore to rationalize the mass of sequence information into biochemical and biophysical knowledge and to decipher the structural, functional and evolutionary clues encoded in the language of biological sequences. In investigating the meaning of sequences, two distinct analytical themes have emerged: in the first, pattern recognition techniques are used to detect similarity between sequences and hence to infer related structures and functions; in the second, ab initio prediction methods are used to deduce 3D structure, and ultimately to infer function, directly from the linear sequence. In this article, we attempt to provide a critical assessment of what one may and may not expect from biological sequences and to identify major issues yet to be resolved. The presentation is organized under several subheadings: protein sequences, pattern recognition techniques, protein tertiary structure prediction, membrane protein bioinformatics, human proteome, protein-protein interactions, metabolic networks, potential drug targets based on simple sequence properties, disordered proteins, the sequence-structure relationship and chemical logic of protein sequences.

4.
The literature contains over fifty years of accumulated methods for predicting the secondary structures of proteins in silico. A large part of this collection consists of approaches based on artificial neural networks, a field of artificial intelligence and machine learning that is gaining popularity in various application areas. The primary objective of this paper is to summarize works that are important but scattered over time, to give new researchers a clear view of the domain in a single place. An informative introduction to protein secondary structure and artificial neural networks is also included for context. This review will be valuable in designing future methods to improve protein secondary structure prediction accuracy. The neural network methods found in this problem domain employ varying architectures and feature spaces, and a handful stand out due to significant improvements in prediction. Neural networks with larger feature scope and higher architectural complexity have been found to produce better protein secondary structure predictions. The current prediction accuracy lies around the 84% mark, leaving much room for further improvement in the prediction of secondary structures in silico. The estimated limit of 88% prediction accuracy has not yet been reached, so further research is timely.

5.
Vaccine-based strategies offer a promising future in malaria control by generating protective immunity against natural infection. However, vaccine development is hindered by the genetic diversity of Plasmodium sp. Previously, we have shown that the P41 protein from the 6-Cysteine family is shared by Plasmodium sp. and could be used for cross-species anti-malaria vaccines. Two different approaches, ancestral and consensus sequence reconstruction, could produce a single target for all human-infecting Plasmodium. In this study, we investigated the efficacy of ancestral and consensus sequences of the P41 protein. Phylogenetic and time tree reconstruction was conducted with RAXML and the BEAST2 package to determine the relationship of known P41 sequences. Ancestral and consensus sequences were reconstructed with the GRASP server and Unipro Ugene software, respectively. Structural prediction was carried out using the Psipred and Rosetta programs. Protein characteristics were analyzed by assessing hydrophobicity and Post-Translational Modification sites. Meanwhile, the immunogenicity score for B-cell, T-cell, and MHC was determined using an immunoinformatic approach. The results suggest that the ancestral and consensus sequences have distinct protein characteristics with high immunogenicity scores for all immune cells. We found one shared conserved epitope with phosphorylation modification from the ancestral sequence to target the cross-species vaccine. Thus, this study provides detailed insight into P41 efficacy for the cross-species anti-malaria blood-stage vaccine.

6.
We have formulated the ab-initio prediction of the 3D-structure of proteins as a probabilistic programming problem where the inter-residue 3D-distances are treated as random variables. Lower and upper bounds for these random variables and the corresponding probabilities are estimated by nonparametric statistical methods and knowledge-based heuristics. In this paper we focus on the probabilistic computation of the 3D-structure using these distance interval estimates. Validation of the predicted structures shows our method to be more accurate than other computational methods reported so far. Our method is also found to be computationally more efficient than other existing ab-initio structure prediction methods. We also provide a reliability index for the predicted structures. Because of its computational simplicity and its applicability to any random sequence, our algorithm, called PROPAINOR (PROtein structure Prediction by AI and Nonparametric Regression), has significant scope in computational protein structural genomics.

7.
Computational tools can bridge the gap between sequence and protein 3D structure based on the notion that information is to be retrieved from the databases and that knowledge-based methods can help in approaching a solution of the protein-folding problem. To this aim our group has implemented neural network-based predictors capable of performing with some success in different tasks, including predictions of the secondary structure of globular and membrane proteins, the topology of membrane proteins and porins, and stable α-helical segments suited for protein design. Moreover, we have developed methods for predicting contact maps in proteins and the probability of finding a cysteine in a disulfide bridge, tools which can contribute to the goal of predicting the 3D structure starting from the sequence (the so-called ab initio prediction). All our predictors take advantage of evolutionary information derived from the structural alignments of homologous (evolutionarily related) proteins and taken from the sequence and structure databases. When it is necessary to build models for proteins of unknown spatial structure that have very little homology with other proteins of known structure, non-standard techniques need to be developed, and the tools for protein structure prediction may help in protein modeling. The results of a recent simulation performed in our lab highlight the role of high-performance computing technology and of tools of computational biology in protein modeling and peptidomimetic design.

8.
Protein-Protein Interaction (PPI) prediction is a well-known problem in Bioinformatics, for which a large number of techniques have been proposed. However, prediction results have not been sufficiently satisfactory for guiding biologists in wet-lab experiments. One reason is that not all useful information, such as pairwise protein interaction information based on sequence alignment, has been integrated in PPI prediction. Alignment is a basic concept for measuring sequence similarity in Proteomics and has been used in a number of applications ranging from protein recognition to protein subcellular localization. In this article, we propose a novel integrated approach to predicting PPI based on sequence alignment by jointly using a k-Nearest Neighbor classifier (SA-kNN) and a Support Vector Machine (SVM). SVM is a machine learning technique used in a wide range of Bioinformatics applications, thanks to its ability to alleviate overfitting. We demonstrate that in our approach the two methods, SA-kNN and SVM, are complementary, and we combine them in an ensemble to overcome their respective limitations. While the SVM is trained on Amino Acid (AA) compositions and protein signatures mined from the literature, the SA-kNN makes use of the similarity of two protein pairs through alignment. Experimentally, our technique leads to significant gains in accuracy, precision and sensitivity of about 5%, 16% and 10%, respectively.

9.
Prediction of protein accessibility from sequence, like prediction of protein secondary structure, is an intermediate step toward predicting the structures and consequently the functions of proteins. Most currently used methods are based on single-residue prediction, using either statistical means or evolutionary information, and predict the accessibility state of the central residue in a window. Taking advantage of the expansion of databases of proteins with known 3D structures, we extracted information on pairwise residue types and the conformational states of pairs simultaneously. To resolve the ambiguity in state prediction that arises when the window slides one residue at a time, we used a dynamic programming algorithm to find the path with the maximum score. The three-state overall per-residue accuracy, Q3, of this method in a jackknife test on a dataset of known proteins is more than 65%, which is an improvement on the results of methods based on evolutionary information.
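The dynamic programming step described above, finding the state path with the maximum total score, can be illustrated with a Viterbi-style recursion. This is a generic sketch under stated assumptions rather than the authors' scoring scheme: the per-position scores (`unary`), the scores for adjacent state pairs (`pairwise`) and the two-state toy example are all hypothetical, whereas the abstract uses three states.

```python
def best_state_path(unary, pairwise):
    """Return (path, score) maximizing per-position plus adjacent-pair scores.

    unary[i][s]    : score of assigning state s to residue i
    pairwise[s][t] : score of state s being followed by state t
    """
    n, k = len(unary), len(unary[0])
    score = [list(unary[0])]  # score[i][t]: best total for a path ending in t at i
    back = []                 # back[i-1][t]: predecessor state on that best path
    for i in range(1, n):
        row, ptr = [], []
        for t in range(k):
            prev = max(range(k), key=lambda s: score[-1][s] + pairwise[s][t])
            row.append(score[-1][prev] + pairwise[prev][t] + unary[i][t])
            ptr.append(prev)
        score.append(row)
        back.append(ptr)
    # Trace back from the best final state.
    state = max(range(k), key=lambda s: score[-1][s])
    best = score[-1][state]
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1], best


# Toy example with two states (0 = buried, 1 = exposed); all scores are made up.
unary = [[2, 0], [0, 1], [2, 0]]
pairwise = [[1, -1], [-1, 1]]
print(best_state_path(unary, pairwise))  # ([0, 0, 0], 6)
```

Choosing the best state at each position independently would give the path 0, 1, 0 with a total of 3 here; the recursion prefers 0, 0, 0 because the pair scores reward consistent neighbours.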

10.
The goal of computational protein structure prediction is to provide three-dimensional (3D) structures with resolution comparable to experimental results. Comparative modeling, which predicts the 3D structure of a protein based on its sequence similarity to homologous structures, is the most accurate computational method for structure prediction. In the last two decades, significant progress has been made on comparative modeling methods. Using the large number of protein structures deposited in the Protein Data Bank (~65,000), automatic prediction pipelines are generating a tremendous number of models (~1.9 million) for sequences whose structures have not been experimentally determined. Accurate models are suitable for a wide range of applications, such as prediction of protein binding sites, prediction of the effect of protein mutations, and structure-guided virtual screening. In particular, comparative modeling has enabled structure-based drug design against protein targets with unknown structures. In this review, we describe the theoretical basis of comparative modeling, the available automatic methods and databases, and the algorithms to evaluate the accuracy of predicted structures. Finally, we discuss relevant applications in the prediction of important drug target proteins, focusing on the G protein-coupled receptor (GPCR) and protein kinase families.

11.
12.
Protein modeling tools utilize many kinds of structural information that may be predicted from the amino acid sequence of a target protein or obtained from experiments. Such data provide geometrical constraints in the modeling process. The main aim is to generate the best possible consensus structure. The quality of the models depends strictly on the imposed conditions. In this work we present an algorithm that predicts short-range distances between Cα atoms as well as a set of short structural fragments that possibly share structural similarity with a query sequence. The only input of the method is a query sequence profile. The algorithm searches for short protein fragments with high sequence similarity. As a result, statistics of the distances observed in the similar fragments are returned. The method can also be used as a scoring function or a short-range knowledge-based potential based on the computed statistics.

13.
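Aoneng Cao. Acta Physico-Chimica Sinica (《物理化学学报》), 2020, 36(1): 1907002-0
The protein folding problem, often called the second genetic code, has yet to be deciphered; the "heavenly book" of protein sequences remains a text whose punctuation is unknown and whose puzzles are unresolved. Building on our recent work, we propose the hypothesis of "lowest-energy structural fragments under confinement" for protein structure. The hypothesis states that proteins contain key sites of strong long-range interactions; these sites act like punctuation marks, turning the heavenly book of the protein sequence into readable sentences (polypeptide fragments). The native structure of each fragment is its lowest-energy state under the confinement imposed by these strong long-range interaction sites. The complete protein structure is assembled from these "lowest-energy structural fragments under confinement", while the overall structure of the protein is not necessarily the global energy minimum. During protein folding, the native-structure propensity of local fragments provides a driving force, mainly enthalpic, for the formation of the strong long-range interactions, while the formation of native strong long-range interactions provides stability, mainly entropic, to the native structure of the local fragments. Early in protein evolution there may have been a "Stone Age", in which polypeptide fragments stabilized by the confinement of various interfaces (such as rocks) evolved first, and proteins later evolved step by step from these fragments, including by assembling them.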

14.
A statistical analytical approach has been used to analyze the secondary structure (SS) of amino acids as a function of the sequence of amino acid residues. We have used 306 non-homologous best-resolved protein structures from the Protein Data Bank for the analysis. A sequence region of 32 amino acids on either side of the residue is considered in order to calculate single amino acid propensities, di-amino acid potentials and tri-amino acid potentials. A weighted sum of predictions obtained using these properties is used to suggest a final prediction method. Our method is as good as the best-known SS prediction methods, is the simplest of all the methods, and uses no homologous sequence/family alignment data, yet gives 72% SS prediction accuracy. Since the method did not use many other factors that may increase the prediction accuracy, there is scope to achieve greater accuracy using this approach. Received: 4 May 1998 / Accepted: 17 September 1998 / Published online: 10 December 1998

15.
Efforts to use computers in predicting the secondary structure of proteins based only on primary structure information started over a quarter century ago [1-3]. Although the results were encouraging initially, the accuracy of the pioneering methods generally did not attain the level required for using predictions of secondary structures reliably in modelling the three-dimensional topology of proteins. During the last decade, however, the introduction of new computational techniques as well as the use of multiple sequence information has led to a dramatic increase in the success rate of prediction methods, such that successful 3D modelling based on predicted secondary structure has become feasible [e.g., Ref 4]. This review is aimed at presenting an overview of the scale of the secondary structure prediction problem and associated pitfalls, as well as the history of the development of computational prediction methods. As recent successful strategies for secondary structure prediction all rely on multiple sequence information, some methods for accurate protein multiple sequence alignments will also be described. While the main focus is on prediction methods for globular proteins, the prediction of trans-membrane segments within membrane proteins will also be briefly summarised. Finally, an integrated iterative approach tying secondary structure prediction and multiple alignment will be introduced [5].

16.
Although the characterization of proteins cannot solely rely upon sequence similarity, it has been widely proved that all-vs-all massive sequence comparisons may be an effective approach and a good basis for the prediction of biochemical functions or for the delineation of common shared properties. The program Cluster-C presented here enables a stand-alone and efficient construction of protein families within whole proteomes. The algorithm, which is based on the detection of cliques, ensures a high level of connectivity within the clusters. As opposed to the single transitive linkage method, Cluster-C allows a large number of sequences to be classified in such a way that the multidomain proteins do not produce a chain-grouping effect resulting in meaningless clusters. Moreover, some proteins can be present in several different but relevant clusters, which is of help in the determination of their functional domains. In the present analysis we used the Z-value, an evaluation of the significance of the similarity score, as the criterion for connecting sequences (the user can freely define the threshold of the similarity criterion). The clusters built with a rather low threshold (Z= 14) include more than 97% of the sequences and are consistent with known protein families and PROSITE patterns.
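To illustrate why clique-based grouping avoids the chain-grouping effect described above, the sketch below links sequences whose Z-value reaches a threshold and enumerates maximal cliques with the Bron-Kerbosch procedure. The abstract does not say which clique algorithm Cluster-C uses, so Bron-Kerbosch is a stand-in, and the protein names and Z-values are invented for the example.

```python
def build_graph(z_values, threshold=14.0):
    """Connect two sequences when their Z-value reaches the threshold."""
    adj = {}
    for (a, b), z in z_values.items():
        adj.setdefault(a, set())
        adj.setdefault(b, set())
        if z >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)


# Hypothetical Z-values: M is a two-domain protein similar to both families.
z_values = {
    ("A", "B"): 30, ("A", "M"): 25, ("B", "M"): 22,
    ("C", "D"): 28, ("C", "M"): 20, ("D", "M"): 19,
    ("A", "C"): 3,
}
print(maximal_cliques(build_graph(z_values)))
# [['A', 'B', 'M'], ['C', 'D', 'M']]
```

Single transitive linkage would merge all five sequences through M, whereas the clique view keeps two families and places M in both, which matches the behaviour the abstract describes for multidomain proteins.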

17.
A significant challenge in homology detection is to identify sequences that share a common evolutionary ancestor, despite significant primary sequence divergence. Remote homologs will often have less than 30% sequence identity, yet still retain common structural and functional properties. We demonstrate a novel method for identifying remote homologs using a support vector machine (SVM) classifier trained by fusing sequence similarity scores and subcellular location prediction. SVMs have been shown to perform well in a variety of applications where binary classification of data is the goal. At the same time, data fusion methods have been shown to be highly effective in enhancing discriminative power of data. Combining these two approaches in the application SVM-SimLoc resulted in identification of significantly more remote homologs (p-value<0.006) than using either sequence similarity or subcellular location independently.

18.
Drug discovery processes require drug-target interaction (DTI) prediction by virtual screening with high accuracy. Compared with traditional methods, deep learning requires less time and domain expertise while achieving higher accuracy. However, there is still room for higher performance with simplified structures. Meanwhile, the field is calling for multi-task models to solve different tasks. Here we report GanDTI, an end-to-end deep learning model for both interaction classification and binding affinity prediction tasks. The model uses the compound graph and protein sequence data. It consists only of a graph neural network, an attention module and a multiple-layer perceptron, yet outperforms state-of-the-art methods in predicting binding affinity and interaction classification on the DUD-E, human, and bindingDB benchmark datasets. This demonstrates that our refined model is highly effective and efficient for DTI prediction and provides a new strategy for performance improvement.

19.
Detection of protein complexes is very important for understanding the principles of cellular organization and function. Recently, large protein–protein interaction (PPI) networks have become available through high-throughput experimental techniques. These networks make it possible to develop computational methods for protein complex detection. Most current methods rely on the assumption that a protein complex, as a module, has a dense structure. However, complexes have a core-attachment structure, and proteins in a complex core share a high degree of functional similarity, so a core is expected to have high weighted density. In this paper we present a Core-Attachment based method for protein complex detection from Weighted PPI Interactions using clustering coefficient and weighted density. Experimental results show that the proposed method, CAMWI, improves the accuracy of protein complex detection.
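The two graph quantities named in the abstract can be sketched on a small weighted network. The definitions used here are common textbook ones and are assumptions, since the abstract does not define them: weighted density is the sum of edge weights inside a node set divided by the number of possible pairs, and the clustering coefficient is the unweighted fraction of a node's neighbour pairs that are connected. The network and its weights are invented.

```python
from itertools import combinations


def edge(a, b):
    """Undirected edge key."""
    return frozenset((a, b))


def weighted_density(nodes, weights):
    """Sum of edge weights inside `nodes` divided by the number of node pairs."""
    nodes = list(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    total = sum(weights.get(edge(a, b), 0.0) for a, b in combinations(nodes, 2))
    return total / (n * (n - 1) / 2)


def clustering_coefficient(node, weights):
    """Fraction of the node's neighbour pairs that are themselves connected."""
    # Assumes no self-loops, so every edge containing `node` has one other end.
    neighbours = {next(iter(e - {node})) for e in weights if node in e}
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbours, 2) if edge(a, b) in weights)
    return links / (k * (k - 1) / 2)


# Hypothetical weighted PPI network: A, B, C form a tight core; D hangs off C.
weights = {
    edge("A", "B"): 0.9,
    edge("A", "C"): 0.8,
    edge("B", "C"): 0.7,
    edge("C", "D"): 0.2,
}
core_density = weighted_density(["A", "B", "C"], weights)           # about 0.8
with_attachment = weighted_density(["A", "B", "C", "D"], weights)   # about 0.43
print(core_density, with_attachment, clustering_coefficient("C", weights))
```

In this toy network the core {A, B, C} has a much higher weighted density than the core plus its attachment D, which is the kind of contrast a core-attachment method can exploit.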

20.
The protein disulfide bond is a covalent bond that forms during post-translational modification by the oxidation of a pair of cysteines. In proteins, the disulfide bond is the most frequent covalent link between amino acids after the peptide bond. It plays a significant role in three-dimensional (3D) ab initio protein structure prediction (aiPSP), stabilizing protein conformation, post-translational modification, and protein folding. In aiPSP, the location of disulfide bonds can strongly reduce the conformational search space by imposing geometrical constraints. Existing experimental techniques for the determination of disulfide bonds are time-consuming and expensive. Thus, developing sequence-based computational methods for disulfide bond prediction becomes indispensable. This study proposes a stacking-based machine learning approach for disulfide bond prediction (diSBPred). Various useful sequence and structure-based features are extracted for effective training, including conservation profile, residue solvent accessibility, torsion angle flexibility, disorder probability, the sequential distance between cysteines, and more. The prediction of disulfide bonds is carried out in two stages: first, individual cysteines are predicted as either bonding or non-bonding; second, the cysteine-pairs are predicted as either bonding or non-bonding by including the results from cysteine bonding prediction as a feature. A comparison of the features employed in this study with those used in the existing nearest neighbor algorithm (NNA) method shows that the features used in this study yield an improvement of about 7.39 % in jackknife validation balanced accuracy. Moreover, for individual cysteine bonding prediction and cysteine-pair bonding prediction, diSBPred provides a 10-fold cross-validation balanced accuracy of 82.29 % and 94.20 %, respectively. Altogether, our predictor achieves an improvement of 43.25 % based on balanced accuracy compared to the existing NNA based approach. Thus, diSBPred can be utilized to annotate the cysteine bonding residues of protein sequences whose structures are unknown as well as improve the accuracy of the aiPSP method, which can further aid in experimental studies of the disulfide bond and structure determination.
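Since the results above are reported as balanced accuracy, a short sketch of that metric may be useful. It uses the standard definition (the mean of sensitivity and specificity); the confusion-matrix counts are invented to show the effect of class imbalance and are not taken from the study.

```python
def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Plain fraction of correct predictions."""
    return (tp + tn) / (tp + fn + tn + fp)


# Made-up counts: 50 bonded cysteine pairs versus 950 non-bonded pairs.
counts = dict(tp=40, fn=10, tn=900, fp=50)
print(f"accuracy          = {accuracy(**counts):.3f}")           # 0.940
print(f"balanced accuracy = {balanced_accuracy(**counts):.3f}")  # 0.874
```

When non-bonded pairs greatly outnumber bonded ones, plain accuracy is dominated by the majority class, which is a common reason for reporting balanced accuracy instead.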
