Similar literature
 20 similar articles found
1.
A procedure that uses pattern recognition techniques to compute tripeptide conformational probabilities is described. The procedure differs in several respects from the many “secondary structure” prediction algorithms that have been published over the last 20 years. First, the procedure classifies tripeptides into 64 different conformational types, rather than just α, β and coil, as is commonly done. Thus, the procedure can attempt to predict regions of irregular structure. Second, the procedure uses the methods of pattern recognition, which are powerful but conceptually simple. In this approach, amino acid properties are used to map peptide sequences into a multivariate property space. Particular tripeptide conformations tend to map to particular regions of the property space. These regions are represented by multivariate gaussian distributions, where the parameters of the distributions are determined from tripeptides in the protein X-ray data bank. Finally, rather than making simple predictions, the procedure computes probabilities. Tripeptide conformational probabilities are calculated in the multivariate property space using the gaussian distributions. In a prediction, the procedure might find that a particular tripeptide in a protein has a 36% chance of being in the ααα conformation, a 17% chance of being αα?, a 14% chance of being ααα*, etc. The α-helical conformation is thus the most probable, but, in predicting the structure of the protein, a search algorithm should also consider some of the other possibilities. The values of the probability provide a rational basis for selecting from among the possible conformations. The second article of this series describes a procedure that uses the probabilities to direct a search through the conformational space of a protein. The third article of the series describes a procedure that generates actual three-dimensional structures, and minimizes their energies. 
The three articles together describe a complete procedure, termed “pattern recognition-based importance-sampling minimization” (PRISM), for predicting protein structure from amino acid sequence.
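The probability step described in entry 1 can be illustrated with a minimal sketch: each conformational class is represented by a Gaussian in property space, and Bayes' rule turns the class densities into probabilities. The diagonal covariance, the class labels, the priors, and all numeric values below are assumptions made for illustration; the abstract does not specify them.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (an assumed simplification)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def conformation_probabilities(x, classes):
    """Posterior probability of each conformational class for property vector x.

    classes maps a label to (prior, mean, var); every value is hypothetical.
    """
    log_scores = {
        label: math.log(prior) + log_gaussian_diag(x, mean, var)
        for label, (prior, mean, var) in classes.items()
    }
    top = max(log_scores.values())  # subtract the maximum for numerical stability
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


# Two hypothetical classes in a two-dimensional property space.
example_classes = {
    "aaa": (0.5, [0.0, 0.0], [1.0, 1.0]),
    "bbb": (0.5, [2.0, 0.0], [1.0, 1.0]),
}
print(conformation_probabilities([0.5, 0.0], example_classes))
```

A search procedure could then rank candidate conformations by these probabilities instead of committing to the single most probable class.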

2.
The literature contains over fifty years of accumulated methods proposed for predicting the secondary structures of proteins in silico. A large part of this collection consists of approaches based on artificial neural networks, a field of artificial intelligence and machine learning that is gaining popularity in various application areas. The primary objective of this paper is to summarize works that are important but scattered over time, to give new researchers a clear view of the domain in a single place. An informative introduction to protein secondary structure and artificial neural networks is also included for context. This review will be valuable in designing future methods to improve protein secondary structure prediction accuracy. The various neural network methods found in this problem domain employ varying architectures and feature spaces, and a handful stand out due to significant improvements in prediction. Neural networks with larger feature scope and higher architecture complexity have been found to produce better protein secondary structure prediction. The current prediction accuracy lies around 84%, leaving much room for further improvement in the prediction of secondary structures in silico. The estimated limit of 88% prediction accuracy has not yet been reached, so further research is timely.

3.
Improving the accuracy of protein secondary structure prediction is essential for further development of the whole field of protein research. In this paper, the expertness of protein secondary structure prediction engines is studied at three levels, and a new criterion is introduced at the third level. This criterion can be considered an extension of the previous ones, based on the amino acid index. Using this new criterion, the expertness of some high-scoring secondary structure prediction engines is reanalyzed and some hidden facts are revealed. The results of this new assessment demonstrate a noticeable harmony in the prediction behavior for each amino acid across all engines. This harmony is also seen between single global propensity and prediction accuracy of amino acid types in each secondary structure class. Moreover, it is shown that proline and glycine are predicted with less accuracy in alpha helices and beta strands. In addition, regardless of the different approaches used in prediction engines, beta strands are predicted with less accuracy.

4.
Several energy functions and conformational search methods have been developed that are based on the observed distribution of phi and psi angles in protein structures. The definition of phi and psi angles is directly related to the orientation of the peptide plane (CA CO NH CA). Starting from one conformation and rotating a single peptide plane, the angles psi for one residue and phi for the consecutive residue, which are linked by the peptide plane, display a continuous range of values within one global conformation. When peptide plane rotation is analyzed in several different conformations generated from a restricted conformation database, a large number of these conformations are related. Based on these observations, a new simplified all-atom representation for protein folding simulations is presented where only one torsion angle variable is required for each residue. The underlying theme of this article is that conformational search methods using phi and psi torsion space search through many redundant conformations. These conformations are related by anticorrelated torsion changes of peptide plane rotations. ©1999 John Wiley & Sons, Inc. J Comput Chem 20: 947–955, 1999

5.
RNA folding dynamics plays important roles in various functions of RNAs. To date, coarse-grained modeling has been successfully employed to simulate RNA folding dynamics on the energy landscape composed of secondary structures. In such modeling, the energy barrier height between metastable structures is a key parameter that crucially affects the simulation results. Although a number of approaches, ranging from the exact method to heuristic ones, are available to predict the barrier heights, developing an efficient heuristic for this purpose is still an algorithmic challenge. We developed a novel RNA folding pathway prediction method, ACOfoldpath, based on Ant Colony Optimization (ACO). ACO is a widely used, powerful combinatorial optimization algorithm inspired by the food-seeking behavior of ants. In ACOfoldpath, to accelerate the folding pathway prediction, we reduce the search space by utilizing originally devised structure generation rules. To evaluate the performance of the proposed method, we benchmarked ACOfoldpath on nineteen known conformational RNA switches. As a result, ACOfoldpath successfully predicted folding pathways better than or comparable to those of the previous heuristics. The results of RNA folding dynamics simulations and pseudoknotted pathway predictions are also presented.

6.
The protein structure prediction problem is a classical NP-hard problem in bioinformatics. The lack of an effective global optimization method is the key obstacle in solving this problem. As one of the global optimization algorithms, the tabu search (TS) algorithm has been successfully applied to many optimization problems. We define a new neighborhood conformation, tabu object and acceptance criteria for the current conformation based on the original TS algorithm and put forward an improved TS algorithm. By integrating the heuristic initialization mechanism, the heuristic conformation updating mechanism, and the gradient method into the improved TS algorithm, a heuristic-based tabu search (HTS) algorithm is presented for predicting the two-dimensional (2D) protein folding structure in the AB off-lattice model, which consists of hydrophobic (A) and hydrophilic (B) monomers. The tabu search minimization leads to the basins of local minima, near which a local search mechanism is then proposed to further search for lower-energy conformations. To test the performance of the proposed algorithm, experiments are performed on four Fibonacci sequences and two real protein sequences. The experimental results show that the proposed algorithm has found the lowest-energy conformations so far for the three shorter Fibonacci sequences and updated the best results for the longest one, as well as for the two real protein sequences, demonstrating that the HTS algorithm is quite promising in finding the ground states for AB off-lattice model proteins.
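For readers unfamiliar with the AB off-lattice model named in entry 6, the sketch below evaluates the energy of a 2D chain in the commonly cited Stillinger form: a bending term plus a Lennard-Jones-like term whose attraction depends on the monomer types. The abstract does not state the energy function, so treating this form as the one used in the paper is an assumption, and the function names are hypothetical.

```python
import math


def ab_coefficient(a, b):
    """Attraction coefficient: AA -> 1, BB -> 0.5, mixed AB -> -0.5."""
    if a == b:
        return 1.0 if a == "A" else 0.5
    return -0.5


def chain_coordinates(bend_angles):
    """Place monomers with unit bond length; each bend angle turns the chain heading."""
    points = [(0.0, 0.0), (1.0, 0.0)]
    heading = 0.0
    for theta in bend_angles:
        heading += theta
        x, y = points[-1]
        points.append((x + math.cos(heading), y + math.sin(heading)))
    return points


def ab_energy(sequence, bend_angles):
    """Energy of a 2D AB chain described by n - 2 bend angles (radians)."""
    n = len(sequence)
    if len(bend_angles) != n - 2:
        raise ValueError("a chain of n monomers needs n - 2 bend angles")
    points = chain_coordinates(bend_angles)
    bending = sum(0.25 * (1.0 - math.cos(t)) for t in bend_angles)
    nonbonded = 0.0
    for i in range(n - 2):
        for j in range(i + 2, n):  # skip directly bonded neighbours
            r = math.dist(points[i], points[j])
            c = ab_coefficient(sequence[i], sequence[j])
            nonbonded += 4.0 * (r ** -12 - c * r ** -6)
    return bending + nonbonded


print(ab_energy("AAA", [0.0]))
```

An optimizer such as tabu search would treat the bend angles as the variables and this function as the objective to minimize.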

7.
RNA molecules participate in many important biological processes, and they need to fold into well-defined secondary and tertiary structures to realize their functions. Like the well-known protein folding problem, there is also an RNA folding problem. The folding problem includes two aspects: structure prediction and folding mechanism. Although the former has been widely studied, the latter is still not well understood. Here we present a deep reinforcement learning algorithm, 2dRNA-Fold, to study the fastest folding paths of RNA secondary structure. 2dRNA-Fold uses a neural network combined with Monte Carlo tree search to select residue pairings step by step for a given RNA sequence until the final secondary structure is formed. We apply 2dRNA-Fold to several short RNA molecules and one longer RNA, 1Y26, and find that their fastest folding paths show some interesting features. 2dRNA-Fold is further trained using a set of RNA molecules from the bpRNA dataset and is used to predict RNA secondary structure. Since in 2dRNA-Fold the scoring that determines the next step is based on possible base pairings, the learned or predicted fastest folding path may not agree with the actual folding paths determined by free energy according to physical laws.

8.
The three-dimensional structures of proteins determine their functions, and incorrect folding of their β-strands can be the cause of many diseases. There are two major approaches for determining protein structures: computational prediction and experimental methods that employ technologies such as cryo-electron microscopy. Because experimental methods are costly, involve lengthy processes with extended wait times, and can yield incomplete results, computational prediction is an attractive alternative. β-sheet structure prediction, the focus of the present paper, is a major portion of overall protein structure prediction. Prediction of other substructures, such as α-helices, is simpler and has lower computational time complexity. Brute force methods are the most common approach, and dynamic programming is also utilized to generate all possible conformations. The current study introduces the Subset Sum Approach (SSA) for the direct search space generation method, which is shown to outperform the dynamic programming approach in terms of both time and space. For the first time, the present work has calculated both the state space cardinality of the dynamic programming approach and the search space cardinality of the general brute force approaches. With respect to a set of pruning rules, SSA has demonstrated higher efficiency in both time and accuracy in comparison to state-of-the-art methods.

9.
Structure Based Drug Design (SBDD) is a computational approach to lead discovery that uses the three-dimensional structure of a protein to fit drug-like molecules into a ligand binding site to modulate function. Identifying the location of the binding site is therefore a vital first step in this process, restricting the search space for SBDD or virtual screening studies. The detection and characterisation of functional sites on proteins has increasingly become an area of interest. Structural genomics projects are increasingly yielding protein structures with unknown functions and binding sites. Binding site prediction was pioneered by pocket detection, since the binding site is often found in the largest pocket. More recent methods involve phylogenetic analysis, identifying structural similarity with proteins of known function and identifying regions on the protein surface with a potential for high binding affinity. Binding site prediction has been used in several SBDD projects and has been incorporated into several docking tools. We discuss different methods of ligand binding site prediction, their strengths and weaknesses, and how they have been used in SBDD.

10.
Protein structure prediction is a fundamental issue in the field of computational molecular biology. In this paper, the AB off-lattice model is adopted to transform the original protein structure prediction scheme into a numerical optimization problem. We present a balance-evolution artificial bee colony (BE-ABC) algorithm to address the problem, with the aim of finding the structure for a given protein sequence with the minimal free-energy value. This is achieved by using convergence information during the optimization process to adaptively adjust the search intensity. In addition, an overall degradation procedure is introduced as part of the BE-ABC algorithm to prevent premature convergence. Comprehensive simulation experiments based on the well-known artificial Fibonacci sequence set and several real sequences from the Protein Data Bank have been carried out to compare the performance of BE-ABC against other algorithms. Our numerical results show that the BE-ABC algorithm is able to outperform many state-of-the-art approaches and can be effectively employed for protein structure optimization.
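The artificial Fibonacci sequences mentioned in entries 6 and 10 are usually built by a simple recursive concatenation. The sketch below follows the common convention S0 = "A", S1 = "B", S(k) = S(k-2) + S(k-1); the abstracts do not spell out the construction, so this convention and the function name are assumptions.

```python
def fibonacci_sequence(k):
    """Return the k-th Fibonacci AB string under the assumed convention above."""
    if k < 0:
        raise ValueError("k must be non-negative")
    previous, current = "A", "B"
    if k == 0:
        return previous
    for _ in range(k - 1):
        previous, current = current, previous + current
    return current


for k in range(2, 7):
    sequence = fibonacci_sequence(k)
    print(k, len(sequence), sequence)
```

The string lengths follow the Fibonacci numbers, which is why benchmark chains in this model typically have lengths such as 13, 21, 34 and 55.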

11.
All currently leading protein secondary structure prediction methods use a multiple protein sequence alignment to predict the secondary structure of the top sequence. In most of these methods, prior to prediction, alignment positions showing a gap in the top sequence are deleted, which shrinks the alignment and loses position-specific information. In this paper we investigate the effect of this removal of information on secondary structure prediction accuracy. To this end, we have designed SymSSP, an algorithm that post-processes the predicted secondary structure of all sequences in a multiple sequence alignment by (i) making use of the alignment's evolutionary information and (ii) re-introducing most of the information that would otherwise be lost. The post-processed information is then given to a new dynamic programming routine that produces an optimally segmented consensus secondary structure for each of the multiple alignment sequences. We have tested our method on the state-of-the-art secondary structure prediction methods PHD, PROFsec, SSPro2 and JNET using the HOMSTRAD database of reference alignments. Our consensus-deriving dynamic programming strategy is consistently better at improving the segmentation quality of the predictions than the commonly used majority voting technique. In addition, we have applied several weighting schemes from the literature to our novel consensus-deriving dynamic programming routine. Finally, we have investigated the level of noise introduced by prediction errors into the consensus and show that predictions of the edges of helices and strands are wrong half the time for all four tested prediction methods.
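The majority voting baseline that entry 11 compares against can be sketched in a few lines. This is not SymSSP or its dynamic programming routine; it only illustrates per-column voting over aligned predictions. The three-state alphabet (H, E, C), the gap symbol, and the alphabetical tie-break are assumptions chosen for the example.

```python
from collections import Counter


def majority_vote(predictions, gap="-"):
    """Column-wise majority consensus over aligned secondary structure strings."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("aligned predictions must have equal length")
    consensus = []
    for column in zip(*predictions):
        counts = Counter(state for state in column if state != gap)
        if not counts:
            consensus.append(gap)  # every sequence has a gap at this position
            continue
        # Highest count wins; ties are broken alphabetically (an arbitrary choice).
        consensus.append(min(counts, key=lambda s: (-counts[s], s)))
    return "".join(consensus)


print(majority_vote(["HHEC", "HEEC", "HHE-"]))
```

Because each column is decided independently, this baseline can produce fragmented segments, which is the weakness a segmentation-aware consensus method aims to address.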

12.
A deterministic algorithm for enumeration of transmembrane protein folds is presented. Using a set of sparse pairwise atomic distance constraints (such as those obtained from chemical cross-linking, FRET, or dipolar EPR experiments), the algorithm performs an exhaustive search of secondary structure element packing conformations distributed throughout the entire conformational space. The end result is a set of distinct protein conformations, which can be scored and refined as part of a process designed for computational elucidation of transmembrane protein structures.

13.
A newly developed approach for predicting the structure of segments that connect known elements of secondary structure in proteins has been applied to some of the longer loops in the G-protein coupled receptors (GPCRs) rhodopsin and the dopamine receptor D2R. The algorithm uses Monte Carlo (MC) simulation in a temperature annealing protocol combined with a scaled collective variables (SCV) technique to search conformation space for loop structures that could belong to the native ensemble. Except for rhodopsin, structural information is only available for the transmembrane helices (TMHs), and therefore the usual approach of finding a single conformation of lowest energy has to be abandoned. Instead, the MC search aims to find the ensemble located at the absolute minimum free energy, i.e., the native ensemble. It is assumed that structures in the native ensemble can be found by an MC search starting from any conformation in the native funnel. The hypothesis is that native structures are trapped in this part of conformational space because of the high-energy barriers that surround the native funnel. In this work it is shown that the crystal structure of the second extracellular loop (e2) of rhodopsin is a member of this loop’s native ensemble. In contrast, the third intracellular loop differs considerably among the crystal structures that have been reported. Our calculations indicate that, of three crystal structures examined, two show features characteristic of native ensembles while the other one does not. Finally, the protocol is used to calculate the structure of the e2 loop in D2R. Here, the crystal structure is not known, but it is shown that several side chains that are involved in interaction with a class of substituted benzamides assume conformations that point into the active site. Thus, they are poised to interact with the incoming ligand.
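The Monte Carlo search with temperature annealing mentioned in entry 13 rests on the Metropolis acceptance rule. The sketch below shows that rule with a geometric cooling schedule on a toy one-dimensional energy. It does not model the scaled collective variables technique or the ensemble analysis from the abstract, and the schedule, step size, and all parameter values are hypothetical.

```python
import math
import random


def metropolis_accept(delta_e, temperature, rng):
    """Always accept downhill moves; accept uphill moves with Boltzmann probability."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def anneal(energy, propose, state, t_start=2.0, t_end=0.05, steps=5000, seed=0):
    """Simulated annealing with a geometric cooling schedule (assumed, steps >= 2)."""
    rng = random.Random(seed)
    current_e = energy(state)
    best, best_e = state, current_e
    for k in range(steps):
        temperature = t_start * (t_end / t_start) ** (k / (steps - 1))
        candidate = propose(state, rng)
        candidate_e = energy(candidate)
        if metropolis_accept(candidate_e - current_e, temperature, rng):
            state, current_e = candidate, candidate_e
            if current_e < best_e:
                best, best_e = state, current_e
    return best, best_e


def toy_energy(x):
    """A rugged one-dimensional stand-in for a conformational energy."""
    return x * x + math.sin(5.0 * x)


def toy_move(x, rng):
    """Small random displacement of the single coordinate."""
    return x + rng.uniform(-0.5, 0.5)


print(anneal(toy_energy, toy_move, 3.0))
```

At high temperature the walker crosses barriers freely, and as the temperature falls it settles into a low-energy region, which is the behaviour the annealing protocol relies on.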

14.
15.
Although atomic structures have been determined directly from cryo-EM density maps with high resolutions, current structure determination methods for medium resolution (5 to 10 Å) cryo-EM maps are limited by the availability of structure templates. Secondary structure traces are lines detected from a cryo-EM density map for α-helices and β-strands of a protein. A topology of secondary structures defines the mapping between a set of sequence segments and a set of traces of secondary structures in three-dimensional space. In order to enhance accuracy in ranking secondary structure topologies, we explored a method that combines three sources of information: a set of sequence segments in 1D, a set of amino acid contact pairs in 2D, and a set of traces in 3D at the secondary structure level. A test of fourteen cases shows that the accuracy of predicted secondary structures is critical for deriving topologies. The use of significant long-range contact pairs is most effective at enriching the rank of the maximum-match topology for proteins with a large number of secondary structures, if the secondary structure prediction is fairly accurate. It was observed that the enrichment depends on the quality of initial topology candidates in this approach. We provide detailed analysis of various cases to show the potential and the challenges of combining three sources of information.

16.
Protein structure prediction is a long‐standing problem in molecular biology. Due to the lack of an accurate energy function, it is often difficult to know whether the sampling algorithm or the energy function is the more important factor in the failure to locate near‐native conformations of proteins. This article examines the size dependence of sampling effectiveness by using a perfect “energy function”: the root‐mean‐squared distance from the target native structure. Using protein targets of up to 460 residues from the Critical Assessment of Structure Prediction techniques (CASP11, 2014), we show that the accuracy of the near‐native structures sampled is relatively independent of protein size but strongly depends on the errors of predicted torsion angles. Even with 40% out‐of‐range angle prediction, near‐native conformations of 2 Å or less can be sampled. The result supports the view that the poor energy function is one of the bottlenecks of structure prediction and that predicted torsion angles are useful for overcoming the bottleneck by restricting the sampling space in the absence of a perfect energy function. © 2015 Wiley Periodicals, Inc.
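The "perfect energy function" in entry 16 is the root-mean-squared distance to the native structure. A minimal sketch of that quantity follows; it assumes the two coordinate sets are already superimposed and list atoms in the same order, and it omits the optimal superposition step (for example a Kabsch alignment) that a real evaluation would normally include. The function name is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-squared distance between two equally sized coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (3.8, 1.0, 0.0), (7.6, 0.0, 0.0)]
print(rmsd(native, model))
```

Used as a scoring function, this value drops to zero only at the native structure, so any remaining failure to reach it can be attributed to the sampling rather than to the scoring.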

17.
The key problem in polypeptide‐structure prediction is with regard to thermodynamics. Two factors limit prediction in ab initio computer simulations. First, the thermodynamically dominant conformations must be found from an extremely large number of possible conformations. Second, these low‐energy forms must deviate little from the experimental structures. Here, we report on the application of the diffusion‐controlled Monte Carlo approach to predict four α‐helical hairpins with 34–38 residues by global optimization, using an energy optimized on other supersecondary structures. A total of seven simulations is carried out for each protein starting from fully extended conformations. Three proteins are correctly folded (within 3.0 Å rms from the experimental structures), but the fourth protein cannot distinguish between several equienergetic conformations. Possible improvement of the energy model is suggested. © 2000 John Wiley & Sons, Inc. J Comput Chem 21: 582–589, 2000

18.
The two great challenges of the docking process are the prediction of ligand poses in a protein binding site and the scoring of the docked poses. Ligands that are composed of extended chains in their molecular structure display the most difficulties, predominantly because of the torsional flexibility. On the basis of the molecular docking program QXP-Flo+0802, we have developed a procedure particularly for ligands with a high degree of rotational freedom that allows the accurate prediction of the orientation and conformation of ligands in protein binding sites. Starting from an initial full Monte Carlo docking experiment, this was achieved by performing a series of successive multistep docking runs using a local Monte Carlo search with a restricted rotational angle, by which the conformational search space is limited. The method was established by using a highly flexible acetylcholinesterase inhibitor and has been applied to a number of challenging protein-ligand complexes known from the literature.

19.
PreSSAPro is software, available to the scientific community as a free web service, designed to provide predictions of secondary structures starting from the amino acid sequence of a given protein. Predictions are based on our recently published work on the amino acid propensities for secondary structures, both in large but not homogeneous protein data sets and in smaller but homogeneous data sets corresponding to protein structural classes, i.e. all-alpha, all-beta, or alpha–beta proteins. Predictions are improved by the use of propensities evaluated for the right protein class. PreSSAPro predicts the secondary structure according to the right protein class, if known, or gives a multiple prediction with reference to the different structural classes. The comparison of these predictions represents a novel tool to evaluate which sequence regions can assume different secondary structures depending on the structural class assignment, with a view to identifying proteins able to fold into different conformations. The service is available at the URL http://bioinformatica.isa.cnr.it/PRESSAPRO/.

20.
Protein structure prediction and analysis are significant for accurately assessing the functionality of living organs. Several protein structure prediction methods use neural networks (NN). However, the Hidden Markov model is more interpretable and more effective for biological data analysis than the NN. It employs statistical data analysis to enhance the prediction accuracy. The current work proposes a protein prediction approach from protein images based on the Hidden Markov model and the Chapman-Kolmogorov equation. Initially, a preprocessing stage binarizes the protein images using the Otsu technique in order to convert each protein image into a binary matrix. Subsequently, two counting algorithms, namely the flood fill and Warshall algorithms, are employed to classify the protein structures. Finally, the Hidden Markov model and the Chapman-Kolmogorov equation are applied to the classified structures to predict the protein structure. The execution time and algorithmic performance are measured to evaluate the primary, secondary and tertiary protein structure prediction.
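Entry 20 mentions flood fill as a counting step on a binarized image. The sketch below shows one plausible reading: counting connected foreground regions in a binary matrix with an iterative flood fill. The 4-connectivity, the use of 1 for foreground, and the function name are assumptions; the abstract does not say how the counts are used for classification.

```python
def count_regions(binary_matrix):
    """Count 4-connected regions of 1s in a binary matrix using flood fill."""
    rows = len(binary_matrix)
    cols = len(binary_matrix[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    regions = 0
    for r in range(rows):
        for c in range(cols):
            if binary_matrix[r][c] != 1 or seen[r][c]:
                continue
            regions += 1
            seen[r][c] = True
            stack = [(r, c)]  # explicit stack avoids recursion depth limits
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and binary_matrix[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
    return regions


image = [
    [1, 1, 0],
    [0, 0, 0],
    [0, 1, 1],
]
print(count_regions(image))
```

Each pixel is visited a constant number of times, so the count runs in time proportional to the image size.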
