Similar articles
20 similar articles found (search time: 11 ms)
1.
Predicting protein function and structure from sequence remains an unsolved problem in bioinformatics. The best‐performing methods rely heavily on evolutionary information from multiple sequence alignments, so their accuracy deteriorates for sequences with few homologs, and, given ever‐growing sequence databases, they require long computation times. Here, a single‐sequence‐based prediction method called ProteinUnet is presented, leveraging a U‐Net convolutional network architecture. It is compared with the SPIDER3‐Single model, which is based on a bidirectional recurrent neural network architecture with long short‐term memory (LSTM) units. Both methods achieve similar results for prediction of secondary structure (both three‐ and eight‐state), half‐sphere exposure, and contact number, but ProteinUnet has half as many parameters and a 17‐fold shorter inference time, and it can be trained 11 times faster. Moreover, ProteinUnet tends to perform better for short sequences and for residues with a low number of local contacts. Additionally, loss weighting is presented as an effective way of increasing accuracy for rare secondary structures.
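The abstract does not give the loss-weighting formula. As a minimal sketch, assuming inverse-frequency class weights inside a weighted cross-entropy (a common choice, not necessarily ProteinUnet's), the idea is that rare secondary-structure states contribute more per residue to the loss. The function names and toy labels are hypothetical.

```python
import math
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by N / (K * count): rare classes get larger weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def weighted_cross_entropy(probs, labels, weights):
    """Mean of -w[y] * log p(y) over residues; probs[i] maps state -> probability."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += -weights[y] * math.log(p[y])
    return total / len(labels)

# Toy 8-state labels: coil ('C') dominates, the pi-helix state ('I') is rare.
labels = ["C"] * 6 + ["H"] * 3 + ["I"]
w = inverse_frequency_weights(labels)
print(w)  # 'I' receives the largest weight, 'C' the smallest
```

Dividing by the number of residues rather than by the sum of weights is one design choice among several; either way, misclassifying a rare state now costs more than misclassifying coil.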

2.
We propose a method for predicting RNA base pairing that imposes no restrictions on the order of base pairs, allows for pseudoknots, and runs in O(mN²) time for N base pairs and m iterations. It employs a self‐consistent mean‐field method in which all base pairs are possible, but with each iteration the most energetically favored base pairs become more likely, as long as they are consistent with their neighbors. Performance was compared against three other programs using three test sets. Sensitivity varied from 20% to 74% and specificity from 44% to 77%; in general, the method predicts too many base pairs, which leads to good sensitivity but worse specificity. The predicted structures have excellent energies, suggesting that, algorithmically, the method performs well, but that the classic literature energy models may not be appropriate when pseudoknots are permitted. A website and source code for the simulations are available at http://cardigan.zbh.uni‐hamburg.de/~rnascmf . © 2009 Wiley Periodicals, Inc. J Comput Chem, 2010
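The trade-off between sensitivity and specificity can be made concrete. In many RNA structure benchmarks "specificity" is computed as the fraction of predicted pairs that are correct (a positive predictive value) rather than from true negatives; the abstract does not say which definition was used, so the sketch below assumes sensitivity = TP/(TP+FN) and specificity = TP/(TP+FP), with base pairs given as hypothetical (i, j) index tuples.

```python
def normalize(pairs):
    """Store each base pair as (min, max) so (20, 1) and (1, 20) compare equal."""
    return {(min(i, j), max(i, j)) for i, j in pairs}

def pair_scores(predicted, reference):
    pred, ref = normalize(predicted), normalize(reference)
    tp = len(pred & ref)
    sensitivity = tp / len(ref) if ref else 0.0
    specificity = tp / len(pred) if pred else 0.0  # PPV-style definition (assumed)
    return sensitivity, specificity

reference = [(1, 20), (2, 19), (3, 18), (4, 17)]
# Over-prediction: three correct pairs plus three spurious ones.
over_predicted = reference[:3] + [(5, 16), (6, 15), (7, 14)]
print(pair_scores(over_predicted, reference))  # (0.75, 0.5)
```

Adding spurious pairs never lowers sensitivity but dilutes specificity, which matches the over-prediction pattern described above.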

3.
Accurate prediction of protein secondary structure is essential for accurate sequence alignment, three-dimensional structure modeling, and function prediction. The accuracy of ab initio secondary structure prediction from sequence, however, has only increased from around 77% to 80% over the past decade. Here, we developed a multistep neural-network algorithm that couples secondary structure prediction with prediction of solvent accessibility and backbone torsion angles in an iterative manner. Our method, called SPINE X, was applied to a dataset of 2640 proteins (25% sequence identity cutoff) previously built for the first version of SPINE and achieved an 82.0% three-state accuracy (Q3) based on 10-fold cross-validation. That SPINE X surpasses 81% accuracy is further confirmed on an independently built test dataset of 1833 protein chains, a recently built dataset of 1975 proteins, and 117 CASP9 targets (critical assessment of structure prediction techniques), with accuracies of 81.3%, 82.3%, and 81.8%, respectively. The prediction accuracy improves further, to 83.8% for the dataset of 2640 proteins, if the DSSP assignment used above is replaced by a more consistent consensus secondary structure assignment method. SPINE X is also compared with the popular PSIPRED and with CASP-winning structure-prediction techniques. SPINE X predicts the number of helices and sheets correctly for 21.0% of the 1833 proteins, compared with 17.6% for PSIPRED. The comparison further shows that SPINE X consistently makes more accurate predictions for helical residues (by 6%) without overprediction, whereas PSIPRED makes more accurate predictions for coil residues (by 3-5%) but overpredicts them by 7%. The SPINE X server and its training/test datasets are available at http://sparks.informatics.iupui.edu/
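Q3 is the percentage of residues whose three-state label (helix H, strand E, coil C) is predicted correctly. A minimal sketch follows; the reduction of DSSP's eight states to three (H, G, I to H; E, B to E; everything else to C) is one common convention assumed here, and the SPINE X paper may use a different mapping.

```python
# One common 8-to-3 reduction of DSSP states (assumed; conventions vary).
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}

def reduce_to_3(dssp):
    """Map an 8-state DSSP string to H/E/C; unlisted states (T, S, '-', ...) become coil."""
    return "".join(DSSP_TO_3.get(s, "C") for s in dssp)

def q3(predicted, observed_dssp):
    """Percentage of residues whose predicted 3-state label matches the reduced DSSP label."""
    observed = reduce_to_3(observed_dssp)
    if len(predicted) != len(observed):
        raise ValueError("sequences must have equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

print(q3("HHHHCCEEEE", "HHHGTSEEBC"))  # 90.0
```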

4.
One of the major challenges for protein tertiary structure prediction strategies is the quality of conformational sampling algorithms, that is, how effectively and readily they can search the protein fold space to generate near‐native conformations. To advance the field by making the best use of available homology and fold recognition approaches along with ab initio folding methods, we have developed Bhageerath‐H Strgen, a homology/ab initio hybrid algorithm for protein conformational sampling. The methodology was tested on the benchmark CASP9 dataset of 116 targets. In 93% of the cases, a structure with TM‐score ≥ 0.5 is generated in the pool of decoys. Further, Bhageerath‐H Strgen performed efficiently in comparison with different decoy generation methods. The algorithm is available as the Bhageerath‐H Strgen web tool, which is freely accessible for protein decoy generation ( http://www.scfbio‐iitd.res.in/software/Bhageerath‐HStrgen1.jsp ). © 2013 Wiley Periodicals, Inc.
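The 93% figure is the share of targets whose decoy pool contains at least one model with TM-score ≥ 0.5. A minimal sketch of both pieces, assuming per-residue Cα distances for one fixed model/native superposition are already available: TM = (1/L) Σ 1/(1 + (d_i/d0)²) with d0 = 1.24(L − 15)^(1/3) − 1.8. The real TM-score program additionally searches over superpositions to maximize this sum; the 0.5 Å floor on d0 and the helper names are assumptions of this sketch.

```python
def tm_score(distances, target_length):
    """TM-score for ONE fixed superposition.

    distances: distances (Å) between aligned model and native Cα atoms.
    target_length: number of residues in the native (target) structure.
    """
    if target_length <= 15:
        raise ValueError("this sketch only handles target_length > 15")
    d0 = max(1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8, 0.5)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length

def fraction_with_good_decoy(pools, threshold=0.5):
    """Share of targets whose decoy pool has at least one model with TM-score >= threshold."""
    hits = sum(1 for scores in pools if max(scores) >= threshold)
    return hits / len(pools)
```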

5.
Database-assisted ab initio protein structure prediction methods have shown considerable promise in the recent past, with several implementations succeeding in community-wide experiments (CASP). We have employed combinatorial optimization techniques to solve the protein structure prediction problem. A Monte Carlo minimization algorithm was employed on a constrained search space to identify minimum-energy configurations. The search space is constrained using radius-of-gyration cutoffs, loop backbone dihedral probability distributions, and various secondary structure packing conformations. Simulations were carried out on several sequences, initially generating 1000 conformations, of which the 50 best candidates were selected as probable conformations. The search for the optimum was simplified by incorporating various geometrical constraints on secondary structural elements using distance restraint potential functions. The advantages of the reported methodology are its simplicity and its modifiability to include other geometric and probabilistic restraints.
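Monte Carlo minimization alternates a random perturbation with a local minimization and then applies the Metropolis criterion to the minimized energies. The sketch below illustrates that loop on a toy energy over a vector of variables (standing in for dihedrals); plain gradient descent stands in for the local minimizer, and the `is_allowed` callback is a hypothetical hook for constraints such as a radius-of-gyration cutoff. It is not the authors' implementation.

```python
import math
import random

def local_minimize(grad, x, step=0.01, iters=200):
    """Plain gradient descent as a stand-in local minimizer."""
    for _ in range(iters):
        g = grad(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

def monte_carlo_minimization(energy, grad, x0, n_steps=100, kT=1.0,
                             max_move=0.5, is_allowed=lambda x: True, seed=0):
    rng = random.Random(seed)
    x = local_minimize(grad, x0)
    e = energy(x)
    best_x, best_e = x, e
    for _ in range(n_steps):
        # Perturb every variable, then relax the trial into its local minimum.
        trial = local_minimize(grad, [xi + rng.uniform(-max_move, max_move) for xi in x])
        if not is_allowed(trial):  # reject moves outside the constrained search space
            continue
        e_trial = energy(trial)
        # Metropolis criterion applied to the locally minimized energies.
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / kT):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e
```

Keeping the best configuration seen, rather than only the current one, is what allows a run to report low-energy candidates for later selection.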

6.
Different protein architectures show strong similarities regardless of their amino acid composition: the backbone folds of the different secondary structural elements exhibit nearly identical geometries. To investigate the principles of folding and stability, oligopeptide models (that is, HCO-(NH-L-CHR-CO)ₙ-NH₂) have been studied. Previously, ab initio structure determinations have provided a small amount of information on the conformational building units of di- and tripeptides. A maximum of nine differently folded backbone types is available for any natural α-amino acid residue, with the exception of proline. All of these conformers have different relative energies. The present study compiles an ab initio database of optimized HCO-(L-Xxx)ₙ-NH₂ structures, where 1…

7.
Carbohydrate‐binding proteins (CBPs) are potential biomarkers and drug targets. However, the interactions between carbohydrates and proteins are challenging to study experimentally and computationally because of their low binding affinity, their high flexibility, and the lack of a linear sequence in carbohydrates such as exists in RNA, DNA, and proteins. Here, we describe a structure‐based function‐prediction technique called SPOT‐Struc that identifies carbohydrate‐recognizing proteins and their binding amino acid residues using the structural alignment program SPalign and binding‐affinity scoring with a knowledge‐based statistical potential based on the distance‐scaled finite‐ideal gas reference state (DFIRE). Leave‐one‐out cross‐validation of the method on 113 carbohydrate‐binding domains and 3442 non‐carbohydrate‐binding proteins yields a Matthews correlation coefficient of 0.56 for SPalign alone and 0.63 for SPOT‐Struc (SPalign + binding‐affinity scoring) for CBP prediction. SPOT‐Struc has a high positive predictive value (79% of all positive CBP predictions are correct) with reasonable sensitivity (52% of all CBPs are predicted as positive). The sensitivity of the method changed only slightly when applied to 31 apo (unbound) structures found in the Protein Data Bank (14/31 for apo versus 15/31 for holo). The results of SPOT‐Struc do not change significantly if highly homologous templates are used. SPOT‐Struc predicted 19 out of 2076 structural genomics targets as CBPs. In particular, one uncharacterized protein in Bacillus subtilis (1oq1A) was matched to galectin‐9 from Mus musculus. Thus, SPOT‐Struc is useful for uncovering novel carbohydrate‐binding proteins. SPOT‐Struc is available at http://sparks‐lab.org . © 2014 Wiley Periodicals, Inc.
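The three statistics quoted above come from the same confusion-matrix counts. A minimal sketch of the standard formulas; the counts in the checks are hypothetical, not the paper's.

```python
import math

def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when any marginal count is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def sensitivity(tp, fn):
    """Fraction of true CBPs that are predicted as CBPs."""
    return tp / (tp + fn)

def ppv(tp, fp):
    """Positive predictive value: fraction of positive predictions that are correct."""
    return tp / (tp + fp)
```

Because the negative set (3442 proteins) is far larger than the positive set (113 domains), accuracy alone would look high even for a trivial "never a CBP" predictor; MCC uses all four counts and does not reward that behavior.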

8.
Knowledge-based energy profiles combined with secondary structure prediction have been applied to molecular modelling refinement. To check the procedure, three different models of human procarboxypeptidase A2 (hPCPA2) have been built using the 3D structures of procarboxypeptidase A1 (pPCPA1) and bovine procarboxypeptidase A (bPCPA) as templates. The results of the refinement can be tested against the recently determined X-ray structure of hPCPA2. Mis-modelled regions in the activation segment of hPCPA2 were detected by means of pseudo-energies using ProSA II and then modified according to the secondary structure prediction. Moreover, models obtained by automated methods such as COMPOSER and MODELLER and by distance restraints were also compared, and the best model could be identified by means of pseudo-energies. Two general conclusions can be drawn from this work: (1) given a set of putative models, it is possible to identify the one closest to the crystallographic structure, and (2) within a given structure, it is possible to find, by means of pseudo-energies, the regions that have been poorly modelled.
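Locating mis-modelled regions from pseudo-energies amounts to smoothing a per-residue energy profile and flagging stretches that score badly. A minimal sketch, assuming per-residue pseudo-energies are already available from some knowledge-based potential and that, as in ProSA-style profiles, higher values indicate less native-like regions. ProSA II's actual potentials are not reproduced, and the window size and threshold are hypothetical.

```python
def window_profile(residue_energies, window=10):
    """Average per-residue pseudo-energies over a sliding window (one value per window start)."""
    n = len(residue_energies)
    if window > n:
        raise ValueError("window longer than the chain")
    return [sum(residue_energies[i:i + window]) / window for i in range(n - window + 1)]

def flag_suspect_regions(residue_energies, window=10, threshold=0.0):
    """(start, end) residue ranges, 0-based and end-exclusive, of windows whose mean exceeds threshold."""
    profile = window_profile(residue_energies, window)
    return [(i, i + window) for i, v in enumerate(profile) if v > threshold]

# Toy chain: a stretch of unfavorable (positive) energies in the middle.
energies = [-1.0] * 10 + [2.0] * 5 + [-1.0] * 10
print(flag_suspect_regions(energies, window=5))
```

Windowing suppresses single-residue noise so that only contiguous problem segments, such as a mis-modelled loop, stand out.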

9.
We have developed a new soft‐core potential function for the conformational search of complex systems with molecular dynamics. The potential function was designed to maintain the main equilibrium properties of the original force field, so that, unlike most previous soft‐core potential functions, it gives physically realistic behavior even without additional restraints. The performance of the method was demonstrated by applying it to the problem of finding native conformations of protein loops. Short loops from neocarzinostatin and parvalbumin were used as the first test cases. The new soft‐core potential function was shown to significantly improve the performance of molecular dynamics in searching for the native conformation of protein loops. © 2000 John Wiley & Sons, Inc. J Comput Chem 21: 388–397, 2000
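The abstract does not give the functional form of the new potential. To illustrate only the general soft-core idea, the sketch below uses a generic soft-core Lennard-Jones form in which r⁶ is replaced by r⁶ + ασ⁶, so the energy stays finite as atoms overlap and molecular dynamics can pass atoms through one another. Unlike the potential described above, this generic form does shift the equilibrium properties of the original force field; `alpha` is a hypothetical softness parameter.

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Standard 12-6 Lennard-Jones potential (diverges as r -> 0)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 ** 2 - sr6)

def soft_core_lj(r, epsilon=1.0, sigma=1.0, alpha=0.5):
    """Generic soft-core LJ: r^6 -> r^6 + alpha*sigma^6, so V(0) stays finite.

    alpha = 0 recovers the ordinary Lennard-Jones potential for r > 0.
    """
    s6 = sigma ** 6
    x = s6 / (r ** 6 + alpha * s6)
    return 4.0 * epsilon * (x ** 2 - x)

print(soft_core_lj(0.0))  # 8.0, finite at full overlap
```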

10.
A neural network method was applied to predicting the content of protein secondary structure elements, including α-helix, β-strand, β-bridge, 3₁₀-helix, π-helix, H-bonded turn, bend, and random coil. The "pair-coupled amino acid composition" originally proposed by K. C. Chou [J Protein Chem 1999, 18, 473] was adopted as the input. Self-consistency and independent-dataset tests were used to appraise the performance of the neural network. The results of both tests indicated high performance of the method.
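A sequence-composition input of this kind turns a protein of any length into a fixed-size vector for the network. The sketch below is a simplified stand-in, not necessarily Chou's exact descriptor: it counts residue pairs (seq[i], seq[i+gap]) and normalizes them into a 400-dimensional vector, where gap=1 gives ordinary dipeptide composition. The `gap` parameter and function names are assumptions.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]

def pair_composition(seq, gap=1):
    """400-dim vector of normalized frequencies of residue pairs (seq[i], seq[i+gap])."""
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for i in range(len(seq) - gap):
        pair = seq[i] + seq[i + gap]
        if pair in counts:  # skip non-standard residues such as 'X'
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in PAIRS]
```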

11.
Energy‐based refinement of protein structures generated by fold prediction algorithms to atomic‐level accuracy remains a major challenge in structural biology. Energy‐based refinement depends mainly on two components: (1) sufficiently accurate force fields and (2) efficient conformational space search algorithms. Focusing on the latter, we developed a high‐resolution refinement algorithm called GRID. It takes a three‐dimensional protein structure as input and, using an all‐atom force field, attempts to improve the energy of the structure by systematically perturbing backbone dihedrals and side‐chain rotamer conformations. We compare GRID with Backrub, a stochastic algorithm that has been shown to predict a significant fraction of the conformational changes that occur with point mutations. We applied GRID and Backrub to 10 high‐resolution (≤ 2.8 Å) crystal structures from the Protein Data Bank and measured the energy improvements obtained and the computation times required to achieve them. GRID achieved energy improvements significantly better than those attained by Backrub while expending about the same amount of computational resources. Relative to the starting crystal structures, GRID produced relaxed structures with slightly higher backbone RMSDs than Backrub: the average RMSD was 0.25 ± 0.02 Å for GRID versus 0.14 ± 0.04 Å for Backrub. These relatively minor deviations indicate that both algorithms generate structures that retain their original topologies, as expected given the nature of the algorithms. © 2012 Wiley Periodicals, Inc.
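Systematic perturbation differs from a stochastic method like Backrub in that every degree of freedom is tried at fixed offsets, and any move that lowers the energy is kept. A minimal sketch of such a greedy sweep over backbone dihedrals, assuming a toy torsional energy with hypothetical target angles; GRID's actual step sizes, rotamer handling, and all-atom force field are not reproduced.

```python
import math

def systematic_refine(energy, angles, offsets=(-10.0, -5.0, 5.0, 10.0), max_sweeps=20):
    """Greedy systematic search: try fixed offsets (degrees) on each dihedral in turn,
    keep any change that lowers the energy, and stop when a full sweep finds no improvement."""
    angles = list(angles)
    best = energy(angles)
    for _ in range(max_sweeps):
        improved = False
        for k in range(len(angles)):
            for delta in offsets:
                trial = angles.copy()
                trial[k] = (trial[k] + delta + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
                e = energy(trial)
                if e < best:
                    angles, best, improved = trial, e, True
        if not improved:
            break
    return angles, best

# Toy torsional energy with minima at hypothetical target angles.
targets = [-60.0, -45.0, 120.0]

def toy_energy(a):
    return sum(1.0 - math.cos(math.radians(x - t)) for x, t in zip(a, targets))

print(systematic_refine(toy_energy, [0.0, 0.0, 0.0]))
```

Because only downhill moves are accepted, such a search is deterministic and stays near the starting structure, which is consistent with the small RMSD changes reported above.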

12.
Atomization reactions are among the most challenging tests for electronic structure methods. We use the first‐principles Weizmann‐4 (W4) computational thermochemistry protocol to generate the W4‐17 dataset of 200 total atomization energies (TAEs) with 3σ confidence intervals of 1 kJ mol−1. W4‐17 is an extension of the earlier W4‐11 dataset; it includes first‐ and second‐row molecules and radicals with up to eight non‐hydrogen atoms. These cover a broad spectrum of bonding situations and multireference character, and as such they are an excellent benchmark for the parameterization and validation of highly accurate ab initio methods (e.g., CCSD(T) composite procedures) and double‐hybrid density functional theory (DHDFT) methods. The W4‐17 dataset contains two subsets: (i) a non‐multireference subset of 183 systems characterized by dynamical or moderate nondynamical correlation effects (denoted W4‐17‐nonMR) and (ii) a highly multireference subset of 17 systems (W4‐17‐MR). We use these databases to evaluate the performance of a wide range of CCSD(T) composite procedures (e.g., G4, G4(MP2), G4(MP2)‐6X, ROG4(MP2)‐6X, CBS‐QB3, ROCBS‐QB3, CBS‐APNO, ccCA‐PS3, W1, W2, W1‐F12, W2‐F12, W1X‐1, and W2X) and DHDFT methods (e.g., B2‐PLYP, B2GP‐PLYP, B2K‐PLYP, DSD‐BLYP, DSD‐PBEP86, PWPB95, ωB97X‐2(LP), and ωB97X‐2(TQZ)). © 2017 Wiley Periodicals, Inc.
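Evaluating a method against a benchmark like this usually reduces to error statistics over the reference TAEs, computed for the full set and separately for each subset. A minimal sketch of common choices (mean signed, mean absolute, and root-mean-square deviation); the molecule keys and energies in the checks are hypothetical, and the paper's exact statistics are not specified in the abstract.

```python
import math

def error_stats(computed, reference):
    """Mean signed, mean absolute, and root-mean-square deviation (same units as inputs, e.g. kJ/mol)."""
    errors = [computed[k] - reference[k] for k in reference]
    n = len(errors)
    msd = sum(errors) / n
    mad = sum(abs(e) for e in errors) / n
    rmsd = math.sqrt(sum(e * e for e in errors) / n)
    return msd, mad, rmsd

def subset(values, names):
    """Restrict a {molecule: energy} mapping to one subset, e.g. the MR molecules."""
    return {k: values[k] for k in names}
```

Reporting the nonMR and MR subsets separately keeps the 17 difficult multireference systems from dominating, or being hidden by, the overall average.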

13.
An effective approach is proposed to estimate liquids' contact angles on five commonly used plastics with pillar‐like structures: polyethylene terephthalate, polypropylene, high‐density polyethylene, low‐density polyethylene, and polyvinyl chloride. Changes in a liquid droplet's three‐phase contact line due to surface roughness have been proposed in the literature. In this article, the contact length ratio, σ, was used as a parameter corresponding to a specific dimension of the pillar‐like structure. The wettability of these rough plastics and their surface free energy were investigated using liquids of various polarities: deionized water (polar), ethylene glycol (monopolar), and α‐bromonaphthalene (apolar). The effects of pillar‐like structures on liquids' contact angles and plastics' surface free energy were studied, and the results reveal that both effects are linear in the range σ = 1.0 to 1.96. Linear regression models are therefore proposed to predict liquids' contact angles, and their accuracy is confirmed by errors of less than 6% for most plastic–liquid combinations. Plastics' surface free energy is also predicted by linear regression models, and the results agree with existing experimental data. Plastic–liquid interactions were also studied, and the results further validate the predictions of plastics' surface free energy. In addition, the alteration of the plastics' polarity due to the pillar‐like structures is analyzed and reported in this article. Copyright © 2014 John Wiley & Sons, Ltd.
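Because the contact angle varies linearly with σ over 1.0 to 1.96, each plastic–liquid combination can be modelled by an ordinary least-squares line θ = a + bσ and checked by percent error. A minimal sketch; the data points in the checks are hypothetical, since the fitted coefficients are not given in the abstract.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def percent_error(predicted, measured):
    """Relative deviation of a predicted contact angle from the measured one, in percent."""
    return 100.0 * abs(predicted - measured) / abs(measured)
```

A fit should only be trusted inside the σ range used to build it; the linearity reported above is not claimed outside 1.0 to 1.96.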

14.
15.
An important unsolved problem in molecular and structural biology is the protein folding and structure prediction problem. One major bottleneck in solving it is the lack of an accurate energy function to discriminate near‐native conformations from other possible conformations. Here we have developed the sDFIRE energy function, an optimized linear combination of DFIRE (the Distance‐scaled Finite Ideal gas Reference state based Energy), orientation‐dependent (polar‐polar and polar‐nonpolar) statistical potentials, and matching scores between predicted and model structural properties, including predicted main‐chain torsion angles and solvent‐accessible surface area. The weights for these scoring terms are optimized on three widely used decoy sets comprising a total of 134 proteins. Independent tests on CASP8 and CASP9 decoy sets indicate that sDFIRE outperforms other state‐of‐the‐art energy functions both in selecting near‐native structures and in the Pearson correlation coefficient between the energy score and the structural accuracy of the model (measured by TM‐score). © 2016 Wiley Periodicals, Inc.
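A linear combination of scoring terms and its evaluation by correlation with TM-score can be sketched in a few lines. The term names and weights below are hypothetical placeholders, not sDFIRE's optimized values. Since lower energies should accompany more accurate models, a useful energy function shows a negative correlation with TM-score.

```python
import math

def combined_energy(terms, weights):
    """Weighted linear combination of per-model energy terms (term name -> value)."""
    return sum(weights[name] * value for name, value in terms.items())

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

In use, one would compute `combined_energy` for every decoy of a target and correlate those values with the decoys' TM-scores.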

16.
Proteins are often characterized in terms of their primary, secondary, tertiary, and quaternary structure. Algorithms such as DSSP (Define Secondary Structure of Proteins) can automatically assign protein secondary structure based on the backbone hydrogen‐bonding pattern. However, assigning secondary structure elements (SSEs) becomes a challenge when only the Cα coordinates are available. In this work, we present PCASSO (Protein C‐Alpha Secondary Structure Output), a fast and accurate program for assigning protein SSEs using only the Cα positions. PCASSO achieves ~95% accuracy with respect to DSSP and takes ~0.1 s on a single processor to analyze a 1000‐residue system with multiple chains. Our approach was compared with current state‐of‐the‐art Cα‐based methods and was found to outperform all of them in both speed and accuracy. A practical application is also presented and discussed. © 2014 Wiley Periodicals, Inc.
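Without backbone hydrogen bonds, a Cα-only method has to infer secondary structure from geometry. The abstract does not describe PCASSO's feature set or classifier. The sketch below computes one plausible kind of Cα-only feature, the distances Cα(i) to Cα(i+k), which separate helices from strands well: in an α-helix Cα(i) to Cα(i+3) is roughly 5 Å, while in an extended strand it is roughly 10 Å. The offsets chosen are assumptions.

```python
import math

def ca_distance_features(ca_coords, offsets=(2, 3, 4)):
    """For each residue i, distances (Å) from Cα(i) to Cα(i+k) for each k in offsets;
    None where i+k runs past the chain end."""
    feats = []
    for i in range(len(ca_coords)):
        row = []
        for k in offsets:
            j = i + k
            row.append(math.dist(ca_coords[i], ca_coords[j]) if j < len(ca_coords) else None)
        feats.append(row)
    return feats
```

Such per-residue feature rows could then be fed to any classifier trained against DSSP labels. `math.dist` requires Python 3.8 or later.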

17.
A Hessian‐free low‐mode search algorithm has been developed for large‐scale conformational searching. The new method, termed LLMOD, uses the ARPACK package to compute low‐mode eigenvectors of a Hessian matrix that is referenced only implicitly, through its product with a series of vectors. The Hessian × vector product is calculated using a finite‐difference formula based on gradients. LLMOD is the first conformational search method that can be applied to fully flexible, unconstrained protein structures for complex loop optimization problems. LLMOD has been tested on a particularly difficult model system, c‐Jun N‐terminal kinase JNK3. We demonstrate that LLMOD was able to correct a P38/ERK2/HCL‐based homology model that grossly misplaced the crucial glycine‐rich loop in the ATP‐binding site. © 2000 John Wiley & Sons, Inc. J Comput Chem 22: 21–30, 2001
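The key trick is that an iterative eigensolver only needs the product Hv, which a finite difference of two gradient evaluations provides without ever forming the Hessian. The abstract does not say which difference formula LLMOD uses. The sketch below assumes a central difference, Hv ≈ [g(x + hv) − g(x − hv)] / (2h), and checks it on a quadratic energy whose Hessian is known exactly.

```python
def hessian_vector_product(grad, x, v, h=1e-5):
    """Approximate H(x) @ v by central differences of the gradient.

    Only two gradient calls are needed; the N x N Hessian is never formed.
    """
    xp = [xi + h * vi for xi, vi in zip(x, v)]
    xm = [xi - h * vi for xi, vi in zip(x, v)]
    gp, gm = grad(xp), grad(xm)
    return [(a - b) / (2.0 * h) for a, b in zip(gp, gm)]

# Quadratic test energy E = x0^2 + 3*x0*x1 + 2*x1^2, whose Hessian is [[2, 3], [3, 4]].
def grad_quadratic(x):
    return [2.0 * x[0] + 3.0 * x[1], 3.0 * x[0] + 4.0 * x[1]]

print(hessian_vector_product(grad_quadratic, [0.3, -0.7], [1.0, 0.0]))  # close to [2.0, 3.0]
```

In Python, SciPy's `scipy.sparse.linalg.eigsh` (an ARPACK wrapper) accepts a `LinearOperator` whose `matvec` could wrap a function like this, which mirrors how a Hessian-free low-mode search can obtain the lowest modes.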

18.
Poly(aniline) is a subject of considerable scientific and technological interest. Its homologs, such as poly(m‐chloroaniline), potentially offer similar physicochemical properties. In this work we present a comparative theoretical study of aniline and m‐chloroaniline species at several levels of theory. To elucidate the possible mechanism of polymerization, we have obtained geometries and electronic structures for the monomers and dimers as well as the corresponding cations and dications. Based on the optimized monomer geometries, atomic charges, bond orders, and spin densities, head‐to‐tail coupling is suggested for the electrochemical polymerization. We have also calculated band gaps and ionization potentials. For the cationic dimers of aniline and m‐chloroaniline, the energy difference between the highest occupied and lowest unoccupied molecular orbitals (HOMO–LUMO gap) is smaller, and oxidation at specific sites may be observed. © 2000 John Wiley & Sons, Inc. Int J Quant Chem 78: 99–111, 2000

19.
PreSSAPro is software, available to the scientific community as a free web service, designed to predict secondary structures from the amino acid sequence of a given protein. Predictions are based on our recently published work on amino acid propensities for secondary structures, both in large but heterogeneous protein data sets and in smaller but homogeneous data sets corresponding to protein structural classes, i.e., all-alpha, all-beta, or alpha–beta proteins. Predictions improve when propensities evaluated for the correct protein class are used. PreSSAPro predicts the secondary structure according to the correct protein class, if known, or otherwise gives a multiple prediction with reference to the different structural classes. Comparing these predictions provides a novel tool for evaluating which sequence regions can assume different secondary structures depending on the structural class assignment, with a view to identifying proteins able to fold into different conformations. The service is available at the URL http://bioinformatica.isa.cnr.it/PRESSAPRO/.
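A propensity-based predictor can be sketched as follows: average each state's propensity over a window around every residue and assign the state with the highest average, repeating with a class-specific table when the structural class is unknown. The propensity values, window size, and function names below are hypothetical, and PreSSAPro's published propensities and decision rule are not reproduced.

```python
# Hypothetical propensity table: values > 1 mean the residue is enriched in that state.
PROPENSITY = {
    "A": {"H": 1.4, "E": 0.8, "C": 0.8},
    "V": {"H": 0.9, "E": 1.7, "C": 0.7},
    "G": {"H": 0.6, "E": 0.8, "C": 1.5},
}

def predict_ss(seq, table, half_window=2):
    """Assign each residue the state with the highest propensity averaged over a window."""
    states = ("H", "E", "C")
    result = []
    for i in range(len(seq)):
        lo, hi = max(0, i - half_window), min(len(seq), i + half_window + 1)
        window = seq[lo:hi]
        avg = {s: sum(table[a][s] for a in window) / len(window) for s in states}
        result.append(max(states, key=avg.get))
    return "".join(result)

def predict_all_classes(seq, tables_by_class):
    """Multiple prediction, one per structural class, for when the class is unknown."""
    return {cls: predict_ss(seq, table) for cls, table in tables_by_class.items()}
```

Comparing the per-class outputs of `predict_all_classes` position by position highlights the regions whose predicted state depends on the class assignment.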

20.
The molecular geometry of strontium dichloride has been determined by high-temperature electron diffraction (ED) and computational techniques. The computation at the MP2 level of theory yields a shallow bending potential with a barrier of about 0.1 kcal mol⁻¹ at the linear configuration. The experimentally determined thermal-average Sr–Cl bond length, r_g, is 2.625 ± 0.010 Å and the bond angle, ∠_a, is 142.4 ± 4.0°. There is excellent agreement between the equilibrium bond lengths estimated from the experimental data, 2.607 ± 0.013 Å, and those computed at different levels of theory and with different basis sets, 2.605 ± 0.006 Å. Based on anharmonic analyses of the symmetric and asymmetric stretching as well as the bending motions of the molecule, we estimated the thermal-average structure from the computation for the temperature of the ED experiment. In order to emulate the effect of the matrix environment on the measured vibrational frequencies, a series of complexes with argon atoms, SrCl₂Arₙ (n = 1–7), with different geometrical arrangements were calculated. The complexes with six or seven argon atoms approximate the interaction best, and the computed frequencies of these molecules are closer to the experimental ones than those computed for the free SrCl₂ molecule.
