Similar literature
Found 20 similar documents; search time: 31 ms
1.
Advances in protein crystallography and homology modeling techniques are producing vast amounts of high-resolution protein structure data at ever-increasing rates. As such, the ability to quickly and easily extract structural similarities is a key tool in discovering important functional relationships. We report on an approach for creating and maintaining a database of pairwise structure alignments for a comprehensive database comprising the PDB and homology models for the human and select pathogen genomes. Our approach consists of a novel, multistage method for determining pairwise structural similarity coupled with an efficient clustering protocol that approximates a full N×N assessment in a fraction of the time. Since biologists are commonly interested in recently released structures, and the homology models built from them, an automatically updating database of structural alignments has great value. Our approach yields a querying system that allows scientists to retrieve databank-wide protein structure similarities as easily as retrieving protein sequence similarities via BLAST or PSI-BLAST. Basic, noncommercial access to the database can be requested at https://tip.eidogen-sertanty.com/.

2.
Protein function prediction is one of the central problems in computational biology. We present a novel automated protein structure-based function prediction method using libraries of local residue packing patterns that are common to most proteins in a known functional family. Critical to this approach is the representation of a protein structure as a graph where residue vertices (residue name used as a vertex label) are connected by geometrical proximity edges. The approach employs two steps. First, it uses a fast subgraph mining algorithm to find all occurrences of family-specific labeled subgraphs for all well-characterized protein structural and functional families. Second, it queries a new structure for occurrences of a set of motifs characteristic of a known family, using a graph index to speed up Ullmann’s subgraph isomorphism algorithm. The confidence of function inference from structure depends on the number of family-specific motifs found in the query structure compared with their distribution in a large non-redundant database of proteins. This method can assign a new structure to a specific functional family in cases where sequence alignments, sequence patterns, structural superposition and active site templates fail to provide accurate annotation.
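
As a rough illustration of the graph representation described in item 2, the Python sketch below builds a labeled residue graph from coordinates. The function name `build_residue_graph`, the one-point-per-residue input, and the 8.0 Å cutoff are assumptions made for this example; the abstract does not say how geometrical proximity is defined.

```python
from itertools import combinations
from math import dist


def build_residue_graph(residues, cutoff=8.0):
    """Build a labeled proximity graph.

    residues: list of (residue_name, (x, y, z)) tuples, one point per residue
              (an assumption; the abstract does not specify the representation).
    cutoff:   hypothetical distance threshold in angstroms.
    Returns (labels, edges): labels maps vertex index to residue name,
    edges is a set of (i, j) index pairs with i < j.
    """
    labels = {i: name for i, (name, _) in enumerate(residues)}
    edges = {
        (i, j)
        for (i, (_, a)), (j, (_, b)) in combinations(enumerate(residues), 2)
        if dist(a, b) <= cutoff
    }
    return labels, edges


# Toy input: three residues on a line; only the first two are close.
toy = [("ALA", (0.0, 0.0, 0.0)), ("GLY", (3.0, 0.0, 0.0)), ("SER", (20.0, 0.0, 0.0))]
labels, edges = build_residue_graph(toy)
```

Motif search would then look for labeled subgraphs of such a graph; that step is not sketched here.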

4.
Generalization of an earlier algorithm has led to the development of new local structural alignment algorithms for prediction of protein-protein binding sites. The algorithms use maximum cliques on protein graphs to define structurally similar protein regions. The search for structural neighbors in the new algorithms has been extended to all the proteins in the PDB and the query protein is compared to more than 60,000 proteins or over 300,000 single-chain structures. The resulting structural similarities are combined and used to predict the protein binding sites. This study shows that the location of protein binding sites can be predicted by comparing only local structural similarities irrespective of general protein folds.
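
To make the term "maximum clique" concrete, here is a minimal Python sketch using the basic Bron-Kerbosch recursion on a small undirected graph. This is a generic textbook technique chosen for illustration, not the algorithm of the paper, and the toy adjacency data is hypothetical; the abstract does not describe how the protein graphs are constructed.

```python
def max_clique(adj):
    """Return one largest clique of an undirected graph.

    adj: dict mapping each vertex to the set of its neighbours.
    Uses the basic Bron-Kerbosch recursion (no pivoting), which is
    adequate for small illustrative graphs only.
    """
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            # r is a maximal clique; keep it if it is the largest so far.
            if len(r) > len(best):
                best = sorted(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return best


# Toy graph: vertices 0, 1, 2 form a triangle; vertex 3 hangs off vertex 2.
toy_adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
largest = max_clique(toy_adj)
```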

5.
Point Accepted Mutation (PAM) is the Markov model of amino acid replacements in proteins introduced by Dayhoff and her co-workers (Dayhoff et al., 1978). The PAM matrices and other matrices based on the PAM model have been widely accepted as the standard scoring system of protein sequence similarity in protein sequence alignment tools. Here, we present Contact Accepted mutatiOn (CAO), a Markov model of protein residue contact mutations. The CAO model simulates the interchanging of structurally defined side-chain contacts, and introduces additional structural information into protein sequence alignments. Therefore, similarities between structurally conserved sequences can be detected even without apparent sequence similarity. CAO has been benchmarked on the HOMSTRAD database and a subset of the CATH database, by comparing sequence alignments with reference alignments derived from structural superposition. CAO yields scores that coherently reflect the structural quality of sequence alignments, which has implications particularly for homology modelling and threading techniques.
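
Because a PAM-style model is a Markov chain, substitution probabilities over n steps follow from the n-th power of the one-step transition matrix. The Python sketch below shows this on a hypothetical two-letter alphabet; the matrix values are invented for illustration and are not taken from PAM or CAO.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, steps):
    """Raise a square matrix to a non-negative integer power by repeated multiplication."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(steps):
        result = mat_mul(result, m)
    return result


# Hypothetical one-step transition matrix: each row sums to 1.
toy_step = [[0.99, 0.01],
            [0.02, 0.98]]

# Transition probabilities after 250 steps (analogous to going from a
# one-step matrix to a long-distance matrix in the PAM family).
toy_250 = mat_pow(toy_step, 250)
```

Each row of the result still sums to 1, so it remains a valid set of transition probabilities.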

6.
The ProBiS algorithm performs a local structural comparison of the query protein surface against the nonredundant database of protein structures. It finds proteins that have binding sites in common with the query protein. Here, we present a new parallelized algorithm, Parallel‐ProBiS, for detecting similar binding sites on clusters of computers. The speedups obtained with Parallel‐ProBiS scale almost ideally with the number of computing cores up to about 64 cores. Scaling is better for larger than for smaller query proteins. For a protein with almost 600 amino acids, the maximum speedup of 180 was achieved on two interconnected clusters with 248 computing cores. Source code of Parallel‐ProBiS is available for download free for academic users at http://probis.cmm.ki.si/download. © 2012 Wiley Periodicals, Inc.
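
The figures quoted in this abstract can be turned into a parallel efficiency (speedup divided by core count). The short Python sketch below does that arithmetic; the function name `parallel_efficiency` is made up for this example.

```python
def parallel_efficiency(speedup, cores):
    """Parallel efficiency: achieved speedup as a fraction of the ideal (linear) speedup."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return speedup / cores


# Figures from the abstract: speedup of 180 on 248 computing cores.
efficiency = parallel_efficiency(180, 248)  # roughly 0.73
```

An efficiency of about 0.73 at 248 cores is consistent with the statement that scaling is close to ideal only up to about 64 cores.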

7.
Integration of knowledge on the sequence-structure correlation of proteins provides a basis for the structural design of novel artificial proteins. One effective strategy is to consider a short segment, intermediate in size between an amino acid and a domain, as the correlation unit for exploring the structure-to-sequence relationship. Here we report the development of a database called ProSeg, which consists of two sub-databases, Segment DB and Cluster DB. Segment DB contains tens of thousands of segments that were prepared by dividing the primary sequences of 370 proteins using a sliding L-residue window (L = 5, 9, 11, 15). These segments were classified into several thousand clusters according to their three-dimensional structural resemblance. Cluster DB contains cluster-related information, including image, rank, frequency, secondary structure assignment, sequence profile, etc. Users can search for a suitable cluster by inputting an appropriate parameter (i.e., PDB ID, dihedral angles, or DSSP symbols), which identifies the backbone structure of a query segment. Analogous to a language, ProSeg could be regarded as a ‘structure-sequence dictionary’ that contains over 10,000 ‘protein words’. ProSeg is freely accessible through the Internet.
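
The sliding-window segmentation mentioned in this abstract can be sketched in a few lines of Python. The function name `sliding_segments`, the step of one residue, and the example sequence are assumptions for illustration; the abstract gives only the window lengths (L = 5, 9, 11, 15).

```python
def sliding_segments(sequence, length):
    """Return every contiguous segment of the given length, moving one residue at a time."""
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


# Hypothetical 16-residue sequence, cut with the window lengths from the abstract.
example = "ACDEFGHIKLMNPQRS"
segments_by_length = {L: sliding_segments(example, L) for L in (5, 9, 11, 15)}
```

A sequence of n residues yields n - L + 1 segments of length L, and none if it is shorter than the window.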

8.
9.
Protein-protein interactions are central to most biological processes and represent a large and important class of targets for human therapeutics. Small molecules containing peptide substituents may mimic regions of interacting proteins and inhibit their interactions. We set out to develop efficient methods to screen for similarities between known peptide structures within proteins and small molecules. We developed a method to rank peptide-compound similarities that is restricted to small linear motifs in proteins and to compounds containing amino acid substituents. Application to a search of the PubChem database (5.4 million compounds) using all short motifs on accessible surface areas in a nonredundant set of 11 488 peptides from the protein structure database PDB demonstrated the feasibility of the method for high-throughput comparisons and the availability of compounds with comparable substituents: over 6 million compound-peptide pairs shared at least three amino acid substituents, approximately 100 000 of which had an rmsd score of less than 1 Å. A Z-score function was developed that compares matches of a compound to different instances of the peptide motif in PDB, providing an appropriate scoring function for comparison among peptide-compound similarities involving different numbers of atoms (while simultaneously enriching for similarities that are likely to be more specific for the protein of interest). We applied the method to searches of known short protein motifs against the National Cancer Institute Developmental Therapeutic Program compound database, identifying a known true positive.

10.
Shape-based methods for aligning and scoring ligands have proven to be valuable in the field of computer-aided drug design. Here, we describe a new shape-based flexible ligand superposition and virtual screening method, Phase Shape, which is shown to rapidly produce accurate 3D ligand alignments and efficiently enrich actives in virtual screening. We describe the methodology, which is based on the principle of atom distribution triplets to rapidly define trial alignments, followed by refinement of top alignments to maximize the volume overlap. The method can be run in a shape-only mode or it can include atom types or pharmacophore feature encoding, the latter consistently producing the best results for database screening. We apply Phase Shape to flexibly align molecules that bind to the same target and show that the method consistently produces correct alignments when compared with crystal structures. We then illustrate the effectiveness of the method for identifying active compounds in virtual screening of eleven diverse targets. Multiple parameters are explored, including atom typing, query structure conformation, and the database conformer generation protocol. We show that Phase Shape performs well in database screening calculations when compared with other shape-based methods using a common set of actives and decoys from the literature.

11.
12.
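Target selection for structural genomics was carried out on the bacterium 泉生热袍菌 (a Thermotoga species). Twenty proteins were chosen from its proteome as the first batch of targets for structure determination, with the aim of discovering new protein folds. Target selection relied mainly on BLAST searches, PSI-BLAST searches, and searches of the ProtoNet database. In addition, the PredictProtein program was used to predict the secondary structure and overall shape of the selected proteins. Of the 20 selected proteins, 8 were cloned, expressed, and purified, and 2 of these yielded single crystals from which X-ray diffraction data were collected. The experimental results, together with results recently reported in the literature, indicate that some of the selected proteins have novel folds, which supports the effectiveness of this target selection strategy.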

13.
We present a new method (fFLASH) for the virtual screening of compound databases that is based on explicit three-dimensional molecular superpositions. fFLASH takes the torsional flexibility of the database molecules fully into account, and can deal with an arbitrary number of conformation-dependent molecular features. The method utilizes a fragmentation-reassembly approach which allows for an efficient sampling of the conformational space. A fast clique-based pattern matching algorithm generates alignments of pairs of adjacent molecular fragments on the rigid query molecule that are subsequently reassembled to complete database molecules. Using conventional molecular features (hydrogen bond donors and acceptors, charges, and hydrophobic groups) we show that fFLASH is able to rapidly produce accurate alignments of medium-sized drug-like molecules. Experiments with a test database containing a diverse set of 1780 drug-like molecules (including all conformers) have shown that average query processing times of the order of 0.1 seconds per molecule can be achieved on a PC.

14.
Despite recent advances in fold recognition algorithms that identify template structures with distant homology to the target sequence, the quality of the target-template alignment can be a major problem for distantly related proteins in comparative modeling. Here we report for the first time on the use of ensembles of pairwise alignments obtained by stochastic backtracking as a means to improve three-dimensional comparative protein models. In every one of the 35 cases, the ensemble produced by the program probA resulted in alignments that were closer to the structural alignment than those obtained from the optimal alignment. In addition, we examined the lowest energy structure among these ensembles from four different structural assessment methods and compared these with the optimal and structural alignment model. The structural assessment methods consisted of the DFIRE, DOPE, and ProsaII statistical potential energies and the potential energy from the CHARMM protein force field coupled to a Generalized Born implicit solvent model. The results demonstrate that the generation of alignment ensembles through stochastic backtracking using probA combined with one of the statistical potentials for assessing three-dimensional structures can be used to improve comparative models.

15.
Protein sequence space is vast compared to protein fold space. This raises important questions about how structures adapt to evolutionary changes in protein sequences. A growing trend is to regard protein fold space as a continuum rather than a series of discrete structures. From this perspective, homologous protein structures within the same functional classification should reveal a constant rate of structural drift relative to sequence changes. The clusters of orthologous groups (COG) classification system was used to annotate homologous bacterial protein structures in the Protein Data Bank (PDB). The structures and sequences of proteins within each COG were compared against each other to establish their relatedness. As expected, the analysis demonstrates a sharp structural divergence between the bacterial phyla Firmicutes and Proteobacteria. Additionally, each COG had a distinct sequence/structure relationship, indicating that different evolutionary pressures affect the degree of structural divergence. However, our analysis also shows that the relative drift rate between sequence identity and structure divergence remains constant.

16.
Prediction of the binding mode for a series of active compounds, in the absence of known protein structure, is a problem of paramount importance in rational drug design. GAPE (genetic algorithm for pharmacophore elucidation) is an automated multicompound overlay creation program, based on the original GASP program, that uses a genetic algorithm to fully explore the conformational space of the input structures and their alignments, so as to elucidate a pharmacophore. The software was evaluated on 13 test systems from nine protein targets using overlaid ligands extracted from the PDB. Using objective rmsd criteria and starting from 2D structures, in the absence of any protein information, GAPE was observed in eight systems to approximate the crystallographically observed binding mode. In the predicted alignments for each of those eight systems, at least half the input structures were within 2 Å rmsd of the crystal structure coordinates. Further analysis, using stricter subjective criteria, showed considerable success in five systems. For example, the prediction for a set of 12 ligands targeting P38 had 11 ligands with a 1.8 Å rmsd to crystal structure coordinates. Finally, the algorithm was favorably compared with the current GASP and Galahad programs.
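
The rmsd criterion used in this abstract can be illustrated with a short Python sketch. It assumes the predicted and reference coordinates are already in the same frame and list atoms in the same order (no superposition is performed); the function names and the toy numbers are hypothetical.

```python
from math import sqrt


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum((a - b) ** 2
                for p, q in zip(coords_a, coords_b)
                for a, b in zip(p, q))
    return sqrt(total / len(coords_a))


def fraction_within(rmsd_values, threshold):
    """Fraction of ligands whose rmsd is at or below the threshold (same units as the rmsd)."""
    return sum(1 for v in rmsd_values if v <= threshold) / len(rmsd_values)


# Toy example: every atom displaced by 2 units along z gives an rmsd of 2.0.
shifted = rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
               [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)])
```

A system would meet the "at least half within 2 Å" criterion when `fraction_within(values, 2.0)` is 0.5 or more.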

17.
Computational tools can bridge the gap between sequence and protein 3D structure based on the notion that information is to be retrieved from the databases and that knowledge-based methods can help in approaching a solution of the protein-folding problem. To this end, our group has implemented neural network-based predictors capable of performing with some success in different tasks, including predictions of the secondary structure of globular and membrane proteins, the topology of membrane proteins and porins, and stable alpha-helical segments suited for protein design. Moreover, we have developed methods for predicting contact maps in proteins and the probability of finding a cysteine in a disulfide bridge, tools which can contribute to the goal of predicting the 3D structure starting from the sequence (the so-called ab initio prediction). All our predictors take advantage of evolutionary information derived from the structural alignments of homologous (evolutionarily related) proteins and taken from the sequence and structure databases. When it is necessary to build models for proteins of unknown spatial structure that have very little homology with other proteins of known structure, non-standard techniques need to be developed, and the tools for protein structure prediction may help in protein modeling. The results of a recent simulation performed in our lab highlight the role of high-performance computing technology and of computational biology tools in protein modeling and peptidomimetic design.

18.
19.
KiBank is a database of inhibition constant (Ki) values with 3D structures of target proteins and chemicals. Ki values were accumulated from peer-reviewed literature searched via PubMed. The 3D structure files of target proteins were originally from Protein Data Bank (PDB), while the 2D structure files of the chemicals were collected together with the Ki values and then converted into 3D ones. In KiBank, the chemical and protein 3D structures with hydrogen atoms were optimized by energy minimization and stored in MDL MOL and PDB format, respectively.

KiBank is designed to support structure-based drug design. It provides structure files of proteins and chemicals ready for use in virtual screening through automated docking methods, while the Ki values can be applied for tests of docking/scoring combinations, program parameter settings, and calibration of empirical scoring functions. Additionally, the chemical structures and corresponding Ki values in KiBank are useful for lead optimization based on quantitative structure–activity relationship (QSAR) techniques.

KiBank is updated on a daily basis and is freely available at http://kibank.iis.u-tokyo.ac.jp/. As of August 2004, KiBank contains 8000 Ki values, over 6000 chemicals and 166 proteins covering the subtypes of receptors and enzymes.


20.
Electron transfer dissociation (ETD)-based top-down mass spectrometry (MS) is the method of choice for in-depth structure characterization of large peptides, small- and medium-sized proteins, and non-covalent protein complexes. Here, we describe the performance of this approach for structural analysis of intact proteins as large as the 80 kDa serotransferrin. Current time-of-flight (TOF) MS technologies ensure adequate resolution and mass accuracy to simultaneously analyze intact 30–80 kDa protein ions and the complex mixture of their ETD product ions. We show that ETD TOF MS is efficient and may provide extensive sequence information for unfolded and highly charged (around 1 charge/kDa) proteins of ~30 kDa and structural motifs embedded in larger proteins. Sequence regions protected by disulfide bonds within intact non-reduced proteins oftentimes remain uncharacterized due to the low efficiency of their fragmentation by ETD. For serotransferrin, reduction of S–S bonds leads to a significantly different ETD fragmentation pattern with higher sequence coverage of N- and C-terminal regions, providing complementary structural information to top-down analysis of its oxidized form.
Figure: ETD TOF MS provides extensive sequence information for unfolded and highly charged proteins of ~30 kDa and above. In addition to charge number and distribution along the protein, disulfide bonds direct ETD fragmentation. For intact non-reduced 80 kDa serotransferrin, sequence regions protected by disulfide bonds oftentimes remain uncharacterized. Reduction of disulfide bonds of serotransferrin increases ETD sequence coverage of its N- and C-terminal regions, providing complementary structural information to the top-down analysis of its oxidized form.
