Similar literature
 20 similar documents found (search time: 46 ms)
1.
2.
This paper presents a novel method of mining biological data using a self-organizing map (SOM). After partitioning a set of protein sequences using SOM, conventional homology alignment is applied to each cluster to determine the conserved local motif (biological pattern) for the cluster. These local motifs are then regarded as rules for prediction and classification. In the application to the prediction of HIV protease cleavage sites in proteins, we found that the rules derived from this method are much more robust than those derived from the decision tree method.

3.
Summary: A new database of conserved amino acid residues is derived from the multiple sequence alignment of over 84 families of protein sequences that have been reported in the literature. This database contains sequences of conserved hydrophobic core patterns which are probably important for structure and function, since they are conserved for most sequences in that family. This database differs from other single-motif or signature databases reported previously, since it contains multiple patterns for each family. The new database is used to align a new sequence with the conserved regions of a family. This is analogous to reports in the literature where multiple sequence alignments are used to improve a sequence alignment. A program called Homology-Plot (suitable for IBM or compatible computers) uses this database to find homology of a new sequence to a family of protein sequences. There are several advantages to using multiple patterns. First, the program correctly identifies a new sequence as a member of a known family. Second, the search of the entire database is rapid and requires less than one minute. This is similar to performing a multiple sequence alignment of a new sequence to all of the known protein family sequences. Third, the alignment of a new sequence to family members is reliable and can reproduce the alignment of conserved regions already described in the literature. The speed and efficiency of this method are enhanced, since there is no need to score for insertions or deletions as is done in the more commonly used sequence alignment methods. In this method only the patterns are aligned. HomologyPlot also provides general information on each family, as well as a listing of patterns in a family.
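
The ungapped pattern matching described in this abstract can be sketched as follows. This is only an illustration and not the Homology-Plot implementation: the pattern syntax (a lowercase `x` as a wildcard), the function names `find_pattern` and `family_hits`, and the example sequence are all assumptions made here.

```python
def find_pattern(seq, pattern, wildcard="x"):
    """Return every start index where `pattern` matches `seq` without gaps.

    A wildcard position matches any residue; no insertions or deletions
    are scored, so each window is compared position by position.
    """
    hits = []
    for start in range(len(seq) - len(pattern) + 1):
        window = seq[start:start + len(pattern)]
        if all(p == wildcard or p == r for p, r in zip(pattern, window)):
            hits.append(start)
    return hits


def family_hits(seq, patterns):
    """Map each pattern of a (hypothetical) family to its match positions."""
    return {pattern: find_pattern(seq, pattern) for pattern in patterns}


if __name__ == "__main__":
    # Hypothetical query sequence and family patterns.
    print(family_hits("MAGLVKLLAGIV", ["LxxL", "AGxV"]))
```

Because only fixed-length windows are compared, the cost per pattern is linear in the sequence length, which is consistent with the speed advantage the abstract attributes to skipping insertion and deletion scoring.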

4.
Protein-protein interactions are central to most biological processes and represent a large and important class of targets for human therapeutics. Small molecules containing peptide substituents may mimic regions of interacting proteins and inhibit their interactions. We set out to develop efficient methods to screen for similarities between known peptide structures within proteins and small molecules. We developed a method to rank peptide-compound similarities that is restricted to small linear motifs in proteins and to compounds containing amino acid substituents. Application to a search of the PubChem database (5.4 million compounds) using all short motifs on accessible surface areas in a nonredundant set of 11 488 peptides from the protein structure database PDB demonstrated the feasibility of the method for high throughput comparisons and the availability of compounds with comparable substituents: over 6 million compound-peptide pairs shared at least three amino acid substituents, approximately 100 000 of which had an rmsd score of less than 1 Å. A Z-score function was developed that compares matches of a compound to different instances of the peptide motif in PDB, providing an appropriate scoring function for comparison among peptide-compound similarities involving different numbers of atoms (while simultaneously enriching for similarities that are likely to be more specific for the protein of interest). We applied the method to searches of known short protein motifs against the National Cancer Institute Developmental Therapeutic Program compound database, identifying a known true positive.
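
The abstract does not give the form of its Z-score function, so the sketch below uses the generic standard score as a stand-in. It compares one compound-to-motif rmsd with the rmsd values obtained for other instances of the same motif. The function name `rmsd_z_score`, the sign convention, and the example values are assumptions.

```python
from statistics import mean, pstdev


def rmsd_z_score(target_rmsd, background_rmsds):
    """Standard score of one rmsd against a background distribution.

    Positive values mean the target match is tighter (lower rmsd) than
    the typical match to other instances of the same motif.
    """
    mu = mean(background_rmsds)
    sigma = pstdev(background_rmsds)  # population standard deviation
    if sigma == 0:
        raise ValueError("background rmsd values have zero spread")
    return (mu - target_rmsd) / sigma


if __name__ == "__main__":
    background = [1.0, 2.0, 3.0]  # hypothetical rmsd values in angstroms
    print(rmsd_z_score(0.5, background))
```

Normalising by the spread of the background is what makes scores comparable across motifs of different sizes, which is the property the abstract asks of its scoring function.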

5.
6.
Nitrilases, members of the nitrilase superfamily, catalyse the hydrolysis of different nitriles to the corresponding amides and acids. In this article, we present a two-fold computational comparative analysis of the coding gene sequences, amino acid sequences, and three-dimensional structures of nitrilases from different species, and report conserved motifs linked with related species. A large ensemble-based dataset from bacteria, fungi, plants and animals was used. Here, we used comparative genomics, motif analyses and Bayesian phylogenetic analyses in combination with structural analyses [molecular dynamics simulation, principal component analysis (PCA), dynamic cross correlation (DCCM), root mean squared inner product (RMSIP), free energy surface (FES)] to investigate the evolution, ecological relationship and structure-function association of the nitrilase family. The inferred evolutionary tree displayed nitrilase gene clusters to be shared among bacteria, fungi and plants. Structural analysis revealed that the folding of catalytic sites is similar among species; however, the loop region varies. We provide evidence based on PCA that the nitrilases are clustered into different clades due to variation in side chains. Numerous significant correlations were found between sequence clades and the structural discriminating properties of nitrilases originating from different species. The results are consistent with the hypothesis that bacterial nitrilases are, to a large extent, in ecological and evolutionary relationships with fungi and plants during plant-pathogen interaction. These compact and detailed results also open new dimensions for further study and improvement of industrially important nitrilase enzymes.

7.
The evolvability of proteins is not only restricted by functional and structural importance, but also by other factors such as gene duplication, protein stability, and an organism's robustness. Recently, intrinsically disordered proteins (IDPs)/regions (IDRs) have been suggested to play a role in facilitating protein evolution. However, the mechanisms by which this occurs remain largely unknown. To address this, we have systematically analyzed the relationship between the evolvability, stability, and function of IDPs/IDRs. Evolutionary analysis shows that more recently emerged IDRs have higher evolutionary rates with more functional constraints relaxed (or experiencing more positive selection), and that this may have caused accelerated evolution in the flanking regions and in the whole protein. A systematic analysis of observed stability changes due to single amino acid mutations in IDRs and ordered regions shows that while most mutations induce a destabilizing effect in proteins, mutations in IDRs cause smaller stability changes than in ordered regions. The weaker impact of mutations in IDRs on protein stability may have advantages for protein evolvability in the gain of new functions. Interestingly, however, an analysis of functional motifs in the PROSITE and ELM databases showed that motifs in IDRs are more conserved, characterized by smaller entropy and lower evolutionary rate, than in ordered regions. This apparently opposing evolutionary effect may be partly due to the flexible nature of motifs in IDRs, which require some key amino acid residues to engage in tighter interactions with other molecules. Our study suggests that the unique conformational and thermodynamic characteristics of IDPs/IDRs play an important role in the evolvability of proteins to gain new functions.
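
The abstract characterizes motif conservation by entropy. One common way to compute this is the per-column Shannon entropy of aligned motif instances, sketched below. Whether the study used exactly this definition is not stated; the function names and the example motif instances are assumptions.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (in bits) of the residues in one alignment column."""
    total = len(column)
    counts = Counter(column)
    return sum(-(n / total) * log2(n / total) for n in counts.values())


def motif_entropy(aligned_instances):
    """Per-position entropy for equal-length, already aligned motif instances."""
    if len({len(s) for s in aligned_instances}) != 1:
        raise ValueError("all motif instances must have the same length")
    return [column_entropy(col) for col in zip(*aligned_instances)]


if __name__ == "__main__":
    # Hypothetical instances of a three-residue motif.
    print(motif_entropy(["RGD", "RGD", "RGE", "RGE"]))
```

A fully conserved position has entropy 0, and a position split evenly between two residues has entropy 1 bit, so lower values indicate stronger conservation.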

8.
9.
Internal repeats in protein sequences play a significant role in the evolution of protein structure and function. Applications of different bioinformatics tools help in the identification and characterization of these repeats. In the present study, we analyzed sequence repeats in a non-redundant set of proteins available in the Protein Data Bank (PDB). We used RADAR for detecting internal repeats in a protein, PDBeFOLD for assessing structural similarity, PDBsum for finding functional involvement and Pfam for domain assignment of the repeats in a protein. Through the analysis of sequence repeats, we found that the identity of the sequence repeats falls in the range of 20–40% and that the superimposed structures of most of the sequence repeats maintain similar overall folding. Analysis of sequence repeats at the functional level reveals that most of the sequence repeats are involved in the function of the protein through functionally involved residues in the repeat regions. We also found that sequence repeats in single and two domain proteins often contained conserved sequence motifs for the function of the domain.
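
The 20–40% figure refers to sequence identity between repeats. The sketch below shows one way to compute percent identity for two aligned repeat units. Definitions of identity differ in their choice of denominator, and the one used in the study is not stated, so counting only gap-free columns is an assumption here, as are the function name and the example sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Hypothetical pair of aligned repeat units.
    print(percent_identity("ACDEFGHIKL", "ACDQFGHIRL"))
```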

10.
11.
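Transmembrane region prediction based on fuzzy clustering analysis of amino acids   Total citations: 2, self-citations: 0, citations by others: 2
Deng Yong, Liu Qi, Li Yixue. Acta Chimica Sinica (《化学学报》), 2004, 62(19): 1968-1972
Transmembrane proteins are poorly conserved in sequence over the course of evolution, and even homologous proteins show low sequence identity. Consequently, in transmembrane-region prediction algorithms, selecting the training set by degree of sequence identity does not effectively eliminate overfitting of the predictions to the training set. This paper proposes a prediction algorithm based on fuzzy clustering analysis of amino acids: amino acids are fuzzily clustered according to the similarity of their distributions across the different regions, so that transmembrane regions are predicted from the distribution characteristics of a class of amino acids rather than of each individual amino acid. The results show that the method can, to some extent, remove the influence of training-set selection on the test results and improve the accuracy of transmembrane protein topology prediction, particularly for transmembrane proteins about which little is currently known.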
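
The abstract does not say which fuzzy clustering algorithm is used. As an illustration, the sketch below computes the standard fuzzy c-means membership of one item in each cluster, given fixed cluster centres. The feature vectors (for example, an amino acid's frequency in each region), the fuzzifier `m = 2`, and the function name are assumptions.

```python
from math import dist


def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of `point` in each cluster.

    Uses u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), where d_i is the
    Euclidean distance from the point to centre i. Memberships sum to 1.
    """
    distances = [dist(point, c) for c in centers]
    if any(d == 0 for d in distances):
        # The point sits exactly on one or more centres: share membership
        # among those centres only.
        on_center = [1.0 if d == 0 else 0.0 for d in distances]
        return [h / sum(on_center) for h in on_center]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
            for d_i in distances]


if __name__ == "__main__":
    # Hypothetical one-dimensional feature and two cluster centres.
    print(fuzzy_memberships((0.25,), [(0.0,), (1.0,)]))
```

Unlike hard clustering, each amino acid receives a graded membership in every class, which fits the abstract's idea of predicting from the behaviour of a class rather than of a single residue type.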

12.
Human peptidylarginine deiminases (hPADs) are a family of five calcium-dependent enzymes that facilitate citrullination, which is the post-translational modification of peptidyl arginine to peptidyl citrulline. The isozymes hPAD2 and hPAD4 have been implicated in the development and progression of several autoimmune diseases, including rheumatoid arthritis and multiple sclerosis. To better characterize the primary and secondary structure determinants of citrullination specificity, we mined the literature for protein sequences susceptible to citrullination by hPAD2 or hPAD4. First, protein secondary structure classification (α-helix, β-sheet, or coil) was predicted using the PSIPRED software. Next, we used motif-x and pLogo to extract and visualize statistically significant motifs within each data set. Within the data sets of peptides predicted to lie in coil regions, both hPAD2 and hPAD4 appear to favor citrullination of glycine-containing motifs, while distinct hydrophobic motifs were identified for hPAD2 citrullination sites predicted to reside within α-helical and β-sheet regions. Additionally, we identified potential substrate overlap between coil region citrullination and arginine methylation. Together, these results confirm the importance and offer some insight into the role of secondary structure elements for citrullination specificity, and provide biological context for the existing hPAD specificity and arginine post-translational modification literature.

13.
The Arabidopsis ECERIFERUM1 (CER1) protein is a decarbonylase that converts fatty acid metabolites into alkanes. Alkanes are components of waxes in the plant cuticle, a waterproof barrier serving to protect land plants from both biotic and abiotic stimuli. CER1 enzymes can be used to produce alternative and sustainable hydrocarbons in eukaryotic systems. In this report we identified 193 CER1 and 128 CER3 sequences, respectively, from 56 land plants. CER1 and CER3 proteins have high amino acid similarity and both are involved in alkane synthesis in Arabidopsis. The common homologues of CER1 and CER3 genes were identified in three species of chlorophytes, which may be one of the earliest plant taxa that possess CER1 and CER3 genes. To facilitate potential applications, the 3-dimensional structure and conserved motifs of CER1 proteins were also characterized. CER1 and CER3 proteins are structurally similar, but CER1 proteins have more conserved histidine-containing motifs common to fatty acid hydroxylases and stearoyl-CoA desaturases. There was no significant loss or gain of protein motifs after ancient and recent duplications, suggesting that varied properties of CER1 proteins may be associated with less-conserved regions. Among 56 land plants, the codon-based assessments of selection modes revealed that neither entire proteins nor individual amino acids of CER1 proteins were significantly subjected to positive selection, indicating that CER1 proteins are highly conserved throughout evolution.

14.
CCG triplet repeats can fold into tetraplex structures, which are associated with the expansion of (CCG)n trinucleotide sequences in certain neurological diseases. These structures are stabilized by intertwining i‐motifs. However, the structural basis for tetraplex i‐motif formation in CCG triplet repeats remains largely unknown. We report the first crystal structure of a CCG‐repeat sequence, which shows that two dT(CCG)3A strands can associate to form a tetraplex structure with an i‐motif core containing four C:C+ pairs flanked by two G:G homopurine base pairs as a structural motif. The tetraplex core is attached to a short parallel‐stranded duplex. Each hairpin itself contains a central CCG loop in which the nucleotides are flipped out and stabilized by stacking interactions. The helical twists between adjacent cytosine residues of this structure in the i‐motif core have an average value of 30°, which is greater than those previously reported for i‐motif structures.

15.
Reconfigurable molecular events are key to molecular machines. In response to external cues, molecular machines rearrange/change their structures to perform certain functions. Such machines exist in nature, for example cell surface receptors, and have been artificially engineered. To be able to build sophisticated and efficient molecular machines for an increasing range of applications, constant efforts have been devoted to developing new mechanisms of controllable structural reconfiguration. Herein, we report a general design principle for pH‐responsive DNA motifs for general DNA sequences (not limited to triplex or i‐motif forming sequences). We have thoroughly characterized such DNA motifs by polyacrylamide gel electrophoresis (PAGE) and fluorescence spectroscopy and demonstrated their applications in dynamic DNA nanotechnology. We expect that it will greatly facilitate the development of DNA nanomachines, biosensing/bioimaging, drug delivery, etc.

16.
Recent investigations on the stability of proteins have demonstrated various structural factors, but few have considered sequence factors such as protein motifs. These motifs represent highly conserved regions and describe critical regions that may only exist on proteins that remain functional at high temperatures. This investigation presents a method for identifying and comparing corresponding mesophilic and thermophilic sequence motifs between protein families. Discriminative motifs that are conserved only in the mesophilic or thermophilic subfamily are identified. Analysis of the results shows that, although the subfamilies of most protein families share similar motifs, some discriminative motifs are present in particular thermophilic/mesophilic subfamilies. The thermophilic discriminative motifs are conserved only in thermophilic organisms, revealing that physicochemical principles support thermostability.
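
The idea of a discriminative motif can be illustrated with a deliberately simplified sketch: take a motif to be a fixed-length k-mer, call it conserved if it occurs in every sequence of one subfamily, and call it discriminative if it is also absent from every sequence of the other subfamily. The abstract does not describe the actual motif-finding method, so this definition, the function names, and the example sequences are all assumptions.

```python
def kmers(seq, k):
    """All overlapping substrings of length k in `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def conserved_kmers(seqs, k):
    """k-mers present in every sequence of a non-empty subfamily."""
    return set.intersection(*(kmers(s, k) for s in seqs))


def discriminative_kmers(subfamily, other_subfamily, k):
    """k-mers conserved in `subfamily` and absent from `other_subfamily`."""
    seen_in_other = set().union(*(kmers(s, k) for s in other_subfamily))
    return conserved_kmers(subfamily, k) - seen_in_other


if __name__ == "__main__":
    # Hypothetical thermophilic and mesophilic subfamily sequences.
    thermophilic = ["MKVLEEAR", "GKVLEEAQ"]
    mesophilic = ["MKVLDDAR", "GKVLDDAQ"]
    print(sorted(discriminative_kmers(thermophilic, mesophilic, 3)))
```

In this toy example the k-mer `KVL` is conserved in both subfamilies and is therefore shared rather than discriminative, mirroring the abstract's observation that most motifs are common to both groups.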

17.
Cell-penetrating peptides (CPPs) provide promising tools for the cellular delivery of molecular cargos ranging in size from small molecules and peptides to proteins and quantum dots. CPPs are typically cationic and/or amphipathic sequences that are unstructured or alpha-helical. We expand the repertoire of cell-penetrating motifs by designing encodable CPPs possessing type-II polyproline (PPII) helical structure. These motifs surpass the uptake efficiency of existing CPPs and are not cytotoxic at concentrations 100 times greater than that necessary for delivery. By replacing the PPII helix of a miniature protein, the motif can endow intrinsic cell permeability without increasing molecular size.

18.
RNA tertiary interactions or tertiary motifs are conserved structural patterns formed by pairwise interactions between nucleotides. They include base-pairing, base-stacking, and base-phosphate interactions. A-minor motifs are the most common tertiary interactions in the large ribosomal subunit. The A-minor motif is a nucleotide triple in which minor groove edges of an adenine base are inserted into the minor groove of neighboring helices, leading to interaction with a stabilizing base pair. We propose here novel features for identifying and predicting A-minor motifs in a given three-dimensional RNA molecule. By utilizing the features together with machine learning algorithms including random forests and support vector machines, we show experimentally that our approach is capable of predicting A-minor motifs in the given RNA molecule effectively, demonstrating the usefulness of the proposed approach. The techniques developed from this work will be useful for molecular biologists and biochemists to analyze RNA tertiary motifs, specifically A-minor interactions.

19.
Protein function prediction is one of the central problems in computational biology. We present a novel automated protein structure-based function prediction method using libraries of local residue packing patterns that are common to most proteins in a known functional family. Critical to this approach is the representation of a protein structure as a graph where residue vertices (residue name used as a vertex label) are connected by geometrical proximity edges. The approach employs two steps. First, it uses a fast subgraph mining algorithm to find all occurrences of family-specific labeled subgraphs for all well characterized protein structural and functional families. Second, it queries a new structure for occurrences of a set of motifs characteristic of a known family, using a graph index to speed up Ullmann's subgraph isomorphism algorithm. The confidence of function inference from structure depends on the number of family-specific motifs found in the query structure compared with their distribution in a large non-redundant database of proteins. This method can assign a new structure to a specific functional family in cases where sequence alignments, sequence patterns, structural superposition and active site templates fail to provide accurate annotation.
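
The graph representation described in this abstract can be sketched as follows: each residue becomes a vertex labelled with its residue name, and two vertices are joined when their representative coordinates lie within a distance cutoff. The abstract does not specify how proximity edges are defined, so the single-point-per-residue input, the 8.0 Å cutoff, and the function name `contact_graph` are assumptions.

```python
from itertools import combinations
from math import dist


def contact_graph(residues, cutoff=8.0):
    """Build a labelled proximity graph from (name, (x, y, z)) tuples.

    Returns a dict of vertex labels keyed by residue index and a set of
    undirected edges (i, j) with i < j for pairs within `cutoff`.
    """
    labels = {i: name for i, (name, _) in enumerate(residues)}
    edges = {
        (i, j)
        for (i, (_, p)), (j, (_, q)) in combinations(enumerate(residues), 2)
        if dist(p, q) <= cutoff
    }
    return labels, edges


if __name__ == "__main__":
    # Hypothetical residues with one representative coordinate each.
    structure = [
        ("HIS", (0.0, 0.0, 0.0)),
        ("ASP", (3.0, 4.0, 0.0)),
        ("SER", (20.0, 0.0, 0.0)),
    ]
    print(contact_graph(structure))
```

The subgraph mining and subgraph isomorphism steps the abstract mentions would then operate on graphs of this form; they are not reproduced here.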

