Similar literature
 20 similar documents found.
1.
Automated classification of proteins is indispensable for further in vivo investigation of the excessive number of unknown sequences generated by large-scale molecular biology techniques. This study describes a discriminative system based on feature space mapping, called the subsequence profile map (SPMap), for the functional classification of protein sequences. SPMap takes into account the information coming from the subsequences of a protein. A group of protein sequences that belong to the same level of classification is decomposed into fixed-length subsequences, which are clustered to obtain a representative feature space mapping. The mapping is defined as the distribution of the subsequences of a protein sequence over these clusters. The resulting feature space representation is used to train discriminative classifiers for functional families. The aim of this approach is to incorporate information from important subregions that are conserved over a family of proteins while avoiding the difficult task of explicit motif identification. The performance of the method was assessed through tests on various protein classification tasks. Our results showed that SPMap is capable of high-accuracy classification in most of these tasks. Furthermore, SPMap is fast and scalable enough to handle large datasets.
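
The mapping step described in this abstract can be illustrated with a minimal sketch. It is not the SPMap implementation: the cluster centres are assumed to be given already (the abstract obtains them by clustering), Hamming distance is a stand-in for whatever similarity measure the authors use, and the names `subsequences`, `hamming` and `profile_map` are hypothetical.

```python
from collections import Counter


def subsequences(seq, k):
    """Return all overlapping windows of fixed length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def hamming(a, b):
    """Number of mismatching positions (assumed stand-in similarity measure)."""
    return sum(x != y for x, y in zip(a, b))


def profile_map(seq, centres, k):
    """Distribution of a sequence's windows over the given cluster centres.

    `centres` is a list of representative length-k subsequences; here it is
    supplied by hand rather than learned by clustering.
    """
    windows = subsequences(seq, k)
    counts = Counter()
    for w in windows:
        # Assign each window to its nearest centre (ties go to the first).
        nearest = min(range(len(centres)), key=lambda i: hamming(w, centres[i]))
        counts[nearest] += 1
    total = len(windows)
    if total == 0:
        return [0.0] * len(centres)
    return [counts[i] / total for i in range(len(centres))]


# Toy example: two centres, window length 2.
vector = profile_map("AABB", ["AA", "BB"], 2)
```

The resulting fixed-length vector is the kind of feature representation that could then be passed to a discriminative classifier.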

2.
The Pfam database is an important tool in genome annotation, since it provides a collection of curated protein families. However, a subset of these families, known as domains of unknown function (DUFs), remains poorly characterized. We have related sequences from DUF404, DUF407, DUF482, DUF608, DUF810, DUF853, DUF976 and DUF1111 to homologs in PDB, within the midnight zone (9–20%) of sequence identity. These relationships were extended to provide functional annotation by sequence analysis and model building. Also described are examples of residue plasticity within enzyme active sites, and change of function within homologous sequences of a DUF.

3.
4.
BACKGROUND: Recent advances in the molecular biology of polyketide biosynthesis have allowed the engineering of polyketide synthases and the biological ('combinatorial') synthesis of novel polyketides. Additional structural diversity in these compounds could be expected if more diverse polyketide synthases (PKS) could be utilised. Fungal polyketides are highly variable in structure, reflecting a potentially wide range of differences in the structure and function of fungal PKS complexes. Relatively few fungal synthases have been investigated, perhaps because of a lack of suitable genetic techniques available for the isolation and manipulation of gene clusters from diverse hosts. We set out to devise a general method for the detection of specific PKS genes from fungi. RESULTS: We examined sequence data from known fungal and bacterial polyketide synthases as well as sequence data from bacterial, fungal and vertebrate fatty acid synthases in order to determine regions of high sequence conservation. Using individual domains such as beta-ketoacylsynthases (KS), beta-ketoreductases (KR) and methyltransferases (MeT) we determined specific short (ca 7 amino acid) sequences showing high conservation for particular functional domains (e.g. fungal KR domains involved in producing partially reduced metabolites; fungal KS domains involved in the production of highly reduced metabolites etc.). Degenerate PCR primers were designed matching these regions of specific homology and the primers were used in PCR reactions with fungal genomic DNA from a number of known polyketide producing species. Products obtained from these reactions were sequenced and shown to be fragments from as-yet undiscovered PKS gene clusters. The fragments could be used in blotting experiments with either homologous or heterologous fungal genomic DNA. CONCLUSIONS: A number of sequences are presented which have high utility for the discovery of novel fungal PKS gene clusters. 
The sequences appear to be specific for particular types of fungal polyketide (i.e. non-reduced, partially reduced or highly reduced KS domains). We have also developed primers suitable for amplifying segments of fungal genes encoding polyketide C-methyltransferase domains. Genomic fragments amplified using these specific primer sequences can be used in blotting experiments and have high potential as aids for the eventual cloning of new fungal PKS gene clusters.

5.
Although the characterization of proteins cannot rely solely upon sequence similarity, it has been widely shown that massive all-vs-all sequence comparisons can be an effective approach and a good basis for predicting biochemical functions or delineating shared properties. The program Cluster-C presented here enables stand-alone and efficient construction of protein families within whole proteomes. The algorithm, which is based on the detection of cliques, ensures a high level of connectivity within the clusters. As opposed to the single transitive linkage method, Cluster-C allows a large number of sequences to be classified in such a way that multidomain proteins do not produce a chain-grouping effect resulting in meaningless clusters. Moreover, some proteins can be present in several different but relevant clusters, which helps in determining their functional domains. In the present analysis we used the Z-value, an evaluation of the significance of the similarity score, as the criterion for connecting sequences (the user can freely define the threshold of the similarity criterion). The clusters built with a rather low threshold (Z = 14) include more than 97% of the sequences and are consistent with known protein families and PROSITE patterns.
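
The contrast between clique-based clustering and single transitive linkage can be shown with a small sketch. This is not the Cluster-C code: it uses the textbook Bron–Kerbosch enumeration of maximal cliques as a stand-in, the protein names and Z-values are invented for illustration, and `build_graph`, `maximal_cliques` and `connected_components` are hypothetical names.

```python
def build_graph(scores, threshold):
    """Connect two sequences when their similarity score reaches the threshold."""
    graph = {}
    for (a, b), z in scores.items():
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if z >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def maximal_cliques(graph):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques


def connected_components(graph):
    """Single transitive linkage: everything reachable ends up in one group."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(graph[v] - comp)
        seen |= comp
        components.append(frozenset(comp))
    return components


# Invented scores: M is a multidomain protein similar to both families.
scores = {
    ("A", "B"): 20, ("A", "M"): 18, ("B", "M"): 17,
    ("C", "D"): 22, ("C", "M"): 16, ("D", "M"): 15,
    ("A", "C"): 3,
}
graph = build_graph(scores, threshold=14)
cliques = maximal_cliques(graph)
components = connected_components(graph)
```

In this toy graph the clique view yields two families that both contain M, whereas single linkage chains all five sequences into one group through M.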

6.
7.
Although thousands of microRNAs (miRNAs) have been identified in recent experimental efforts, it remains a challenge to explore their specific biological functions through molecular biological experiments. Since members of the same family share the same or similar biological functions, classifying new miRNAs into their corresponding families will be helpful for their further functional analysis. In this study, we first built a vector space by characterizing features of miRNA sequences and structures according to their miRBase family organization. We then assigned miRNAs to their specific miRNA families by developing a novel genes discriminant analysis (GDA) approach. In each of the new families obtained from GDA, there was a high degree of nucleotide sequence similarity among all members. Using 10-fold cross-validation, we achieved accuracy rates of 68.68%, 80.74%, and 83.65% for the original miRNA families with no fewer than two, three, and four members, respectively. These encouraging results suggest that the proposed GDA could not only support the identification of new miRNAs' families but also contribute to predicting their biological functions.

8.
9.
Tea (Camellia sinensis (L.) O. Kuntze) is an economically important plant cultivated for its leaves. Infection of leaves by Pestalotiopsis theae causes gray blight disease and enormous losses to the tea industry. We used the suppressive subtractive hybridization (SSH) technique to unravel the differential gene expression pattern during gray blight disease development in tea. Complementary DNA from P. theae-infected and uninfected leaves of the disease-tolerant cultivar UPASI-10 was used as the tester and driver populations, respectively. Subtraction efficiency was confirmed by comparing the abundance of the β-actin gene. A total of 377 and 720 clones with insert size >250 bp from the forward and reverse libraries, respectively, were sequenced and analyzed. Basic Local Alignment Search Tool analysis revealed that, in the forward SSH library, 17 sequences have a high degree of similarity with disease- and hypersensitive response-related genes and 20 sequences with hypothetical proteins, while in the reverse SSH library, 23 sequences have a high degree of similarity with disease- and stress response-related genes and 15 sequences with hypothetical proteins. Functional analysis indicated unknown (61 and 59%) or hypothetical functions (23 and 18%) for most of the differentially regulated genes in the forward and reverse SSH libraries, respectively, while others have important roles in different cellular activities. The majority of the upregulated genes are related to the hypersensitive response and reactive oxygen species production. Based on these expressed sequence tag data, the putative roles of differentially expressed genes are discussed in relation to disease. We also demonstrated the efficiency of SSH as a tool for enriching gray blight disease-related up- and downregulated genes in tea. The present study revealed that many genes related to disease resistance were suppressed during P. theae infection, and enhancing these genes by the application of inducers may impart better disease tolerance to the plants.

10.
To collect information on gene expression during the dark period in the luminous dinoflagellate Lingulodinium polyedrum, normalized complementary DNA (cDNA) libraries were constructed from cells collected during the first hour of night phase in a 12:12 h light‐dark cycle. A total of 4324 5′‐end sequence tags were isolated. The sequences were grouped into 2111 independent expressed sequence tags (ESTs), from which 433 groups were established by similarity searches of the public nonredundant protein database. Homology analysis of the total sequences indicated that the luminous dinoflagellate is more similar to land plants and animals (vertebrates and invertebrates) than to prokaryotes or algae. We also isolated three bioluminescence‐related genes (luciferase and two luciferin‐binding proteins [LBP]) and 37 photosynthesis‐related genes. Interestingly, two kinds of LBP genes occur in multiple copies in the genome, in contrast to the single luciferase gene. These cDNA clones and EST sequence data should provide a powerful resource for future genome‐wide functional analyses of uncharacterized genes.

11.
12.
Momordica charantia (bitter gourd, bitter melon) is a monoecious member of the Cucurbitaceae with anti-oxidant, anti-microbial, anti-viral and anti-diabetic potential. Molecular studies on this economically valuable plant are essential to understand its phylogeny and evolution. MicroRNAs (miRNAs) are conserved, small, non-coding RNAs that regulate gene expression by binding the 3′ UTR of target mRNAs, and they have evolved at different rates in different plant species. In this study we used a homology-based computational approach and identified 27 mature miRNAs for the first time in this biomedically important plant. The phylogenetic tree built from binary presence/absence data for the identified miRNAs was found to be uncertain and biased. Most of the identified miRNAs were highly conserved among plant species, and sequence-based phylogenetic analysis of the miRNAs resolved these difficulties of the miRNA-based phylogeny approach. Predicted gene targets of the identified miRNAs revealed their importance in the regulation of plant developmental processes. The reported miRNAs showed sequence conservation in the mature miRNAs, and detailed phylogenetic analysis of pre-miRNA sequences revealed genus-specific segregation of clusters.

13.
The analysis of plant genome structure and evolution requires comprehensive characterization of the repetitive sequences that make up the majority of plant nuclear DNA. In the present study, we analyzed the nature of the pCtKpnI-I and pCtKpnI-II tandemly repeated sequences reported earlier in Carthamus tinctorius. Interestingly, a homolog of the pCtKpnI-I repeat sequence was also found in widely divergent families of angiosperms. pCtKpnI-I showed high sequence similarity but low copy number among the various taxa of different angiosperm families analyzed. In comparison, pCtKpnI-II was specific to the genus Carthamus and was not present in any other taxa analyzed. The molecular structure of pCtKpnI-I was analyzed in various unrelated taxa of angiosperms to decipher the evolutionarily conserved nature of the sequence and its possible functional role.

14.
Protein-based polymers possess chemically defined sequences that can encode diverse properties and functions into a new class of biopolymeric materials. However, sequence variation that emerges from evolution can obscure the sequence–function relationships of naturally derived polymers. One strategy to clarify these relationships is to identify common sequences between proteins with similar functions. These conserved sequences often emerge from repeat proteins, and “consensus repeat sequences” provide a convenient platform for systematic investigations of biopolymer sequence–property relationships. In this review, we highlight recent approaches to engineer tunable polymeric materials using monomer-scale design of consensus repeat proteins. We explore established and emerging protein-based materials with mechanical resilience, thermodynamic phase behavior, chemical responsiveness, biomolecular transport, and hierarchical structure. Overall, recent advances in the monomer-scale design of repetitive protein polymers present exciting fundamental and translational opportunities for polymer scientists and engineers.

15.
16.
Recently we proposed a model for folding proteins into packed 'clusters'. We constructed a local homology measure for protein fold classes by projecting consecutive secondary structures onto a lattice. Taking hydrophobic forces into account, we found a mechanism for the formation of clusters containing magic numbers of secondary structures and multiples of these clusters. A scheme for the relation between the sequence information and the native fold is given. We performed a statistical analysis of available protein structures and found agreement with the predicted preferred abundances. In this paper we demonstrate that the results are robust to variations in the coordination number of the model.

17.
18.
Next-generation sequencing is regularly used to identify viral sequences in DNA or RNA samples of infected hosts. A major step of most pipelines for virus detection is to map sequence reads against known virus genomes. Due to small differences between the sequences of related viruses, and due to several biological or technical errors, mapping is subject to uncertainties. As a consequence, the resulting list of detected viruses can lack robustness. A new approach for generating artificial sequencing reads, together with a strategy of resampling from the original findings, is proposed that can help to assess the robustness of the originally identified list of viruses. From the original mapping result in the form of a SAM file, a set of statistical distributions is derived. These are used in the resampling pipeline to generate new artificial reads, which are again mapped against the reference genomes. By summarizing the resampling procedure, the analyst receives information about whether the presence of a particular virus in the sample gains or loses evidence, and thus about the robustness of the original mapping list as well as that of individual viruses in this list. To judge robustness, several indicators are derived from the resampling procedure, such as the correlation between original and resampled read counts, or the statistical detection of outliers in the differences of read counts. Additionally, graphical illustrations of read count shifts via Sankey diagrams are provided. To demonstrate its use, the resampling approach is applied to three real-world data samples, one of them with laboratory-confirmed Influenza sequences, and to artificially generated data where virus sequences have been spiked into the sequencing data of a host. By applying the resampling pipeline, several viruses drop from the original list while new viruses emerge, showing the robustness of those viruses that remain in the list. The evaluation of the new approach shows that resampling is helpful to analyze the viral content of a biological sample, to rate the robustness of original findings and to better show the overall distribution of findings. The method is also applicable to other virus detection pipelines based on read mapping.
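
Two of the robustness indicators named in this abstract, the correlation between original and resampled read counts and the detection of outliers in their differences, can be sketched as follows. This is an illustrative sketch only: the abstract does not state which correlation coefficient or outlier test is used, so Pearson correlation and a median-absolute-deviation rule with a cutoff of 3 are assumptions, and the virus names and read counts are invented.

```python
from math import sqrt
from statistics import median


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of read counts."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def outlier_viruses(original, resampled, cutoff=3.0):
    """Flag viruses whose read-count shift is far from the typical shift.

    Uses the median absolute deviation (MAD) of the differences; the
    cutoff of 3 MADs is an assumed, adjustable choice.
    """
    diffs = {v: resampled.get(v, 0) - original[v] for v in original}
    centre = median(diffs.values())
    mad = median(abs(d - centre) for d in diffs.values())
    if mad == 0:
        return []
    return [v for v, d in diffs.items() if abs(d - centre) > cutoff * mad]


# Invented read counts per reference genome.
original = {"virusA": 1000, "virusB": 500, "virusC": 200, "virusD": 50, "virusE": 40}
resampled = {"virusA": 990, "virusB": 505, "virusC": 195, "virusD": 48, "virusE": 2}

names = list(original)
r = pearson([original[v] for v in names], [resampled[v] for v in names])
flagged = outlier_viruses(original, resampled)
```

In this toy data the overall correlation stays high while one virus loses almost all of its reads on resampling, which is the kind of case such indicators are meant to surface.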

19.
The homology of peptide sequences selected from a 7mer phage display library by antibodies elicited by the multicelled parasite Taenia solium in cerebrospinal fluid and serum of neurocysticercosis (NCC) patients, and by antibodies of uninfected control patients with similar neurological complications of other etiology (non-NCC), was analyzed using the PILEUP-Tudos sequence alignment program. The analysis generated dendrograms bearing two types of sequence clusters: those containing (1) only NCC patient-derived peptides and (2) both NCC and control non-NCC patient derivatives. By using ELISA, peptides that were selected by the antibodies were identified predominantly in the NCC-derived clusters. In repeated analyses in which sequences were added or removed, the first type of cluster maintained its structure, while the second type was split into many separate homology units dispersed throughout the guide tree. These results are interpreted as showing the ability of the analysis to segregate NCC-specific peptide sequences from other sequences. Altogether, this study demonstrates the high potential of the PILEUP-Tudos computer program to analyze phagotope collections recovered through biopanning with polyclonal antibodies elicited in patients by complex and as yet unknown multiple pathogenic antigens, and to separate all disease-relevant phagotopes on the basis of sequence homology.

20.