Similar literature
1.
A number of different approaches have been described to identify proteins from tandem mass spectrometry (MS/MS) data. The most common approaches rely on the available databases to match experimental MS/MS data. These methods suffer from several drawbacks and cannot be used for the identification of proteins from unknown genomes. In this communication, we describe a new de novo sequencing software package, PEAKS, to extract amino acid sequence information without the use of databases. PEAKS uses a new model and a new algorithm to efficiently compute the peptide sequences whose fragment ions best explain the peaks in the MS/MS spectrum. The output of the software gives amino acid sequences with confidence scores for the entire sequences, as well as an additional novel positional scoring scheme for portions of the sequences. The performance of PEAKS is compared with that of Lutefisk, a well-known de novo sequencing program, using quadrupole-time-of-flight (Q-TOF) data obtained for several tryptic peptides from standard proteins.
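To make the phrase "fragment ions" concrete, the sketch below computes the singly charged b- and y-ion m/z values that a candidate peptide would be expected to produce. It is not the PEAKS model or algorithm, which the abstract does not describe; the function name `fragment_ions` is a hypothetical choice, and only unmodified residues with standard monoisotopic masses are assumed.

```python
# Standard monoisotopic residue masses in daltons (unmodified amino acids).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # mass of H2O
PROTON = 1.007276   # mass of a proton (charge carrier)


def fragment_ions(peptide):
    """Return (b_ions, y_ions): singly charged m/z values b1..b(n-1), y1..y(n-1).

    Hypothetical helper for illustration only.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]

    # b ions: N-terminal prefixes plus one proton.
    b_ions, running = [], 0.0
    for m in masses[:-1]:
        running += m
        b_ions.append(running + PROTON)

    # y ions: C-terminal suffixes plus water plus one proton.
    y_ions, running = [], 0.0
    for m in reversed(masses[1:]):
        running += m
        y_ions.append(running + WATER + PROTON)

    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_ions("PEPTIDE")
    for i, (bi, yi) in enumerate(zip(b, y), start=1):
        print(f"b{i} = {bi:.4f}   y{i} = {yi:.4f}")
```

A de novo method searches over candidate sequences and scores how well theoretical ions like these match the observed peaks; the scoring itself is outside this sketch.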

2.
A significant challenge in homology detection is to identify sequences that share a common evolutionary ancestor, despite significant primary sequence divergence. Remote homologs will often have less than 30% sequence identity, yet still retain common structural and functional properties. We demonstrate a novel method for identifying remote homologs using a support vector machine (SVM) classifier trained by fusing sequence similarity scores and subcellular location prediction. SVMs have been shown to perform well in a variety of applications where binary classification of data is the goal. At the same time, data fusion methods have been shown to be highly effective in enhancing the discriminative power of data. Combining these two approaches in the application SVM-SimLoc resulted in the identification of significantly more remote homologs (p-value<0.006) than using either sequence similarity or subcellular location independently.

3.
Due to the exponential growth of sequenced genomes, the need to quickly provide accurate annotation for existing and new sequences is paramount to facilitate biological research. Current sequence comparison approaches fail to detect homologous relationships when sequence similarity is low. Support vector machine (SVM) algorithms approach this problem by transforming all proteins into a feature space of equal dimension based on protein properties, such as sequence similarity scores against a basis set of proteins or motifs. This multivariate representation of the protein space is then used to build a classifier specific to a pre-defined protein family. However, this approach is not well suited to large-scale annotation. We have developed an SVM approach that formulates remote homology detection as a single classifier answering the pairwise comparison problem. The two feature vectors for a pair of sequences are integrated into a single vector representation, which is used to build a classifier that separates sequence pairs into homologs and non-homologs. This pairwise SVM approach significantly improves remote homology detection on the benchmark dataset, quantified as the area under the receiver operating characteristic curve: 0.97 versus 0.73 and 0.70 for PSI-BLAST and Basic Local Alignment Search Tool (BLAST), respectively.
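The abstract does not say how the two feature vectors are integrated, so the following is only one plausible sketch. It assumes a symmetric combination (element-wise sum followed by element-wise absolute difference) so that the pair (x, y) and the pair (y, x) map to the same vector; the function name `pair_features` is hypothetical.

```python
def pair_features(x, y):
    """Combine two equal-length feature vectors into one order-independent vector.

    Assumption for illustration: element-wise sums followed by element-wise
    absolute differences. The published method may use a different scheme.
    """
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    sums = [a + b for a, b in zip(x, y)]
    diffs = [abs(a - b) for a, b in zip(x, y)]
    return sums + diffs


if __name__ == "__main__":
    protein_a = [0.9, 0.1, 0.4]   # made-up similarity scores against a basis set
    protein_b = [0.7, 0.3, 0.4]
    print(pair_features(protein_a, protein_b))
```

Each pair vector would then be labeled homolog or non-homolog and passed to a single binary classifier, which is the point of the pairwise formulation: one model for all families instead of one model per family.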

4.
We have developed a novel approach for dissecting transmembrane beta-barrel proteins (TMBs) in genomic sequences. The features include (i) the identification of TMBs using residue-pair preferences in globular, transmembrane helical (TMH), and TMB proteins, (ii) elimination of globular/TMH proteins that show more than 70% sequence identity over 80% of residues with known structures, (iii) elimination of globular/TMH proteins that have more than 60% sequence identity with known sequences in SWISS-PROT, and (iv) exclusion of TMH proteins using SOSUI, a prediction system for TMH proteins. Our approach identified 7% TMBs in all the genomes considered. Comparison of the TMBs identified in the E. coli genome with available experimental data demonstrated that the new approach correctly identified all 11 known TMBs whose crystal structures are available. Further, it revealed the presence of 19 TMBs with homology to known structures, 60 TMBs similar to well-annotated sequences, and 54 TMBs that have high sequence similarity with Escherichia coli beta-barrel proteins deposited in the Transport Classification Database (TCDB). Interestingly, the present approach identified TMBs from all 15 families in TCDB. In the human genome, the occurrence of TMBs varies from 0 to 3% across chromosomes. We suggest that our approach could be a step forward for structural and functional genomics.

5.
Jagannadham MV. Electrophoresis 2008, 29(21): 4341-4350
Multidimensional protein identification technology helps in identifying a large number of proteins with ESI by sequencing several peptides with MS/MS methods. When ionization and separation of different hydrophobic and hydrophilic peptides in a single process are difficult, a combination of LC-coupled linear ion trap MS and MALDI TOF/TOF can be used for identification of proteins, as shown in the present study. We have used this combined approach to identify membrane proteins of the Antarctic bacterium Pseudomonas syringae Lz4W, separated by SDS gel electrophoresis. Although the genome of P. syringae Lz4W has not been sequenced, the known genome sequences of mesophilic Pseudomonas species have been used for the identification of the proteins. Broadly, many membrane proteins, including some integral membrane proteins and proteins spanning a wide range of molecular weight and pI, could be identified using this procedure. Some of the identified proteins are involved in low-temperature adaptation.

6.
Three abundant small acid-soluble proteins (SASPs) from spores of Bacillus globigii were sequenced using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry with post-source decay and nanoelectrospray collision-induced dissociation tandem mass spectrometry. The proteins were extracted from spores with 1 M HCl. Scanning electron micrographs of spores before and after acid extraction show that the spores retain their overall structure but have a shriveled texture following the acid treatment. Extracted SASPs were purified by high-performance liquid chromatography and molecular masses of the SASPs were identified at 7068 (SASP-1), 7332 (SASP-2), and 8889 (gamma-SASP). De novo peptide sequencing was used to determine the protein sequences. The correct ordering of peptide sequences was aided by mapping overlapping enzymatic digests and by comparison with homologous SASPs from Bacillus stearothermophilus. B. globigii is used in many field tests as a surrogate for B. anthracis. Thus complete SASP sequences from B. globigii will facilitate the development of methods for rapid identification of bacteria based on mass spectrometry and the examination of taxonomic relationships between Bacillus species.

7.
Peptide research has increased in recent years due to the applications of peptides as biomarkers, therapeutic alternatives, or antigenic subunits in vaccines. The use of computational resources has facilitated the identification of novel sequences, the prediction of properties, and the modelling of structures. However, there is still a lack of open source protocols that enable their straightforward analysis. Here, we present PepFun, a compilation of bioinformatics and cheminformatics functionalities that are easy to implement and customize for studying peptides at different levels: sequence, structure, and their interactions with proteins. PepFun enables calculating multiple characteristics for massive sets of peptide sequences, and obtaining different structural observables derived from protein-peptide complexes. In addition, random or guided library design of peptide sequences can be customized for screening campaigns. The package is written in Python, based on built-in functions and methods available in the open source projects BioPython and RDKit. We present two tutorials in which we tested peptide binders of MHC class II and the Granzyme B protease.

8.
In recent years, the use of liquid chromatography tandem mass spectrometry (LC–MS/MS) on tryptic digests of cultural heritage objects has attracted much attention. It allows for unambiguous identification of peptides and proteins, and even in complex mixtures species-specific identification becomes feasible with minimal sample consumption. Determination of the peptides is commonly based on theoretical cleavage of known protein sequences and on comparison of the expected peptide fragments with those found in the MS/MS spectra. In this approach, complex computer programs, such as Mascot, perform well in identifying known proteins, but fail when protein sequences are unknown or incomplete. Often, when trying to distinguish evolutionarily well-preserved collagens of different species, Mascot lacks the required specificity. Complementary and often more accurate information on the proteins can be obtained using a reference library of MS/MS spectra of species-specific peptides. Therefore, a library dedicated to various sources of proteins in works of art was set up, with an initial focus on collagen-rich materials. This paper discusses the construction and the advantages of this spectral library for conservation science, and its application to a number of samples from historical works of art.
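The "theoretical cleavage of known protein sequences" mentioned above can be sketched as an in silico tryptic digest. The example assumes the commonly used trypsin rule (cleave after K or R unless the next residue is P); the function name `tryptic_digest` and the `missed_cleavages` parameter are hypothetical, and this is not a description of how Mascot is implemented.

```python
def tryptic_digest(sequence, missed_cleavages=0):
    """Return theoretical tryptic peptides of a protein sequence.

    Rule assumed here: cleave C-terminal to K or R, except before P.
    `missed_cleavages` allows peptides spanning that many skipped sites.
    """
    if not sequence:
        return []

    # Collect cleavage positions, including both sequence ends.
    sites = [0]
    for i, aa in enumerate(sequence[:-1]):
        if aa in "KR" and sequence[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(sequence))

    # Build peptides between consecutive sites, optionally skipping sites.
    peptides = []
    for start in range(len(sites) - 1):
        for extra in range(missed_cleavages + 1):
            end = start + 1 + extra
            if end < len(sites):
                peptides.append(sequence[sites[start]:sites[end]])
    return peptides


if __name__ == "__main__":
    print(tryptic_digest("AKGRPLKM"))
    print(tryptic_digest("AKGRPLKM", missed_cleavages=1))
```

Database search compares the masses and fragment spectra expected for such peptides with the measured ones, which is why it fails when the protein sequence is absent from the database and why a spectral library can complement it.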

9.
Proteins are the macromolecules responsible for almost all biological processes in a cell. With a large number of protein sequences available from different sequencing projects, the challenge for scientists is to characterize their functions. As wet lab methods are time consuming and expensive, many computational methods have been proposed, such as FASTA, PSI-BLAST, DNA microarray clustering, and nearest-neighbor classification on protein–protein interaction networks. The support vector machine (SVM) is one such method that has been used successfully for several problems, such as protein fold recognition and protein structure prediction. Cai et al. (2003) used SVMs to classify proteins into different functional classes and to predict their function, representing the protein sequences by their physico-chemical properties. In this paper, a model consisting of feature subset selection followed by a multiclass SVM is proposed to determine the functional class of a newly generated protein sequence. To train and test the model, 32 physico-chemical properties of enzymes from 6 enzyme classes are considered. To determine the features that contribute significantly to functional classification, the Sequential Forward Floating Selection (SFFS), Orthogonal Forward Selection (OFS), and SVM Recursive Feature Elimination (SVM-RFE) algorithms are used. Of the 32 properties considered initially, only 20 features are sufficient to classify the proteins into their functional classes, with an accuracy ranging from 91% to 94%. In comparison, OFS followed by SVM performs better than the other methods. Our model generalizes the existing model to include multiclass classification and to identify the most significant features affecting protein function.
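As a rough illustration of feature subset selection, the sketch below implements plain greedy sequential forward selection. It is a simplification and not SFFS, OFS, or SVM-RFE as used in the paper: it omits the floating backward step of SFFS. The `score` callable is a hypothetical stand-in for something like the cross-validated accuracy of a multiclass SVM trained on the chosen features.

```python
def forward_selection(features, score, max_features):
    """Greedy sequential forward selection (simplified; no floating step).

    features     -- iterable of candidate feature names
    score        -- callable taking a list of features, returning a number
                    (hypothetical; e.g. cross-validated classifier accuracy)
    max_features -- upper bound on the size of the selected subset
    """
    selected = []
    remaining = list(features)
    best_score = float("-inf")

    while remaining and len(selected) < max_features:
        # Try adding each remaining feature and keep the best candidate.
        candidate, candidate_score = max(
            ((f, score(selected + [f])) for f in remaining),
            key=lambda pair: pair[1],
        )
        if candidate_score <= best_score:
            break  # no remaining feature improves the score
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = candidate_score

    return selected, best_score


if __name__ == "__main__":
    # Toy scoring function: each feature has a fixed, made-up contribution.
    weights = {"hydrophobicity": 3, "polarity": 2, "noise": -1}
    toy_score = lambda subset: sum(weights[f] for f in subset)
    print(forward_selection(weights, toy_score, max_features=3))
```

In the toy run the selection stops after two features because adding the third lowers the score, which mirrors the abstract's finding that a subset of the properties is sufficient.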

10.
The remarkable conservation of protein structure, compared to that of sequences, suggests that, in the course of evolution, residue substitutions which tend to destabilise a particular structure must be compensated by other substitutions that confer greater stability on that structure. Given the compactness of proteins, spatially close residues are expected to undergo the compensatory process. Surprisingly, approaches designed to detect such correlated changes have had, until now, only limited success in detecting pairs of residues adjacent in the three-dimensional structures. By simulating the evolution of DNA sequences that include sites mutating in a correlated manner, we have analysed whether such poor results can be attributed to the detection methods or whether this failure could result from a compensatory process more complex than that implicitly assumed by the different approaches. The present results show that only methods taking the phylogenetic reconstruction into account can lead to correct detection. Received: 24 April 1998 / Accepted: 8 August 1998 / Published online: 11 November 1998

11.
High-throughput DNA sequencing has resulted in increasing input to protein sequence databases. Today more than 20 genomes have been sequenced and many more will be completed in the near future, including the largest of them all, the human genome. Presently, sequence databases contain entries for more than 425,000 protein sequences. However, cellular functions are determined by the set of proteins expressed in the cell, known as the proteome. Two-dimensional gel electrophoresis, mass spectrometry and bioinformatics have become important tools in correlating the proteome with the genome. The current dominant strategies for identification of proteins from gels, based on peptide mass spectrometric fingerprinting and partial sequencing by mass spectrometry, are described. After identification of the proteins, the next challenge in proteome analysis is characterization of their post-translational modifications. The general problems associated with characterizing these directly from gel-separated proteins are described, and the current state of the art for the determination of phosphorylation, glycosylation and proteolytic processing is illustrated.

12.
As two-dimensional (2-D) electrophoresis allows the separation of several hundred proteins in a single gel, this technique has become an important tool for proteome studies and for investigating cellular physiology. To take advantage of the information provided by comparing proteome maps, mass spectrometry is the method of choice for rapid and accurate identification of proteins of interest. Unfortunately, in the case of industrial yeasts, due to the high level of complexity of their genome, the whole DNA sequence is not yet available and all encoded protein sequences are still unknown. Nevertheless, this study presents 30 lager brewing yeast proteins newly identified with matrix-assisted laser desorption/ionization-time of flight (MALDI-TOF), tandem mass spectrometry (MS/MS) and database searching against the protein sequences of Saccharomyces cerevisiae. The identified proteins of the industrial strain correspond to proteins that do not comigrate with known proteins of S. cerevisiae separated on 2-D gels. This study presents an application of the MS technique for the identification of industrial yeast proteins that are only homologous to the corresponding S. cerevisiae proteins.

13.
Chemical cross-linking of proteins, an established method in protein chemistry, has gained renewed interest in combination with mass spectrometric analysis of the reaction products for elucidating low-resolution three-dimensional protein structures and interacting sequences in protein complexes. The identification of the large number of cross-linking sites from the complex mixtures generated by chemical cross-linking, however, remains a challenging task. This review describes the most popular cross-linking reagents for protein structure analysis and gives an overview of the strategies employing intra- or intermolecular chemical cross-linking and mass spectrometry. The various approaches described in the literature to facilitate detection of cross-linking products and also computer software for data analysis are reviewed. Cross-linking techniques combined with mass spectrometry and bioinformatic methods have the potential to provide the basis for an efficient structural characterization of proteins and protein complexes.

14.
Recent advances in high-throughput experimental technologies have generated a huge amount of data on interactions between proteins and nucleic acids. Motivated by this large body of experimental data, several computational methods have been developed either to predict binding sites in a sequence or to determine whether an interaction exists between protein and nucleic acid sequences. However, most of the methods cannot be used to discover new nucleic acid sequences that bind to a target protein because they are classifiers rather than generators. In this paper we propose a generative model for constructing protein-binding RNA sequences and motifs using a long short-term memory (LSTM) neural network. Testing the model for several target proteins showed that RNA sequences generated by the model have high binding affinity and specificity for their target proteins and that the protein-binding motifs derived from the generated RNA sequences are comparable to the motifs from experimentally validated protein-binding RNA sequences. The results are promising and we believe this approach will help design more efficient in vitro or in vivo experiments by suggesting potential RNA aptamers for a target protein.

15.
Integral membrane proteins play important roles in living cells. Due to difficulties of experimental techniques, theoretical approaches, i.e., topology prediction methods, are important for structure determination of this class of proteins. Here we show a detailed comparison of transmembrane topology prediction methods. According to this comparison, we conclude that the topology of integral membrane proteins is determined by the maximum divergence of the amino acid composition of sequence segments. These segments are located in different areas of the cell, which can be characterized by different physicochemical properties. The results of these prediction methods compared to the X-ray diffraction data of several transmembrane proteins will also be discussed.

16.
The sequencing of biopolymers such as proteins and DNA is among the most significant scientific achievements of the 20th century. Indeed, modern chemical methods for sequence analysis allow reading and understanding the codes of life. Thus, sequencing methods currently play a major role in applications as diverse as genomics, gene therapy, biotechnology, and data storage. However, in terms of fundamental science, sequencing is not really a question of molecular biology but rather a more general topic in macromolecular chemistry. Broadly speaking, it can be defined as the analysis of comonomer sequences in copolymers. In the past, however, rather different approaches have been used to study monomer sequences in biological and manmade polymers. These “cultural” differences are slowly fading away with the recent development of synthetic sequence-controlled polymers. In this context, the aim of this Minireview is to present an overview of the tools that are currently available for sequence analysis in macromolecular science.

17.
High-throughput DNA sequencing has resulted in increasing input to protein sequence databases. Today more than 20 genomes have been sequenced and many more will be completed in the near future, including the largest of them all, the human genome. Presently, sequence databases contain entries for more than 425,000 protein sequences. However, cellular functions are determined by the set of proteins expressed in the cell, known as the proteome. Two-dimensional gel electrophoresis, mass spectrometry and bioinformatics have become important tools in correlating the proteome with the genome. The current dominant strategies for identification of proteins from gels, based on peptide mass spectrometric fingerprinting and partial sequencing by mass spectrometry, are described. After identification of the proteins, the next challenge in proteome analysis is characterization of their post-translational modifications. The general problems associated with characterizing these directly from gel-separated proteins are described, and the current state of the art for the determination of phosphorylation, glycosylation and proteolytic processing is illustrated. Received: 16 December 1999 / Accepted: 17 December 1999

18.
Protein SUMOylation, the conjugation of small ubiquitin-like modifiers (SUMOs) to proteins, is a type of post-translational modification (PTM) that plays broad roles in cellular functions, including gene expression regulation, DNA repair, intracellular transport, stress responses, and tumorigenesis. With the development of peptide enrichment approaches and MS technology, more than 6000 SUMOylated proteins and about 40 000 SUMO acceptor sites have been identified. In this review, we summarize several popular approaches that have been developed for the identification of SUMOylated proteins in human cells, and compare their technical advantages and disadvantages. We also introduce approaches for identifying target proteins that are co-modified by both SUMOylation and ubiquitylation, and we highlight emerging trends in the SUMOylation field. In particular, the advent of the clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9 technique will facilitate the development of MS for SUMOylation identification.

19.
Since it was observed that the structural class of a protein is related to its amino acid composition, various methods based on amino acid composition have been proposed to predict protein structural classes. Though those methods are effective to some degree, their predictive quality is limited because amino acid composition cannot fully capture the information in protein sequences. In this paper, a measure of information discrepancy is applied to the prediction of protein structural classes. Unlike the previous methods, this new approach is based on comparisons of subsequence distributions, so the effect of residue order on protein structure is taken into account. The predictive results of the new approach on the same data set are better than those of the previous methods. For a data set of 1401 sequences with no more than 30% redundancy, the overall accuracy in the resubstitution and jackknife tests is 99.4% and 75.02%, respectively, and similar results are obtained on other data sets. All tests demonstrate that the residue order along protein sequences plays an important role in the recognition of protein structural classes, especially for alpha/beta proteins and alpha+beta proteins. In addition, the tests show that the new method is simple and efficient.
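The idea of comparing subsequence distributions, rather than single-residue composition, can be sketched as follows. The abstract does not give the formula of its information discrepancy measure, so this example swaps in the Jensen-Shannon divergence as a stand-in; the function names `kmer_distribution` and `js_divergence` are hypothetical.

```python
from collections import Counter
from math import log2


def kmer_distribution(seq, k):
    """Relative frequencies of all overlapping length-k subsequences of seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions.

    Used here only as a stand-in for the paper's discrepancy measure.
    Ranges from 0 (identical) to 1 (no shared subsequences).
    """
    div = 0.0
    for key in set(p) | set(q):
        pi, qi = p.get(key, 0.0), q.get(key, 0.0)
        mi = (pi + qi) / 2
        if pi > 0:
            div += 0.5 * pi * log2(pi / mi)
        if qi > 0:
            div += 0.5 * qi * log2(qi / mi)
    return div


if __name__ == "__main__":
    # Same amino acid composition, different residue order.
    a, b = "AEAEAEAE", "AAAAEEEE"
    print(js_divergence(kmer_distribution(a, 1), kmer_distribution(b, 1)))
    print(js_divergence(kmer_distribution(a, 2), kmer_distribution(b, 2)))
```

With k = 1 the two toy sequences are indistinguishable, because they have the same composition, while with k = 2 they differ. That is the sense in which subsequence distributions take residue order into account.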

20.
The role of separation science in proteomics research.
H J Issaq. Electrophoresis 2001, 22(17): 3629-3638
In the last few years there has been an increased effort to separate, quantify and identify all proteins in a cell or tissue. This is a review of the role that gel electrophoresis, high performance liquid chromatography (HPLC), and capillary electrophoresis (CE) play in proteomics research. The capabilities and limitations of each separation technique are pointed out. Instrumental strategies for the resolution of cell proteins are reviewed; these are based on efficient separation, employing either a single high-resolution procedure or a multidimensional approach on-line or off-line, and on a mass spectrometer for protein identification. The advantages of multidimensional separations such as two-dimensional polyacrylamide gel electrophoresis, HPLC-HPLC, and HPLC-CE for the separation of cell and tissue proteins are compared. Novel approaches to protein concentration, separation, detection, and quantification are also discussed.
