Similar literature
1.
Given the huge number of otherwise uncharacterized protein sequences, computer-aided prediction of posttranslational modifications (PTMs) and translocation signals from amino acid sequence has become a necessity. We have contributed to this multi-faceted, worldwide effort by developing predictors for GPI lipid anchor sites, for N-terminal N-myristoylation sites, for farnesyl and geranylgeranyl anchor attachment, and for the PTS1 peroxisomal signal. Although the substrate protein sequence signals for various PTMs or translocation systems vary dramatically, we found that their principal architecture is similar in all the cases studied. Typically, a small stretch of amino acid residues is buried in the catalytic cleft of the protein-modifying enzyme (or the binding site of the transporter). This stretch interacts most intensely with the enzyme, and its sequence variability is the most restricted. It is surrounded by linker segments that connect the part bound by the enzyme with the rest of the substrate protein. These linker residues tend to be small and polar, with a flexible backbone. Because of the mechanistic requirements of binding to the enzyme, we suggest that most PTM sites are necessarily embedded in intrinsically disordered regions (except for autocatalytic PTMs, PTMs executed in the unfolded state, and non-enzymatic PTMs), and this issue requires consideration in structural studies of proteins with complex architecture. Surprisingly, some proteins carry sequence signals for posttranslational modification or translocation that remain hidden in the normal biological context but can become fully functional under certain conditions.

2.
This report describes a mechanical method for efficient and accurate replication of DNA microarrays from a zip code master. The zip code master is a DNA array that defines the location of oligonucleotides consisting of two parts: a code sequence, which is complementary to one or more of the zip codes, and a functional sequence, which is terminated with biotin. Following hybridization of the zip code to the code sequence, a replica surface functionalized with streptavidin is brought into conformal contact with the surface of the master. When the two surfaces are separated, the functional and code sequences are transferred to the replica, and the zip code remains on the surface of the master. Using this approach it is possible to prepare replica arrays having any configuration from a single, universal master array. Here we demonstrate that this approach can be used to replicate master arrays having up to three different sequences, that feature sizes as small as 100 µm can be replicated, and that master arrays can be used to prepare multiple replicas.

3.
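Transmembrane region prediction based on fuzzy clustering analysis of amino acids   Total citations: 2, self-citations: 0, citations by others: 2
Deng Yong (邓勇), Liu Qi (刘琪), Li Yixue (李亦学). Acta Chimica Sinica (《化学学报》), 2004, 62(19): 1968-1972
Transmembrane proteins show poor sequence conservation during evolution, and even homologous proteins share low sequence identity. Consequently, in transmembrane-region prediction algorithms, selecting the training set by sequence identity does not effectively eliminate over-fitting of the prediction results to the training set. This paper proposes a prediction algorithm based on fuzzy clustering analysis of amino acids: amino acids are fuzzily clustered according to the similarity of their distributions across the different regions, so that transmembrane regions are predicted from the distribution characteristics of a class of amino acids rather than of each individual amino acid. The results show that the method can, to some extent, eliminate the influence of training-set selection on the test results and improve the accuracy of transmembrane protein topology prediction, especially for transmembrane proteins about which little is currently known.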

4.
Single nucleotide polymorphism (SNP) is the most common genetic variation among individuals. The association of SNPs with an individual's response to pathogens, phenotypic variations, and gene functions emphasizes the importance of sensitive and reliable SNP detection for biomedical diagnosis and therapy. To increase sensitivity, most approaches employ amplification steps, such as PCR, to generate detectable signals that are usually ensemble-averaged. Introducing amplification steps increases the complexity of a system, whereas ensemble averaging of signals often suffers from background interference. Here, we have exploited the stochastic behavior of a single-molecule probe to recognize an SNP sequence in a microfluidic platform using a laser-tweezers instrument. The detection relies on on-off mechanical signals that provide little background interference and high specificity between wild-type and SNP sequences. The microfluidic setting allows multiplex sensing and in situ recycling of the SNP probe. As a proof of concept, we have detected as little as 100 pM of an SNP target associated with coronary heart disease within half an hour without any amplification steps. The mechanical signal permits the detection of single mutations involving either G/C or A/T pairs. We anticipate that this system can function as a highly sensitive generic biosensor after incorporation of a specific recognition element, such as an aptamer.

5.
6.
Protein structure prediction is a fundamental issue in the field of computational molecular biology. In this paper, the AB off-lattice model is adopted to transform the original protein structure prediction scheme into a numerical optimization problem. We present a balance-evolution artificial bee colony (BE-ABC) algorithm to address the problem, with the aim of finding the structure with the minimal free-energy value for a given protein sequence. This is achieved by using convergence information during the optimization process to adaptively adjust the search intensity. In addition, an overall degradation procedure is introduced as part of the BE-ABC algorithm to prevent premature convergence. Comprehensive simulation experiments based on the well-known artificial Fibonacci sequence set and several real sequences from the Protein Data Bank have been carried out to compare the performance of BE-ABC against other algorithms. Our numerical results show that the BE-ABC algorithm is able to outperform many state-of-the-art approaches and can be effectively employed for protein structure optimization.
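
To make the "numerical optimization problem" in this abstract concrete, the sketch below evaluates the energy of a chain in the AB off-lattice model. It assumes the commonly cited two-dimensional form of the model (unit bond lengths, a bending term plus a species-dependent Lennard-Jones term); the abstract does not say whether the paper uses the 2D or 3D variant, and the function name `ab_energy_2d` is made up for illustration.

```python
import math

def ab_energy_2d(sequence, coords):
    """Energy of a 2D AB off-lattice chain (commonly cited form; assumed here).

    sequence: string of 'A' (hydrophobic) and 'B' (hydrophilic) residues.
    coords:   list of (x, y) positions, one per residue, unit bond lengths.
    """
    n = len(sequence)
    xi = [1.0 if s == "A" else -1.0 for s in sequence]

    # Bending term: (1/4) * (1 - cos(theta_i)) for every interior residue,
    # where theta_i is the angle between successive bond vectors.
    bend = 0.0
    for i in range(1, n - 1):
        ux, uy = coords[i][0] - coords[i - 1][0], coords[i][1] - coords[i - 1][1]
        vx, vy = coords[i + 1][0] - coords[i][0], coords[i + 1][1] - coords[i][1]
        cos_theta = (ux * vx + uy * vy) / (math.hypot(ux, uy) * math.hypot(vx, vy))
        bend += 0.25 * (1.0 - cos_theta)

    # Lennard-Jones term over non-adjacent pairs; the coefficient c is
    # +1 for A-A, +1/2 for B-B and -1/2 for A-B pairs.
    lj = 0.0
    for i in range(n - 2):
        for j in range(i + 2, n):
            r = math.dist(coords[i], coords[j])
            c = (1.0 + xi[i] + xi[j] + 5.0 * xi[i] * xi[j]) / 8.0
            lj += 4.0 * (r ** -12 - c * r ** -6)
    return bend + lj

# A straight three-residue chain: no bending energy, one non-adjacent pair.
straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
print(ab_energy_2d("AAA", straight))
print(ab_energy_2d("AAB", straight))
```

An optimizer such as the BE-ABC algorithm described above would search over chain conformations to minimize a function of this kind; the search itself is not sketched here.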

7.
Protein-based polymers possess chemically defined sequences that can encode diverse properties and functions into a new class of biopolymeric materials. However, sequence variation that emerges from evolution can obscure the sequence–function relationships of naturally derived polymers. One strategy to clarify these relationships is to identify common sequences between proteins with similar functions. These conserved sequences often emerge from repeat proteins, and “consensus repeat sequences” provide a convenient platform for systematic investigations of biopolymer sequence–property relationships. In this review, we highlight recent approaches to engineer tunable polymeric materials using monomer-scale design of consensus repeat proteins. We explore established and emerging protein-based materials with mechanical resilience, thermodynamic phase behavior, chemical responsiveness, biomolecular transport, and hierarchical structure. Overall, recent advances in the monomer-scale design of repetitive protein polymers present exciting fundamental and translational opportunities for polymer scientists and engineers.

8.
9.
The ALIGN_MTX program aligns two textual sequences, allowing sequence elements to be designated by any number of characters and accepting arbitrary user-defined substitution matrices. It can be used not only to align amino acid and nucleotide sequences but also for the sequence-structure alignment used in threading, for amino acid sequence alignment using a precomputed PSSM matrix, and in other cases where biological or non-biological textual sequences must be aligned. This distinguishes it from most similar alignment programs, which as a rule align only amino acid or nucleotide sequences represented as strings of single alphabetic characters. ALIGN_MTX is available as a downloadable zip archive at http://www.imbbp.org/software/ALIGN_MTX/ and is free to use. As an application of the program, different types of substitution matrix are compared for alignment quality on sets of distantly related protein pairs. Alignment accuracy was tested with the threading matrix SORDIS, which is based on side-chain orientation relative to hydrophobic core centers; the evolutionary change-based substitution matrix BLOSUM; and position-specific score matrices (PSSM), which use multiple sequence alignment information. The PSSM matrix performs best, but on the reduced set with lower sequence similarity the threading matrix SORDIS performs equally well, and a combined potential using SORDIS and PSSM can improve alignment quality for evolutionarily distantly related protein pairs.
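
The idea of aligning sequences whose elements are multi-character tokens, scored by a user-supplied matrix, can be sketched with a plain global (Needleman-Wunsch) alignment. This is not ALIGN_MTX's code: the abstract does not state which alignment algorithm or gap model the program uses, so the linear gap penalty, the `default` mismatch score, and the function name `global_align` are all assumptions made for illustration.

```python
def global_align(seq_a, seq_b, matrix, gap=-2.0, default=-1.0):
    """Needleman-Wunsch alignment of two token lists (illustrative sketch).

    matrix maps (token, token) pairs to scores; pairs missing in both
    orders fall back to `default`. Gaps use a linear penalty.
    """
    def sub(a, b):
        return matrix.get((a, b), matrix.get((b, a), default))

    n, m = len(seq_a), len(seq_b)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + sub(seq_a[i - 1], seq_b[j - 1]),  # match/mismatch
                dp[i - 1][j] + gap,                                  # gap in seq_b
                dp[i][j - 1] + gap,                                  # gap in seq_a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + sub(seq_a[i - 1], seq_b[j - 1]):
            out_a.append(seq_a[i - 1]); out_b.append(seq_b[j - 1]); i -= 1; j -= 1
        elif i > 0 and (j == 0 or dp[i][j] == dp[i - 1][j] + gap):
            out_a.append(seq_a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(seq_b[j - 1]); j -= 1
    return dp[n][m], out_a[::-1], out_b[::-1]

# Three-letter residue codes as tokens, with a toy identity matrix.
toy_matrix = {("Ala", "Ala"): 2.0, ("Gly", "Gly"): 2.0, ("Ser", "Ser"): 2.0}
score, a, b = global_align(["Ala", "Gly", "Ser"], ["Ala", "Ser"], toy_matrix)
print(score, a, b)
```

Because the tokens are arbitrary hashable values, the same routine works for single letters, residue codes, or structural-environment labels, which is the flexibility the abstract emphasizes.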

10.
Due to the exponential growth of sequenced genomes, the need to quickly provide accurate annotation for existing and new sequences is paramount to facilitate biological research. Current sequence comparison approaches fail to detect homologous relationships when sequence similarity is low. Support vector machine (SVM) algorithms approach this problem by transforming all proteins into a feature space of equal dimension based on protein properties, such as sequence similarity scores against a basis set of proteins or motifs. This multivariate representation of the protein space is then used to build a classifier specific to a pre-defined protein family. However, this approach is not well suited to large-scale annotation. We have developed an SVM approach that formulates remote homology detection as a single pairwise comparison problem: the two feature vectors for a pair of sequences are integrated into a single vector representation, which is used to build a classifier that separates sequence pairs into homologs and non-homologs. This pairwise SVM approach significantly improves remote homology detection on the benchmark dataset, quantified as the area under the receiver operating characteristic curve: 0.97, versus 0.73 and 0.70 for PSI-BLAST and the Basic Local Alignment Search Tool (BLAST), respectively.

11.
Protein and peptide sequences contain clues for functional prediction. A challenge is to predict the function of sequences that show low or no homology to proteins or peptides of known function. A machine learning method, support vector machines (SVM), has recently been explored for predicting the functional class of proteins and peptides from sequence-derived properties irrespective of sequence similarity, and has shown impressive performance for a wide range of protein and peptide classes, including certain low- and non-homologous sequences. This method is a new and valuable complement to the extensively used alignment-based, clustering-based, and structure-based functional prediction methods. This article evaluates the strategies, current progress, reported prediction performance, available software tools, and underlying difficulties in using SVM for predicting the functional class of proteins and peptides.

12.
Proteins destined for regions other than the cytoplasm have to cross at least one membrane barrier before reaching their proper destination. Almost all such proteins are initially biosynthesized as precursors with signal sequences at the amino terminus. Signal sequences are essential and also sufficient for targeting proteins to membranes and for their translocation across membranes. One striking feature of the signal sequences of secretory proteins is a positively charged amino terminus followed by a region comprising 10–12 very hydrophobic amino acids. The structural and physico-chemical properties of signal sequences have been analysed. On the basis of the analyses it is proposed that the structural feature of a positively charged amino-terminal region followed by a hydrophobic stretch of amino acids, rather than a conformational one, is recognised by components of the cell's export machinery. It is also postulated that signal sequences insert into the lipid bilayer of the translocation-competent membrane after targeting. The presence of the signal sequence results in the formation of local ‘defects’ in the bilayer, which have a role in the translocation of proteins across membranes.
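
The sequence feature described in this abstract (a positively charged amino terminus followed by 10–12 very hydrophobic residues) can be expressed as a simple scan. The sketch below is a toy heuristic, not a validated signal-peptide predictor: the hydrophobic residue set, the 8-residue N-terminal window, and the 35-residue scan limit are all assumptions chosen for illustration, and both example sequences are made up.

```python
# Assumed residue classes; the abstract does not define them.
HYDROPHOBIC = set("AVLIMFWC")
POSITIVE = set("KR")

def has_signal_like_nterm(seq, n_window=8, min_run=10, scan_limit=35):
    """Return True if seq starts with a positive residue followed by a hydrophobic run.

    n_window:   how far from the N-terminus to look for a K or R (assumed).
    min_run:    minimum length of the uninterrupted hydrophobic stretch.
    scan_limit: only the first scan_limit residues are examined (assumed).
    """
    seq = seq.upper()
    # Position of the first positively charged residue near the N-terminus.
    first_pos = next(
        (i for i, aa in enumerate(seq[:n_window]) if aa in POSITIVE), None
    )
    if first_pos is None:
        return False
    # Look for an uninterrupted hydrophobic run downstream of that residue.
    run = 0
    for aa in seq[first_pos + 1:scan_limit]:
        run = run + 1 if aa in HYDROPHOBIC else 0
        if run >= min_run:
            return True
    return False

# Made-up sequences for illustration only.
print(has_signal_like_nterm("MKRLLVALLAVLLAASQDE"))
print(has_signal_like_nterm("MDEESTSGNDQEEDSTG"))
```

A scan like this captures only the linear pattern; the abstract's further claims about bilayer insertion and local defects concern membrane biophysics and are not modelled here.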

13.
Accurately predicting phosphorylation sites in proteins is an important issue in postgenomics, and efficiently extracting the most predictive features from amino acid sequences for modeling remains challenging. Although both the distributed encoding method and the bio-basis function method work well, they have some limitations. The distributed encoding method is unable to code the biological content of sequences efficiently, whereas the bio-basis function method is a nonparametric method, which is often computationally expensive. As hidden Markov models (HMMs) can be used to generate one model for one cluster of aligned protein sequences, the aim of this study is to use HMMs to extract features from amino acid sequences, where sequence clusters are determined using available biological knowledge. In this novel method, HMMs are first constructed using functional sequences only. Both functional and nonfunctional training sequences are then input into the trained HMMs to generate functional and nonfunctional feature vectors. A machine learning algorithm is then used to construct a classifier based on these feature vectors. It is found in this work that (1) this method provides much better prediction accuracy than the use of HMMs alone for prediction, and (2) the support vector machine (SVM) algorithm outperforms decision tree and neural network algorithms when they are constructed on the features extracted using the trained HMMs.

14.
Nanoparticles (NPs) are useful as matrixes for the analyses of several types of biomolecules (including aminothiols, peptides, and proteins) and for mass spectrometric imaging through surface-assisted laser desorption/ionization mass spectrometry (SALDI-MS), mainly because of their large surface area, strong absorption in the ultraviolet-near-infrared region, and ready functionalization. Metallic NPs, metal oxide NPs, and semiconductor quantum dots, unmodified or functionalized with recognition ligands, have a strong affinity toward analytes; therefore, they allow the enrichment of biomolecules, leading to improved sensitivity with minimal matrix interference in their mass spectra. SALDI-MS using NPs overcomes the two major problems commonly encountered in matrix-assisted laser desorption/ionization mass spectrometry: the presence of "sweet spots" and the high background signals in the low-mass region. In this tutorial review, we discuss the roles played by the nature, size, and concentration of the NPs, the buffer composition, and the laser energy in determining the sensitivity and mass ranges for the analytes. We describe internal standard SALDI-MS methods that allow the concentrations of analytes to be determined with low variation (relative standard deviations: <10%) and we highlight how the simplicity, sensitivity, and reproducibility of SALDI-MS approaches using various NPs allow the analyses of proteins and small analytes and the imaging of cells.

15.
The reasons for distortions from optimal α-helical geometry are largely unknown, but their influence on structural changes of proteins is significant. Hence, their prediction is a crucial problem in structural bioinformatics. For the particular case of kink prediction, we generated a data set of 132 membrane proteins containing 1014 manually labeled helices and examined the environment of kinks. Our sequence analysis confirms the great relevance of proline and reveals disproportionately high occurrences of glycine and serine at kink positions. The structural analysis shows significantly different mean solvent-accessible surface areas for kinked and nonkinked helices. More importantly, we used this data set to validate string kernels for support vector machines as a new kink prediction method. Applying the new predictor, about 80% of all helices could be correctly predicted as kinked or nonkinked, even when focusing on small helical fragments. The results exceed recently reported accuracies of alternative approaches and are a consequence of both the method and the data set.
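
A string kernel lets a support vector machine compare sequences directly, without first hand-crafting fixed-length feature vectors. The abstract does not say which string kernel was used, so the sketch below shows the k-mer spectrum kernel purely as one simple, well-known member of that family; the function name and the default `k=3` are illustrative choices.

```python
from collections import Counter

def spectrum_kernel(s, t, k=3):
    """k-spectrum kernel: inner product of the k-mer count vectors of s and t."""
    counts_s = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    counts_t = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    # Only k-mers present in both strings contribute to the inner product.
    return sum(counts_s[kmer] * counts_t[kmer] for kmer in counts_s.keys() & counts_t.keys())

# Two short made-up helix fragments sharing the 2-mers "LP" and "PG".
print(spectrum_kernel("LPGS", "LPGA", k=2))
```

A kernel matrix built from such pairwise values could be passed to any SVM implementation that accepts precomputed kernels; that training step is omitted here.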

16.
Hyperpolarization is one approach to enhancing Nuclear Magnetic Resonance (NMR) and Magnetic Resonance Imaging (MRI) signals by increasing the population difference between the nuclear spin states. Imaging hyperpolarized solids opens up extensive possibilities, yet is challenging to perform. Unlike in regular MRI sequences, which rely on spin-lattice relaxation to restore the magnetization, the highly populated state is normally not replenished to its initial polarization level by relaxation. This makes it necessary to carefully “budget” the polarization to optimize the image quality. In this paper, we present a theoretical framework to address this challenge under the assumption of either variable flip angles or a constant flip angle. In addition, we analyze the gradient arrangement needed for fast imaging to overcome the intrinsically short decoherence time in solids. Imaging of hyperpolarized diamonds is demonstrated as a prototypical platform to test the theory.
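
The notion of "budgeting" a non-renewable polarization can be illustrated with the textbook variable-flip-angle schedule that yields equal signal from every excitation. This is not the paper's framework, which the abstract does not spell out; the sketch assumes that relaxation during the pulse train is negligible and that each pulse with flip angle θ yields signal proportional to sin θ while leaving a fraction cos θ of the longitudinal polarization.

```python
import math

def constant_signal_flip_angles(n_pulses):
    """Flip angles (radians) giving equal signal per pulse, ignoring relaxation.

    The n-th of N pulses uses theta_n = arctan(1 / sqrt(N - n)); the last
    pulse is 90 degrees and uses up whatever polarization remains.
    """
    angles = []
    for n in range(1, n_pulses + 1):
        if n == n_pulses:
            angles.append(math.pi / 2)
        else:
            angles.append(math.atan(1.0 / math.sqrt(n_pulses - n)))
    return angles

def pulse_signals(angles, m0=1.0):
    """Transverse signal from each pulse when polarization is never replenished."""
    mz, signals = m0, []
    for theta in angles:
        signals.append(mz * math.sin(theta))  # signal excited by this pulse
        mz *= math.cos(theta)                 # polarization left for later pulses
    return signals

variable = pulse_signals(constant_signal_flip_angles(4))
constant = pulse_signals([math.radians(30)] * 4)
print([round(s, 3) for s in variable])
print([round(s, 3) for s in constant])
```

Under these assumptions the variable schedule delivers a signal of m0/sqrt(N) on every pulse, whereas a constant flip angle gives a signal that decays from pulse to pulse.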

17.
The focus of the computational structural biology community has shifted dramatically over the past one-and-a-half decades from the classical protein structure prediction problem to the understanding of intrinsically disordered proteins (IDPs) and proteins containing regions of disorder (IDPRs). The current interest lies in unraveling a disorder-to-order transitioning code embedded in the amino acid sequences of IDPs/IDPRs. Disordered proteins are characterized by an enormous amount of structural plasticity, which makes them promiscuous in binding to different partners, multi-functional in cellular activity, and atypical in their folding energy landscapes, which resemble those of partially folded molten globules. Their involvement in several deadly human diseases (e.g. cancer, cardiovascular and neurodegenerative diseases) also makes them attractive drug targets and important for a biochemical understanding of the disease(s). The study of the structural ensemble of IDPs is rather difficult, in particular for transient interactions. When bound to a structured partner, an IDPR adopts an ordered conformation in the complex. The residues that undergo this disorder-to-order transition are called protean residues and are generally found in short contiguous stretches; the first step in understanding the modus operandi of an IDP/IDPR would be to predict these residues. A few available methods predict these protean segments from amino acid sequence; however, their performance as reported in the literature leaves clear room for improvement. With this background, the current study presents ‘Proteus’, a random forest classifier that predicts the likelihood of a residue undergoing a disorder-to-order transition upon binding to a potential partner protein. The prediction is based on features that can be calculated from the amino acid sequence alone. Proteus compares favorably with existing methods, predicting twice as many true positives as the second-best method (55% vs. 27%) with a much higher precision on an independent data set. The current study also sheds some light on a possible ‘disorder-to-order’ transitioning consensus, as yet untangled but embedded in the amino acid sequences of IDPs. Some guidelines are also suggested for real-life structural modeling of an IDPR using Proteus.

18.
Using the pseudo amino acid (PseAA) composition to represent a protein sample can incorporate a considerable amount of sequence pattern information and so improve the prediction quality for its structural or functional classification. However, how to optimally formulate the PseAA composition is an important problem yet to be solved. In this article the grey modeling approach is introduced, which is particularly efficient in coping with complicated systems such as one consisting of many proteins with different sequence orders and lengths. On the basis of the grey model, four coefficients derived from each of the protein sequences concerned are adopted as its PseAA components. The PseAA composition thus formulated is called the "grey-PseAA" composition; it can catch the essence of a protein sequence and better reflect its overall pattern. In our study we have demonstrated that introducing the grey-PseAA composition can remarkably enhance the success rates in predicting the protein structural class. It is anticipated that the concept of grey-PseAA composition can also be used to predict many other protein attributes, such as subcellular localization, membrane protein type, enzyme functional class, GPCR type, and protease type, among many others.

19.
A new way to represent and analyze DNA sequence data is described. This approach complements methods currently used, in that it allows the systematic part of the variation between different sequences to be modeled. This can prove as informative as absence of variation (homology), which is the most widely used criterion for comparing sequence data. A multivariate sequence-activity model (SAM) for DNA promoter sequences is presented, in which the relative promoter strength is modeled in terms of the primary DNA sequence. The model is shown to have good predictive capability. The coefficients from the model are interpreted and used to design new structures predicted to be strong promoters in the system investigated. The approach described is also applicable to other kinds of sequence data, e.g. RNAs, proteins or peptides.

20.
A number of different approaches have been described to identify proteins from tandem mass spectrometry (MS/MS) data. The most common approaches rely on the available databases to match experimental MS/MS data. These methods suffer from several drawbacks and cannot be used for the identification of proteins from unknown genomes. In this communication, we describe a new de novo sequencing software package, PEAKS, to extract amino acid sequence information without the use of databases. PEAKS uses a new model and a new algorithm to efficiently compute the best peptide sequences whose fragment ions can best interpret the peaks in the MS/MS spectrum. The output of the software gives amino acid sequences with confidence scores for the entire sequences, as well as an additional novel positional scoring scheme for portions of the sequences. The performance of PEAKS is compared with Lutefisk, a well-known de novo sequencing software, using quadrupole-time-of-flight (Q-TOF) data obtained for several tryptic peptides from standard proteins.
