Similar Documents
20 similar documents found.
1.
A new two-dimensional graphical representation of protein sequences is introduced. Twenty concentric, evenly spaced circles, divided into equal sectors by n radial lines, are used to represent a protein sequence of length n. Each circle represents one of the 20 amino acids, and each radial line represents a single residue of the protein sequence. An efficient numerical method based on this graph is proposed to measure the similarity between two protein sequences. To demonstrate the accuracy of the approach, the method is applied to the NADH dehydrogenase subunit 5 (ND5) proteins of nine species and to 24 transferrin sequences from vertebrates. The correlation coefficients between our results and those of ClustalW are high (near-perfect correlations), and higher than the values obtained in many other related works.
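To make the construction concrete, the following Python sketch places residue i of a length-n sequence on radial line i (angle 2πi/n), at the radius of the circle assigned to its amino acid. The abstract does not specify which amino acid goes on which circle. The alphabetical circle ordering (`AMINO_ACIDS`) and the function name `circle_points` are therefore assumptions for illustration. The paper's similarity measure is not reproduced here.

```python
import math

# Hypothetical circle assignment: amino acid k (alphabetical one-letter order)
# lies on circle k + 1, so radii run from 1 (innermost) to 20 (outermost).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
RADIUS = {aa: k + 1 for k, aa in enumerate(AMINO_ACIDS)}

def circle_points(seq):
    """Map residue i of a length-n sequence to the point where radial line i
    (angle 2*pi*i/n) meets the circle assigned to that residue's amino acid."""
    n = len(seq)
    points = []
    for i, aa in enumerate(seq):
        theta = 2 * math.pi * i / n
        r = RADIUS[aa]
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points
```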

2.
In this paper, we propose a method to construct a 60-dimensional feature vector for protein sequences via the general form of pseudo amino acid composition. The feature vector is built from three components: the amino acid contents, the total distance of each amino acid from the first amino acid in the protein sequence, and the distribution of the 20 amino acids. The resulting cosine distance matrix (also called the similarity matrix) is used to construct the phylogenetic tree by the neighbour-joining method. To show the applicability of the approach, we tested it on three datasets: 1) ND5 protein sequences from nine species, 2) ND6 protein sequences from eight species, and 3) 50 coronavirus spike proteins. The results agree with the known evolutionary history and with the output of the widely used multiple sequence alignment program ClustalW. We also compared our phylogenetic results with six other recently proposed alignment-free methods. These comparisons show that the proposed method yields biological relationships that are more consistent than those of the other methods. In addition, its time complexity is linear, and it requires less space than other alignment-free methods that use graphical representations. By contrast, the multiple sequence alignment method has exponential time complexity.
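Below is a minimal sketch of a 60-dimensional descriptor of this kind, followed by the cosine distance matrix that would feed a neighbour-joining step. The abstract names the three groups of 20 features but not their formulas, so the choices here are assumptions:

- raw counts for the contents;
- the sum of 0-based positions for the distance from the first residue;
- the variance of positions as the "distribution" term.

The names `feature_vector_60`, `cosine_distance` and `distance_matrix` are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def feature_vector_60(seq):
    """Hypothetical 60-D descriptor: for each amino acid, (1) its count,
    (2) the total distance of its occurrences from the first residue,
    (3) the variance of its positions. The paper's formulas may differ."""
    positions = {aa: [] for aa in AMINO_ACIDS}
    for i, aa in enumerate(seq):  # i = distance from the first residue
        positions[aa].append(i)
    counts, totals, spreads = [], [], []
    for aa in AMINO_ACIDS:
        p = positions[aa]
        c = len(p)
        counts.append(c)
        totals.append(sum(p))
        if c:
            mean = sum(p) / c
            spreads.append(sum((x - mean) ** 2 for x in p) / c)
        else:
            spreads.append(0.0)
    return counts + totals + spreads

def cosine_distance(u, v):
    """1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def distance_matrix(seqs):
    """Pairwise cosine distances, the input a neighbour-joining step would use."""
    vecs = [feature_vector_60(s) for s in seqs]
    return [[cosine_distance(a, b) for b in vecs] for a in vecs]
```

Each feature group is filled in a single pass over the sequence, in line with the linear time complexity claimed above. In practice the three groups would likely need normalisation. The position sums grow with sequence length and would otherwise dominate the cosine distance.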

3.
We consider a novel numerical representation of proteins, obtained by assigning each amino acid a polar coordinate (an angle) on the unit circle. As a result, a protein sequence can be represented as a one-dimensional numerical sequence. Subtracting the entries of two such sequences facilitates the search for an alignment between pairs of proteins of interest. The alignment is sought by shifting one sequence relative to the other by several sequence units to the left or to the right. The approach is illustrated on two yeast proteins having 174 and 171 amino acids. (Author affiliation: Visiting Emeritus, Department of Mathematics & Computer Science, Drake University, Des Moines, Iowa.)
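The shift-and-subtract search can be sketched as follows. It assumes, hypothetically, that the k-th amino acid in alphabetical one-letter order is assigned the angle 2πk/20; the paper's actual assignment may differ. For each shift s, `best_shift` (an illustrative name) aligns position i of the first sequence with position i - s of the second. It then counts the positions where the difference of the numerical entries is zero.

```python
import math

# Hypothetical assignment: amino acid k (alphabetical order) gets angle 2*pi*k/20.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
ANGLE = {aa: 2 * math.pi * k / 20 for k, aa in enumerate(AMINO_ACIDS)}

def to_numeric(seq):
    """Represent a protein as a 1-D sequence of angles on the unit circle."""
    return [ANGLE[aa] for aa in seq]

def best_shift(seq1, seq2, max_shift=5):
    """Try shifts s in [-max_shift, max_shift]; position i of seq1 is compared
    with position i - s of seq2. Returns (shift, number of zero differences)."""
    a, b = to_numeric(seq1), to_numeric(seq2)
    best = (0, -1)
    for s in range(-max_shift, max_shift + 1):
        matches = 0
        for i, x in enumerate(a):
            j = i - s
            if 0 <= j < len(b) and abs(x - b[j]) < 1e-12:
                matches += 1
        if matches > best[1]:
            best = (s, matches)
    return best
```

For example, `best_shift("MKTAYIAK", "TAYIAK")` returns `(2, 6)`: shifting the second sequence two units to the right lines up all six of its residues with the first sequence.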

4.
Developing a reliable bioinformatics tool to guide peptide reagent design is of great value for both drug discovery and basic research. Based on the physical and chemical properties of amino acids, a new strategy for peptide reagent design, called AABPD (amino acid based peptide design), is proposed. The peptide samples in a training dataset are described by a series of HMLP (heuristic molecular lipophilicity potential) parameters and other physicochemical properties of amino acid residues. Together these form a three-dimensional data matrix in which each component is defined by three indexes: the first refers to the peptide samples, the second to the amino acid positions, and the third to the amino acid parameters. The binding free energy between a peptide ligand and its protein receptor is calculated from the physicochemical parameters by a linear free energy equation. This yields a set of simultaneous linear equations relating the bioactivity of the peptides to the physicochemical properties of the amino acids. An iterative double least-squares technique is developed to solve this three-dimensional set of simultaneous linear equations. It alternately determines the amino acid position coefficients of the peptide sequence and the physicochemical parameter coefficients of the amino acid residues. The two sets of coefficients thus obtained are used to predict the bioactivity of other query peptide reagents. Two calculation examples are studied with this design strategy: the peptide substrate specificity of the SARS coronavirus 3C-like proteinase, and affinity prediction for epitope peptides binding Class I MHC molecules.
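The iterative double least-squares idea can be sketched as an alternating least-squares fit of a bilinear model. The sketch assumes that the bioactivity of peptide i is modelled as y_i = Σ_j Σ_k a_j b_k X[i][j][k], where a_j are the position coefficients and b_k the physicochemical parameter coefficients. Holding b fixed turns the model into an ordinary linear least-squares problem in a, and vice versa, so the two are solved alternately. This pure-Python version uses normal equations with a tiny ridge term for numerical safety. It illustrates the idea under these assumptions and is not the authors' implementation; all function names are hypothetical.

```python
def solve(A, rhs):
    """Solve the square system A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def lstsq(D, y, ridge=1e-9):
    """Least squares via the normal equations (D^T D + ridge*I) x = D^T y."""
    p = len(D[0])
    DtD = [[sum(row[u] * row[v] for row in D) + (ridge if u == v else 0.0)
            for v in range(p)] for u in range(p)]
    Dty = [sum(row[u] * yi for row, yi in zip(D, y)) for u in range(p)]
    return solve(DtD, Dty)

def double_least_squares(X, y, iters=50):
    """X[i][j][k]: peptide i, position j, property k. Alternately fit the
    position coefficients a (b fixed) and the property coefficients b (a fixed)."""
    J, K = len(X[0]), len(X[0][0])
    b = [1.0] * K
    for _ in range(iters):
        Z = [[sum(Xi[j][k] * b[k] for k in range(K)) for j in range(J)] for Xi in X]
        a = lstsq(Z, y)
        W = [[sum(a[j] * Xi[j][k] for j in range(J)) for k in range(K)] for Xi in X]
        b = lstsq(W, y)
    return a, b

def predict(Xi, a, b):
    """Predicted bioactivity of one peptide under the bilinear model."""
    return sum(a[j] * b[k] * Xi[j][k] for j in range(len(a)) for k in range(len(b)))
```

The pair (a, b) is only determined up to a common scale factor: (c·a, b/c) gives identical predictions. Predictions, rather than the raw coefficients, are therefore the natural quantity to compare.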

5.
Understanding the relationship between amino acid sequences and the folding rates of proteins is a challenging task, similar to the protein folding problem. In this work, we analyzed the relative importance of protein sequence and structure for predicting protein folding rates, in terms of amino acid properties and contact distances, respectively. Parameters derived from protein sequence (physical-chemical, energetic, and conformational properties of amino acid residues) show very weak correlation (|r| < 0.39) with the folding rates of 28 two-state proteins. This indicates that sequence information alone is not sufficient to understand the folding rates of two-state proteins. However, the maximum positive correlations, obtained for the number of medium-range contacts and the alpha-helical tendency, reveal the importance of local interactions in initiating protein folding. On the other hand, a remarkable correlation (r from -0.74 to -0.88) was obtained between structural parameters (contact order, long-range order, and total contact distance) and protein folding rates. Further, we found that secondary structure content and solvent accessibility play a marginal role in determining the folding rates of two-state proteins. Multiple regression analysis with a combination of three properties (beta-strand tendency, enthalpy change, and total contact distance) improved the correlation with protein folding rates to 0.92. The relative importance of existing methods, along with the multiple-regression model proposed in this work, is discussed. Our results demonstrate that native-state topology is the major determinant of the folding rates of two-state proteins.
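For reference, contact order, one of the structural parameters mentioned above, is commonly defined (following Plaxco and co-workers) as the average sequence separation of native contacts divided by chain length. The r values are presumably Pearson correlation coefficients. The sketch below assumes the list of residue contacts has already been extracted from the native structure, for example with a distance cutoff. `relative_contact_order` and `pearson_r` are illustrative names. The paper's long-range order and total contact distance formulas are not reproduced.

```python
import math

def relative_contact_order(contacts, length):
    """Relative contact order: mean sequence separation |i - j| over the
    native contacts, divided by the chain length L."""
    seps = [abs(i - j) for i, j in contacts]
    return sum(seps) / (len(seps) * length)

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

For example, contacts (1, 5) and (2, 10) in a 10-residue chain give (4 + 8) / (2 × 10) = 0.6.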

6.
A novel representation of proteins is introduced that is independent of arbitrary decisions about the labels assigned to the 20 natural amino acids. The approach assigns 20 unit vectors in a 20-dimensional vector space to the 20 natural amino acids. A protein is then represented by a walk, that is, a sequence of steps in the 20-dimensional space, analogous to a walk in the (x, y) plane in the case of binary strings. A straightforward numerical characterization of proteins is obtained from the distance matrix associated with the walk representing the protein in 20-dimensional space. This matrix combines information on the Euclidean distances between the various amino acids in the protein sequence. The Line Distance matrix offers an additional numerical characterization of proteins. The lengths of the steps of the walk in 20-D space allow the construction of a "protein profile", which represents the distribution of the average step lengths and their powers.
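A minimal sketch of the 20-dimensional walk follows. It assumes one walk vertex per residue, omits the origin, and uses an arbitrary axis order. Coordinate k of each vertex is simply the number of occurrences of amino acid k so far. The distance between vertices i and j therefore depends only on the amino acid composition of the segment between them. Permuting the axes permutes coordinates without changing any Euclidean distance, which is why the distance matrix does not depend on how labels are assigned. The function names are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # axis order is arbitrary
INDEX = {aa: k for k, aa in enumerate(AMINO_ACIDS)}

def walk_20d(seq):
    """Vertices of the walk: each residue adds the unit vector e_k of its
    amino acid, so coordinate k counts occurrences of amino acid k so far."""
    point = [0] * 20
    points = []
    for aa in seq:
        point[INDEX[aa]] += 1
        points.append(tuple(point))
    return points

def distance_matrix(seq):
    """Euclidean distances between all pairs of walk vertices."""
    pts = walk_20d(seq)
    return [[math.dist(p, q) for q in pts] for p in pts]
```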

7.
We consider a spectrum-like two-dimensional graphical representation of proteins based on a reduced protein model in which the 20 amino acids are grouped into five classes. This particular grouping of amino acids was suggested by Riddle and co-workers in 1997. The graphical representation places the amino acids sequentially on five equally spaced horizontal lines. The one-letter codes B, O, U, X and Y, assigned the numerical values 1 to 5, are suggested as labels for fictional amino acids, each standing for all the amino acids in one group. The approach is illustrated on the ND6 proteins of eight species, having 168 to 175 amino acids. Visual inspection of these spectral graphical representations may reveal local similarities and dissimilarities between protein sequences. In addition, arithmetic manipulation of the spectra offers an elegant route to visualizing the degree of similarity for selected pairs of proteins.
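The spectrum construction and one possible arithmetic manipulation, a position-wise difference of levels, can be sketched as follows. The five-class grouping in `GROUPS` is a placeholder chosen only so that every amino acid belongs to some class. It is not claimed to be the Riddle et al. (1997) grouping, which the abstract does not list. The helper names are hypothetical.

```python
# Placeholder grouping of the 20 amino acids into five classes labelled
# B, O, U, X, Y with levels 1..5 (NOT necessarily the Riddle et al. grouping).
GROUPS = {"B": "AGST", "O": "CFILMVWY", "U": "DE", "X": "HKR", "Y": "NPQ"}
VALUE = {"B": 1, "O": 2, "U": 3, "X": 4, "Y": 5}
CLASS_OF = {aa: label for label, members in GROUPS.items() for aa in members}

def reduce_sequence(seq):
    """Rewrite a protein in the five-letter reduced alphabet."""
    return "".join(CLASS_OF[aa] for aa in seq)

def spectrum(seq):
    """(position, level) pairs: residue i is drawn on horizontal line VALUE[class]."""
    return [(i + 1, VALUE[CLASS_OF[aa]]) for i, aa in enumerate(seq)]

def difference_spectrum(seq1, seq2):
    """Position-wise level differences over the common length; zeros mark
    positions where both proteins carry amino acids of the same class."""
    n = min(len(seq1), len(seq2))
    s1, s2 = spectrum(seq1[:n]), spectrum(seq2[:n])
    return [p[1] - q[1] for p, q in zip(s1, s2)]
```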

8.
The triplet states of the proteins bovine serum albumin, ovalbumin and D-amino acid oxidase were observed by electron paramagnetic resonance at 77 K. The triplet states of the aromatic amino acids tryptophan, tyrosine and phenylalanine were also detected. The protein triplet originates from the tryptophan residues of these proteins. It is suggested that energy transfer takes place between tyrosine and tryptophan.

9.
Based on the chaos game representation, a 2D graphical representation of protein sequences was introduced in which the 20 amino acids are rearranged in a cyclic order according to their physicochemical properties. The Euclidean distances between corresponding amino acids in the 2D graphical representations are computed to find matching (or conserved) fragments of amino acids between two proteins. In addition, a cumulative distance between the 2D graphical representations is defined to compare the similarity of proteins. An examination of the similarity among the ND5 protein sequences of nine species demonstrates the utility of the approach. © 2010 Wiley Periodicals, Inc. J Comput Chem, 2010
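Here is a chaos-game sketch for proteins. It assumes the 20 amino acids sit at the vertices of a regular 20-gon in a cyclic order, and that each residue moves the current point halfway toward its vertex. `CYCLIC_ORDER` is an alphabetical placeholder, not the physicochemical order used in the paper. The pointwise and cumulative distances here are a plain reading of the abstract rather than the published definitions; the function names are hypothetical.

```python
import math

# Placeholder cyclic order; the paper orders residues by physicochemical
# properties, which the abstract does not specify.
CYCLIC_ORDER = "ACDEFGHIKLMNPQRSTVWY"
VERTEX = {
    aa: (math.cos(2 * math.pi * k / 20), math.sin(2 * math.pi * k / 20))
    for k, aa in enumerate(CYCLIC_ORDER)
}

def cgr_points(seq):
    """Chaos game: start at the centre, move halfway toward each residue's vertex."""
    x, y = 0.0, 0.0
    pts = []
    for aa in seq:
        vx, vy = VERTEX[aa]
        x, y = (x + vx) / 2, (y + vy) / 2
        pts.append((x, y))
    return pts

def pointwise_distances(seq1, seq2):
    """Euclidean distance between corresponding points of two CGR paths."""
    p, q = cgr_points(seq1), cgr_points(seq2)
    return [math.dist(a, b) for a, b in zip(p, q)]

def cumulative_distance(seq1, seq2):
    """Sum of the pointwise distances over the common length."""
    return sum(pointwise_distances(seq1, seq2))
```

When two sequences carry the same residue at a position, the gap between their points halves at that step. A run of identical residues therefore drives the two paths together, and stretches of small pointwise distance flag candidate conserved fragments.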

10.
Neutralizing antibodies often recognize conformational, discontinuous epitopes. Linear peptides mimicking such conformational epitopes can be selected from phage display peptide libraries by screening with the respective antibodies. However, it is difficult to localize these "mimotopes" within the three-dimensional (3D) structures of the target proteins. Knowledge of the conformational epitopes of neutralizing antibodies would help in designing antigens able to elicit protective immune responses. We therefore present software that localizes linear peptide sequences within the 3D structures of proteins. The 3D-Epitope-Explorer (3DEX) software maps conformational epitopes in 3D protein structures. Its algorithm takes into account the physicochemical neighborhood of the C(alpha) or C(beta) atoms of individual amino acids. A given amino acid of a peptide sequence is localized within the protein, and the software searches within predefined distances for the amino acids that neighbor that amino acid in the peptide. Surface exposure of the amino acids can also be taken into account. The procedure is then repeated for the remaining amino acids of the peptide. A joker function makes it possible to map peptide mimotopes that do not necessarily share 100% sequence identity with the protein. Using this software, we localized mimotopes within the 3D structure of gp120, the exterior glycoprotein of HIV-1. These mimotopes had been selected from phage-displayed peptide libraries using polyclonal antibodies from HIV-positive patient plasma. We also analyzed two recently published peptide sequences corresponding to known conformational epitopes to further confirm the validity of 3DEX.
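The neighbourhood search described above can be sketched as a depth-first search over residue coordinates. Each next peptide residue must match an unused protein residue within a cutoff distance of the previous one. This is a simplified illustration, not the 3DEX algorithm itself:

- it ignores surface exposure and physicochemical scoring;
- it uses a single distance cutoff;
- it treats the joker as a wildcard character (`X` here, an assumption).

Function and parameter names are hypothetical.

```python
import math

def map_peptide(peptide, residues, max_dist=10.0, joker="X"):
    """Find chains of protein residues matching the peptide, where consecutive
    peptide residues must lie within max_dist (angstrom) of each other in 3D.
    residues: list of (one_letter_code, (x, y, z)), e.g. C-alpha coordinates.
    The joker character in the peptide matches any amino acid."""
    def matches(p_aa, r_aa):
        return p_aa == joker or p_aa == r_aa

    results = []

    def extend(path):
        if len(path) == len(peptide):
            results.append(path)
            return
        last = residues[path[-1]][1]
        want = peptide[len(path)]
        for idx, (aa, xyz) in enumerate(residues):
            if idx not in path and matches(want, aa) and math.dist(last, xyz) <= max_dist:
                extend(path + [idx])

    for idx, (aa, _) in enumerate(residues):
        if matches(peptide[0], aa):
            extend([idx])
    return results
```

The search is exhaustive and can grow combinatorially for long peptides or generous cutoffs.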

11.
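Predicting folding rates is of great significance for elucidating the mechanism of protein folding. This study collected 115 proteins with currently known folding rates, including two-state, multi-state and mixed-state proteins. To characterize the primary-structure information of protein molecules comprehensively, a total of 9357 features were extracted: sequence length, multi-scale amino acid residue composition, k-space features of residue pairs, and geostatistical correlation features based on residue physicochemical properties. Two selection stages followed: an improved binary matrix shuffling filter, then multiple rounds of nonlinear screening that eliminate the lowest-ranked features. These yielded 23 retained features with clear physicochemical meaning. The nonlinear support vector regression model built on them achieved a jackknife cross-validation correlation coefficient of R = 0.95. This is better than the results reported in the literature and those of the other feature selection methods used for comparison. The support vector regression interpretation system showed that the nonlinear regression between folding rate and the retained descriptors is highly significant. The influence of each retained descriptor on folding rate was analyzed. The results show that protein folding rate is closely related to sequence length, short- and medium-range correlation features, residue triplet composition features, and related descriptors.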
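The "k-space" features of residue pairs are assumed here to follow the common formulation: the composition of amino acid pairs separated by exactly k intervening residues. This gives 400 features per value of k. The paper's exact definition and normalisation may differ, and `k_spaced_pair_composition` is a hypothetical name.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs

def k_spaced_pair_composition(seq, k):
    """Frequency of each ordered residue pair (a, b) with exactly k residues
    between them, normalised by the number of such pairs in the sequence."""
    counts = dict.fromkeys(PAIRS, 0)
    total = len(seq) - k - 1
    for i in range(max(total, 0)):
        counts[seq[i] + seq[i + k + 1]] += 1
    return [counts[p] / total if total > 0 else 0.0 for p in PAIRS]
```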

12.
13.
Truncation caused by the presence of many short-range residual dipolar couplings (RDCs) hinders the observation of long-range RDCs in weakly aligned biomacromolecules. Perdeuteration of proteins followed by reprotonation of labile hydrogen positions greatly alleviates this problem. Here we show that for small perdeuterated proteins, a large number (up to 10 in protein G) of long-range RDCs to 13C and 1HN can be observed from individual amide protons. The 1HN-13C RDCs comprise correlations to the 13Cα, 13Cβ, and 13C' nuclei of the same and the preceding amino acid, as well as to the 13C' nuclei of hydrogen-bonded amino acids. The accuracy of the coupling constants is very high and defines individual internuclear distances to within a few picometers. Deviations between the measured RDC values and those predicted from the 1.1 Å crystal structure of protein G are found mainly in two surface-exposed loop regions. The deviations correlate strongly with the B-factors of the crystal structure.
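The high distance precision follows from the inverse-cube dependence of the dipolar coupling. For a fixed orientation, D is proportional to γ1γ2/r^3, so a relative error δD/D in the coupling becomes a relative error of only δr/r = δD/(3D) in the distance. For example, a 3% coupling error corresponds to about a 1% (roughly 1 pm) error in a 1.02 Å distance. The sketch below computes the magnitude of the static dipolar coupling constant, (μ0/4π)γ1γ2ħ/(2πr^3) in Hz, and inverts it. A measured RDC additionally carries an orientation- and alignment-dependent scaling that is not modelled here. Gyromagnetic ratios are standard literature values; the function names are hypothetical.

```python
import math

MU0_OVER_4PI = 1e-7          # T*m/A
HBAR = 1.054571817e-34       # J*s
GAMMA = {"1H": 267.522e6, "13C": 67.2828e6, "15N": -27.116e6}  # rad s^-1 T^-1

def dipolar_coupling_hz(nuc1, nuc2, r_m):
    """Magnitude of the static dipolar coupling constant (Hz) at distance r_m (metres)."""
    return abs(MU0_OVER_4PI * GAMMA[nuc1] * GAMMA[nuc2] * HBAR / r_m**3) / (2 * math.pi)

def distance_from_coupling(nuc1, nuc2, d_hz):
    """Invert the r^-3 law to recover the internuclear distance (metres)."""
    num = abs(MU0_OVER_4PI * GAMMA[nuc1] * GAMMA[nuc2] * HBAR) / (2 * math.pi)
    return (num / d_hz) ** (1 / 3)
```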

14.
15.
16.
17.
18.
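Prediction of Transmembrane Regions Based on Fuzzy Clustering Analysis of Amino Acids   Times cited: 2 (self-citations: 0; by others: 2)
Deng Yong, Liu Qi, Li Yixue. Acta Chimica Sinica, 2004, 62(19): 1968-1972
Transmembrane protein sequences are poorly conserved during evolution, and even homologous sequences show low identity. Consequently, selecting the training set of a transmembrane-region prediction algorithm by sequence identity does not effectively prevent the predictions from overfitting the training set. This paper proposes a prediction algorithm based on fuzzy clustering analysis of amino acids. Amino acids are fuzzily clustered according to the similarity of their distributions across the different regions. Transmembrane regions are then predicted from the distribution characteristics of each amino acid class rather than of individual amino acids. The results show that the method reduces, to some extent, the influence of training-set selection on test results. It also improves the accuracy of transmembrane protein topology prediction, especially for transmembrane proteins about which little is currently known.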
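Fuzzy clustering of amino acids by their distribution over regions can be sketched with standard fuzzy c-means. Each amino acid is described by the fraction of its occurrences in each region type, for example cytoplasmic, transmembrane and extracellular. The abstract does not say whether the paper uses fuzzy c-means specifically, so treat this as a swapped-in standard technique. The function name is hypothetical.

```python
import math
import random

def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0):
    """Standard fuzzy c-means (fuzzifier m > 1). points: equal-length vectors.
    Returns (centers, U) with U[i][j] = membership of point i in cluster j."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    U = []
    for _ in range(n):  # random initial memberships, each row summing to 1
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        U.append([u / s for u in row])
    for _ in range(iters):
        # Centres are membership-weighted means with weights u^m.
        centers = []
        for j in range(c):
            w = [U[i][j] ** m for i in range(n)]
            sw = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / sw
                            for d in range(dim)])
        # Memberships: u_ij = 1 / sum_l (d_ij / d_il)^(2 / (m - 1)).
        for i in range(n):
            dists = [math.dist(points[i], centers[j]) for j in range(c)]
            if min(dists) == 0.0:  # point sits on a centre
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                s = sum(hits)
                U[i] = [h / s for h in hits]
                continue
            U[i] = [1.0 / sum((dists[j] / dists[l]) ** (2 / (m - 1)) for l in range(c))
                    for j in range(c)]
    return centers, U
```

Each row of `U` sums to 1, so an amino acid can belong partly to several classes. Prediction would then use class-level rather than residue-level distribution statistics.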

19.
Motivation: Primary and secondary active transport are two types of active transport that use energy to move substances. Active transport mechanisms rely on proteins to assist in transport. These proteins play essential roles in regulating the traffic of ions or small molecules across a cell membrane against the concentration gradient. In this study, the proteins involved in these two main types of transport are classified among transmembrane transport proteins. We propose a support vector machine (SVM) that uses contextualized word embeddings from Bidirectional Encoder Representations from Transformers (BERT) to represent protein sequences. BERT is a deep learning language representation model developed by Google. It is a powerful model for transfer learning and one of the highest-performing pre-trained models for natural language processing (NLP) tasks. Transfer learning with a pre-trained BERT model is applied to extract fixed feature vectors from the hidden layers and to learn contextual relations between amino acids in the protein sequence. The resulting contextualized word representations of proteins effectively model the complex structures of amino acids in the sequence and the variations of these amino acids in context. By generating context information, we capture multiple meanings for the same amino acid and reveal the importance of specific residues in the protein sequence.
Results: The performance of the proposed method is evaluated using five-fold cross-validation and an independent test. The proposed method achieves accuracies of 85.44%, 88.74% and 92.84% for Class-1, Class-2, and Class-3, respectively. Experimental results show that, by using context information, this approach outperforms other feature extraction methods. It also effectively classifies the two types of active transport and improves overall performance.
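The classification step can be sketched in two parts. First, per-residue embedding vectors (for example, BERT hidden states) are mean-pooled into one fixed-length vector per protein. Second, a linear SVM is trained on those vectors. The embeddings are assumed to be computed elsewhere, and mean pooling is an assumed choice. The SVM here is trained with the Pegasos stochastic sub-gradient method in pure Python, rather than with whatever library the authors used. Function names are hypothetical.

```python
import random

def mean_pool(residue_embeddings):
    """Average per-residue vectors into one fixed-length vector per protein."""
    n = len(residue_embeddings)
    dim = len(residue_embeddings[0])
    return [sum(v[d] for v in residue_embeddings) / n for d in range(dim)]

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos: stochastic sub-gradient descent on the primal hinge-loss SVM
    objective. Labels y must be +1/-1. A constant 1 is appended to each input
    as a (regularised) bias feature."""
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(Xb)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wi * xi for wi, xi in zip(w, Xb[i]))
            w = [(1 - eta * lam) * wi for wi in w]  # shrink (regularisation step)
            if margin < 1:  # hinge loss active: step toward the correct side
                w = [wi + eta * y[i] * xi for wi, xi in zip(w, Xb[i])]
    return w

def predict(w, x):
    """Return +1 or -1 for a pooled feature vector x."""
    return 1 if sum(wi * xi for wi, xi in zip(w, list(x) + [1.0])) >= 0 else -1
```

For the three classes reported above, one way to extend this would be a one-vs-rest scheme with one binary SVM per class.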

20.
Four N-alkylaminooxy amino acids have been synthesized in 22-56% overall yield from readily available amino acid precursors. Each amino acid can be efficiently incorporated into peptides using Boc-chemistry-based solid-phase peptide synthesis, and in three of the four cases the resulting peptides can be chemoselectively glycosylated at the aminooxy side chains to generate neoglycopeptides. The range of N-alkylaminooxy amino acids prepared allows attachment of sugars at two-, three-, or four-atom distances from the peptide backbone, and each ensures that attached sugars adopt cyclic conformations. These derivatives provide convenient access to arrays of biologically relevant neoglycopeptides that may be used to probe the influence of attached sugars on the structure and function of peptides and proteins.
