Similar literature
1.
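Predictive analysis of transmembrane segments of membrane proteins   Total citations: 6 (self-citations: 0, by others: 6)
By combining the time-frequency localization of the continuous wavelet transform with the hydrophobicity of amino acids, a new method is proposed for predicting the number and location of transmembrane segments in membrane proteins. Using the membrane protein with code 1YST as an example, the wavelet scale and the type of hydrophobicity scale were selected, and the procedure for predicting the number and location of transmembrane helices is described. Thirty-six proteins (containing 232 transmembrane helices) were randomly drawn from a membrane protein database as a test set, and their transmembrane helices were predicted with this method; 222 of the transmembrane helices were predicted correctly, an accuracy of 96.1%. The results show that the method has high prediction accuracy.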
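The idea in this abstract, a hydropathy profile analysed with a continuous wavelet transform at one chosen scale, can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not name the wavelet, the hydrophobicity scale, the wavelet scale, or the decision rule, so the Mexican-hat wavelet, the Kyte-Doolittle scale, `scale=8.0`, the zero threshold, and the 15-residue minimum length are all hypothetical choices, as are the function names.

```python
import math

# Kyte-Doolittle hydropathy values (an assumed choice of hydrophobicity scale).
KD = {'A': 1.8, 'R': -4.5, 'N': -3.5, 'D': -3.5, 'C': 2.5, 'Q': -3.5,
      'E': -3.5, 'G': -0.4, 'H': -3.2, 'I': 4.5, 'L': 3.8, 'K': -3.9,
      'M': 1.9, 'F': 2.8, 'P': -1.6, 'S': -0.8, 'T': -0.7, 'W': -0.9,
      'Y': -1.3, 'V': 4.2}


def mexican_hat(t):
    """Mexican-hat (Ricker) mother wavelet, unnormalised."""
    return (1.0 - t * t) * math.exp(-0.5 * t * t)


def cwt_at_scale(signal, scale):
    """Continuous wavelet transform of a 1D signal at a single scale."""
    n = len(signal)
    half = int(math.ceil(5 * scale))  # the wavelet is ~0 beyond 5 scale units
    coeffs = []
    for b in range(n):
        acc = 0.0
        for k in range(max(0, b - half), min(n, b + half + 1)):
            acc += signal[k] * mexican_hat((k - b) / scale)
        coeffs.append(acc / math.sqrt(scale))
    return coeffs


def candidate_segments(coeffs, threshold=0.0, min_len=15):
    """Return inclusive 0-based (start, end) runs where coeffs > threshold."""
    segments, start = [], None
    for i, c in enumerate(coeffs + [float('-inf')]):  # sentinel closes last run
        if c > threshold and start is None:
            start = i
        elif c <= threshold and start is not None:
            if i - start >= min_len:
                segments.append((start, i - 1))
            start = None
    return segments


def predict_tm(seq, scale=8.0, threshold=0.0, min_len=15):
    """Hypothetical predictor: hydropathy profile -> CWT -> thresholded runs."""
    profile = [KD.get(aa, 0.0) for aa in seq.upper()]
    return candidate_segments(cwt_at_scale(profile, scale), threshold, min_len)
```

For example, `predict_tm("DK" * 10 + "L" * 21 + "DK" * 10)` should report a single candidate segment close to the 21-residue hydrophobic stretch. A real application would tune the scale and threshold against proteins with known topology.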

2.
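Membrane protein structure prediction based on wavelet analysis of hydrophobicity   Total citations: 5 (self-citations: 0, by others: 5)
Membrane proteins perform important physiological functions in the cell membrane, and most of them play key roles in drug design, transport, and immune recognition, so predicting the structure of these proteins at the molecular level is of great significance. This paper proposes a new method, based on the wavelet transform of amino acid hydrophobicity, for predicting the number and location of transmembrane segments in membrane proteins. Using the membrane protein with code upkb_bovin as an example, the prediction of the number and location of transmembrane helices is described. Thirty-six proteins (containing 232 transmembrane helices) were randomly drawn from a membrane protein database as a test set to validate the wavelet-based prediction method; 226 of the transmembrane helices were predicted correctly, an accuracy of 96.8%. The results show that this prediction method has high accuracy.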

3.
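Membrane protein structure prediction based on wavelet analysis of hydrophobicity   Total citations: 2 (self-citations: 0, by others: 2)
Membrane proteins perform important physiological functions in the cell membrane, and most of them play key roles in drug design, transport, and immune recognition, so predicting the structure of these proteins at the molecular level is of great significance. This paper proposes a new method, based on the wavelet transform of amino acid hydrophobicity, for predicting the number and location of transmembrane segments in membrane proteins. Using the membrane protein with code upkb-bovin as an example, the prediction of the number and location of transmembrane helices is described. Thirty-six proteins (containing 232 transmembrane helices) were randomly drawn from a membrane protein database as a test set to validate the wavelet-based prediction method; 226 of the transmembrane helices were predicted correctly, an accuracy of 96.8%. The results show that this prediction method has high prediction accuracy.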

4.
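Using a sequence-homologous cDNA as a probe, a glucose transporter cDNA clone was screened from a cDNA library of rat small-intestine wall cells. Dideoxy sequencing showed that the full cDNA sequence of the clone is 2466 bp, with a translated region of 1566 bp encoding 522 amino acids. Analysis of the full amino acid sequence revealed 12 hydrophobic segments of 21 amino acids each. This glucose transporter may span the cell membrane 12 times to form a glucose channel.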

5.
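In this work, the sequences of nucleic acid aptamers targeting prostate cancer (PCa) PC-3M-1E8 cells were translated into amino acid sequences, molecular parameters of the amino acid sequences were calculated, and these parameters were then used to build a structure-activity relationship model of aptamer affinity. The candidate aptamer sequences were obtained by cell-based systematic evolution of ligands by exponential enrichment (Cell-SELEX). The training and test sets contain 150 and 50 nucleic acid sequences, respectively, all drawn from the candidate aptamers of rounds 3 and 11. Round-3 sequences were assigned the class label "1", representing candidate aptamer sequences of low affinity and low specificity; round-11 sequences were assigned the class label "2", representing candidate aptamer sequences of high affinity and high specificity. A support vector classification (SVC) algorithm for binary classification was used for modeling. The SVC model achieved prediction accuracies of 87.3% and 86% on the training and test sets, respectively. The SVC model was also used to predict the sequences from rounds 5, 7, and 9. The fractions of high-affinity, high-specificity aptamers in rounds 3, 5, 7, 9, and 11 were 0.23, 0.41, 0.61, 0.64, and 0.87, respectively; the predictions are consistent with the way aptamers evolve under SELEX selection.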

6.
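The physiological functions of the TRPM8 channel, such as temperature sensing, depend on normal gating, but in the available crystal structures the gate formed by the C-terminus of the S6 transmembrane helix has missing amino acids, so its gating properties have not been revealed. Based on the existing crystal structures and the AlphaFold algorithm, this work built 11 complete TRPM8 channels in different conformations and found that the gate formed by the C-terminus of the S6 transmembrane helix adopts two conformations, loop and helix. In the loop conformation, multiple amino acids take part in forming the pore region that blocks ion permeation, whereas in the helix conformation only the key residue V956 performs gating. Because the loop conformation is more flexible than the helix conformation, the conformations and number of the key residues that block ion permeation in the loop conformation vary widely. Secondary structure prediction and modeling results indicate that the C-terminus of the S6 transmembrane helix undergoes a loop-to-helix transition. During this process the flexible loop domain moves up toward the extracellular side and the key residues twist away from the pore lining, which strengthens the interaction with the adjacent transmembrane helix S5 and the interaction between S5 and the TRP helix, producing a rigid, stable, and ordered helix conformation. This increases the cooperativity among the domains of the TRPM8 channel, allows energetic information to be transmitted more efficiently to the gating domain, and primes the TRPM8 channel for opening.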

7.
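Structural features of peptide sequences and their MHC restriction   Total citations: 6 (self-citations: 1, by others: 5)
Starting from the primary structure of peptide sequences, a peptide sequence feature vector, called the ζ vector, was constructed from the distances between amino acid side chains and the electrical properties of the side chains. Fourteen Ad-restricted and 14 non-Ad-restricted peptide sequences of the major histocompatibility complex (MHC) were selected from the literature as a training set to build a quantitative model for predicting helper T lymphocyte (Th) epitopes. To test the accuracy of the prediction system, random sampling tests and cross-validation were carried out. The results show that the prediction system is stable and has strong predictive ability, and that the method can be applied to the prediction of human MHC class I and class II epitopes, immune recognition of protein antigens, and the molecular design and development of subunit vaccines.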

8.
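董素梅  宋哲  刘涛  朱鸣华  刘伟  Acta Chimica Sinica (《化学学报》) 2010, 68(18): 1821-1828
Based on independent component analysis, quantitative structure-activity relationship models of the binding interaction between antigenic peptides and major histocompatibility complex (MHC) molecules were built using the 3 z-scale and 5 z-scale amino acid structural descriptors, respectively. Both models used a training set of 316 samples and a prediction set of 786 samples. The results show that the 3 z-scale model achieved a prediction accuracy of 70.3% and an AUC of 0.70, and the 5 z-scale model achieved a prediction accuracy of 70.9% and an AUC of 0.79. The CTL epitope prediction models built here should help to further clarify the mechanism of interaction between antigenic peptides and MHC class I molecules.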

9.
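Bioinformatics software was used to predict and compare the sequences and basic parameters, secondary structures, tertiary structures, transmembrane regions, and surface potentials of the Bacillus thuringiensis toxins Cry1Aa, Cry2Aa, Cry3Aa, and Cry4Aa. The toxins differ considerably in primary structure but are similar in secondary structure and transmembrane regions. In the tertiary structures, domain I is broadly similar among the toxins, whereas domain II differs considerably, with Cry2Aa being the most divergent member. The surface potential distributions of the four toxins differ. The structural similarities and differences among the toxins are related to their mechanisms of action and insecticidal specificity.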

10.
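Structural study and function prediction of the SARS coronavirus E protein   Total citations: 3 (self-citations: 0, by others: 3)
Combining bioinformatics methods with molecular modeling, and selecting methods of relatively high accuracy, the molecular structure of the SARS E protein was predicted and its potential biological activity and function were explored. The results indicate that the 25 hydrophobic amino acids of the transmembrane region of the SARS E protein form an α-helix embedded in the phospholipid bilayer of the viral envelope; the 10 N-terminal residues lie outside the membrane, while the 41 C-terminal residues attach to the inner side of the phospholipid bilayer. A cleft formed by nine amino acids at the C-terminus was also found to be a possible active site. Further electrostatic potential analysis of the molecule confirmed that the possible active site at the C-terminus of the E protein has a relatively large electrostatic potential and that the possible active residues have the highest charge density, and therefore a strong ability to bind receptors or interact with other proteins.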

11.
Accurately predicting phosphorylation sites in proteins is an important issue in postgenomics, for which efficiently extracting the most predictive features from amino acid sequences for modeling remains challenging. Although both the distributed encoding method and the bio-basis function method work well, they still have some limits in use. The distributed encoding method is unable to code the biological content in sequences efficiently, whereas the bio-basis function method is a nonparametric method, which is often computationally expensive. As hidden Markov models (HMMs) can be used to generate one model for one cluster of aligned protein sequences, the aim in this study is to use HMMs to extract features from amino acid sequences, where sequence clusters are determined using available biological knowledge. In this novel method, HMMs are first constructed using functional sequences only. Both functional and nonfunctional training sequences are then fed into the trained HMMs to generate functional and nonfunctional feature vectors. A machine learning algorithm is then used to construct a classifier based on these feature vectors. It is found in this work that (1) this method provides much better prediction accuracy than the use of HMMs only for prediction, and (2) the support vector machine (SVM) algorithm outperforms decision trees and neural network algorithms when they are constructed on the features extracted using the trained HMMs.

12.
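Classification modeling and recognition of protein fold types   Total citations: 2 (self-citations: 0, by others: 2)
刘岳  李晓琴  徐海松  乔辉  Acta Physico-Chimica Sinica (《物理化学学报》) 2009, 25(12): 2558-2564
How the amino acid sequence of a protein determines its spatial structure is one of the central problems in the life sciences today. The fold type reflects the topological pattern of a protein's core structure, and fold recognition is an important part of research on the protein sequence-structure relationship. We studied 36 fold types that cannot be modeled independently and that account for 41.8% of the α, β, and α/β class proteins in the Astral 1.65 sequence database. Samples with sequence identity below 25% were selected as the training set and subjected to hierarchical clustering with root-mean-square deviation (RMSD) as the metric, generating several fold subclasses, and for each subclass a profile hidden Markov model (profile-HMM) was built from structural alignments produced by the multiple structural alignment algorithm MUSTANG. Using the 9505 samples in Astral 1.65 with sequence identity below 95% as the test set, the average recognition sensitivity for the 36 fold types was 90%, the specificity was 99%, and the Matthews correlation coefficient (MCC) was 0.95. The results show that for fold types with many members, for which a single unified model cannot be built, RMSD-based hierarchical classification modeling achieves high-accuracy recognition in every case, offering a new method and line of thinking for protein fold recognition and laying a foundation for further research.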

13.
On the basis of information on the evolution of the 20 amino acids and their physicochemical characteristics, we propose a new two-dimensional (2D) graphical representation of protein sequences in this article. With this representation, we use 2D data to represent three-dimensional information constructed from the amino acids' evolution index, the class of each amino acid based on physicochemical characteristics, and the order in which the amino acids appear in the protein sequence. Then, using the discrete Fourier transform, sequence signals of different lengths can be transformed to the frequency domain, in which all sequences have the same length. A new method is used to analyze protein sequence similarity and to predict the protein structural class. The experiments indicate that our method is effective and useful.

14.
In the design of peptide inhibitors, the huge possible variety of peptide sequences is of high concern. In view of the fast accumulation of peptide experimental data and databases, a statistical method is suggested for peptide inhibitor design. In the two-level peptide prediction network (2L-QSAR), one level is the physicochemical properties of amino acids and the other level is the peptide sequence position. The activity contributions of amino acids are functions of the physicochemical properties and the sequence positions. In the prediction equation, two weight coefficient sets {a_k} and {b_l} are assigned to the physicochemical properties and to the sequence positions, respectively. After the two coefficient sets are optimized on the experimental data of known peptide inhibitors using the iterative double least square (IDLS) procedure, the coefficients are used to evaluate the bioactivities of newly designed peptide inhibitors. The two-level prediction network can be applied to peptide inhibitor design aimed at different target proteins or at different positions of a protein. A notable advantage of the two-level statistical algorithm is that there is no need for host protein structural information. It may also provide useful insight into the amino acid properties and the roles of sequence positions.
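The abstract does not print the prediction equation. One plausible reading, offered only as an assumption, is a bilinear form in which the position weights and the property weights multiply; the symbols $p_k$, $x_{i,l}$, $K$, and $L$ below are introduced for illustration and do not come from the source.

```latex
% Assumed bilinear form of the two-level model (not quoted from the paper):
%   x_{i,l} : amino acid at position l of peptide i
%   p_k(x)  : k-th physicochemical property of amino acid x
%   a_k     : weight of property k,   b_l : weight of position l
\hat{y}_i \;=\; \sum_{l=1}^{L} b_l \sum_{k=1}^{K} a_k \, p_k\!\left(x_{i,l}\right)
```

Under this reading, the model is linear in the position weights when the property weights are held fixed, and linear in the property weights when the position weights are held fixed. An iterative double least-squares procedure would then alternate two ordinary least-squares fits until the coefficients stop changing.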

15.
In this paper, we propose a method to create a 60-dimensional feature vector for protein sequences via the general form of pseudo amino acid composition. The construction of the feature vector is based on the contents of amino acids, the total distance of each amino acid from the first amino acid in the protein sequence, and the distribution of the 20 amino acids. The obtained cosine distance metric (also called the similarity matrix) is used to construct the phylogenetic tree by the neighbour joining method. In order to show the applicability of our approach, we tested it on three protein datasets: 1) ND5 protein sequences from nine species, 2) ND6 protein sequences from eight species, and 3) 50 coronavirus spike proteins. The results are in agreement with known history and with the output of the widely used multiple sequence alignment program ClustalW. We have also compared our phylogenetic results with six other recently proposed alignment-free methods. These comparisons show that our proposed method gives a more consistent biological relationship than the others. In addition, the time complexity is linear and less space is required than with other alignment-free methods that use graphical representation. It should be noted that the multiple sequence alignment method has exponential time complexity.
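A rough sketch of a 60-dimensional descriptor of this kind (three numbers for each of the 20 amino acids) and of the cosine distance between two descriptors follows. The abstract does not give the exact formulas, so the definitions here are assumptions: "content" is the relative frequency, "total distance" is the sum of 0-based positions counted from the first residue, and "distribution" is the variance of those positions. No normalisation is applied, and all names are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def feature_vector(seq):
    """Assumed 60-dimensional descriptor: 3 features per amino acid type."""
    n = len(seq)
    vec = []
    for aa in AMINO_ACIDS:
        positions = [i for i, res in enumerate(seq) if res == aa]
        count = len(positions)
        content = count / n if n else 0.0      # relative frequency
        total_dist = float(sum(positions))     # distances from first residue
        if count:
            mean = total_dist / count
            spread = sum((p - mean) ** 2 for p in positions) / count
        else:
            spread = 0.0                       # residue type absent
        vec.extend([content, total_dist, spread])
    return vec


def cosine_distance(u, v):
    """1 - cosine similarity; defined as 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)
```

Computing `cosine_distance` for every pair of sequences yields a distance matrix that could be passed to a neighbour-joining implementation to build the tree. Because each sequence is scanned once per amino acid type, the descriptor is linear in sequence length, which matches the complexity claim in the abstract.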

16.
The protein disulfide bond is a covalent bond that forms during post-translational modification by the oxidation of a pair of cysteines. In proteins, the disulfide bond is the most frequent covalent link between amino acids after the peptide bond. It plays a significant role in three-dimensional (3D) ab initio protein structure prediction (aiPSP), stabilizing protein conformation, post-translational modification, and protein folding. In aiPSP, the location of disulfide bonds can strongly reduce the conformational search space by imposing geometrical constraints. Existing experimental techniques for the determination of disulfide bonds are time-consuming and expensive. Thus, developing sequence-based computational methods for disulfide bond prediction becomes indispensable. This study proposed a stacking-based machine learning approach for disulfide bond prediction (diSBPred). Various useful sequence and structure-based features are extracted for effective training, including conservation profile, residue solvent accessibility, torsion angle flexibility, disorder probability, a sequential distance between cysteines, and more. The prediction of disulfide bonds is carried out in two stages: first, individual cysteines are predicted as either bonding or non-bonding; second, the cysteine pairs are predicted as either bonding or non-bonding by including the results from cysteine bonding prediction as a feature. A comparison of the features employed in this study with those used in the existing nearest neighbor algorithm (NNA) method shows that the features used in this study improve jackknife-validation balanced accuracy by about 7.39%. Moreover, for individual cysteine bonding prediction and cysteine-pair bonding prediction, diSBPred provides a 10-fold cross-validation balanced accuracy of 82.29% and 94.20%, respectively. Altogether, our predictor achieves an improvement of 43.25% in balanced accuracy compared to the existing NNA-based approach. Thus, diSBPred can be utilized to annotate the cysteine bonding residues of protein sequences whose structures are unknown as well as to improve the accuracy of the aiPSP method, which can further aid in experimental studies of the disulfide bond and structure determination.

17.
As several structural proteomic projects are producing an increasing number of protein structures with unknown function, methods that can reliably predict protein functions from protein structures are urgently needed. In this paper, we present a method to explore the clustering patterns of amino acids in three-dimensional space for protein function prediction. First, amino acid residues on a protein structure are clustered into spatial groups using hierarchical agglomerative clustering, based on the distance between them. Second, the protein structure is represented using a graph, where each node denotes a cluster of amino acids. The nodes are labeled with an evolutionary profile derived from the multiple alignment of homologous sequences. Then, a shortest-path graph kernel is used to calculate similarities between the graphs. Finally, a support vector machine using this graph kernel is used to train classifiers for protein function prediction. We applied the proposed method to two separate problems, namely, prediction of enzymes and prediction of DNA-binding proteins. In both cases, the results showed that the proposed method outperformed other state-of-the-art methods.

18.
Prediction of transmembrane beta-strands in outer membrane proteins (OMP) is one of the important problems in computational chemistry and biology. In this work, we propose a method based on neural networks for identifying the membrane-spanning beta-strands. We introduce the concept of "residue probability" for assigning residues in transmembrane beta-strand segments. The performance of our method is evaluated with single-residue accuracy, correlation, specificity, and sensitivity. Our predicted segments show good agreement with experimental observations, with an accuracy level of 73% solely from amino acid sequence information. Further, the predictive power for N- and C-terminal residues in each segment, the number of segments in each protein, and the influence of the cutoff probability for identifying membrane-spanning beta-strands are discussed. We have developed a Web server for predicting the transmembrane beta-strands from the amino acid sequence, and the prediction results are available at http://psfs.cbrc.jp/tmbeta-net/.

19.
A computational model, IMP-TYPE, is proposed for the classification of five types of integral membrane proteins from protein sequence. The proposed model aims not only at providing accurate predictions but, most importantly, at incorporating interesting and transparent biological patterns. When contrasted with the best-performing existing models, IMP-TYPE reduces the error rates of these methods by 19 and 34% for two out-of-sample tests performed on benchmark datasets. Our empirical evaluations also show that the proposed method provides even bigger improvements, i.e., 29 and 45% error rate reductions, when predictions are performed for sequences that share low (40%) identity with sequences from the training dataset. We also show that IMP-TYPE can be used in a standalone mode, i.e., it duplicates a significant majority of the correct predictions provided by other leading methods, while providing additional correct predictions for sequences that are incorrectly classified by the other methods. Our method computes predictions using a Support Vector Machine classifier that takes a feature-based encoded sequence as its input. The input feature set includes hydrophobic AA pairs, which were selected by utilizing a consensus of three feature selection algorithms. The hydrophobic residues that build up the AA pairs used by our method are shown to be associated with the formation of transmembrane helices in a few recent studies concerning integral membrane proteins. Our study also indicates that Met and Phe display a certain degree of hydrophobicity, which may be more crucial than their polarity or aromaticity when they occur in the transmembrane segments. This conclusion is supported by a recent study on the potential of mean force for membrane protein folding and a study of scales for the membrane propensity of amino acids.

20.
The study of type III RNases constitutes an important area in molecular biology. It is known that the pac1+ gene encodes a particular RNase III that shares low amino acid similarity with other genes despite having a double-stranded ribonuclease activity. Bioinformatics methods based on sequence alignment may fail when there is a low amino acid identity percentage between a query sequence and others with similar functions (remote homologues) or when a similar sequence is not recorded in the database. Quantitative structure-activity relationships (QSAR) applied to protein sequences may allow an alignment-independent prediction of protein function. These sequence QSAR-like methods often use 1D sequence numerical parameters as the input to seek sequence-function relationships. However, a prior 2D representation of sequences may uncover useful higher-order information. In the work described here we calculated for the first time the spectral moments of a Markov matrix (MMM) associated with a 2D-HP-map of a protein sequence. We used MMM values to characterize numerically 81 sequences of type III RNases and 133 proteins of a control group. We subsequently developed one MMM-QSAR and one classic hidden Markov model (HMM) based on the same data. The MMM-QSAR discriminated RNases from other proteins with an accuracy of 97.35% without using alignment, a result as good as that of the known HMM techniques. We also report for the first time the isolation of a new Pac1 protein (DQ647826) from Schizosaccharomyces pombe strain 428-4-1. The MMM-QSAR model predicts the new RNase III with the same accuracy as other classical alignment methods. Experimental assay of this protein confirms the predicted activity. The present results suggest that MMM-QSAR models may be used for protein function annotation, avoiding sequence alignment, with the same accuracy as classic HMM models.
