Similar literature
20 similar documents found
1.
The subcellular location of a protein is closely correlated with its biological function. In this paper, two new pattern classification methods, termed Nearest Feature Line (NFL) and Tunable Nearest Neighbor (TNN), are introduced to predict the subcellular location of proteins based on their amino acid composition alone. The simulation experiments were performed with the jackknife test on a previously constructed data set, which consists of 2,427 eukaryotic and 997 prokaryotic proteins. All protein sequences in the data set fall into four eukaryotic subcellular locations and three prokaryotic subcellular locations. The NFL classifier reached total prediction accuracies of 82.5% for the eukaryotic proteins and 91.0% for the prokaryotic proteins. The TNN classifier reached total prediction accuracies of 83.6% and 92.2%, respectively. Compared with Support Vector Machine (SVM) and Nearest Neighbor methods, these two methods display similar or even higher prediction accuracies. Hence, we conclude that NFL and TNN can be used as complementary methods for prediction of protein subcellular locations.
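The ingredients named in this abstract (amino acid composition, a nearest-feature-line classifier, and a jackknife test) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, the NFL distance follows the usual textbook definition (distance from the query to the line through two prototypes of the same class), and TNN is not covered.

```python
from itertools import combinations
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return the 20-dimensional vector of residue frequencies.

    Non-standard residue letters are ignored (an assumption of this sketch).
    """
    counts = [sequence.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(AMINO_ACIDS)


def feature_line_distance(query, a, b):
    """Distance from `query` to the (infinite) line through prototypes a and b."""
    direction = [bj - aj for aj, bj in zip(a, b)]
    norm_sq = sum(d * d for d in direction)
    if norm_sq == 0.0:
        # Identical prototypes define no line: fall back to point distance.
        return sqrt(sum((q - aj) ** 2 for q, aj in zip(query, a)))
    # Position of the projection of `query` along the line.
    mu = sum((q - aj) * d for q, aj, d in zip(query, a, direction)) / norm_sq
    return sqrt(sum((q - (aj + mu * d)) ** 2
                    for q, aj, d in zip(query, a, direction)))


def nfl_predict(query, training):
    """training: dict mapping label -> list of vectors.

    Returns the label owning the nearest feature line. Classes with fewer
    than two prototypes contribute no line and cannot be predicted.
    """
    best_label, best_dist = None, float("inf")
    for label, vectors in training.items():
        for a, b in combinations(vectors, 2):
            dist = feature_line_distance(query, a, b)
            if dist < best_dist:
                best_label, best_dist = label, dist
    return best_label


def jackknife_accuracy(samples):
    """samples: list of (vector, label). Leave each sample out in turn."""
    correct = 0
    for i, (vec, label) in enumerate(samples):
        training = {}
        for j, (other_vec, other_label) in enumerate(samples):
            if j != i:
                training.setdefault(other_label, []).append(other_vec)
        correct += nfl_predict(vec, training) == label
    return correct / len(samples)
```

In practice each protein sequence would first be converted with `amino_acid_composition`, and the resulting vectors passed to `jackknife_accuracy` together with their location labels.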

2.
The function of a eukaryotic protein is closely correlated with its subcellular location. The number of newly found protein sequences entering data banks is rapidly increasing with the success of the Human Genome Project. It is highly desirable to predict a protein's subcellular location automatically from its amino acid sequence. In this paper, amino acid hydrophobic patterns and average power-spectral density (APSD) are introduced to define pseudo amino acid composition. The covariant-discriminant predictor is used to predict subcellular location. An immune-genetic algorithm (IGA) is used to find the fittest weight factors, which are very important in this method. High success rates are obtained by both the self-consistency test (86%) and the jackknife test (73%). More than 80% predictive accuracy is achieved in an independent dataset test. The results demonstrate that the proposed method is practical. The method also indicates that protein subcellular location can be predicted from the surface physicochemical characteristics of protein folding.

3.
Apoptosis proteins play an essential role in the development and homeostasis of an organism. The accurate prediction of subcellular location for apoptosis proteins is helpful for understanding the mechanism of programmed cell death and their biological functions. In this article, a new apoptosis protein localization algorithm, named PSSP, is proposed based on the predicted cleavage sites of primary protein sequences. First, protein chains are divided into N‐terminal signal parts and mature protein parts according to their cleavage sites predicted by SignalP. Then, the amino acid composition (ACC) of the individual subsequences, together with the pseudo‐ACC and stereochemical properties of the whole chain, were extracted to represent a given protein sequence. Jackknife tests by support vector machine on three widely used datasets (ZD98, ZW225, and CL317) of apoptosis proteins demonstrated that the total accuracies of this approach are 93.9%, 87.6%, and 91.5%, respectively. In addition, an independent nonapoptosis benchmark dataset (NNPSL) was used to evaluate the performance of this method, and the predictive accuracies for eukaryotic and prokaryotic proteins are comparable to those of existing methods. © 2013 Wiley Periodicals, Inc.

4.
The function of transmembrane (TM) proteins is closely correlated with their TM topology, and large quantities of highly reliable TM topology data are increasingly required. We present a new consensus approach for TM topology prediction (ConPred_elite) that can predict the whole topology with accuracies of 0.98 for prokaryotic and 0.95 for eukaryotic proteins on a dataset of experimentally characterized TM topologies. The prediction yield on the dataset is 30.4% for prokaryotic and 21.5% for eukaryotic proteins. Applying ConPred_elite to predicted TM proteins extracted from 29 prokaryotic and 10 eukaryotic proteomes, we obtained 3871 and 7271 highly reliable TM topologies (yields, 19.8% and 13.3%), respectively. The predicted TM topology data may contribute to further research into a comprehensive functional classification and identification of TM proteins based on topology information.

5.
Predicting the location where a protein resides within a cell is important in cell biology. Computational approaches to this problem have attracted increasing attention from the biomedical community. Among the protein features used to predict the subcellular localization of proteins, features derived from Gene Ontology (GO) have been shown to be superior to others. However, most work in this field focuses on the presence or absence of some predefined GO terms. We propose a method to derive information from the intrinsic structure of the GO graph. The feature vector was constructed with each element representing the information content of a GO term annotating the protein under investigation, and a support vector machine was used as the classifier to test the extracted features. Evaluation experiments were conducted on three protein datasets, and the results show that our method can improve eukaryotic and human subcellular location prediction accuracy by up to 1.1% over previous studies that also used GO-based features. In particular, in the scenario where the cellular component annotation is absent, our method can achieve satisfactory results, with an overall accuracy of more than 87%.
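One common way to turn the structure of an ontology graph into numbers is the information content of a term, IC(t) = -ln p(t), where p(t) is the fraction of proteins annotated with t or any of its descendants. The sketch below assumes that definition and a simple "IC if directly annotated, else 0" feature encoding; the abstract does not state which exact formula the authors used, and the term identifiers in the usage note are made up for illustration.

```python
from math import log


def ancestors(term, parents):
    """All ancestors of `term`, including itself, in a DAG given child -> parents."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def information_content(annotations, parents):
    """IC(t) = -ln(fraction of proteins annotated with t or a descendant of t).

    annotations: dict protein -> iterable of directly annotated terms.
    parents:     dict term -> list of parent terms.
    """
    counts = {}
    for terms in annotations.values():
        propagated = set()
        for term in terms:
            propagated |= ancestors(term, parents)  # annotation implies ancestors
        for term in propagated:
            counts[term] = counts.get(term, 0) + 1
    n_proteins = len(annotations)
    return {term: -log(count / n_proteins) for term, count in counts.items()}


def feature_vector(protein_terms, ic, vocabulary):
    """One element per vocabulary term: its IC if the protein carries it, else 0."""
    own = set(protein_terms)
    return [ic.get(term, 0.0) if term in own else 0.0 for term in vocabulary]
```

For example, with hypothetical terms `T:root`, `T:b`, `T:c` where `T:b` and `T:c` are children of `T:root`, a protein annotated only with the rarer child receives a larger value in that position than one annotated with the more common child. The resulting vectors could then be fed to any classifier, such as a support vector machine.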

6.
7.
The protein sequence of the hyoscyamine 6β-hydroxylase gene from Hyoscyamus niger was analysed in silico for its potential for heterologous expression. Different parameters determining the protein's properties and structure in prokaryotic or eukaryotic protein expression systems were taken into account. In silico prediction of co- and post-translational modifications revealed 25 putative glycosylation sites, one of which is reported to be a co-factor stabilizing residue in 2-oxoglutarate dependent dioxygenases. Potential protein solubility and degradation (PEST) motifs were also evaluated. Together with the calculated physico-chemical properties, the results indicated reasonable solubility but potential instability of the protein in Escherichia coli and Saccharomyces cerevisiae. Further, a synthetic h6h-gene was introduced into the prokaryotic and eukaryotic hosts Escherichia coli and Saccharomyces cerevisiae to determine protein expression. The protein could be expressed in both organisms, though stability was confirmed to be an issue.

8.
Protein structural class prediction solely from protein sequences is a challenging problem in bioinformatics. Numerous efficient methods have been proposed for protein structural class prediction, but challenges remain. Using novel combined sequence information coupled with predicted secondary structural features (PSSF), we propose a novel scheme to improve prediction of protein structural classes. Given an amino acid sequence, we first transformed it into a reduced amino acid sequence and calculated its word frequencies and word position features to form the combined sequence information. Then we added the PSSF to the combined sequence information to predict protein structural classes. The proposed method was tested on four low-homology benchmark datasets and achieved overall prediction accuracies of 83.1%, 87.0%, 94.5%, and 85.2%, respectively. The comparison with existing methods demonstrates that the overall improvements range from 2.3% to 27.5%, which indicates that the proposed method is more efficient, especially for low-homology amino acid sequences.
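The first step described here, mapping a sequence onto a reduced alphabet and counting words, can be sketched as follows. The three-group alphabet below is purely an illustrative assumption (the abstract does not specify the reduction scheme), and word position features and PSSF are not included.

```python
from itertools import product

# Hypothetical three-letter reduced alphabet, chosen only for illustration:
# h = hydrophobic, p = polar/small, c = charged.
GROUPS = {"h": "AVLIMFWC", "p": "STNQYGP", "c": "DEKRH"}
RESIDUE_TO_GROUP = {aa: g for g, members in GROUPS.items() for aa in members}


def reduce_sequence(sequence):
    """Map each standard residue to its group letter; unknown letters are skipped."""
    return "".join(RESIDUE_TO_GROUP[aa] for aa in sequence if aa in RESIDUE_TO_GROUP)


def word_frequencies(reduced, k=2):
    """Relative frequency of every overlapping k-letter word over the reduced alphabet."""
    words = ["".join(w) for w in product(sorted(GROUPS), repeat=k)]
    counts = dict.fromkeys(words, 0)
    n_windows = max(len(reduced) - k + 1, 0)
    for i in range(n_windows):
        counts[reduced[i:i + k]] += 1
    return {w: (c / n_windows if n_windows else 0.0) for w, c in counts.items()}
```

With a three-letter alphabet and k = 2 this yields a fixed-length vector of 9 frequencies per protein, regardless of sequence length, which is what makes word frequencies convenient as classifier input.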

9.
Using the pseudo amino acid (PseAA) composition to represent a protein sample can incorporate a considerable amount of sequence pattern information and thereby improve the prediction quality for its structural or functional classification. However, how to optimally formulate the PseAA composition is an important problem yet to be solved. In this article, the grey modeling approach is introduced, which is particularly efficient in coping with complicated systems such as one consisting of many proteins with different sequence orders and lengths. On the basis of the grey model, four coefficients derived from each of the protein sequences concerned are adopted as its PseAA components. The PseAA composition thus formulated is called the "grey-PseAA" composition; it can capture the essence of a protein sequence and better reflect its overall pattern. In our study we have demonstrated that introducing the grey-PseAA composition can remarkably enhance the success rates in predicting the protein structural class. It is anticipated that the concept of grey-PseAA composition can also be used to predict many other protein attributes, such as subcellular localization, membrane protein type, enzyme functional class, GPCR type, and protease type, among many others.

10.
Background: Discovering possible Drug Target Interactions (DTIs) is a decisive step in detecting the effects of drugs as well as in drug repositioning. There is a strong incentive to develop computational methods that can effectively predict potential DTIs, as traditional DTI laboratory experiments are expensive, time-consuming, and labor-intensive. Some technologies have been developed for this purpose; however, large numbers of interactions have not yet been detected, the accuracy of their prediction is still low, and protein sequences and structured data are rarely used together in the prediction process. Methods: This paper presents a DTI prediction model that takes advantage of the special capacity of the structured form of proteins and drugs. Our model obtains features from protein amino-acid sequences using physical and chemical properties, and from drug SMILES (Simplified Molecular Input Line Entry System) strings using encoding techniques. Comparing the proposed model with different existing methods under K-fold cross validation, empirical results show that our model, based on ensemble learning algorithms for DTI prediction, provides more accurate results from both structure and feature data. Results: The proposed model is applied to two datasets: Benchmark (feature only) datasets and DrugBank (structure data) datasets. Experimental results obtained by Light-Boost and ExtraTree using structure and feature data reach 98% accuracy and a 0.97 f-score, compared with 94% and 0.92 achieved by the existing methods. Moreover, our model can successfully predict more as yet undiscovered interactions, and hence can be used as a practical tool for drug repositioning. A case study is performed in which our prediction model is applied to proteins known to be affected by coronaviruses in order to predict the possible interactions between these proteins and existing drugs. Our model is also applied to Covid-19 related drugs announced on DrugBank. The results show that some drugs, such as DB00691 and DB05203, are predicted with 100% accuracy to interact with the ACE2 protein. This protein is a self-membrane protein that enables Covid-19 infection. Hence, our model can be used as an effective tool in drug repositioning to predict possible drug treatments for Covid-19.

11.
A new method has been developed for prediction of transmembrane helices using support vector machines. Different coding schemes of protein sequences were explored, and their performances were assessed by cross-validation tests. The best-performing method can predict the transmembrane helices with a sensitivity of 93.4% and a precision of 92.0%. For each predicted transmembrane segment, a score is given to show the strength of the transmembrane signal and the prediction reliability. In particular, this method can distinguish transmembrane proteins from soluble proteins with an accuracy of approximately 99%. This method can be used to complement current transmembrane helix prediction methods and can be used for consensus analysis of entire proteomes. The predictor is located at http://genet.imb.uq.edu.au/predictors/SVMtm.

12.
Prediction of protein structural classes and subcellular locations   Total citations: 1, self-citations: 0, citations by others: 1
The structural class and subcellular location are two important features of proteins that are closely related to their biological functions. With the rapid increase in new protein sequences entering data banks, it is highly desirable to develop a fast and accurate method for predicting these features. This can expedite the functional determination of new proteins and the process of prioritizing genes and proteins identified by genomics efforts as potential molecular targets for drug design. Various prediction methods have been developed during the last two decades. This review presents a systematic introduction to and comparison of the existing methods with respect to the prediction algorithm and classification scheme. Attention is focused on the state of the art, which features the covariant-discriminant algorithm developed very recently, as well as some new classification schemes for protein structural classes and subcellular locations. In particular, the physical chemistry foundation of the existing prediction methods and the reason why the covariant-discriminant algorithm is so powerful are also addressed.

13.
This is the continuation of our studies on using very basic information about an enzyme to predict optimal reaction parameters in enzymatic reactions, because the gap between available enzyme sequences and their available reaction parameters is widening. In this study, 23 features selected from more than 540 features of individual amino acids, as well as a feature combining whole-protein information, were screened as independent variables in a 20-1 feedforward backpropagation neural network for predicting the optimal pH of the hydrolytic reaction of beta-glucosidase, an enzyme that has recently drawn attention because of its role in the biofuel industry. The results show that 11 features can be used as independent variables for the prediction, and that the amino acid distribution probability feature works better than the other independent variables. Our study paves the way to predicting the optimal reaction parameters of enzymes from the amino acid features of enzyme sequences.

14.
Sirtuins are a family of proteins that play a key role in regulating a wide range of cellular processes including DNA regulation, metabolism, aging/longevity, cell survival, apoptosis, and stress resistance. Sirtuins are protein deacetylases and are included in the class III family of histone deacetylase enzymes (HDACs). The class III HDACs contain seven members of the sirtuin family, from SIRT1 to SIRT7. The seven members of the sirtuin family have various substrates and are present in nearly all subcellular localizations, including the nucleus, cytoplasm, and mitochondria. In this study, a deep neural network approach using one-dimensional Convolutional Neural Networks (CNN) was proposed to build a prediction model that can accurately identify the outcome of the sirtuin protein by targeting their subcellular localizations. The function and localization of sirtuin targets were therefore analyzed and annotated to compartmentalize them into distinct subcellular localizations. We further reduced the sequence similarity between protein sequences, and three feature extraction methods were applied to the datasets. Finally, the proposed method was tested and compared with various machine-learning algorithms. The proposed method was validated on two independent datasets and showed an average of up to 85.77% sensitivity, 97.32% specificity, and 0.82 MCC for the seven members of the sirtuin family of proteins.
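The three figures reported at the end of this abstract follow the standard confusion-matrix definitions. As a small illustration (the function name and the example counts are made up, not taken from the study), they can be computed from the counts of true/false positives and negatives as follows:

```python
from math import sqrt


def binary_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, MCC) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return sensitivity, specificity, mcc
```

Unlike sensitivity and specificity, the Matthews correlation coefficient (MCC) uses all four counts at once, so it stays informative when the classes are imbalanced, which is typical for one-versus-rest evaluation of several family members.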

15.
Mitochondria are eukaryotic organelles that originated from a single bacterial endosymbiosis about 2 billion years ago. One of the earliest events in the evolution of mitochondria was the acquisition of a mechanism that facilitated the import of proteins from the cytosol. The mitochondrial protein import machinery consists of dozens of subunits and is of modular design. However, to date, it is not clear when each component was added to the machinery. Using extensive homology searches, the evolutionary history of the mitochondrial protein import machinery was reconstructed. The results indicated that 6 of the 35 subunits have homologs in prokaryotes, suggesting that they are of prokaryotic origin; that the major subunit gains occurred in the earliest stage of eukaryotic evolution; and that, subsequent to the gain of this conserved set of subunits, the mitochondrial protein import machinery components diversified along the eukaryotic lineages, so that a number of lineage-specific subunits can be observed. Furthermore, the protein import systems of mitochondria-like organelles (hydrogenosomes and mitosomes) have dramatically reduced their subunit contents; however, they share most of the components of prokaryotic origin with mitochondria.

16.
It is known that several prokaryotic protein sequences, characterized by high homology with the eukaryotic Cu,ZnSODs, lack some of the metal ligands. In the present work, we have stepwise reintroduced the two missing copper ligands in the SOD-like protein of Bacillus subtilis through site-directed mutagenesis. The mutant with three out of the four His residues that bind copper is not active, whereas the fully reconstituted mutant displays an activity of about 10% of that of human Cu,ZnSOD. The mutated proteins have been characterized in solution and in the solid state. In solution, the proteins experience conformational disorder, which is believed to be partly responsible for the decreased enzymatic activity and sheds light on the tendency of several human SOD mutants to introduce mobility in the protein frame. In the crystal, on the contrary, the protein has a well-defined conformation, giving rise to dimers through the coordination of an exogenous zinc ion. The catalytic properties of the double mutant, which might be regarded as a step in an artificial evolution from a nonactive SOD to a fully functioning enzyme, are discussed on the basis of the structural and dynamical properties.

17.
The ability to predict protein folding rates constitutes an important step in understanding the overall folding mechanisms. Although many of the prediction methods are structure based, successful predictions can also be obtained from the sequence. We developed a novel method, called prediction of protein folding rates (PPFR), for the prediction of protein folding rates from protein sequences. PPFR implements a linear regression model for each of the mainstream folding dynamics, including two-, multi-, and mixed-state proteins. The proposed method provides predictions characterized by strong correlations with the experimental folding rates, which equal 0.87 for the two- and multistate proteins and 0.82 for the mixed-state proteins when evaluated with an out-of-sample jackknife test. Based on in-sample and out-of-sample tests, PPFR's predictions are shown to be better than those of most other sequence-only and structure-based predictors and complementary to the predictions of the most recent sequence-based QRSM method. We show that simultaneous incorporation of several characteristics, including the sequence, physicochemical properties of residues, and predicted secondary structure, provides improved quality. This hybridized prediction model was analyzed to reveal the complementary factors that can be used in tandem to predict folding rates. We show that bigger proteins require more time for folding; that higher helical and coil content and the presence of Phe, Asn, and Gln may accelerate the folding process; that the inclusion of Ile, Val, Thr, and Ser may slow down the folding process; and that, for the two-state proteins, increased beta-strand content may decelerate the folding process. Finally, PPFR provides strong correlation when predicting sequences with low similarity.
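The evaluation protocol mentioned here, a linear regression scored by the correlation between out-of-sample jackknife predictions and measured values, can be sketched in a few lines. This is a deliberately simplified illustration with a single hypothetical predictor; PPFR itself combines several sequence-derived features, and none of the names below come from the paper.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one predictor)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # assumes the xs are not all identical
    return slope, mean_y - slope * mean_x


def jackknife_predictions(xs, ys):
    """Predict each sample from a model fitted on all the other samples."""
    predictions = []
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        predictions.append(slope * xs[i] + intercept)
    return predictions


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)
```

Calling `pearson(jackknife_predictions(xs, ys), ys)` gives an out-of-sample correlation; because each prediction comes from a model that never saw that sample, the value is usually lower than the in-sample correlation and is the fairer figure to report.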

18.
As several structural proteomic projects are producing an increasing number of protein structures with unknown function, methods that can reliably predict protein functions from protein structures are urgently needed. In this paper, we present a method that explores the clustering patterns of amino acids in three-dimensional space for protein function prediction. First, amino acid residues on a protein structure are clustered into spatial groups using hierarchical agglomerative clustering, based on the distance between them. Second, the protein structure is represented as a graph, where each node denotes a cluster of amino acids. The nodes are labeled with an evolutionary profile derived from the multiple alignment of homologous sequences. Then, a shortest-path graph kernel is used to calculate similarities between the graphs. Finally, a support vector machine using this graph kernel is used to train classifiers for protein function prediction. We applied the proposed method to two separate problems, namely, prediction of enzymes and prediction of DNA-binding proteins. In both cases, the results showed that the proposed method outperformed other state-of-the-art methods.

19.
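Molecular Biology of Selenoproteins and Their Relationship with Diseases*   Total citations: 3, self-citations: 0, citations by others: 3
Liu Qiong, Jiang Liang, Tian Jing, Ni Jiazuan. 《化学进展》 (Progress in Chemistry), 2009, 21(5): 819-830
Selenoproteins are the main form in which the trace element selenium exists and exerts its biological functions in the body. Because selenocysteine, the active center of selenoproteins, is encoded by the conventional stop codon TGA, both predicting selenoproteins from genomes and expressing them by genetic engineering are difficult. There are many reports on the antioxidant activity of selenium and its effects on cancer, neurodegenerative diseases, and viruses, but the conclusions are inconsistent. This paper reviews recent progress in selenoprotein gene prediction, the regulation of protein expression, and the effects and mechanisms of selenium and selenoproteins in cancer, neurodegenerative diseases, and viral infection. It examines methods for improving the accuracy of bioinformatic prediction of selenoproteins and their expression yield by genetic engineering, analyzes the relationships between selenoproteins and the onset and progression of disease together with the underlying mechanisms, and explores the possibility of using different selenoproteins in the development of preventive drugs and as targets for cancer therapy and drug screening.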

20.
Protein structural class describes the overall folding type of a protein or its domain. A number of methods have been developed to predict protein structural class based on the primary sequence. The homology of the predicted sequences with respect to the training sequences is a key determinant of prediction performance. In this article we investigated the FDOD method developed by Jin et al. [Jin, L., Fang, W., Tang, H., 2003. Prediction of protein structural classes by a new measure of information discrepancy. Comput. Biol. Chem. 27, 373-380], which gave high prediction accuracy on a low homology dataset, and we empirically confirmed that the reported results were an artifact of improper implementation.
