1.
Gene Ontology (GO) is a standardized, controlled vocabulary of terms that describe the molecular functions, biological roles, and cellular locations of proteins. GO terms and the GO hierarchy are regularly updated as biological knowledge accumulates. GO includes more than 50,000 terms, and each protein is annotated with several to dozens of them, so accurately predicting associations between proteins and this massive set of terms is challenging. To address this, we propose a method called Hashing GO for protein function prediction (HashGO for short). HashGO first uses a protein-term association matrix to store the available GO annotations of proteins. It then tailors a graph hashing method to explore the underlying structure between GO terms and to obtain a series of hash functions that compress the high-dimensional protein-term association matrix into a low-dimensional one. Next, HashGO computes the semantic similarity between proteins from the Hamming distance on that low-dimensional matrix. Finally, it predicts the missing annotations of a protein from the annotations of its semantic neighbors. Experimental results on archived GO annotations of two model species (Yeast and Human) show that HashGO not only predicts functions more accurately than related approaches but also runs faster.
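The last two steps of the abstract above (Hamming distance on compressed codes, then prediction from semantic neighbors) can be illustrated with a minimal sketch. This is not the HashGO implementation: the hash codes are assumed to be given already as integers, and the names `hamming`, `predict_terms`, `k`, and `min_votes`, as well as the simple vote threshold, are hypothetical choices made for illustration.

```python
from typing import Dict, Set


def hamming(a: int, b: int) -> int:
    """Count the differing bits between two hash codes stored as ints."""
    return bin(a ^ b).count("1")


def predict_terms(
    query: str,
    codes: Dict[str, int],
    annotations: Dict[str, Set[str]],
    k: int = 2,
    min_votes: int = 2,
) -> Set[str]:
    """Suggest missing terms for `query` from its k nearest neighbors.

    Assumption: a term is suggested when at least `min_votes` of the
    neighbors carry it. The abstract does not specify the voting rule.
    """
    # Rank all other proteins by Hamming distance (ties broken by name).
    neighbors = sorted(
        (p for p in codes if p != query),
        key=lambda p: (hamming(codes[query], codes[p]), p),
    )[:k]

    # Count how many neighbors carry each term.
    votes: Dict[str, int] = {}
    for p in neighbors:
        for term in annotations.get(p, set()):
            votes[term] = votes.get(term, 0) + 1

    # Keep well-supported terms the query does not already have.
    known = annotations.get(query, set())
    return {t for t, v in votes.items() if v >= min_votes and t not in known}


# Toy data with placeholder protein and term names (not real GO identifiers).
toy_codes = {"P1": 0b1011, "P2": 0b1010, "P3": 0b0011, "P4": 0b0100}
toy_annotations = {
    "P1": {"term_a"},
    "P2": {"term_a", "term_b"},
    "P3": {"term_b", "term_c"},
    "P4": {"term_d"},
}
print(predict_terms("P1", toy_codes, toy_annotations))
```

With this toy data, the two nearest neighbors of `P1` are `P2` and `P3` (distance 1 each), and only `term_b` is shared by both, so it is the single suggested annotation.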
2.
Protein amino acid sequences can be used to determine the functions of a protein. However, determining the function of a single protein experimentally requires many resources and a tremendous amount of time. Computational intelligence methods such as deep learning have been shown to predict protein functions. This paper proposes a hybrid deep neural network model, named Deep_CNN_LSTM_GO, to predict an unknown protein's functions from its sequence. Deep_CNN_LSTM_GO integrates a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) neural network to learn features from amino acid sequences and to output predictions for the three Gene Ontology (GO) sub-ontologies that represent protein function: Molecular Function (MF), Biological Process (BP), and Cellular Component (CC). The proposed model was trained and tested on the UniProt-SwissProt dataset. A further test was carried out on Computational Assessment of Function Annotation (CAFA) data for the three sub-ontologies. The proposed model outperforms other methods in the field on three evaluation metrics (Fmax, Smin, and AUPR) in all three sub-ontologies (MF, BP, CC).
3.
Proteins are among the most important molecules governing cellular processes in most living organisms, and their functions are of paramount importance for understanding the basics of life. Several supervised learning approaches have been applied to predict protein function. In this paper, we propose ProtConv, a convolutional neural network based approach that predicts protein function by converting amino acid sequences into a two-dimensional image. We use a protein embedding technique based on transfer learning to generate a feature vector. The feature vector is then converted into a square, single-channel image that is fed into a convolutional network. The network architecture combines convolutional filters and average pooling layers, followed by dense fully connected layers, to predict a binary function. We performed experiments on standard benchmark datasets from two important protein function prediction tasks: proinflammatory cytokines and anticancer peptides. Our experiments show that the proposed method, ProtConv, achieves state-of-the-art performance on both datasets. Implementation details, source code, and datasets are available at: https://github.com/swakkhar/ProtConv.
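The step of turning a feature vector into a square, single-channel image can be sketched as follows. The abstract does not say how the vector length is matched to a square, so zero-padding up to the next perfect square is an assumption here, and `to_square_image` and `pad_value` are hypothetical names rather than part of the ProtConv code.

```python
import math
from typing import List, Sequence


def to_square_image(vec: Sequence[float], pad_value: float = 0.0) -> List[List[float]]:
    """Reshape a 1-D feature vector into a side x side single-channel image.

    Assumption: the vector is right-padded with `pad_value` until its
    length is a perfect square, then filled row by row.
    """
    n = len(vec)
    if n == 0:
        raise ValueError("feature vector must not be empty")

    # Smallest side such that side * side >= n.
    side = math.isqrt(n - 1) + 1

    padded = list(vec) + [pad_value] * (side * side - n)
    return [padded[r * side:(r + 1) * side] for r in range(side)]


# A 5-element vector becomes a 3 x 3 image with four padded cells.
for row in to_square_image([1.0, 2.0, 3.0, 4.0, 5.0]):
    print(row)
```

The resulting nested list has one channel; a convolutional framework would typically expect it wrapped in batch and channel dimensions before use.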
4.
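Prediction, design, and regulation of protein-protein interactions
Protein-protein interactions are fundamental events of life at the molecular level. Three-dimensional pictures of protein-protein interactions can reveal the molecular details of key life processes. Understanding the principles of protein-protein interactions helps uncover the mechanisms of life processes and provides a basis for protein design of significant value. This paper summarizes recent progress in the prediction, design, and regulation of protein-protein interactions, describes the progress made in the authors' laboratory in these areas, and offers an outlook on future research directions. The main topics are: (1) computational analysis of protein-protein interaction networks, interaction mechanisms, and protein complex structures; (2) prediction of protein-protein interactions based on sequences, binding sites, and complex structures; (3) methods for designing protein-protein interactions; (4) methods for regulating protein-protein interactions with chemical molecules; (5) methods for designing protein drugs that target protein-protein interactions.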
5.
Drug discovery requires highly accurate drug-target interaction (DTI) prediction through virtual screening. Compared with traditional methods, deep learning requires less time and domain expertise while achieving higher accuracy. However, there is still room to improve performance with simpler structures, and the field is calling for multi-task models that solve different tasks. Here we report GanDTI, an end-to-end deep learning model for both interaction classification and binding affinity prediction. The model takes the compound graph and the protein sequence as input. It consists only of a graph neural network, an attention module, and a multilayer perceptron, yet it outperforms state-of-the-art methods for binding affinity prediction and interaction classification on the DUD-E, human, and bindingDB benchmark datasets. This demonstrates that our refined model is highly effective and efficient for DTI prediction and provides a new strategy for performance improvement.
6.
Protein function prediction is a crucial task in the post-genomic era because of the diverse, irreplaceable roles that proteins play in biological systems. Traditional methods relied on costly and time-consuming molecular biology techniques, which proved inadequate after the surge of sequencing data brought by cost-effective, advanced sequencing techniques. To keep the pace of annotation in step with data generation, the field has shifted to computational approaches based on homology, sequence- and structure-based features, protein-protein interaction networks, phylogenetic profiles, physicochemical properties, and so on. Combining these features has proven promising for improving prediction accuracy. In the present work, we combine sequence, physicochemical property, subsequence, and annotation features, a total of 9890 features extracted and/or calculated for 171,212 reviewed prokaryotic proteins from 9 bacterial phyla in UniProtKB, to train a supervised deep learning ensemble model that categorizes the function of a hypothetical or unreviewed bacterial protein into 1739 GO terms used as functional classes. Being fully dedicated to bacterial organisms, the proposed system is a novel attempt among existing machine learning based protein function prediction systems, which are built on mixed organisms. Experimental results demonstrate the success of the proposed deep neural network based ensemble model, with an F1 measure of 0.7912 on the prepared Test dataset 1 of reviewed proteins.
7.
The literature contains over fifty years of accumulated methods for predicting the secondary structures of proteins in silico. A large part of this collection consists of approaches based on artificial neural networks, a field of artificial intelligence and machine learning that is gaining popularity in various application areas. The primary objective of this paper is to summarize works that are important but scattered over time, to give new researchers a clear view of the domain in a single place. An informative introduction to protein secondary structure and artificial neural networks is also included for context. This review will be valuable in designing future methods to improve protein secondary structure prediction accuracy. The neural network methods found in this problem domain employ varying architectures and feature spaces, and a handful stand out for significant improvements in prediction. Neural networks with larger feature scope and higher architectural complexity have been found to produce better protein secondary structure prediction. Current prediction accuracy lies around the 84% mark, leaving much room for further improvement in the prediction of secondary structures in silico. The estimated limit of 88% prediction accuracy has not yet been reached, so further research is timely.
8.
Accurate prediction of protein secondary structure is essential for accurate sequence alignment, three-dimensional structure modeling, and function prediction. The accuracy of ab initio secondary structure prediction from sequence, however, has only increased from around 77% to 80% over the past decade. Here, we developed a multistep neural-network algorithm that couples secondary structure prediction with the prediction of solvent accessibility and backbone torsion angles in an iterative manner. Our method, called SPINE X, was applied to a dataset of 2640 proteins (25% sequence identity cutoff) previously built for the first version of SPINE and achieved an 82.0% accuracy (Q3) based on 10-fold cross validation. That SPINE X surpasses 81% accuracy is further confirmed on an independently built test dataset of 1833 protein chains, a recently built dataset of 1975 proteins, and 117 CASP 9 targets (critical assessment of structure prediction techniques), with accuracies of 81.3%, 82.3% and 81.8%, respectively. The prediction accuracy is further improved to 83.8% for the dataset of 2640 proteins if the DSSP assignment used above is replaced by a more consistent consensus secondary structure assignment method. A comparison with the popular PSIPRED and CASP-winning structure-prediction techniques is made. SPINE X predicts the number of helices and sheets correctly for 21.0% of the 1833 proteins, compared to 17.6% for PSIPRED. The comparison further shows that SPINE X consistently makes more accurate predictions for helical residues (6%) without overprediction, while PSIPRED makes more accurate predictions for coil residues (3-5%) and overpredicts them by 7%. The SPINE X server and its training/test datasets are available at http://sparks.informatics.iupui.edu/
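The Q3 figure quoted above is the usual three-state per-residue accuracy: the fraction of residues whose predicted state (helix H, strand E, or coil C) matches the reference assignment. The sketch below assumes both inputs are already reduced to three-state strings of equal length; the function name `q3_accuracy` is a hypothetical label, and the sketch does not reproduce how SPINE X maps DSSP states to three classes.

```python
def q3_accuracy(predicted: str, reference: str) -> float:
    """Fraction of residues whose three-state label (H, E, C) is correct."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same length")
    if not reference:
        raise ValueError("sequences must not be empty")

    allowed = set("HEC")
    if not (set(predicted) <= allowed and set(reference) <= allowed):
        raise ValueError("only the states H, E, and C are allowed")

    # Count positions where prediction and reference agree.
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


# One of eight residues is mispredicted (E called as C at position 5).
print(q3_accuracy("HHHECCCC", "HHHEECCC"))  # 0.875
```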
9.
Protein structure prediction from inaccurate and sparse NMR data using an enhanced genetic algorithm
Nuclear Magnetic Resonance Spectroscopy (commonly known as NMR Spectroscopy) is used to generate approximate and partial distances between pairs of atoms in the native structure of a protein. By solving the Euclidean distance geometry problem for these partial distances, we can predict the three-dimensional (3D) structure of a protein. In this paper, a new genetic algorithm is proposed to efficiently address the Euclidean distance geometry problem and build the 3D structure of a given protein from sparse NMR data. Our genetic algorithm uses (i) a greedy mutation and crossover operator to intensify the search; (ii) a twin removal technique to diversify the population; (iii) a random restart method to recover from stagnation; and (iv) a compaction factor to reduce the search space. By drastically reducing the search space, our approach improves the quality of the search. We tested our algorithm on a set of standard benchmarks. Experimentally, we show that our enhanced genetic algorithm significantly outperforms traditional genetic algorithms and a previously proposed state-of-the-art method. Our method can produce structures that are very close to the native structures, so experimental biologists could adopt it to determine more accurate protein structures from NMR data.
10.
The genetic evolution of the carbonic anhydrase enzyme provides an interesting instance of functional similarity despite structural diversity among the members of a given enzyme family. Phylogenetic analysis of α-, β- and γ-carbonic anhydrases was carried out to determine the evolutionary relationships among members of the family, an enzyme present in a wide range of cellular and chromosomal locations. A phylogenetic time tree revealed the presence of more than one class of the enzyme in a particular organism. The evolutionary relationships among members of the animal, plant and microbial kingdoms were established. The study revises the long-established notion that the different classes of carbonic anhydrases are kingdom-specific, and gives a new account of the presence of multiple classes of carbonic anhydrases in a single organism and of a given class of carbonic anhydrase across different kingdoms.
11.
Huitao Liu Yingying Wen Feng Luan Yuan Gao Xiuyong Li 《Central European Journal of Chemistry》2009,7(3):439-445
The half-wave potential (E1/2) is an important electrochemical property of organic compounds. In this work, a quantitative structure-property relationship (QSPR) analysis has been conducted on the half-wave reduction potential (E1/2) of 40 substituted benzoxazines by means of both a heuristic method (HM) and a non-linear radial basis function neural network (RBFNN) modeling method. The statistical parameters provided by the HM model (R² = 0.946; F = 152.576; RMSCV = 0.0141) and the RBFNN model (R² = 0.982; F = 1034.171; RMS = 0.0209) indicated satisfactory stability and predictive ability. The obtained models showed that benzoxazines with larger Min valency of a S atom (MVSA), lower Relative number of H atom (RNHA) and Min n-n repulsion for a C-H bond (MnnRCHB) and Minimal Electrophilic Reactivity Index for a C atom (MERICA) can be more easily reduced. This QSPR approach can contribute to a better understanding of structural factors of the organic compounds that contribute to the E1/2, and can be useful in predicting the E1/2 of other compounds.
12.
13.
Iris Stappen Gerhard Buchbauer Wolfgang Robien Peter Wolschann 《Magnetic resonance in chemistry : MRC》2009,47(9):720-726
A systematic investigation of a series of santalol and epi‐santalol derivatives by means of ab initio and density functional theory (DFT) calculations together with database‐oriented prediction methods leads to a configurational reassignment within this compound class. The DFT calculations as well as the HOSE‐code and neural network‐based predictions allow deriving a general rule set for unambiguous assignment within this compound class. The methyl group in position 2′ serves as an indication for the configuration at this stereocenter allowing easy differentiation between santalol derivatives and their diastereomers belonging to the epi‐santalol series. Copyright © 2009 John Wiley & Sons, Ltd.