A deep learning ensemble for function prediction of hypothetical proteins from pathogenic bacterial species
Institutions: 1. Department of Computer Science, Jamia Millia Islamia, Jamia Nagar, New Delhi, 110025, Delhi, India; 2. Department of Biophysics, All India Institute of Medical Sciences (AIIMS), New Delhi, 110 029, Delhi, India; 1. Medical Mycology Research Center, Graduate School of Medicine, Chiba University, 1-8-1 Inohana, Chuo-ku, Chiba 260-8673, Japan; 2. Department of Medicine, School of Medicine, University of Montreal, Montreal, Quebec, Canada; 3. Department of Molecular Virology, Graduate School of Medicine, Chiba University, 1-8-1 Inohana, Chuo-ku, Chiba 260-8670, Japan; 1. Department of CSE, SRM Institute of Science and Technology, Chennai, India; 2. Department of ECE, Government Polytechnic College, Trichy, India; 3. School of Computing Science and Engineering, Galgotias University, Greater Noida, India; 4. Department of Information and Communication Technology, Mawlana Bhashani Science and Technology University, Santosh, Tangail 1902, Bangladesh; 5. Department of IT, Galgotias College of Engineering and Technology, Greater Noida, India
Abstract: Protein function prediction is a crucial task in the post-genomics era because proteins play diverse and irreplaceable roles in biological systems. Traditional approaches relied on costly and time-consuming molecular biology techniques, which could not keep pace once cost-effective, advanced sequencing technologies produced an explosion of sequence data. To keep annotation in step with data generation, the field has shifted to computational approaches based on homology, sequence- and structure-based features, protein-protein interaction networks, phylogenetic profiles, and physicochemical properties. Combining these features has proven promising for improving prediction accuracy. In the present work, we combine sequence, physicochemical-property, subsequence, and annotation features, 9890 in total, extracted and/or calculated for 171,212 reviewed prokaryotic proteins from 9 bacterial phyla in UniProtKB, and use them to train a supervised deep learning ensemble model that categorizes the function of a hypothetical/unreviewed bacterial protein into 1739 GO terms used as functional classes. Because it is dedicated entirely to bacterial organisms, the proposed system is a novel attempt among existing machine-learning-based protein function prediction systems, which are built on data from mixed organisms. Experimental results demonstrate the effectiveness of the proposed ensemble built on deep neural networks, which achieves an F1 measure of 0.7912 on the prepared Test dataset 1 of reviewed proteins.
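The abstract reports a single F1 measure for an ensemble that assigns proteins to 1739 GO terms, but it does not say how the ensemble members are combined, how probabilities become class decisions, or how F1 is averaged. The sketch below is a minimal illustration under stated assumptions only: it treats the task as multi-label, averages the members' per-GO-term probabilities, applies a 0.5 cutoff, and computes micro-averaged F1. The function names (`ensemble_average`, `threshold`, `micro_f1`) and the toy data are hypothetical, not the authors' code.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def ensemble_average(model_outputs: Sequence[Matrix]) -> Matrix:
    """Average per-GO-term probabilities across ensemble members (assumed combiner).

    model_outputs[k][i][j] is model k's probability that protein i has GO term j.
    """
    n_models = len(model_outputs)
    n_proteins = len(model_outputs[0])
    n_terms = len(model_outputs[0][0])
    return [
        [sum(model_outputs[k][i][j] for k in range(n_models)) / n_models
         for j in range(n_terms)]
        for i in range(n_proteins)
    ]


def threshold(probs: Matrix, cutoff: float = 0.5) -> List[List[int]]:
    """Turn probabilities into 0/1 GO-term predictions (the 0.5 cutoff is an assumption)."""
    return [[1 if p >= cutoff else 0 for p in row] for row in probs]


def micro_f1(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Micro-averaged F1: pool TP/FP/FN over all proteins and GO terms."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t and p:
                tp += 1  # predicted term is correct
            elif p:
                fp += 1  # predicted term is not annotated
            elif t:
                fn += 1  # annotated term was missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy data: 2 hypothetical models, 2 proteins, 3 GO terms.
    model_a = [[0.9, 0.2, 0.6], [0.1, 0.8, 0.3]]
    model_b = [[0.7, 0.4, 0.2], [0.3, 0.6, 0.5]]
    y_true = [[1, 0, 1], [0, 1, 0]]
    y_pred = threshold(ensemble_average([model_a, model_b]))
    print(y_pred)  # [[1, 0, 0], [0, 1, 0]]
    print(micro_f1(y_true, y_pred))
```

Micro-averaging weights every (protein, GO term) decision equally, so frequent GO terms dominate the score; a macro-averaged F1 would instead weight each of the GO terms equally, which can give a quite different number on an imbalanced label set.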
Keywords: Hypothetical proteins; Function prediction; Molecular function; Deep learning; Reviewed protein; Motif; Physicochemical feature; LeakyReLU; Nadam; Deep neural network; Sequence based feature; Annotation based feature
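The keywords list LeakyReLU and Nadam, which suggests (though the abstract does not confirm it) that they are the activation function and optimizer of the deep neural network. The sketch below shows both on scalars for clarity. The slope `alpha=0.01`, learning rate `0.001`, and `beta1`/`beta2`/`eps` values are common defaults assumed here, not values reported by the authors, and the Nadam update is a simplified form without the momentum-decay schedule that some library implementations add.

```python
import math
from typing import Tuple


def leaky_relu(x: float, alpha: float = 0.01) -> float:
    """LeakyReLU: pass positive inputs through, scale negative inputs by alpha (assumed 0.01)."""
    return x if x > 0 else alpha * x


def nadam_step(theta: float, grad: float, m: float, v: float, t: int,
               lr: float = 0.001, beta1: float = 0.9, beta2: float = 0.999,
               eps: float = 1e-8) -> Tuple[float, float, float]:
    """One simplified Nadam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment (momentum) estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias corrections
    v_hat = v / (1 - beta2 ** t)
    # Nesterov look-ahead: mix bias-corrected momentum with the current gradient.
    nesterov_m = beta1 * m_hat + (1 - beta1) * grad / (1 - beta1 ** t)
    theta = theta - lr * nesterov_m / (math.sqrt(v_hat) + eps)
    return theta, m, v


if __name__ == "__main__":
    print(leaky_relu(3.0))  # 3.0
    theta, m, v = nadam_step(theta=1.0, grad=0.5, m=0.0, v=0.0, t=1)
    print(theta)  # roughly 0.9981
```

Unlike plain ReLU, LeakyReLU keeps a small nonzero gradient for negative inputs, and Nadam adds a Nesterov-style look-ahead to Adam's adaptive step sizes.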
This article is indexed in ScienceDirect and other databases.