HashGO: hashing gene ontology for protein function prediction |
| |
Affiliation: | 1. Department of Pharmacology, Faculty of Medicine, University of Jordan, Amman, Jordan;2. Department of Biology, University of Jordan, Amman, Jordan;3. School of Medicine, University of Adelaide, Adelaide, South Australia, Australia;4. South Australian Health and Medical Research Institute, Adelaide, South Australia, Australia;5. Department of Pharmaceutical Sciences, Faculty of Pharmacy, University of Jordan, Amman, Jordan;1. SiSaf Ltd, Innovation Centre, Northern Ireland Science Park, Queen's Island, Belfast, BT3 9DT, UK;2. Dipartimento di Scienze e Tecnologie Biologiche Chimiche e Farmaceutiche (STEBICEF), Università di Palermo, Via Archirafi 32, 90123, Palermo, Italy;1. Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, Dalian University, Dalian, 116622, China;2. School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, China;1. School of Mathematics and Information Science, Xianyang Normal University, Wenlin Road, Xianyang, 712000, China;2. Department of Computer and Information Science, Fordham University, Lincoln Center, New York, NY, 10023, USA;1. Molecular Modeling Research Laboratory, Department of Chemistry, University College of Science, Osmania University, Hyderabad 500007, Telangana, India;2. Department of Chemistry, South Valley University, Qena 83523, Egypt;3. Department of Chemistry, Nizam College, Osmania University, Basheerbagh, Hyderabad, Telangana, India |
| |
Abstract: | Gene ontology (GO) is a standardized, controlled vocabulary of terms that describe the molecular functions, biological roles and cellular locations of proteins. GO terms and the GO hierarchy are regularly updated as biological knowledge accumulates. GO includes more than 50,000 terms, and each protein is annotated with several to dozens of them. Accurately predicting associations between proteins and such a large number of GO terms is therefore challenging. To address this, we propose a method called Hashing GO for protein function prediction (HashGO for short). HashGO first stores the available GO annotations of proteins in a protein-term association matrix. It then tailors a graph hashing method to explore the underlying structure between GO terms and to obtain a series of hash functions that compress the high-dimensional protein-term association matrix into a low-dimensional one. Next, HashGO computes the semantic similarity between proteins from the Hamming distance on the low-dimensional matrix. Finally, it predicts missing annotations of a protein from the annotations of its semantic neighbors. Experimental results on archived GO annotations of two model species (yeast and human) show that HashGO not only predicts functions more accurately than related approaches but also runs faster. |
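
The pipeline outlined in the abstract (association matrix, hashing to short binary codes, Hamming-distance neighbors, neighbor-based prediction) can be illustrated with a minimal Python sketch. This is not the authors' implementation: random-hyperplane (sign) hashing is used here as a simple stand-in for the tailored graph hashing, which the abstract does not specify, and all function names, parameters, and the toy matrix are hypothetical.

```python
import random


def make_hash_functions(num_terms, num_bits, seed=0):
    """Draw num_bits random hyperplanes over the GO-term axis.

    Assumption: these random projections replace the learned graph-hash
    functions described in the abstract.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(num_terms)]
            for _ in range(num_bits)]


def hash_row(row, planes):
    """Compress one protein's 0/1 annotation row into a tuple of bits."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, row)) > 0 else 0
        for plane in planes
    )


def hamming(a, b):
    """Number of positions at which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


def predict_scores(annotations, codes, query, k):
    """Score every term for protein `query` (requires k >= 1).

    A term's score is the fraction of the k Hamming-nearest proteins
    that carry that term.
    """
    others = [i for i in range(len(annotations)) if i != query]
    others.sort(key=lambda i: hamming(codes[query], codes[i]))
    neighbours = others[:k]
    num_terms = len(annotations[query])
    return [
        sum(annotations[i][t] for i in neighbours) / len(neighbours)
        for t in range(num_terms)
    ]


# Toy protein-term association matrix: rows are proteins, columns are GO terms.
annotations = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1],
]
planes = make_hash_functions(num_terms=6, num_bits=4)
codes = [hash_row(row, planes) for row in annotations]
scores = predict_scores(annotations, codes, query=0, k=2)
print(scores)
```

Terms with a high score that are absent from the query protein's current row would be the candidate missing annotations. The number of bits and the neighborhood size `k` trade accuracy against speed, and the values used here are arbitrary.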
| |
Keywords: | Gene ontology; Protein function prediction; Graph hashing; Semantic similarity |
This article is indexed in ScienceDirect and other databases. |
|