predForm-Site: Formylation site prediction by incorporating multiple features and resolving data imbalance
Affiliation:
1. Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia; 2. Department of Computer Science & Engineering, Pabna University of Science and Technology, Pabna, Bangladesh; 3. Department of Computer Science & Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh; 4. Department of Computer Science & Engineering, Rajshahi University, Rajshahi, Bangladesh
Abstract:
Formylation is a recently discovered post-translational modification of lysine residues that has been implicated in various diseases. In this work, a novel predictor, named predForm-Site, has been developed to predict formylation sites with higher accuracy. We integrated multiple sequence features to build a more informative representation of formylation sites. Moreover, the decision function of the underlying classifier was optimized on the skewed formylation dataset during model training to improve prediction quality. On the dataset used by the LFPred and Formator predictors, predForm-Site achieved 99.5% sensitivity, 99.8% specificity and 99.8% overall accuracy, with an AUC of 0.999, in the jackknife test. In the independent test, it also achieved more than 97% sensitivity and 99% specificity. Similarly, when benchmarked against the recent method CKSAAP_FormSite, the proposed predictor performed significantly better on all measures: sensitivity was higher by around 20%, specificity by nearly 30% and overall accuracy by more than 22%. These experimental results show that the proposed predForm-Site can be used as a complementary tool for the fast exploration of formylation sites. For the convenience of the scientific community, predForm-Site has been deployed as an online tool, accessible at http://103.99.176.239:8080/predForm-Site.
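The sensitivity, specificity and overall accuracy reported above follow the standard confusion-matrix definitions for a binary classifier. The following is a minimal sketch of those definitions only, not the authors' evaluation code; the function name `binary_metrics` and the toy labels are hypothetical and are not taken from the paper.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute sensitivity, specificity and accuracy for 0/1 labels.

    Label 1 stands for a formylation site (positive) and 0 for a
    non-formylation site (negative). This labelling is an assumption
    made for illustration.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives

    total = tp + fn + tn + fp
    return {
        # Fraction of real positive sites that were recovered.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Fraction of real negative sites that were correctly rejected.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        # Fraction of all sites classified correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
    }


# Toy, made-up labels with more negatives than positives (a skewed set).
toy_true = [1, 1, 1, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 0, 0, 0, 0, 0, 1]
toy_scores = binary_metrics(toy_true, toy_pred)
```

On a skewed dataset, overall accuracy is dominated by the majority class, which is why sensitivity and specificity are reported alongside it.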