An ensemble learning framework for potential miRNA-disease association prediction with positive-unlabeled data |
| |
Institution: | 1. School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China; 2. School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin 300350, China; 3. Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29208, USA; 1. School of Information Engineering, East China Jiaotong University, Nanchang, China; 2. College of Computer Science and Electronic Engineering, Hunan University, Changsha, China; 3. College of Information Science and Engineering, Hunan Normal University, Changsha, China; 4. School of Information Science and Engineering, Shandong Normal University, Jinan, China; 1. School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong 518055, China; 2. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China; 3. Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing 100081, China; 1. College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China; 2. College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, Hunan 410003, China; 1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China; 2. Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; 3. College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277100, China |
| |
Abstract: | To explore the pathogenic mechanisms of microRNA (miRNA) in diverse diseases, many researchers have concentrated on discovering potential associations between miRNAs and diseases using machine learning methods. However, the prediction accuracy of supervised machine learning methods is limited by the lack of experimentally validated uncorrelated miRNA-disease pairs. Without these negative samples, training a highly accurate model is much more difficult. Unlike traditional miRNA-disease prediction models, which use randomly selected unknown samples as negative training samples, we propose an ensemble learning framework to solve this positive-unlabeled (PU) learning problem. The framework incorporates two steps: a novel semi-supervised Kmeans (SS-Kmeans) to extract reliable negative samples from unknown miRNA-disease pairs, and a subagging method to generate diverse training sample sets that make full use of those reliable negative samples for ensemble learning. Combined with an effective random vector functional link (RVFL) network as the prediction model, the proposed framework shows superior prediction accuracy compared with other popular approaches. A case study on lung and gastric neoplasms further confirms the framework’s efficacy at identifying miRNA-disease associations. |
| |
Keywords: | Semi-supervised Kmeans (SS-Kmeans); Random vector functional link (RVFL); Subagging; Ensemble learning; MiRNA-disease association |
This article is indexed in ScienceDirect and other databases. |
|
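
The second step described in the abstract (subagging over reliable negatives, with an RVFL network as the base learner) can be illustrated roughly as follows. This is only a minimal sketch under stated assumptions, not the authors' implementation: the reliable negatives are assumed to be already available (the SS-Kmeans step is not reproduced), and the function names, hidden-layer size, ridge penalty, and balanced subsample size are all hypothetical choices made for illustration.

```python
import numpy as np

def rvfl_fit(X, y, n_hidden=16, ridge=1e-2, rng=None):
    """Fit a basic RVFL: random fixed hidden layer plus direct input links,
    with output weights solved in closed form by ridge regression."""
    rng = np.random.default_rng(rng)
    W = rng.normal(size=(X.shape[1], n_hidden))   # random, never trained
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)
    # Direct links (X), hidden features (H), and a constant bias column.
    D = np.hstack([X, H, np.ones((len(X), 1))])
    beta = np.linalg.solve(D.T @ D + ridge * np.eye(D.shape[1]), D.T @ y)
    return W, b, beta

def rvfl_predict(model, X):
    W, b, beta = model
    D = np.hstack([X, np.tanh(X @ W + b), np.ones((len(X), 1))])
    return D @ beta

def subagging_rvfl(X_pos, X_neg, n_models=10, seed=0):
    """Train one RVFL per subsample of the reliable negatives.
    Subsamples are drawn WITHOUT replacement (subagging, not bagging).
    Assumption: each subsample is balanced against the positives."""
    rng = np.random.default_rng(seed)
    k = min(len(X_pos), len(X_neg))
    models = []
    for _ in range(n_models):
        idx = rng.choice(len(X_neg), size=k, replace=False)
        X = np.vstack([X_pos, X_neg[idx]])
        y = np.concatenate([np.ones(len(X_pos)), np.zeros(k)])
        models.append(rvfl_fit(X, y, rng=rng))
    return models

def ensemble_score(models, X):
    """Average the base learners' outputs; higher means more likely associated."""
    return np.mean([rvfl_predict(m, X) for m in models], axis=0)
```

In this sketch, `X_pos` and `X_neg` stand for feature vectors of known miRNA-disease pairs and of reliable negative pairs respectively; how those features are built is outside what the abstract specifies.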