Multi-instance multi-label distance metric learning for genome-wide protein function prediction
Institution: 1. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, China; 2. School of Software Engineering, South China University of Technology, Guangzhou 510006, China
Abstract: Multi-instance multi-label (MIML) learning has proven effective for genome-wide protein function prediction, where each training example is associated with multiple instances and multiple class labels. Many MIML methods proposed for this task optimize objective functions in which dissimilarity between instances is measured by the Euclidean distance. In many real applications, however, the Euclidean distance may fail to capture the intrinsic similarity/dissimilarity in feature space and label space. Unlike previous approaches, this paper proposes a multi-instance multi-label distance metric learning framework (MIMLDML) for genome-wide protein function prediction. Specifically, we learn a Mahalanobis distance that preserves and exploits the intrinsic geometric information of both feature space and label space for MIML learning. In addition, we address sparsely labeled data by assigning weights to the labeled data. Extensive experiments on seven real-world organisms covering the biological three-domain system (i.e., archaea, bacteria, and eukaryote; Woese et al., 1990) show that the MIMLDML algorithm is superior to most state-of-the-art MIML learning algorithms.
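For readers unfamiliar with the term, the sketch below illustrates how a Mahalanobis distance d_M(x, y) = sqrt((x - y)^T M (x - y)) generalizes the Euclidean distance: with M equal to the identity it reduces to the Euclidean case, while another positive semidefinite M reweights the feature directions. This is only an illustration and not the MIMLDML algorithm; the abstract does not give the objective used to learn M, so the matrices here are fixed by hand. The function names and the averaged pairwise `bag_distance` are hypothetical choices made for the example.

```python
import math

def mahalanobis(x, y, M):
    """Return sqrt((x - y)^T M (x - y)) for a PSD matrix M given as nested lists."""
    d = [a - b for a, b in zip(x, y)]
    Md = [sum(m_ij * d_j for m_ij, d_j in zip(row, d)) for row in M]
    return math.sqrt(sum(d_i * md_i for d_i, md_i in zip(d, Md)))

def bag_distance(bag_a, bag_b, M):
    """Average pairwise instance distance between two bags.

    Assumption for illustration only: the source does not say how
    bag-level distances are defined.
    """
    total = sum(mahalanobis(x, y, M) for x in bag_a for y in bag_b)
    return total / (len(bag_a) * len(bag_b))

# Hand-picked matrices (not learned): identity gives the Euclidean distance,
# while `stretched` weights the first feature four times as heavily.
identity = [[1.0, 0.0], [0.0, 1.0]]
stretched = [[4.0, 0.0], [0.0, 1.0]]

x, y = [0.0, 0.0], [3.0, 4.0]
euclid = mahalanobis(x, y, identity)
weighted = mahalanobis(x, y, stretched)
bag_d = bag_distance([[0.0, 0.0]], [[3.0, 4.0], [0.0, 0.0]], identity)
print(euclid, weighted, bag_d)
```

In a metric learning setting, M would be estimated from the training data rather than set by hand, which is the part the paper contributes.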
Keywords: Protein function prediction; Genome wide; Distance metric learning; Machine learning; Multi-instance multi-label learning
This article is indexed in ScienceDirect and other databases.