Variable importance analysis based on rank aggregation with applications in metabolomics for biomarker discovery
Authors:Yong-Huan Yun  Bai-Chuan Deng  Dong-Sheng Cao  Wei-Ting Wang  Yi-Zeng Liang
Affiliation:1. College of Chemistry and Chemical Engineering, Central South University, Changsha, 410083, PR China;2. College of Animal Science, South China Agricultural University, Guangzhou, 510642, PR China;3. College of Pharmaceutical Sciences, Central South University, Changsha, 410083, PR China
Abstract:Biomarker discovery is an important goal in metabolomics. It is typically modeled as selecting the most discriminating metabolites for classification and is often referred to as variable importance analysis or variable selection. A number of variable importance analysis methods have been proposed for discovering biomarkers in metabolomics studies. However, because they rest on different principles, different methods are likely to produce different variable rankings. Each method generates a variable ranking list, much as an expert presents an opinion. The inconsistency between different variable ranking methods is often ignored. A simple solution to this problem is to take every ranking into account. In this study, a strategy called rank aggregation was employed. It merges individual ranking lists into a single “super”-list that reflects the overall preference or importance within the population. This “super”-list is regarded as the final ranking for biomarker discovery and was used to select the variable subset with the highest predictive classification accuracy. Nine methods were used: three univariate filtering methods and six multivariate methods. When applied to two metabolic datasets (the Childhood overweight dataset and the Tubulointerstitial lesions dataset), rank aggregation gave markedly higher prediction accuracy than using all variables. It also outperformed a penalized method, the least absolute shrinkage and selection operator (LASSO), achieving higher prediction accuracy or selecting fewer variables, which are more interpretable.
Keywords:Variable importance   Variable ranking   Biomarker discovery   Rank aggregation   Metabolomics
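
The idea of merging several ranking lists into one “super”-list can be illustrated with a minimal sketch. The abstract does not state which aggregation algorithm the authors used, so the example below assumes a simple Borda-style mean-rank rule; the function name `borda_aggregate` and the metabolite labels are hypothetical and chosen only for illustration.

```python
from statistics import mean


def borda_aggregate(rankings):
    """Merge ranked lists (best variable first) into one list by mean position.

    Assumption: a Borda-style mean-rank rule, not necessarily the
    aggregation method used in the paper.
    """
    variables = set(rankings[0])
    for ranking in rankings:
        # Every method must rank the same set of variables exactly once.
        if set(ranking) != variables or len(ranking) != len(variables):
            raise ValueError("all rankings must contain the same variables")
    # Average 0-based position of each variable across all ranking lists.
    mean_rank = {v: mean(r.index(v) for r in rankings) for v in variables}
    # Lower mean position means higher importance; ties are broken by name.
    return sorted(variables, key=lambda v: (mean_rank[v], v))


# Three hypothetical ranking lists, one per variable importance method.
rankings = [
    ["m1", "m2", "m3", "m4"],
    ["m2", "m1", "m4", "m3"],
    ["m1", "m3", "m2", "m4"],
]
super_list = borda_aggregate(rankings)
print(super_list)
```

In a workflow like the one described, the top k entries of such a list could then be evaluated as candidate variable subsets, with k chosen by classification accuracy.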
This article is indexed in ScienceDirect and other databases.