Variable importance analysis based on rank aggregation with applications in metabolomics for biomarker discovery |
| |
Authors: | Yong-Huan Yun, Bai-Chuan Deng, Dong-Sheng Cao, Wei-Ting Wang, Yi-Zeng Liang |
| |
Affiliation: | 1. College of Chemistry and Chemical Engineering, Central South University, Changsha, 410083, PR China; 2. College of Animal Science, South China Agricultural University, Guangzhou, 510642, PR China; 3. College of Pharmaceutical Sciences, Central South University, Changsha, 410083, PR China |
| |
Abstract: | Biomarker discovery is an important goal in metabolomics. It is typically modeled as selecting the most discriminating metabolites for classification and is often referred to as variable importance analysis or variable selection. A number of variable importance analysis methods have been proposed for discovering biomarkers in metabolomics studies. However, because they rest on different principles, different methods are most likely to generate different variable rankings. Each method produces a ranking list, much as an expert presents an opinion, and the inconsistency between ranking methods is often ignored. A simple way to address this problem is to take every ranking into account. In this study, a strategy called rank aggregation was employed. It merges individual ranking lists into a single “super”-list that reflects the overall preference or importance within the population. This “super”-list is regarded as the final ranking and was used both for biomarker discovery and for selecting the variable subset with the highest predictive classification accuracy. Nine ranking methods were used: three univariate filtering methods and six multivariate methods. When applied to two metabolic datasets (the Childhood overweight dataset and the Tubulointerstitial lesions dataset), rank aggregation achieved markedly higher prediction accuracy than using all variables. It also outperformed a penalized method, the least absolute shrinkage and selection operator (LASSO), giving either higher prediction accuracy or fewer selected variables, which makes the result more interpretable. |
| |
Keywords: | Variable importance; Variable ranking; Biomarker discovery; Rank aggregation; Metabolomics |
This article is indexed in ScienceDirect and other databases. |
|
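The abstract describes merging several variable ranking lists into one “super”-list but does not say which aggregation algorithm was used. The sketch below illustrates the general idea with a simple Borda-style mean-rank rule, which is an assumption chosen for brevity and may differ from the authors' method. The function name `borda_aggregate` and the metabolite labels `m1` to `m4` are hypothetical.

```python
from statistics import mean


def borda_aggregate(rankings):
    """Merge ranked lists (most important first) into one list by mean rank.

    Assumption: a Borda-style mean-rank rule, used only for illustration.
    """
    variables = set(rankings[0])
    for ranking in rankings:
        # Every method must rank the same variables, each exactly once.
        if len(ranking) != len(variables) or set(ranking) != variables:
            raise ValueError("each ranking must order the same variables exactly once")

    # Rank positions are 1-based: position 1 is the most important variable.
    mean_rank = {
        v: mean(ranking.index(v) + 1 for ranking in rankings) for v in variables
    }

    # Lower mean rank means higher overall importance; ties break by name.
    return sorted(variables, key=lambda v: (mean_rank[v], v))


# Three hypothetical ranking methods ordering four hypothetical metabolites.
rankings = [
    ["m1", "m2", "m3", "m4"],
    ["m2", "m1", "m4", "m3"],
    ["m1", "m3", "m2", "m4"],
]
super_list = borda_aggregate(rankings)
print(super_list)
```

In a workflow like the one the abstract outlines, the top-k variables of such an aggregated list would then be evaluated for increasing k, keeping the subset with the best classification accuracy.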