An efficient random forests algorithm for high dimensional data classification

Authors:	Qiang Wang, Thanh-Tung Nguyen, Joshua Z. Huang, Thuy Thi Nguyen

Institutions:	1. College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China; 2. Faculty of Computer Science and Engineering, Thuyloi University, Hanoi, Vietnam; 3. Faculty of Information Technology, Vietnam National University of Agriculture, Hanoi, Vietnam; 4. Sorbonne Université, IRD, JEAI WARM, Unité de Modélisation Mathématiques et Informatique des Systèmes Complexes, Bondy, France

Abstract:	In this paper, we propose a new random forest (RF) algorithm for classifying high-dimensional data that combines a subspace feature sampling method with a search over feature values. The new subspace sampling method preserves the diversity and randomness of the forest while producing trees with lower prediction error. A greedy technique handles high-cardinality categorical features, making node splitting efficient when the decision trees in the forest are built. This lets the trees cope with categorical features that have very many categories while reducing the computational time needed to build the RF model. Extensive experiments were conducted on high-dimensional real data sets, including standard machine learning data sets and image data sets. The results show that the proposed approach significantly reduces prediction error and outperforms most existing RF methods on high-dimensional data.
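The abstract does not spell out either procedure, so the sketch below is not the authors' algorithm; it only illustrates the two ideas under stated assumptions. `sample_subspace` is a hypothetical weighted subspace sampler: it draws `mtry` distinct features with probability proportional to an assumed per-feature informativeness score, mixed with a uniform floor so that every feature can still be drawn and trees stay diverse. `best_categorical_split` swaps in a well-known greedy technique for a binary target: sort the categories by their share of class 1 and scan only the k-1 prefix partitions instead of all 2^(k-1)-1 subsets. For two classes and Gini impurity this ordering is known to reach the best subset split, which is what keeps high-cardinality features affordable.

```python
import random
from collections import defaultdict


def gini(counts):
    """Gini impurity of a [class0_count, class1_count] pair."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)


def sample_subspace(scores, mtry, rng, floor=0.05):
    """Hypothetical weighted subspace sampling (not the paper's method).

    scores: assumed non-negative informativeness score per feature.
    floor:  share of probability mass spread uniformly, so weak features
            remain reachable and the forest keeps its randomness.
    Returns up to `mtry` distinct feature indices.
    """
    n = len(scores)
    total = sum(scores)
    weights = [
        (1 - floor) * (s / total if total > 0 else 1 / n) + floor / n
        for s in scores
    ]
    remaining = list(range(n))
    chosen = []
    for _ in range(min(mtry, n)):
        # Draw one feature from those not yet chosen (sampling without replacement).
        pick = rng.choices(remaining, weights=[weights[i] for i in remaining], k=1)[0]
        remaining.remove(pick)
        chosen.append(pick)
    return chosen


def best_categorical_split(values, labels):
    """Greedy split of one categorical feature for a 0/1 target.

    Returns (left_categories, weighted_gini); (None, inf) if the node
    holds a single category and cannot be split on this feature.
    """
    stats = defaultdict(lambda: [0, 0])  # category -> [n_class0, n_class1]
    for v, y in zip(values, labels):
        stats[v][y] += 1
    # Order categories by their proportion of class 1.
    order = sorted(stats, key=lambda c: stats[c][1] / sum(stats[c]))
    total = [sum(stats[c][0] for c in order), sum(stats[c][1] for c in order)]
    n = len(labels)
    left = [0, 0]
    best = (None, float("inf"))
    # Only the k-1 prefixes of the sorted order are evaluated.
    for i, c in enumerate(order[:-1]):
        left[0] += stats[c][0]
        left[1] += stats[c][1]
        right = [total[0] - left[0], total[1] - left[1]]
        impurity = (sum(left) * gini(left) + sum(right) * gini(right)) / n
        if impurity < best[1]:
            best = (set(order[: i + 1]), impurity)
    return best
```

For example, `best_categorical_split(["x", "x", "y", "y", "z", "z"], [0, 0, 1, 1, 1, 1])` returns `({"x"}, 0.0)`, separating the two classes perfectly after testing only two candidate partitions. The uniform floor in `sample_subspace` is one simple way to trade informativeness against diversity; the paper's own sampling scheme may differ.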

This article is indexed in SpringerLink and other databases.
