
Relative density-based classification noise detection

Authors:	Shu-yin Xia, Zhong-yang Xiong, Yun He, Kuang Li, Li-mei Dong, Man Zhang

Affiliation:	1. College of Computer Science, Chongqing University, Chongqing 400044, China; 2. Institute of Electrical Engineering and Information, Sichuan University, Chengdu 400015, China; 3. Department of Electronics and Information Engineering, Chongqing Technology and Business Institute, Chongqing 400042, China

Abstract:	Classification noise is a common byproduct of traditional data mining approaches, yet no specialized approach for detecting it is currently available. Outlier detection methods are well developed, but outliers and classification noise differ enough in their characteristics that outlier detection algorithms are unsuitable for classification noise detection. This paper proposes a new, specialized approach named relative density-based classification noise detection (RDBCND). Computational experiments on artificial data sets show that RDBCND has a time complexity of O(n log n), making it more efficient than traditional approaches, whose time complexity is at least O(n²). The paper also describes how classification noise detection can improve the generalization ability of common classifier algorithms. In particular, a new unified approach based on RDBCND is compared with a cross-validation approach applied to a BP neural network. Trials on both artificial and real-life data sets show that the RDBCND-based approach can greatly accelerate the process of identifying the best decision function. The new method can also eliminate underfitting, as the algorithm simply searches for the highest training accuracy. The experiments further show that the RDBCND-based method achieves greater accuracy and requires less CPU time to reach global solutions than the cross-validation method. Because relative density is a local concept, the new approach can be applied directly to nonlinear data sets without data transformation. This is a significant advantage over linear classifier algorithms, which need kernel functions or other transformations to handle nonlinear data sets, increasing their complexity.

Keywords:	Classification noise; Relative density; RDBCND; Generalizability; Overfitting
This article is indexed in ScienceDirect and other databases.
