

Relative density-based classification noise detection
Authors: Shu-yin Xia, Zhong-yang Xiong, Yun He, Kuang Li, Li-mei Dong, Man Zhang
Institutions: 1. College of Computer Science, Chongqing University, Chongqing 400044, China; 2. Institute of Electrical Engineering and Information, Sichuan University, Chengdu 400015, China; 3. Department of Electronics and Information Engineering, Chongqing Technology and Business Institute, Chongqing 400042, China
Abstract: Classification noise is a common byproduct of traditional data mining approaches, yet no specialized approach for detecting classification noise is currently available. Methods for outlier detection are well developed, but outliers and classification noise differ enough in their characteristics that outlier detection algorithms are unsuitable for detecting classification noise. In this paper, a new, specialized approach to detecting classification noise is proposed, named relative density-based classification noise detection (RDBCND). Computational experiments on artificial data sets described herein show that RDBCND has a time complexity of O(n log n), making it more efficient than traditional approaches, whose time complexity is at least O(n²). The use of classification noise detection to improve the generalization ability of common classifier algorithms is also described. In particular, a new unified approach based on RDBCND is compared with a cross-validation approach applied to a BP (backpropagation) neural network. Trials on both artificial and real-life datasets show that the RDBCND-based approach can greatly accelerate the process of identifying the best decision function. The novel method can also eliminate underfitting, as the algorithm simply searches for the highest training accuracy. The experiments also show that the RDBCND-based method reaches global solutions with greater accuracy and lower CPU time than the cross-validation method. Since relative density is a local concept, the new approach can be applied directly to nonlinear datasets without data transformation. This is a significant advantage over some linear classifier algorithms, which need kernel functions or other transformations to handle nonlinear datasets, at the cost of increased complexity.
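The abstract does not define how relative density is computed, so the Python sketch below illustrates one plausible reading under stated assumptions; it is not the authors' RDBCND. Each sample's score is assumed to be the mean distance to its k nearest same-class neighbours divided by the mean distance to its k nearest other-class neighbours, and samples scoring above a threshold (1.0 here) are flagged as possible classification noise. The function names `relative_density_scores` and `detect_noise`, and the parameters `k` and `threshold`, are hypothetical. Neighbour search here is brute force, so this sketch runs in O(n²) time rather than the O(n log n) reported for RDBCND; a spatial index such as a k-d tree could reduce the cost of the neighbour queries.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def mean_knn_distance(x: Point, points: List[Point], k: int) -> float:
    """Mean Euclidean distance from x to its k nearest points (brute force)."""
    nearest = sorted(math.dist(x, p) for p in points)[:k]
    return sum(nearest) / len(nearest)


def relative_density_scores(X: List[Point], y: List[int], k: int = 3) -> List[float]:
    """Hypothetical relative-density score per sample (an assumption, not RDBCND).

    score = mean k-NN distance to own class / mean k-NN distance to other classes.
    A score above 1 means other-class samples are packed more densely around
    the point than samples of its own label.
    """
    scores: List[float] = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        same = [X[j] for j in range(len(X)) if j != i and y[j] == yi]
        diff = [X[j] for j in range(len(X)) if y[j] != yi]
        if not same or not diff:
            scores.append(0.0)  # nothing to compare against; treat as clean
            continue
        d_same = mean_knn_distance(xi, same, k)
        d_diff = mean_knn_distance(xi, diff, k)
        # d_diff == 0 means a duplicate point carries a different label
        scores.append(d_same / d_diff if d_diff > 0 else math.inf)
    return scores


def detect_noise(X: List[Point], y: List[int], k: int = 3,
                 threshold: float = 1.0) -> List[int]:
    """Indices of samples whose hypothetical relative-density score exceeds threshold."""
    return [i for i, s in enumerate(relative_density_scores(X, y, k)) if s > threshold]


# Two well-separated clusters; sample 4 sits inside cluster 0 but is labelled 1.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
     (10, 10), (10, 11), (11, 10), (11, 11)]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1]
print(detect_noise(X, y, k=3))  # → [4]
```

Because each score depends only on a sample's local neighbourhood, the sketch needs no kernel or global transformation for nonlinear class boundaries, which mirrors the locality argument made in the abstract.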
Keywords: Classification noise; Relative density; RDBCND; Generalizability; Overfitting
This article has been indexed in ScienceDirect and other databases.