Adaptive data reduction for large-scale transaction data
Authors:Xiao-Bai Li  Varghese S Jacob
Institution:1. College of Management, University of Massachusetts Lowell, Lowell, MA 01854, USA;2. School of Management, University of Texas at Dallas, Richardson, TX 75083, USA
Abstract:Data reduction is an important issue in the field of data mining. The goal of data reduction techniques is to extract a subset of data from a massive dataset while maintaining the properties and characteristics of the original data in the reduced set. This allows an otherwise difficult or impossible data mining task to be carried out efficiently and effectively. This paper describes a new method for selecting a subset of data that closely represents the original data in terms of its joint and univariate distributions. A pair of distance criteria, motivated by the χ²-statistic, is used to measure the goodness of fit between the distributions of the reduced and full datasets. Under these criteria, the data reduction problem can be formulated as a bi-objective quadratic program. A genetic algorithm is used in the search/optimization process. Experiments conducted on several real-world datasets demonstrate the effectiveness of the proposed method.
Keywords:Data mining  Data reduction  Genetic algorithms  Distance measure
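
The abstract does not give the exact form of the two distance criteria, so the following is only a minimal sketch of what a χ²-motivated goodness-of-fit measure between a reduced set and the full dataset could look like for categorical records. The function names (chi_square_distance, univariate_distance, joint_distance) and the choice of the classical Pearson form are assumptions made for illustration, not the authors' definitions.

```python
from collections import Counter


def chi_square_distance(full, reduced):
    """Pearson-style chi-square distance between category counts.

    Assumption: `reduced` is a subset of `full`, so every category seen in
    `reduced` also occurs in `full`. Expected counts are the full-data
    proportions scaled to the size of the reduced set.
    """
    if not full or not reduced:
        raise ValueError("both datasets must be non-empty")
    n_full, n_red = len(full), len(reduced)
    full_counts = Counter(full)
    red_counts = Counter(reduced)
    total = 0.0
    for category, count in full_counts.items():
        expected = n_red * count / n_full
        observed = red_counts.get(category, 0)
        total += (observed - expected) ** 2 / expected
    return total


def univariate_distance(full_rows, reduced_rows):
    """Sum of per-attribute distances (hypothetical univariate criterion)."""
    n_attrs = len(full_rows[0])
    return sum(
        chi_square_distance(
            [row[j] for row in full_rows],
            [row[j] for row in reduced_rows],
        )
        for j in range(n_attrs)
    )


def joint_distance(full_rows, reduced_rows):
    """Distance over whole records (hypothetical joint criterion)."""
    return chi_square_distance(
        [tuple(row) for row in full_rows],
        [tuple(row) for row in reduced_rows],
    )


# Toy data: two binary attributes, every combination appears once.
full_rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
reduced_rows = [("a", "x"), ("b", "y")]

uni = univariate_distance(full_rows, reduced_rows)
joint = joint_distance(full_rows, reduced_rows)
print(uni, joint)
```

In this toy case the reduced set reproduces each attribute's marginal distribution exactly while missing half of the attribute combinations, which illustrates why a method might track the univariate and joint criteria as two separate objectives rather than one.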
This article is indexed in ScienceDirect and other databases.