Adaptive data reduction for large-scale transaction data |
| |
Authors: | Xiao-Bai Li; Varghese S. Jacob |
| |
Institution: | 1. College of Management, University of Massachusetts Lowell, Lowell, MA 01854, USA; 2. School of Management, University of Texas at Dallas, Richardson, TX 75083, USA |
| |
Abstract: | Data reduction is an important issue in the field of data mining. The goal of data reduction techniques is to extract a subset of data from a massive dataset while maintaining the properties and characteristics of the original data in the reduced set. This allows an otherwise difficult or impossible data mining task to be carried out efficiently and effectively. This paper describes a new method for selecting a subset of data that closely represents the original data in terms of its joint and univariate distributions. A pair of distance criteria, motivated by the χ²-statistic, is used for measuring the goodness-of-fit between the distributions of the reduced and full datasets. Under these criteria, the data reduction problem can be formulated as a bi-objective quadratic program. A genetic algorithm technique is used in the search/optimization process. Experiments conducted on several real-world datasets demonstrate the effectiveness of the proposed method. |
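The abstract does not state the exact form of the two distance criteria, so the following is only a minimal sketch, assuming the plain Pearson χ² form (observed versus expected counts) applied once per attribute and once to the full attribute combination. The function names `chi2_distance`, `univariate_distance`, and `joint_distance` are hypothetical and are not taken from the paper; the genetic search itself is not shown.

```python
from collections import Counter


def chi2_distance(full, subset):
    """Pearson-style chi-square distance between the value counts observed
    in `subset` and the counts expected from the proportions in `full`.

    Assumption: `subset` is drawn from `full`, so every value in `subset`
    also occurs in `full`. Values absent from `full` are ignored.
    """
    if not full or not subset:
        raise ValueError("both datasets must be non-empty")
    full_counts = Counter(full)
    sub_counts = Counter(subset)
    scale = len(subset) / len(full)  # sampling fraction
    dist = 0.0
    for value, count in full_counts.items():
        expected = count * scale          # expected count in the subset
        observed = sub_counts.get(value, 0)
        dist += (observed - expected) ** 2 / expected
    return dist


def univariate_distance(rows_full, rows_subset, n_attrs):
    """Sum of per-attribute distances (marginal distributions)."""
    return sum(
        chi2_distance([r[j] for r in rows_full], [r[j] for r in rows_subset])
        for j in range(n_attrs)
    )


def joint_distance(rows_full, rows_subset):
    """Distance over whole records (joint distribution)."""
    return chi2_distance(
        [tuple(r) for r in rows_full], [tuple(r) for r in rows_subset]
    )


# Toy data: four records with two categorical attributes.
full = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
subset = [("a", "x"), ("b", "y")]

univariate = univariate_distance(full, subset, n_attrs=2)
joint = joint_distance(full, subset)
```

In this toy case the subset reproduces both marginal distributions exactly while missing half of the attribute combinations, so the univariate distance is zero and the joint distance is not. That gap is one way to see why the abstract treats the two criteria as separate objectives.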
| |
Keywords: | Data mining; Data reduction; Genetic algorithms; Distance measure |
This article is indexed in ScienceDirect and other databases. |
|