Adaptive data reduction for large-scale transaction data

Authors:	Xiao-Bai Li, Varghese S. Jacob

Institution:	1. College of Management, University of Massachusetts Lowell, Lowell, MA 01854, USA; 2. School of Management, University of Texas at Dallas, Richardson, TX 75083, USA

Abstract:	Data reduction is an important issue in data mining. The goal of data reduction techniques is to extract a subset of data from a massive dataset while preserving the properties and characteristics of the original data in the reduced set. This allows a data mining task that would otherwise be difficult or impossible to be carried out efficiently and effectively. This paper describes a new method for selecting a subset of data that closely represents the original data in terms of its joint and univariate distributions. A pair of distance criteria, motivated by the χ² statistic, is used to measure the goodness of fit between the distributions of the reduced and full datasets. Under these criteria, the data reduction problem can be formulated as a bi-objective quadratic program. A genetic algorithm is used to carry out the search/optimization. Experiments conducted on several real-world datasets demonstrate the effectiveness of the proposed method.
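The abstract does not state the exact distance criteria or the genetic algorithm's operators, so the following is only a minimal sketch of the general idea under loudly stated assumptions. Transactions are modeled as item sets. The "univariate" criterion is assumed to compare single-item supports, and the "joint" criterion is assumed to compare item-pair supports (a simplification of the full joint distribution); each uses a χ²-style sum of (q − p)²/p, where p is a support in the full data and q the corresponding support in the subset. The two objectives are combined by a weighted sum with a hypothetical weight `w`, which may differ from how the paper handles its bi-objective program. The GA operators (binary tournament selection, union-based crossover, swap mutation, elitism) are generic choices, and the names `supports`, `chi2_distance` and `reduce_data` are hypothetical.

```python
import itertools
import random
from collections import Counter


def supports(transactions, idx):
    """Relative single-item and item-pair frequencies over transactions[idx]."""
    n = len(idx)
    single, pair = Counter(), Counter()
    for t in idx:
        items = sorted(transactions[t])
        single.update(items)
        pair.update(itertools.combinations(items, 2))
    return ({i: c / n for i, c in single.items()},
            {p: c / n for p, c in pair.items()})


def chi2_distance(full, sub):
    """Chi-square-style distance: sum of (q - p)^2 / p over keys seen in the full data."""
    return sum((sub.get(key, 0.0) - p) ** 2 / p for key, p in full.items())


def reduce_data(transactions, k, w=0.5, pop_size=30, generations=100,
                p_mut=0.2, seed=0):
    """Hypothetical GA search for k distinct transaction indices minimizing
    w * D_univariate + (1 - w) * D_joint (weighted-sum scalarization)."""
    rng = random.Random(seed)
    n = len(transactions)
    full1, full2 = supports(transactions, range(n))

    def cost(subset):
        sub1, sub2 = supports(transactions, subset)
        return w * chi2_distance(full1, sub1) + (1 - w) * chi2_distance(full2, sub2)

    def pick(ranked):
        # Binary tournament: ranked is sorted by cost, so the smaller index wins.
        return ranked[min(rng.randrange(len(ranked)), rng.randrange(len(ranked)))]

    def crossover(a, b):
        # Child draws k distinct indices from the union of both parents.
        return rng.sample(sorted(set(a) | set(b)), k)

    def mutate(s):
        # With probability p_mut, swap one selected index for an unselected one.
        chosen = set(s)
        outside = [i for i in range(n) if i not in chosen]
        if outside and rng.random() < p_mut:
            s = list(s)
            s[rng.randrange(k)] = rng.choice(outside)
        return s

    pop = [rng.sample(range(n), k) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=cost)  # lowest (best) cost first
        # Elitism: the best subset survives unchanged into the next generation.
        pop = [ranked[0]] + [mutate(crossover(pick(ranked), pick(ranked)))
                             for _ in range(pop_size - 1)]
    best = min(pop, key=cost)
    return sorted(best), cost(best)


if __name__ == "__main__":
    data = [{"bread", "milk"}, {"bread", "beer"}, {"milk", "beer", "eggs"},
            {"bread", "milk", "eggs"}, {"beer"}, {"bread", "milk"}] * 5
    subset, score = reduce_data(data, k=6)
    print(subset, round(score, 4))
```

With the subset size k fixed, each subset support q is a linear function of the 0/1 vector that marks which transactions are selected, so every (q − p)²/p term, and hence each criterion, is quadratic in that vector. This is consistent with the abstract's description of the problem as a quadratic program. Because the selection variables are binary, a population-based search such as a GA is a practical alternative to an exact solver, at the cost of no optimality guarantee.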

Keywords:	Data mining; Data reduction; Genetic algorithms; Distance measure
This article is indexed in ScienceDirect and other databases.