A framework of irregularity enlightenment for data pre-processing in data mining |
| |
Authors: | Siu-Tong Au, Rong Duan, Siamak G Hesar, Wei Jiang |
| |
Institution: | (1) Bell Labs, Alcatel-Lucent, 600-700 Mountain Avenue, Murray Hill, NJ, 07974, U.S.A.; (2) Division of Mathematics and Sciences, Roane State Community College, 276 Patton Lane, Harriman, TN, 37748, U.S.A.; (3) School of Electrical and Computer Engineering, Georgia Institute of Technology, 777 Atlantic Drive, Atlanta, GA, 30332, U.S.A. |
| |
Abstract: | Irregularities are widespread in large databases and often lead to erroneous conclusions with respect to data mining and statistical
analysis. For example, many parameter estimation procedures yield considerable bias when they do not properly handle
significant irregularities. Most data cleaning tools assume one known type of irregularity. This paper proposes a generic
Irregularity Enlightenment (IE) framework for dealing with the situation when multiple irregularities are hidden in large
volumes of data in general and cross-sectional time series in particular. It develops an automatic data mining platform to
capture key irregularities and classify them based on their importance in a database. By decomposing time series data into
basic components, we propose to optimize a penalized least squares loss function to aid the selection of key irregularities
in consecutive steps and cluster time series into different groups until an acceptable level of variation reduction is achieved.
Finally, visualization tools are developed to help analysts interpret and understand the nature of data better and faster before
further data modeling and analysis. |
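
The idea of selecting irregularities by penalized least squares can be illustrated with a minimal sketch. This is not the IE framework's actual algorithm, which the abstract does not specify; it assumes a single series, a constant level estimated by the median, and an L0 (count) penalty on point irregularities. The function name `select_irregularities`, the `penalty` parameter, and the sample data are all hypothetical.

```python
from statistics import median


def select_irregularities(series, penalty):
    """Hypothetical helper: flag points with an L0-penalized least squares fit.

    Assumed model: y_t = level + delta_t + noise, with `level` taken as the
    median of the series. Minimizing
        sum((y_t - level - delta_t) ** 2) + penalty * (number of nonzero delta_t)
    over delta has a closed form: delta_t equals the residual when the squared
    residual exceeds `penalty`, and 0 otherwise.
    """
    level = median(series)
    residuals = [y - level for y in series]
    deltas = [r if r * r > penalty else 0.0 for r in residuals]
    flagged = [t for t, d in enumerate(deltas) if d != 0.0]

    # Fraction of squared variation explained by the flagged points.
    sse_before = sum(r * r for r in residuals)
    sse_after = sum((r - d) ** 2 for r, d in zip(residuals, deltas))
    reduction = 1.0 - sse_after / sse_before if sse_before > 0 else 0.0
    return flagged, reduction


# Made-up illustrative data: two obvious spikes at positions 4 and 8.
series = [10.0, 10.2, 9.9, 10.1, 25.0, 10.0, 9.8, 10.3, -4.0, 10.1]
flagged, reduction = select_irregularities(series, penalty=9.0)
print(flagged, round(reduction, 4))
```

A larger `penalty` flags fewer points; in a stepwise procedure one could stop once `reduction` reaches an acceptable level, in the spirit of the variation-reduction criterion the abstract mentions.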
| |
Keywords: | |
This article is indexed in SpringerLink and other databases. |
|