Locally centred Mahalanobis distance: A new distance measure with salient features towards outlier detection
Authors: Roberto Todeschini, Davide Ballabio, Viviana Consonni, Faizan Sahigara, Peter Filzmoser
Affiliation: 1. Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, P.za della Scienza 1, 20126 Milano, Italy; 2. Department of Statistics and Probability Theory, Vienna University of Technology, Vienna, Austria
Abstract:
Outlier detection is needed to identify aberrant samples in a given set of data. Identifying such samples is particularly important in multivariate data analysis, where increasing dimensionality can hinder data exploration and outliers often go undetected. This paper introduces a novel Mahalanobis distance measure (strictly, a pseudo-distance), termed the locally centred Mahalanobis distance, derived by centring the covariance matrix at each data sample rather than at the data centroid as in the classical covariance matrix. Two parameters, called Remoteness and Isolation degree, were derived from the resulting pairwise distance matrix; their salient features facilitated better identification of atypical samples isolated from the rest of the data, reflecting their potential application to outlier detection. The Isolation degree proved able to detect a new kind of outlier, namely isolated samples within the data domain, making it a useful diagnostic tool for evaluating the reliability of predictions obtained by local models (e.g. k-NN models).
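The abstract gives no explicit formula, so the following is only a minimal sketch of one way to read "centring the covariance matrix at each data sample". It assumes that, for each sample x_i, a covariance-like matrix S_i is built from the deviations (x_k - x_i) of all samples from x_i, and that the distance from x_i to x_j is the usual Mahalanobis form evaluated with S_i. The function name, the 1/(n-1) normalisation, and the use of a pseudo-inverse are assumptions made for illustration, not details taken from the paper. Remoteness and Isolation degree are not sketched, because the abstract does not define them.

```python
import numpy as np


def locally_centred_mahalanobis(X):
    """Pairwise pseudo-distance matrix; row i uses a covariance centred at sample i.

    Hypothetical helper: the normalisation and the pseudo-inverse are
    assumptions, not taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    n, _ = X.shape
    D = np.zeros((n, n))
    for i in range(n):
        # Deviations of every sample from sample i (row i of diff is all zeros).
        diff = X - X[i]
        # Covariance-like matrix centred at x_i instead of at the centroid.
        S_i = diff.T @ diff / (n - 1)
        # Pseudo-inverse keeps the sketch usable if S_i is singular.
        S_inv = np.linalg.pinv(S_i)
        # Quadratic form diff_k^T S_inv diff_k for every sample k at once.
        d2 = np.einsum("kp,pq,kq->k", diff, S_inv, diff)
        # Clip tiny negative values caused by floating-point round-off.
        D[i] = np.sqrt(np.maximum(d2, 0.0))
    return D
```

Under these assumptions each row is computed with a different matrix S_i, so in general D[i, j] differs from D[j, i]. That asymmetry is consistent with the abstract calling the measure a pseudo-distance.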
Keywords: Mahalanobis distance; Outlier detection; Similarity; Isolation degree; Remoteness; Covariance matrix; Data mining
This article is indexed in ScienceDirect and other databases.