Clustering Categorical Data via Ensembling Dissimilarity Matrices
Authors: Saeid Amiri, Bertrand S Clarke, Jennifer L Clarke
Institution: 1. Department of Natural and Applied Sciences, University of Wisconsin-Green Bay, Green Bay, WI; 2. Department of Statistics, University of Nebraska-Lincoln, Lincoln, NE
Abstract: We present a technique for clustering categorical data by generating many dissimilarity matrices and combining them. We begin by demonstrating our technique on low-dimensional categorical data and comparing it to several other techniques that have been proposed. We show through simulations and examples that our method is both more accurate and more stable than these alternatives. Then we give conditions under which our method should yield good results in general. Our method extends to high-dimensional categorical data vectors of equal length by ensembling over many choices of explanatory variables. In this context, we compare our method with two other methods. Finally, we extend our method to high-dimensional categorical data vectors of unequal length by using alignment techniques to equalize the lengths. We give an example showing that our method continues to provide useful results, including a comparison with phylogenetic trees. Supplementary material for this article is available online.
Keywords: Categorical data; Classification and clustering; Hamming distance; High-dimensional data; Sequence alignment; Stability
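
The general idea named in the abstract, building many dissimilarity matrices and combining them, can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details are not given here. It assumes integer-coded categories, random subsets of variables for each matrix, plain averaging as the combination rule, and average-linkage hierarchical clustering on the result. The function names `ensemble_hamming_dissimilarity` and `cluster_from_dissimilarity` and all parameters are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist, squareform


def ensemble_hamming_dissimilarity(data, n_matrices=50, subset_size=None, seed=0):
    """Average Hamming dissimilarity matrices built on random variable subsets.

    Assumption: `data` is an (n_rows, n_cols) array of integer category codes.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = data.shape
    if subset_size is None:
        subset_size = max(1, n_cols // 2)
    total = np.zeros((n_rows, n_rows))
    for _ in range(n_matrices):
        # Pick a random subset of variables (columns) for this ensemble member.
        cols = rng.choice(n_cols, size=subset_size, replace=False)
        # Hamming distance: fraction of selected variables on which two rows differ.
        total += squareform(pdist(data[:, cols], metric="hamming"))
    return total / n_matrices


def cluster_from_dissimilarity(dissimilarity, n_clusters):
    """Cut an average-linkage tree of the combined matrix into n_clusters groups."""
    condensed = squareform(dissimilarity, checks=False)
    tree = linkage(condensed, method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Toy data (made up for illustration): two groups of three observations each.
toy = np.array([
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [2, 2, 2, 2, 2, 1],
    [2, 2, 2, 2, 2, 2],
    [2, 2, 1, 2, 2, 2],
])
combined = ensemble_hamming_dissimilarity(toy, n_matrices=20, subset_size=3)
labels = cluster_from_dissimilarity(combined, n_clusters=2)
print(labels)
```

Averaging is only one possible way to combine the matrices, and the subset size and number of ensemble members are arbitrary choices here; the paper itself should be consulted for the actual construction and the conditions under which it works well.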