On the parallel complexity of hierarchical clustering and CC‐complete problems |
| |
Authors: | Raymond Greenlaw Sanpawat Kantabutra |
| |
Affiliation: | 1. Department of Computer Science, Armstrong Atlantic State University, Savannah, Georgia 31419‐1997;2. The Theory of Computation Group, Computer Science Department, Chiang Mai University, Chiang Mai 50200, Thailand |
| |
Abstract: | Complex data sets are often unmanageable unless they can be subdivided and simplified in an intelligent manner. Clustering is a technique that is used in data mining and scientific analysis for partitioning a data set into groups of similar or nearby items. Hierarchical clustering is an important and well‐studied clustering method involving both top‐down and bottom‐up subdivisions of data. In this article we address the parallel complexity of hierarchical clustering. We describe known sequential algorithms for top‐down and bottom‐up hierarchical clustering. The top‐down algorithm can be parallelized, and when there are n points to be clustered, we provide an O(log n)‐time, n2‐processor Crew Pram algorithm that computes the same output as its corresponding sequential algorithm. We define a natural decision problem based on bottom‐up hierarchical clustering, and add this HIERARCHICAL CLUSTERING PROBLEM (HCP) to the slowly growing list of CC‐complete problems, thereby showing that HCP is one of the computationally most difficult problems in the COMPARATOR CIRCUIT VALUE PROBLEM class. This class contains a variety of interesting problems, and now for the first time a problem from data mining as well. By proving that HCP is CC‐complete, we have demonstrated that HCP is very unlikely to have an NC algorithm. This result is in sharp contrast to the NC algorithm which we give for the top‐down sequential approach, and the result surprisingly shows that the parallel complexities of the top‐down and bottom‐up approaches are different, unless CC equals NC. In addition, we provide a compendium of all known CC‐complete problems. © 2008 Wiley Periodicals, Inc. Complexity, 2008 |
| |
Keywords: | CC‐completeness data mining hierarchical clustering parallel depth |
|
|