首页 | 本学科首页   官方微博 | 高级检索  
     检索      


Approximation algorithms for Hamming clustering problems
Authors:Leszek G sieniec  Jesper Jansson  Andrzej Lingas
Institution:a Department of Computer Science, University of Liverpool, Peach Street, Liverpool, L69 7ZF, UK;b Department of Computer Science, Lund University, Box 118, 221 00, Lund, Sweden
Abstract:We study Hamming versions of two classical clustering problems. The Hamming radius p-clustering problem (HRC) for a set S of k binary strings, each of length n, is to find p binary strings of length n that minimize the maximum Hamming distance between a string in S and the closest of the p strings; this minimum value is termed the p-radius of S and is denoted by . The related Hamming diameter p-clustering problem (HDC) is to split S into p groups so that the maximum of the Hamming group diameters is minimized; this latter value is called the p-diameter of S.We provide an integer programming formulation of HRC which yields exact solutions in polynomial time whenever k is constant. We also observe that HDC admits straightforward polynomial-time solutions when k=O(logn) and p=O(1), or when p=2. Next, by reduction from the corresponding geometric p-clustering problems in the plane under the L1 metric, we show that neither HRC nor HDC can be approximated within any constant factor smaller than two unless P=NP. We also prove that for any >0 it is NP-hard to split S into at most pk1/7− clusters whose Hamming diameter does not exceed the p-diameter, and that solving HDC exactly is an NP-complete problem already for p=3. Furthermore, we note that by adapting Gonzalez' farthest-point clustering algorithm T. Gonzalez, Theoret. Comput. Sci. 38 (1985) 293–306], HRC and HDC can be approximated within a factor of two in time O(pkn). Next, we describe a 2O(p/)kO(p/)n2-time (1+)-approximation algorithm for HRC. In particular, it runs in polynomial time when p=O(1) and =O(log(k+n)). Finally, we show how to find in

time a set L of O(plogk) strings of length n such that for each string in S there is at least one string in L within distance (1+), for any constant 0<<1.
Keywords:Hamming distance  p-clustering problem  NP-hardness  Approximation algorithms  Integer programming
本文献已被 ScienceDirect 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号