首页 | 本学科首页   官方微博 | 高级检索  
     


Approximate string matching on Ziv–Lempel compressed text
Authors:Juha K  rkk  inen, Gonzalo Navarro,Esko Ukkonen
Affiliation:a Department of Computer Science, University of, Helsinki, Finland;b Department of Computer Science, University of Chile, Blanco Encalada 2120, Santiago, Chile
Abstract:
We present the first nontrivial algorithm for approximate pattern matching on compressed text. The format we choose is the Ziv–Lempel family. Given a text of length u compressed into length n, and a pattern of length m, we report all the R occurrences of the pattern in the text allowing up to kinsertions, deletions and substitutions. On LZ78/LZW we need O(mkn+R) time in the worst case and O(k2n+mkmin(n,(mσ)k)+R) on average where σ is the alphabet size. The experimental results show a practical speedup over the basic approach of up to 2X for moderate m and small k. We extend the algorithms to more general compression formats and approximate matching models.
Keywords:Ziv–  Lempel compression   Collage systems   Edit distance   Dynamic programming   Compressed pattern matching
本文献已被 ScienceDirect 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号