Approximate string matching on Ziv–Lempel compressed text |
| |
Authors: | Juha K rkk inen, Gonzalo Navarro,Esko Ukkonen |
| |
Affiliation: | a Department of Computer Science, University of, Helsinki, Finland;b Department of Computer Science, University of Chile, Blanco Encalada 2120, Santiago, Chile |
| |
Abstract: | ![]() We present the first nontrivial algorithm for approximate pattern matching on compressed text. The format we choose is the Ziv–Lempel family. Given a text of length u compressed into length n, and a pattern of length m, we report all the R occurrences of the pattern in the text allowing up to kinsertions, deletions and substitutions. On LZ78/LZW we need O(mkn+R) time in the worst case and O(k2n+mkmin(n,(mσ)k)+R) on average where σ is the alphabet size. The experimental results show a practical speedup over the basic approach of up to 2X for moderate m and small k. We extend the algorithms to more general compression formats and approximate matching models. |
| |
Keywords: | Ziv– Lempel compression Collage systems Edit distance Dynamic programming Compressed pattern matching |
本文献已被 ScienceDirect 等数据库收录! |
|