Regular expression searching on compressed text |
| |
Authors: | Gonzalo Navarro |
| |
Affiliation: | Department of Computer Science, University of Chile, Blanco Encalada 2120, Santiago, Chile |
| |
Abstract: | We present a solution to the problem of regular expression searching on compressed text. The format we choose is the Ziv–Lempel family, specifically the LZ78 and LZW variants. Given a text of length u compressed into length n, and a pattern of length m, we report all the R occurrences of the pattern in the text in O(2m+mn+Rmlogm) worst case time. On average this drops to O(m2+(n+Rm)logm) or O(m2+n+Ru/n) for most regular expressions. This is the first nontrivial result for this problem. The experimental results show that our compressed search algorithm needs half the time necessary for decompression plus searching, which is currently the only alternative. |
| |
Keywords: | Ziv–Lempel Bit-parallelism Compressed pattern matching Automata Regular languages |
本文献已被 ScienceDirect 等数据库收录! |
|