Efficient generation of super condensed neighborhoods |
| |
Authors: | Luís MS Russo Arlindo L Oliveira |
| |
Institution: | aINESC-ID/IST, R. Alves Redol 9, 1000 Lisboa, Portugal |
| |
Abstract: | Indexing methods for the approximate string matching problem spend a considerable effort generating condensed neighborhoods. Condensed neighborhoods, however, are not a minimal representation of a pattern neighborhood. Super condensed neighborhoods, proposed in this work, are smaller, provably minimal and can be used to locate approximate matches that can later be extended by on-line search. We present an algorithm for generating Super Condensed Neighborhoods. The algorithm can be implemented either by using dynamic programming or non-deterministic automata. The time complexity is O(ms) for the first case and O(kms) for the second, where m is the pattern size, s is the size of the super condensed neighborhood and k the number of errors. Previous algorithms depended on the size of the condensed neighborhood instead. These algorithms can be implemented using Bit-Parallelism and Increased Bit-Parallelism techniques. Our experimental results show that the resulting algorithms are fast and achieve significant speedups, when compared with the existing proposals that use condensed neighborhoods. |
| |
Keywords: | Suffix trees Edit distance Approximate string matching |
本文献已被 ScienceDirect 等数据库收录! |
|