Multi-pattern matching with bidirectional indexes |
| |
Affiliation: | 1. Department of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia; 2. Department of Computer Science and Engineering, Aalto University, Espoo, Finland; 3. Department of Computer Science, University of Helsinki, Helsinki, Finland |
| |
Abstract: | We study multi-pattern matching in a scenario where the pattern set is to be matched against several texts, so that indexing the pattern set is affordable. Such scenarios arise, for example, in metagenomics, where the pattern set represents the DNA of several species and the goal is to find out which species are represented in the sample and in what quantity. We develop a generic search method that exploits bidirectional indexes for both the pattern set and the texts, and analyze the best-case and worst-case running times of the method on a worst-case text. We show that finding the instance of the search method with minimum best-case running time on a worst-case text is NP-hard. The positive result is that an instance with a logarithmic-factor approximation to the minimum best-case running time can be found in polynomial time using a bidirectional index called the affix tree. We further show that affix trees can be simulated, in reduced space, using a bidirectional variant of compressed suffix trees. Lastly, we provide a practical implementation of this approach. We show that one can obtain a 3-fold speed-up over the basic scenario of searching each pattern independently on data sets typical of high-throughput DNA sequencing. |
| |
Keywords: | Combinatorial pattern matching; Compressed data structures; Computational genomics |
This article is indexed in ScienceDirect and other databases. |
|