On the Common Substring Alignment Problem |
| |
Authors: | Gad M Landau Michal Ziv-Ukelson |
| |
Institution: | a Department of Computer Science, Haifa University, Haifa, 31905, Israel;Department of Computer and Information Science, Polytechnic University, Six MetroTech Center, Brooklyn, New York, 11201-3840, f1;IBM T. J. W. Research Center, Yorktown Heights, New York, 10598, , f2 |
| |
Abstract: | The Common Substring Alignment Problem is defined as follows: Given a set of one or more strings S1, S2 … Sc and a target string T, Y is a common substring of all strings Si, that is, Si = BiYFi. The goal is to compute the similarity of all strings Si with T, without computing the part of Y again and again. Using the classical dynamic programming tables, each appearance of Y in a source string would require the computation of all the values in a dynamic programming table of size O(nℓ) where ℓ is the size of Y. Here we describe an algorithm which is composed of an encoding stage and an alignment stage. During the first stage, a data structure is constructed which encodes the comparison of Y with T. Then, during the alignment stage, for each comparison of a source Si with T, the pre-compiled data structure is used to speed up the part of Y. We show how to reduce the O(nℓ) alignment work, for each appearance of the common substring Y in a source string, to O(n)-at the cost of O(nℓ) encoding work, which is executed only once. |
| |
Keywords: | design and analysis of algorithms dynamic programming sequence comparison repeated substrings shared substrings Monge arrays candidate lists |
本文献已被 ScienceDirect 等数据库收录! |
|