Regular expression constrained sequence alignment |
| |
Authors: | Abdullah N Arslan |
| |
Institution: | aDepartment of Computer Science, The University of Vermont, Burlington, VT 05405, USA |
| |
Abstract: | We introduce regular expression constrained sequence alignment as the problem of finding the maximum alignment score between given strings and over all alignments such that in these alignments there exists a segment where some substring of is aligned to some substring of , and both and match a given regular expression R, i.e. where is the regular language described by R. For complexity results we assume, without loss of generality, that . A motivation for the problem is that protein sequences can be aligned in a way that known motifs guide the alignments. We present an time algorithm for the regular expression constrained sequence alignment problem where , and t is the number of states of a nondeterministic finite automaton N that accepts . We use in our algorithm a nondeterministic weighted finite automaton M that we construct from N. M has states where the transition-weights are obtained from the given costs of edit operations, and state-weights correspond to optimum alignment scores we compute using the underlying dynamic programming solution for sequence alignment. If we are given a deterministic finite automaton D accepting with states then our construction creates a deterministic finite automaton with states. In this case, our algorithm takes time. Using results in faster computation than using M when . If we only want to compute the optimum score, the space required by our algorithm is ( if we use a given ). If we also want to compute an optimal alignment then our algorithm uses space ( space if we use a given ) where and are substrings of and , respectively, , and and are aligned together in the optimal alignment that we construct. We also show that our method generalizes for the case of the problem with affine gap penalties, and for finding optimal regular expression constrained local sequence alignments. |
| |
Keywords: | Regular expression Sequence alignment Dynamic programming Pattern matching Finite automaton |
本文献已被 ScienceDirect 等数据库收录! |
|