

Regular expression constrained sequence alignment
Authors: Abdullah N. Arslan
Institution: Department of Computer Science, The University of Vermont, Burlington, VT 05405, USA

Abstract: We introduce regular expression constrained sequence alignment as the problem of finding the maximum alignment score between given strings S1 and S2 over all alignments that contain a segment in which some substring s1 of S1 is aligned to some substring s2 of S2, and both s1 and s2 match a given regular expression R, i.e. s1, s2 ∈ L(R), where L(R) is the regular language described by R. For complexity results we assume, without loss of generality, that n = |S1| ≥ m = |S2|. A motivation for the problem is that protein sequences can be aligned in a way that known motifs guide the alignments. We present an O(nmr) time algorithm for the regular expression constrained sequence alignment problem, where r = O(t^4) and t is the number of states of a nondeterministic finite automaton N that accepts L(R). Our algorithm uses a nondeterministic weighted finite automaton M that we construct from N. M has O(t^2) states; its transition weights are obtained from the given costs of edit operations, and its state weights correspond to optimum alignment scores that we compute using the underlying dynamic programming solution for sequence alignment. If we are given a deterministic finite automaton D accepting L(R) with t_d states, then our construction creates a deterministic finite automaton M_d with t_d^2 states. In this case, our algorithm takes O(t_d^2 nm) time. Using M_d results in faster computation than using M when t_d < t^2. If we only want to compute the optimum score, the space required by our algorithm is O(t^2 n) (O(t_d^2 m) if we use a given M_d). If we also want to compute an optimal alignment, then our algorithm uses O(t^2 m + t^2 |s1||s2|) space (O(t_d^2 m + t_d^2 |s1||s2|) space if we use a given M_d), where s1 and s2 are substrings of S1 and S2, respectively, s1, s2 ∈ L(R), and s1 and s2 are aligned together in the optimal alignment that we construct.
We also show that our method generalizes for the case of the problem with affine gap penalties, and for finding optimal regular expression constrained local sequence alignments.
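
As an illustration of the deterministic case described above, the following Python sketch computes only the optimum score for a global alignment with linear gap costs. It is a minimal sketch under stated assumptions, not the paper's construction of M or M_d: it tracks pairs of DFA states (p, q) directly inside the dynamic programming table, which yields the same t_d^2 factor in the running time. The function name `constrained_alignment_score`, the dictionary encoding of the DFA (`delta`, `start`, `accepting`), and the default scores are all hypothetical choices made for this example.

```python
NEG_INF = float("-inf")

def constrained_alignment_score(s1, s2, delta, start, accepting,
                                match=1, mismatch=-1, gap=-1):
    """Best global alignment score of s1 and s2 such that some aligned
    segment pairs a substring of s1 with a substring of s2 that are both
    accepted by the DFA. Returns -inf if no such alignment exists.

    Assumed (hypothetical) DFA encoding: delta maps (state, char) to the
    next state; a missing key means the move leads to a dead state.
    """
    n, m = len(s1), len(s2)
    # A: score before the constrained segment starts.
    # B: inside the segment; B[i][j] maps a state pair (p, q) to a score.
    # C: score after the constrained segment has been closed.
    A = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    C = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    B = [[{} for _ in range(m + 1)] for _ in range(n + 1)]

    def moves(i, j):
        # Yield (prev_i, prev_j, a, b, weight); a or b is None for a gap.
        if i and j:
            a, b = s1[i - 1], s2[j - 1]
            yield i - 1, j - 1, a, b, (match if a == b else mismatch)
        if i:
            yield i - 1, j, s1[i - 1], None, gap
        if j:
            yield i, j - 1, None, s2[j - 1], gap

    def step(state, ch):
        # A gap consumes no character, so the state is unchanged.
        return state if ch is None else delta.get((state, ch))

    for i in range(n + 1):
        for j in range(m + 1):
            prev = list(moves(i, j))
            base = 0 if i == 0 and j == 0 else NEG_INF
            A[i][j] = max([base] + [A[pi][pj] + w for pi, pj, _, _, w in prev])

            # The segment may start here, with both copies in the start state.
            cell = {(start, start): A[i][j]}
            for pi, pj, a, b, w in prev:
                for (p, q), v in B[pi][pj].items():
                    p2, q2 = step(p, a), step(q, b)
                    if p2 is None or q2 is None:
                        continue  # one of the substrings left L(R) for good
                    if v + w > cell.get((p2, q2), NEG_INF):
                        cell[(p2, q2)] = v + w
            B[i][j] = cell

            # The segment may end here if both substrings are accepted.
            closed = [v for (p, q), v in cell.items()
                      if p in accepting and q in accepting]
            C[i][j] = max([NEG_INF] + closed +
                          [C[pi][pj] + w for pi, pj, _, _, w in prev])
    return C[n][m]


# Hypothetical DFA for the regular expression "ab".
AB_DELTA = {(0, "a"): 1, (1, "b"): 2}
print(constrained_alignment_score("abxxx", "xxxab", AB_DELTA, 0, {2}))
```

In this example the only way to satisfy the constraint is to align the leading "ab" of the first string with the trailing "ab" of the second, so the constrained score is lower than the unconstrained optimum that pairs the "xxx" runs. The sketch stores full tables for clarity; keeping only the previous row would reduce the space when only the score is needed.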
Keywords: Regular expression; Sequence alignment; Dynamic programming; Pattern matching; Finite automaton
This article is indexed in ScienceDirect and other databases.