Regular expression constrained sequence alignment

Authors:	Abdullah N Arslan

Institution:	Department of Computer Science, The University of Vermont, Burlington, VT 05405, USA

Abstract:	We introduce regular expression constrained sequence alignment as the problem of finding the maximum alignment score between given strings $S_1$ and $S_2$ over all alignments in which there exists a segment where some substring $s_1$ of $S_1$ is aligned to some substring $s_2$ of $S_2$, and both $s_1$ and $s_2$ match a given regular expression $R$, i.e. $s_1, s_2 \in L(R)$, where $L(R)$ is the regular language described by $R$. For complexity results we assume, without loss of generality, that $n = |S_1| \ge m = |S_2|$. A motivation for the problem is that protein sequences can be aligned in a way that known motifs guide the alignments. We present an $O(nmr)$ time algorithm for the regular expression constrained sequence alignment problem, where $r = O(t^4)$ and $t$ is the number of states of a nondeterministic finite automaton $N$ that accepts $L(R)$. Our algorithm uses a nondeterministic weighted finite automaton $M$ that we construct from $N$. $M$ has $O(t^2)$ states; its transition weights are obtained from the given costs of edit operations, and its state weights correspond to optimum alignment scores that we compute using the underlying dynamic programming solution for sequence alignment. If we are given a deterministic finite automaton $D$ with $t_d$ states accepting $L(R)$, then our construction creates a deterministic finite automaton $M_d$ with $t_d^2$ states. In this case, our algorithm takes $O(t_d^2 nm)$ time. Using $M_d$ results in faster computation than using $M$ when $t_d < t^2$. If we only want to compute the optimum score, the space required by our algorithm is $O(t^2 n)$ ($O(t_d^2 m)$ if we use a given $M_d$).
If we also want to compute an optimal alignment, then our algorithm uses $O(t^2 m + t^2 |s_1||s_2|)$ space ($O(t_d^2 m + t_d^2 |s_1||s_2|)$ space if we use a given $M_d$), where $s_1$ and $s_2$ are the substrings of $S_1$ and $S_2$, respectively, with $s_1, s_2 \in L(R)$, that are aligned together in the optimal alignment we construct. We also show that our method generalizes to the problem with affine gap penalties and to finding optimal regular expression constrained local sequence alignments.
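The abstract describes the method only at the level of automata and dynamic programming. The Python sketch below illustrates the deterministic case under stated assumptions: the DFA for $R$ is supplied by hand as a transition dictionary (a hypothetical representation; `constrained_align`, `delta`, `start` and `accept` are illustrative names), scoring uses arbitrary match/mismatch values and a linear gap penalty, and only the optimum score is returned. It runs three dynamic programming layers over the $(n+1) \times (m+1)$ grid (before, inside and after the constrained segment) and, in the middle layer, tracks pairs of DFA states, which mirrors the $t_d^2$ product states of $M_d$ and gives $O(t_d^2 nm)$ time. It is not the paper's construction of $M$ or $M_d$, and because it stores every layer it does not achieve the space bounds stated in the abstract.

```python
from itertools import product

NEG = float("-inf")


def constrained_align(s1, s2, delta, start, accept,
                      match=2, mismatch=-1, gap=-2):
    """Best global alignment score of s1 vs s2 over alignments that contain a
    segment pairing a substring of s1 with a substring of s2, where both
    substrings are accepted by the DFA (delta, start, accept).

    Illustrative sketch: delta maps (state, char) -> state, and a missing key
    means the DFA rejects. Scores are hypothetical defaults. Returns -inf if
    no such alignment exists.
    """
    states = {start} | {p for p, _ in delta} | set(delta.values())
    pairs = list(product(states, states))  # t_d^2 product states
    n, m = len(s1), len(s2)

    def sub(a, b):
        return match if a == b else mismatch

    # A: plain prefix alignment, before the constrained segment starts.
    # B: inside the segment, keyed by the DFA state pair reached on the
    #    characters of s1 and s2 consumed since the segment began.
    # C: after a segment has been accepted, aligning the remaining suffixes.
    A = [[NEG] * (m + 1) for _ in range(n + 1)]
    C = [[NEG] * (m + 1) for _ in range(n + 1)]
    B = [[dict.fromkeys(pairs, NEG) for _ in range(m + 1)]
         for _ in range(n + 1)]

    for i in range(n + 1):
        for j in range(m + 1):
            a = s1[i - 1] if i else None
            b = s2[j - 1] if j else None
            # Each move: (previous cell, char of s1 or None, char of s2 or None, weight)
            moves = []
            if i and j:
                moves.append(((i - 1, j - 1), a, b, sub(a, b)))
            if i:
                moves.append(((i - 1, j), a, None, gap))
            if j:
                moves.append(((i, j - 1), None, b, gap))

            # Layer A: standard global alignment recurrence.
            if moves:
                A[i][j] = max(A[pi][pj] + w for (pi, pj), _, _, w in moves)
            else:
                A[i][j] = 0

            # Layer B: extend open segments by one alignment column, then let
            # a new segment start here with both DFA copies in the start state.
            cell = B[i][j]
            for (pi, pj), x, y, w in moves:
                for (p, q), v in B[pi][pj].items():
                    if v == NEG:
                        continue
                    p2 = delta.get((p, x)) if x is not None else p
                    q2 = delta.get((q, y)) if y is not None else q
                    if p2 is None or q2 is None:
                        continue  # no DFA transition: this segment dies
                    cell[(p2, q2)] = max(cell[(p2, q2)], v + w)
            cell[(start, start)] = max(cell[(start, start)], A[i][j])

            # Layer C: close a segment whose two halves are both accepted, or
            # continue the suffix alignment after an earlier closed segment.
            closed = max((v for (p, q), v in cell.items()
                          if p in accept and q in accept), default=NEG)
            C[i][j] = max([closed] + [C[pi][pj] + w for (pi, pj), _, _, w in moves])
    return C[n][m]
```

For instance, with a hand-built DFA for the pattern `AC*G` given as `delta = {(0, "A"): 1, (1, "C"): 1, (1, "G"): 2}`, `start = 0` and `accept = {2}`, the call `constrained_align("ACG", "AGG", delta, 0, {2})` returns `0`, whereas the unconstrained score under the same scoring is `3`: the only matching substrings are `ACG` in the first string and `AG` in the second, so the constraint forces them into one aligned segment and pushes the last `G` of `AGG` into a gap.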

Keywords:	Regular expression; Sequence alignment; Dynamic programming; Pattern matching; Finite automaton
This article is indexed in ScienceDirect and other databases.
