首页 | 本学科首页   官方微博 | 高级检索  
     


Novel evolutionary models and applications to sequence alignment problems
Authors:Eva K. Lee  Todd Easton  Kapil Gupta
Affiliation:(1) Center for Operations Research in Medicine, School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia;(2) Winship Cancer Institute, Emory University School of Medicine, Atlanta, Georgia
Abstract:
In this paper, we present a novel graph-theoretical approach for representing a wide variety of sequence analysis problems within a single model. The model allows incorporation of the operations “insertion”, “deletion”, and “substitution”, and various parameters such as relative distances and weights. Conceptually, we refer the problem as the minimum weight common mutated sequence (MWCMS) problem. The MWCMS model has many applications including multiple sequence alignment problem, the phylogenetic analysis, the DNA sequencing problem, and sequence comparison problem, which encompass a core set of very difficult problems in computational biology. Thus the model presented in this paper lays out a mathematical modeling framework that allows one to investigate theoretical and computational issues, and to forge new advances for these distinct, but related problems. Through the introduction of supernodes, and the multi-layer supergraph, we proved that MWCMS is $${NP}$$-complete. Furthermore, it was shown that a conflict graph derived from the multi-layer supergraph has the property that a solution to the associated node-packing problem of the conflict graph corresponds to a solution of the MWCMS problem. In this case, we proved that when the number of input sequences is a constant, MWCMS is polynomial-time solvable. We also demonstrated that some well-known combinatorial problems can be viewed as special cases of the MWCMS problem. In particular, we presented theoretical results implied by the MWCMS theory for the minimum weight supersequence problem, the minimum weight superstring problem, and the longest common subsequence problem. Two integer programming formulations were presented and a simple yet elegant decomposition heuristic was introduced. The integer programming instances have proven to be computationally intensive. Consequently, research involving simultaneous column and row generation and parallel computing will be explored. The heuristic algorithm, introduced herein for multiple sequence alignment, overcomes the order-dependent drawbacks of many of the existing algorithms, and is capable of returning good sequence alignments within reasonable computational time. It is able to return the optimal alignment for multiple sequences of length less than 1500 base pairs within 30 minutes. Its algorithmic decomposition nature lends itself naturally for parallel distributed computing, and we continue to explore its flexibility and scalability in a massive parallel environment.
Keywords:Evolutionary distance problem  Multiple sequence alignment  Phylogenetic analysis  DNA sequencing  Sequence comparison  Minimum weight common mutated sequence  Supernode  Conflict graph  Node-packing polytope
本文献已被 SpringerLink 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号