The Complexity of Gene Placement |
| |
Authors: | Leslie Ann Goldberg, Paul W. Goldberg, Mike Paterson, Pavel Pevzner, Süleyman Cenk Sahinalp, Elizabeth Sweedyk |
| |
Affiliation: | a Department of Computer Science, University of Warwick, Coventry, CV4 7AL, United Kingdom; b Department of Computer Science, University of California, San Diego, San Diego, California, 92093; c Department of Electrical Engineering and Computer Science, Center for Computational Genomics, Case Western Reserve University, Cleveland, Ohio, 44106; d Department of Computer Science, Harvey Mudd College, Claremont, California, 91711 |
| |
Abstract: | We focus on algorithmic problems related to deriving gene locations on DNA sequences of closely related species by using comparative mapping data. Conventional genetic mapping generates intervals on the DNA sequence of a given species for potential gene positions. The simultaneous analysis of gene intervals in related species, e.g., human and mouse, may eliminate some of the ambiguities and lead to better estimates of gene locations. We address the problem of eliminating the ambiguities in gene orders by minimizing the number of conserved regions among the species. This is equivalent to the problem of choosing gene coordinates (gene placement) that satisfy the genetic mapping constraints and minimize the breakpoint distance between genomes. We first show that the gene ordering problem is hard: there is no polynomial-time approximation scheme unless P = NP, even under the restrictions that (1) the order of genes in one of the species is known, or (2) at most two intervals overlap at any location on the map of any of the species. We then provide two polynomial-time algorithms under restriction (1) above; the first approximates the problem within a factor of 3, and the second solves the problem exactly under the additional restriction that (3) no more than O((log n)/(log log n)) intervals overlap at any location on the map of any of the species. We also prove the tractability of the general problem when there is a single conserved region (i.e., when there exists a gene placement resulting in identical gene orders). |
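To make the objective in the abstract concrete, the following is a minimal Python sketch of how a candidate gene placement might be evaluated under restriction (1), where the gene order in one species is known. It is not the authors' algorithm. The function names `order_from_placement` and `breakpoints`, the example intervals, and the simplified unsigned, adjacency-based notion of a breakpoint are all assumptions made for illustration; the paper's formal definitions may differ.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # (low, high) bounds from genetic mapping


def order_from_placement(intervals: Dict[str, Interval],
                         placement: Dict[str, float]) -> List[str]:
    """Check that each chosen coordinate lies in its mapping interval,
    then return the genes sorted by coordinate (the induced gene order)."""
    for gene, x in placement.items():
        lo, hi = intervals[gene]
        if not lo <= x <= hi:
            raise ValueError(f"{gene} placed at {x}, outside [{lo}, {hi}]")
    return sorted(placement, key=placement.get)


def breakpoints(order_a: List[str], order_b: List[str]) -> int:
    """Count adjacent gene pairs in order_a that are not adjacent in order_b.
    Assumption: unsigned genes, so (x, y) and (y, x) count as the same pair."""
    adjacent_in_b = {frozenset(p) for p in zip(order_b, order_b[1:])}
    return sum(1 for p in zip(order_a, order_a[1:])
               if frozenset(p) not in adjacent_in_b)


# Hypothetical data: intervals in one species, known order in the other.
intervals = {"a": (0, 2), "b": (1, 3), "c": (4, 6), "d": (5, 7)}
known_order = ["a", "b", "d", "c"]

placement_1 = {"a": 1.0, "b": 2.0, "c": 5.0, "d": 6.0}  # induces a, b, c, d
placement_2 = {"a": 1.0, "b": 2.0, "c": 5.5, "d": 5.2}  # induces a, b, d, c

for placement in (placement_1, placement_2):
    order = order_from_placement(intervals, placement)
    print(order, breakpoints(order, known_order))
```

Both placements satisfy the mapping intervals, but the second induces exactly the known order and so has no breakpoints under this simplified count, while the first has one. The optimization problem described in the abstract asks for a placement that minimizes this kind of distance over all placements consistent with the intervals.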
| |
Keywords: | |
This article is indexed in ScienceDirect and other databases. |
|