Abstract
| - Phylogenetic footprints are short pieces of noncoding DNA sequence in the vicinity of a gene that areconserved between evolutionary distant species. A seemingly simple problem is to sort footprints in theirorder along the genomes. It is complicated by the fact that not all footprints are collinear: they may crosseach other. The problem thus becomes the identification of the crossing footprints, the sorting of the remainingcollinear cliques, and finally the insertion of the noncollinear ones at “reasonable” positions. We show thatsolving the footprint sorting problem requires the solution of the “Minimum Weight Vertex Feedback SetProblem”, which is known to be NP-complete and APX-hard. Nevertheless good approximations can beobtained for data sets of interest. The remaining steps of the sorting process are straightforward: computationof the transitive closure of an acyclic graph, linear extension of the resulting partial order, and finally sortingw.r.t. the linear extension. Alternatively, the footprint sorting problem can be rephrased as a combinatorialoptimization problem for which approximate solutions can be obtained by means of general purpose heuristics.Footprint sortings obtained with different methods can be compared using a version of multiple sequencealignment that allows the identification of unambiguously ordered sublists. As an application we show thatthe rat has a slighly increased insertion/deletion rate in comparison to the mouse genome.
|