GRIMM
Description | Run it | Instructions | Human/cat/mouse data | Publications
Genome Rearrangements In Man and Mouse
A tool for analyzing rearrangements in pairs of genomes,
including unichromosomal and multichromosomal genomes,
and signed and unsigned data
by Glenn Tesler
Two genomes may have many genes in common, but the genes may be
arranged in a different sequence or be moved between chromosomes.
Such differences in gene orders are the results of
rearrangement events that are common in molecular evolution.
For example, in unichromosomal genomes,
the most common rearrangement events are
reversals, in which a contiguous interval of genes is put into
the reverse order. For multichromosomal genomes, the most common
rearrangement events are reversals, translocations, fissions, and
fusions, which are described later.
The pairwise genome rearrangement problem is to find an
optimal scenario transforming one genome to another via
these rearrangement events.
We provide a C program and a web tool combining rearrangement
algorithms for unichromosomal and multichromosomal genomes,
with either signed or unsigned gene data.
In each case, it computes the minimum possible number
of rearrangement steps, and determines a possible scenario
taking this number of steps.
GRIMM implements the Hannenhalli-Pevzner algorithms for
computing unichromosomal and multichromosomal genomic distances, making extensive use
of code that was adapted from GRAPPA for the unichromosomal problem.
GRIMM also implements the Hannenhalli-Pevzner algorithm
for computing the reversal distance between
two unsigned unichromosomal genomes,
and Tesler's algorithm for computing the distance
between two unsigned multichromosomal genomes.
Representation of a genome
We consider a unichromosomal genome to be of a sequence of n genes.
The genes are represented by numbers 1, 2, ..., n.
The two orientations of gene i are represented by i and -i. A genome is represented as a signed permutation of the
numbers 1, 2, ..., n. For example, a unichromosomal
genome with n=5 genes is
A multichromosomal genome consists of n genes spread over m chromosomes. We represent it as a signed permutation of
1, 2, ..., n, with delimiters "$" or ";" inserted between
the chromosomes. For example, a genome with
12 genes spread over 3 chromosomes is
7 -2 8 3 $
5 9 -6 -1 12 $
11 4 10 $
|
The order of the chromosomes and the direction of the
chromosomes do not matter in the multichromosomal algorithms. Thus, we could
represent this same genome by flipping the first chromosome
(reverse the order of its entries and negate them) and then
moving the last chromosome to the beginning:
11 4 10 $
-3 -8 2 -7 $
5 9 -6 -1 12 $
|
Further information regarding the syntax of genomes is available
on the help page.
Unichromosomal genomes: sorting by reversal
A reversal in a signed permutation is an operation
that takes an interval in a permutation, reverses the
order of the numbers, and changes all their signs.
For example,
5 1 3 2 -9 7 -4 6 8
5 1 -7 9 -2 -3 -4 6 8
The reversal distance between two genomes is the minimum number of
reversals it takes to get from one genome to the other.
For a given pair of genomes, the reversal distance is unique, but
there are usually many possible reversal scenarios with this distance.
However, it is possible that this mathematical notion of
reversal distance can underestimate the actual number of
steps that occurred biologically.
Multichromosomal genomes: rearrangement operations
We treat four elementary rearrangement events in multichromosomal
genomes:
reversals, translocations, fusions, and fissions.
- Reversal: An interval within a single chromosome may be reversed in the same
fashion as a reversal acts in the unichromosomal case:
7 -2 8 3 $
5 9 -6 -1 12 $
11 4 10 $
|
 |
7 -2 8 3 $
5 9 -12 1 6 $
11 4 10 $
|
Note: When the programs are run in unichromosomal mode, the genomes
3 1 2 and -2 -1 -3
are considered different (one reversal apart, distance=1),
while in multichromosomal
mode, those same genomes are considered equivalent (distance=0)
because we have simply flipped an entire chromosome, which
gives an equivalent genome in the multichromosomal mode.
- Translocation: Two chromosomes "A B" and "C D"
may be rearranged into "A D" and "C B".
(The letters A, B, C, D stand for sequences of genes.)
Because flipping
chromosomes does not alter a genome (only its representation is altered),
"A -C" and "-B D" is another possible translocation, and is the one
actually done by our algorithm.
(-B means to reverse the order of the genes in sequence B
and negate each one.) For example, a translocation on chromosomes 1 and 3
is
7 -2 8 3 $
5 9 -6 -1 12 $
11 4 10 $
|
 |
7 -2 8 -4 -11 $
5 9 -6 -1 12 $
-3 10 $
|
- Fusion: Two chromosomes may be fused together into a single chromosome.
Due to chromosome flippings,
there are four distinct fusions between each pair of chromosomes.
Here is one of the fusions between chromosomes 1 and 3:
7 -2 8 3 $
5 9 -6 -1 12 $
11 4 10 $
|
 |
7 -2 8 3 -10 -4 -11 $
5 9 -6 -1 12 $
|
- Fission: A chromosome may be broken into two chromosomes between any pair of
genes:
7 -2 8 3 $
5 9 -6 -1 12 $
11 4 10 $
|
 |
7 -2 8 3 $
5 9 $
-6 -1 12 $
11 4 10 $
|
Signed and unsigned genomes
Most comparative mapping techniques determine
the physical locations and relative order of genes in each chromosome, but
do not determine which of two orientations each gene has.
Current sequencing methods do provide the orientations.
It turns out that the genome rearrangement problem (uni- and
multichromosomal) for unsigned permutations is NP-hard, but
the same problems for signed data can be done in polynomial time.
Fortunately, with many genomes currently being sequenced,
it is likely that many comparative maps (corresponding to
unsigned permutations) will soon be replaced by sequencing
data (corresponding to signed permutations).
Existing data for which signs are not known may be entered into the
programs without specifying the signs. The programs will find
an optimal assignment of signs, if the data is not too complex,
or will approximate it otherwise. For example, to turn
the unsigned genome
1 2 3 4 5
into the unsigned genome
1 4 3 2 5
requires one unsigned reversal. The program determines
an assignment of signs in the source and destination genomes
that give a signed reversal scenario requiring this same number of
steps. Here, we get
1 2 3 4 5
1 -4 -3 -2 5
which also takes one step. Note that there may be other sign assignments
taking this minimum number of steps.
It is possible that correctly
signed data would have increased the number of steps:
1 2 3 4 5
1 -4 -3 -2 5
1 -4 3 -2 5
If the data collection method did not determine signs, it is impossible to
know mathematically whether the one step or two step scenario is
more biologically accurate; the mathematical problem
the program solves is to find the signs giving the
minimum possible distance.
Word problems and insertions/deletions
We do not currently consider "word problems" in which some genes are
repeated,
1 2 -1 3 4
nor do we allow gaps in the numbering (as may arise from insertion/deletion),
1 3 -9 -7 5
Related web pages and software
These sites offer programs to sort permutations by reversals.
At their roots is the Hannenhalli-Pevzner algorithm for sorting
signed permutations by reversals. Successive authors
improved the algorithm.
In addition to these programs, Guillaume Bourque has
developed an algorithm and implementation MGR for computing phylogenetic trees for
multiple genomes, unichromosomal or multichromosomal.
It uses GRIMM to do its distance computations.
References
Links to journals require subscriptions.
- Bader, D.A., Moret, B.M.E., and Yan, M. (2001)
A linear-time algorithm for computing inversion distances between signed
permutations with an experimental study, J. Comput. Biol. 8(5), 483-491.
Paper at journal or author.
- Caprara, A. (1999) Sorting permutations by reversals and Eulerian cycle
decompositions. SIAM J. Discrete Math. 12(1), 91-110
(electronic).
Paper at journal or author.
- Grimm, J. and Grimm, W. (1812) Puss in Boots. In Grimm's Fairy
Tales.
- Hannenhalli, S. and Pevzner, P.A. (1995) Transforming men into mice
(polynomial algorithm for genomic distance problem).
In 36th Annual
Symposium on Foundations of Computer Science (Milwaukee, WI, 1995), IEEE Comput. Soc. Press, Los Alamitos, CA, pp. 581-592.
Paper at journal or author.
- Hannenhalli, S. and Pevzner, P. (1996) To cut ... or not to cut
(applications of comparative physical maps in molecular evolution).
In Proceedings of the Seventh Annual ACM-SIAM Symposium on Discrete
Algorithms (Atlanta, GA, 1996), ACM, New York, pp. 304-313.
Paper at journal or author.
- Hannenhalli, S. and Pevzner, P.A. (1999)
Transforming cabbage into turnip:
polynomial algorithm for sorting signed permutations by reversals. J. ACM 46(1), 1-27.
Paper at journal or author.
- Kaplan, H., Shamir, R., and Tarjan, R.E. (1999)
A faster and simpler algorithm for sorting signed permutations by reversals. SIAM J. Comput. 29(3), 880--892 (electronic).
Paper at journal or author.
- Pevzner, P.A. (2000) Computational molecular biology, An algorithmic
approach, MIT Press, Cambridge, MA,
Chap. 10.
Authors
GRIMM began as a project by Glenn Tesler and Yang Yu
for Pavel Pevzner's course CSE 291, in Spring,
2001, at UCSD. The original project
was to implement the Hannenhalli-Pevzner
algorithm for computing optimal reversal
scenarios between multichromosomal signed genomes.
This was written in C, and made extensive use of GRAPPA's invdist.c.
Subsequent development, including the extensions of the
algorithm, unsigned genomes, the web
tool, and the related web pages, was done by Glenn Tesler.
This page was created by Glenn Tesler,
University of California, San Diego.