Professor Katy Simonson's recombination and coalescence simulator
can be used to simulate how interesting pieces of DNA (called
loci) in a sample population may have descended from a common
ancestor. The trees which describe this passing of genetic
information from one generation to the next may then be useful
for studying the ways in which mutations and other traits are
passed on.
When working from the bottom of the tree (where the known sample
population resides) to the top (where the common ancestor
resides) two events are possible: coalescence and recombination.
Coalescence occurs when one parent passes the same locus on to
multiple children.
Recombination occurs when a child receives some of its loci
from one parent and some of its loci from the other parent.
This process of selecting either recombination or coalescence at
each generation until a single ancestor is reached is driven by
a complicated Markov Chain.
Professor Simonson's program performed these basic
operations. However, it was very limited. Due to the huge amount
of memory it required large simulations were not possible. As a
result she contacted Chinh Le, Dan Noland, and Faisal Saied in RCS for
help. They performed a number of improvements to optimize the
program.
• A new data structure for representing the Markov Chain
was developed and implemented.
• A new, faster method for updating the Markov Chain was
implemented.
• The GNU Multi-precision library was integrated into
the code to represent the extremely large integers necessary for
large simulations.
• The Mersenne Twister pseudo-random number generator was
integrated into the code to replace an older, slower
pseudo-random number generator.
Collectively these improvements have allowed the program to run
simulations that are two orders of magnitude larger than was
previously possible and to run smaller test cases considerably
faster than was previously possible.
Currently the group is working to reduce the amount of memory
necessary to represent the trees.