Recombination/Coalescence Simulator


Professor Katy Simonson's recombination and coalescence simulator can be used to simulate how interesting pieces of DNA (called loci) in a sample population may have descended from a common ancestor. The trees which describe this passing of genetic information from one generation to the next may then be useful for studying the ways in which mutations and other traits are passed on.

Recombination Tree
When working from the bottom of the tree (where the known sample population resides) to the top (where the common ancestor resides) two events are possible: coalescence and recombination.

Coalescence occurs when one parent passes the same locus on to multiple children.

Coalesence
Recombination occurs when a child receives some of its loci from one parent and some of its loci from the other parent.

Recombination
This process of selecting either recombination or coalescence at each generation until a single ancestor is reached is driven by a complicated Markov Chain.

A Simple Markov Chain
Professor Simonson's program performed these basic operations. However, it was very limited. Due to the huge amount of memory it required large simulations were not possible. As a result she contacted Chinh Le, Dan Noland, and Faisal Saied in RCS for help. They performed a number of improvements to optimize the program.

• A new data structure for representing the Markov Chain was developed and implemented.
• A new, faster method for updating the Markov Chain was implemented.
• The GNU Multi-precision library was integrated into the code to represent the extremely large integers necessary for large simulations.
• The Mersenne Twister pseudo-random number generator was integrated into the code to replace an older, slower pseudo-random number generator.

Collectively these improvements have allowed the program to run simulations that are two orders of magnitude larger than was previously possible and to run smaller test cases considerably faster than was previously possible.

Currently the group is working to reduce the amount of memory necessary to represent the trees.