Introduction

A central challenge in the analysis of genetic variation is to provide realistic genome simulation across millions of samples. Present-day coalescent simulations either do not scale well or use approximations that fail to capture important long-range linkage properties. Analysing the results of simulations is also a substantial challenge, as current methods for storing genealogies consume a great deal of space, are slow to parse, and do not take advantage of shared structure in correlated trees. We solve these problems by introducing sparse trees and coalescence records as the key units of genealogical analysis. Using these tools, exact simulation of the coalescent with recombination for chromosome-sized regions over hundreds of thousands of samples is possible, and substantially faster than present-day approximate methods. We can also analyse the results orders of magnitude more quickly than with existing methods.
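
To make the idea of coalescence records and shared tree structure concrete, here is a minimal Python sketch. It is not the tool's API or its C implementation: the field names (`left`, `right`, `node`, `children`, `time`), the helper functions `tree_at` and `mrca`, and the example data are all assumptions chosen for illustration. Each record is assumed to say that, over a half-open genomic interval, a set of child nodes coalesced into a parent node at a given time, and a tree at one position is represented sparsely as a child-to-parent map.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class CoalescenceRecord(NamedTuple):
    """Hypothetical record layout: children merge into node over [left, right)."""
    left: float                 # start of the genomic interval (inclusive)
    right: float                # end of the genomic interval (exclusive)
    node: int                   # parent (ancestral) node
    children: Tuple[int, ...]   # nodes that coalesce into `node`
    time: float                 # time of the coalescence event


# Illustrative data only: three samples (0, 1, 2) on a genome of length 10.
# The first record spans both trees, so the shared subtree is stored once.
EXAMPLE_RECORDS: List[CoalescenceRecord] = [
    CoalescenceRecord(0.0, 10.0, 3, (0, 1), 0.1),
    CoalescenceRecord(0.0, 5.0, 4, (2, 3), 0.4),
    CoalescenceRecord(5.0, 10.0, 5, (2, 3), 0.7),
]


def tree_at(records: List[CoalescenceRecord], position: float) -> Dict[int, int]:
    """Build the sparse tree (child -> parent map) covering `position`."""
    parent: Dict[int, int] = {}
    for rec in records:
        if rec.left <= position < rec.right:
            for child in rec.children:
                parent[child] = rec.node
    return parent


def mrca(parent: Dict[int, int], u: int, v: int) -> Optional[int]:
    """Most recent common ancestor of u and v, or None if they share none."""
    seen = {u}
    while u in parent:          # collect u and all of its ancestors
        u = parent[u]
        seen.add(u)
    while v not in seen:        # climb from v until we meet that set
        if v not in parent:
            return None
        v = parent[v]
    return v
```

In this sketch, `tree_at(EXAMPLE_RECORDS, 2.0)` and `tree_at(EXAMPLE_RECORDS, 7.0)` differ only in the root above samples 0, 1 and 2, while the record joining samples 0 and 1 is reused by both trees. That reuse is the kind of shared structure between correlated trees that the description above refers to.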

Publications

  1. Efficient Coalescent Simulation and Genealogical Analysis for Large Sample Sizes.
    Kelleher J, Etheridge AM, McVean G, 2016-05-01 - PLoS Computational Biology

Credits

  1. Jerome Kelleher
    Developer

    Wellcome Trust Centre for Human Genetics, University of Oxford, United Kingdom of Great Britain and Northern Ireland

  2. Alison M Etheridge
    Developer

    Department of Statistics, University of Oxford, United Kingdom of Great Britain and Northern Ireland

  3. Gilean McVean
    Investigator

    Li Ka Shing Centre for Health Information and Discovery, University of Oxford, United Kingdom of Great Britain and Northern Ireland

Summary

Accession: BT002422
Tool Type: Application
Category:
Platforms: Linux/Unix
Technologies: C
User Interface: Terminal Command Line
Download Count: 0
Country/Region: United Kingdom of Great Britain and Northern Ireland
Submitted By: Gilean McVean