Introduction

Whole-genome low-coverage sequencing has been combined with linkage-disequilibrium (LD)-based genotype refinement to accurately and cost-effectively infer genotypes in large cohorts of individuals. Most genotype refinement methods are based on hidden Markov models, which are accurate but computationally expensive. We introduce an algorithm that models LD using a simple multivariate Gaussian distribution. The key feature of our algorithm is its speed. Our method is hundreds of times faster than other methods on the same data set and its scaling behaviour is linear in the number of samples. We demonstrate the performance of the method on both low- and high-coverage samples.

The source code is available at https://github.com/illumina/marvin. Contact: rarthur@illumina.com. Supplementary data are available at Bioinformatics online.
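
To illustrate the general idea of modelling LD with a multivariate Gaussian, the minimal Python sketch below predicts the genotype dosage at one site from the observed dosage at a linked site, using the standard conditional-mean formula for a multivariate normal distribution. It is not the marvin implementation: the function name `conditional_dosage`, the `ridge` parameter, the clipping to the dosage range [0, 2], and the toy mean and covariance values are all assumptions made for illustration.

```python
import numpy as np


def conditional_dosage(mu, sigma, obs_idx, obs_values, target_idx, ridge=0.0):
    """Conditional mean E[x_target | x_obs] for x ~ N(mu, sigma).

    Hypothetical helper for illustration only; not part of marvin.
    mu, sigma  : per-site mean dosages and their covariance (the LD model)
    obs_idx    : indices of sites with observed dosages
    obs_values : observed dosages at those sites
    target_idx : indices of sites to predict
    ridge      : optional diagonal regulariser (an assumption, default off)
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    obs_idx = list(obs_idx)
    target_idx = list(target_idx)

    # Covariance among observed sites, and between target and observed sites.
    s_oo = sigma[np.ix_(obs_idx, obs_idx)] + ridge * np.eye(len(obs_idx))
    s_to = sigma[np.ix_(target_idx, obs_idx)]

    # mu_t + S_to S_oo^{-1} (x_o - mu_o), solved without an explicit inverse.
    resid = np.asarray(obs_values, dtype=float) - mu[obs_idx]
    estimate = mu[target_idx] + s_to @ np.linalg.solve(s_oo, resid)

    # Dosages of a diploid genotype lie in [0, 2] (assumed convention).
    return np.clip(estimate, 0.0, 2.0)


# Toy two-site example: both sites have mean dosage 1 and covariance 0.8.
mu = [1.0, 1.0]
sigma = [[1.0, 0.8], [0.8, 1.0]]

# Observing dosage 2 at site 0 gives 1 + 0.8 * (2 - 1) / 1 for site 1.
estimate = conditional_dosage(mu, sigma, obs_idx=[0], obs_values=[2.0], target_idx=[1])
print(estimate)
```

In this toy setting the prediction requires only one linear solve per target, which hints at why a Gaussian model can be cheaper than a hidden Markov model; the actual cost and accuracy of the published method are as reported by the authors, not by this sketch.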

Publications

  1. Rapid genotype refinement for whole-genome sequencing data using multi-variate normal distributions.
    Arthur R, O'Connell J, Schulz-Trieglaff O, Cox AJ, 2016-08-01 - Bioinformatics (Oxford, England)

Credits

  1. Rudy Arthur
    Developer

    Illumina Cambridge Ltd, Chesterford Research Park

  2. Jared O'Connell
    Developer

    Illumina Cambridge Ltd, Chesterford Research Park

  3. Ole Schulz-Trieglaff
    Developer

    Illumina Cambridge Ltd, Chesterford Research Park

  4. Anthony J Cox
    Investigator

    Illumina Cambridge Ltd, Chesterford Research Park

Summary

Accession: BT000255
Tool Type: Application
Category:
Platforms: Linux/Unix
Technologies:
User Interface: Terminal Command Line
Download Count: 0
Submitted By: Anthony J Cox