Introduction

Genome-wide association studies are widely used to investigate the genetic basis of diseases and traits, but they pose many computational challenges. We developed gdsfmt and SNPRelate (R packages for multi-core symmetric multiprocessing computer architectures) to accelerate two key computations on SNP data: principal component analysis (PCA) and relatedness analysis using identity-by-descent (IBD) measures. The kernels of our algorithms are written in C/C++ and highly optimized. Benchmarks show that the uniprocessor implementations of PCA and IBD are roughly 8 to 50 times faster than the implementations provided in the popular EIGENSTRAT (v3.0) and PLINK (v1.07) programs, respectively, and that using eight cores raises the speed-up to 30- to 300-fold. SNPRelate can analyse tens of thousands of samples with millions of SNPs. For example, our package was used to perform PCA on 55 324 subjects from the 'Gene-Environment Association Studies' consortium studies.
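To make the PCA step more concrete, the following is a minimal pure-Python sketch of the sample-by-sample genetic covariance matrix that PCA on SNP data typically decomposes. It is not SNPRelate's C/C++ kernel and not its API. The function name `genetic_covariance`, the 0/1/2 allele-count coding, and the scaling by sqrt(2p(1-p)) are assumptions chosen for illustration, and missing genotypes are not handled.

```python
import math


def genetic_covariance(genotypes):
    """Return the n x n sample covariance matrix of standardized genotypes.

    genotypes: list of n samples, each a list of m allele counts (0, 1 or 2).
    Hypothetical helper for illustration; assumes no missing values.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    standardized = [[] for _ in range(n)]
    used = 0  # number of SNPs that survive the monomorphic filter

    for j in range(m):
        column = [genotypes[i][j] for i in range(n)]
        p = sum(column) / (2.0 * n)  # allele frequency estimate for SNP j
        if p <= 0.0 or p >= 1.0:
            continue  # monomorphic SNP: carries no information, skip it
        scale = math.sqrt(2.0 * p * (1.0 - p))  # assumed scaling convention
        for i in range(n):
            standardized[i].append((column[i] - 2.0 * p) / scale)
        used += 1

    if used == 0:
        raise ValueError("no polymorphic SNPs to analyse")

    # Average the products of standardized genotypes over the SNPs used.
    return [
        [
            sum(a * b for a, b in zip(standardized[i], standardized[k])) / used
            for k in range(n)
        ]
        for i in range(n)
    ]


# Four samples, four SNPs; the last SNP is monomorphic and is skipped.
example = [
    [0, 1, 2, 0],
    [1, 1, 0, 0],
    [2, 0, 1, 0],
    [0, 2, 2, 0],
]
cov = genetic_covariance(example)
```

The principal components would then come from an eigen-decomposition of this matrix, which is omitted here. The double loop over sample pairs is the part that grows quadratically with the number of samples, which is why an optimized, multi-core kernel matters at the scale described above.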

Publications

  1. A high-performance computing toolset for relatedness and principal component analysis of SNP data.
    Zheng X, Levine D, Shen J, Gogarten SM, Laurie C, Weir BS, 2012-12-01 - Bioinformatics (Oxford, England)

Credits

  1. Xiuwen Zheng
    Developer

    Department of Biostatistics, University of Washington, United States of America

  2. David Levine
    Developer

  3. Jess Shen
    Developer

  4. Stephanie M Gogarten
    Developer

  5. Cathy Laurie
    Developer

  6. Bruce S Weir
    Investigator

Summary

Accession: BT005830
Tool Type: Application
Category:
Platforms: Linux/Unix
Technologies: C, C++, R
User Interface: Terminal Command Line
Download Count: 0
Submitted By: Bruce S Weir