GrandPrix: scaling up the Bayesian GPLVM for single-cell data.

Sumon Ahmed, Magnus Rattray, Alexis Boukouvalas
Author Information
  1. Sumon Ahmed: Division of Informatics, Imaging and Data Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK.
  2. Magnus Rattray: Division of Informatics, Imaging and Data Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK.
  3. Alexis Boukouvalas: Division of Informatics, Imaging and Data Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK.

Abstract

Motivation: The Gaussian Process Latent Variable Model (GPLVM) is a popular approach for dimensionality reduction of single-cell data and has been used for pseudotime estimation with capture time information. However, current implementations are computationally intensive and will not scale up to modern droplet-based single-cell datasets which routinely profile many tens of thousands of cells.
Results: We provide an efficient implementation which allows scaling up this approach to modern single-cell datasets. We also generalize the application of pseudotime inference to cases where there are other sources of variation, such as branching dynamics. We apply our method to microarray, nCounter, RNA-seq, qPCR and droplet-based datasets from different organisms. The model converges an order of magnitude faster than existing methods whilst achieving similar levels of estimation accuracy. Further, we demonstrate the flexibility of our approach by extending the model to higher-dimensional latent spaces that can be used to simultaneously infer pseudotime and other structure, such as branching. The model can therefore produce meaningful biological insights into cell ordering as well as cell fate regulation.
Availability and implementation: Software available at github.com/ManchesterBioinference/GrandPrix.
Supplementary information: Supplementary data are available at Bioinformatics online.
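The Motivation above notes that the GPLVM has been used for pseudotime estimation with capture time information, and that existing implementations scale poorly. As a rough illustration only (this is not GrandPrix's Bayesian implementation), the NumPy sketch below evaluates a simple MAP-style objective for a one-dimensional GPLVM in which each cell's latent pseudotime has a Gaussian prior centred on its capture time. The squared-exponential kernel, the hyperparameter names and default values, and the function names `rbf_kernel` and `log_joint` are all assumptions chosen for illustration. The Cholesky factorisation of the N x N covariance costs O(N^3) in the number of cells N, which shows why an exact approach struggles with tens of thousands of cells.

```python
import numpy as np


def rbf_kernel(t, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between 1-D pseudotimes t (shape [N]).

    Assumed kernel choice for this sketch; not taken from the paper.
    """
    d = t[:, None] - t[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def log_joint(t, Y, capture_time, prior_var=1.0, lengthscale=1.0,
              variance=1.0, noise=0.1):
    """Hypothetical MAP objective for a 1-D GPLVM with a capture-time prior.

    t            : latent pseudotimes, shape [N]
    Y            : expression matrix, shape [N cells, G genes]
    capture_time : capture time of each cell, shape [N]

    Returns sum_g log N(y_g | 0, K + noise*I) + sum_n log N(t_n | capture_n, prior_var).
    """
    N, G = Y.shape
    K = rbf_kernel(t, lengthscale, variance) + noise * np.eye(N)
    # O(N^3) Cholesky factorisation: the bottleneck for large numbers of cells.
    L = np.linalg.cholesky(K)
    # With K = L L^T, y^T K^{-1} y = ||L^{-1} y||^2 for each gene column.
    alpha = np.linalg.solve(L, Y)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    lik = (-0.5 * np.sum(alpha ** 2)
           - 0.5 * G * log_det
           - 0.5 * N * G * np.log(2.0 * np.pi))
    # Gaussian prior tying each cell's pseudotime to its capture time.
    prior = (-0.5 * np.sum((t - capture_time) ** 2) / prior_var
             - 0.5 * N * np.log(2.0 * np.pi * prior_var))
    return lik + prior
```

In practice one would maximise this objective over `t` (and, typically, the kernel hyperparameters), for example by passing its negative to `scipy.optimize.minimize`. A scalable method has to avoid the cubic Cholesky step above, for instance through a sparse approximation; the abstract does not state which approximation GrandPrix uses, so that detail is left to the paper.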

Grants

  1. Wellcome Trust
  2. Medical Research Council (MR/M008908/1)
  3. Wellcome Trust (204832/B/16/Z)

MeSH Term

Bayes Theorem
Models, Statistical
Normal Distribution
Single-Cell Analysis
Software
