Cytomulate: accurate and efficient simulation of CyTOF data.

Yuqiu Yang, Kaiwen Wang, Zeyu Lu, Tao Wang, Xinlei Wang
Author Information
  1. Yuqiu Yang: Department of Statistics and Data Science, Southern Methodist University, Dallas, TX, 75275, USA.
  2. Kaiwen Wang: Department of Statistics and Data Science, Southern Methodist University, Dallas, TX, 75275, USA.
  3. Zeyu Lu: Department of Statistics and Data Science, Southern Methodist University, Dallas, TX, 75275, USA.
  4. Tao Wang: Quantitative Biomedical Research Center, Peter O'Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA. tao.wang@utsouthwestern.edu.
  5. Xinlei Wang: Department of Statistics and Data Science, Southern Methodist University, Dallas, TX, 75275, USA. xinlei.wang@uta.edu.

Abstract

Recently, many analysis tools have been devised to offer insights into data generated via cytometry by time-of-flight (CyTOF). However, objective evaluations of these methods remain absent, as most evaluations are conducted against real data where the ground truth is generally unknown. In this paper, we develop Cytomulate, a reproducible and accurate simulation algorithm for CyTOF data, which could serve as a foundation for future method development and evaluation. We demonstrate that Cytomulate can capture various characteristics of CyTOF data and that it learns overall data distributions better than single-cell RNA-seq-oriented methods such as scDesign2 and Splatter and than generative models like LAMBDA.

Grants

  1. R01 CA258584/NCI NIH HHS
  2. R15 GM131390/NIGMS NIH HHS
  3. U01 AI156189/NIAID NIH HHS

MeSH Term

Computer Simulation
Algorithms
Single-Cell Analysis
Flow Cytometry
