A multitask clustering approach for single-cell RNA-seq analysis in Recessive Dystrophic Epidermolysis Bullosa.

Huanan Zhang, Catherine A A Lee, Zhuliu Li, John R Garbe, Cindy R Eide, Raphael Petegrosso, Rui Kuang, Jakub Tolar
Author Information
  1. Huanan Zhang: Department of Computer Science and Engineering, University of Minnesota Twin Cities, Minneapolis, Minnesota, United States of America.
  2. Catherine A A Lee: Department of Genetics, Cell Biology and Development, University of Minnesota Twin Cities, Minneapolis, Minnesota, United States of America.
  3. Zhuliu Li: Department of Computer Science and Engineering, University of Minnesota Twin Cities, Minneapolis, Minnesota, United States of America.
  4. John R Garbe: Minnesota Supercomputing Institute, University of Minnesota Twin Cities, Minneapolis, Minnesota, United States of America.
  5. Cindy R Eide: Department of Pediatrics, University of Minnesota Twin Cities, Minneapolis, Minnesota, United States of America.
  6. Raphael Petegrosso: Department of Computer Science and Engineering, University of Minnesota Twin Cities, Minneapolis, Minnesota, United States of America.
  7. Rui Kuang: Department of Computer Science and Engineering, University of Minnesota Twin Cities, Minneapolis, Minnesota, United States of America.
  8. Jakub Tolar: Department of Pediatrics, University of Minnesota Twin Cities, Minneapolis, Minnesota, United States of America.

Abstract

Single-cell RNA sequencing (scRNA-seq) has been widely applied to discover new cell types by detecting sub-populations in a heterogeneous group of cells. Because scRNA-seq experiments have lower read coverage/tag counts and introduce more technical biases than bulk RNA-seq experiments, the limited number of sampled cells, combined with experimental biases and other dataset-specific variations, presents a challenge to cross-dataset analysis and to the discovery of relevant biological variations across multiple cell populations. In this paper, we introduce a method for variance-driven multitask clustering of single-cell RNA-seq data (scVDMC) that utilizes multiple single-cell populations from biological replicates or different samples. scVDMC clusters single cells from multiple scRNA-seq experiments that share similar cell types and markers but have varying expression patterns, so that the scRNA-seq data are better integrated than in typical pooled analyses, which only increase the sample size. By controlling the variance among the cell clusters within each dataset and across all the datasets, scVDMC detects cell sub-populations in each individual experiment with shared cell-type markers but with cluster centers that vary among the experiments. Applied to two real scRNA-seq datasets with several replicates and one large-scale droplet-based dataset on three patient samples, scVDMC detected cell populations and known cell markers more accurately than pooled clustering and other recently proposed scRNA-seq clustering methods. In the case study on in-house Recessive Dystrophic Epidermolysis Bullosa (RDEB) scRNA-seq data, scVDMC revealed several new cell types and previously unknown markers that were validated by flow cytometry. MATLAB/Octave code is available at https://github.com/kuanglab/scVDMC.
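
The central idea in the abstract (cluster each dataset with its own cluster centers while sharing one set of marker genes across datasets) can be illustrated with a small sketch. The Python/NumPy code below is not the scVDMC algorithm and does not reproduce the released MATLAB/Octave code. It is a simplified k-means-style heuristic that alternates between per-dataset clustering and a shared, variance-based gene selection. The function names (`farthest_point_init`, `assign`, `multitask_cluster`, `make_dataset`), the synthetic data, and the gene score (between-cluster variance of the centers, summed over datasets) are all assumptions made for illustration.

```python
import numpy as np

def farthest_point_init(X, k):
    """Deterministic seeding: take cell 0, then repeatedly add the cell
    farthest from all seeds chosen so far."""
    idx = [0]
    for _ in range(k - 1):
        d = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d.argmax()))
    return X[idx].copy()

def assign(X, centers, genes):
    """Label each cell with its nearest center, using only `genes` for distance."""
    diff = X[:, genes][:, None, :] - centers[:, genes][None, :, :]
    return (diff ** 2).sum(axis=2).argmin(axis=1)

def multitask_cluster(datasets, k, n_markers, n_iter=10):
    """Illustrative sketch (an assumption, not scVDMC itself).

    datasets : list of (cells x genes) arrays measured on the same genes.
    Returns per-dataset labels, per-dataset centers, and shared marker indices.
    """
    n_genes = datasets[0].shape[1]
    markers = np.arange(n_genes)                # start from all genes
    centers = [farthest_point_init(X, k) for X in datasets]
    labels = [None] * len(datasets)
    for _ in range(n_iter):
        score = np.zeros(n_genes)
        for t, X in enumerate(datasets):
            # Each dataset keeps its own centers; only the gene set is shared.
            labels[t] = assign(X, centers[t], markers)
            for c in range(k):
                members = X[labels[t] == c]
                if len(members):                # keep old center if a cluster empties
                    centers[t][c] = members.mean(axis=0)
            # Between-cluster spread of each gene within this dataset.
            score += centers[t].var(axis=0)
        # Shared markers: genes with the largest summed between-cluster variance.
        markers = np.sort(np.argsort(score)[::-1][:n_markers])
    return labels, centers, markers

def make_dataset(rng, shift, n_per_cluster=20, n_genes=6):
    """Synthetic toy data: two cell types, genes 0 and 1 are the true markers,
    and `shift` mimics a dataset-specific offset."""
    X = rng.normal(0.0, 0.3, size=(2 * n_per_cluster, n_genes)) + shift
    X[:n_per_cluster, 0] += 5.0                 # cell type A expresses gene 0
    X[n_per_cluster:, 1] += 5.0                 # cell type B expresses gene 1
    return X

rng = np.random.default_rng(0)
datasets = [make_dataset(rng, shift=0.0), make_dataset(rng, shift=2.0)]
labels, centers, markers = multitask_cluster(datasets, k=2, n_markers=2)
print("shared marker genes:", markers)
```

In this sketch the cluster labels are not aligned across datasets, and the gene score is a plain variance heuristic rather than an optimized objective. The point is only to show how per-dataset centers can coexist with a marker set chosen jointly from all datasets, in contrast to pooling all cells into a single clustering.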

Grants

  1. R01 AR063070/NIAMS NIH HHS
  2. T32 GM113846/NIGMS NIH HHS

MeSH Terms

Algorithms
Animals
Case-Control Studies
Cluster Analysis
Collagen Type VII
Computational Biology
Computer Simulation
Embryonic Stem Cells
Epidermolysis Bullosa Dystrophica
Gene Expression Profiling
Genetic Markers
High-Throughput Nucleotide Sequencing
Humans
Leukocytes, Mononuclear
Lung
Machine Learning
Mice
Models, Genetic
RNA
Sequence Analysis, RNA
Single-Cell Analysis

Chemicals

COL7A1 protein, human
Collagen Type VII
Genetic Markers
RNA