Sub-Cluster Identification through Semi-Supervised Optimization of Rare-Cell Silhouettes (SCISSORS) in single-cell RNA-sequencing.

Jack R Leary, Yi Xu, Ashley B Morrison, Chong Jin, Emily C Shen, Peyton C Kuhlers, Ye Su, Naim U Rashid, Jen Jen Yeh, Xianlu Laura Peng
Author Information
  1. Jack R Leary: Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States.
  2. Yi Xu: Department of Pharmacology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States.
  3. Ashley B Morrison: Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States.
  4. Chong Jin: Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States.
  5. Emily C Shen: Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States.
  6. Peyton C Kuhlers: Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States.
  7. Ye Su: Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States.
  8. Naim U Rashid: Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States.
  9. Jen Jen Yeh: Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States.
  10. Xianlu Laura Peng: Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States.

Abstract

MOTIVATION: Single-cell RNA-sequencing (scRNA-seq) has enabled the molecular profiling of thousands to millions of cells simultaneously in biologically heterogeneous samples. Currently, the common practice in scRNA-seq is to determine cell type labels through unsupervised clustering and the examination of cluster-specific genes. However, even small differences in analysis and parameter choice can greatly alter clustering results and thus strongly influence which cell types are identified. Existing methods largely focus on determining the optimal number of robust clusters, which can be problematic for identifying cells of extremely low abundance because they contribute only subtly to overall patterns of gene expression.
RESULTS: Here, we present SCISSORS, a framework that accurately profiles subclusters within broad cluster(s) to identify rare cell types in scRNA-seq data. SCISSORS uses silhouette scoring to estimate the heterogeneity of clusters and reveals rare cells in heterogeneous clusters through a multi-step, semi-supervised reclustering process. Additionally, SCISSORS provides a method for identifying marker genes that are highly specific to a cell type. SCISSORS is built around the popular Seurat R package and can be easily integrated into existing Seurat pipelines.
AVAILABILITY AND IMPLEMENTATION: SCISSORS, including source code and vignettes, is freely available at https://github.com/jr-leary7/SCISSORS.
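
The silhouette-based heterogeneity estimate mentioned under RESULTS can be illustrated with a minimal sketch. This is not the SCISSORS API (the package itself is written in R around Seurat). The function names, the toy 2-D coordinates, the Euclidean distance, and the cutoff of 0.6 are all assumptions made for illustration. The sketch computes the mean silhouette width of each cluster and flags low-scoring clusters as possible candidates for reclustering.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_by_cluster(points, labels):
    """Return {cluster label: mean silhouette width of its members}.

    Hypothetical helper: `points` stands in for cells embedded in a
    reduced-dimension space, `labels` for their cluster assignments.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette widths need at least two clusters")

    totals = {lab: 0.0 for lab in clusters}
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton's silhouette width is defined as 0
        # a: mean distance to the other members of the same cluster
        a = sum(euclidean(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            totals[lab] += (b - a) / denom
    return {lab: totals[lab] / len(clusters[lab]) for lab in clusters}


def reclustering_candidates(scores, cutoff):
    """Labels whose mean silhouette width falls below an assumed cutoff."""
    return sorted(lab for lab, score in scores.items() if score < cutoff)


# Toy data: cluster "A" is compact, cluster "B" hides two distant subgroups.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 0), (10, 1), (20, 0), (20, 1)]
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]

scores = silhouette_by_cluster(points, labels)
candidates = reclustering_candidates(scores, cutoff=0.6)  # cutoff is arbitrary
```

In this toy case the compact cluster scores higher than the mixed one, so only the mixed cluster is flagged. A real analysis would use the distances and thresholds chosen by the package and the analyst, not the values assumed here.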

Grants

  1. P50 CA257911/NCI NIH HHS
  2. R01 CA199064/NCI NIH HHS
  3. U01 CA274298/NCI NIH HHS
  4. U24 CA211000/NCI NIH HHS

MeSH Terms

Algorithms
Gene Expression Profiling
Sequence Analysis, RNA
Single-Cell Analysis
Cluster Analysis
RNA

Chemicals

RNA
