scSensitiveGeneDefine: sensitive gene detection in single-cell RNA sequencing data by Shannon entropy.

Zechuan Chen, Zeruo Yang, Xiaojun Yuan, Xiaoming Zhang, Pei Hao
Author Information
  1. Zechuan Chen: College of Life Sciences, Shanghai University, Shanghai, China.
  2. Zeruo Yang: Natural Medicine Institute of Zhejiang YangShengTang Co., Ltd., No. 181, Geyazhuang, Xihu District, Hangzhou, Zhejiang, China.
  3. Xiaojun Yuan: College of Life Sciences, Shanghai University, Shanghai, China.
  4. Xiaoming Zhang: Key Laboratory of Molecular Virology and Immunology, Institut Pasteur of Shanghai, Chinese Academy of Sciences, Shanghai, China. xmzhang@ips.ac.cn.
  5. Pei Hao: Key Laboratory of Molecular Virology and Immunology, Institut Pasteur of Shanghai, Chinese Academy of Sciences, Shanghai, China. phao@ips.ac.cn.

Abstract

BACKGROUND: Single-cell RNA sequencing (scRNA-seq) is the most widely used technique for obtaining gene expression profiles from complex tissues. Cell subsets and developmental states are often identified via differential gene expression patterns, and most single-cell tools use highly variable genes to annotate them. However, we found that a group of genes with high coefficients of variation (CV), which respond sensitively to environmental stimuli, may exert an overwhelming influence on cell type annotation.
RESULT: We developed a method, based on the CV-rank and Shannon entropy, to identify these noise genes, which we term "sensitive genes". To validate the method, we applied our tool to 11 single-cell data sets from different human tissues. Most of the sensitive genes were enriched in pathways related to the cellular stress response. Furthermore, after removing the sensitive genes detected by our tool, the unsupervised clustering result was closer to the ground-truth cell labels.
CONCLUSION: Our study revealed the prevalence of stochastic gene expression patterns in most cell types; compared cell marker genes, housekeeping genes (HK genes), and sensitive genes; demonstrated that sensitive genes have similar functions across scRNA-seq data sets; and brought unsupervised clustering results closer to the ground-truth labels. We hope our method will provide new insights into reducing noise in scRNA-seq data analysis and contribute to the development of better unsupervised clustering algorithms for scRNA-seq data.
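
The abstract names two ingredients, a per-gene coefficient of variation and Shannon entropy, but does not spell out how they are combined. The following is a minimal sketch of how the two quantities might be computed and used together, not the authors' implementation. It assumes that entropy is taken over a gene's mean expression across clusters and that a candidate sensitive gene is one with both a high CV and a high entropy (variable, yet not specific to any cluster). The function names, the toy data, and the thresholds are all hypothetical.

```python
import math
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """CV = population standard deviation / mean (0.0 if the mean is zero)."""
    m = mean(values)
    return pstdev(values) / m if m > 0 else 0.0


def shannon_entropy(weights):
    """Shannon entropy in bits of non-negative weights, normalised to sum to 1."""
    total = sum(weights)
    if total <= 0:
        return 0.0
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log2(p) for p in probs)


def score_genes(expr, clusters):
    """Return {gene: (cv, entropy)}.

    expr     : dict mapping gene name -> list of per-cell expression values
    clusters : list of cluster labels, one per cell (same order as the values)
    Assumption: entropy is computed over the gene's mean expression per cluster.
    """
    labels = sorted(set(clusters))
    scores = {}
    for gene, values in expr.items():
        per_cluster = [
            mean([v for v, c in zip(values, clusters) if c == label])
            for label in labels
        ]
        scores[gene] = (coefficient_of_variation(values),
                        shannon_entropy(per_cluster))
    return scores


def flag_candidates(scores, min_cv, min_entropy):
    """Genes that are both highly variable and evenly spread across clusters."""
    return sorted(g for g, (cv, h) in scores.items()
                  if cv >= min_cv and h >= min_entropy)


# Toy data (hypothetical): six cells in two clusters.
clusters = ["A", "A", "A", "B", "B", "B"]
expr = {
    "MARKER": [9, 10, 11, 0, 0, 0],      # variable, but specific to cluster A
    "HOUSEKEEPING": [5, 5, 5, 5, 5, 5],  # evenly spread, but not variable
    "NOISY": [0, 12, 6, 12, 0, 6],       # variable and evenly spread
}
scores = score_genes(expr, clusters)
candidates = flag_candidates(scores, min_cv=0.5, min_entropy=0.9)
```

On this toy input only `NOISY` is flagged: the marker gene has zero entropy because all of its expression falls in one cluster, and the housekeeping gene has a CV of zero. A real analysis would rank genes by CV rather than apply a fixed cutoff, and the thresholds here carry no meaning beyond the example.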

MeSH Term

Gene Expression Profiling
Humans
RNA
Reproducibility of Results
Sequence Analysis, RNA
Single-Cell Analysis

Chemicals

RNA
