Transformer for one stop interpretable cell type annotation.

Jiawei Chen, Hao Xu, Wanyu Tao, Zhaoxiong Chen, Yuxuan Zhao, Jing-Dong J Han
Author Information
  1. Jiawei Chen: Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China.
  2. Hao Xu: Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China.
  3. Wanyu Tao: Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China.
  4. Zhaoxiong Chen: Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China.
  5. Yuxuan Zhao: Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China.
  6. Jing-Dong J Han: Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China. jackie.han@pku.edu.cn.

Abstract

Consistent annotation transfer from a reference dataset to a query dataset is fundamental to the development and reproducibility of single-cell research. Compared with traditional annotation methods, deep-learning-based methods are faster and more automated. A series of useful single-cell analysis tools based on autoencoder architecture have been developed, but these struggle to strike a balance between depth and interpretability. Here, we present TOSICA, a multi-head self-attention deep learning model based on Transformer that enables interpretable cell type annotation using biologically understandable entities, such as pathways or regulons. We show that TOSICA achieves fast and accurate one-stop annotation and batch-insensitive integration while providing biologically interpretable insights for understanding cellular behavior during development and disease progression. We demonstrate TOSICA's advantages by applying it to scRNA-seq data of tumor-infiltrating immune cells and of CD14+ monocytes in COVID-19 to reveal rare cell types, heterogeneity and dynamic trajectories associated with disease progression and severity.
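
The idea of attending over biologically understandable entities can be illustrated with a small sketch. The code below is not TOSICA's implementation; it is a toy under stated assumptions: each pathway becomes one token built only from its member genes (a masked embedding), a class token is prepended, and the class token's multi-head attention weights over the pathway tokens are read as interpretability scores. The gene names, pathway names, expression values, dimensions and random weights are all hypothetical, and the value projection and classifier head are omitted.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Random weights standing in for trained parameters (assumption)."""
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


def pathway_tokens(expr, pathways, weights):
    """Masked embedding: each pathway token sees only its member genes."""
    tokens = []
    for p, gene_idx in enumerate(pathways):
        member_expr = [expr[g] for g in gene_idx]
        tokens.append(matvec(weights[p], member_expr))
    return tokens


def cls_attention(tokens, heads, rng):
    """Scaled dot-product attention of the class token (tokens[0]) over
    all tokens, computed per head and averaged across heads."""
    dim = len(tokens[0])
    head_dim = dim // heads
    averaged = [0.0] * len(tokens)
    for _ in range(heads):
        w_query = random_matrix(head_dim, dim, rng)
        w_key = random_matrix(head_dim, dim, rng)
        query = matvec(w_query, tokens[0])
        keys = [matvec(w_key, t) for t in tokens]
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(head_dim)
            for key in keys
        ]
        head_weights = softmax(scores)
        averaged = [a + w / heads for a, w in zip(averaged, head_weights)]
    return averaged


rng = random.Random(0)

# Hypothetical toy cell: five genes with made-up expression values.
gene_names = ["gene_a", "gene_b", "gene_c", "gene_d", "gene_e"]
expr = [2.1, 1.7, 0.3, 0.9, 3.0]

# Hypothetical pathway membership, given as gene indices.
pathway_names = ["toy_pathway_1", "toy_pathway_2"]
pathways = [[0, 1, 4], [2, 3]]

dim, heads = 4, 2
weights = [random_matrix(dim, len(idx), rng) for idx in pathways]
cls_token = [rng.gauss(0.0, 0.5) for _ in range(dim)]

tokens = [cls_token] + pathway_tokens(expr, pathways, weights)
attn = cls_attention(tokens, heads, rng)

for name, weight in zip(["CLS"] + pathway_names, attn):
    print(f"{name}: {weight:.3f}")
```

In a trained model of this kind, the weights that the class token assigns to each pathway token could be compared across cells or cell types. Here the numbers are meaningless because the parameters are random.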


MeSH Terms

Humans
Reproducibility of Results
COVID-19
Single-Cell Analysis
Disease Progression
Exome Sequencing
Sequence Analysis, RNA
