Uncovering functional lncRNAs by scRNA-seq with ELATUS.

Enrique Goñi, Aina Maria Mas, Jovanna Gonzalez, Amaya Abad, Marta Santisteban, Puri Fortes, Maite Huarte, Mikel Hernaez
Author Information
  1. Enrique Goñi: Center for Applied Medical Research, University of Navarra, PIO XII 55 Ave, Pamplona, Spain.
  2. Aina Maria Mas: Center for Applied Medical Research, University of Navarra, PIO XII 55 Ave, Pamplona, Spain.
  3. Jovanna Gonzalez: Center for Applied Medical Research, University of Navarra, PIO XII 55 Ave, Pamplona, Spain.
  4. Amaya Abad: Center for Applied Medical Research, University of Navarra, PIO XII 55 Ave, Pamplona, Spain.
  5. Marta Santisteban: Institute of Health Research of Navarra (IdiSNA), Pamplona, Spain.
  6. Puri Fortes: Center for Applied Medical Research, University of Navarra, PIO XII 55 Ave, Pamplona, Spain.
  7. Maite Huarte: Center for Applied Medical Research, University of Navarra, PIO XII 55 Ave, Pamplona, Spain. maitehuarte@unav.es.
  8. Mikel Hernaez: Center for Applied Medical Research, University of Navarra, PIO XII 55 Ave, Pamplona, Spain. mhernaez@unav.es.

Abstract

Long non-coding RNAs (lncRNAs) play fundamental roles in cellular processes and pathologies, regulating gene expression at multiple levels. Despite being highly cell type-specific, they are challenging to study at the single-cell (sc) level because, compared with protein-coding genes, they are less accurately annotated and more lowly expressed. Here, we systematically benchmark different preprocessing methods and develop a computational framework, named ELATUS, based on the combination of the pseudoaligner Kallisto with selective functional filtering. ELATUS enhances the detection of functional lncRNAs from scRNA-seq data, and the lncRNA expression it detects shows higher concordance with the ATAC-seq profiles in single-cell multiome data than that detected by standard methods. Interestingly, this advantage stems from ELATUS's better performance with inaccurate reference annotations, such as those of lncRNAs. We independently confirm the expression patterns of cell type-specific lncRNAs detected exclusively with ELATUS and unveil biologically important lncRNAs, such as AL121895.1, a previously undocumented cis-repressor lncRNA whose role in breast cancer progression goes unnoticed with traditional methodologies. Our results emphasize the need for an alternative scRNA-seq workflow tailored to lncRNAs that sheds light on their multifaceted roles.

Grants

  1. LCF/PR/HR21/00176/"la Caixa" Foundation (Caixa Foundation)
  2. 0011-0537-2020-000038/Departamento de Educación, Gobierno de Navarra (Department of Education, Government of Navarra)
  3. 898356/EC | EU Framework Programme for Research and Innovation H2020 | H2020 Priority Excellent Science | H2020 Marie Skłodowska-Curie Actions (H2020 Excellent Science - Marie Skłodowska-Curie Actions)

MeSH Terms

RNA, Long Noncoding
Humans
Single-Cell Analysis
RNA-Seq
Computational Biology
Gene Expression Profiling
Sequence Analysis, RNA
Single-Cell Gene Expression Analysis

Chemicals

RNA, Long Noncoding