LncDC: a machine learning-based tool for long non-coding RNA detection from RNA-Seq data.

Minghua Li, Chun Liang
Author Information
  1. Minghua Li: Department of Biology, Miami University, Oxford, OH, 45056, USA. lim74@miamioh.edu.
  2. Chun Liang: Department of Biology, Miami University, Oxford, OH, 45056, USA. liangc@miamioh.edu.

Abstract

Long non-coding RNAs (lncRNAs) play essential roles in diverse biological processes and in disease development. Accurate classification of lncRNAs and mRNAs is important for identifying tissue- or disease-specific lncRNAs. Here, we present LncDC (Long non-coding RNA detection), a tool that accurately predicts lncRNAs with an XGBoost model using features extracted from RNA sequences, secondary structures, and translated proteins. In benchmarking experiments, LncDC consistently outperformed six state-of-the-art tools in distinguishing lncRNAs from mRNAs. Notably, the sequence and secondary structure (SASS) k-mer score features and the flexible ORF features improved the classification capability of LncDC. We anticipate that LncDC will facilitate the discovery of novel disease-specific lncRNAs. LncDC is implemented in Python and freely available at https://github.com/lim74/LncDC.
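The abstract names two feature families without describing how they are computed. As a rough illustration only (this is not LncDC's actual implementation, which lives in the repository above), the stdlib-only Python sketch below computes (1) a generic k-mer log-likelihood-ratio score between coding and non-coding training sequences, loosely modeled on the hexamer-usage scores used by coding-potential tools such as CPAT, and (2) the length and coverage of the longest ORF. All function names and parameters are hypothetical. The `allow_open_end` switch, which lets an ORF run to the end of the transcript without a stop codon, is an assumed stand-in for a "flexible" ORF definition. The SASS features in LncDC also draw on secondary structure; the same scoring function could in principle be applied to a structure-encoded string, but no such encoding is shown here.

```python
import math
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}


def kmer_counts(seq, k):
    """Count overlapping k-mers in seq (assumed uppercase DNA alphabet)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_log_ratio_table(coding_seqs, noncoding_seqs, k, pseudocount=1.0):
    """Hypothetical helper: log(freq_coding / freq_noncoding) per k-mer,
    with additive pseudocounts so unseen k-mers do not yield log(0)."""
    cod, non = Counter(), Counter()
    for s in coding_seqs:
        cod.update(kmer_counts(s, k))
    for s in noncoding_seqs:
        non.update(kmer_counts(s, k))
    vocab = set(cod) | set(non)
    cod_total = sum(cod.values()) + pseudocount * len(vocab)
    non_total = sum(non.values()) + pseudocount * len(vocab)
    return {
        km: math.log((cod[km] + pseudocount) / cod_total)
        - math.log((non[km] + pseudocount) / non_total)
        for km in vocab
    }


def kmer_score(seq, table, k):
    """Mean log-ratio over the k-mers of seq; k-mers absent from the table add 0.
    Positive values lean coding, negative values lean non-coding."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    return sum(table.get(km, 0.0) for km in kmers) / len(kmers)


def longest_orf(seq, allow_open_end=False):
    """Return (start, end) of the longest ATG...stop ORF over the three
    forward-strand frames. If allow_open_end is True (an assumed 'flexible'
    rule), an ORF with no stop codon may extend to the last full codon."""
    seq = seq.upper().replace("U", "T")
    best = (0, 0)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None
        if start is not None and allow_open_end:
            # End of the last complete codon in this frame.
            end = frame + 3 * ((len(seq) - frame) // 3)
            if end - start > best[1] - best[0]:
                best = (start, end)
    return best


def feature_vector(seq, table, k=3):
    """Hypothetical per-transcript features: k-mer score plus ORF length/coverage."""
    seq = seq.upper().replace("U", "T")
    start, end = longest_orf(seq, allow_open_end=True)
    orf_len = end - start
    return {
        "kmer_score": kmer_score(seq, table, k),
        "orf_length": orf_len,
        "orf_coverage": orf_len / len(seq) if seq else 0.0,
    }


if __name__ == "__main__":
    table = kmer_log_ratio_table(["ATGATGATG"], ["CCCCCCCCC"], k=3)
    print(feature_vector("CCAUGAAAUUUUAGCC", table))
```

Dictionaries like these could then be stacked into a feature matrix for a gradient-boosted classifier such as XGBoost; the actual feature set, k values, and structure encoding used by LncDC should be taken from its repository rather than from this sketch.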

MeSH Term

RNA, Long Noncoding
RNA-Seq
Machine Learning
RNA, Messenger
Proteins

Chemicals

RNA, Long Noncoding
RNA, Messenger
Proteins
