A deep learning method for classification of HNSCC and HPV patients using single-cell transcriptomics.

Akanksha Jarwal, Anjali Dhall, Akanksha Arora, Sumeet Patiyal, Aman Srivastava, Gajendra P S Raghava
Author Information
  1. Akanksha Jarwal: Department of Computational Biology, Indraprastha Institute of Information Technology, Delhi, India.
  2. Anjali Dhall: Department of Computational Biology, Indraprastha Institute of Information Technology, Delhi, India.
  3. Akanksha Arora: Department of Computational Biology, Indraprastha Institute of Information Technology, Delhi, India.
  4. Sumeet Patiyal: Department of Computational Biology, Indraprastha Institute of Information Technology, Delhi, India.
  5. Aman Srivastava: Department of Computational Biology, Indraprastha Institute of Information Technology, Delhi, India.
  6. Gajendra P S Raghava: Department of Computational Biology, Indraprastha Institute of Information Technology, Delhi, India.

Abstract

Background: Head and Neck Squamous Cell Carcinoma (HNSCC) is the seventh most prevalent cancer type worldwide. Early detection of HNSCC is one of the major challenges in managing the treatment of cancer patients. Existing techniques for detecting HNSCC are costly and invasive.
Methods: In this study, we aimed to address this issue by developing classification models using machine learning and deep learning techniques, focusing on single-cell transcriptomics to distinguish between HNSCC and normal samples. Furthermore, we built models to classify HNSCC samples into HPV-positive (HPV+) and HPV-negative (HPV-) categories. From the GSE181919 dataset, we extracted 20 primary cancer (HNSCC) samples and 9 normal tissue samples. The primary cancer samples comprised 13 HPV- and 7 HPV+ samples. The models were trained on 80% of the dataset and validated on the remaining 20%. To develop an efficient model, we performed feature selection using the minimum redundancy maximum relevance (mRMR) method to shortlist a small number of genes from the full gene set. We also performed Gene Ontology (GO) enrichment analysis on the 100 shortlisted genes.
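The greedy selection idea behind mRMR can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which relevance and redundancy measures were used, so the absolute Pearson correlation here is an assumed stand-in for the mutual information of classical mRMR, and the names pearson and mrmr_select as well as the toy expression values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def mrmr_select(columns, labels, k):
    """Greedily pick k feature indices by relevance minus mean redundancy.

    Assumption: |Pearson r| replaces the mutual information of classical mRMR.
    """
    relevance = [abs(pearson(col, labels)) for col in columns]
    # Start with the single most relevant feature.
    selected = [max(range(len(columns)), key=relevance.__getitem__)]
    while len(selected) < min(k, len(columns)):
        best, best_score = None, -math.inf
        for j in range(len(columns)):
            if j in selected:
                continue
            # Mean absolute correlation with the features already chosen.
            redundancy = sum(
                abs(pearson(columns[j], columns[s])) for s in selected
            ) / len(selected)
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected


# Toy data (hypothetical): 8 samples, 4 "genes", one list of values per gene.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
genes = [
    [1, 2, 1, 2, 5, 6, 5, 6],      # gene 0: strongly tracks the label
    [2, 4, 2, 4, 10, 12, 10, 12],  # gene 1: exact 2x copy of gene 0 (redundant)
    [0, 0, 1, 1, 1, 1, 2, 2],      # gene 2: moderately informative
    [1, 0, 0, 1, 0, 1, 1, 0],      # gene 3: unrelated to the label
]
selected = mrmr_select(genes, labels, k=2)
print(selected)
```

In this toy case the second pick is gene 2 rather than the duplicated gene, because a copy of an already selected feature adds relevance but is penalised by its redundancy.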
Results: An Artificial Neural Network (ANN) based model trained on 100 genes outperformed the other classifiers, with an area under the receiver operating characteristic curve (AUROC) of 0.91 for HNSCC classification on the validation set. The same algorithm achieved an AUROC of 0.83 for the classification of HPV+ and HPV- patients on the validation set. GO enrichment analysis showed that most of the genes were involved in binding and catalytic activities.
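For readers unfamiliar with the metric, AUROC is the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the function name auroc and the toy labels and scores are hypothetical and are not taken from the study.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy validation set (hypothetical): 1 = HNSCC, 0 = normal.
y_true = [1, 1, 1, 0, 0]
y_score = [0.9, 0.8, 0.3, 0.4, 0.1]
print(round(auroc(y_true, y_score), 4))  # 5 of 6 pairs ranked correctly -> 0.8333
```

The pairwise form is quadratic in the number of samples, which is acceptable for a small validation set; a rank-based formulation gives the same value for larger ones.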
Conclusion: A software package has been developed in Python that allows users to identify HNSCC in patients along with their HPV status. It is available at https://webs.iiitd.edu.in/raghava/hnscpred/.
