Deep Learning Methods for Omics Data Imputation.

Lei Huang, Meng Song, Hui Shen, Huixiao Hong, Ping Gong, Hong-Wen Deng, Chaoyang Zhang
Author Information
  1. Lei Huang: School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS 39406, USA.
  2. Meng Song: School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS 39406, USA.
  3. Hui Shen: Center for Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA 70112, USA.
  4. Huixiao Hong: Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA.
  5. Ping Gong: Environmental Laboratory, U.S. Army Engineer Research and Development Center, Vicksburg, MS 39180, USA.
  6. Hong-Wen Deng: Center for Biomedical Informatics and Genomics, School of Medicine, Tulane University, New Orleans, LA 70112, USA.
  7. Chaoyang Zhang: School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS 39406, USA.

Abstract

Missing values are a common problem in omics data analysis and can arise for various reasons, such as poor tissue quality and insufficient sample volumes. Rather than discarding missing values and the related data, imputation offers an alternative way to handle them. However, imputing missing omics data is a non-trivial task. The difficulties stem mainly from high dimensionality, non-linear or non-monotonic relationships among features, technical variation introduced by sampling methods, sample heterogeneity, and non-random missingness mechanisms. Several advanced imputation methods, including deep learning-based methods, have been proposed to address these challenges. Because deep learning can model complex patterns and relationships in large, high-dimensional datasets, many researchers have adopted deep learning models to impute missing omics data. This review provides a comprehensive overview of the currently available deep learning-based methods for omics imputation from the perspective of deep generative model architectures such as autoencoders, variational autoencoders, generative adversarial networks, and Transformers, with an emphasis on multi-omics data imputation. It also discusses the opportunities that deep learning brings to this field and the challenges it may face.

Grants

  1. P20 GM109036/NIGMS NIH HHS
  2. R01 AG061917/NIA NIH HHS
  3. R01 AR069055/NIAMS NIH HHS
  4. U19 AG055373/NIA NIH HHS
