A review of genetic variant databases and machine learning tools for predicting the pathogenicity of breast cancer.

Rahaf M Ahmad, Bassam R Ali, Fatma Al-Jasmi, Richard O Sinnott, Noura Al Dhaheri, Mohd Saberi Mohamad
Author Information
  1. Rahaf M Ahmad: Health Data Science Lab, Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Tawam road, Al Maqam district, Al Ain, Abu Dhabi, United Arab Emirates.
  2. Bassam R Ali: Health Data Science Lab, Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Tawam road, Al Maqam district, Al Ain, Abu Dhabi, United Arab Emirates.
  3. Fatma Al-Jasmi: Health Data Science Lab, Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Tawam road, Al Maqam district, Al Ain, Abu Dhabi, United Arab Emirates.
  4. Richard O Sinnott: School of Computing and Information Systems, Faculty of Engineering and Information Technology, The University of Melbourne, Melbourne, Victoria, Australia.
  5. Noura Al Dhaheri: Health Data Science Lab, Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Tawam road, Al Maqam district, Al Ain, Abu Dhabi, United Arab Emirates.
  6. Mohd Saberi Mohamad: Health Data Science Lab, Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Tawam road, Al Maqam district, Al Ain, Abu Dhabi, United Arab Emirates.

Abstract

Studies continue to uncover risk factors that contribute to breast cancer (BC) development, including genetic variants. Advances in machine learning, together with the large volumes of data generated by genetic sequencing, can now be used to predict the pathogenicity of BC-associated variants. However, it remains unclear which of the tools developed for pathogenicity prediction is best suited to predicting the impact and pathogenicity of variants. A significant challenge is determining the most suitable data source for each tool, because different tools can yield different predictions depending on their input data. To this end, this work reviews genetic variant databases and tools used specifically for predicting BC pathogenicity. We describe existing genetic variant databases and, where appropriate, the diseases for which they were established. Through examples, we illustrate how they can be used to predict BC pathogenicity and discuss their advantages and disadvantages. We conclude that tools specialized for a single disease by training on multiple diverse datasets drawn from different databases achieve higher accuracy and specificity, and are therefore more helpful to clinicians in predicting and diagnosing BC as early as possible.
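
The conclusion compares tools by accuracy and specificity. As a minimal sketch of what those two metrics measure for a binary pathogenic/benign variant classifier (this is not the authors' evaluation code; the function names and toy labels are hypothetical), both can be computed from confusion-matrix counts, with sensitivity included for contrast:

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # pathogenic variants predicted pathogenic
    fp: int  # benign variants predicted pathogenic
    tn: int  # benign variants predicted benign
    fn: int  # pathogenic variants predicted benign


def confusion_counts(y_true, y_pred, positive="pathogenic"):
    """Tally confusion-matrix cells for binary variant labels (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        truth_pos = truth == positive
        pred_pos = pred == positive
        if truth_pos and pred_pos:
            tp += 1
        elif pred_pos:  # predicted pathogenic, actually benign
            fp += 1
        elif not truth_pos:  # predicted benign, actually benign
            tn += 1
        else:  # predicted benign, actually pathogenic
            fn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def accuracy(c):
    """Fraction of all variants classified correctly."""
    total = c.tp + c.fp + c.tn + c.fn
    return (c.tp + c.tn) / total if total else 0.0


def specificity(c):
    """Fraction of truly benign variants correctly called benign."""
    negatives = c.tn + c.fp
    return c.tn / negatives if negatives else 0.0


def sensitivity(c):
    """Fraction of truly pathogenic variants correctly called pathogenic."""
    positives = c.tp + c.fn
    return c.tp / positives if positives else 0.0


if __name__ == "__main__":
    truth = ["pathogenic", "benign", "benign", "pathogenic", "benign"]
    pred = ["pathogenic", "pathogenic", "benign", "benign", "benign"]
    counts = confusion_counts(truth, pred)
    print(counts, accuracy(counts), specificity(counts), sensitivity(counts))
```

On the five toy variants above, the counts are tp=1, fp=1, tn=2, fn=1, giving an accuracy of 0.6, a specificity of about 0.67, and a sensitivity of 0.5. High specificity means few benign variants are wrongly reported as pathogenic, which is why the metric matters for clinical use.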

Grants

  1. 12R111/United Arab Emirates University
  2. 12M109/Research Start-up Program
  3. /ASPIRE

MeSH Terms

Humans
Female
Breast Neoplasms
Virulence
Databases, Factual
Risk Factors
Machine Learning
