Polymorphic short tandem repeats make widespread contributions to blood and serum traits.

Jonathan Margoliash, Shai Fuchs, Yang Li, Xuan Zhang, Arya Massarat, Alon Goren, Melissa Gymrek
Author Information
  1. Jonathan Margoliash: Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093, USA.
  2. Shai Fuchs: Pediatric Endocrine and Diabetes Unit, Edmond and Lily Safra Children's Hospital, Sheba Medical Center, Ramat Gan, Israel.
  3. Yang Li: Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093, USA; Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA.
  4. Xuan Zhang: Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA.
  5. Arya Massarat: Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, CA 92093, USA.
  6. Alon Goren: Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA. Electronic address: agoren@ucsd.edu.
  7. Melissa Gymrek: Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093, USA; Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA. Electronic address: mgymrek@ucsd.edu.

Abstract

Short tandem repeats (STRs) are genomic regions consisting of repeated sequences of 1-6 bp in succession. Single-nucleotide polymorphism (SNP)-based genome-wide association studies (GWASs) do not fully capture STR effects. To study these effects, we imputed 445,720 STRs into genotype arrays from 408,153 White British UK Biobank participants and tested for association with 44 blood phenotypes. Using two fine-mapping methods, we identified 119 candidate causal STR-trait associations and estimated that STRs account for 5.2%-7.6% of causal variants identifiable from GWASs for these traits. These are among the strongest associations for multiple phenotypes and include a coding CTG repeat associated with apolipoprotein B levels, a promoter CGG repeat associated with platelet traits, and an intronic poly(A) repeat associated with mean platelet volume. Our study suggests that STRs make widespread contributions to complex traits, provides stringently selected candidate causal STRs, and demonstrates the need to consider a more complete view of genetic variation in GWASs.
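
To make the definition of an STR concrete, the following is a minimal sketch that scans a DNA string for runs of a 1-6 bp motif repeated in succession. It only illustrates the definition; it is not the imputation or fine-mapping approach used in the study. The function name `find_strs` and the `min_copies` threshold are assumptions introduced for this example.

```python
import re


def find_strs(seq, min_copies=3):
    """Return (start, motif, copies) for each tandem repeat found in seq.

    Hypothetical helper for illustration only: a repeat is any motif of
    1-6 bp that occurs at least `min_copies` times back to back.
    """
    # ([ACGT]{1,6}?) lazily captures the shortest candidate motif;
    # \1{n,} then requires at least n further consecutive copies of it.
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_copies - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits


if __name__ == "__main__":
    # A CTG repeat with four copies, flanked by non-repetitive bases.
    print(find_strs("TTCTGCTGCTGCTGAA"))
```

Because matches are reported left to right and do not overlap, the motif is given in the phase at which the repeat is first encountered, so a CTG repeat may be reported as GCT or TGC depending on the flanking sequence.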

Associated Data

Dryad | 10.5061/dryad.z612jm6jk

Grants

  1. DP5 OD024577/NIH HHS
  2. R01 HG010885/NHGRI NIH HHS
  3. RM1 HG011558/NHGRI NIH HHS
