Learning Bayesian Networks from Correlated Data.

Harold Bae, Stefano Monti, Monty Montano, Martin H Steinberg, Thomas T Perls, Paola Sebastiani
Author Information
  1. Harold Bae: Oregon State University, College of Public Health and Human Sciences, Corvallis, 97331, USA.
  2. Stefano Monti: Boston University, Department of Medicine, Boston, 02118, USA.
  3. Monty Montano: Harvard Medical School, Department of Medicine, Boston, 02115, USA.
  4. Martin H Steinberg: Boston University, Department of Medicine, Boston, 02118, USA.
  5. Thomas T Perls: Boston University, Department of Medicine, Boston, 02118, USA.
  6. Paola Sebastiani: Boston University, Department of Biostatistics, Boston, 02118, USA.

Abstract

Bayesian networks are probabilistic models that represent complex distributions in a modular way and have become very popular in many fields. There are many methods to build Bayesian networks from a random sample of independent and identically distributed observations. However, many observational studies use some form of clustered sampling, which introduces correlations between observations within the same cluster, and ignoring this correlation typically inflates the rate of false-positive associations. We describe a novel parameterization of Bayesian networks that uses random effects to model the correlation within sample units and can be used for structure and parameter learning from correlated data without inflating the Type I error rate. We compare different learning metrics using simulations and illustrate the method with two real examples: an analysis of genetic and non-genetic factors associated with human longevity from a family-based study, and an analysis of risk factors for complications of sickle cell anemia from a longitudinal study with repeated measures.
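
The inflation of false positives under clustered sampling can be illustrated with a small simulation. The sketch below is not the authors' method or code; it is a minimal illustration under assumed settings (30 clusters of 10 observations, a cluster-level covariate that is truly unrelated to the outcome, and a within-cluster random intercept with the same variance as the residual noise). The function names `pearson_t` and `simulate` are hypothetical. It compares a naive correlation test that treats all observations as independent with a simple cluster-aware alternative that tests on cluster means.

```python
import math
import random


def pearson_t(x, y):
    """t statistic for testing zero Pearson correlation between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt((n - 2) / (1 - r * r))


def simulate(n_sims=1000, n_clusters=30, cluster_size=10, seed=1):
    """Estimate Type I error rates when x and y are truly independent.

    Assumed data-generating model (illustrative only):
      x_i  ~ N(0, 1), shared by all members of cluster i
      y_ij = u_i + e_ij, with u_i ~ N(0, 1) and e_ij ~ N(0, 1)
    """
    rng = random.Random(seed)
    naive_hits = cluster_hits = 0
    for _ in range(n_sims):
        xs, ys, x_cl, y_cl = [], [], [], []
        for _ in range(n_clusters):
            x_i = rng.gauss(0, 1)   # cluster-level covariate
            u_i = rng.gauss(0, 1)   # random intercept shared within the cluster
            y_i = [u_i + rng.gauss(0, 1) for _ in range(cluster_size)]
            xs.extend([x_i] * cluster_size)
            ys.extend(y_i)
            x_cl.append(x_i)
            y_cl.append(sum(y_i) / cluster_size)
        # Naive test: all observations treated as independent
        # (1.96 is the approximate two-sided 5% cutoff for large samples).
        naive_hits += abs(pearson_t(xs, ys)) > 1.96
        # Cluster-aware test: one mean per cluster, so the units are independent
        # (2.048 is the two-sided 5% t cutoff for 28 degrees of freedom).
        cluster_hits += abs(pearson_t(x_cl, y_cl)) > 2.048
    return naive_hits / n_sims, cluster_hits / n_sims


if __name__ == "__main__":
    naive_rate, cluster_rate = simulate()
    print(f"naive Type I error rate:         {naive_rate:.3f}")
    print(f"cluster-aware Type I error rate: {cluster_rate:.3f}")
```

Under these assumptions the naive rate should sit well above the nominal 5% level, while the cluster-mean test should stay close to it. Aggregating to cluster means is only the simplest way to respect the clustering; the random-effects parameterization described in the abstract addresses the same problem inside the network model itself.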

Grants

  1. U19 AG023122/NIA NIH HHS
  2. T32 GM074905/NIGMS NIH HHS
  3. U01 AG023749/NIA NIH HHS
  4. R21 HL114237/NHLBI NIH HHS
  5. U01 AG023755/NIA NIH HHS

MeSH Terms

Anemia, Sickle Cell
Bayes Theorem
Humans
Life Expectancy
