Learning Bayesian networks from demographic and health survey data.

Neville Kenneth Kitson, Anthony C Constantinou
Author Information
  1. Neville Kenneth Kitson: Bayesian Artificial Intelligence Research Lab, Risk and Information Management (RIM) Research Group, School of Electronic Engineering and Computer Science, Queen Mary University of London (QMUL), London E1 4NS, UK; OneWorld UK, CAN Mezzanine, London SE1 4YR, UK. Electronic address: n.k.kitson@qmul.ac.uk.
  2. Anthony C Constantinou: Bayesian Artificial Intelligence Research Lab, Risk and Information Management (RIM) Research Group, School of Electronic Engineering and Computer Science, Queen Mary University of London (QMUL), London E1 4NS, UK; The Alan Turing Institute, British Library, 96 Euston Road, London NW1 2DB, UK. Electronic address: a.constantinou@qmul.ac.uk.

Abstract

Child mortality from preventable diseases such as pneumonia and diarrhoea in low- and middle-income countries remains a serious global challenge. We combine domain knowledge with available Demographic and Health Survey (DHS) data from India to construct Causal Bayesian Networks (CBNs) and investigate the factors associated with childhood diarrhoea. We use freeware tools to learn the graphical structure of the DHS data with score-based, constraint-based, and hybrid structure learning algorithms. We investigate the effect of missing values, sample size, and knowledge-based constraints on each of the structure learning algorithms, and assess their accuracy with multiple scoring functions. Weaknesses in the survey methodology and in the available data, as well as the variability in the CBNs generated by the different algorithms, mean that it is not possible to learn a definitive CBN from data. However, knowledge-based constraints are found to reduce the variation in the graphs produced by the different algorithms, and to yield graphs that better reflect the likely influential relationships in the data. Furthermore, valuable insights are gained into the performance and characteristics of the structure learning algorithms. Two score-based algorithms in particular, TABU and FGES, demonstrate many desirable qualities: (a) with sufficient data, they produce a graph that is similar to the reference graph; (b) they are relatively insensitive to missing values; and (c) they behave well with knowledge-based constraints. The results provide a basis for further investigation of the DHS data and for a deeper understanding of the behaviour of structure learning algorithms when applied to real-world settings.
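The abstract refers to assessing each learned graph with multiple scoring functions and to how closely TABU and FGES recover a reference graph, but it does not name the measures used. As an illustration only, the following sketch assumes two common graph-comparison measures, edge-level F1 and the structural Hamming distance (SHD), with each graph represented as a set of directed (parent, child) edges. The function names and variable names are hypothetical and are not drawn from the DHS data, and conventions for SHD (for example, how reversed edges are counted) differ between tools.

```python
"""Sketch: compare a learned DAG against a reference DAG.

Assumptions (not from the source): graphs are sets of directed
(parent, child) edges; F1 and SHD are illustrative measures; all
variable names are hypothetical.
"""

Edge = tuple[str, str]


def skeleton(edges: set[Edge]) -> set[frozenset[str]]:
    # Undirected adjacencies, ignoring edge orientation.
    return {frozenset(e) for e in edges}


def compare(learned: set[Edge], reference: set[Edge]) -> dict[str, float]:
    learned_adj = skeleton(learned)
    ref_adj = skeleton(reference)
    # Edges present in both graphs with the same orientation.
    tp = len(learned & reference)
    # Adjacent in both graphs but oriented the opposite way.
    reversed_edges = sum(1 for (a, b) in learned if (b, a) in reference)
    extra = len(learned_adj - ref_adj)    # adjacencies absent from the reference
    missing = len(ref_adj - learned_adj)  # reference adjacencies not learned
    precision = tp / len(learned) if learned else 0.0
    recall = tp / len(reference) if reference else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # One common SHD convention: each extra, missing or reversed edge costs 1.
    return {"precision": precision, "recall": recall, "f1": f1,
            "shd": extra + missing + reversed_edges}


if __name__ == "__main__":
    # Hypothetical toy graphs, not results from the paper.
    reference = {("wealth", "sanitation"), ("sanitation", "diarrhoea"),
                 ("age", "diarrhoea")}
    learned = {("wealth", "sanitation"), ("diarrhoea", "sanitation"),
               ("wealth", "diarrhoea")}
    print(compare(learned, reference))
```

In this toy case one edge matches exactly, one is reversed, one is extra and one is missing, so the SHD is 3 and precision, recall and F1 are each 1/3. Counting a reversed edge as a mismatch for F1 but as a single error for SHD is a design choice; some tools instead compare equivalence classes (CPDAGs) rather than DAGs.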

Keywords

MeSH Term

Algorithms
Bayes Theorem
Child
Demography
Humans
Knowledge Bases
Sample Size
