Learning Bayesian networks from demographic and health survey data.

Neville Kenneth Kitson, Anthony C Constantinou

Author Information

Neville Kenneth Kitson: Bayesian Artificial Intelligence Research Lab, Risk and Information Management (RIM) Research Group, School of Electronic Engineering and Computer Science, Queen Mary University of London (QMUL), London E1 4NS, UK; OneWorld UK, CAN Mezzanine, London SE1 4YR, UK. Electronic address: n.k.kitson@qmul.ac.uk.
Anthony C Constantinou: Bayesian Artificial Intelligence Research Lab, Risk and Information Management (RIM) Research Group, School of Electronic Engineering and Computer Science, Queen Mary University of London (QMUL), London E1 4NS, UK; The Alan Turing Institute, British Library, 96 Euston Road, London NW1 2DB, UK. Electronic address: a.constantinou@qmul.ac.uk.

PMID: 33217542 DOI: 10.1016/j.jbi.2020.103588

Child mortality from preventable diseases such as pneumonia and diarrhoea in low- and middle-income countries remains a serious global challenge. We combine knowledge with available Demographic and Health Survey (DHS) data from India to construct Causal Bayesian Networks (CBNs) and investigate the factors associated with childhood diarrhoea. We make use of freeware tools to learn the graphical structure of the DHS data with score-based, constraint-based, and hybrid structure learning algorithms. We investigate the effect of missing values, sample size, and knowledge-based constraints on each of the structure learning algorithms and assess their accuracy with multiple scoring functions. Weaknesses in the survey methodology and data available, as well as the variability in the CBNs generated by the different algorithms, mean that it is not possible to learn a definitive CBN from data. However, knowledge-based constraints are found to be useful in reducing the variation in the graphs produced by the different algorithms, and produce graphs which are more reflective of the likely influential relationships in the data. Furthermore, valuable insights are gained into the performance and characteristics of the structure learning algorithms. Two score-based algorithms in particular, TABU and FGES, demonstrate many desirable qualities: (a) with sufficient data, they produce a graph which is similar to the reference graph, (b) they are relatively insensitive to missing values, and (c) they behave well with knowledge-based constraints. The results provide a basis for further investigation of the DHS data and for a deeper understanding of the behaviour of the structure learning algorithms when applied to real-world settings.
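
The abstract refers to comparing learned graphs against a reference graph but does not name the scoring functions used. As an illustration only, the sketch below computes one common graph-comparison measure, the structural Hamming distance (SHD), which counts missing, extra, and reversed edges. The function name, the variable names, and both toy graphs are hypothetical and are not taken from the paper or the DHS data.

```python
def structural_hamming_distance(learned, reference):
    """Count missing, extra, and reversed edges between two DAGs.

    Each graph is given as a collection of (parent, child) tuples.
    This is a minimal sketch, not the metric implementation used in the paper.
    """
    learned, reference = set(learned), set(reference)

    # Undirected skeletons: which pairs of variables are adjacent at all.
    learned_skeleton = {frozenset(edge) for edge in learned}
    reference_skeleton = {frozenset(edge) for edge in reference}

    missing = reference_skeleton - learned_skeleton  # adjacency absent from learned graph
    extra = learned_skeleton - reference_skeleton    # adjacency absent from reference graph
    shared = learned_skeleton & reference_skeleton

    # Adjacencies present in both graphs but oriented differently.
    reversed_edges = sum(
        1
        for edge in learned
        if frozenset(edge) in shared and edge not in reference
    )
    return len(missing) + len(extra) + reversed_edges


# Hypothetical toy graphs (illustrative variable names only).
reference_graph = {
    ("water_source", "diarrhoea"),
    ("sanitation", "diarrhoea"),
    ("wealth", "sanitation"),
}
learned_graph = {
    ("water_source", "diarrhoea"),   # matches the reference
    ("diarrhoea", "sanitation"),     # reversed relative to the reference
    ("wealth", "water_source"),      # extra; ("wealth", "sanitation") is missing
}

shd = structural_hamming_distance(learned_graph, reference_graph)
print(shd)
```

A lower SHD means the learned graph is structurally closer to the reference; a value of 0 means the two graphs have identical edges and orientations.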

Keywords: Directed acyclic graph; Graphical models; Health informatics; Structure learning

MeSH terms: Algorithms; Bayes Theorem; Child; Demography; Humans; Knowledge Bases; Sample Size
