MalKinID: A classification model for identifying malaria parasite genealogical relationships using identity-by-descent.
Wesley Wong, Lea Wang, Stephen F Schaffner, Xue Li, Ian Cheeseman, Timothy J C Anderson, Ashley Vaughan, Michael Ferdig, Sarah K Volkman, Daniel L Hartl, Dyann F Wirth
Author Information
Wesley Wong: Department of Immunology and Infectious Diseases, Harvard T. H. Chan School of Public Health, Harvard University, Boston, MA 02115, USA.
Lea Wang: Harvard College, Harvard University, Cambridge, MA 02138, USA.
Stephen F Schaffner: Infectious Disease and Microbiome Program, Broad Institute, Cambridge, MA 02142, USA.
Xue Li: Program in Disease Intervention and Prevention, Texas Biomedical Research Institute, San Antonio, TX 78227, USA.
Ian Cheeseman: Program in Host Pathogen Interactions, Texas Biomedical Research Institute, San Antonio, TX 78227, USA.
Timothy J C Anderson: Program in Disease Intervention and Prevention, Texas Biomedical Research Institute, San Antonio, TX 78227, USA.
Ashley Vaughan: Center for Global Infectious Disease Research, Seattle Children's Research Institute, Seattle, WA 98105, USA.
Michael Ferdig: Department of Biological Sciences, Eck Institute for Global Health, University of Notre Dame, Notre Dame, IN 46556, USA.
Sarah K Volkman: Department of Immunology and Infectious Diseases, Harvard T. H. Chan School of Public Health, Harvard University, Boston, MA 02115, USA.
Daniel L Hartl: Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA.
Dyann F Wirth: Department of Immunology and Infectious Diseases, Harvard T. H. Chan School of Public Health, Harvard University, Boston, MA 02115, USA.
Pathogen genomics is a powerful tool for tracking infectious disease transmission. In malaria, identity-by-descent is used to assess the genetic relatedness between parasites and has been used to study transmission and importation. In theory, identity-by-descent can also distinguish specific genealogical relationships, which could be used to reconstruct transmission history or to identify parasites for quantitative-trait-locus (QTL) experiments. MalKinID (Malaria Kinship Identifier) is a new classification model designed to identify genealogical relationships among malaria parasites based on genome-wide identity-by-descent proportions and identity-by-descent segment distributions. MalKinID was calibrated using genomic data from three laboratory-based genetic crosses (yielding 440 parent-child and 9060 full-sibling comparisons). MalKinID identified lab-generated F1 progeny with >80% sensitivity and showed that 0.39 (95% CI 0.28, 0.49) of the second-generation progeny of a cross between NF54 and NHP4026 were F1s and 0.56 (0.45, 0.67) were backcrosses of an F1 with the parental NF54 strain. In simulated outcrossed importations, MalKinID reconstructed genealogical history with high precision and sensitivity, with F1-scores (the harmonic mean of precision and sensitivity) exceeding 0.84. However, when importation involved inbreeding, such as during serial co-transmission, the precision and sensitivity of MalKinID declined, with F1-scores of 0.76 (0.56, 0.92) and 0.23 (0.0, 0.4) for parent-child and full-sibling relationships, respectively, and <0.05 for second-degree and third-degree relatives. Disentangling inbred relationships required adapting MalKinID to perform multisample comparisons. Genealogical inference is most powerful when (1) outcrossing is the norm or (2) multisample comparisons based on a predefined pedigree are used. MalKinID lays the foundations for using identity-by-descent to track parasite transmission history and for separating progeny for QTL experiments.
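The F1-scores reported in the abstract summarize, for each relationship class, how precision (the fraction of predicted relationships that are correct) and sensitivity (the fraction of true relationships that are recovered) trade off. The sketch below shows that calculation in Python. It is an illustration under stated assumptions, not MalKinID's evaluation code. The function names and the confusion counts are hypothetical and are not data from the study.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of pairs predicted as a given relationship that truly have it."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true pairs of a given relationship that are recovered (recall)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and sensitivity for one relationship class."""
    p = precision(tp, fp)
    s = sensitivity(tp, fn)
    return 2 * p * s / (p + s) if (p + s) else 0.0


# Hypothetical counts for one class (e.g. parent-child), for illustration only.
balanced = f1_score(tp=80, fp=20, fn=10)   # precision 0.80, sensitivity ~0.89
lopsided = f1_score(tp=9, fp=1, fn=81)     # precision 0.90, sensitivity 0.10
print(round(balanced, 3), round(lopsided, 3))  # prints: 0.842 0.18
```

The harmonic mean is dominated by the smaller of the two rates. This is why a class with high precision but very low sensitivity, as in the second example, still receives a low F1-score.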