Sparse haplotype-based fine-scale local ancestry inference at scale reveals recent selection on immune responses.

Yaoling Yang, Richard Durbin, Astrid K N Iversen, Daniel J Lawson
Author Information
  1. Yaoling Yang: Department of Statistical Sciences, School of Mathematics, University of Bristol, Bristol, UK. yaoling.yang@bristol.ac.uk
  2. Richard Durbin: Department of Genetics, University of Cambridge, Cambridge, UK.
  3. Astrid K N Iversen: Nuffield Department of Clinical Neurosciences, John Radcliffe Hospital, University of Oxford, Oxford, UK.
  4. Daniel J Lawson: Department of Statistical Sciences, School of Mathematics, University of Bristol, Bristol, UK. dan.lawson@bristol.ac.uk

Abstract

Increasingly efficient methods for inferring the ancestral origin of genome regions are needed to gain insights into genetic function and history as biobanks grow in scale. Here we describe two near-linear time algorithms for learning ancestry that harness the strengths of the Positional Burrows-Wheeler Transform. SparsePainter is a faster, sparse replacement for previous model-based 'chromosome painting' algorithms that identify recently shared haplotypes, whilst PBWTpaint uses further approximations to obtain very fast estimation optimized for genome-wide relatedness. The computational efficiency gains of these tools for fine-scale local ancestry inference make it possible to analyse large-scale genomic datasets using different approaches. Application to the UK Biobank shows that haplotypes represent ancestries better than principal components, whilst linkage disequilibrium of ancestry identifies signals of recent changes to population-specific selection in many genomic regions associated with immune responses, suggesting avenues for understanding the pathogen-immune system interplay on a historical timescale.
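
As a rough illustration of the Positional Burrows-Wheeler Transform that both tools build on, the sketch below shows the textbook PBWT update: haplotypes are kept sorted by their reversed prefixes, and a divergence array records where each haplotype's current match with its sorted neighbour begins. This is not the SparsePainter or PBWTpaint implementation; the function name `pbwt_order` and the toy haplotypes are illustrative assumptions, and the input is assumed to be biallelic (0/1) with no missing data.

```python
from typing import List, Tuple


def pbwt_order(haplotypes: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Return (order, divergence) after sweeping all sites left to right.

    order[i]      : index of the haplotype at rank i when haplotypes are
                    sorted by their reversed prefixes.
    divergence[i] : first site of the longest match, ending at the last
                    site, between order[i] and order[i - 1]. A value equal
                    to the number of sites means "no match".
    Hypothetical helper for illustration only.
    """
    n_hap = len(haplotypes)
    n_site = len(haplotypes[0]) if n_hap else 0
    order = list(range(n_hap))
    divergence = [0] * n_hap

    for k in range(n_site):
        zeros, ones = [], []          # haplotypes carrying allele 0 / 1 at site k
        div_zeros, div_ones = [], []  # their divergence values
        p = q = k + 1                 # running match starts for each allele bucket
        for hap, start in zip(order, divergence):
            p = max(p, start)
            q = max(q, start)
            if haplotypes[hap][k] == 0:
                zeros.append(hap)
                div_zeros.append(p)
                p = 0
            else:
                ones.append(hap)
                div_ones.append(q)
                q = 0
        # Stable partition: allele-0 haplotypes first, then allele-1.
        order = zeros + ones
        divergence = div_zeros + div_ones
    return order, divergence


if __name__ == "__main__":
    toy = [
        [0, 1, 0, 1],
        [1, 1, 0, 1],
        [0, 1, 0, 1],
        [1, 0, 1, 0],
    ]
    order, divergence = pbwt_order(toy)
    print(order, divergence)
```

Each site is processed in time proportional to the number of haplotypes, so the whole sweep is linear in the size of the haplotype matrix. Haplotypes that end up adjacent in `order` with a small divergence value share a long match ending at the last site, which is the kind of recently shared haplotype a painting method looks for.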

Grants

  1. 202108060092/China Scholarship Council (CSC)
  2. OFIL-20-095/Oak Foundation

MeSH Term

Humans
Haplotypes
Algorithms
Linkage Disequilibrium
Selection, Genetic
Genome, Human
Polymorphism, Single Nucleotide
Immunity
Genetics, Population
Genome-Wide Association Study
