Inferring B cell phylogenies from paired heavy and light chain BCR sequences with Dowser.

Cole G Jensen, Jacob A Sumner, Steven H Kleinstein, Kenneth B Hoehn
Author Information
  1. Cole G Jensen: Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06511, USA.
  2. Jacob A Sumner: Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06511, USA.
  3. Steven H Kleinstein: Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06511, USA.
  4. Kenneth B Hoehn: Department of Pathology, Yale School of Medicine, New Haven, CT 06520, USA.

Abstract

Antibodies are vital to human immune responses and are composed of genetically variable heavy and light chains. These structures are initially expressed as B cell receptors (BCRs). BCR diversity is shaped through somatic hypermutation and selection during immune responses. This evolutionary process produces B cell clones: groups of cells that descend from a common ancestor but differ by mutations. Phylogenetic trees inferred from BCR sequences can reconstruct the history of mutations within a clone. Until recently, BCR sequencing technologies separated heavy and light chains, but advances in single-cell sequencing now pair heavy and light chains from individual cells. However, it is unclear how these separate genes should be combined to infer B cell phylogenies. In this study, we investigated strategies for using paired heavy and light chain sequences to build phylogenetic trees. We found that incorporating light chains significantly improved tree accuracy and reproducibility across all methods tested. This improvement was greater than the difference between tree-building methods and persisted even when mixing bulk and single-cell sequencing data. However, we also found that many phylogenetic methods estimated significantly biased branch lengths when some light chains were missing, such as when mixing single-cell and bulk BCR data. This bias was eliminated by using maximum likelihood methods with separate branch lengths for heavy and light chain gene partitions. Thus, we recommend using maximum likelihood methods with separate heavy and light chain partitions, especially when mixing data types. We implemented these methods in the R package Dowser: https://dowser.readthedocs.io.
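
The effect of missing light chains on a single, shared branch length can be illustrated with a toy calculation. The sketch below is not Dowser's implementation and does not use maximum likelihood; it only compares a simple per-site distance computed over concatenated heavy and light chains with the same distance computed per partition. The sequences, the names `germline`, `paired_cell`, and `heavy_only_cell`, and the choice of `N` as the missing-data character are all hypothetical, and the example assumes purely for illustration that the heavy chain carries more mutations than the light chain.

```python
MISSING = "N"  # assumed placeholder for sites with no data (e.g. an unsequenced light chain)


def count_differences(seq_a, seq_b):
    """Return (differences, compared_sites), skipping sites missing in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = 0
    compared = 0
    for a, b in zip(seq_a, seq_b):
        if a == MISSING or b == MISSING:
            continue
        compared += 1
        if a != b:
            differences += 1
    return differences, compared


def per_site_distance(seq_a, seq_b):
    """Differences per compared site, or None when no site could be compared."""
    differences, compared = count_differences(seq_a, seq_b)
    return differences / compared if compared else None


def concatenated_distance(cell_a, cell_b):
    """One distance over heavy + light joined together (a single shared 'branch length')."""
    return per_site_distance(cell_a["heavy"] + cell_a["light"],
                             cell_b["heavy"] + cell_b["light"])


def partitioned_distance(cell_a, cell_b):
    """Separate distances for the heavy and light chain partitions."""
    return {chain: per_site_distance(cell_a[chain], cell_b[chain])
            for chain in ("heavy", "light")}


# Hypothetical toy data: both cells carry the same two heavy chain mutations.
germline = {"heavy": "ACGTACGTAC", "light": "GGCCTTAAGG"}
paired_cell = {"heavy": "ATGTACGAAC", "light": "GGCCTTAAGG"}   # single-cell: both chains
heavy_only_cell = {"heavy": "ATGTACGAAC", "light": MISSING * 10}  # bulk: light chain missing

concat_paired = concatenated_distance(germline, paired_cell)
concat_heavy_only = concatenated_distance(germline, heavy_only_cell)
part_paired = partitioned_distance(germline, paired_cell)
part_heavy_only = partitioned_distance(germline, heavy_only_cell)

print("concatenated:", concat_paired, concat_heavy_only)
print("partitioned: ", part_paired, part_heavy_only)
```

In this toy case the two cells have identical heavy chains, yet the concatenated distance differs between them because the heavy-only cell is measured over heavy chain sites alone. The per-partition distances agree on the heavy chain and simply report no estimate for the missing light chain.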

Grants

  1. K99 AI159302/NIAID NIH HHS
  2. R01 AI104739/NIAID NIH HHS
  3. T15 LM007056/NLM NIH HHS
