Chromosome-Length Haplotypes with StrandPhaseR and Strand-seq.

Vincent C T Hanlon, David Porubsky, Peter M Lansdorp
Author Information
  1. Vincent C T Hanlon: Terry Fox Laboratory, BC Cancer Agency, Vancouver, BC, Canada. vhanlon@bccrc.ca.
  2. David Porubsky: Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA. porubsky@uw.edu.
  3. Peter M Lansdorp: Terry Fox Laboratory, BC Cancer Agency, Vancouver, BC, Canada.

Abstract

Dense local haplotypes can now readily be extracted from long-read or droplet-based sequence data. However, these methods struggle to combine subchromosomal haplotype blocks into global chromosome-length haplotypes. Strand-seq is a single-cell sequencing technique that uses read orientation to capture sparse global phase information by sequencing only one of the two DNA strands of each parental homolog. In combination with dense local haplotypes from other technologies, Strand-seq data can be used to obtain complete chromosome-length phase information. In this chapter, we run the R package StrandPhaseR to phase SNVs using publicly available sequence data for sample HG005 of the Genome in a Bottle project.
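
As a rough illustration of how read orientation can carry phase information, the toy sketch below assumes a chromosome for which each cell inherited one Watson (W) and one Crick (C) template strand, so that reads on opposite strands come from opposite homologs. It is not StrandPhaseR's algorithm or API: the function name, the input tuples, and the greedy cell-orientation step are all hypothetical simplifications chosen for this example.

```python
from collections import Counter, defaultdict


def phase_by_strand(cells):
    """Toy strand-based phasing (hypothetical helper, not StrandPhaseR).

    cells: one list per cell of (position, strand, allele) tuples, where
    strand is "W" or "C" and allele is the base seen at a heterozygous SNV.
    Returns {position: (haplotype1_allele, haplotype2_allele)}.
    """
    # votes[h][position] counts the alleles assigned to haplotype h
    votes = {1: defaultdict(Counter), 2: defaultdict(Counter)}

    for reads in cells:
        # Which strand carries which homolog differs from cell to cell, so
        # compare this cell with the consensus built so far. Assumption: a
        # simple greedy agreement score is enough for this illustration.
        score = 0
        for pos, strand, allele in reads:
            same = 1 if strand == "W" else 2
            other = 3 - same
            score += votes[same][pos][allele] - votes[other][pos][allele]
        flip = score < 0  # W reads match haplotype 2 better: swap labels

        for pos, strand, allele in reads:
            hap = 1 if strand == "W" else 2
            if flip:
                hap = 3 - hap
            votes[hap][pos][allele] += 1

    # Majority allele per haplotype at every position that was observed
    haplotypes = {}
    for pos in sorted(set(votes[1]) | set(votes[2])):
        top1 = votes[1][pos].most_common(1)
        top2 = votes[2][pos].most_common(1)
        haplotypes[pos] = (
            top1[0][0] if top1 else None,
            top2[0][0] if top2 else None,
        )
    return haplotypes


# Two made-up cells with sparse, partly overlapping SNV coverage
cell_a = [(100, "W", "A"), (100, "C", "G"), (200, "W", "T")]
cell_b = [(200, "W", "C"), (200, "C", "T"), (300, "C", "G"), (300, "W", "A")]
result = phase_by_strand([cell_a, cell_b])
```

Each cell covers only a few SNVs, which is why the per-cell information is sparse. Because the strand assignment holds along the whole chromosome, pooling cells after orienting them consistently links distant positions into one haplotype.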

Grants

  1. PJT-159787/CIHR

MeSH Terms

Haplotypes
Sequence Analysis, DNA
Chromosomes
Genome
High-Throughput Nucleotide Sequencing
Polymorphism, Single Nucleotide
Algorithms
