Haplotype-resolved assembly of auto-polyploid genomes via combining Hi-C and gametic data.

Xiaohui Zhang, Dongxi Li, Weihua Pan
Author Information
  1. Xiaohui Zhang: College of Computer Science and Technology, Taiyuan University of Technology, Taiyuan, 030024, Shanxi, China.
  2. Dongxi Li: College of Computer Science and Technology, Taiyuan University of Technology, Taiyuan, 030024, Shanxi, China. dxli0426@126.com.
  3. Weihua Pan: Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China. panweihua@caas.cn.

Abstract

Haplotype-resolved genome assembly plays a crucial role in understanding allele-specific functions. However, obtaining haplotype-resolved assemblies for auto-polyploid genomes remains challenging. Existing methods can be classified into reference-based phasing, assembly-based phasing, and gamete binning. Nevertheless, cost-effective and efficient methods for haplotyping auto-polyploid genomes are still lacking. In this study, we propose a novel phasing algorithm called PolyGH, which combines Hi-C and gametic data. The method consists of three steps, and we evaluated it on tetraploid potato cultivars. First, gametic data are used to bin non-collapsed contigs, and adjacent fragments of the same type within the same contig are merged. Second, accurate Hi-C signals related to differential genomic regions are acquired using unique k-mers. Finally, collapsed fragments are assigned to haplotigs based on combined Hi-C and gametic signals. Compared with methods based on Hi-C data alone or gametic data alone, PolyGH exhibited superior performance in haplotyping auto-polyploid genomes. This approach has the potential to enhance haplotype-resolved assembly for auto-polyploid genomes.
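
Two of the operations named in the abstract, merging adjacent fragments of the same type within a contig and selecting unique k-mers, can be illustrated with a minimal Python sketch. This is not the PolyGH implementation: the function names `merge_adjacent` and `unique_kmers`, the `(start, end, label)` fragment layout, and the labels "collapsed" and "non-collapsed" are assumptions made for illustration, and the k-mer counting ignores reverse complements for brevity.

```python
from collections import Counter
from itertools import groupby


def merge_adjacent(fragments):
    """Merge runs of neighbouring fragments that share a label.

    `fragments` is a list of (start, end, label) tuples for ONE contig,
    already sorted by start coordinate (hypothetical layout).
    """
    merged = []
    # groupby only groups consecutive items, so a label that reappears
    # later in the contig starts a new merged fragment.
    for label, run in groupby(fragments, key=lambda frag: frag[2]):
        run = list(run)
        merged.append((run[0][0], run[-1][1], label))
    return merged


def unique_kmers(sequences, k):
    """Return the k-mers that occur exactly once across all sequences.

    Simplification: strands are not canonicalised, so a k-mer and its
    reverse complement are counted separately.
    """
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return {kmer for kmer, n in counts.items() if n == 1}


if __name__ == "__main__":
    contig = [
        (0, 100, "collapsed"),
        (100, 200, "collapsed"),
        (200, 300, "non-collapsed"),
    ]
    print(merge_adjacent(contig))
    print(sorted(unique_kmers(["ACGTA", "ACGTT"], 4)))
```

In this toy setting, k-mers that occur only once mark positions where sequences differ, which is the intuition behind using unique k-mers to keep Hi-C signals that fall in differential regions.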

Grants

  1. Grant No. 32100501/National Natural Science Foundation of China
  2. Grant No. RCBS20210609103819020/Shenzhen Science and Technology Program

MeSH Terms

Humans
Sequence Analysis, DNA
Haplotypes
Polyploidy
Alleles
Germ Cells
