A comprehensive evaluation of the potential of three next-generation short-read-based plant pan-genome construction strategies for the identification of novel non-reference sequence.

Meiye Jiang, Meili Chen, Jingyao Zeng, Zhenglin Du, Jingfa Xiao
Author Information
  1. Meiye Jiang: National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, China.
  2. Meili Chen: National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, China.
  3. Jingyao Zeng: National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, China.
  4. Zhenglin Du: National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, China.
  5. Jingfa Xiao: National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, China.

Abstract

Pan-genome studies, which aim to capture the full genomic diversity of a species, are important for understanding plant evolution and guiding crop breeding. Three short-read-based strategies are used for plant pan-genome construction: iterative individual, iterative pooling, and map-to-pan. Their performance differs markedly across conditions, yet no comprehensive evaluation has been conducted to date. Here, we evaluate the performance of these three strategies for plants under different sequencing depths and sample sizes. We also assess how the length and repeat content percentage of novel sequences influence the three strategies, and we compare their computational resource consumption. Our findings indicate that map-to-pan has the greatest recall but the lowest precision. In contrast, the two iterative strategies have higher precision but lower recall. Sample number, novel sequence length, and the percentage of repeat content in novel sequences adversely affect the performance of all three strategies. Increased sequencing depth improves map-to-pan's performance but does not affect the two iterative strategies. Map-to-pan demands considerably more computational resources than the two iterative strategies. Overall, the iterative strategies, especially iterative pooling, are optimal when the sequencing depth is below 20X, whereas map-to-pan is preferable when the sequencing depth exceeds 20X, despite its higher computational resource consumption.
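
The recall and precision trade-off described in the abstract can be illustrated with a minimal sketch. It uses the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN)) and assumes that detected and true novel sequences can be compared as sets of identifiers. The function name `precision_recall` and the toy identifiers are hypothetical, and the matching criteria actually used in the study are not given in this abstract.

```python
def precision_recall(predicted, truth):
    """Return (precision, recall) for predicted vs. true novel sequences.

    Both arguments are collections of sequence identifiers. Treating a
    match as exact identifier equality is an assumption of this sketch.
    """
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    # Guard against empty inputs to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical toy data, not results from the study.
truth = {"ins1", "ins2", "ins3", "ins4"}
# A permissive caller finds every true sequence but adds false positives.
permissive = {"ins1", "ins2", "ins3", "ins4", "fp1", "fp2", "fp3", "fp4"}
# A conservative caller reports only correct sequences but misses half.
conservative = {"ins1", "ins2"}

print(precision_recall(permissive, truth))
print(precision_recall(conservative, truth))
```

In this toy case the permissive set has high recall and low precision, the pattern the abstract reports for map-to-pan, while the conservative set shows the opposite pattern reported for the iterative strategies.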
