An improved de novo genome assembly of the common marmoset genome yields improved contiguity and increased mapping rates of sequence data.

Vasanthan Jayakumar, Hiromi Ishii, Misato Seki, Wakako Kumita, Takashi Inoue, Sumitaka Hase, Kengo Sato, Hideyuki Okano, Erika Sasaki, Yasubumi Sakakibara
Author Information
  1. Vasanthan Jayakumar: Department of Biosciences and Informatics, Keio University, Yokohama, Kanagawa, 223-8522, Japan.
  2. Hiromi Ishii: Department of Biosciences and Informatics, Keio University, Yokohama, Kanagawa, 223-8522, Japan.
  3. Misato Seki: Department of Biosciences and Informatics, Keio University, Yokohama, Kanagawa, 223-8522, Japan.
  4. Wakako Kumita: Department of Marmoset Biology and Medicine, Central Institute for Experimental Animals, Kawasaki, Kanagawa, 210-0821, Japan.
  5. Takashi Inoue: Department of Marmoset Biology and Medicine, Central Institute for Experimental Animals, Kawasaki, Kanagawa, 210-0821, Japan.
  6. Sumitaka Hase: Department of Biosciences and Informatics, Keio University, Yokohama, Kanagawa, 223-8522, Japan.
  7. Kengo Sato: Department of Biosciences and Informatics, Keio University, Yokohama, Kanagawa, 223-8522, Japan.
  8. Hideyuki Okano: Department of Physiology, Keio University School of Medicine, Shinjuku, Tokyo, 160-8582, Japan.
  9. Erika Sasaki: Department of Marmoset Biology and Medicine, Central Institute for Experimental Animals, Kawasaki, Kanagawa, 210-0821, Japan.
  10. Yasubumi Sakakibara: Department of Biosciences and Informatics, Keio University, Yokohama, Kanagawa, 223-8522, Japan. yasu@bio.keio.ac.jp.

Abstract

BACKGROUND: The common marmoset (Callithrix jacchus) is one of the most studied primate model organisms. However, the marmoset genomes available in public databases are highly fragmented and contain many sequence gaps, hindering research advances in marmoset genomics and transcriptomics.
RESULTS: Here we use single-molecule, long-read sequence data to improve and update the existing genome assembly and report a near-complete genome of the common marmoset. The assembly is 2.79 Gb in size, with a contig N50 length of 6.37 Mb and a chromosomal scaffold N50 length of 143.91 Mb, making it the most contiguous, highest-quality marmoset genome to date. Approximately 90% of the assembled genome was represented in contigs longer than 1 Mb, an approximately 104-fold improvement in contiguity over the previously published marmoset genome. More than 98% of the gaps from the previously published genomes were filled successfully, which improved the mapping rates of genomic and transcriptomic data onto the assembled genome.
CONCLUSIONS: Altogether, the updated, high-quality common marmoset genome assembly provides improvements at various levels over previous versions of the marmoset genome assembly. This will allow researchers working on primate genomics to use the genome more efficiently with their genomic and transcriptomic sequence data.
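
The contiguity figures quoted above (contig N50, scaffold N50, and the share of the assembly in contigs longer than 1 Mb) follow standard definitions: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The following is a minimal sketch of both statistics, not the authors' pipeline. The function names `n50` and `fraction_in_long_contigs` and the toy contig lengths are hypothetical and chosen only for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are taken from longest to shortest until their summed
    length reaches at least half of the total; the length of the last
    sequence added is the N50.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


def fraction_in_long_contigs(lengths, min_length):
    """Return the fraction of total bases held in sequences longer than min_length."""
    total = sum(lengths)
    return sum(length for length in lengths if length > min_length) / total


# Hypothetical toy contig lengths (arbitrary units), NOT data from the paper.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]

print(n50(toy_contigs))
print(fraction_in_long_contigs(toy_contigs, 45))
```

For a real assembly, the lengths would be read from the contig or scaffold FASTA file, and `min_length` would be set to 1,000,000 to reproduce a "contigs longer than 1 Mb" style statistic.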

Grants

  1. Kakenhi 16H06279 and 18H04127/Japan Society for the Promotion of Science
  2. Innovative areas 221S0002/Ministry of Education, Culture, Sports, Science and Technology
  3. JP19kk0305008/Japan Agency for Medical Research and Development

MeSH Terms

Animals
Callithrix
Chromosome Mapping
Computational Biology
Contig Mapping
Genome
Genomics
High-Throughput Nucleotide Sequencing
Sequence Alignment
