Novel Megaptera novaeangliae (Humpback whale) haplotype chromosome-level reference genome.

Maria-Vittoria Carminati, Vlonjat Lonnie Gashi, Ruiqi Li, Daniel Jacob Klee, Sara Rose Padula, Ajay Manish Patel, Andy Dick Yee Tan, Jacqueline Mattos, Nolan Kane
Author Information
  1. Maria-Vittoria Carminati: Department of Ecology and Evolutionary Biology, University of Colorado Boulder, Boulder, CO, USA.
  2. Vlonjat Lonnie Gashi: Department of Ecology and Evolutionary Biology, University of Colorado Boulder, Boulder, CO, USA.
  3. Ruiqi Li: Department of Ecology and Evolutionary Biology, University of Colorado Boulder, Boulder, CO, USA.
  4. Daniel Jacob Klee: Department of Ecology and Evolutionary Biology, University of Colorado Boulder, Boulder, CO, USA.
  5. Sara Rose Padula: Department of Ecology and Evolutionary Biology, University of Colorado Boulder, Boulder, CO, USA.
  6. Ajay Manish Patel: Department of Ecology and Evolutionary Biology, University of Colorado Boulder, Boulder, CO, USA.
  7. Andy Dick Yee Tan: Department of Ecology and Evolutionary Biology, University of Colorado Boulder, Boulder, CO, USA.
  8. Jacqueline Mattos: Department of Ecology and Evolutionary Biology, University of Colorado Boulder, Boulder, CO, USA.
  9. Nolan Kane: Department of Ecology and Evolutionary Biology, University of Colorado Boulder, Boulder, CO, USA.


Sequencing of a kidney sample (KW2013002) from a stranded Megaptera novaeangliae (Humpback whale) calf has produced the first chromosome-level reference genome for this species. The calf, a 457 cm, 2,500 lb male, was found stranded in Hawai'i Kai, HI, in 2013 and was marked as abandoned/orphaned. In 2023, 1 g of kidney tissue was used for PacBio long-read DNA sequencing, chromatin conformation capture (Hi-C), RNA sequencing, and mitochondrial sequencing to comprehensively characterize the genome and transcriptome of M. novaeangliae. Data validation includes a synteny analysis, mitochondrial annotation, and two comparisons of BUSCO scores: scaffold-level assembly v. reference genome, and Balaenoptera musculus (Blue whale) v. M. novaeangliae. BUSCO analysis was performed on an M. novaeangliae scaffold-level assembly to assess the genomic completeness of the reference genome: the scaffold scored 91.2%, versus 95.4% for the reference genome. Synteny analysis used the B. musculus genome as the comparison to determine chromosome-level coverage and structure. Further, a time-based phylogenetic tree was constructed using the sequenced data and publicly available genomes.
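
The BUSCO percentages quoted above are completeness scores: the share of searched single-copy orthologs recovered as complete, whether single-copy or duplicated. The sketch below illustrates that arithmetic only. The category counts are hypothetical, picked so the two percentages from the abstract come out, and the 95.4% figure is read here as belonging to the chromosome-level reference genome. The abstract does not report the actual counts or the lineage dataset used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuscoCounts:
    """Category counts from a BUSCO summary (all values used below are hypothetical)."""
    single: int      # complete, single-copy
    duplicated: int  # complete, duplicated
    fragmented: int
    missing: int

    @property
    def total(self) -> int:
        # Every searched ortholog falls into exactly one category.
        return self.single + self.duplicated + self.fragmented + self.missing

    @property
    def completeness(self) -> float:
        """Percentage of searched orthologs found complete (single + duplicated)."""
        return round(100 * (self.single + self.duplicated) / self.total, 1)


# Hypothetical counts out of 1,000 orthologs, chosen only to match the reported percentages.
scaffold = BuscoCounts(single=900, duplicated=12, fragmented=30, missing=58)
reference = BuscoCounts(single=940, duplicated=14, fragmented=16, missing=30)

# Difference in completeness, expressed in percentage points.
gain = round(reference.completeness - scaffold.completeness, 1)
print(f"scaffold: {scaffold.completeness}%  reference: {reference.completeness}%  gain: {gain} points")
```

Under these assumed counts, the reference genome recovers 4.2 percentage points more complete orthologs than the scaffold-level assembly.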



MeSH Term

Humpback Whale
Sequence Analysis, DNA
