Sorghum Genome

Sorghum is a drought-tolerant C4 grass grown for grain, forage, sugar, and lignocellulosic biomass. It serves as a genetic model for C4 grasses because of its relatively small genome (approximately 740 Mbp), diploid genetics, diverse germplasm, and colinearity with other C4 grass genomes. We have collected sorghum-related information from the Internet, with links to the original websites of these resources. Here you can find general information and handbooks about sorghum, available genome and transcriptome databases, research institutions around the world, and sorghum producers. We hope these resources will help you in your research.

Sorghum Genome

The homozygous genotype BTx623 of Sorghum bicolor was sequenced in 2009 by Paterson et al. In the following years, new sequence data were added, new assemblies were built, and new annotations were made. The current version (v3.1), which revealed new functional gene copies and improved the accuracy of repetitive regions (McCormick et al., 2018), is used as the reference genome in our database.
Key facts about the sorghum reference genome:

  • Genome size: ~732.2 Mb
  • Number of chromosomes: 10 (2n=20)
  • Number of protein-coding genes: ~34,211

In 2018, Oxford Nanopore reads generated on a MinION sequencer were combined with Bionano Genomics Direct Label and Stain (DLS) optical maps to produce a chromosome-scale de novo assembly of the repeat-rich Sorghum bicolor Tx430 genome. The final assembly consists of 29 scaffolds, most of which span entire chromosome arms. It has a scaffold N50 of 33.28 Mbp and covers 90% of the expected genome length. Aligning the assembly against Illumina Tx430 data gave a sequence accuracy of 99.85%, and 99.6% of the 34,211 public gene models align to the assembly.
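For readers unfamiliar with the scaffold N50 statistic quoted above, the following is a minimal sketch of how it is commonly computed: sort the scaffolds from longest to shortest and report the length of the scaffold at which the running total first reaches half of the assembly size. The function name `n50` and the example lengths are hypothetical and chosen only for illustration; they are not the Tx430 scaffold lengths.

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    account for at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence to the shortest.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first sequence that brings the running total to >= 50%.
        if 2 * running >= total:
            return length


# Hypothetical scaffold lengths in Mbp (illustrative only, not real data).
example_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(example_scaffolds))  # prints 70
```

In this toy case the total is 300 Mbp, and the two longest scaffolds (80 + 70 = 150 Mbp) already reach half of it, so the N50 is 70 Mbp.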
In 2019, Cooper et al. presented a new reference genome based on an archetypal sweet sorghum line, Rio, and compared it with the current grain sorghum reference, BTx623. The comparison revealed a high rate of nonsynonymous and potential loss-of-function mutations but few changes in gene content or overall genome structure. The chromosome-level assembly of the Rio genome comprised 729.4 Mb, which is 99.6% of the size of the BTx623 genome. The amount of repetitive DNA relative to gene content was nearly identical, with 35,467 genes identified in Rio versus 34,129 in BTx623.
In addition, dozens of sorghum accessions, including both grain sorghum and sweet sorghum, have been resequenced, for example:

  • 2011 - Zheng et al.: 3 sorghum lines
  • 2013 - Mace et al.: 44 sorghum lines
  • 2018 - Zhang et al.: 241 sorghum lines

We identified SNPs in these sorghum accessions using a computational pipeline and used them to construct this database.