Documents - MaizeOmics (A comprehensive multidimensional omics resource for maize (Zea mays))

1. Framework

MaizeOmics is an integrated database that provides comprehensive knowledge and analysis tools for maize multi-omics. We generated de novo genome assemblies for 141 maize accessions, identified approximately 1.253 million large-scale structural variations (SVs), and constructed a maize pan-genome. Additionally, we collected 3,367 maize germplasms and generated around 56 million SNPs and INDELs from them. To support functional and comparative analyses, MaizeOmics also includes transcriptomic datasets from 42 tissue-stage samples of the Mo17 accession and 192 tissue-stage samples of the B73 accession, as well as approximately 35,000 phenotypic records covering around 2,400 maize traits collected across different planting years and locations. The database is organized into six major modules: Genomics, Transcriptomic, Variants, Epigenomic, Phenomic and Tools. In addition to browsing and retrieving data, MaizeOmics provides a series of online analytical tools, including BLAST search (BLAST), GWAS analysis (easyGWAS), sequence extraction (SeqFetch), enrichment analysis (GO/KEGG), batch PCR primer design (PrimerServer), pan-genome haplotype analysis (HapView), population-genetic haplotype block analysis (HapSnap), CRISPR off-target identification (CRIOff), and a genome browser (JBrowse).

Figure 1 Framework of MaizeOmics

2. Search

The "Search" bar on the homepage provides both a global search across the entire database and a directional search by input type. The following query formats are supported:
   Region: input in the format "chromosome:start-end".
   Genes/Symbol: complete/incomplete gene IDs or gene symbols.
   Variation: input in the format "chromosome:position" or a variation ID.
   Accession: accession ID, English name, or Chinese name.
   Phenotype: abbreviation or full title in English/Chinese.
   Function: incomplete gene function description.
The Search function provides highly integrated results, enabling users to quickly locate relevant information in the database and navigate to detailed pages for further exploration.
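
The coordinate formats listed above can be told apart mechanically. The following is a minimal sketch of such a classifier, not the site's actual implementation: the regular expressions, the function name `classify_query`, and the returned dictionary layout are all assumptions made for illustration.

```python
import re

# Hypothetical patterns for the two coordinate formats described above.
# The real site may accept other chromosome names or separators.
REGION_RE = re.compile(r"^(\w+):(\d+)-(\d+)$")
POSITION_RE = re.compile(r"^(\w+):(\d+)$")


def classify_query(query: str) -> dict:
    """Classify a search string as a region, a variation position, or free text."""
    text = query.strip()
    match = REGION_RE.match(text)
    if match:
        chrom = match.group(1)
        start, end = int(match.group(2)), int(match.group(3))
        if start > end:
            raise ValueError(f"start {start} is greater than end {end}")
        return {"type": "region", "chrom": chrom, "start": start, "end": end}
    match = POSITION_RE.match(text)
    if match:
        return {"type": "variation", "chrom": match.group(1), "pos": int(match.group(2))}
    # Gene IDs, accession names, phenotypes and function terms stay as text.
    return {"type": "text", "value": text}
```

A query such as "chr1:1000-2000" would be classified as a region, "chr1:1000" as a variation position, and anything else would fall through to a text search.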

Figure 2 Search

3. Genomics module

3.1. Genome

The Genome module presents genomic information for 141 maize accessions with de novo assembled genomes. For each accession, users can access phylogeny, gene annotation, accession information, de novo assembly statistics and gene details.

Figure 3.1 Accessions

3.2. Accession details

Each accession has an individual page containing detailed information and, where available, a genome view. This page provides gene number statistics, links to JBrowse, and associated phenotypic information. Clicking on the gene number redirects users to the gene search panel.

Figure 3.2 Accession details

3.3. Gene search panel

In MaizeOmics, mRNA annotations are available for all de novo assembled genomes. Users can filter genes by assembly name, gene type, and genomic region, or search directly for a gene of interest by gene ID (published or re-annotated) or by gene function. Each gene in the search results links to its corresponding detail page.

Figure 3.3 Gene search

3.4. Gene detail

A dedicated detail page is provided for each gene. The page has seven parts: basic summary, molecular sequences, gene function, syntelogs, orthologs in A. thaliana and O. sativa, genome variation, and expression of the syntelog in B73v4. This layout allows users to quickly navigate to specific information of interest. From the gene detail page, users can further explore associated variations, homologous groups, and expression patterns.

Figure 3.4 Gene details

3.5. PanGenomes

The PanGenomes section displays the distribution of core genes within the syntenic pan-genome and the number of genes assigned to the pan-genome for each accession. For each pan-gene, MaizeOmics provides the haplotype distribution, haplotype sequences, multiple sequence alignment results, sequence-based phylogenetic trees, pan-gene variations, homologs in Arabidopsis and rice, and their corresponding gene functions.

Figure 3.5 PanGenomes

4. Transcriptomic module

4.1. Transcriptome in tissues

The Transcriptomic module contains three gene expression datasets. The first dataset includes expression profiles across 42 tissues from different developmental stages and organs in the Mo17 accession. Users can interactively query this dataset and visualize expression levels for sets of Mo17 genes defined by genomic region, functional category, pan-genome group, or custom lists through box plots, heatmaps, and line charts.

Figure 4.1 Transcriptome in tissues

4.2. Transcriptome in seed-organs

The second dataset includes expression data from 192 seed-organ samples of B73. When a gene is queried in this module, a heatmap showing the FPKM values of orthologous genes across tissue stages is displayed. Users can also search using gene function terms through the "Gene Function" filter. Additionally, this module provides direct navigation to the corresponding gene detail page.
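
For readers unfamiliar with the unit, FPKM (fragments per kilobase of transcript per million mapped fragments) normalizes a raw fragment count by gene length and sequencing depth. The sketch below shows the standard definition only; it is not necessarily the pipeline used to produce this dataset, and the function and argument names are illustrative.

```python
def fpkm(fragments: int, gene_length_bp: int, total_mapped_fragments: int) -> float:
    """Standard FPKM: fragments per kilobase of gene per million mapped fragments."""
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and total mapped fragments must be positive")
    # 1e9 combines the per-kilobase (1e3) and per-million (1e6) scaling factors.
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)
```

For example, 100 fragments on a 1,000 bp gene in a library of one million mapped fragments corresponds to an FPKM of 100.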

Figure 4.2 Transcriptome in seed-organs

4.3. Transcriptome of ScRNA

The third dataset is a single-cell transcriptomic dataset, capturing cellular heterogeneity and gene expression profiles in early maize endosperm tissues across multiple cell types.

Figure 4.3 Transcriptome in single cell RNA-seq

5. Variome module

The Variome module organizes the SNPs and INDELs of the 3,367 maize accessions and the SVs from 141 maize accessions. The main Variome page contains three parts: B73v4 (SNP+INDEL), Mo17v2 (SNP+INDEL) and Mo17v2 (SV).

5.1 SNP+INDEL (B73v4+Mo17v2)

For the population of 3,367 accessions based on the B73v4 reference genome, the database provides summary statistics on population variation, including variant types, genomic positions, and variant effects. Users can select SNPs within a specified genomic region by defining filter parameters such as genomic interval, frequency, and variant effect. In addition, 16 phenotypic traits are available for haplotype analysis, and selective sweep analysis can be performed between any two of 19 subgroups. A similar interface is also provided for the 141-accession population based on the Mo17v2 reference genome, including comparable variation statistics, filtering options, haplotype analysis, and selective sweep analysis features.

Figure 5.1 SNP statistics

5.2 SV (Mo17v2)

For the dataset of 141 accessions aligned to the Mo17v2 reference genome, MaizeOmics presents summary statistics for structural variations within the population, including variant types (INS/DEL/complex SV) and their genomic positions. Haplotypes defined by structural variations within a specified genomic region can also be visualized.

Figure 5.2 SV statistics

6. Epigenome

The Epigenetic module provides 18 chromatin marks, enabling high-resolution mapping of histone modifications in maize B73 seedlings, immature ears, and embryos. Users can select different tissues and modification types to visualize epigenetic modification regions and their varying abundances in the genome browser.

Figure 6 Epigenome

7. Phenome module

In the Phenome module, 16 phenotypes are grouped into 3 trait categories. Phenotype records were obtained from Hebei and Hainan during the spring and summer planting seasons of 2020. Users can quickly select phenotypes through interactive filters. These phenotype records are summarized by qualitative tags or quantitative value ranges, and accessions with phenotypic values are further grouped by sample subgroup and country.

Figure 7 Phenome

8. Online tools

8.1 BLAST

Genome, CDS, and protein sequences from 28 accessions are provided for BLAST analysis. Users can submit up to 20 query sequences. User-uploaded sequences are also accepted as the BLAST subject. Advanced parameters are available for customized analyses.
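
Before submitting, it can be convenient to check locally that a FASTA file stays within the 20-sequence query limit. The following is a minimal sketch under that single assumption; the helper names `split_fasta` and `check_query_count` are hypothetical and not part of MaizeOmics.

```python
MAX_QUERIES = 20  # query limit stated in the BLAST description


def split_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_query_count(text: str) -> int:
    """Return the number of query sequences, or raise if the limit is exceeded."""
    count = len(split_fasta(text))
    if count > MAX_QUERIES:
        raise ValueError(f"{count} sequences submitted; the limit is {MAX_QUERIES}")
    return count
```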

Figure 8.1.1 BLAST submission

BLAST results, including genomic regions and genes, are linked to the corresponding elements in the database. All BLAST output formats are available for visualization and download.

Figure 8.1.2 BLAST result

8.2 SeqFetch

SeqFetch enables users to retrieve sequences, including mRNA, CDS, and protein sequences, as well as the sequence of any genomic region.

Figure 8.2 SeqFetch

8.3 GO Enrichment

Gene Ontology enrichment analysis can be performed using a user-provided list of gene IDs. Users may also select predefined gene sets for the analysis.
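
Enrichment analyses of this kind are commonly based on a one-sided hypergeometric test. The documentation does not state which test MaizeOmics uses, so the sketch below is only an illustration of that common approach, with hypothetical argument names and no multiple-testing correction.

```python
from math import comb


def hypergeom_pvalue(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for a hypergeometric draw.

    k: genes in the user list annotated with the term
    n: size of the user gene list
    K: genes in the background annotated with the term
    N: size of the background gene set
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= n):
        raise ValueError("inconsistent counts")
    upper = min(n, K)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible configurations contribute nothing to the sum.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)
```

A small p-value suggests that the term is over-represented in the submitted gene list relative to the background.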

Figure 8.3 GO

8.4 KEGG Enrichment

KEGG pathway enrichment analysis is available for user-provided sets of gene IDs. Users may also select gene sets for the analysis.

Figure 8.4 KEGG

8.5 CRIOff

CRIOff identifies potential off-target sites of CRISPR/Cas-derived RNA-guided endonucleases (RGENs) and predicts off-target activities. It employs Cas-OFFinder to find the off-target sites and CRISPR-Net to quantify the activity of off-targets containing indels and mismatches.
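
To make the idea of an off-target site concrete, the sketch below scans the forward strand of a sequence for guide-like sites followed by an NGG PAM and counts mismatches. It is a deliberately naive illustration: it does not reproduce how Cas-OFFinder or CRISPR-Net work, it ignores the reverse strand and indels, and the function name, PAM choice, and default mismatch limit are assumptions.

```python
def find_off_targets(genome: str, guide: str, max_mismatches: int = 3) -> list:
    """Return (start, site, pam, mismatches) for forward-strand NGG sites."""
    genome, guide = genome.upper(), guide.upper()
    size = len(guide)
    hits = []
    # Each candidate needs room for the protospacer plus a 3 bp PAM.
    for start in range(len(genome) - size - 3 + 1):
        site = genome[start:start + size]
        pam = genome[start + size:start + size + 3]
        if pam[1:] != "GG":  # NGG PAM: any base followed by GG
            continue
        mismatches = sum(1 for a, b in zip(guide, site) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start, site, pam, mismatches))
    return hits
```

Positions in the result are 0-based offsets into the scanned sequence.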

Figure 8.5 CRIOff

8.6 PrimerServer

PrimerServer allows batch design of PCR primers and specificity checking. Users can perform primer design and specificity checking together or separately, and results can be visualized via the tabs at the top.

Figure 8.6 PrimerServer

8.7 easyGWAS

easyGWAS provides three genotype datasets, including SNP data derived from whole-genome sequencing (WGS) of 3,367 accessions. Thresholds for minor allele frequency (MAF) and missing rate can be set for genotype quality control. Users can choose EMMAX, GEMMA, or PLINK for the GWAS calculation. Additional parameters, such as the PCA option and threshold settings, are also available.
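
The two quality-control statistics can be computed per site as follows. This is a minimal sketch assuming biallelic sites coded as alternate-allele counts (0, 1, 2) with `None` for missing calls; the default thresholds are placeholders, not the values used by easyGWAS.

```python
def site_stats(genotypes: list) -> tuple:
    """Return (maf, missing_rate) for one biallelic site.

    genotypes holds alternate-allele counts (0, 1 or 2) per accession,
    with None marking a missing call.
    """
    if not genotypes:
        raise ValueError("no genotypes supplied")
    called = [g for g in genotypes if g is not None]
    missing_rate = 1 - len(called) / len(genotypes)
    if not called:
        return 0.0, missing_rate
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq), missing_rate


def passes_qc(genotypes: list, min_maf: float = 0.05, max_missing: float = 0.2) -> bool:
    """Keep a site only if it meets both (hypothetical) thresholds."""
    maf, missing_rate = site_stats(genotypes)
    return maf >= min_maf and missing_rate <= max_missing
```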

Figure 8.7.1 easyGWAS submission

The GWAS result page summarizes all selected parameters and shows the Manhattan plot and QQ-plot, with significant loci highlighted above the chosen threshold. The complete GWAS results and figures are available for download.

Figure 8.7.2 easyGWAS result

8.8 HapSnap (SNP)

HapSnap supports haplotype analysis based on SNP data from 3,367 maize accessions and 16 phenotypic traits. First, users define the target region either by “chromosome:start-end” or by “gene ID” + “up/down-stream length”. Second, variation types can be freely combined to construct haplotypes. Third, filters for missing rate, minor allele frequency, and heterozygosity rate can be applied. Once haplotypes are defined, haplotype frequency, haplotype vs. genotype comparisons, and linkage disequilibrium are calculated.
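
As an illustration of the linkage disequilibrium step, the sketch below computes the widely used r² statistic between two biallelic sites. The documentation does not say which LD measure HapSnap reports, so r² is an assumption here, as is the input encoding (one 0/1 allele per haplotype, which is reasonable for largely homozygous inbred lines).

```python
def ld_r2(site_a: list, site_b: list) -> float:
    """r^2 between two biallelic sites, given one 0/1 allele per haplotype."""
    if len(site_a) != len(site_b) or not site_a:
        raise ValueError("sites must be non-empty and of equal length")
    n = len(site_a)
    p_a = sum(site_a) / n
    p_b = sum(site_b) / n
    p_ab = sum(1 for a, b in zip(site_a, site_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a monomorphic site has undefined r^2; reported as 0 here
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    return d * d / denom
```

Values near 1 indicate that the two sites are almost always inherited together, and values near 0 indicate independence.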

Figure 8.8 HapSnap

8.9 HapView (pan-gene)

HapView supports haplotype analysis based on pan-gene data across multiple accessions. Users first select a target assembly as the reference background, and then provide genes of interest either by entering gene IDs directly or by uploading a TXT file. Next, the system automatically groups sequences into haplotypes for each gene across different accessions. Haplotype distributions are visualized as heatmaps, and detailed information, including haplotype type and sequence, is presented in a result table. Users may also navigate to the corresponding gene or syntenic group (SG) pages for further exploration.
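
One simple way to group sequences into haplotypes is to treat accessions that carry an identical sequence for a gene as sharing a haplotype. The sketch below does exactly that; whether HapView uses exact identity or a more tolerant clustering is not stated, and the function name, the "Hap1, Hap2, ..." labels, and the example accession "Acc3" are hypothetical.

```python
from collections import defaultdict


def group_haplotypes(seq_by_accession: dict) -> dict:
    """Group accessions that share an identical (case-insensitive) sequence."""
    groups = defaultdict(list)
    for accession, seq in seq_by_accession.items():
        groups[seq.upper()].append(accession)
    # Most frequent haplotype first; ties are broken by sequence for determinism.
    ordered = sorted(groups.items(), key=lambda item: (-len(item[1]), item[0]))
    return {
        f"Hap{index}": {"sequence": seq, "accessions": sorted(members)}
        for index, (seq, members) in enumerate(ordered, start=1)
    }


# Illustrative input: B73 and the hypothetical Acc3 share one sequence.
example = group_haplotypes({"B73": "ATGC", "Mo17": "ATGA", "Acc3": "atgc"})
```

In this example, `example["Hap1"]` holds the sequence shared by Acc3 and B73, and `example["Hap2"]` holds the sequence found only in Mo17.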

Figure 8.9 HapView

8.10 JBrowse

The genome browser is implemented using JBrowse, with Mo17 as the reference genome. Multiple tracks are available for visualization, including gene annotation, GC content, domestication regions, QTLs, variations, and other omics data such as methylation.

Figure 8.10 JBrowse

9. Download

The Download section provides access to datasets for 141 maize accessions, including 90 newly assembled genomes and 51 previously published genomes. Available data include genome assemblies, mRNA, CDS, and protein sequences, as well as genomic annotation files.

Figure 9 Download