In Use Case 1, we demonstrate a case study (Niu, J., Wang, W., Wang, Z. et al. Tagging large CNV blocks in wheat boosts digitalization of germplasm resources by ultra-low-coverage sequencing. Genome Biol 25, 171 (2024). https://doi.org/10.1186/s13059-024-03315-6) in which GWH data, together with other wheat genome sequences, were used to develop a cost-effective genotyping approach. The approach uses large copy number variation blocks (CNVbs) to digitalize and manage wheat germplasm resources with ultra-low-coverage sequencing, providing a platform for variety identification and modern breeding applications. The authors used previously published de novo assembled wheat genomes, including GWHANRF00000000, to construct a pan-genome reference by iteratively mapping whole-genome resequencing data against 16 assembled reference genomes. This allowed them to identify novel genomic regions absent from the Chinese Spring reference genome and thus build a more representative pan-genome. These reference sequences were essential for tagging CNVbs across wheat accessions, enabling accurate profiling even at low sequencing coverage. Here is a detailed step-by-step guide to making the most of GWH wheat genome sequences:
Step 1: Download wheat reference genomes and collect additional genome assemblies, such as GWHANRF00000000, from GWH and Ensembl Plants. Wheat genomes from NCBI GenBank/RefSeq can now also be downloaded directly from GWH.
Figure 1. GWH advanced search interface for searching all user-submitted and integrated wheat genome assemblies.
Figure 2. Search result of all wheat genome assemblies stored in GWH.
Figure 3. Batch download of all genomic sequences of the result wheat genomes.
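Before choosing which assemblies to keep in Step 2, it can help to summarize each downloaded FASTA file. The following Python sketch computes the number of sequences, total length, and N50 from FASTA lines. It assumes plain or gzip-compressed FASTA input, and the function names are illustrative rather than part of any GWH tool. N50 is only one rough indicator of assembly contiguity, not a complete quality assessment.

```python
import gzip
from typing import Dict, Iterable, List


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of each sequence in FASTA-formatted lines."""
    lengths: List[int] = []
    current = None  # stays None until the first header line is seen
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length of the shortest sequence among the longest sequences
    that together cover at least half of the total length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def summarize(lengths: List[int]) -> Dict[str, int]:
    return {"sequences": len(lengths), "total_bp": sum(lengths), "n50": n50(lengths)}


def summarize_file(path: str) -> Dict[str, int]:
    """Summarize a FASTA file, decompressing on the fly if the name ends in .gz."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return summarize(fasta_lengths(handle))
```

Calling `summarize_file` on each downloaded assembly gives a quick table for comparing contiguity before deciding which genomes to include in the pan-genome.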
Step 2: Select high-quality wheat genomes from those downloaded from GWH and other sources to construct a wheat pan-genome. The construction process is as follows: (1) select 'Chinese Spring' (GCF_018294505.1) as the reference genome and the other genomes as query genomes; (2) identify sequences that are present in a query genome but missing from the reference, and add them to the reference as supplementary sequences; (3) repeat this alignment-based identification of new genomic blocks iteratively across the query genomes to obtain the final wheat pan-genome.
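The iterative procedure above can be sketched in Python under simplifying assumptions. Alignment is delegated to a user-supplied function (`align`, hypothetical; in practice it might wrap a whole-genome aligner and parse its output into covered intervals), and the default `min_len` cutoff for keeping an unaligned region is an illustrative value, not one taken from the study. The sketch shows only the bookkeeping: merge each query contig's aligned intervals, extract the uncovered gaps, and append them to the growing pan-genome so that later queries are compared against it.

```python
from typing import Callable, Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), 0-based


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def uncovered_regions(seq_len: int, covered: List[Interval], min_len: int) -> List[Interval]:
    """Gaps in alignment coverage that are at least min_len bp long."""
    gaps: List[Interval] = []
    pos = 0
    for start, end in merge_intervals(covered):
        if start - pos >= min_len:
            gaps.append((pos, start))
        pos = max(pos, end)
    if seq_len - pos >= min_len:
        gaps.append((pos, seq_len))
    return gaps


# Hypothetical aligner: given query contigs and the current pan-genome,
# return {contig_name: [intervals of that contig covered by alignments]}.
Aligner = Callable[[Dict[str, str], Dict[str, str]], Dict[str, List[Interval]]]


def build_pangenome(reference: Dict[str, str],
                    queries: List[Tuple[str, Dict[str, str]]],
                    align: Aligner,
                    min_len: int = 10_000) -> Dict[str, str]:
    """Iteratively append query sequence absent from the growing pan-genome."""
    pangenome = dict(reference)  # copy so the reference dict is not modified
    for genome_name, contigs in queries:
        coverage = align(contigs, pangenome)  # aligned against the augmented set
        for contig, seq in contigs.items():
            for start, end in uncovered_regions(len(seq), coverage.get(contig, []), min_len):
                pangenome[f"{genome_name}:{contig}:{start}-{end}"] = seq[start:end]
    return pangenome
```

Because each query is aligned against the already-augmented pan-genome, a region shared by several query genomes should be added only once, from the first genome that contains it.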
The constructed pan-genome allows structural variations to be identified across accessions, so that diversity beyond a single reference genome is represented. In the subsequent steps, the study identified and refined copy number variations (CNVs) in wheat by mapping whole-genome resequencing data against this pan-genome, which included sequences from GWH and other sources. CNVs were detected in 100-kb windows and refined into genetic markers by reducing noise with Hidden Markov Models (HMMs), grouping linked CNVs, and merging overlapping CNV blocks. These CNV markers were linked to traits such as disease resistance and yield by associating markers with beneficial alleles, and were validated with traditional methods such as PCR. The authors also developed the WheatCNVb platform, an interactive database that visualizes each wheat variety's CNV "fingerprint" in a QR-code-like format. A key feature was the use of ultra-low-coverage sequencing, which recalled markers with 99.3% accuracy, making CNV analysis cost-effective and reliable. This work demonstrates a valuable application of GWH genome sequence data, with the potential to advance wheat breeding programs and trait discovery.
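The windowing and HMM smoothing steps can be illustrated with a small Python sketch. It assumes read counts have already been tallied per 100-kb window for a sample and for a reference profile, converts them to log2 depth ratios, and decodes a three-state (loss/normal/gain) HMM with the Viterbi algorithm. The Gaussian emission means, standard deviation, and transition probability are hypothetical values chosen for illustration; the study's actual HMM configuration, its grouping of linked CNVs, and its block merging are not reproduced here.

```python
import math
from typing import List, Tuple

WINDOW = 100_000                       # 100-kb windows, as in the text
STATES = ("loss", "normal", "gain")
MEANS = (-1.0, 0.0, math.log2(1.5))    # assumed log2 depth-ratio mean per state
SD = 0.3                               # assumed emission standard deviation
STAY = 0.99                            # assumed probability of staying in a state


def log2_ratios(sample_counts: List[int], reference_counts: List[int]) -> List[float]:
    """Per-window log2(sample/reference) after normalizing each by its total reads."""
    s_total, r_total = sum(sample_counts), sum(reference_counts)
    ratios = []
    for s, r in zip(sample_counts, reference_counts):
        s_frac = (s + 0.5) / s_total   # pseudocount avoids log(0)
        r_frac = (r + 0.5) / r_total
        ratios.append(math.log2(s_frac / r_frac))
    return ratios


def log_gauss(x: float, mean: float) -> float:
    """Log density of a Gaussian emission with the assumed SD."""
    return -0.5 * ((x - mean) / SD) ** 2 - math.log(SD * math.sqrt(2 * math.pi))


def viterbi(obs: List[float]) -> List[int]:
    """Most likely state index per window under the assumed HMM."""
    if not obs:
        return []
    n = len(STATES)
    log_stay, log_move = math.log(STAY), math.log((1 - STAY) / (n - 1))

    def trans(i: int, j: int) -> float:
        return log_stay if i == j else log_move

    score = [math.log(1 / n) + log_gauss(obs[0], m) for m in MEANS]
    back: List[List[int]] = []
    for x in obs[1:]:
        prev, score, ptr = score, [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: prev[i] + trans(i, j))
            ptr.append(best_i)
            score.append(prev[best_i] + trans(best_i, j) + log_gauss(x, MEANS[j]))
        back.append(ptr)
    path = [max(range(n), key=lambda j: score[j])]
    for ptr in reversed(back):         # trace back from the last window
        path.append(ptr[path[-1]])
    return path[::-1]


def cnv_segments(path: List[int]) -> List[Tuple[int, int, str]]:
    """Collapse runs of non-normal windows into (start_bp, end_bp, state) segments."""
    segments, i = [], 0
    while i < len(path):
        j = i
        while j < len(path) and path[j] == path[i]:
            j += 1
        if STATES[path[i]] != "normal":
            segments.append((i * WINDOW, j * WINDOW, STATES[path[i]]))
        i = j
    return segments
```

For instance, `cnv_segments(viterbi(log2_ratios(sample, reference)))` returns runs of non-normal windows as base-pair coordinates. The high `STAY` probability is what suppresses noise: a state switch is accepted only when the evidence across windows outweighs the switching penalty, so isolated noisy windows tend to be smoothed away.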
Use Case 2: Identifying Novel Antibiotic Resistance Genes in Clinical Bacterial Isolates Using Genome Warehouse Reannotation
Suppose a research team has isolated several multidrug-resistant Acinetobacter baumannii strains. Initial genome sequencing and basic genome assembly have been performed, but the team wants a more thorough analysis of potential resistance mechanisms.
Step 1: Initial genome data submission
First, the user sequences and assembles the A. baumannii isolates using standard protocols, then submits the genome assemblies to GWH and releases the data. The day after release, GWH performs a basic quality assessment and genome reannotation of the submitted genomes and releases the reannotation results.
Step 2: Obtaining genome reannotation services
The reannotation pipeline will provide: (1) Gene structure predictions; (2) Refined gene start/stop coordinates; (3) Gene functional annotations; (4) Enhanced detection of regulatory elements.
Step 3: Collect genome sequence and annotation files
In this step, users can download A. baumannii genome assemblies and reannotations from GWH, as demonstrated in the following snapshots.
Sub-step 1. On the advanced search page, search for the scientific name 'Acinetobacter baumannii' and click 'Search' (Figure 1).
Figure 1. The advanced search function can be used to search the interested genomes.
Sub-step 2. On the search results page, use the filters on the left to set 'Assembly level' to 'Complete genome' and 'Has reannotation' to 'Yes', which retrieves high-quality genome sequences with reannotation (Figure 2).
Figure 2. The search scope can be narrowed down to obtain more accurate search results by using the filtering function on the results page.
Sub-step 3. Set the page size to 200 (the maximum number of files per download is 200) and click 'select all' to select all items on each page.
Figure 3. Select interested genomes to download required data files.
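If the search results are available as a table (for example, records exported or copied from the results page), the filtering in Sub-step 2 and the 200-file limit in Sub-step 3 can be reproduced offline. In this Python sketch the field names `accession`, `assembly_level`, and `has_reannotation` are hypothetical and should be adapted to whatever columns the actual table contains.

```python
from typing import Dict, Iterator, List

MAX_PER_DOWNLOAD = 200  # per-download file limit stated in Sub-step 3


def select_genomes(records: List[Dict[str, str]]) -> List[str]:
    """Keep complete genomes that have a GWH reannotation.

    The field names used here are hypothetical placeholders.
    """
    return [
        r["accession"]
        for r in records
        if r["assembly_level"] == "Complete genome" and r["has_reannotation"] == "Yes"
    ]


def batches(accessions: List[str], size: int = MAX_PER_DOWNLOAD) -> Iterator[List[str]]:
    """Yield consecutive batches no larger than the per-download limit."""
    for i in range(0, len(accessions), size):
        yield accessions[i:i + size]
```

With 450 matching genomes, for example, `batches` yields three batches of 200, 200, and 50 accessions, which corresponds to three separate downloads.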
Sub-step 4. Click the 'Download' button and select 'Selected sequences' to set the download data type (Figure 4). Then choose the data sources ('GWH reannotation' and 'NCBI RefSeq') and the files to download, including DNA, GFF, RNA, CDS, and protein sequences (Figure 5).
Figure 4. Click the 'Download' button and determine download data type.
Figure 5. Clearly define the scope of data to be downloaded by selecting the data source and file format.
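After downloading GFF files from both data sources, a quick comparison of feature counts can show where the GWH reannotation and the NCBI RefSeq annotation differ for the same genome. This Python sketch assumes both files are GFF3 (nine tab-separated columns, with the feature type in column 3) and reports counts per feature type. The set of types compared is an illustrative choice, and file names are left to the user.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def feature_counts(gff_lines: Iterable[str]) -> Counter:
    """Count GFF3 features by type (column 3)."""
    counts: Counter = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequences follow; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) == 9:
            counts[fields[2]] += 1
    return counts


def compare_sources(gwh_lines: Iterable[str], refseq_lines: Iterable[str],
                    types=("gene", "CDS", "tRNA", "rRNA")) -> Dict[str, Tuple[int, int, int]]:
    """Return {type: (GWH count, RefSeq count, difference)} for selected types."""
    gwh, ref = feature_counts(gwh_lines), feature_counts(refseq_lines)
    return {t: (gwh[t], ref[t], gwh[t] - ref[t]) for t in types}
```

Open files can be passed directly, for example `compare_sources(open(gwh_gff), open(refseq_gff))`; large differences in a feature type point to loci worth inspecting in both annotations.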
Step 4: Comparative Analysis
Once the genomes of interest have been downloaded, comparative analysis can be conducted with software such as PGAweb (http://pgaweb.vlcc.cn/), which performs orthologous clustering, pan-genome profiling, sequence variation and evolution analysis, and functional classification. These analyses help users understand the dynamics and evolution of A. baumannii genomes. By focusing on regions with significant divergence linked to drug resistance, users may discover previously unrecognized resistance determinants, which could improve clinical diagnostics and reveal new therapeutic targets.
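As a rough illustration of two ideas mentioned above, pan-genome profiling and focusing on resistance-linked differences, the following Python sketch works from an orthologous-cluster presence/absence table (cluster ID mapped to the set of genomes carrying it), such as one might derive from a clustering tool's output. The frequency thresholds follow a commonly used core/soft-core/shell/cloud convention and are assumptions, not PGAweb's definitions. `resistant_only` additionally assumes that resistance phenotypes are known and that susceptible comparator strains are included, which goes beyond the scenario described here.

```python
from typing import Dict, List, Set


def classify_clusters(presence: Dict[str, Set[str]], n_genomes: int) -> Dict[str, str]:
    """Label each orthologous cluster by the fraction of genomes carrying it.

    Assumed thresholds: core >= 99%, soft core >= 95%, shell >= 15%, cloud < 15%.
    """
    labels = {}
    for cluster, genomes in presence.items():
        frac = len(genomes) / n_genomes
        if frac >= 0.99:
            labels[cluster] = "core"
        elif frac >= 0.95:
            labels[cluster] = "soft_core"
        elif frac >= 0.15:
            labels[cluster] = "shell"
        else:
            labels[cluster] = "cloud"
    return labels


def resistant_only(presence: Dict[str, Set[str]],
                   resistant: Set[str], susceptible: Set[str]) -> List[str]:
    """Clusters found in every resistant strain and in no susceptible strain."""
    return sorted(
        cluster for cluster, genomes in presence.items()
        if resistant <= genomes and not (genomes & susceptible)
    )
```

Clusters returned by `resistant_only` are candidates for closer inspection, not evidence of a resistance mechanism; a statistical test and experimental validation would still be needed.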
By using GWH, the team gains several benefits: (1) Comprehensive data resources: directly submitted genomes plus genome data integrated from NCBI GenBank and RefSeq; (2) Reannotation services: genome reannotation of submitted high-quality prokaryotic genomes; (3) Standardization: a consistent annotation methodology across all analyzed strains; (4) Quality control: integrated quality checks help ensure reliable annotations.