Introduction

MethBank is a comprehensive DNA methylation database. It integrates consensus reference methylomes (CRMs), whole-genome single-base resolution methylomes (SRMs), DNA & RNA methylation tools (MeTools), and knowledge from epigenome-wide association studies (EWAS). It also provides an interactive browser for visualization and multiple tools for analysis.

About visualization

To visualize high-resolution DNA methylomes, MethBank deploys an interactive, user-friendly methylome browser built on JBrowse (http://jbrowse.org; a fast, embeddable genome browser built completely with JavaScript and HTML5). For each species, the methylome browser includes a variety of data tracks and allows users to choose tracks of interest and to zoom and scroll to any region along the genome. In addition, users can switch to another species by clicking the species name in the upper left corner of JBrowse.

About analysis tools

MethBank-CRM provides Age Predictor, a tool for predicting human DNA methylation age. Based on the large-scale human methylation datasets integrated in MethBank, age-related CpG sites with linear DNA methylation changes during aging are identified by Spearman correlation (|r| > 0.6). As a result, 52 age-related CpG sites (shown below) are selected according to their correlation and used with three machine learning models (Random Forest, SVM, and Elastic Net) to predict human DNA methylation age. Technically, the random forest algorithm is implemented with the randomForest (version 4.6-12) R package, with ntree = 500 and mtry = 17. The SVM algorithm is implemented with the e1071 (version 1.6-7) R package using a radial basis function kernel, with gamma = 0.0192 and cost = 1. For the elastic net, the glmnet function of the glmnet (version 2.0-10) R package is used; its parameters are optimized by tenfold cross-validation with a grid search, and the best performance is obtained with alpha = 0.5 and lambda = 0.08. Age Predictor has been integrated into MethBank as an online tool with a straightforward, user-friendly web interface, and it accepts several types of input (raw data, processed data, GEO sample ID).

Figure: The input page. Figure: The output page.
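
The site-selection step described above (keeping CpG sites whose Spearman correlation with age satisfies |r| > 0.6) can be illustrated with a minimal sketch. This is not the code behind Age Predictor, which is implemented with R packages; it is a standard-library Python illustration, and the function names, probe IDs and beta values are all hypothetical.

```python
from statistics import mean


def rank(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    rx, ry = rank(x), rank(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant input: correlation is undefined, treat as 0
    return cov / (var_x * var_y) ** 0.5


def select_age_related(ages, beta_by_cpg, threshold=0.6):
    """Keep CpG sites whose |Spearman r| with age exceeds the threshold."""
    selected = {}
    for cpg, betas in beta_by_cpg.items():
        r = spearman(ages, betas)
        if abs(r) > threshold:
            selected[cpg] = r
    return selected


# Toy data: probe IDs and methylation values are invented for illustration.
ages = [21, 35, 48, 60, 74]
beta_by_cpg = {
    "cg_example_up": [0.20, 0.31, 0.45, 0.52, 0.70],    # rises with age
    "cg_example_down": [0.90, 0.81, 0.66, 0.50, 0.41],  # falls with age
    "cg_example_flat": [0.50, 0.48, 0.53, 0.47, 0.51],  # no clear trend
}
selected = select_age_related(ages, beta_by_cpg)
print(selected)
```

In this toy example only the two sites that change consistently with age pass the filter. The selected sites would then serve as features for the regression models.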

MethBank-SRM presents IDMP (Identification of Differentially Methylated Promoter), a tool for identifying differentially methylated promoters (DMPs) between any two samples. The procedure has three steps. First, Fisher’s exact test is performed on each promoter whose difference in methylation level (delta) between the two samples exceeds a specified threshold. For this test, a 2 × 2 contingency table is constructed in which each row is a sample and the columns are the numbers of reads supporting methylated and unmethylated cytosines, summed over all cytosines in the promoter for that sample. Second, the p-values from Fisher’s exact test are corrected with the Benjamini-Hochberg false discovery rate (FDR) procedure. Finally, the promoter methylation of the genes associated with each DMP is reported. Users can download IDMP directly from the MethBank home page and identify DMPs by providing two genome methylation files (BED format) for the samples of interest and a gene annotation file (GFF3 format), and by setting parameters such as the cytosine sequence context (C, CG or CH), the start position of promoters relative to the TSS, the delta methylation level, and the p-value.
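
The statistical core of this procedure (delta filter, Fisher’s exact test on the pooled read counts, Benjamini-Hochberg correction) can be sketched as follows. This is a standard-library Python illustration, not IDMP’s own code. The function names, the default thresholds (min_delta = 0.2, max_fdr = 0.05), the use of a two-sided test, the choice to apply the cutoff to the FDR-adjusted value, and the read counts are all assumptions made for the example.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Rows are samples; columns are methylated and unmethylated read counts
    summed over all cytosines in one promoter.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x methylated reads in sample 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1)
            if prob(x) <= observed * (1 + 1e-7))
    return min(1.0, p)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank_idx in range(m - 1, -1, -1):
        i = order[rank_idx]
        running_min = min(running_min, pvalues[i] * m / (rank_idx + 1))
        adjusted[i] = running_min
    return adjusted


def find_dmps(promoters, min_delta=0.2, max_fdr=0.05):
    """promoters maps a name to (meth1, unmeth1, meth2, unmeth2) read counts.

    The thresholds are hypothetical defaults, not IDMP's documented values.
    """
    candidates, pvalues = [], []
    for name, (m1, u1, m2, u2) in promoters.items():
        if m1 + u1 == 0 or m2 + u2 == 0:
            continue  # no coverage in one of the samples
        delta = abs(m1 / (m1 + u1) - m2 / (m2 + u2))
        if delta > min_delta:  # only promoters passing the delta filter are tested
            candidates.append((name, delta))
            pvalues.append(fisher_exact_two_sided(m1, u1, m2, u2))
    fdrs = benjamini_hochberg(pvalues)
    return [(name, delta, p, q)
            for (name, delta), p, q in zip(candidates, pvalues, fdrs)
            if q <= max_fdr]


# Toy read counts, invented for illustration.
promoters = {
    "geneA": (45, 5, 10, 40),   # levels 0.90 vs 0.20, well covered
    "geneB": (20, 30, 22, 28),  # levels 0.40 vs 0.44, fails the delta filter
    "geneC": (6, 4, 3, 7),      # levels 0.60 vs 0.30, but few reads
}
dmps = find_dmps(promoters)
print(dmps)
```

Pooling reads over the promoter means that a large delta supported by only a few reads (geneC here) does not reach significance, while the same filter avoids testing promoters whose difference is too small to be of interest.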

MethBank-CRM (Consensus Reference Methylome) module

  • 450K data are downloaded from GEO and TCGA.
  • Datasets used in MethBank-CRM from NCBI include GSE73549 and GSE112047 for prostate, GSE90124 for skin, GSE111223, GSE99029 and GSE92767 for saliva, GSE32148, GSE40279, GSE50660, GSE51032, GSE51388, GSE52113, GSE53128, GSE53740, GSE59509, GSE61151, GSE61496, GSE64495, GSE65638, GSE67751, GSE72773, GSE72775, GSE72777, GSE73103, GSE79056, GSE80283, GSE80310, GSE83334, GSE87571, GSE89093 for peripheral blood.
  • Reference genome is hg38.
  • Data processing includes correcting probe design bias, removing outlier samples, removing batch effects, and so on. The pipeline is shown in the figure.
Key steps:
  • Correct probe design bias
  • Remove outlier samples
  • Remove batch effects
  • Construct reference methylomes
  • Annotation and analysis

MethBank-SRM (Single-base resolution methylome) module

(WGBS data in the SRM module were aligned with Bismark after 2021 and processed with WBSA before 2021.)

  • Whole-genome bisulfite sequencing data are downloaded from SRA and GSA.
  • The following assembly versions were used for the species:
    • hg38 (Homo sapiens)
    • mm10 (Mus musculus)
    • Zv9 (Danio rerio)
    • IRGSP-1.0 (Oryza sativa)
    • Gmax_275_v2.0 (Glycine max)
    • GCF_000188115.3_SL2.50 (Solanum lycopersicum)
    • Mesculenta_305_v6 (Manihot esculenta)
    • Pvulgaris_218_v1 (Phaseolus vulgaris)
  • The data processing pipeline. Key steps:
    • a) Data filtering: remove low-quality data and adaptor sequences
    • b) Build the reference index
    • c) Align the filtered sequencing data to the reference and remove unmapped and multiply mapped reads
    • d) Remove duplicate reads
    • e) Identify cytosine methylation levels in different contexts
    • f) Annotation and analysis: calculate genome coverage, C coverage, depth and BS-conversion rate; remove samples with low depth or low conversion rate; calculate methylation levels in the promoter, gene body and downstream region of each gene; identify methylated CpG islands and related genes; identify differentially methylated promoters
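
Steps e) and f) above can be sketched in a few lines of Python. This is a hedged illustration only: the actual processing is done by Bismark or WBSA, and the function names, the read-pooling definition of a regional methylation level, and the QC thresholds passed by the caller are assumptions rather than MethBank's documented settings.

```python
def cytosine_context(seq, pos):
    """Classify the forward-strand cytosine at seq[pos] as CG or CH."""
    if seq[pos].upper() != "C":
        raise ValueError("position is not a cytosine")
    nxt = seq[pos + 1 : pos + 2].upper()
    if nxt == "G":
        return "CG"
    if nxt in ("A", "C", "T"):
        return "CH"
    return "unknown"  # end of sequence or an ambiguous base such as N


def methylation_level(methylated, unmethylated):
    """Fraction of reads supporting methylation, or None without coverage."""
    total = methylated + unmethylated
    return methylated / total if total else None


def region_level(sites, start, end, context="CG"):
    """Pooled methylation level of one context over positions start..end.

    sites: iterable of (position, context, methylated_reads, unmethylated_reads).
    Pooling reads across cytosines is an assumption made for this sketch.
    """
    meth = unmeth = 0
    for pos, ctx, m, u in sites:
        if start <= pos <= end and ctx == context:
            meth += m
            unmeth += u
    return methylation_level(meth, unmeth)


def passes_qc(mean_depth, conversion_rate, min_depth, min_conversion):
    """Keep a sample only if depth and BS-conversion rate are high enough.

    The thresholds are supplied by the caller; no specific values are implied.
    """
    return mean_depth >= min_depth and conversion_rate >= min_conversion


# Toy per-cytosine calls, invented for illustration.
sites = [
    (100, "CG", 8, 2),
    (150, "CG", 6, 4),
    (180, "CH", 0, 10),
    (900, "CG", 1, 9),
]
print(cytosine_context("ACGTCA", 1))  # the C is followed by G
print(region_level(sites, 50, 200))   # pooled CG level over the region
```

The same `region_level` helper could be applied to promoter, gene body and downstream intervals once their coordinates are derived from the gene annotation.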

MethBank-EWAS module

With the explosive growth of epigenome-wide association studies (EWAS), a large amount of EWAS-related data and knowledge has accumulated. Although these data hold great potential for clinical translation, realizing it requires a standardized platform for data archiving, retrieval and exploration. For this reason, we updated the existing data resources, EWAS Atlas (Nucleic Acids Res 2019, https://ngdc.cncb.ac.cn/ewas/atlas) and EWAS Data Hub (Nucleic Acids Res 2020, https://ngdc.cncb.ac.cn/ewas/datahub), and developed EWAS Toolkit (https://ngdc.cncb.ac.cn/ewas/toolkit), an online tool for downstream analysis of EWAS results. Focusing on EWAS research, these three resources provide knowledge, data and tools, respectively. More importantly, the quality and functionality of each component are enhanced by cross-linking information among them. To this end, we present EWAS Open Platform, an integrated one-stop analysis platform for EWAS research.

MethBank-MeTool (DNA & RNA Methylation Tools) module

We created MethBank-MeTool to catalogue and curate analysis tools for DNA and RNA methylation. MethBank-MeTool collects a range of information on each tool and categorizes the tools by platform, library, application and function. MethBank-MeTool supports keyword search and dynamically updates the citation information for all tools.