Cardiovascular Disease Atlas
Data release 1.0

1. Database overview

Cardiovascular disease (CVD) is the leading cause of death worldwide. The rapid advancement in sequencing technology has revealed key mechanisms and molecular signatures underlying CVD. Consequently, a wealth of CVD-related knowledge and data resources has become available. To gain a more comprehensive understanding of the biological characteristics of CVD, it is necessary to construct integrated and organized multi-omics databases. Although several databases have been developed and contributed significantly to CVD research communities, they still have certain limitations.

Therefore, we developed CardioVascular Disease Atlas (CVD Atlas), a multi-omics data resource for cardiovascular disease. It contains 215,333 gene-disease associations covering 190 diseases, 44 traits, and 35,829 genes, based on 308 curated publications, 652 datasets, and prior knowledge integrated from 7 databases (RNADisease, HMDD v4.0, miRNASNP-v3, PharmGKB, CTD, PedAM, and TWAS Atlas). CVD Atlas aims to provide researchers with readable and useful information.

2. Curation criteria

CVD Atlas curates publications on CVD and extracts the following information:

  • Information about identified variant-disease associations.
  • Information about differentially expressed genes with significant statistics (|log2FC|>1, adjusted p value<0.05), condition pairs, and experimental methods.
  • Information about disease associations described in articles, such as biomarkers and therapeutic targets.
  • Information about differentially methylated positions with significant statistics (|delta beta|>0.2, adjusted p value<0.05) and condition pairs.

3. Data analysis

3.1 Dataset collection

CVD Atlas collected datasets by searching Gene Expression Omnibus (GEO), GWAS Catalog, PRIDE, and Metabolomics Workbench.

  • Genomic: GWAS summary statistics were collected from GWAS Catalog.
  • Transcriptomic: Only RNA-seq data and microarray data generated using Affymetrix, Agilent, and Illumina platforms, with clearly described condition groups, were collected.
  • Epigenomic: Only microarray data with clearly described condition groups were collected.
  • Proteomic: Only data with available protein expression matrix and clearly described condition groups were collected.
  • Metabolomic: Only data with available metabolite expression matrix and clearly described condition groups were collected.

3.2 Workflow

3.3 CVD Atlas score

CVD Atlas developed a confidence score system to evaluate the reliability of each gene-disease association. In principle, an association should receive a higher confidence score when it is supported by more evidence (more publications or datasets) and when it is supported by more types of evidence.

For each gene-disease association, CVD Atlas score was calculated as follows,

\[\mathrm{Score} = 1-\prod_{i}\frac{1}{1+\log(x_{i}+1)}\]

where x_i is the number of publications or datasets that support the association with evidence of type i, and i ranges over the evidence types: genomic, transcriptomic, epigenomic, and chemical.
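As a worked illustration of the formula, the sketch below computes the score from a mapping of evidence type to x_i. It assumes the natural logarithm, because the base of log is not stated, and the function name `cvd_atlas_score` is hypothetical.

```python
import math


def cvd_atlas_score(evidence_counts):
    """Confidence score for one gene-disease association.

    `evidence_counts` maps an evidence type (e.g. "genomic") to x_i, the number
    of publications or datasets supporting the association for that type.
    Assumption: log is the natural logarithm (the base is not stated).
    """
    product = 1.0
    for x in evidence_counts.values():
        if x < 0:
            raise ValueError("evidence counts must be non-negative")
        # Each evidence type contributes a factor in (0, 1]; x = 0 contributes 1.
        product *= 1.0 / (1.0 + math.log(x + 1))
    return 1.0 - product


print(round(cvd_atlas_score({"genomic": 1}), 4))                       # 0.4094
print(round(cvd_atlas_score({"genomic": 1, "transcriptomic": 1}), 4))  # 0.6512
```

Because every factor is at most 1, adding evidence of a new type, or more evidence of an existing type, can only raise the score, and the score stays below 1.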

3.4 Disease enrichment tool

The P value is calculated with Fisher's exact test on the following 2×2 contingency table:

                     Interested genes   Non-interested genes   Row total
Disease genes        a                  b                      a+b
Non-disease genes    c                  d                      c+d
Column total         a+c                b+d                    a+b+c+d

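\[p = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{a!\,b!\,c!\,d!\,(a+b+c+d)!}\]

This expression is the probability of observing the table above given its row and column totals. The P value of Fisher's exact test is the sum of this probability over all tables with the same totals that are at least as extreme as the observed one; for a one-sided enrichment test, these are the tables whose top-left cell is at least a.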
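A minimal sketch of the calculation using only the Python standard library. `table_probability` reproduces the probability expression for one table, and `fisher_enrichment_p` sums it over more extreme tables for a one-sided (enrichment) test; whether the CVD Atlas tool uses a one-sided or two-sided test is an assumption here, and both function names are hypothetical.

```python
from math import comb


def table_probability(a, b, c, d):
    """Hypergeometric probability of one 2x2 table with fixed margins.

    Equal to (a+b)!(c+d)!(a+c)!(b+d)! / (a! b! c! d! n!), written with
    binomial coefficients to avoid huge factorials.
    """
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)


def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher's exact test: P(top-left cell >= a) with margins fixed."""
    row1, col1 = a + b, a + c
    n = a + b + c + d
    a_max = min(row1, col1)  # largest top-left value the margins allow
    p = 0.0
    for x in range(a, a_max + 1):
        # Rebuild the other three cells so that all margins stay unchanged.
        p += table_probability(x, row1 - x, col1 - x, n - row1 - col1 + x)
    return p


# 3 of 4 disease genes are in a 4-gene query list drawn from 8 genes in total.
print(round(fisher_enrichment_p(3, 1, 1, 3), 4))  # 0.2429
```

If SciPy is available, `scipy.stats.fisher_exact([[a, b], [c, d]], alternative="greater")` computes the same one-sided P value.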

4. Database usage

4.1 Browse

4.1.1 Disease browse

Disease browse lists the diseases included in CVD Atlas and indicates which levels of information are available in CVD Atlas for each disease.

Clicking a disease name or CVD Atlas disease ID opens the disease detail page, which consists of the following panels:

4.1.1.1 Basic information

This panel contains two modules: Basic information and Gene prioritization.

a) Basic information displays disease name, CVD Atlas disease ID, synonyms, MeSH ID, Disease Ontology ID, and a description of this disease.

b) Gene prioritization displays the disease-gene score calculated by CVD Atlas.

4.1.1.2 Genomic

This panel contains three modules: GWAS Catalog significant SNP, GWAS Colocalization and Variant-disease association.

a) GWAS Catalog significant SNP lists significant SNPs from the GWAS summary statistics collected from GWAS Catalog. Only SNPs with P value < 5e-8 are considered significant and displayed in the table. Two plots appear above the table: the left plot shows the top risk SNPs of this disease, and the right plot shows the ancestry composition of the datasets related to this disease. Users can change the options above the plots to display the information of interest.
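The significance filter can be sketched as follows. The record fields (`rsid`, `p_value`), the function name, and the choice to rank "top" SNPs by P value are assumptions for illustration; the page does not state how top risk SNPs are ordered.

```python
# Genome-wide significance threshold stated in the text (P < 5e-8).
GENOME_WIDE_SIGNIFICANCE = 5e-8


def significant_snps(rows, threshold=GENOME_WIDE_SIGNIFICANCE, top_n=None):
    """Keep SNPs with P strictly below the threshold, most significant first.

    `rows` is a list of dicts with hypothetical keys "rsid" and "p_value".
    """
    hits = [row for row in rows if row["p_value"] < threshold]
    hits.sort(key=lambda row: row["p_value"])
    return hits if top_n is None else hits[:top_n]


# Toy records with placeholder IDs (not real data).
records = [
    {"rsid": "snp_a", "p_value": 1e-9},
    {"rsid": "snp_b", "p_value": 5e-8},  # not strictly below 5e-8, so dropped
    {"rsid": "snp_c", "p_value": 3e-10},
]
print([row["rsid"] for row in significant_snps(records)])  # ['snp_c', 'snp_a']
```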

b) GWAS Colocalization shows colocalization results calculated from GWAS summary statistics and GTEx v8 eQTLs. A result was regarded as strong colocalization evidence only when all of the following cutoffs were met: PP4>=0.75, PP3+PP4>=0.9, and PP4/PP3>=3.
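Assuming PP3 and PP4 are the posterior probabilities from a coloc-style analysis (PP3: the GWAS and eQTL signals have distinct causal variants; PP4: they share one causal variant), the combined cutoffs can be sketched as a single predicate. The ratio is rewritten as a product so that PP3 = 0 does not cause a division by zero; the function name is hypothetical.

```python
def is_strong_colocalization(pp3, pp4):
    """Return True only if all three cutoffs from the text are met."""
    return (
        pp4 >= 0.75            # PP4 >= 0.75
        and pp3 + pp4 >= 0.9   # PP3 + PP4 >= 0.9
        and pp4 >= 3 * pp3     # PP4 / PP3 >= 3, without dividing by PP3
    )


print(is_strong_colocalization(pp3=0.1, pp4=0.85))  # True
print(is_strong_colocalization(pp3=0.0, pp4=0.8))   # False: PP3 + PP4 < 0.9
```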

c) Variant-disease association lists associations between this disease and variants, collected from publications and prior databases.

4.1.1.3 Transcriptomic

This panel contains two modules: Differentially expressed gene and Disease-gene association.

a) Differentially expressed gene lists genes that are differentially expressed in this disease, either calculated from datasets or curated manually. Only genes with |log2FC|>=1 and adjusted P value < 0.05 were considered DEGs. Two plots appear above the table: the left plot shows the top DEGs of this disease, and the right plot shows the tissue composition of the datasets related to this disease. Users can change the options above the plots to display the information of interest.
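The DEG cutoffs can be sketched as a filter over per-gene results from a differential expression analysis. The sketch follows the inclusive |log2FC|>=1 stated here (Section 2 writes a strict >1). The input keys, the function name, and the up/down labelling are assumptions for illustration; the text does not specify the upstream analysis.

```python
def call_degs(results, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Return (gene, direction) for genes with |log2FC| >= lfc_cutoff and padj < padj_cutoff.

    `results` is a list of dicts with hypothetical keys "gene", "log2fc", "padj";
    a padj of None (no test performed) is treated as not significant.
    """
    degs = []
    for row in results:
        padj = row["padj"]
        if padj is None or padj >= padj_cutoff:
            continue
        if abs(row["log2fc"]) >= lfc_cutoff:
            direction = "up" if row["log2fc"] > 0 else "down"
            degs.append((row["gene"], direction))
    return degs


# Toy results with placeholder gene names (not real data).
toy = [
    {"gene": "GENE1", "log2fc": 2.3, "padj": 0.001},
    {"gene": "GENE2", "log2fc": -1.0, "padj": 0.01},
    {"gene": "GENE3", "log2fc": 0.4, "padj": 0.0001},  # fold change too small
    {"gene": "GENE4", "log2fc": 3.1, "padj": None},    # no adjusted P value
]
print(call_degs(toy))  # [('GENE1', 'up'), ('GENE2', 'down')]
```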

b) Disease-gene association lists transcriptomic associations between this disease and genes, collected from publications and prior databases.

4.1.1.4 Epigenomic

This panel contains two modules: Differentially methylated position and Disease-gene association.

a) Differentially methylated position lists positions that are differentially methylated in this disease, either calculated from datasets or curated manually. Only positions with |delta beta|>=0.2 and adjusted P value < 0.05 were considered DMPs.
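A sketch of the DMP cutoffs, assuming delta beta is the difference between the mean methylation beta values of the case and control groups and that an adjusted P value is already available for each probe. The input keys and function name are hypothetical.

```python
from statistics import mean


def call_dmps(probes, delta_cutoff=0.2, padj_cutoff=0.05):
    """Return (probe, delta_beta) for probes with |delta beta| >= cutoff and padj < cutoff.

    Each probe is a dict with hypothetical keys "probe", "case_beta",
    "control_beta" (lists of beta values in [0, 1]) and "padj".
    """
    hits = []
    for p in probes:
        # Assumed definition: mean beta in cases minus mean beta in controls.
        delta_beta = mean(p["case_beta"]) - mean(p["control_beta"])
        if abs(delta_beta) >= delta_cutoff and p["padj"] < padj_cutoff:
            hits.append((p["probe"], round(delta_beta, 3)))
    return hits


# Toy probes with placeholder IDs (not real data).
toy_probes = [
    {"probe": "cg_a", "case_beta": [0.8, 0.7], "control_beta": [0.4, 0.5], "padj": 0.01},
    {"probe": "cg_b", "case_beta": [0.5, 0.5], "control_beta": [0.4, 0.4], "padj": 0.01},
    {"probe": "cg_c", "case_beta": [0.2, 0.2], "control_beta": [0.5, 0.5], "padj": 0.2},
]
print(call_dmps(toy_probes))  # [('cg_a', 0.3)]
```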

b) Disease-gene association lists epigenomic associations between this disease and genes, collected from publications and prior databases.

4.1.1.5 Metabolomic

This panel contains one module, Differentially expressed metabolite, which lists metabolites differentially expressed in this disease, calculated from datasets.

4.1.1.6 Proteomic

This panel contains one module, Differentially expressed protein, which lists proteins differentially expressed in this disease, calculated from datasets.

4.1.1.7 Chemical

This panel contains one module, Disease-chemical association, which lists chemicals associated with this disease, integrated from public databases. In the CTD tab, two plots appear above the table: the left plot shows the number of publications for chemicals associated with this disease by "Inference score", and the right plot shows the same by "direct evidence". Users can change the options above the plots to display the information of interest.

4.1.1.8 Other information

This panel contains one module, Non-human genes associated with disease, which lists genes associated with this disease in non-human species, either curated manually or integrated from public databases.

4.1.2 Trait browse

Trait browse lists the traits included in CVD Atlas.

Clicking a trait name or CVD Atlas trait ID opens the trait detail page, which consists of the following modules:

a) Basic information displays trait name, CVD Atlas trait ID, trait ontology ID, and a description of this trait.

b) Gene prioritization displays the trait-gene score calculated by CVD Atlas.

c) GWAS dataset lists GWAS summary statistics related to this trait, collected from GWAS Catalog.

d) GWAS Catalog significant SNP lists significant SNPs from the GWAS summary statistics collected from GWAS Catalog. Only SNPs with P value < 5e-8 are considered significant and displayed in the table. Two plots appear above the table: the left plot shows the top risk SNPs of this trait, and the right plot shows the ancestry composition of the datasets related to this trait. Users can change the options above the plots to display the information of interest.

e) GWAS Colocalization shows colocalization results calculated from GWAS summary statistics and GTEx v8 eQTLs. A result was regarded as strong colocalization evidence only when all of the following cutoffs were met: PP4>=0.75, PP3+PP4>=0.9, and PP4/PP3>=3.

4.1.3 Dataset browse

Dataset browse lists the datasets included in CVD Atlas, showing the studied diseases, tissue or cell type, number of samples, experiment type, and omics type.

Clicking a CVD Atlas dataset ID or dataset accession opens the dataset detail page, which consists of the following modules:

4.1.3.1 Genomic

a) Basic information displays information about this dataset; the fields shown depend on the omics type.

b) Metadata describes the samples in this dataset.

c) Significant SNP

d) Colocalization

4.1.3.2 Transcriptomic

a) Basic information displays information about this dataset; the fields shown depend on the omics type.

b) Metadata describes the samples in this dataset.

c) Differentially expressed gene lists the differentially expressed genes in this dataset.

d) GO enrichment displays GO enrichment analysis results for the DEGs. Clicking the GO ID of a term opens the details of this term in Gene Ontology.

e) Gene expression displays the expression of genes in this dataset. Users can select genes of interest to view their expression.

f) Co-expression network shows WGCNA analysis results. The first table displays module-trait correlations and their P values. A network graph is then plotted, in which users can select a module of interest and change the edge-weight threshold; a larger node indicates more links. Details of the network graph are displayed in the table below it, where genes collected in CVD Atlas are clickable in the 'FromNode' and 'ToNode' columns. Clicking 'Detail' in the 'Detail' column displays an expression scatter plot of the two genes in that row below the table.
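The module selection, weight threshold, and node sizing described above can be sketched as an edge filter followed by a degree count. The input formats (an edge list of `(from_node, to_node, weight)` tuples and a gene-to-module map using WGCNA-style colour labels) and the function name are assumptions for illustration, as is the use of node degree as the size value.

```python
from collections import Counter


def filter_module_network(edges, node_module, module, weight_threshold):
    """Keep edges whose two genes are both in `module` and whose weight is at
    least `weight_threshold`; also count links per gene (node degree).

    `edges` is a list of (from_node, to_node, weight) tuples and `node_module`
    maps each gene to its module label (hypothetical input formats).
    """
    kept = [
        (u, v, w)
        for u, v, w in edges
        if node_module.get(u) == module
        and node_module.get(v) == module
        and w >= weight_threshold
    ]
    degree = Counter()
    for u, v, _ in kept:
        degree[u] += 1
        degree[v] += 1
    return kept, degree


# Toy network with placeholder genes (not real data).
edges = [("A", "B", 0.30), ("A", "C", 0.10), ("B", "C", 0.25), ("C", "D", 0.40)]
modules = {"A": "blue", "B": "blue", "C": "blue", "D": "turquoise"}
kept, degree = filter_module_network(edges, modules, "blue", 0.2)
print(kept)          # [('A', 'B', 0.3), ('B', 'C', 0.25)]
print(dict(degree))  # {'A': 1, 'B': 2, 'C': 1}
```

Raising the threshold removes weaker edges, which lowers node degrees and shrinks the plotted nodes.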

4.1.3.3 Epigenomic

a) Basic information displays information about this dataset; the fields shown depend on the omics type.

b) Metadata describes the samples in this dataset.

c) Differentially methylated position

4.1.3.4 Proteomic

a) Basic information displays information about this dataset; the fields shown depend on the omics type.

b) Differentially expressed protein

4.1.3.5 Metabolomic

a) Basic information displays information about this dataset; the fields shown depend on the omics type.

b) Differentially expressed metabolite

4.1.4 Gene browse

Gene browse displays basic information for each gene, including gene symbol, CVD Atlas gene ID, species, gene ID, gene type, and the numbers of diseases and publications collected in CVD Atlas. The ‘Symbol’ and ‘CVDG’ columns link to the gene detail page, which contains gene ontology information, disease statistics, transcriptomic and epigenomic alterations based on manual curation or dataset analysis, GWAS studies, and related diseases and drugs.

The gene detail page consists of the following panels:

4.1.4.1 Basic information

This panel contains two modules: Basic information and Gene prioritization.

a) Basic information displays gene symbol, CVD Atlas ID, gene ID, gene biotype, chromosome, start position, and end position.

b) Gene prioritization displays the gene-disease score calculated by CVD Atlas.

4.1.4.2 Genomic

This panel contains three modules: GWAS Catalog significant SNP, GWAS Colocalization and Variant-disease association.

a) GWAS Catalog significant SNP lists significant SNPs from the GWAS summary statistics collected from GWAS Catalog. Only SNPs with P value < 5e-8 are considered significant and displayed in the table. Two plots appear above the table: the left plot shows the top risk SNPs, and the right plot shows the ancestry composition of the related datasets. Users can change the options above the plots to display the information of interest.

b) GWAS Colocalization shows colocalization results calculated from GWAS summary statistics and GTEx v8 eQTLs. A result was regarded as strong colocalization evidence only when all of the following cutoffs were met: PP4>=0.75, PP3+PP4>=0.9, and PP4/PP3>=3.

c) Variant-disease association lists variant-disease associations collected from publications and prior databases.

4.1.4.3 Transcriptomic

This panel contains three modules: Differentially expressed gene, Gene-disease association, and GTEx expression level.

a) Differentially expressed gene lists differential expression results for this gene, either calculated from datasets or curated manually. Only genes with |log2FC|>=1 and adjusted P value < 0.05 were considered DEGs. Two plots appear above the table: the left plot shows the top DEGs, and the right plot shows the tissue composition of the related datasets. Users can change the options above the plots to display the information of interest.

b) Gene-disease association collected transcriptomic associations of this gene with diseases from publications and prior databases.

c) GTEx expression level shows the expression level of this gene in GTEx.

4.1.4.4 Epigenomic

This panel contains two modules: Differentially methylated position and Gene-disease association.

a) Differentially methylated position lists differentially methylated positions, either calculated from datasets or curated manually. Only positions with |delta beta|>=0.2 and adjusted P value < 0.05 were considered DMPs.

b) Gene-disease association lists epigenomic associations between this gene and diseases, collected from publications and prior databases.

4.1.4.5 Chemical

This panel contains one module, Gene-chemical association, which lists chemicals associated with this gene, integrated from public databases.

4.1.5 SNP browse

SNP browse lists the SNPs included in CVD Atlas, with the number of related diseases and PMIDs for each SNP.

Clicking an SNP rs ID opens the SNP detail page, which consists of the following modules:

a) Basic information displays dbSNP ID, position, reference allele, alternative allele, and nearby gene.

b) GWAS significance demonstrates the significant results (P < 5e-8) of this SNP in different datasets collected from GWAS Catalog.

c) Colocalization analysis shows colocalization results calculated from GWAS summary statistics and GTEx v8 eQTLs. A result was regarded as strong colocalization evidence only when all of the following cutoffs were met: PP4>=0.75, PP3+PP4>=0.9, and PP4/PP3>=3.

d) Variant-disease association lists GWAS study results for this SNP, obtained by manual curation or integrated from other databases.

e) GTEx eQTL displays GTEx eQTL results for this SNP. CVD Atlas retains only tissues related to the cardiovascular system.

4.1.6 Association browse

Association browse lists the associations included in CVD Atlas. Users can use the filter tool to find associations of interest.

4.1.7 Publication browse

Publication browse lists the publications curated by CVD Atlas.

4.2 Search

CVD Atlas provides two ways to search. Quick search on the Home page gives users direct access to information. Advanced search on the Search page allows users to search for genes, diseases, SNPs, and datasets more specifically using their terms of interest.

4.3 Knowledge graph

To support deeper exploration of the relationships within the data, CVD Atlas provides knowledge graphs centered on diseases, traits, or genes. By centering on a specific entity, each graph shows how that entity connects to others in CVD Atlas, helping researchers explore the interplay between these factors and their impact on health and disease. Each type of knowledge graph consists of a filter and a graph.
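An entity-centered graph with a filter can be sketched as a selection over association triples. The triple format, the relation names, and the function name `centered_subgraph` are all hypothetical; the text does not describe how the graphs are stored or queried.

```python
def centered_subgraph(triples, center, allowed_relations=None):
    """Return (neighbors, edges) for the edges that touch `center`.

    `triples` is a list of (source, relation, target) tuples; `allowed_relations`
    plays the role of the filter (None keeps every relation type).
    """
    edges = [
        (s, r, t)
        for s, r, t in triples
        if center in (s, t)
        and (allowed_relations is None or r in allowed_relations)
    ]
    # The neighbor is whichever end of the edge is not the center entity.
    neighbors = {t if s == center else s for s, _, t in edges}
    return neighbors, edges


# Toy triples with placeholder entities and relation names (not real data).
triples = [
    ("GENE1", "associated_with", "DiseaseX"),
    ("GENE2", "associated_with", "DiseaseX"),
    ("DrugY", "targets", "GENE1"),
    ("GENE3", "associated_with", "DiseaseZ"),
]
print(sorted(centered_subgraph(triples, "DiseaseX")[0]))  # ['GENE1', 'GENE2']
```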

4.4 Tool

CVD Atlas provides two tools: Disease enrichment and Signature comparing.

Disease enrichment takes a user-supplied list of gene symbols and a set of user-specified conditions, and uses Fisher's exact test to calculate the significance (P value) of the overlap between the input genes and the genes of each disease, based on disease-gene associations with high confidence scores in CVD Atlas.
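Before Fisher's exact test can be applied, the user's gene list has to be turned into the 2×2 table of Section 3.4 for each disease. A sketch of that step follows; the background gene set, the `min_score` cutoff standing in for "high confidence score" (its value is not stated), and the function name are assumptions.

```python
def enrichment_table(query_genes, disease_gene_scores, background_genes, min_score=0.5):
    """Build the (a, b, c, d) table for one disease.

    Rows: disease genes / non-disease genes; columns: query genes / other genes.
    `disease_gene_scores` maps gene -> CVD Atlas score for this disease, and
    `min_score` is a hypothetical high-confidence cutoff.
    """
    background = set(background_genes)
    query = set(query_genes) & background  # ignore genes outside the background
    disease = {g for g, s in disease_gene_scores.items() if s >= min_score} & background
    a = len(query & disease)   # disease genes in the query
    b = len(disease - query)   # disease genes not in the query
    c = len(query - disease)   # query genes not linked to the disease
    d = len(background) - a - b - c
    return a, b, c, d


# Toy input with placeholder genes (not real data).
background = [f"g{i}" for i in range(1, 11)]
scores = {"g1": 0.9, "g2": 0.8, "g4": 0.7, "g5": 0.2}
print(enrichment_table(["g1", "g2", "g3", "x"], scores, background))  # (2, 1, 1, 6)
```

The resulting table can then be passed to any Fisher's exact test implementation.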

Signature comparing helps users find diseases in CVD Atlas that are relevant to a list of gene symbols. It compares the overlap between the user's genes of interest and the differentially expressed genes identified in each dataset in CVD Atlas.
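The overlap comparison can be sketched as a ranking of datasets by the number of genes they share with the query. Ranking by raw overlap with the Jaccard index as a tie-breaker is an assumption for illustration; the text does not say which overlap statistic is used, and the function name and input format are hypothetical.

```python
def rank_datasets_by_overlap(query_genes, dataset_degs):
    """Rank datasets by overlap between the query genes and each dataset's DEGs.

    `dataset_degs` maps a dataset ID to an iterable of DEG symbols.
    Returns (dataset, overlap_count, jaccard) tuples, best match first.
    """
    query = set(query_genes)
    ranked = []
    for dataset, degs in dataset_degs.items():
        degs = set(degs)
        overlap = query & degs
        union = query | degs
        jaccard = len(overlap) / len(union) if union else 0.0
        ranked.append((dataset, len(overlap), jaccard))
    # Sort by overlap count, then Jaccard index (both descending), then ID.
    ranked.sort(key=lambda item: (-item[1], -item[2], item[0]))
    return ranked


# Toy datasets with placeholder IDs and genes (not real data).
datasets = {"ds1": {"A", "B", "D"}, "ds2": {"C"}, "ds3": {"E"}}
print([d for d, _, _ in rank_datasets_by_overlap(["A", "B", "C"], datasets)])  # ['ds1', 'ds2', 'ds3']
```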

5. Contact us

If you have any questions or suggestions, please feel free to contact us by email (qianqiheng2018m@big.ac.cn).