Documentation - TWAS Atlas

Database Overview

A Transcriptome-Wide Association Study (TWAS) establishes a direct association between genes and traits, identifying signals related to traits at the transcriptome level. By utilizing large sample sizes, TWAS enables a direct connection between traits and gene expression, providing clearer biological insights than genomic variants alone. This approach enhances our understanding of the complex mechanisms and regulatory processes involved in various diseases and traits. In recent years, hundreds of TWAS studies have been conducted, offering valuable resources for researchers exploring complex traits and diseases. There is a pressing need for a comprehensive resource that consolidates TWAS publications. Significant efforts have been made to create a TWAS database that integrates related datasets.

Currently, there is a lack of curated databases for published Transcriptome-Wide Association Studies (TWAS) and their associated findings. This gap is especially evident given the rapidly increasing number of studies, which incorporate a diverse range of data resources and improved methodologies tailored to specific datasets. Our goal is to develop TWAS Atlas, a comprehensive knowledgebase that systematically integrates published TWAS findings and offers interactive visualization and analysis modules. TWAS Atlas manually collects high-quality gene-trait associations from a large body of publications. Additionally, we conduct further TWAS analyses to expand phenotypic coverage. We aim to enhance the readability and usability of TWAS results by consolidating findings from researchers worldwide, elucidating the complex genetic underpinnings of traits, and identifying new targets for therapeutic intervention.

TWAS has proven to be valuable in identifying genes that are significantly associated with traits of interest. However, to translate these association signals into functional or causal units, complementary analytical approaches are often used in conjunction with TWAS. To facilitate this, a series of standardized analyses relevant to TWAS has been integrated into TWAS Atlas 2.0, utilizing publicly available GWAS datasets. TWAS Atlas 2.0 showcases the results of enrichment analyses for TWAS candidate genes, as well as outputs from Summary-data-based Mendelian Randomization (SMR), colocalization, and fine-mapping. Collectively, these complementary approaches enhance interpretability, minimize false positives, and offer deeper biological insights into the molecular basis of complex traits.

Data Curation

1. Inclusion Criteria

The publications included in the TWAS Atlas should focus on human traits and provide the following information:

(1) Computational Methods/Software Used for TWAS: This includes classic TWAS methods and their extensions, such as PrediXcan, MultiXcan, FUSION, UTMOST, JTI, Summary-PrediXcan (S-PrediXcan), and Summary-MultiXcan (S-MultiXcan). It also covers other methods for calculating gene-trait associations, such as linear regression models and random forests.

(2) Gene-Trait Associations: This section should report p-values or effect sizes to indicate the significance and magnitude of gene expression impacts on traits. Only relationships that are significant in the publication—defined as having a p-value less than 1E-4 or a q-value less than 0.05—will be included in the TWAS Atlas. For studies that do not report p-values or only provide associations corresponding to a cutoff, all relevant data will still be recorded in the Atlas.

(3) Tissue Information: This should include details about single-tissue and cross-tissue analyses, highlighting the tissue-specific characteristics of associations within TWAS.
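The significance rule in criterion (2) can be written as a small predicate. This is a minimal sketch for illustration only, not the curators' actual tooling: the function name `passes_inclusion_threshold` and the use of `None` for an unreported statistic are assumptions.

```python
from typing import Optional

P_VALUE_CUTOFF = 1e-4   # p-value threshold stated in criterion (2)
Q_VALUE_CUTOFF = 0.05   # q-value threshold stated in criterion (2)


def passes_inclusion_threshold(p_value: Optional[float] = None,
                               q_value: Optional[float] = None) -> bool:
    """Return True if a reported association would be recorded.

    Assumption: None means the publication did not report that statistic.
    Associations reported without any p-value or q-value are kept, mirroring
    the rule that such data are still recorded.
    """
    if p_value is None and q_value is None:
        return True
    if p_value is not None and p_value < P_VALUE_CUTOFF:
        return True
    if q_value is not None and q_value < Q_VALUE_CUTOFF:
        return True
    return False
```

For example, `passes_inclusion_threshold(p_value=5e-5)` is accepted, while `passes_inclusion_threshold(p_value=1e-3)` is rejected unless an accompanying q-value falls below 0.05.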

2. Curation Process

(1) Literature search and filter: We conduct a literature search in PubMed using pre-defined keywords. A publication is included in the TWAS Atlas only if it describes the required features and reports significant gene-trait associations.

(2) Study curation: We manually curate the study information for each qualified publication, including reported trait, tissue type, method/software, and ancestry of population.

(3) Association collection: We further collect significant gene-trait associations (significant in the publication, or with a p-value less than 1E-4 or a q-value less than 0.05), including the significance and size of the effect of gene expression on the trait. Furthermore, we integrate genome-level regulatory information about genes (eQTLs) in 49 human tissues from GTEx to provide more comprehensive regulatory information for traits. Detailed gene and SNP information is re-annotated based on Ensembl version 114 and the Gene database (GRCh38.p14).

Data Analysis Workflow

TWAS Atlas performs transcriptome-wide association analysis based on the GTEx reference eQTL panel and 171 GWAS summary statistics datasets (114 from the UK Biobank and 57 from the GWAS Catalog), using the Summary-PrediXcan and Summary-MultiXcan software.
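To illustrate what a summary-based TWAS computes, the sketch below follows the gene-level statistic described in the S-PrediXcan publication: Z_g ≈ Σ_l w_lg (σ_l / σ_g) z_l, where σ_g² = wᵀ Γ w and Γ is the SNP covariance matrix from a reference panel. It is a simplified illustration, not the Summary-PrediXcan software itself; the function name and the plain-list inputs are assumptions.

```python
import math
from typing import Sequence


def spredixcan_zscore(weights: Sequence[float],
                      gwas_z: Sequence[float],
                      snp_cov: Sequence[Sequence[float]]) -> float:
    """Gene-level z-score from prediction weights and GWAS z-scores.

    weights : prediction weights w_l for the SNPs in one gene model
    gwas_z  : GWAS z-scores z_l for the same SNPs, in the same order
    snp_cov : SNP covariance matrix (Gamma) from a reference panel
    """
    n = len(weights)
    if len(gwas_z) != n or len(snp_cov) != n or any(len(row) != n for row in snp_cov):
        raise ValueError("weights, gwas_z and snp_cov must have matching sizes")

    # Variance of predicted expression: sigma_g^2 = w' * Gamma * w
    gene_var = sum(weights[i] * snp_cov[i][j] * weights[j]
                   for i in range(n) for j in range(n))
    if gene_var <= 0:
        raise ValueError("predicted expression variance must be positive")
    sigma_g = math.sqrt(gene_var)

    # sigma_l is the standard deviation of SNP l (square root of the diagonal).
    numerator = sum(weights[l] * math.sqrt(snp_cov[l][l]) * gwas_z[l]
                    for l in range(n))
    return numerator / sigma_g
```

With two uncorrelated unit-variance SNPs, weights `[0.6, 0.8]` and GWAS z-scores `[1.0, 2.0]`, the predicted-expression variance is 1 and the gene-level score is the weighted sum 0.6·1 + 0.8·2.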

1. Reference eQTL data collection

The eQTL prediction weights were obtained from PredictDB and derived using elastic net models trained on GTEx v8 data.

2. GWAS data collection

(1) We retained only GWAS studies of European-ancestry populations, since the collected prediction weights were trained on these populations.

(2) Studies must provide comprehensive GWAS summary statistics, including the genetic variant (identified by dbSNP rsID or genomic coordinates), the effect size or odds ratio indicating the strength and direction of the association, and the p-value indicating its statistical significance.

3. Data preprocessing and standardization for GWAS summary statistics data

(1) Standardize genomic coordinates to the GRCh38 reference.

(2) Calculate z-scores from p-values and beta coefficients.

(3) Perform TWAS analysis using Summary-PrediXcan for tissue-specific analysis and Summary-MultiXcan for cross-tissue analysis.
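Step (2) can be sketched as follows, assuming a two-sided p-value: the magnitude of the z-score is the standard normal quantile at 1 - p/2, and its sign is taken from the beta coefficient. The function name is hypothetical and the pipeline's actual implementation may differ.

```python
import math
from statistics import NormalDist


def zscore_from_p_and_beta(p_value: float, beta: float) -> float:
    """Recover a signed z-score from a two-sided p-value and an effect size.

    Assumption: p_value is two-sided, so |z| = -Phi^{-1}(p / 2).
    Using the lower tail avoids the rounding of 1 - p/2 to 1.0 for tiny p.
    A beta of exactly 0 is treated as positive by copysign.
    """
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in (0, 1]")
    magnitude = -NormalDist().inv_cdf(p_value / 2.0)
    return math.copysign(magnitude, beta)
```

For instance, a p-value of 0.05 with a negative beta yields a z-score of about -1.96.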

Data Structure

Data Statistics

Number of Publications

Distribution of Trait Types

Distribution of Gene Types

Top 6 Software Tools

Trait Ontology

To create a unified classification system for trait names, definitions, and categories, we have established a trait ontology system. The traits reported in the collected publications are mapped to specific ontology entities to build this ontology, and the traits in the atlas are displayed and organized according to these mapped entities, which enhances comparability. The Experimental Factor Ontology (EFO) is used as the reference for this mapping. Each trait stored in the TWAS Atlas includes information such as the trait name, ontology ID, description, synonyms, and terms mapped from other databases and ontologies. Detailed descriptions of the traits are sourced from the EFO and other ontologies, such as the NCIT. Additionally, we have categorized all the traits in the atlas into four main categories (disease, measurement, phenotypic abnormality, and others) and 46 subcategories. This classification helps users locate and understand the traits of interest more easily.
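The trait fields listed above can be pictured as a simple record type. The sketch below is hypothetical: the class name, field names, and the placeholder ontology ID are assumptions for illustration and do not reflect the actual TWAS Atlas schema.

```python
from dataclasses import dataclass, field
from typing import List

# The four main categories named in the text.
MAIN_CATEGORIES = ("disease", "measurement", "phenotypic abnormality", "others")


@dataclass
class TraitRecord:
    """Hypothetical container for one trait entry."""
    name: str
    ontology_id: str
    description: str
    category: str
    subcategory: str = ""                                   # one of the 46 subcategories
    synonyms: List[str] = field(default_factory=list)
    mapped_terms: List[str] = field(default_factory=list)   # terms from other ontologies

    def __post_init__(self) -> None:
        if self.category not in MAIN_CATEGORIES:
            raise ValueError(f"unknown main category: {self.category!r}")


# Placeholder values only; "EFO_XXXXXXX" is not a real identifier.
example = TraitRecord(
    name="example trait",
    ontology_id="EFO_XXXXXXX",
    description="placeholder description",
    category="disease",
)
```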

Knowledge Graph

1. Introduction

TWAS Atlas features an interactive knowledge graph to visualize and download association maps among variants, genes, and traits.

2. Implementation

The knowledge graph for each available trait includes two types of associations:

(1) Gene-trait associations: These are curated from all relevant publications reporting on the trait. Each line connecting a gene to a trait conveys three pieces of information:

- The color of the line indicates the tissue type.
- The width of the line reflects the significance of the association.
- Dotted lines represent negative correlations, while solid lines indicate positive correlations.

Additionally, the graph provides evidence descriptions of gene-trait associations as presented in the original publications curated in TWAS Atlas 2.0.

(2) SNP-gene associations: These are downloaded and integrated from GTEx version 8. For each gene in each tissue, we group all regulatory variants using an LD-clumping strategy with the parameters --clump-p 1E-5 and --clump-ld-thresh 0.1. This process retains only the most significant variants based on p-value, while dropping the remaining variants that are in linkage disequilibrium (LD) with them.
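The LD-clumping idea in item (2) can be sketched as a greedy procedure: variants passing the p-value threshold are visited from most to least significant, and a variant is kept only if it is not in LD with any variant already kept. This is an illustrative sketch, not the tool used by TWAS Atlas. The function name, the input format, and the exact comparison rules (`p < clump_p`, `r2 >= ld_thresh`) are assumptions; the default values mirror the parameters quoted above.

```python
from typing import Dict, FrozenSet, List


def ld_clump(p_values: Dict[str, float],
             r2: Dict[FrozenSet[str], float],
             clump_p: float = 1e-5,
             ld_thresh: float = 0.1) -> List[str]:
    """Greedy LD clumping for the variants of one gene in one tissue.

    p_values : variant ID -> association p-value
    r2       : unordered variant pair -> LD r^2 (missing pairs count as 0)
    Returns the retained lead variants, most significant first.
    """
    # Only variants below the p-value threshold are considered at all.
    candidates = sorted((v for v, p in p_values.items() if p < clump_p),
                        key=lambda v: p_values[v])
    kept: List[str] = []
    for variant in candidates:
        # Drop the variant if it is in LD with a more significant kept variant.
        in_ld = any(r2.get(frozenset((variant, lead)), 0.0) >= ld_thresh
                    for lead in kept)
        if not in_ld:
            kept.append(variant)
    return kept


# Hypothetical variant IDs and values, for illustration only.
demo_p = {"snp_a": 1e-8, "snp_b": 1e-6, "snp_c": 1e-7, "snp_d": 1e-3}
demo_r2 = {frozenset(("snp_a", "snp_b")): 0.8,
           frozenset(("snp_a", "snp_c")): 0.02}
demo_leads = ld_clump(demo_p, demo_r2)
```

In this toy input, `snp_b` is dropped because it is in strong LD with the more significant `snp_a`, and `snp_d` never enters because its p-value exceeds the threshold.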

3. Usage

Users can open the page through the 'Knowledge Graph' entry in the database navigation bar.

(1) Users should select one trait or one gene of interest at a time as the center of the current graph.

(2) Users can filter relationships based on gene type, effect direction, and tissue type. Users can also choose whether to display SNP-gene associations.

(3) All nodes and lines in the graph are draggable, allowing users to adjust the layout before downloading the graph.

Database Usage

1. Browsing Module

This module facilitates a comprehensive exploration of the database through four distinct perspectives: traits, publications, genes, and datasets.

(1) Traits Browsing: This table provides an overview of traits, including their ontology identifiers, categorized types, the number of associated publications, tissues, and gene-trait associations.

(2) Genes Browsing: The gene summary table displays information such as the gene symbol, Ensembl ID, chromosomal location, and gene type. The traits most strongly associated with a particular gene are presented in descending order.

(3) Publications Browsing: The publication table lists information from curated studies, including the PubMed ID, title, DOI, journal name, year of publication, the number of traits investigated, and links to the corresponding TWAS expression datasets.

(4) Dataset Browsing: This table summarizes the GWAS datasets used in the TWAS and complementary analyses. It includes the dataset ID, source database, accession ID, study title, number of associated traits, ancestry information, GWAS sample size, and links to the original GWAS summary statistics.

2. Searching Module

The database offers a range of flexible search functionalities across various modules.

(1) Quick Search: This feature allows users to quickly search for keywords related to traits, data sources, or genes directly from the home page.

(2) Advanced Search: For users who need more detailed queries, an advanced search option is available on a dedicated page. This allows for precise identification of specific TWAS associations.

3. Analysis Module

TWAS Atlas 2.0 now features four new modules that focus on Enrichment Analysis, Summary-data-based Mendelian Randomization (SMR), Colocalization, and Fine-mapping. Together with the existing TWAS module, they form an integrated framework for robust gene-trait discovery and functional interpretation.

(1) Enrichment Analysis: The Enrichment module interprets the biological functions of TWAS-significant genes for each trait using Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses. Users can select a specific trait to view visualizations and enrichment results related to it.

(2) Complementary Analysis: Users have the option to select different datasets to obtain SMR, Colocalization, and Fine-mapping results alongside TWAS. This provides additional insights into trait-gene associations and enhances understanding of genetic regulatory mechanisms.

4. Download Module

The database offers a collection of downloadable resources, giving users access to literature summaries and to details on traits, genes, and cis-eQTLs of genes.

External Links

TWAS-hub: http://twas-hub.org

webTWAS: http://www.webtwas.net/#

scTWAS Atlas: https://ngdc.cncb.ac.cn/sctwas/

GTEx: https://gtexportal.org/home/datasets

GWAS Catalog: https://www.ebi.ac.uk/gwas

Neale Lab UKBB: https://www.nealelab.is/uk-biobank

swiss: https://github.com/statgen/swiss

PrediXcan: https://github.com/hakyimlab/PrediXcan

dbSNP: https://www.ncbi.nlm.nih.gov/snp

Ensembl: http://asia.ensembl.org/index.html

GeneCards: https://www.genecards.org