Welcome to the scTWAS Atlas database, a comprehensive repository designed to store, manage, and disseminate high-resolution single-cell transcriptome-wide association study (scTWAS) data. The database currently covers 30 cell types, 9 cell conditions and 34 human complex traits, with associations both curated from single-cell TWAS publications and generated by a standard analysis workflow. In addition, scTWAS Atlas provides visualization tools for scTWAS results, including a knowledge graph and TWAS Manhattan plots, as well as analysis modules such as cell-type-specific TWAS gene comparison and Summary-data-based Mendelian Randomization (SMR) analysis. In summary, scTWAS Atlas offers a comprehensive platform for exploring cellular-level gene-trait associations, facilitating studies of human health and disease.
Introduction to single-cell transcriptome-wide association study:
A transcriptome-wide association study (TWAS) integrates eQTL data with GWAS data to identify trait-associated risk genes regulated by risk genomic loci. In detail, TWAS first builds an expression prediction model for each gene using an external reference panel with genotype and gene expression data. The expression levels of genes in the GWAS samples are then imputed with these models, and finally the associations between the imputed expression levels and traits are calculated to identify trait-associated genes. To gain a deeper understanding of how these genes function in different cell types, we performed transcriptome-wide association analysis at the cell-type level to identify cell-type-specific gene-trait associations.
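The three steps above can be illustrated with a deliberately tiny sketch. It assumes a single cis-SNP per gene and made-up dosage, expression and trait values (all hypothetical). Real TWAS models combine many cis-SNPs with penalized or Bayesian regression, so this only shows the flow of the computation and is not the workflow used by scTWAS Atlas.

```python
from math import sqrt
from statistics import mean


def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x for one predictor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


# Step 1: train an expression prediction model in the reference panel
# (toy data: SNP dosage 0/1/2 and measured expression of one gene).
ref_dosage = [0, 1, 2, 0, 1, 2, 1, 0]
ref_expression = [1.0, 1.6, 2.1, 0.9, 1.4, 2.0, 1.5, 1.1]
intercept, weight = fit_line(ref_dosage, ref_expression)

# Step 2: impute expression for GWAS samples, which have genotypes only.
gwas_dosage = [0, 2, 1, 1, 0, 2, 2, 0, 1, 1]
imputed_expression = [intercept + weight * g for g in gwas_dosage]

# Step 3: test the association between imputed expression and the trait.
gwas_trait = [0.2, 1.1, 0.5, 0.9, 0.1, 1.4, 0.8, 0.4, 0.6, 0.7]
r = pearson_r(imputed_expression, gwas_trait)
n = len(gwas_trait)
t_statistic = r * sqrt((n - 2) / (1 - r * r))  # t-test with n - 2 df

print(f"weight={weight:.3f}, r={r:.3f}, t={t_statistic:.2f}")
```

With only one SNP in the model, the imputed expression is a linear function of the dosage, so the gene-level test mirrors the SNP-level test. The gain from TWAS comes from aggregating several variants per gene.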
We included all available single-cell transcriptome-wide association study publications in the database. The curation steps were as follows:
(1) Literature search and filter: we conducted a literature search in PubMed using pre-defined keywords, and publications were included in scTWAS Atlas only if they contained the necessary descriptions of the involved features and significant gene-trait associations:
(a) Information about TWAS methods: computational method/software used for single-cell TWAS.
(b) Information about gene-trait associations: the p-value and effect size must be presented to indicate the significance and the size of the effect of gene expression on the trait, respectively. The False Discovery Rate (FDR)-adjusted p-value was also included if provided.
(c) Information about cell type: the cell-type specificity of the TWAS associations must be reported.
(2) Study curation: we manually curated the study information for each qualified publication, including the reported trait, the cell type and corresponding tissue type, the method/software, and the population ancestry.
(3) Association collection: we collected gene-trait associations, including the significance and the size of the effect of gene expression on the trait. Furthermore, we integrated genome-level regulatory information about genes (eQTLs) in the corresponding cell types from the original single-cell eQTL datasets to provide more comprehensive regulatory information for each trait.
In the current version, scTWAS Atlas performs single-cell TWAS analysis with the OTTERS software, using single-cell eQTL summary statistics from 10 single-cell eQTL studies and GWAS summary statistics from the GWAS Catalog.
(1) Single-cell eQTL data collection
We searched for single-cell eQTL studies and filtered them according to the following criteria:
(a) Studies must contain clear information about tissue and cell type, excluding embryonic tissues and artificially cultured tissues.
(b) Studies must provide explicit information about the population and sample size.
(c) Cell counts must be more than 5,000.
(d) Studies must provide complete eQTL summary statistics, or data from which they can be calculated, including the genetic variant (identified by dbSNP rsID or genomic coordinates), the target genes regulated by the variant, the strength and direction of the variant-expression association (effect size), and the significance of the association (p-value).
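Criteria (a) to (d) amount to a simple inclusion filter over candidate studies. The following is a minimal sketch of such a filter. The `EqtlStudy` record, its field names and the example studies are all hypothetical and only serve to make the thresholds concrete.

```python
from dataclasses import dataclass


@dataclass
class EqtlStudy:
    # All field names are hypothetical, chosen only for this sketch.
    name: str
    tissue: str
    cell_type: str
    is_embryonic_or_cultured: bool
    population: str
    sample_size: int
    cell_count: int
    has_full_summary_stats: bool


def passes_criteria(study: EqtlStudy) -> bool:
    """Apply inclusion criteria (a)-(d) to one candidate study."""
    clear_source = (
        bool(study.tissue and study.cell_type)
        and not study.is_embryonic_or_cultured
    )  # (a)
    population_known = bool(study.population) and study.sample_size > 0  # (b)
    enough_cells = study.cell_count > 5000  # (c)
    return (
        clear_source
        and population_known
        and enough_cells
        and study.has_full_summary_stats  # (d)
    )


candidates = [
    EqtlStudy("study_1", "blood", "B cell", False, "European", 900, 1_200_000, True),
    EqtlStudy("study_2", "brain", "microglia", False, "European", 150, 4_000, True),
    EqtlStudy("study_3", "iPSC-derived", "neuron", True, "European", 200, 250_000, True),
    EqtlStudy("study_4", "blood", "monocyte", False, "", 0, 80_000, True),
]

kept = [s.name for s in candidates if passes_criteria(s)]
print(kept)
```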
(2) GWAS data collection
We obtained GWAS summary statistics from the public GWAS Catalog database and filtered GWAS projects according to the following criteria:
(a) Since the collected eQTLs are mainly from European populations, we retained only GWAS studies of European populations.
(b) The sample size of the GWAS project must be more than 9,000.
(c) Projects must include complete GWAS summary statistics, including the genetic variant (identified by dbSNP rsID or genomic coordinates), the strength and direction of the variant-trait association (effect size or odds ratio) and the significance of the association (p-value).
(3) Data preprocessing and standardization for both eQTL summary statistics and GWAS summary statistics data:
(a) Standardize the genomic coordinates to the GRCh38 reference genome.
(b) Calculate z-scores from p-values and beta coefficients or odds ratios.
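One common way to carry out step (b) is to take the magnitude of the z-score from the two-sided p-value and its sign from the effect direction. The sketch below assumes two-sided p-values and uses hypothetical function names. The exact conversion used in the scTWAS Atlas workflow is not specified here, so treat this as an illustration.

```python
from math import log
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def z_from_p_and_beta(p_value: float, beta: float) -> float:
    """Signed z-score from a two-sided p-value and an effect size."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in (0, 1]")
    if beta == 0:
        return 0.0
    # Using the lower tail (p / 2) avoids rounding 1 - p / 2 to 1.0
    # when the p-value is very small.
    magnitude = -_STD_NORMAL.inv_cdf(p_value / 2.0)
    return magnitude if beta > 0 else -magnitude


def z_from_p_and_odds_ratio(p_value: float, odds_ratio: float) -> float:
    """Same conversion, taking the sign from log(odds ratio)."""
    if odds_ratio <= 0:
        raise ValueError("odds_ratio must be positive")
    return z_from_p_and_beta(p_value, log(odds_ratio))


z_beta = z_from_p_and_beta(0.05, 0.2)       # positive effect
z_or = z_from_p_and_odds_ratio(0.05, 0.8)   # OR < 1, so negative effect
print(round(z_beta, 3), round(z_or, 3))
```

A p-value reported as exactly 0 carries no usable magnitude and is rejected by the sketch. How such records are handled in practice is a separate design decision.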
(4) Software
We used OTTERS to perform the TWAS analysis. OTTERS was developed to perform TWAS using eQTL summary statistics and GWAS summary statistics.
scTWAS Atlas uses a structured and standardized categorization of traits based on the MeSH (Medical Subject Headings) database. The names and definitions of traits were mapped to MeSH for consistency. For each trait, the name, ontology ID, description and synonyms are provided.
scTWAS Atlas uses a standardized definition and classification of cell types based on the Experimental Factor Ontology (EFO). For each cell type, the name, ontology ID, description and synonyms are provided.
(a) Introduction
We have developed an interactive knowledge graph to facilitate the understanding of complex relationships between genes, cell types, and diseases. The graph is updated in real time as new data become available, providing a current view of the evolving landscape of single-cell TWAS associations.
(b) Implementation
The knowledge graph supports three kinds of central nodes.
1) Choose a trait as the central node: for each available trait, the knowledge graph contains five layers of associations: trait - tissue - cell type - gene - SNP.
2) Choose a gene as the central node: for each available gene, the knowledge graph contains four layers of associations: gene - cell type - tissue - trait.
3) Choose a cell type as the central node: the knowledge graph contains three layers of associations: cell type - gene - trait.
For each association, the edges represent different biological meanings:
1) Each trait - tissue - cell type - gene line represents a single-cell TWAS association, indicating that the trait is regulated by the gene in a certain cell type of a given tissue.
2) Each SNP - gene edge represents an eQTL association integrated from single-cell eQTL studies. For each gene in each cell type, only the significant eQTLs (p-value < 1e-4) are presented.
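The layered structure of a trait-centered graph, together with the eQTL significance cutoff, can be sketched as follows. The record layout, the function name and the placeholder trait, gene and SNP identifiers are hypothetical. This is an illustration of the edge rules above and not the database's implementation.

```python
def build_trait_graph(trait, associations, eqtls, p_cutoff=1e-4):
    """Return the set of (source, target) edges for a trait-centered graph.

    Nodes are (layer, name) tuples so that equal names in different
    layers stay distinct.
    """
    edges = set()
    for assoc in associations:
        if assoc["trait"] != trait:
            continue
        tissue = ("tissue", assoc["tissue"])
        cell = ("cell_type", assoc["cell_type"])
        gene = ("gene", assoc["gene"])
        # trait - tissue - cell type - gene: one scTWAS association
        edges.add((("trait", trait), tissue))
        edges.add((tissue, cell))
        edges.add((cell, gene))
        # gene - SNP: significant eQTLs of this gene in this cell type
        for eqtl in eqtls:
            same_context = (
                eqtl["gene"] == assoc["gene"]
                and eqtl["cell_type"] == assoc["cell_type"]
            )
            if same_context and eqtl["p_value"] < p_cutoff:
                edges.add((gene, ("snp", eqtl["snp"])))
    return edges


associations = [
    {"trait": "TRAIT_X", "tissue": "blood", "cell_type": "B cell", "gene": "GENE_A"},
    {"trait": "TRAIT_Y", "tissue": "blood", "cell_type": "T cell", "gene": "GENE_B"},
]
eqtls = [
    {"snp": "SNP_1", "gene": "GENE_A", "cell_type": "B cell", "p_value": 2e-6},
    {"snp": "SNP_2", "gene": "GENE_A", "cell_type": "B cell", "p_value": 1e-3},
    {"snp": "SNP_3", "gene": "GENE_A", "cell_type": "T cell", "p_value": 1e-9},
]

graph = build_trait_graph("TRAIT_X", associations, eqtls)
for source, target in sorted(graph):
    print(source, "->", target)
```

Here only `SNP_1` is attached to `GENE_A`: `SNP_2` misses the p-value cutoff and `SNP_3` comes from a different cell type.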
(1) Browsing Module
This module is designed to facilitate exploration of the database through two distinct perspectives: traits and cell types.
(a) Trait Browsing: Users are presented with a statistical chart that illustrates the scope of traits in the database. A horizontal bar plot displays the cell types with the greatest number of associated traits, indicating their broad impact across traits. An accompanying overview table provides details on trait categories, associated cell types, GWAS datasets, and the number of associated genes.
(b) Cell Type Browsing: Similarly, the cell type page offers a statistical chart and an overview table. This table elucidates tissue types and distills summary statistics, including associated traits, single-cell eQTL datasets, and the tally of associated genes for each cell type.
(2) Search Module
The database is equipped with versatile search functionalities across multiple modules.
(a) Quick Search: Users can search for keywords pertaining to traits, cell types, or genes directly on the home page.
(b) Advanced Search: For users seeking more refined queries, an advanced search option is accessible through a dedicated page, enabling the pinpointing of specific TWAS associations with precision.
(3) Visualization Module
(a) Knowledge Graph: Users can access this page through the 'Knowledge Graph' entry in the database navigation bar. Users first select one trait, one cell type or one gene of interest as the center of the graph, then filter relationships by tissue type or cell type. All nodes and lines in the graph are draggable, so users can adjust the layout before downloading the graph.
(b) Manhattan Plots: Users can identify the significant genomic regions associated with traits by filtering on traits, cell types and datasets.
(4) Analysis Module
The database provides analytical tools designed to dissect and compare TWAS results at the cell-type level.
(a) scTWAS Comparison Analysis: Users can click "Intersection of TWAS genes" and then choose traits and datasets to view an UpSet plot showing the intersecting and distinct sets of TWAS genes across cell types. Users can also click "Significance of TWAS genes" to view a heatmap showing how similar the p-values of TWAS genes are among cell types.
(b) SMR Analysis: Users can choose different datasets and obtain SMR results for comparison with the TWAS results, providing complementary trait-gene associations and deeper insights into their genetic regulatory mechanisms.
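The intersecting and distinct gene sets shown in the "Intersection of TWAS genes" view of (a) can be computed by grouping each gene under the exact combination of cell types in which it is significant, which is what the bars of an UpSet plot typically count. The sketch below uses a hypothetical function name and placeholder gene names. It only illustrates the counting and says nothing about how the database renders the plot.

```python
from collections import Counter


def exclusive_intersections(genes_by_cell_type):
    """Count genes per exact combination of cell types.

    genes_by_cell_type maps a cell type to its set of significant
    TWAS genes; the result maps frozenset(cell types) to a gene count.
    """
    membership = {}
    for cell_type, genes in genes_by_cell_type.items():
        for gene in genes:
            membership.setdefault(gene, set()).add(cell_type)
    return Counter(frozenset(cell_types) for cell_types in membership.values())


twas_genes = {
    "B cell": {"GENE_A", "GENE_B", "GENE_C"},
    "T cell": {"GENE_B", "GENE_C", "GENE_D", "GENE_F"},
    "Monocyte": {"GENE_C", "GENE_E"},
}

counts = exclusive_intersections(twas_genes)
for combo, n_genes in sorted(counts.items(), key=lambda kv: (-len(kv[0]), sorted(kv[0]))):
    print(" & ".join(sorted(combo)), n_genes)
```

Because each gene is assigned to exactly one combination, the counts add up to the number of distinct genes across all cell types.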
(5) Download Module
The database provides downloadable resources, including literature summaries, single-cell eQTL study datasets, GWAS project datasets, summary information on traits, cell types and genes, and significant single-cell eQTL data.
National Genomics Data Center
Beijing Institute of Genomics (China National Center for Bioinformation), Chinese Academy of Sciences
No. 104 building, No.1 Beichen West Road, Chaoyang District
Beijing 100101, China
Tel: +86 (10) 8409-7443
Fax: +86 (10) 8409-7443