A transcriptome-wide association study (TWAS) directly associates gene expression with traits to identify trait-related signals at the transcriptome level. TWAS makes effective use of large samples and links traits to the expression of genes, which are more interpretable biological units than genomic variants. It thereby gives us a deeper understanding of the complex mechanisms and regulation of various diseases and traits. In recent years, hundreds of TWAS studies have been carried out and have proved to be of great reference value for researchers working on complex traits and diseases. There is therefore an urgent need for an integrative resource for TWAS publications. Valuable contributions have already been made by databases that integrate TWAS-related datasets.
However, there is still no database that curates published TWAS knowledge and associations. Their number is increasing rapidly, and they involve a wider range of data resources and improved methodologies better suited to the corresponding datasets. Therefore, we aim to develop TWAS Atlas, a curated resource of published transcriptome-wide association studies. Unlike data-oriented databases, TWAS Atlas manually collects high-quality gene-trait associations from the published literature. By combining results published by researchers worldwide, we hope to improve the readability and usability of TWAS results, help explain the complex genetic basis of traits, and suggest new targets and therapeutic directions.
Publications included in TWAS Atlas must study human traits and contain the following information:
(1) The computational method or software used for TWAS, including classic TWAS methods and their variants, such as PrediXcan, MultiXcan, FUSION, UTMOST, SMR, Summary-PrediXcan (S-PrediXcan) and Summary-MultiXcan (S-MultiXcan), as well as other methods for calculating gene-trait associations, such as linear regression models and random forests.
(2) Gene-trait associations, including the p-value or the effect size, which indicate the significance and the magnitude of the effect of gene expression on the trait, respectively. Only associations reported as significant in the publication, or with a p-value below 1E-4 or a q-value below 0.05, are included in TWAS Atlas. For studies that do not report p-values, or that report only the associations passing their own cutoff, all reported associations are recorded in the atlas.
(3) Tissue information, such as whether the analysis is single-tissue or cross-tissue, which shows the tissue-specific characteristics of TWAS associations.
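The inclusion rule for associations can be sketched in a few lines of Python. This is only an illustration of the stated thresholds, not the curation code used for TWAS Atlas; the `Association` class, its field names and the `significant_in_paper` flag (standing for a curator's judgment that the publication itself reports the association as significant) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

P_CUTOFF = 1e-4   # p-value threshold stated in the text
Q_CUTOFF = 0.05   # q-value threshold stated in the text


@dataclass
class Association:
    """One reported gene-trait association (hypothetical record layout)."""
    gene: str
    trait: str
    tissue: str
    p_value: Optional[float] = None   # None when the study reports no p-value
    q_value: Optional[float] = None   # None when the study reports no q-value
    significant_in_paper: bool = False  # hypothetical curator flag


def is_included(a: Association) -> bool:
    """Return True if the association meets any of the three stated criteria."""
    if a.significant_in_paper:
        return True
    if a.p_value is not None and a.p_value < P_CUTOFF:
        return True
    if a.q_value is not None and a.q_value < Q_CUTOFF:
        return True
    return False
```

Under this sketch, a study that reports no p-values but lists only associations passing its own cutoff would have every record flagged with `significant_in_paper=True`, so all of them are kept.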
(1) Literature search and filtering: we search PubMed using predefined keywords, and a publication is included in TWAS Atlas only if it adequately describes the features involved and reports significant gene-trait associations.
(2) Study curation: for each qualifying publication, we manually curate the study information, including the reported trait, tissue type, method/software and population ancestry.
(3) Association collection: we then collect significant gene-trait associations (significant at the level used in the publication, or with a p-value below 1E-4 or a q-value below 0.05), including the significance and the size of the effect of gene expression on the trait. Furthermore, we integrate genome-level regulatory information about genes (eQTLs) in 49 human tissues from GTEx to provide more comprehensive regulatory information for each trait. Detailed gene and SNP information is re-annotated based on GENCODE version 26 (GRCh38). This standardized curation process makes TWAS Atlas a high-quality curated knowledgebase.
To unify trait names, definitions and categories, we establish a new trait ontology classification system. Traits reported in the collected publications are mapped to ontology entities to build the trait ontology. Traits in the atlas are displayed and grouped according to the mapped traits to facilitate comparison. The Experimental Factor Ontology (EFO) is hosted and described here. The information stored for each trait in TWAS Atlas comprises the trait name, ontology ID, description, synonyms and the terms mapped from other databases and ontologies. The detailed description of each trait comes from EFO and other ontologies such as NCIT. The mapped terms come from other databases and ontologies (e.g. ICD9/ICD10, MONDO and SNOMEDCT). We also classify all traits collected in the atlas into four main categories (disease, measurement, phenotypic abnormality and others) and 41 subcategories, helping users locate and understand the traits of interest.
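As a rough illustration of the trait record described above, the following Python sketch models the listed fields and groups traits by category. The class name, field names and the `group_by_category` helper are hypothetical and do not reflect the actual TWAS Atlas schema; only the four main category names are taken from the text.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple

# The four main categories named in the text.
MAIN_CATEGORIES = {"disease", "measurement", "phenotypic abnormality", "others"}


@dataclass
class TraitRecord:
    """Hypothetical layout of one trait entry."""
    name: str
    ontology_id: str            # e.g. an EFO identifier
    description: str
    category: str               # one of MAIN_CATEGORIES
    subcategory: str
    synonyms: List[str] = field(default_factory=list)
    # Source name (e.g. "ICD10", "MONDO") -> list of mapped term IDs
    mapped_terms: Dict[str, List[str]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.category not in MAIN_CATEGORIES:
            raise ValueError(f"unknown main category: {self.category!r}")


def group_by_category(
    records: Iterable[TraitRecord],
) -> Dict[Tuple[str, str], List[str]]:
    """Group trait names by (main category, subcategory)."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for record in records:
        groups[(record.category, record.subcategory)].append(record.name)
    return dict(groups)
```

Validating the main category at construction time keeps mis-typed categories out of the grouped display; the 41 subcategories are left as free text here because the text does not enumerate them.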
The knowledge graph in TWAS Atlas is an interactive graph for visualizing and downloading the association maps among variants, genes and traits.
For each available trait, the knowledge graph contains two kinds of associations:
(1) Gene-trait associations, which are curated from all the publications reporting this trait. The line that connects a gene to a trait carries three pieces of information: the color of the line represents the tissue type; the thickness of the line represents the significance of the association; and dotted and solid lines represent negative and positive correlation, respectively.
(2) SNP-gene associations, which are downloaded and integrated from GTEx version 8. For each gene in each tissue, we clump all regulatory variants of the gene with an LD-clumping strategy using swiss (parameters --clump-p 1E-5 --clump-r2 0.1): the variant with the best p-value is kept first, and the remaining variants in LD with it are dropped.
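The LD-clumping idea described above can be sketched as a greedy procedure. This is a generic illustration, not the implementation in swiss: the function name, the input layout (a dictionary of p-values and a dictionary of pairwise r2 values) and the treatment of missing pairs as r2 = 0 are all assumptions. The defaults mirror the two parameter values quoted in the text.

```python
from typing import Dict, FrozenSet, List, Set


def ld_clump(
    p_values: Dict[str, float],
    r2: Dict[FrozenSet[str], float],
    clump_p: float = 1e-5,
    clump_r2: float = 0.1,
) -> List[str]:
    """Greedy LD clumping for the variants of one gene in one tissue.

    p_values: variant ID -> association p-value
    r2:       frozenset({variant_a, variant_b}) -> pairwise LD (r^2);
              pairs that are absent are treated as r^2 = 0 (assumption)
    """
    # Only variants passing the p-value threshold are considered,
    # ordered from most to least significant (ties broken by ID).
    candidates = sorted(
        (v for v, p in p_values.items() if p <= clump_p),
        key=lambda v: (p_values[v], v),
    )
    kept: List[str] = []
    dropped: Set[str] = set()
    for index_variant in candidates:
        if index_variant in dropped:
            continue
        # The best remaining variant becomes an index variant ...
        kept.append(index_variant)
        # ... and every remaining variant in LD with it is dropped.
        for other in candidates:
            if other == index_variant or other in dropped or other in kept:
                continue
            if r2.get(frozenset((index_variant, other)), 0.0) >= clump_r2:
                dropped.add(other)
    return kept
```

Note that a variant is dropped only when it is in LD with a kept index variant; being in LD with an already dropped variant does not remove it.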
Users can access the page through the 'Knowledge Graph' entry in the database navigation bar.
(1) Users select one trait or one gene of interest at a time as the center of the current graph.
(2) Users can filter relationships by gene type, effect direction and tissue type. Whether SNP-gene associations are displayed is up to the user.
(3) All nodes and lines in the graph are draggable, allowing users to adjust the layout and download the graph.
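The line encoding and the filters described for the knowledge graph can be illustrated with a short Python sketch. It is not the code behind the TWAS Atlas web page: the `GeneTraitEdge` class, the width formula based on -log10(p), the fallback color and the rule that a zero effect is drawn solid are all assumptions chosen for the example.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set


@dataclass
class GeneTraitEdge:
    """Hypothetical gene-trait edge of the knowledge graph."""
    gene: str
    gene_type: str    # e.g. "protein_coding"
    trait: str
    tissue: str
    p_value: float
    effect: float     # sign gives the direction of the correlation


def edge_style(edge: GeneTraitEdge, tissue_colors: Dict[str, str]) -> Dict[str, object]:
    """Map an edge to the three visual channels named in the text."""
    # Guard against p = 0 and cap the score so widths stay bounded (assumed scale).
    score = min(-math.log10(max(edge.p_value, 1e-300)), 50.0)
    return {
        "color": tissue_colors.get(edge.tissue, "#999999"),   # tissue type
        "width": 1.0 + score / 10.0,                          # significance
        "dash": "dotted" if edge.effect < 0 else "solid",     # direction
    }


def filter_edges(
    edges: Iterable[GeneTraitEdge],
    gene_types: Optional[Set[str]] = None,
    direction: Optional[str] = None,   # "positive", "negative" or None
    tissues: Optional[Set[str]] = None,
) -> List[GeneTraitEdge]:
    """Keep edges matching every filter that is set; None means no filter."""
    selected = []
    for edge in edges:
        if gene_types is not None and edge.gene_type not in gene_types:
            continue
        if tissues is not None and edge.tissue not in tissues:
            continue
        if direction == "positive" and edge.effect < 0:
            continue
        if direction == "negative" and edge.effect >= 0:
            continue
        selected.append(edge)
    return selected
```

Keeping the style mapping separate from the filtering means the same filtered edge list can be passed to any rendering layer.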