Leukemia Atlas

1. Introduction

1.1 About LeukAtlas

LeukAtlas is a comprehensive, freely accessible database that integrates three public cohorts (TCGA-LAML, BEAT AML1.0, and TARGET) and twelve studies of copy number variants (CNVs) in acute myeloid leukemia (AML), covering various subtypes and age groups. The current version of LeukAtlas contains 12,597 CNVs from 1,446 samples, along with the publicly available genetic mutation and clinical phenotype data for those samples.
LeukAtlas provides a range of CNV analysis tools that let users upload, download, and query leukemia CNV information, offering a comprehensive platform for CNV analysis in leukemia research.

1.2 Overview of LeukAtlas

LeukAtlas was built and organized as follows:

Data Collection. Whole-exome sequencing (WXS) data and SNP array data were obtained from public databases and the literature, along with the corresponding phenotype information.

Data Processing. The identified CNVs were analyzed comprehensively, covering CNV patterns, profiles, pathogenicity, enriched regions, and prognostic stratification.

Database Modules. LeukAtlas includes three main functional modules: Profile, Browse, and Tool.

2. User Guide

2.1 ‘Profile’ function usage

This module includes information on 1,446 AML-diagnosed samples. CNV profiles for different phenotypes are available for visualization and download.

Filter the AML subgroups of interest

Users can select subgroups with the following filter options (Gender, Age, Fab, Karyotype Group, Race Region, TP53, FLT3-TKD, FLT3-ITD, NRAS, NPM1, DNMT3A) to generate the corresponding CNV profiles, clinical information, segmentation information, and mutation information.

When browsing CNV frequencies along the chromosomes, hover the cursor over a locus of interest to view the count of each CNV type at that locus. To display only one class of CNV, click the legend below the CNV frequency profile.

Download CNV and phenotype information

CNV segmentation data, phenotypic data, and clinical information in the database can be downloaded as Excel or CSV files from the Profile Module. By clicking on the display columns, you can choose which information to include in the download.

2.2 Browse CNV distribution in AML by IGV

The Browse Module provides a search interface where users can query specific genomic regions or genes. This module offers detailed genomic annotations and displays CNV frequencies within the AML population.

The genome browser in the database is based on the Integrative Genomics Viewer (IGV; https://www.igv.org). This module displays the frequency of mutations and CNVs (GAIN, LOSS, and UPD) at chromosomal loci, together with tracks for CpG islands, RepeatMasker elements, distal enhancers, and proximal enhancers, and annotation information for the genes within the displayed region.

The annotation data for RefSeq genes, CpG islands, and repeat sequences are sourced from the UCSC Genome Browser (https://genome.ucsc.edu). The annotation data for distal and proximal enhancers are sourced from the ENCODE database (https://www.encodeproject.org). You can enter a chromosomal segment or a gene symbol to search for and display the relevant information within that region.
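
As a rough illustration of what a region query computes, the sketch below parses an IGV-style locus string and reports, for each CNV type, the fraction of samples that carry a segment overlapping the region. The query syntax, the record layout, and the toy segments are assumptions made for this example; they are not taken from the LeukAtlas implementation or data.

```python
import re
from collections import defaultdict

# Assumed query syntax: "chr3:45,187,701-46,886,248" or "3:45187701-46886248".
REGION_RE = re.compile(r"^(?:chr)?(\d{1,2}|X|Y):([\d,]+)-([\d,]+)$")


def parse_region(text):
    """Return (chrom, start, end) parsed from an IGV-style locus string."""
    match = REGION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"cannot parse region: {text!r}")
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start > end:
        raise ValueError("region start is greater than region end")
    return chrom, start, end


def cnv_frequency(segments, region, n_samples):
    """Fraction of samples with at least one overlapping segment, per CNV type."""
    chrom, start, end = region
    carriers = defaultdict(set)
    for sample, seg_chrom, seg_start, seg_end, cnv_type in segments:
        # Two closed intervals overlap when each starts before the other ends.
        if seg_chrom == chrom and seg_start <= end and seg_end >= start:
            carriers[cnv_type].add(sample)
    return {cnv_type: len(samples) / n_samples for cnv_type, samples in carriers.items()}


# Invented toy records: (sample, chr, startpos, endpos, CNV type).
toy_segments = [
    ("S1", "3", 2170634, 45186644, "LOSS"),
    ("S1", "3", 45187701, 46886248, "UPD"),
    ("S2", "3", 1000000, 50000000, "GAIN"),
    ("S3", "5", 1000000, 2000000, "LOSS"),
]
region = parse_region("chr3:45,187,701-46,886,248")
frequencies = cnv_frequency(toy_segments, region, n_samples=4)
print(frequencies)
```

Counting distinct samples rather than segments keeps a sample with several fragmented segments in the same region from being counted more than once.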

2.3 Investigate your CNV data

With the Tool Module, users can upload their own CNV data, either by text input or by file upload, to explore CNV patterns. Note that the input must be a standard ASCAT output file with six columns. LeukAtlas supports only the GRCh38 version of the human genome.

Here is a standard example of an input file:

sample	chr	startpos	endpos	nMajor	nMinor
S1	1	3301765	247650984	1	1
S1	2	480597	241537572	2	1
S1	3	2170634	45186644	1	0
S1	3	45187701	46886248	2	0
S1	3	46889988	197812401	0	0
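
Before uploading, it can help to check a file against this layout locally. The following is a minimal sketch, assuming a tab-separated file with exactly the header shown above; the function name `read_ascat_segments` and the specific checks are illustrative and do not describe the validation that LeukAtlas itself performs.

```python
import csv
import io

EXPECTED_HEADER = ["sample", "chr", "startpos", "endpos", "nMajor", "nMinor"]


def read_ascat_segments(handle):
    """Parse a tab-separated, six-column segment table into a list of dicts."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, None)
    if header != EXPECTED_HEADER:
        raise ValueError(f"unexpected header: {header!r}")

    segments = []
    for line_no, row in enumerate(reader, start=2):
        if not row:  # tolerate blank lines
            continue
        if len(row) != len(EXPECTED_HEADER):
            raise ValueError(f"line {line_no}: expected 6 columns, got {len(row)}")
        sample, chrom = row[0], row[1]
        # int() raises ValueError on non-numeric coordinates or copy numbers.
        start, end, n_major, n_minor = (int(value) for value in row[2:])
        if start > end:
            raise ValueError(f"line {line_no}: startpos is greater than endpos")
        if n_major < 0 or n_minor < 0:
            raise ValueError(f"line {line_no}: copy numbers must be non-negative")
        segments.append({
            "sample": sample, "chr": chrom, "startpos": start,
            "endpos": end, "nMajor": n_major, "nMinor": n_minor,
        })
    return segments


# Two rows taken from the example above, written with explicit tab characters.
EXAMPLE = (
    "sample\tchr\tstartpos\tendpos\tnMajor\tnMinor\n"
    "S1\t1\t3301765\t247650984\t1\t1\n"
    "S1\t3\t45187701\t46886248\t2\t0\n"
)
segments = read_ascat_segments(io.StringIO(EXAMPLE))
print(len(segments), segments[1]["nMajor"], segments[1]["nMinor"])
```

To check a file on disk, pass an open handle instead, for example `open("my_segments.txt", newline="")`, where the file name is a placeholder.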

The "Visualization" tool assigns copy number categories to the uploaded samples and presents the results as bar charts. An example is displayed on the page, and users can download "Pattern.zip" to access the results for all samples. The CNV profile is also plotted for further analysis.

The "Predict OS" tool jointly analyzes and clusters user-uploaded samples with those in the database to predict long-term survival for the uploaded samples. Before using this tool, run "Visualization" on the samples to obtain their copy number categorization, and select an appropriate subgroup from the database for the analysis. The predictions should be treated as a reference only.

The "Annotation" tool employs X-CNV to annotate the pathogenicity of CNVs, uses bedtools to identify AML-related genes, and categorizes CNVs into types such as UPD, GAIN, and LOSS based on changes in copy number.
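
To make the copy number categories concrete, here is a minimal sketch of one way to label a segment from its nMajor and nMinor values. The rules are an assumption based on a diploid baseline (total copy number 2): they are a common convention, not necessarily the exact criteria LeukAtlas applies, and they ignore cases such as sex chromosomes or samples with non-diploid ploidy. The function name `classify_segment` and the "NEUTRAL" label are introduced here for illustration.

```python
def classify_segment(n_major, n_minor):
    """Label a segment relative to a diploid baseline (assumed convention)."""
    total = n_major + n_minor
    if total > 2:
        return "GAIN"
    if total < 2:
        return "LOSS"
    if n_minor == 0:
        # Two copies, both from the same parental allele: copy-neutral
        # loss of heterozygosity, reported here as UPD.
        return "UPD"
    return "NEUTRAL"


# (nMajor, nMinor) pairs from the example input file in section 2.3.
rows = [(1, 1), (2, 1), (1, 0), (2, 0), (0, 0)]
labels = [classify_segment(n_major, n_minor) for n_major, n_minor in rows]
print(labels)
```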

3. Data Processing Standards

3.1 Data Resource

LeukAtlas compiles leukemia CNV data from 12 studies and from public databases such as TCGA, GEO, and EGA. Data types include SNP 6.0 array, WES, and WGS. The keywords used to search for relevant literature in PubMed are 'copy number,' 'leukemia,' 'CNA(s),' 'CNV(s),' and 'genomic instability.'

3.2 Data Analysis

Quality Control and Genotyping.

For SNP array data in *.CEL format, quality control and genotyping are conducted using the Birdseed algorithm in Affymetrix Power Tools (APT).

For WXS data in *.BAM format, alleleCounter is used to extract allele counts at each SNP locus, from which Log R ratio (LRR) and B allele frequency (BAF) files are generated.
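
The relationship between allele counts and the two signals can be sketched as follows. This is a simplified illustration that assumes a matched normal sample and treats the alternate allele as the B allele; the function names are hypothetical, and the preprocessing actually used by ASCAT and LeukAtlas may differ in detail (for example in filtering and normalization).

```python
import math
from statistics import median


def b_allele_frequency(ref_count, alt_count):
    """Fraction of reads supporting the B (here: alternate) allele at one SNP."""
    depth = ref_count + alt_count
    if depth == 0:
        return None  # no coverage, BAF is undefined
    return alt_count / depth


def log_r_ratios(tumor_depths, normal_depths):
    """Median-centred log2 ratio of tumor to normal depth, one value per SNP.

    Assumes every depth is positive; loci without coverage should be
    removed beforehand.
    """
    raw = [math.log2(t / n) for t, n in zip(tumor_depths, normal_depths)]
    centre = median(raw)
    return [value - centre for value in raw]


# Invented counts for three SNP loci.
print(b_allele_frequency(ref_count=6, alt_count=4))
print(log_r_ratios(tumor_depths=[20, 40, 20], normal_depths=[20, 20, 20]))
```

Centring on the median makes the typical locus sit at zero, so gains and losses appear as positive and negative deviations regardless of the overall sequencing depth.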

CNV Analysis with ASCAT.

CNV calling is conducted with ASCAT (Allele-Specific Copy number Analysis of Tumors). ASCAT processes the input LRR/BAF files to identify allele-specific copy numbers for each region. It also estimates tumor purity and ploidy and corrects for their impact on copy number estimation.
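
The purity and ploidy correction can be illustrated with the signal model described in the ASCAT publication, as we understand it: a mixture of tumor cells (fraction `purity`) and normal diploid cells determines the expected Log R and BAF of a segment, and inverting the model recovers the allele-specific copy numbers. The sketch below is not the ASCAT code; the function names are hypothetical, and `gamma` (a technology-dependent scaling factor) is set to 1.0 here as an assumption.

```python
import math


def expected_signal(n_a, n_b, purity, ploidy, gamma=1.0):
    """Expected (Log R, BAF) for a segment with allele copy numbers n_a, n_b."""
    # Average copy number over the tumor/normal mixture, genome-wide.
    average_ploidy = 2 * (1 - purity) + purity * ploidy
    # Average copy number over the mixture within this segment.
    total = 2 * (1 - purity) + purity * (n_a + n_b)
    log_r = gamma * math.log2(total / average_ploidy)
    baf = (1 - purity + purity * n_b) / total
    return log_r, baf


def allele_copy_numbers(log_r, baf, purity, ploidy, gamma=1.0):
    """Invert the model: estimate (n_a, n_b) from Log R and BAF."""
    average_ploidy = 2 * (1 - purity) + purity * ploidy
    total = average_ploidy * 2 ** (log_r / gamma)
    n_a = (purity - 1 + total * (1 - baf)) / purity
    n_b = (purity - 1 + total * baf) / purity
    return n_a, n_b


# Round trip for an invented segment: 2 copies of allele A, 0 of allele B,
# in a sample with 70% tumor cells and a tumor ploidy of 2.1.
log_r, baf = expected_signal(2, 0, purity=0.7, ploidy=2.1)
n_a, n_b = allele_copy_numbers(log_r, baf, purity=0.7, ploidy=2.1)
print(round(n_a, 6), round(n_b, 6))
```

In practice the purity and ploidy are unknown and are fitted by ASCAT itself; the round trip above only shows why, once they are known, non-integer raw signals map back to integer copy numbers.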

CNV Profiling.

The R package ggplot2 is used to generate CNV profiles for each sample, with different colors representing different copy number states (LOSS, UPD, GAIN).

Pathogenicity Prediction.

Pathogenicity prediction of CNVs is performed with X-CNV (http://www.unimd.org/XCNV). The X-CNV results include pathogenicity scores and relevant annotations.

4. Contact Us

Please contact us if you have questions, suggestions, or comments, or would like to report a bug.

Institution: China National Center for Bioinformation / Beijing Institute of Genomics, Chinese Academy of Sciences
Address: No.1 Beichen West Road, Chaoyang District, Beijing 100101, China
Email: suyanxun2019m@big.ac.cn