PCTA



PCTA, a pan-cancer cell line transcriptome atlas. [PMID: 38462036]

Siyuan Cheng, Lin Li, Xiuping Yu

A substantial volume of RNA sequencing data have been generated from cancer cell lines. However, it requires specific bioinformatics skills to compare gene expression levels across cell lines. This has hindered non-bioinformaticians from fully utilizing these valuable datasets in their research. To bridge this gap, we established a curated Pan-cancer Cell Line Transcriptome Atlas (PCTA) dataset. This resource aims to provide a user-friendly platform, allowing researchers without extensive bioinformatics expertise to access and leverage the wealth of information within the dataset for their studies. The PCTA dataset encompasses the expression matrix of 24,965 genes, featuring data from 84,385 samples derived from 5,677 studies. This comprehensive compilation spans 535 cell lines, representing a spectrum of 114 cancer types originating from 30 diverse tissue types. On UMAP plots, cell lines originating from the same type of tissue tend to cluster together, illustrating the dataset's ability to capture biological relationships. Additionally, an interactive and user-friendly web application (https://pcatools.shinyapps.io/PCTA_app/) was developed for researchers to explore the PCTA dataset. This platform allows users to examine the expression of their genes of interest across a diverse array of samples.

Cancer Lett. 2024:588() | 8 Citations (from Europe PMC, 2026-03-28)

PCTA, A PAN-CANCER CELL LINE TRANSCRIPTOME ATLAS. [PMID: 38260452]

Siyuan Cheng, Lin Li, Xiuping Yu

Abstract

BACKGROUND: A substantial volume of RNA sequencing data were generated from cancer cell lines. However, it requires specific bioinformatics skills to compare gene expression levels across cell lines. This has hindered non-bioinformaticians from fully utilizing these valuable datasets in their research. To bridge this gap, we established a curated Pan-cancer Cell Line Transcriptome Atlas (PCTA) dataset. This resource aims to provide a user-friendly platform, allowing researchers without extensive bioinformatics expertise to access and leverage the wealth of information within the dataset for their studies. Importantly, PCTA stands out by offering sufficient sample numbers per cell line in comparison to other pan-cancer datasets.
METHODS: Cell lines' metadata and RNA sequencing data were retrieved from the Cancer Cell Line Encyclopedia (CCLE), SRA, and ARCHS4 databases. Using the programming language R, we conducted data retrieval, normalization, and visualization. Only expression data for protein-coding genes and long non-coding RNAs (lncRNAs) were considered in this study, streamlining the focus to enhance the precision and relevance of the analysis.
RESULTS: The resulting PCTA dataset encompasses the expression matrix of 24,965 genes, featuring data from 84,385 samples derived from 5,677 studies. This comprehensive compilation spans 535 cell lines, representing a spectrum of 114 cancer types originating from 30 diverse tissue types. On UMAP plots, cell lines originating from the same type of tissue tend to cluster together, illustrating the dataset's ability to capture biological relationships. To unravel molecular signatures, marker genes were identified for each cancer type. Additionally, an interactive and user-friendly web application (https://pcatools.shinyapps.io/PCTA_app/) was developed for researchers to explore the PCTA dataset. This platform allows users to examine the expression pattern of their genes of interest across a diverse array of samples. Data are visualized as violin, box, and point plots, enhancing the interpretability of the findings.
CONCLUSION: The PCTA stands as a comprehensive resource, offering insights into gene expression patterns across diverse cancer cell lines and providing a valuable tool to explore molecular signatures and potential therapeutic targets in cancer research.

bioRxiv. 2024:() | 0 Citations (from Europe PMC, 2026-03-28)

URL:	https://pcatools.shinyapps.io/PCTA_app
Full name:	Pan-cancer Cell Line Transcriptome Atlas
Description:	PCTA is a curated dataset comprising RNAseq data of 84,385 samples from 535 cell lines, representing 114 cancer types across 30 tissue origins. The dataset allows non-bioinformaticians to explore gene expression patterns through an interactive web application, enhancing the accessibility and utility of RNAseq data for cancer research.
Year founded:	2024
Last update:	2024-04-28
Version:	v1.0
Accessibility:	Accessible
Country/Region:	United States

Data type:	RNA
Data object:	Animal
Database category:	Expression; Gene genome and annotation; Health and medicine
Major species:	Homo sapiens
Keywords:	gene expression; Pan-cancer; cancer cell line

University/Institution:	LSU Health Shreveport
Address:
City:
Province/State:
Country/Region:	United States
Contact name (PI/Team):	Siyuan Cheng
Contact email (PI/Helpdesk):	siyuan.cheng@lsuhs.edu

Database Commons
a catalog of worldwide biological databases
