Database Commons

a catalog of worldwide biological databases

Database Profile

Cloud Pilot RNA-Sequencing for CCLE and TCGA

General information

URL: https://osf.io/gqrz9
Full name:
Description: Cloud Pilot RNA-Sequencing for CCLE and TCGA is a repository containing software, scripts, and processed data of transcript-expression levels for 12,307 RNA-Sequencing samples from the Cancer Cell Line Encyclopedia and The Cancer Genome Atlas.
Year founded: 2016
Last update:
Version:
Accessibility: Accessible
Country/Region: United States

Classification & Tag

Data type: RNA
Data object:
Database category:
Major species:
Keywords:

Contact information

University/Institution: Brigham Young University
Address: Department of Biology, Brigham Young University, Provo, Utah, USA
City: Provo
Province/State: Utah
Country/Region: United States
Contact name (PI/Team): Stephen R. Piccolo
Contact email (PI/Helpdesk): stephen_piccolo@byu.edu

Publications

A cloud-based workflow to quantify transcript-expression levels in public cancer compendia. [PMID: 27982081]
Tatlow PJ, Piccolo SR.

Public compendia of sequencing data are now measured in petabytes. Accordingly, it is infeasible for researchers to transfer these data to local computers. Recently, the National Cancer Institute began exploring opportunities to work with molecular data in cloud-computing environments. With this approach, it becomes possible for scientists to take their tools to the data and thereby avoid large data transfers. It also becomes feasible to scale computing resources to the needs of a given analysis. We quantified transcript-expression levels for 12,307 RNA-Sequencing samples from the Cancer Cell Line Encyclopedia and The Cancer Genome Atlas. We used two cloud-based configurations and examined the performance and cost profiles of each configuration. Using preemptible virtual machines, we processed the samples for as little as $0.09 (USD) per sample. As the samples were processed, we collected performance metrics, which helped us track the duration of each processing step and quantify computational resources used at different stages of sample processing. Although the computational demands of reference alignment and expression quantification have decreased considerably, there remains a critical need for researchers to optimize preprocessing steps. We have stored the software, scripts, and processed data in a publicly accessible repository (https://osf.io/gqrz9).

Sci Rep. 2016;6 | 61 Citations (from Europe PMC, 2025-12-13)

Ranking

All databases: 2014/6895 (70.805%)
Expression: 415/1347 (69.265%)
Total Rank: 2014
Citations: 58
z-index: 6.444
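
The percentages listed next to each rank appear consistent with a simple percentile, (total - rank + 1) / total, although this record does not state how they are computed. The Python sketch below reproduces both figures under that assumption; the function name rank_percentile is hypothetical and is not part of any Database Commons API.

```python
def rank_percentile(rank: int, total: int) -> float:
    """Return the share (in percent) of entries ranked at or below `rank`.

    Assumed formula, inferred from the figures in this record:
    100 * (total - rank + 1) / total.
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return 100.0 * (total - rank + 1) / total


if __name__ == "__main__":
    # Rank among all databases, then rank within the Expression category.
    print(f"All databases: {rank_percentile(2014, 6895):.3f}%")
    print(f"Expression:    {rank_percentile(415, 1347):.3f}%")
```

If the catalog uses a different convention (for example, one that excludes the entry itself), the results would differ slightly in the last decimal places.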

Community reviews

Not Rated
Data quality & quantity:
Content organization & presentation:
System accessibility & reliability:

Record metadata

Created on: 2018-01-27
Curated by: Farah Nazir [2018-04-12]