CASOL


High-confidence coding and noncoding transcriptome maps. [PMID: 28396519]

You BH, Yoon SH, Nam JW.

The advent of high-throughput RNA sequencing (RNA-seq) has led to the discovery of unprecedentedly immense transcriptomes encoded by eukaryotic genomes. However, the transcriptome maps are still incomplete partly because they were mostly reconstructed based on RNA-seq reads that lack their orientations (known as unstranded reads) and certain boundary information. Methods to expand the usability of unstranded RNA-seq data by predetermining the orientation of the reads and precisely determining the boundaries of assembled transcripts could significantly benefit the quality of the resulting transcriptome maps. Here, we present a high-performing transcriptome assembly pipeline, called CAFE, that significantly improves the original assemblies, respectively assembled with stranded and/or unstranded RNA-seq data, by orienting unstranded reads using the maximum likelihood estimation and by integrating information about transcription start sites and cleavage and polyadenylation sites. Applying large-scale transcriptomic data comprising 230 billion RNA-seq reads from the ENCODE, Human BodyMap 2.0, The Cancer Genome Atlas, and GTEx projects, CAFE enabled us to predict the directions of about 220 billion unstranded reads, which led to the construction of more accurate transcriptome maps, comparable to the manually curated map, and a comprehensive lncRNA catalog that includes thousands of novel lncRNAs. Our pipeline should not only help to build comprehensive, precise transcriptome maps from complex genomes but also to expand the universe of noncoding genomes.

Genome Res. 2017:27(6) | 48 Citations (from Europe PMC, 2026-05-09)

URL:	http://big.hanyang.ac.kr/CASOL/index.html
Full name:	comprehensive annotation system of lncRNAs
Description:	CASOL develops an integrative approach to define reference catalogues of lncRNAs. The catalogue unifies existing annotation sources with transcripts assembled from stranded and unstranded RNA-seq data collected from various species and cell lines. The BIGTranscriptome catalogue comprises transcripts that are complete at both the 5′ and 3′ ends, supported by CAGE and 3P-seq evidence.
Year founded:	2017
Last update:	2019-07-16
Version:	v3.0
Accessibility:	Accessible
Country/Region:	Korea, Republic of

Data type:	RNA
Data object:	Animal
Database category:	Gene genome and annotation
Major species:	Homo sapiens, Mus musculus, Canis lupus, Equus caballus, Gallus gallus
Keywords:	lncRNA, full length transcript

University/Institution:	Hanyang University
Address:	FTC Room 1123, Hanyang University, 222 Wangsimni Seongdong-gu Seoul
City:
Province/State:
Country/Region:	Korea, Republic of
Contact name (PI/Team):	Jin-Wu Nam
Contact email (PI/Helpdesk):	jwnam@hanyang.ac.kr

Database Commons
a catalog of worldwide biological databases
