Introduction
LGC characterizes and identifies lncRNAs based on the relationship between open reading frame (ORF) length and GC content.
LGC accurately distinguishes lncRNAs from protein-coding RNAs across species ranging from plants to mammals, without species-specific adjustments.
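The two features LGC relies on, ORF length and GC content, can be illustrated with a minimal sketch. This is not LGC's own implementation: the function names `longest_orf` and `gc_content` are hypothetical, and the sketch assumes ORFs start with ATG, end at the first in-frame stop codon, and lie on the forward strand only.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end, orf) for the longest ATG..stop ORF (forward strand).

    start is 0-based and end is exclusive. Returns (None, None, "") when
    no complete ORF is found. Illustrative only; not LGC's ORF scanner.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA letters
    best = (None, None, "")
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start > len(best[2]):
                    best = (start, i + 3, seq[start:i + 3])
                start = None  # look for the next ORF in this frame
    return best


def gc_content(seq):
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)
```

Calling `longest_orf` on a transcript and passing the returned ORF to `gc_content` gives the two quantities that the ORF Length and GC Content output columns describe.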
Installation
Install Python 3:
Get Python 3 at https://www.python.org/downloads/ or install with your operating system’s package manager.
Download LGC
$ tar zxf LGC-2.0.tar.gz # Decompress LGC-2.0.tar.gz
$ cd LGC-2.0 # Open the folder
$ python LGC-2.0.py input.fasta output.txt # Run LGC
A successful run of LGC prints the following:
$ Input: input.fasta # Input file
$ Output: output.txt # Output file
$ Scan ORF ... # Scan ORF and calculate coding potential score
$ ORFfinder analysis completed. # Use ORFfinder to find ORF
$ Done # LGC runs to completion
$ Computation time XXX # Computation time of LGC
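For batch use, the same command can be launched from a Python script. The sketch below only wraps the invocation shown above; the helper names `build_lgc_command` and `run_lgc` are hypothetical, and it assumes the script is run from inside the LGC-2.0 folder.

```python
import subprocess
import sys


def build_lgc_command(fasta_path, output_path, script="LGC-2.0.py"):
    """Build the argument list for: python LGC-2.0.py input.fasta output.txt"""
    return [sys.executable, script, fasta_path, output_path]


def run_lgc(fasta_path, output_path, script="LGC-2.0.py"):
    """Run LGC and return whatever it printed to standard output.

    Raises subprocess.CalledProcessError if LGC exits with a non-zero status.
    """
    cmd = build_lgc_command(fasta_path, output_path, script)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Using `sys.executable` runs LGC with the same Python interpreter as the calling script, which avoids picking up a different `python` from the PATH.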
Input
Fasta format:
Users can upload a FASTA-formatted file (<100 Mb) from local disk or paste FASTA-formatted sequence(s)
(small data sets) into the text area.
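Before uploading, it can help to check that a file parses as FASTA and stays under the size limit. The following is a minimal sketch, not part of LGC: `read_fasta` and `fits_upload_limit` are hypothetical helpers, and "100 Mb" is interpreted here as 100 MiB, which is an assumption.

```python
import os

# Assumption: the "<100 Mb" limit is read as 100 MiB.
MAX_FASTA_BYTES = 100 * 1024 * 1024


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines.

    The name is the first whitespace-delimited token after '>'.
    """
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def fits_upload_limit(path, limit=MAX_FASTA_BYTES):
    """Return True if the file at path is smaller than the size limit."""
    return os.path.getsize(path) < limit
```

`read_fasta` accepts any iterable of lines, so it works on an open file handle as well as on pasted text split into lines.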
BED/GTF format:
Users can upload a BED/GTF-formatted file (<3 Mb) or paste data into the text area.
When the input file is in BED/GTF format, a reference genome is required and the correct assembly
version must be selected.
The web server currently supports Human (GRCh38, hg19), Mouse (mm10, mm9), Fly (dm3) and Zebrafish
(Zv9).
Output
When the calculation finishes, the results are shown on a new page. Users can sort the results by any
column by clicking on the column header. LGC also assigns a unique Task ID to each request.
Users can retrieve results later by entering the Task ID on the homepage.
The output file contains the following columns.
- Sequence name: name of transcript sequence
- ORF Start: the start of the longest ORF
- ORF End: the end of the longest ORF
- ORF Length: length of the longest ORF
- GC Content: GC content of the longest ORF
- Coding Potential Score: coding potential score of the transcript; the transcript is
classified as protein-coding RNA if the score is greater than 0, or as ncRNA if it is smaller than 0
- Coding Label: "Coding" represents mRNA and "Noncoding" represents lncRNA.
- pc: Probability of ORF in coding sequence
- pnc: Probability of ORF in non-coding sequence
- fc: Stop-codon probability for coding sequence
- fnc: Stop-codon probability in non-coding sequence
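The columns listed above can be loaded into Python for downstream filtering. The sketch below rests on assumptions that should be checked against a real output file: that the file is tab-delimited, that the columns appear in the order listed, and that header or comment lines start with `#`. The names `COLUMNS`, `parse_lgc_output` and `label_from_score` are hypothetical.

```python
import csv

# Assumption: columns appear in the order listed in this document.
COLUMNS = [
    "name", "orf_start", "orf_end", "orf_length", "gc_content",
    "score", "label", "pc", "pnc", "fc", "fnc",
]


def label_from_score(score):
    """Apply the documented rule: > 0 is Coding, < 0 is Noncoding.

    A score of exactly 0 is not covered by the rule, so None is returned.
    """
    if score > 0:
        return "Coding"
    if score < 0:
        return "Noncoding"
    return None


def parse_lgc_output(lines):
    """Parse output lines into a list of dicts keyed by COLUMNS.

    Assumes tab-delimited fields; skips blank lines and lines starting with '#'.
    """
    data_lines = (l for l in lines if l.strip() and not l.startswith("#"))
    rows = []
    for fields in csv.reader(data_lines, delimiter="\t"):
        row = dict(zip(COLUMNS, fields))
        row["score"] = float(row["score"])  # numeric score for comparisons
        rows.append(row)
    return rows
```

Passing an open file handle to `parse_lgc_output` returns one dictionary per transcript, and `label_from_score(row["score"])` can be compared with the Coding Label column as a consistency check.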