BS-RNA is written in Perl and runs from the command line on Linux systems. To install BS-RNA, copy the BS-RNA_v1.0.tar.gz file (available from http://bs-rna.big.ac.cn) into an installation folder and extract all the files by typing:
tar xzf BS-RNA_v1.0.tar.gz
BS-RNA requires working installations of Perl, Python (at least Python 2.7.8), HISAT2 (at least hisat2-2.0.1-beta) and SAMtools (at least SAMtools-1.0), so all of them must be installed on your machine. BS-RNA assumes that these programs are on your PATH unless their locations are specified manually. bowtie2 must also be on your PATH, because HISAT2 uses the bowtie2 implementation to handle most operations on the FM index.
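A quick way to confirm that the required programs can be found is to look each one up on your PATH. The function below is a minimal sketch and is not part of BS-RNA; its name is hypothetical, and it assumes the executables are called perl, python, hisat2, hisat2-build, samtools and bowtie2 (adjust these, e.g. to python2.7, to match your installation). It checks presence only, not version numbers.

```shell
# Hypothetical helper (not shipped with BS-RNA).
# Prints the name of every program in the argument list that cannot be
# found on the current PATH, one per line. Prints nothing if all are found.
missing_tools() {
    for tool in "$@"; do
        # "command -v" succeeds only if the name resolves to a command
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

For example, "missing_tools perl python hisat2 hisat2-build samtools bowtie2" prints nothing when everything is in place; any name it prints needs to be installed, added to PATH, or passed to BS-RNA with the corresponding --pathTo... option.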
BS-RNA supports both paired-end and single-end reads of variable length from strand-specific libraries. Input reads must be uncompressed FastQ files.
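Gzipped reads therefore need to be decompressed first, for example with "gzip -dc reads.fq.gz > reads.fq". The function below is a rough, heuristic check (not part of BS-RNA; the name is hypothetical) that a reads file looks like uncompressed FastQ: it must start with "@" and contain a multiple of four lines, since each FastQ record spans four lines.

```shell
# Hypothetical helper (not shipped with BS-RNA).
# Returns 0 if the file looks like uncompressed FastQ, 1 otherwise:
#  - the first byte must be '@' (gzip files start with bytes 0x1f 0x8b instead)
#  - the line count must be a multiple of 4 (one record = 4 lines);
#    a file without a trailing newline is undercounted by one line
is_plain_fastq() {
    [ -f "$1" ] || return 1
    first=$(head -c 1 "$1")
    [ "$first" = "@" ] || return 1
    lines=$(wc -l < "$1")
    [ $((lines % 4)) -eq 0 ]
}
```

Usage: "is_plain_fastq test_T-rich.fq && echo ok". Passing this check does not guarantee every record is well formed; it only catches compressed or truncated files early.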
First, download the reference genome sequence files for your species of interest and place them in one folder. BS-RNA supports reference genome sequences in FastA format, and only single-entry files (one sequence per file) are supported. File names must begin with "chr", and the only allowed file extension is .fa. Second, download a gene model annotation file in GTF format.
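Many genome downloads come as a single multi-entry FastA file. As a minimal sketch (not part of BS-RNA; the function name is hypothetical), such a file can be split into one single-entry file per sequence, named after the first word of each header line, so that ">chr1 ..." becomes chr1.fa. It assumes the headers already start with "chr" and only warns if they do not.

```shell
# Hypothetical helper (not shipped with BS-RNA).
# Split a multi-entry FastA file into single-entry files in an output folder.
#   $1 = input FastA file, $2 = output folder (created if needed)
# Each sequence goes to "<first word of header>.fa"; headers whose first word
# does not start with "chr" are reported on stderr. Lines before the first
# header are ignored.
split_fasta() {
    in=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    awk -v out="$outdir" '
        /^>/ {
            if (f != "") close(f)          # finish the previous sequence file
            name = substr($1, 2)           # drop the leading ">"
            if (name !~ /^chr/)
                print "warning: " name " does not start with chr" > "/dev/stderr"
            f = out "/" name ".fa"
        }
        f != "" { print > f }              # header and sequence lines
    ' "$in"
}
```

For example, "split_fasta hg19.fa hg19_ref" produces hg19_ref/chr1.fa, hg19_ref/chr2.fa and so on; a folder like this is presumably what --rawRef points to in the commands below, but check the package documentation to confirm.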
Optionally, you can specify two configuration files, one for indexing the reference genome sequences and one for mapping the RNA sequencing data to them, if you want to customize the corresponding parameters. Instructions for generating the configuration file for the hisat2-build indexer or for hisat2 can be found in the downloaded package. Each option must be specified on its own line.
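The exact configuration syntax is described in the instructions shipped with the package. Assuming each line simply holds one option as it would be written on the hisat2 or hisat2-build command line, a configuration file could be generated with a small helper like the one below. Both the helper name and that line format are assumptions; "-p 8" (number of threads) is used only as an example of a real hisat2 option.

```shell
# Hypothetical helper (not shipped with BS-RNA).
# Write every argument after the file name as its own line, following the
# one-option-per-line rule. ASSUMPTION: each line holds one option exactly as
# typed on the hisat2/hisat2-build command line; check the package instructions.
#   $1 = configuration file to create (overwritten if it exists)
write_config() {
    conf=$1
    shift
    : > "$conf" || return 1            # create or truncate the file
    for opt in "$@"; do
        printf '%s\n' "$opt" >> "$conf"
    done
}
```

Usage: "write_config hisat2_map.conf "-p 8"" creates a one-line file containing "-p 8".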
A typical command for analyzing paired-end RBS-seq data is as follows:
BS-RNA_v1.0 --perlDir script --reads1 test_T-rich.fq --reads2 test_A-rich.fq --gene Homo_sapiens.GRCh37.75.gtf \
    --rawRef hg19_ref --phred64 --pathToPython /.../python2.7.8/bin \
    --pathToHISAT2 /.../hisat2-2.0.1-beta --pathToSAMtools /.../samtools-0.1.16 --outDir /.../demo_result
For a single-end T-rich reads file, the command looks like this:
BS-RNA_v1.0 --perlDir script --reads1 test_T-rich.fq --gene Homo_sapiens.GRCh37.75.gtf --rawRef hg19_ref \
    --phred64 --pathToPython /.../python2.7.8/bin \
    --pathToHISAT2 /.../hisat2-2.0.1-beta --pathToSAMtools /.../samtools-0.1.16 --outDir /.../demo_result
And for a single-end A-rich reads file:
BS-RNA_v1.0 --perlDir script --reads2 test_A-rich.fq --gene Homo_sapiens.GRCh37.75.gtf --rawRef hg19_ref \
    --phred64 --pathToPython /.../python2.7.8/bin \
    --pathToHISAT2 /.../hisat2-2.0.1-beta --pathToSAMtools /.../samtools-0.1.16 --outDir /.../demo_result
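The three commands differ only in which read options are given: --reads1 for T-rich reads, --reads2 for A-rich reads, or both for paired-end data. The hypothetical helper below (not part of BS-RNA) prints the matching read options, so a single script can cover all three cases. It assumes file names contain no spaces, because its output is meant to be expanded unquoted on the command line.

```shell
# Hypothetical helper (not shipped with BS-RNA).
# Print the BS-RNA read options for whichever inputs are given.
#   $1 = T-rich reads file (passed as --reads1), or "" if absent
#   $2 = A-rich reads file (passed as --reads2), or "" if absent
# Both files -> paired-end; one file -> single-end; neither -> error.
# ASSUMPTION: file names contain no spaces (output is word-split by the shell).
read_options() {
    t_rich=$1
    a_rich=$2
    if [ -z "$t_rich" ] && [ -z "$a_rich" ]; then
        echo "error: give at least one reads file" >&2
        return 1
    fi
    opts=""
    [ -n "$t_rich" ] && opts="--reads1 $t_rich"
    # append --reads2, adding a separating space only if opts is non-empty
    [ -n "$a_rich" ] && opts="${opts:+$opts }--reads2 $a_rich"
    printf '%s\n' "$opts"
}
```

For example, "BS-RNA_v1.0 --perlDir script $(read_options test_T-rich.fq "") --gene ..." runs the single-end T-rich analysis, while passing both files gives the paired-end form.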
If the reference genome sequences were already converted in a previous analysis, you can skip the conversion step and save time by adding the option "--convertRef path_of_converted_reference_genome". In this case, BS-RNA creates three folders in the specified output directory:
Map: contains the mapping results in SAM format and a file listing the splice sites.
Filter: contains the filtered mapping results in SAM format and a statistics file named "filter_mapping.sam.maprate" that reports the mapping statistics.
Note: because the reads are mapped to the converted reference genome sequences, the chromosome names in the SAM file contain either "C-T" (the chromosome with all cytosines converted to thymines) or "G-A" (the chromosome with all guanines converted to adenines).
Level: contains BED files that report information for each covered cytosine site.
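Because chromosome names in the SAM output carry "C-T" or "G-A", the alignments can be tallied per converted genome directly from the reference name column (column 3 of a SAM record). The function below is a minimal sketch, not part of BS-RNA; it only looks for the substrings "C-T" and "G-A" because the exact naming scheme is not specified here, and it counts alignment records (including any secondary alignments), not distinct reads.

```shell
# Hypothetical helper (not shipped with BS-RNA).
# Count SAM alignment records whose reference name (column 3) contains
# "C-T" versus "G-A". Header lines (starting with '@') are skipped; unmapped
# records (reference "*") match neither pattern and are not counted.
count_by_strand() {
    awk -F '\t' '
        /^@/ { next }
        $3 ~ /C-T/ { ct++ }
        $3 ~ /G-A/ { ga++ }
        END { printf "C-T\t%d\nG-A\t%d\n", ct, ga }
    ' "$1"
}
```

Usage: "count_by_strand /.../demo_result/Filter/filter_mapping.sam", assuming the filtered SAM file carries the name suggested by its statistics file; adjust the path to the actual file in the Filter folder.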
If the "--convertRef" option is not specified, an extra folder named "ref_C-T_G-A" is also created in the output directory. This folder contains the concatenated raw and converted genome sequences in FastA format, as well as the corresponding bowtie2 index files.
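To reuse that conversion in later runs, a script can add "--convertRef" only when a previous output directory already contains the "ref_C-T_G-A" folder. The helper below is a hypothetical sketch, not part of BS-RNA, and it assumes that "--convertRef" accepts the path of this folder; confirm this against the package documentation before relying on it. Paths are assumed to contain no spaces.

```shell
# Hypothetical helper (not shipped with BS-RNA).
# Print "--convertRef <dir>" if a previous run left a converted reference
# folder (ref_C-T_G-A) in the given output directory; otherwise print nothing,
# so BS-RNA converts the reference itself.
# ASSUMPTION: --convertRef accepts the path of the ref_C-T_G-A folder.
#   $1 = output directory of a previous BS-RNA run
convert_ref_option() {
    prev=$1/ref_C-T_G-A
    if [ -d "$prev" ]; then
        printf '%s\n' "--convertRef $prev"
    fi
}
```

For example, adding "$(convert_ref_option /.../demo_result)" to a new BS-RNA command reuses the converted reference from the demo run when it exists.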