De novo-Assembly

Functions

      'De novo-Assembly' assembles raw NGS sequencing data, compares the assembled contigs against a SARS-CoV-2 reference database, identifies SARS-CoV-2 sequences among the contigs, evaluates the quality of the assembly, and reports contig coverage, sequencing depth, and related statistics.

This process currently supports only Illumina sequencing data. Pipelines for third-generation sequencing data (PacBio/Nanopore) will be deployed later. To reduce upload time, gzip-compressed files are recommended.
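
As a minimal sketch of preparing a file for upload, the snippet below writes a gzip-compressed copy of a FASTQ file using only the Python standard library. The function name `gzip_fastq` and the `.gz` naming scheme are assumptions made for illustration; any gzip tool produces an equivalent file.

```python
import gzip
import shutil
from pathlib import Path


def gzip_fastq(src: str) -> Path:
    """Write a gzip-compressed copy of `src` next to it and return the new path."""
    src_path = Path(src)
    # Hypothetical naming scheme: append ".gz" to the original file name.
    dst_path = src_path.with_name(src_path.name + ".gz")
    # Stream the data so large FASTQ files are never loaded fully into memory.
    with open(src_path, "rb") as fin, gzip.open(dst_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst_path
```

For example, `gzip_fastq("reads.fastq")` would create `reads.fastq.gz` in the same directory and leave the original file untouched.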

Email: if the computation takes a long time, you will be notified by email when the results are ready.
Reminder: refer to the table below for the estimated processing time.
Help
1. Reference Running Time

      Reference running times were measured on real datasets while the server was idle, and do not include upload time. The running time of an actual task depends on the server workload, the data volume, the data quality, and so on.

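| Dataset | Data1 | Data2 | Data3 | Data4 | Data5 | Data6 |
| --- | --- | --- | --- | --- | --- | --- |
| SINGLE/PAIRED | SINGLE | PAIRED | PAIRED | PAIRED | PAIRED | PAIRED |
| Data volume | 118Mb | 203Mb | 1.0Gb | 1.5Gb | 2.2Gb | 8.0Gb |
| Calculating time* | 1m34s | 1m48s | 23m12s | 41m18s | 1h5m28s | 41m12s |
| Accession | SRR11247077 | SRR10903402 | SRR11092064 | SRR11092057 | SRR11092058 | SRR10971381 |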

*Run with 10 threads and 50 GB of memory.
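
To compare the reference timings programmatically, durations written in the style used above (for example `1h5m28s`) can be converted to seconds. The helper below is an illustrative sketch; the name `duration_to_seconds` is an assumption and is not part of the service.

```python
import re

# Matches strings such as "1m34s", "23m12s" or "1h5m28s"; every part is optional.
_DURATION = re.compile(r"^(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?$")


def duration_to_seconds(text: str) -> int:
    """Convert a duration like '1h5m28s' into a number of seconds."""
    match = _DURATION.match(text.strip())
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


# Calculating times copied from the reference table.
reference_times = {
    "Data1": "1m34s", "Data2": "1m48s", "Data3": "23m12s",
    "Data4": "41m18s", "Data5": "1h5m28s", "Data6": "41m12s",
}

for name, value in reference_times.items():
    print(name, duration_to_seconds(value), "seconds")
```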

2. Software version and main parameters
  1. Trimmomatic (Version: 0.39)
    1. Paired-end mode: ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 SLIDINGWINDOW:4:15 LEADING:3 TRAILING:3 MINLEN:36
    2. Single-end mode: ILLUMINACLIP:TruSeq3-SE.fa:2:30:10 SLIDINGWINDOW:4:15 LEADING:3 TRAILING:3 MINLEN:36
  2. MEGAHIT (Version: 1.2.9)
    1. Default parameters
  3. QUAST (Version: 5.0.2)
    1. Default parameters
  4. BLASTN (Version: 2.10.0+)
    1. -outfmt "6 qseqid qlen sseqid slen pident length qcovs qcovhsp qcovus mismatch gapopen qstart qend sstart send gaps evalue bitscore stitle" -subject_besthit -perc_identity 80 -max_hsps 5
  5. BBMap (Version: Last modified February 13, 2020)
    1. kfilter=22 subfilter=15 maxindel=80
  6. Samtools (Version: 1.3.1)
    1. Default parameters
  7. Deeptools/bamCoverage (Version: 3.4.3)
    1. -bs 1
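
The parameters listed above could be combined into command lines roughly as sketched below for paired-end data. This is not the server's actual pipeline: the step order, the `java -jar trimmomatic-0.39.jar` invocation, every intermediate file name, and the `blast_db` argument are assumptions made for illustration, and the step that extracts the BLASTN-matched contigs into `sample_2019nCoV.fa` is not shown. The code only builds and prints the commands; it does not run any tool.

```python
import shlex

# Trimming steps and BLASTN output columns exactly as listed in this section.
TRIM_STEPS = ["ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10", "SLIDINGWINDOW:4:15",
              "LEADING:3", "TRAILING:3", "MINLEN:36"]
BLAST_OUTFMT = ("6 qseqid qlen sseqid slen pident length qcovs qcovhsp qcovus "
                "mismatch gapopen qstart qend sstart send gaps evalue bitscore stitle")


def paired_end_commands(r1, r2, blast_db):
    """Return one argument list per step (all file names are hypothetical)."""
    trimmed = ["r1.paired.fq.gz", "r1.unpaired.fq.gz",
               "r2.paired.fq.gz", "r2.unpaired.fq.gz"]
    contigs = "megahit_out/final.contigs.fa"  # MEGAHIT's default contig file
    return [
        # 1. Quality and adapter trimming
        ["java", "-jar", "trimmomatic-0.39.jar", "PE", r1, r2, *trimmed, *TRIM_STEPS],
        # 2. Assembly with default parameters
        ["megahit", "-1", trimmed[0], "-2", trimmed[2], "-o", "megahit_out"],
        # 3. Assembly quality assessment with default parameters
        ["quast.py", contigs, "-o", "quast_out"],
        # 4. Comparison of contigs against the reference database
        ["blastn", "-query", contigs, "-db", blast_db, "-outfmt", BLAST_OUTFMT,
         "-subject_besthit", "-perc_identity", "80", "-max_hsps", "5",
         "-out", "sample_2019nCoV_blastn_6.tsv"],
        # 5. Mapping of trimmed reads back to the viral contigs
        ["bbmap.sh", "ref=sample_2019nCoV.fa", f"in={trimmed[0]}", f"in2={trimmed[2]}",
         "out=mapped.sam", "kfilter=22", "subfilter=15", "maxindel=80"],
        # 6. Sorting and indexing of the alignments
        ["samtools", "sort", "-o", "mapped.sorted.bam", "mapped.sam"],
        ["samtools", "index", "mapped.sorted.bam"],
        # 7. Per-base coverage track (bin size 1)
        ["bamCoverage", "-b", "mapped.sorted.bam", "-o", "Viral_contigs_coverage.bw",
         "-bs", "1"],
    ]


# Hypothetical input names, used only to show the resulting command lines.
for cmd in paired_end_commands("sample_1.fq.gz", "sample_2.fq.gz", "sars_cov2_refseq"):
    print(shlex.join(cmd))
```

Building each command as an argument list keeps the quoted `-outfmt` value intact as a single argument if the list is later passed to `subprocess.run`.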
3. Results files
  1. Trimmomatic.log
    1. Summary statistics for quality trimming of the raw data.
  2. sample_all_assembly.fa
    1. All contigs assembled by MEGAHIT
  3. Assembly_statistics.tar.gz
    1. Assembly quality assessment result by QUAST.
  4. sample_2019nCoV_blastn_6.tsv
    1. BLASTN results against SARS-CoV-2 RefSeq (tabular format, -outfmt 6).
  5. sample_2019nCoV.fa
    1. Contigs identified as SARS-CoV-2 sequences (BLASTN hits).
  6. viral_contigs_coverage directory
    1. Viral_contigs_bowtie2_mapped.log
      1. Mapping stats of reads on SARS-CoV-2 contigs.
    2. Viral_contigs_coverage_statistics.tsv
      1. Coverage stats of SARS-CoV-2 contigs.
    3. Viral_contigs_coverage.bw
      1. Coverage track of the SARS-CoV-2 contigs in bigWig format, for visualization.
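
The tabular BLASTN result file has no header line, so the column names have to be supplied when reading it. The sketch below assumes the columns appear in the order given in the -outfmt string of the BLASTN parameters; the function names `read_blast_hits` and `best_hit_per_contig` are hypothetical helpers, not part of the service.

```python
import csv

# Column order taken from the -outfmt string listed for BLASTN.
BLAST_FIELDS = ("qseqid qlen sseqid slen pident length qcovs qcovhsp qcovus "
                "mismatch gapopen qstart qend sstart send gaps evalue bitscore "
                "stitle").split()


def read_blast_hits(handle):
    """Read tab-separated BLASTN hits into a list of dicts keyed by column name."""
    reader = csv.DictReader(handle, fieldnames=BLAST_FIELDS, delimiter="\t",
                            quoting=csv.QUOTE_NONE)  # titles may contain quotes
    return list(reader)


def best_hit_per_contig(hits):
    """Keep, for each contig (qseqid), the hit with the highest bit score."""
    best = {}
    for hit in hits:
        current = best.get(hit["qseqid"])
        if current is None or float(hit["bitscore"]) > float(current["bitscore"]):
            best[hit["qseqid"]] = hit
    return best
```

A typical call would be `with open("sample_2019nCoV_blastn_6.tsv") as fh: best = best_hit_per_contig(read_blast_hits(fh))`, after which `best` maps each contig name to its top-scoring hit.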