2. Sequencing File
2.1 File Type

This page reviews the submission file formats currently supported by the GSA and gives submitters guidance on file formats and submission policies.

File type                 File suffix  Applicable platforms             Recommended
Fastq                     .fastq.gz    All Platforms                    Yes
Bam                       .bam         All Platforms                    Yes
Sff                       .sff         LS454
Complete Genomics Native  .tar.gz      Complete Genomics
Solid Native              .tar.gz
PacBio_HDF5               .tar         PacBio RS, PacBio RS II          Recommended for PacBio RS / PacBio RS II
PacBio Sequel Native      .tar         PacBio Sequel                    Recommended for PacBio Sequel
Oxford Nanopore Native    .tar         Oxford Nanopore
10x Genomics              .tar
Bnx                       .bnx.gz      Bionano Genomics
Fasta                     .fasta.gz
Helicos Native            .tar         Helicos BioSciences Corporation
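
As an illustration of how the suffix list above might be used before uploading, the following minimal Python sketch maps a file name to the file types that share its suffix. The function name candidate_file_types and the case-sensitive matching are assumptions made for this example; they are not part of any GSA tool.

```python
# Suffixes transcribed from the file type list above.
# Several types share a suffix, so a lookup can return more than one candidate.
SUFFIX_TO_TYPES = {
    ".fastq.gz": ["Fastq"],
    ".bam": ["Bam"],
    ".sff": ["Sff"],
    ".tar.gz": ["Complete Genomics Native", "Solid Native"],
    ".tar": [
        "PacBio_HDF5",
        "PacBio Sequel Native",
        "Oxford Nanopore Native",
        "10x Genomics",
        "Helicos Native",
    ],
    ".bnx.gz": ["Bnx"],
    ".fasta.gz": ["Fasta"],
}


def candidate_file_types(file_name):
    """Return the file types whose suffix matches file_name (hypothetical helper).

    Longer suffixes are tried first so that '.tar.gz' is never mistaken for '.tar'.
    Matching is case-sensitive, which is an assumption of this sketch.
    """
    for suffix in sorted(SUFFIX_TO_TYPES, key=len, reverse=True):
        if file_name.endswith(suffix):
            return SUFFIX_TO_TYPES[suffix]
    return []
```

A suffix match only narrows down the possible file types; it says nothing about whether the file content is valid.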
2.2 File Formats

Read data can be submitted in several standard and platform-specific formats. We recommend that read data be submitted in Fastq or BAM format.

Fastq format

Single and paired reads are accepted as Fastq files that meet the following requirements:

1) Quality scores must be on the Phred scale. Both ASCII and space-delimited decimal encodings of quality scores are supported. The Phred quality offset (33 or 64) is detected automatically.

2) No technical reads (adapters, linkers, barcodes) are allowed.

3) Single reads must be submitted using a single Fastq file and can be submitted with or without read names.

4) Paired reads must be submitted using two Fastq files.

5) Paired read names must have a suffix identifying the first and second read from the pair, for example '/1' and '/2' (regular expression for the reads: "^@([a-zA-Z0-9_-]+:[0-9]+:[a-zA-Z0-9]+:[0-9]+:[0-9]+:[0-9-]+:[0-9-]+) ([12]):[YN]:[0-9]*[02468]:[ACGTN]+$").

6) The first line for each read must start with '@'.

7) The base calls and quality scores must be separated by a line starting with '+'.

8) The Fastq files must be compressed using gzip or bzip2.

9) The regular expression for bases is “^([ACGTNactgn.]*?)$”.
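
Several of the requirements above can be checked locally before submission. The Python sketch below is one possible pre-check, not the validation that the GSA itself runs. It assumes four-line records without line wrapping and ASCII-encoded qualities only (space-delimited decimal qualities are not handled). It also checks that bases and qualities have equal length, which is a general Fastq convention rather than an item in the list above. The two regular expressions are copied from items 5 and 9. All function names are hypothetical, and the offset guess is a crude heuristic.

```python
import gzip
import re

# Pattern for base calls, copied from item 9.
BASES_RE = re.compile(r"^([ACGTNactgn.]*?)$")

# Pattern for paired read names, copied from item 5.
PAIRED_NAME_RE = re.compile(
    r"^@([a-zA-Z0-9_-]+:[0-9]+:[a-zA-Z0-9]+:[0-9]+:[0-9]+:[0-9-]+:[0-9-]+) "
    r"([12]):[YN]:[0-9]*[02468]:[ACGTN]+$"
)


def guess_phred_offset(quality_lines):
    """Crude heuristic (an assumption of this sketch, not the GSA's method).

    Characters below ASCII 59 cannot occur with offset 64, so they imply 33.
    If every character is at or above ASCII 64, offset 64 is guessed.
    Anything in between is reported as ambiguous (None).
    """
    lowest = min((min(map(ord, q)) for q in quality_lines if q), default=None)
    if lowest is None:
        return None
    if lowest < 59:
        return 33
    if lowest >= 64:
        return 64
    return None


def check_fastq_records(lines):
    """Check four-line Fastq records; return (record_count, guessed_offset)."""
    lines = [line.rstrip("\n") for line in lines]
    if len(lines) % 4 != 0:
        raise ValueError("line count is not a multiple of 4")
    qualities = []
    for i in range(0, len(lines), 4):
        name, bases, separator, quality = lines[i:i + 4]
        record = i // 4 + 1
        if not name.startswith("@"):
            raise ValueError(f"record {record}: first line must start with '@'")
        if not BASES_RE.match(bases):
            raise ValueError(f"record {record}: unexpected character in bases")
        if not separator.startswith("+"):
            raise ValueError(f"record {record}: separator must start with '+'")
        if len(quality) != len(bases):
            raise ValueError(f"record {record}: bases and qualities differ in length")
        qualities.append(quality)
    return len(qualities), guess_phred_offset(qualities)


def check_fastq_gz(path):
    """Run the record checks on a gzip-compressed Fastq file."""
    with gzip.open(path, "rt") as handle:
        return check_fastq_records(handle)
```

For paired submissions, PAIRED_NAME_RE.match can be applied to the first line of each record in both files to confirm that the names follow the pattern quoted in item 5.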

BAM format

Submitted BAM files must be readable with Samtools and Picard.

BAM file names must end with the .bam suffix (e.g. ‘a.bam’).
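
A quick local sanity check can catch misnamed or non-BAM files before upload. The Python sketch below tests only the file suffix and the BAM magic bytes. It relies on two facts about the format: BAM files are BGZF-compressed, which Python's gzip module can read, and the decompressed stream begins with the bytes "BAM\1". The function name looks_like_bam is hypothetical, and passing this check does not show that the file is readable with Samtools and Picard.

```python
import gzip

# First four bytes of a decompressed BAM stream.
BAM_MAGIC = b"BAM\x01"


def looks_like_bam(path):
    """Return True if path ends with '.bam' and starts with the BAM magic bytes.

    Hypothetical pre-check only; it does not parse the header or any records.
    """
    if not path.endswith(".bam"):
        return False
    try:
        with gzip.open(path, "rb") as handle:
            return handle.read(4) == BAM_MAGIC
    except (OSError, EOFError):
        # Missing file, not gzip/BGZF data, or a truncated stream.
        return False
```

A file that passes can then be opened with Samtools and Picard for the full readability check required above.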