Tools

GenBaseTools currently provides two separate sequence validation tools: one for common sequences and one for COVID-19 sequences.

Download link

Usage: gbt <COMMAND>

Commands:
  seqvalcom     Validate common sequences
  seqvalcovid   Validate COVID19 sequences
  help          Print this message or the help of the given subcommand(s)

Options:
  -h, --help     Print help
  -V, --version  Print version

Common sequence validation

Use the following command to validate common sequences:

gbt seqvalcom common_seq.fsa -o val_out

COVID-19 sequence validation

Use the following command to validate COVID-19 sequences:

gbt seqvalcovid covid_seq.fsa -o val_out
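
For scripted use, the two commands shown above can be assembled programmatically. The following is a minimal Python sketch, not part of GenBaseTools: the helper names `build_gbt_command` and `run_validation` and the `"common"`/`"covid"` keys are hypothetical, and only the subcommands and the `-o` option that appear in the commands above are used. It assumes `gbt` is available on `PATH`.

```python
import subprocess
from typing import List

# Subcommands as they appear in the gbt usage text; the dict keys are illustrative.
SUBCOMMANDS = {"common": "seqvalcom", "covid": "seqvalcovid"}


def build_gbt_command(kind: str, fasta_path: str, out_prefix: str) -> List[str]:
    """Return the argument list for one gbt validation run (hypothetical helper)."""
    if kind not in SUBCOMMANDS:
        raise ValueError(f"unknown sequence kind: {kind!r}")
    return ["gbt", SUBCOMMANDS[kind], fasta_path, "-o", out_prefix]


def run_validation(kind: str, fasta_path: str, out_prefix: str) -> subprocess.CompletedProcess:
    """Run gbt and capture its output; assumes gbt is installed and on PATH."""
    cmd = build_gbt_command(kind, fasta_path, out_prefix)
    # check=False: the caller inspects the output rather than relying on an exception.
    return subprocess.run(cmd, capture_output=True, text=True, check=False)
```

For example, `build_gbt_command("covid", "covid_seq.fsa", "val_out")` yields the same arguments as the COVID-19 command above.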
The validator prints a summary like the following:

LOG->ERROR: Found 7 errors
LOG->ERROR: Found 0 warnings
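
When the validator is run from a script, the summary counts can be pulled out of the captured log text. This is a small sketch based only on the two summary lines shown above; the function name `parse_summary` is hypothetical, and accepting a singular form such as "1 error" is an assumption about wording not confirmed by the source.

```python
import re

# Matches "Found 7 errors" / "Found 0 warnings"; the optional "s" is an assumption.
SUMMARY_RE = re.compile(r"Found (\d+) (error|warning)s?")


def parse_summary(log_text: str) -> dict:
    """Return the error and warning counts found in the validator's summary output."""
    counts = {"errors": 0, "warnings": 0}
    for number, kind in SUMMARY_RE.findall(log_text):
        counts[kind + "s"] = int(number)
    return counts


summary = parse_summary("LOG->ERROR: Found 7 errors\nLOG->ERROR: Found 0 warnings")
print(summary)  # {'errors': 7, 'warnings': 0}
```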

If validation finds errors, the following result files are generated:

val_out.error.txt
val_out.warning.txt

The file contents are presented as a table. For example:

Error Type    Message
Nucleotide    Found invalid char '@' at Line 2, Column 18
Nucleotide    Found invalid char '@' at Line 2, Column 19
Nucleotide    Found invalid char '@' at Line 2, Column 20
Nucleotide    Found invalid '>' at Line 2, Column 34 in sequence(seqid:'>ssss'). This symbol is not allowed in the sequence. Please check whether the new-line character is missing.
Nucleotide    Found invalid '>' at Line 1164, Column 24 in sequence(seqid:'>Beijing-AAA-2022'). This symbol is not allowed in the sequence. Please check whether the new-line character is missing.
Defline       Found duplicated sequence id: '>ssss'
Defline       Found duplicated sequence id: '>asdf'
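
To locate the reported problems in the input file, the rows of such a report can be split into error type, message, and position. The sketch below is illustrative only: the on-disk layout of `val_out.error.txt` is not documented here, so the tab separator (`sep="\t"`) and the header check are assumptions, and the names `parse_report_line` and `count_by_type` are hypothetical. The "at Line N, Column M" pattern is taken from the messages shown above.

```python
import re
from collections import Counter

# Position pattern as it appears in the example messages.
POSITION_RE = re.compile(r"at Line (\d+), Column (\d+)")


def parse_report_line(line: str, sep: str = "\t"):
    """Split one report row into (error_type, message, position or None)."""
    # Assumption: the two columns are separated by `sep` (a tab by default).
    error_type, _, message = line.rstrip("\n").partition(sep)
    match = POSITION_RE.search(message)
    position = (int(match.group(1)), int(match.group(2))) if match else None
    return error_type.strip(), message.strip(), position


def count_by_type(lines, sep: str = "\t") -> Counter:
    """Count report rows per error type, skipping blank lines and the header row."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("Error Type"):
            continue
        error_type, _message, _position = parse_report_line(line, sep)
        counts[error_type] += 1
    return counts
```

Rows without a position, such as the duplicated sequence id messages, yield `None` as the position.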