Vcfexpress: flexible, rapid user-expressions to filter and format VCFs.

Brent S Pedersen, Aaron R Quinlan
Author Information
  1. Brent S Pedersen: Department of Human Genetics, University of Utah, Salt Lake City, UT 84112, United States.
  2. Aaron R Quinlan: Department of Human Genetics, University of Utah, Salt Lake City, UT 84112, United States.

Abstract

MOTIVATION: Variant call format (VCF) files are the standard output format for various software tools that identify genetic variation from DNA sequencing experiments. Downstream analyses require the ability to query, filter, and modify them simply and efficiently. Several tools are available to perform these operations from the command line, including BCFTools, vembrane, slivar, and others.
RESULTS: Here, we introduce vcfexpress, a new, high-performance toolset for the analysis of VCF files, written in the Rust programming language. It is nearly as fast as BCFTools, but adds the ability to execute user expressions written in the Lua programming language for precise filtering and reporting of variants from a VCF or BCF file. We demonstrate its performance and flexibility by comparing vcfexpress to other tools using the vembrane benchmark.
AVAILABILITY AND IMPLEMENTATION: vcfexpress is available under the MIT license at https://github.com/brentp/vcfexpress with code used for the manuscript deposited in https://doi.org/10.5281/zenodo.14756838.
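
To make the idea of expression-based filtering concrete, here is a minimal conceptual sketch. It is not vcfexpress code: vcfexpress evaluates Lua expressions from a Rust program, whereas this sketch uses a plain Python callable in place of the user expression. The names parse_info and filter_vcf_lines, the record layout, and the example data are all hypothetical and chosen only for illustration.

```python
# Conceptual sketch only: NOT the vcfexpress implementation or its API.
# All function names and the record layout are hypothetical.

def parse_info(info_field):
    """Parse a VCF INFO column into a dict; flag-style keys map to True."""
    info = {}
    if info_field in ("", "."):
        return info
    for item in info_field.split(";"):
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    return info


def filter_vcf_lines(lines, expression):
    """Yield header lines unchanged, plus records for which `expression` is true."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        record = {
            "chrom": fields[0],
            "pos": int(fields[1]),
            "id": fields[2],
            "ref": fields[3],
            "alt": fields[4].split(","),
            "qual": None if fields[5] == "." else float(fields[5]),
            "filter": fields[6],
            "info": parse_info(fields[7]),
        }
        if expression(record):
            yield line


# Made-up example data: two header lines and two variant records.
example_vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t50\tPASS\tAN=5000;AF=0.01\n",
    "chr1\t200\t.\tC\tT\t12\tPASS\tAN=120;AF=0.30\n",
]

# The "user expression": keep variants with more than 3000 called alleles.
kept = list(
    filter_vcf_lines(example_vcf, lambda v: int(v["info"].get("AN", 0)) > 3000)
)
```

In this sketch the caller supplies the predicate at run time, which is the general pattern the abstract describes; a real tool would also need to handle typed INFO fields, per-sample FORMAT data, and compressed or binary (BCF) input.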

Grants

  1. R01 HG012252/NHGRI NIH HHS
  2. /National Human Genome Research Institute
  3. EOSS4-0000000180/Chan Zuckerberg Institute's Essential Open Source Software Initiative

MeSH Terms

Software
Sequence Analysis, DNA
Genetic Variation
Programming Languages
High-Throughput Nucleotide Sequencing
Humans
