EGAsubmitter: A software to automate submission of nucleic acid sequencing data to the European Genome-phenome Archive.

Marco Viviani, Marilisa Montemurro, Livio Trusolino, Andrea Bertotti, Gianvito Urgese, Elena Grassi
Author Information
  1. Marco Viviani: Candiolo Cancer Institute-FPO IRCCS, Candiolo, Italy.
  2. Marilisa Montemurro: Politecnico di Torino, Turin, Italy.
  3. Livio Trusolino: Candiolo Cancer Institute-FPO IRCCS, Candiolo, Italy.
  4. Andrea Bertotti: Candiolo Cancer Institute-FPO IRCCS, Candiolo, Italy.
  5. Gianvito Urgese: Politecnico di Torino, Turin, Italy.
  6. Elena Grassi: Candiolo Cancer Institute-FPO IRCCS, Candiolo, Italy.

Abstract

Making raw data available to the research community is one of the pillars of Findability, Accessibility, Interoperability, and Reuse (FAIR) research. However, submitting raw data to public databases still involves many manual procedures that are intrinsically time-consuming and error-prone, raising potential reliability issues for both the data themselves and the associated metadata. For example, submitting sequencing data to the European Genome-phenome Archive (EGA) is estimated to take 1 month overall. It relies mainly on a web interface for metadata management, which requires manually completing forms and uploading several comma-separated values (CSV) files that lack a formal structure. To address these limitations, we present EGAsubmitter, a Snakemake-based pipeline that guides the user through all submission steps, from file encryption and upload to metadata submission. EGAsubmitter is expected to streamline the automated submission of sequencing data to EGA, minimizing user errors and ensuring higher fidelity of the end product.
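Snakemake decides the order in which rules run from the dependencies between their inputs and outputs, so a pipeline like the one described above can be pictured as a directed acyclic graph of steps. The minimal sketch below illustrates that idea in plain Python using the standard-library `graphlib.TopologicalSorter`. It assumes a hypothetical layout in which each data file is encrypted and then uploaded, and metadata is submitted only after every upload has finished. The step names (`encrypt:…`, `upload:…`, `submit_metadata`) and the helper functions are illustrative assumptions, not EGAsubmitter's actual rules.

```python
from graphlib import TopologicalSorter


def build_submission_dag(files):
    """Map each hypothetical step to the set of steps it depends on.

    Assumed layout (illustrative only): every file is encrypted, then
    uploaded; metadata submission waits for all uploads.
    """
    dag = {}
    for f in files:
        dag[f"encrypt:{f}"] = set()             # no prerequisites
        dag[f"upload:{f}"] = {f"encrypt:{f}"}   # upload only encrypted files
    # Fan-in: metadata submission depends on every upload
    dag["submit_metadata"] = {f"upload:{f}" for f in files}
    return dag


def execution_order(files):
    """Return one valid order in which the steps could run."""
    return list(TopologicalSorter(build_submission_dag(files)).static_order())


if __name__ == "__main__":
    for step in execution_order(["sample1.fastq.gz", "sample2.fastq.gz"]):
        print(step)
```

Any order returned satisfies the dependencies, so a file is never uploaded before it is encrypted and metadata is always submitted last. A workflow manager such as Snakemake additionally runs independent steps (for example, encrypting different files) in parallel.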
