SODAR: managing multiomics study data and metadata.

Mikko Nieminen, Oliver Stolpe, Mathias Kuhring, January Weiner, Patrick Pett, Dieter Beule, Manuel Holtgrewe
Author Information
  1. Mikko Nieminen: Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Core Unit Bioinformatics (CUBI), Berlin 10117, Germany.
  2. Oliver Stolpe: Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Core Unit Bioinformatics (CUBI), Berlin 10117, Germany.
  3. Mathias Kuhring: Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Core Unit Bioinformatics (CUBI), Berlin 10117, Germany.
  4. January Weiner: Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Core Unit Bioinformatics (CUBI), Berlin 10117, Germany.
  5. Patrick Pett: Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Core Unit Bioinformatics (CUBI), Berlin 10117, Germany.
  6. Dieter Beule: Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Core Unit Bioinformatics (CUBI), Berlin 10117, Germany.
  7. Manuel Holtgrewe: Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Core Unit Bioinformatics (CUBI), Berlin 10117, Germany.

Abstract

Scientists employing omics in life science studies face challenges such as modeling multiassay studies, recording all relevant parameters, and managing many samples with their metadata. They must also manage many large files that result from the assays or from subsequent computation. Users with diverse backgrounds, ranging from computational scientists to wet-lab scientists, have dissimilar data access needs: the former favor programmatic interfaces and the latter graphical ones. We introduce SODAR, the system for omics data access and retrieval. SODAR is a software package that addresses these challenges by providing a web-based graphical user interface for managing multiassay studies and describing them using the ISA (Investigation, Study, Assay) data model and the ISA-Tab file format. Data are stored in the iRODS data management system, which handles large quantities of files and substantial amounts of data. SODAR also offers programmable APIs and command-line access for metadata and file storage. SODAR supports complex omics integration studies and can be easily installed. The software is written in Python 3 and is freely available at https://github.com/bihealth/sodar-server under the MIT license.
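
To make the ISA hierarchy mentioned in the abstract more concrete, the following is a minimal sketch of how an investigation, its studies, and their assays nest. It is not SODAR's code or API: the class names, fields, file names, and the helper isa_tab_level are all hypothetical and chosen for illustration. The only external convention it relies on is the common ISA-Tab file naming scheme, in which investigation, study, and assay files start with i_, s_, and a_ respectively.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    # Hypothetical fields for illustration; not SODAR's data model.
    file_name: str          # ISA-Tab assay files are conventionally a_*.txt
    measurement_type: str


@dataclass
class Study:
    file_name: str          # ISA-Tab study files are conventionally s_*.txt
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    file_name: str          # ISA-Tab investigation files are conventionally i_*.txt
    studies: List[Study] = field(default_factory=list)

    def assay_files(self) -> List[str]:
        """Collect assay file names across all studies, in order."""
        return [a.file_name for s in self.studies for a in s.assays]


def isa_tab_level(file_name: str) -> str:
    """Guess the ISA level of a file from its name prefix (assumed convention)."""
    prefixes = {"i_": "investigation", "s_": "study", "a_": "assay"}
    for prefix, level in prefixes.items():
        if file_name.startswith(prefix):
            return level
    return "unknown"


# A made-up multiassay study: one study measured with two assays.
investigation = Investigation(
    file_name="i_Investigation.txt",
    studies=[
        Study(
            file_name="s_example.txt",
            assays=[
                Assay("a_example_rnaseq.txt", "transcription profiling"),
                Assay("a_example_ms.txt", "protein expression profiling"),
            ],
        )
    ],
)
```

In this sketch a multiassay study is simply a Study holding more than one Assay, which is the kind of structure the ISA model is meant to describe. A real deployment would read and validate ISA-Tab files with a dedicated parser instead of relying on file name prefixes.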

MeSH Terms

Multiomics
Metadata
Software
Information Storage and Retrieval
Data Management