FAIRSCAPE: a Framework for FAIR and Reproducible Biomedical Analytics.

Maxwell Adam Levinson, Justin Niestroy, Sadnan Al Manir, Karen Fairchild, Douglas E Lake, J Randall Moorman, Timothy Clark
Author Information
  1. Maxwell Adam Levinson: Department of Public Health Sciences (Biomedical Informatics), University of Virginia School of Medicine, Charlottesville, VA, USA.
  2. Justin Niestroy: Department of Public Health Sciences (Biomedical Informatics), University of Virginia School of Medicine, Charlottesville, VA, USA.
  3. Sadnan Al Manir: Department of Public Health Sciences (Biomedical Informatics), University of Virginia School of Medicine, Charlottesville, VA, USA.
  4. Karen Fairchild: Department of Pediatrics, University of Virginia School of Medicine, Charlottesville, VA, USA.
  5. Douglas E Lake: Center for Advanced Medical Analytics, University of Virginia School of Medicine, Charlottesville, VA, USA.
  6. J Randall Moorman: Center for Advanced Medical Analytics, University of Virginia School of Medicine, Charlottesville, VA, USA.
  7. Timothy Clark: Department of Public Health Sciences (Biomedical Informatics), University of Virginia School of Medicine, Charlottesville, VA, USA. twclark@virginia.edu.

Abstract

Results of computational analyses require transparent disclosure of their supporting resources, while the analyses themselves can often be very large in scale and involve multiple processing steps separated in time. Evidence for the correctness of any analysis should include not only a textual description but also a formal record of the computations that produced the result, including accessible data and software with runtime parameters, environment, and personnel involved. This article describes FAIRSCAPE, a reusable computational framework that enables simplified access to modern, scalable, cloud-based components. FAIRSCAPE fully implements the FAIR data principles and extends them to provide fully FAIR Evidence, including machine-interpretable provenance of datasets, software, and computations, as metadata for all computed results. The FAIRSCAPE microservices framework creates a complete Evidence Graph for every computational result, including persistent identifiers with metadata that resolve to the software, computations, and datasets used in the computation, and stores a URI to the root of the graph in the result's metadata. An ontology for Evidence Graphs, EVI (https://w3id.org/EVI), supports inferential reasoning over the evidence. FAIRSCAPE can run nested or disjoint workflows and preserves provenance across them. It can run Apache Spark jobs, scripts, workflows, or user-supplied containers. All objects, including software, are assigned persistent IDs. All results are annotated with FAIR metadata using the Evidence Graph model, supporting access, validation, reproducibility, and reuse of archived data and software.
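
To make the Evidence Graph idea concrete, here is a minimal Python sketch; it is not FAIRSCAPE's implementation or API. It assumes that a result's metadata points to a root node and that each node links, by persistent identifier, to the computations, software, and datasets it depends on. The class `Node`, the function `evidence_closure`, the relation names `generatedBy`, `usedSoftware` and `usedDataset`, and the `ark:/99999/...` identifiers are all hypothetical (the relation names are not claimed to be EVI's actual terms). Walking the links from the root recovers every supporting object, including provenance that spans two chained workflow steps.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One object in a hypothetical evidence graph."""
    pid: str   # persistent identifier (hypothetical ARK-style strings below)
    kind: str  # "Dataset", "Software", or "Computation"
    links: dict = field(default_factory=dict)  # relation name -> list of PIDs


def evidence_closure(graph: dict, root: str) -> list:
    """Collect every PID reachable from `root` (depth-first preorder, each PID once).

    Assumes every PID resolves to a Node in `graph`; a KeyError signals a
    dangling identifier.
    """
    seen, order, stack = set(), [], [root]
    while stack:
        pid = stack.pop()
        if pid in seen:
            continue  # shared evidence (e.g. the same software) is reported once
        seen.add(pid)
        order.append(pid)
        children = [t for targets in graph[pid].links.values() for t in targets]
        stack.extend(reversed(children))  # reversed so children pop in listed order
    return order


# Hypothetical two-step workflow: raw-1 -> comp-1 -> clean-1 -> comp-2 -> result-1,
# with both computations using the same software sw-1.
NODES = [
    Node("ark:/99999/result-1", "Dataset", {"generatedBy": ["ark:/99999/comp-2"]}),
    Node("ark:/99999/comp-2", "Computation",
         {"usedSoftware": ["ark:/99999/sw-1"], "usedDataset": ["ark:/99999/clean-1"]}),
    Node("ark:/99999/clean-1", "Dataset", {"generatedBy": ["ark:/99999/comp-1"]}),
    Node("ark:/99999/comp-1", "Computation",
         {"usedSoftware": ["ark:/99999/sw-1"], "usedDataset": ["ark:/99999/raw-1"]}),
    Node("ark:/99999/sw-1", "Software"),
    Node("ark:/99999/raw-1", "Dataset"),
]
GRAPH = {n.pid: n for n in NODES}

if __name__ == "__main__":
    for pid in evidence_closure(GRAPH, "ark:/99999/result-1"):
        print(GRAPH[pid].kind, pid)
```

Because identifiers are de-duplicated during the walk, software reused across nested workflow steps appears once in the evidence set, and filtering the closure by `kind` lists, for example, all software a result depends on.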

Grants

  1. R01 HD072071/NICHD NIH HHS
  2. OT3 OD025456/NIH HHS
  3. U01 HG009452/NHGRI NIH HHS
  4. U01 HL133708/NHLBI NIH HHS

MeSH Terms

Metadata
Reproducibility of Results
Software
Workflow
