A posteriori metadata from automated provenance tracking: integration of AiiDA and TCOD.

Andrius Merkys, Nicolas Mounet, Andrea Cepellotti, Nicola Marzari, Saulius Gražulis, Giovanni Pizzi
Author Information
  1. Andrius Merkys: Theory and Simulation of Materials (THEOS) and National Centre for Computational Design and Discovery of Novel Materials (MARVEL), 1015, Lausanne, Switzerland. andrius.merkys@gmail.com.
  2. Nicolas Mounet: Theory and Simulation of Materials (THEOS) and National Centre for Computational Design and Discovery of Novel Materials (MARVEL), 1015, Lausanne, Switzerland.
  3. Andrea Cepellotti: Theory and Simulation of Materials (THEOS) and National Centre for Computational Design and Discovery of Novel Materials (MARVEL), 1015, Lausanne, Switzerland.
  4. Nicola Marzari: Theory and Simulation of Materials (THEOS) and National Centre for Computational Design and Discovery of Novel Materials (MARVEL), 1015, Lausanne, Switzerland.
  5. Saulius Gražulis: Institute of Biotechnology, Vilnius University, Saulėtekio al. 7, 10257, Vilnius, Lithuania.
  6. Giovanni Pizzi: Theory and Simulation of Materials (THEOS) and National Centre for Computational Design and Discovery of Novel Materials (MARVEL), 1015, Lausanne, Switzerland.

Abstract

To make the results of computational scientific research findable, accessible, interoperable and re-usable, it is necessary to decorate them with standardised metadata. However, a number of technical and practical challenges make this difficult to achieve. Here the implementation of a protocol is presented to tag crystal structures with their computed properties, without the need for human intervention to curate the data. This protocol leverages the capabilities of AiiDA, an open-source platform to manage and automate scientific computational workflows, and the TCOD, an open-access database storing computed materials properties using a well-defined and exhaustive ontology. Based on these, the complete procedure to deposit computed data in the TCOD database is automated. All relevant metadata are extracted from the full provenance information that AiiDA tracks and stores automatically while managing the calculations. Such a protocol also enables reproducibility of scientific data in the field of computational materials science. As a proof of concept, the AiiDA-TCOD interface is used to deposit 170 theoretical structures together with their computed properties and their full provenance graphs, consisting of over 4600 AiiDA nodes.
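
The idea of extracting metadata a posteriori from a stored provenance graph can be illustrated with a minimal sketch. It does not use the AiiDA API or the TCOD dictionaries: the toy graph, the node names, the placeholder values and the `_example_*` tag names are all assumptions made for illustration only.

```python
from typing import Any, Dict, List

# Toy provenance graph (hypothetical): each node records its kind, a few
# attributes, and the ids of the nodes it was produced from.
GRAPH: Dict[str, Dict[str, Any]] = {
    "structure_in": {"kind": "data", "attrs": {"formula": "Si2"}, "inputs": []},
    "parameters": {"kind": "data", "attrs": {"cutoff_ry": 30}, "inputs": []},
    "relax": {
        "kind": "calculation",
        "attrs": {"code": "example-dft-code"},
        "inputs": ["structure_in", "parameters"],
    },
    "structure_out": {
        "kind": "data",
        # Placeholder value, not a real result.
        "attrs": {"formula": "Si2", "total_energy_ev": -1.0},
        "inputs": ["relax"],
    },
}


def ancestors_in_order(graph: Dict[str, Dict[str, Any]], node_id: str) -> List[str]:
    """Return node_id and all its ancestors, with inputs listed before outputs."""
    ordered: List[str] = []
    seen = set()

    def visit(nid: str) -> None:
        if nid in seen:
            return
        seen.add(nid)
        for parent in graph[nid]["inputs"]:
            visit(parent)
        ordered.append(nid)

    visit(node_id)
    return ordered


def extract_metadata(graph: Dict[str, Dict[str, Any]], node_id: str) -> Dict[str, Any]:
    """Build flat metadata tags for a result node from its full history."""
    history = ancestors_in_order(graph, node_id)
    calcs = [n for n in history if graph[n]["kind"] == "calculation"]
    attrs = graph[node_id]["attrs"]
    return {
        # Hypothetical tag names, not taken from any real dictionary.
        "_example_formula": attrs["formula"],
        "_example_total_energy_ev": attrs["total_energy_ev"],
        "_example_codes_used": [graph[n]["attrs"]["code"] for n in calcs],
        "_example_provenance_node_count": len(history),
    }


if __name__ == "__main__":
    print(extract_metadata(GRAPH, "structure_out"))
```

Because every tag is derived from the recorded history of the result node, no manual curation step is needed once the graph exists; this is the property the protocol relies on.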

Grants

  1. 13.169/SCIEX
  2. 676598/European Union H2020-EINFRA-2015-1 programme
