HDMF: Hierarchical Data Modeling Framework for Modern Science Data Standards.

Andrew J Tritt, Oliver Rübel, Benjamin Dichter, Ryan Ly, Donghe Kang, Edward F Chang, Loren M Frank, Kristofer Bouchard
Author Information
  1. Andrew J Tritt: Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
  2. Oliver Rübel: Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
  3. Benjamin Dichter: Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
  4. Ryan Ly: Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
  5. Donghe Kang: Computer Science and Engineering, Ohio State University, Columbus, OH, USA.
  6. Edward F Chang: Department of Neurological Surgery and the Center for Integrative Neuroscience, University of California, San Francisco, San Francisco, CA, USA.
  7. Loren M Frank: Howard Hughes Medical Institute, Kavli Institute for Fundamental Neuroscience, Department of Physiology, University of California, San Francisco, San Francisco, CA, USA.
  8. Kristofer Bouchard: Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.

Abstract

A ubiquitous problem in aggregating data across different experimental and observational data sources is a lack of software infrastructure that enables flexible and extensible standardization of data and metadata. To address this challenge, we developed HDMF, a hierarchical data modeling framework for modern science data standards. With HDMF, we separate the process of data standardization into three main components: (1) data modeling and specification, (2) data I/O and storage, and (3) data interaction and data APIs. To enable standards to support the complex requirements and varying use cases throughout the data life cycle, HDMF provides object mapping infrastructure to insulate and integrate these components. This approach supports the flexible development of data standards and extensions, optimized storage backends, and data APIs, while allowing the other components of the data standards ecosystem to remain stable. To meet the demands of modern, large-scale science data, HDMF provides advanced data I/O functionality for iterative data write, lazy data load, and parallel I/O. It also optimizes data storage through chunking, compression, linking, and modular data storage. We demonstrate HDMF in practice by using it to design NWB 2.0 [13], a modern data standard for collaborative science across the neurophysiology community.
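
The three-component separation described in the abstract can be illustrated with a minimal sketch. This is not HDMF's API: every name below (TIMESERIES_SPEC, TimeSeries, ObjectMapper, JsonBackend, and the intermediate "builder" dictionary) is a hypothetical stand-in, and JSON text stands in for a real storage backend. The sketch only shows how an object-mapping layer might keep the specification, the storage backend, and the user-facing API independent of one another.

```python
import json
from dataclasses import dataclass, field

# (1) Specification: a declarative description of a data type.
# Hypothetical schema layout, not HDMF's specification language.
TIMESERIES_SPEC = {
    "type": "TimeSeries",
    "attributes": {"unit": str, "rate": float},
    "datasets": ["data"],
}


# (3) API: the user-facing container. It knows nothing about storage.
@dataclass
class TimeSeries:
    name: str
    unit: str
    rate: float
    data: list = field(default_factory=list)


# Object mapping: translates between containers and a storage-neutral
# intermediate dictionary (called a "builder" here for illustration).
class ObjectMapper:
    def __init__(self, spec, cls):
        self.spec = spec
        self.cls = cls

    def build(self, obj):
        # Validate attribute types against the specification.
        for attr, typ in self.spec["attributes"].items():
            if not isinstance(getattr(obj, attr), typ):
                raise TypeError(f"{attr} must be {typ.__name__}")
        return {
            "name": obj.name,
            "type": self.spec["type"],
            "attributes": {a: getattr(obj, a) for a in self.spec["attributes"]},
            "datasets": {d: getattr(obj, d) for d in self.spec["datasets"]},
        }

    def construct(self, builder):
        return self.cls(
            name=builder["name"], **builder["attributes"], **builder["datasets"]
        )


# (2) I/O: a backend that only ever sees builders, so it can be replaced
# without touching the specification or the container class.
class JsonBackend:
    def write(self, builder):
        return json.dumps(builder)

    def read(self, text):
        return json.loads(text)


# Round trip: container -> builder -> stored text -> builder -> container.
mapper = ObjectMapper(TIMESERIES_SPEC, TimeSeries)
backend = JsonBackend()
original = TimeSeries(name="lfp", unit="volts", rate=1000.0, data=[0.1, 0.2, 0.3])
stored = backend.write(mapper.build(original))
restored = mapper.construct(backend.read(stored))
```

Because the backend exchanges only builders with the mapper, a different storage format could be substituted for JsonBackend without changing TimeSeries or its specification, which is the kind of insulation between components that the abstract attributes to object mapping.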

References

  1. Nat Methods. 2012 Sep;9(9):854-5 [PMID: 22936162]
  2. Anal Chem. 2013 Nov 5;85(21):10354-61 [PMID: 24087878]
  3. Sci Data. 2016 Mar 15;3:160018 [PMID: 26978244]
  4. Front Neuroinform. 2016 Nov 04;10:48 [PMID: 27867355]

Grants

  1. R24 MH116922/NIMH NIH HHS

