Cloud-based biomedical data storage and analysis for genomic research: Landscape analysis of data governance in emerging NIH-supported platforms.

Jacklyn M Dahlquist, Sarah C Nelson, Stephanie M Fullerton
Author Information
  1. Jacklyn M Dahlquist: Department of Bioethics and Humanities, University of Washington School of Medicine, Seattle, WA 98195, USA.
  2. Sarah C Nelson: Department of Biostatistics, University of Washington, Seattle, WA 98195, USA.
  3. Stephanie M Fullerton: Department of Bioethics and Humanities, University of Washington School of Medicine, Seattle, WA 98195, USA.

Abstract

The storage, sharing, and analysis of genomic data pose technical and logistical challenges that have precipitated the development of cloud-based computing platforms designed to facilitate collaboration and maximize the scientific utility of data. To understand cloud platforms' policies and procedures and their implications for different stakeholder groups, in summer 2021 we reviewed publicly available documents (N = 94) sourced from platform websites, scientific literature, and lay media for five NIH-funded cloud platforms (the All of Us Research Hub, NHGRI AnVIL, NHLBI BioData Catalyst, NCI Genomic Data Commons, and the Kids First Data Resource Center) and a pre-existing data sharing mechanism, dbGaP. Platform policies were compared across seven categories of data governance: data submission, data ingestion, user authentication and authorization, data security, data access, auditing, and sanctions. Our analysis finds similarities across the platforms, including reliance on a formal data ingestion process, multiple tiers of data access with varying user authentication and/or authorization requirements, platform and user data security measures, and auditing for inappropriate data use. Platforms differ in how data tiers are organized, as well as in the specifics of user authentication and authorization across access tiers. Our analysis maps elements of data governance across emerging NIH-funded cloud platforms and, as such, provides a key resource for stakeholders seeking to understand and use data access and analysis options across platforms and to surface aspects of governance that may require harmonization to achieve the desired interoperability.

Grants

  1. R21 HG011501/NHGRI NIH HHS

MeSH Term

Humans
Cloud Computing
Population Health
Genomics
Genome
Information Storage and Retrieval
