Using Distributed Data over HBase in Big Data Analytics Platform for Clinical Services.

Dillon Chrimes, Hamid Zamani
Author Information
  1. Dillon Chrimes: Database Integration and Management, IMIT Quality Systems, Vancouver Island Health Authority, Vancouver, BC, Canada V8R 1J8.
  2. Hamid Zamani: School of Health Information Science, Faculty of Human and Social Development, University of Victoria, Victoria, BC, Canada V8P 5C2.


Big data analytics (BDA) is important for reducing healthcare costs. However, it poses many challenges in data aggregation, maintenance, integration, translation, analysis, and security/privacy. The objective of this study was to establish an interactive BDA platform with simulated patient data using open-source software technologies. This was achieved by constructing a platform framework on the Hadoop Distributed File System (HDFS) using HBase (a key-value NoSQL database). Distributed data structures were generated from benchmarked hospital-specific metadata of nine billion patient records. At optimized iteration, HDFS ingestion of HFiles to HBase store files showed sustained availability over hundreds of iterations; however, completing MapReduce to HBase required a week for 10 TB and a month for three billion (30 TB) indexed patient records. Inconsistencies found in MapReduce limited the capacity to generate and replicate data efficiently. Apache Spark and Drill showed high performance, with high usability for technical support but poor usability for clinical services. Representing a hospital system based on patient-centric data was challenging in HBase, because not all data profiles were fully integrated with the complex patient-to-hospital relationships. Nevertheless, we recommend using HBase to keep patient data secured while querying entire hospital volumes in a simplified clinical event model across clinical services.
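
To put the reported durations in perspective, the short sketch below derives the average rates implied by the abstract's figures (10 TB in a week, 30 TB and three billion indexed records in a month). It is only a back-of-envelope calculation, not a measurement from the study: it assumes decimal units (1 TB = 10^12 bytes), a 7-day week, and a 30-day month, and the function names are illustrative.

```python
# Back-of-envelope rates implied by the abstract's figures.
# Assumptions (not stated in the source): 1 TB = 10**12 bytes,
# a week is 7 days, a month is 30 days.

TB = 10**12               # bytes per terabyte (decimal convention, assumed)
SECONDS_PER_DAY = 86_400


def throughput_mb_per_s(terabytes: float, days: float) -> float:
    """Average sustained rate in MB/s (1 MB = 10**6 bytes)."""
    return terabytes * TB / (days * SECONDS_PER_DAY) / 10**6


def bytes_per_record(terabytes: float, records: int) -> float:
    """Average stored size of one record in bytes."""
    return terabytes * TB / records


week_rate = throughput_mb_per_s(10, 7)               # 10 TB in one week
month_rate = throughput_mb_per_s(30, 30)             # 30 TB in one month
record_size = bytes_per_record(30, 3_000_000_000)    # 30 TB over three billion records

print(f"10 TB in a week:  {week_rate:.2f} MB/s on average")
print(f"30 TB in a month: {month_rate:.2f} MB/s on average")
print(f"Average record:   {record_size:.0f} bytes")
```

Under these assumptions the two runs average roughly 17 MB/s and 12 MB/s, and each indexed record occupies about 10 KB. These are averages over the whole run and say nothing about peak rates or about where the time was spent.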



MeSH Terms

British Columbia
Clinical Trials as Topic
Data Collection
Electronic Health Records
Machine Learning
Medical Informatics
Medical Records
Programming Languages
Reproducibility of Results

