Compressing atmospheric data into its real information content.

Milan Klöwer, Miha Razinger, Juan J Dominguez, Peter D Düben, Tim N Palmer
Author Information
  1. Milan Klöwer: Atmospheric, Oceanic and Planetary Physics, University of Oxford, Oxford, UK. milan.kloewer@physics.ox.ac.uk.
  2. Miha Razinger: European Centre for Medium-Range Weather Forecasts, Reading, UK.
  3. Juan J Dominguez: European Centre for Medium-Range Weather Forecasts, Reading, UK.
  4. Peter D Düben: European Centre for Medium-Range Weather Forecasts, Reading, UK.
  5. Tim N Palmer: Atmospheric, Oceanic and Planetary Physics, University of Oxford, Oxford, UK.

Abstract

Hundreds of petabytes are produced annually at weather and climate forecast centers worldwide. Compression is essential to reduce storage and to facilitate data sharing. Current techniques do not distinguish the real from the false information in data, leaving the level of meaningful precision unassessed. Here we define the bitwise real information content from information theory for the Copernicus Atmosphere Monitoring Service (CAMS). Most variables contain fewer than 7 bits of real information per value and are highly compressible due to spatio-temporal correlation. Rounding bits without real information to zero facilitates lossless compression algorithms and encodes the uncertainty within the data itself. All CAMS data are 17× compressed relative to 64-bit floats, while preserving 99% of real information. Combined with four-dimensional compression, factors beyond 60× are achieved. A data compression Turing test is proposed to optimize compressibility while minimizing information loss for the end use of weather and climate forecast data.
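
The idea that rounding away trailing bits helps a lossless compressor can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes 32-bit floats, round-to-nearest (ties to even) on the mantissa, zlib as a stand-in lossless compressor, and a synthetic temperature-like series. The names `round_mantissa`, `compressed_size`, `synthetic_field` and the choice `KEEPBITS = 10` are hypothetical and chosen only for illustration; the abstract does not specify how many mantissa bits are kept for any variable.

```python
import math
import random
import struct
import zlib


def round_mantissa(x: float, keepbits: int) -> float:
    """Round x, stored as float32, to `keepbits` explicit mantissa bits.

    Round-to-nearest, ties to even; the discarded trailing bits become zero.
    Assumes a finite input value.
    """
    if not 0 <= keepbits < 23:
        raise ValueError("keepbits must be in 0..22 for float32")
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    drop = 23 - keepbits                       # trailing mantissa bits to zero
    half = 1 << (drop - 1)                     # half a unit in the last kept place
    bits += (half - 1) + ((bits >> drop) & 1)  # ties go to the even neighbour
    bits &= 0xFFFFFFFF & ~((1 << drop) - 1)    # zero the discarded bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def compressed_size(values: list[float]) -> int:
    """Size in bytes after packing as float32 and compressing with zlib."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return len(zlib.compress(raw, 9))


def synthetic_field(n: int = 20_000, seed: int = 0) -> list[float]:
    """Smooth signal plus small noise (an assumption, not CAMS data)."""
    rng = random.Random(seed)
    return [280.0 + 10.0 * math.sin(i / 50.0) + rng.gauss(0.0, 0.01)
            for i in range(n)]


KEEPBITS = 10  # illustrative choice only
field = synthetic_field()
rounded = [round_mantissa(v, KEEPBITS) for v in field]
print("zlib bytes, full precision:", compressed_size(field))
print("zlib bytes, rounded:       ", compressed_size(rounded))
```

In this sketch the rounded series keeps a bounded relative error of about 2^-(KEEPBITS+1) per value, while the runs of zero bits it creates are what the lossless compressor exploits.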

Grants

  1. NE/L002612/1/RCUK | Natural Environment Research Council (NERC)
  2. 823988/EC | Horizon 2020 Framework Programme (EU Framework Programme for Research and Innovation H2020)
