On Entropic Learning from Noisy Time Series in the Small Data Regime

Davide Bassetti, Lukáš Pospíšil, Illia Horenko
Author Information
  1. Davide Bassetti: Faculty of Mathematics, RPTU Kaiserslautern-Landau, Gottlieb-Daimler-Str. 48, 67663 Kaiserslautern, Germany.
  2. Lukáš Pospíšil: Department of Mathematics, Faculty of Civil Engineering, VŠB-TUO, Ludvika Podeste 1875/17, 708 33 Ostrava, Czech Republic.
  3. Illia Horenko: Faculty of Mathematics, RPTU Kaiserslautern-Landau, Gottlieb-Daimler-Str. 48, 67663 Kaiserslautern, Germany.

Abstract

In this work, we present a novel methodology for the supervised classification of time-ordered noisy data, which we call Entropic Sparse Probabilistic Approximation with Markov regularization (eSPA-Markov). It extends entropic learning methodologies, allowing the simultaneous learning of segmentation patterns, entropy-optimal feature space discretizations, and Bayesian classification rules. We prove conditions for the existence and uniqueness of the solution of the learning problem and propose a one-shot numerical learning algorithm that, to leading order, scales linearly in the dimension. We show how this technique can be used for the computationally scalable identification of persistent (metastable) regime affiliations and regime switches from high-dimensional, non-stationary, and noisy time series, i.e., when the size of the available data statistics is small compared to their dimensionality and when the noise variance exceeds the variance of the signal. We demonstrate the performance of eSPA-Markov on a set of toy learning problems, comparing it to state-of-the-art techniques including deep learning and random forests, and we apply it to the analysis of noisy time series from DNA and RNA Nanopore sequencing.
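
To give a concrete picture of the alternating structure behind such a method, the following is a minimal, self-contained Python sketch of an eSPA-Markov-style alternating minimization. It is not the authors' implementation: the hard (rather than probabilistic) affiliations, the greedy ICM-like update of the Markov-regularized segmentation, and the hyperparameter names (eps_cl, eps_w, eps_markov) are simplifying assumptions made here for illustration only.

import numpy as np

def espa_markov_sketch(X, y, K=3, eps_cl=1.0, eps_w=0.1, eps_markov=0.5,
                       n_iter=50, seed=0):
    """Toy alternating minimization for an eSPA-Markov-style model.

    X : (T, D) time-ordered features; y : (T,) integer class labels.
    Returns box centers C, feature weights w, affiliations gamma, and
    per-box class probabilities Lam (shape M x K, M = number of classes).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    T, D = X.shape
    M = int(y.max()) + 1
    C = X[rng.choice(T, K, replace=False)].copy()  # discretization box centers
    w = np.full(D, 1.0 / D)                        # feature weights on the simplex
    gamma = rng.integers(0, K, size=T)             # hard box affiliations over time
    Lam = np.full((M, K), 1.0 / M)                 # P(class | box)

    for _ in range(n_iter):
        # (1) Affiliation update: weighted squared distance + classification
        #     term + a persistence penalty for switching away from the box of
        #     the temporal neighbours (a greedy, ICM-like simplification).
        dist = ((X[:, None, :] - C[None, :, :]) ** 2 * w).sum(-1)  # (T, K)
        cls = -eps_cl * np.log(Lam[y] + 1e-12)                     # (T, K)
        for t in range(T):
            pen = np.zeros(K)
            if t > 0:
                pen += eps_markov * (np.arange(K) != gamma[t - 1])
            if t < T - 1:
                pen += eps_markov * (np.arange(K) != gamma[t + 1])
            gamma[t] = np.argmin(dist[t] + cls[t] + pen)
        # (2) Box centers: mean of the points affiliated with each box.
        for k in range(K):
            mask = gamma == k
            if mask.any():
                C[k] = X[mask].mean(0)
        # (3) Classifier: empirical class frequencies within each box.
        Lam = np.full((M, K), 1e-12)
        np.add.at(Lam, (y, gamma), 1.0)
        Lam /= Lam.sum(0, keepdims=True)
        # (4) Feature weights: entropic closed form, w_d proportional to
        #     exp(-b_d / eps_w), where b_d is the mean residual in dimension d.
        b = np.array([((X[:, d] - C[gamma, d]) ** 2).mean() for d in range(D)])
        w = np.exp(-(b - b.min()) / eps_w)
        w /= w.sum()
    return C, w, gamma, Lam

On a toy two-class series with a few metastable segments, increasing eps_markov makes the recovered affiliations gamma more persistent, while decreasing eps_w concentrates the entropic weights w more sharply on the discriminative dimensions.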
