Gradient Decomposition Methods for Training Neural Networks With Non-ideal Synaptic Devices.

Junyun Zhao, Siyuan Huang, Osama Yousuf, Yutong Gao, Brian D Hoskins, Gina C Adam
Author Information
  1. Junyun Zhao: Department of Computer Science, George Washington University, Washington, DC, United States.
  2. Siyuan Huang: Department of Computer Science, George Washington University, Washington, DC, United States.
  3. Osama Yousuf: Department of Electrical and Computer Engineering, George Washington University, Washington, DC, United States.
  4. Yutong Gao: Department of Computer Science, George Washington University, Washington, DC, United States.
  5. Brian D Hoskins: Physical Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, United States.
  6. Gina C Adam: Department of Electrical and Computer Engineering, George Washington University, Washington, DC, United States.

Abstract

While promising for high-capacity machine learning accelerators, memristor devices have non-idealities that prevent software-equivalent accuracies when used for online training. This work uses a combination of Mini-Batch Gradient Descent (MBGD) to average gradients, stochastic rounding to avoid vanishing weight updates, and decomposition methods to keep the memory overhead low during mini-batch training. Since the weight update has to be transferred to the memristor matrices efficiently, we also investigate the impact of reconstructing the gradient matrices both internally and externally to the memristor array. Our results show that streaming batch principal component analysis (streaming batch PCA) and non-negative matrix factorization (NMF) decomposition algorithms can achieve near-MBGD accuracy in a memristor-based multi-layer perceptron trained on the MNIST (Modified National Institute of Standards and Technology) database with only 3 to 10 ranks, at significant memory savings. Moreover, NMF outperforms streaming batch PCA at low ranks, making it more suitable for hardware implementation in future memristor-based accelerators.
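
As a rough illustration of the approach described in the abstract, the sketch below accumulates a rank-r approximation of a layer's mini-batch gradient and applies stochastic rounding before the update would be transferred to device conductances. This is a minimal NumPy sketch, not the authors' implementation: truncated SVD stands in for the streaming batch PCA / NMF decompositions discussed in the paper, and the rank, learning rate, and device step size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def low_rank_gradient(activations, deltas, rank=3):
    """Rank-r approximation of the mini-batch gradient sum_i x_i delta_i^T.

    Truncated SVD is used here as a stand-in for the streaming batch PCA
    or NMF decompositions; only the r leading factors would need to be
    stored while the mini-batch is accumulated.
    """
    G = activations.T @ deltas                      # full gradient, (n_in, n_out)
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    # Keep the top-r components; the factors (U_r * s_r) and V_r are what
    # would be accumulated instead of the full gradient matrix.
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]

def stochastic_round(update, step):
    """Round each entry to an integer number of device steps, rounding up
    with probability equal to the fractional remainder, so small updates
    are not systematically lost (avoiding vanishing weight updates)."""
    scaled = update / step
    floor = np.floor(scaled)
    frac = scaled - floor
    return (floor + (rng.random(update.shape) < frac)) * step

# Toy usage with MNIST-like layer sizes (64-sample batch, 784 -> 100 layer).
x = rng.standard_normal((64, 784))                  # layer inputs
d = rng.standard_normal((64, 100))                  # backpropagated errors
G_r = low_rank_gradient(x, d, rank=3)
dW = stochastic_round(0.01 * G_r, step=1e-3)        # 1e-3 = assumed device step size
```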
