Photonic reinforcement learning based on optoelectronic reservoir computing.

Kazutaka Kanno, Atsushi Uchida
Author Information
  1. Kazutaka Kanno: Department of Information and Computer Sciences, Saitama University, 255 Shimo-Okubo, Sakura-ku, Saitama City, Saitama, 338-8570, Japan. kkanno@mail.saitama-u.ac.jp.
  2. Atsushi Uchida: Department of Information and Computer Sciences, Saitama University, 255 Shimo-Okubo, Sakura-ku, Saitama City, Saitama, 338-8570, Japan.

Abstract

Reinforcement learning has been intensively investigated and developed in artificial intelligence for settings that lack training data, with applications such as autonomous driving, robot control, internet advertising, and elastic optical networks. However, the computational cost of reinforcement learning with deep neural networks is extremely high, and reducing the learning cost is a challenging issue. We propose a photonic on-line implementation of reinforcement learning using optoelectronic delay-based reservoir computing, and demonstrate it both experimentally and numerically. In the proposed scheme, reinforcement learning is accelerated to a rate of several megahertz because reservoir computing requires no learning process for the internal connection weights. We evaluate the proposed scheme on two benchmark tasks, CartPole-v0 and MountainCar-v0. Our results represent the first hardware implementation of reinforcement learning based on photonic reservoir computing and pave the way for fast and efficient reinforcement learning as a novel photonic accelerator.
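
The key idea in the abstract, that only the output weights are learned while the reservoir itself stays fixed, can be illustrated with a toy numerical sketch. This is not the authors' implementation: the class name `DelayReservoirQ`, the parameters `alpha`, `beta`, and `phi`, the sin² nonlinearity (chosen here as a common model of a Mach-Zehnder-modulator response), the ring-like coupling between virtual nodes, and the Q-learning update rule are all assumptions made for illustration.

```python
import math
import random


class DelayReservoirQ:
    """Toy discrete-time stand-in for a delay-based reservoir with a linear readout."""

    def __init__(self, n_inputs, n_actions, n_nodes=50,
                 alpha=0.5, beta=1.0, phi=0.4, seed=0):
        rng = random.Random(seed)
        # Fixed random input mask, one row per virtual node. It is never trained.
        self.mask = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
                     for _ in range(n_nodes)]
        # Trainable readout: one weight row per action (last entry is a bias weight).
        self.w_out = [[0.0] * (n_nodes + 1) for _ in range(n_actions)]
        self.alpha, self.beta, self.phi = alpha, beta, phi
        self.state = [0.0] * n_nodes

    def step(self, observation):
        """Advance the reservoir by one input and return node states plus a bias of 1."""
        new_state = []
        for i, row in enumerate(self.mask):
            drive = sum(m * u for m, u in zip(row, observation))
            feedback = self.state[i - 1]  # previous-step value of the neighbouring node
            arg = self.alpha * feedback + self.beta * drive + self.phi
            new_state.append(math.sin(arg) ** 2)  # assumed sin^2 nonlinearity
        self.state = new_state
        return new_state + [1.0]

    def q_values(self, features):
        """Linear readout: one estimated action value per action."""
        return [sum(w * x for w, x in zip(row, features)) for row in self.w_out]

    def update(self, features, action, reward, next_features, done,
               lr=0.01, gamma=0.99):
        """One Q-learning step that changes only the readout row of the taken action."""
        target = reward
        if not done:
            target += gamma * max(self.q_values(next_features))
        td_error = target - self.q_values(features)[action]
        row = self.w_out[action]
        for j, x in enumerate(features):
            row[j] += lr * td_error * x
        return td_error


if __name__ == "__main__":
    # Made-up four-dimensional observations, shaped like CartPole states.
    agent = DelayReservoirQ(n_inputs=4, n_actions=2, n_nodes=20)
    f0 = agent.step([0.10, 0.00, -0.05, 0.00])
    f1 = agent.step([0.12, 0.10, -0.04, -0.10])
    print(agent.update(f0, action=1, reward=1.0, next_features=f1, done=False))
```

Because each update touches a single row of `w_out` and leaves `mask` untouched, the cost per learning step in this sketch is linear in the number of virtual nodes, which is the property the abstract credits for the fast learning rate.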

Grants

  1. JP19H00868/Japan Society for the Promotion of Science
  2. JP-MJCR17N2/Core Research for Evolutional Science and Technology