Of rats and robots: A mutual learning paradigm.

Oguzcan Nas, Defne Albayrak, Gunes Unal
Author Information
  1. Oguzcan Nas: Behavioral Neuroscience Laboratory, Department of Psychology, Boğaziçi University, Istanbul, Turkey.
  2. Defne Albayrak: Behavioral Neuroscience Laboratory, Department of Psychology, Boğaziçi University, Istanbul, Turkey.
  3. Gunes Unal: Behavioral Neuroscience Laboratory, Department of Psychology, Boğaziçi University, Istanbul, Turkey.

Abstract

Robots are increasingly used alongside Skinner boxes to train animals in operant conditioning tasks. Similarly, animals are being employed in artificial intelligence research to train various algorithms. However, both types of experiments rely on unidirectional learning, where one partner, either the animal or the robot, acts as the teacher and the other as the student. Here, we present a novel animal-robot interaction paradigm that enables bidirectional, or mutual, learning between a Wistar rat and a robot. The two agents interacted with each other to achieve specific goals, dynamically adjusting their actions based on the positive (rewarding) or negative (punishing) signals provided by their partner. The paradigm was tested in silico with two artificial reinforcement learning agents and in vivo with different rat-robot pairs. In the virtual trials, both agents were able to adapt their behavior toward reward maximization, achieving mutual learning. The in vivo experiments revealed that rats rapidly acquired the behaviors necessary to receive the reward and exhibited passive avoidance learning for negative signals when the robot displayed a steep learning curve. The developed paradigm can be used in various animal-machine interactions to test the efficacy of different learning rules and reinforcement schedules.
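
The in silico setup described above, in which two learners each act as teacher and student, can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not state the learning rule, the action sets, or the goals, so everything here is an assumption made for illustration. That includes the two-action setup, the stateless epsilon-greedy value update, the names `simulate`, `wanted_from_a`, and `wanted_from_b`, and the values of `alpha` and `epsilon`.

```python
import random


def greedy(q):
    """Return the index of the highest-valued action (ties go to the lowest index)."""
    return max(range(len(q)), key=lambda a: q[a])


def choose(q, epsilon, rng):
    """Epsilon-greedy selection: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    return greedy(q)


def simulate(rounds=200, alpha=0.1, epsilon=0.1, seed=0):
    """Run two value learners that reward or punish each other's actions."""
    rng = random.Random(seed)
    # Hypothetical goals: the action each agent wants to see from its partner.
    wanted_from_a = 0  # agent B rewards A for action 0
    wanted_from_b = 1  # agent A rewards B for action 1
    q_a = [0.0, 0.0]   # agent A's value estimates for its two actions
    q_b = [0.0, 0.0]   # agent B's value estimates for its two actions
    for _ in range(rounds):
        act_a = choose(q_a, epsilon, rng)
        act_b = choose(q_b, epsilon, rng)
        # Teacher role: each agent emits a positive or negative signal
        # depending on whether the partner's action matches its goal.
        signal_for_a = 1.0 if act_a == wanted_from_a else -1.0
        signal_for_b = 1.0 if act_b == wanted_from_b else -1.0
        # Student role: each agent moves the value of the action it took
        # toward the signal it received (a stateless, bandit-style update).
        q_a[act_a] += alpha * (signal_for_a - q_a[act_a])
        q_b[act_b] += alpha * (signal_for_b - q_b[act_b])
    return greedy(q_a), greedy(q_b)


if __name__ == "__main__":
    print(simulate())
```

In this simplified setting each agent's preferred action settles on the one its partner rewards, which is the sense of "mutual learning" the sketch is meant to convey. A closer model of the paradigm would presumably condition each agent's choice on the partner's observable behavior rather than using a stateless update.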

Grants

  1. 22B07M2/Boğaziçi Üniversitesi

MeSH Terms

Animals
Robotics
Rats
Rats, Wistar
Reinforcement, Psychology
Reward
Male
Artificial Intelligence
Learning
Avoidance Learning
Conditioning, Operant
