Identification of animal behavioral strategies by inverse reinforcement learning.

Shoichiro Yamaguchi, Honda Naoki, Muneki Ikeda, Yuki Tsukada, Shunji Nakano, Ikue Mori, Shin Ishii
Author Information
  1. Shoichiro Yamaguchi: Integrated Systems Biology Laboratory, Graduate School of Informatics, Kyoto University, Sakyo, Kyoto, Japan.
  2. Honda Naoki: Laboratory of Theoretical Biology, Graduate School of Biostudies, Kyoto University, Yoshidakonoecho, Sakyo, Kyoto, Japan.
  3. Muneki Ikeda: Group of Molecular Neurobiology, Graduate School of Science, Nagoya University, Furoucho, Chikusa, Nagoya, Aichi, Japan.
  4. Yuki Tsukada: Group of Molecular Neurobiology, Graduate School of Science, Nagoya University, Furoucho, Chikusa, Nagoya, Aichi, Japan.
  5. Shunji Nakano: Group of Molecular Neurobiology, Graduate School of Science, Nagoya University, Furoucho, Chikusa, Nagoya, Aichi, Japan.
  6. Ikue Mori: Group of Molecular Neurobiology, Graduate School of Science, Nagoya University, Furoucho, Chikusa, Nagoya, Aichi, Japan.
  7. Shin Ishii: Integrated Systems Biology Laboratory, Graduate School of Informatics, Kyoto University, Sakyo, Kyoto, Japan.

Abstract

Animals are able to reach a desired state in an environment by controlling various behavioral patterns. Identifying the behavioral strategy used for this control is important for understanding animals' decision-making and is fundamental to dissecting the information processing performed by the nervous system. However, methods for quantifying such behavioral strategies have not been fully established. In this study, we developed an inverse reinforcement-learning (IRL) framework to identify an animal's behavioral strategy from behavioral time-series data. We applied this framework to C. elegans thermotactic behavior: after cultivation at a constant temperature with or without food, fed worms prefer the cultivation temperature on a thermal gradient, whereas starved worms avoid it. Our IRL approach revealed that the fed worms used both the absolute temperature and its temporal derivative and that their behavior involved two strategies: directed migration (DM) and isothermal migration (IM). With DM, worms efficiently reached specific temperatures, which explains their thermotactic behavior when fed. With IM, worms moved along a constant temperature, which reflects isothermal tracking, a behavior well documented in previous studies. In contrast to fed animals, starved worms escaped the cultivation temperature using only the absolute temperature, not its temporal derivative. We also investigated the neural basis underlying these strategies by applying our method to thermosensory neuron-deficient worms. Thus, our IRL-based approach is useful for identifying animal strategies from behavioral time-series data and could be applied to a wide range of behavioral studies, including decision-making, in other organisms.
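
The abstract describes behavior in terms of two state variables, the absolute temperature and its temporal derivative. As a rough illustration only, and not the authors' implementation, the sketch below shows one way a temperature time series could be discretized into such (T, dT/dt) states before any IRL step is applied. The function names, the bin widths, and the finite-difference estimate of the derivative are all assumptions made for this example; the IRL estimation itself is not shown.

```python
from collections import Counter
from typing import Dict, List, Tuple

State = Tuple[int, int]  # (temperature bin index, derivative bin index)


def to_states(temps: List[float], dt: float,
              t_bin: float, d_bin: float) -> List[State]:
    """Map a temperature time series to discrete (T, dT/dt) states.

    Hypothetical preprocessing: dT/dt is a backward finite difference,
    and both variables are binned by floor division with assumed widths.
    """
    states: List[State] = []
    for prev, curr in zip(temps, temps[1:]):
        dTdt = (curr - prev) / dt
        states.append((int(curr // t_bin), int(dTdt // d_bin)))
    return states


def transition_counts(states: List[State]) -> Dict[Tuple[State, State], int]:
    """Count observed state-to-state transitions in one trajectory."""
    return Counter(zip(states, states[1:]))


# Toy trajectory (made-up values, not data from the study).
trajectory = [20.0, 20.5, 21.0, 21.0]
states = to_states(trajectory, dt=1.0, t_bin=1.0, d_bin=0.25)
counts = transition_counts(states)
```

Counts of this kind are a common starting point for estimating how an animal moves through its state space; which quantities the authors' framework actually estimates from them is not specified in the abstract.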

MeSH Term

Animals
Behavior, Animal
Caenorhabditis elegans
Computational Biology
Decision Making
Learning
Reinforcement, Psychology
Taxis Response
