• Document: The Successor Representation as a model of behavioural flexibility
  • Size: 2.23 MB
  • Uploaded: 2019-07-21 11:19:12
  • Status: Successfully converted


Some snippets from your converted document:

Research Master in Cognitive Science
École Normale Supérieure
L'École des Hautes Études en Sciences Sociales
Université Paris Descartes

Master Thesis
The Successor Representation as a model of behavioural flexibility

Author: Alexis Ducarouge
Supervisor: Olivier Sigaud
Laboratory: Institut des Systèmes Intelligents et de Robotique, Sorbonne Universités, UPMC Univ Paris 06, CNRS UMR 7222
June 6, 2017

Declarations

Originality

Modelling behavioural aptitudes measured in animals with reinforcement learning (RL) has been the object of numerous works. Furthermore, the Successor Representation (SR) approach to RL is being increasingly studied in order to better understand its particular algorithmic abilities. The originality of this work is to combine both concerns in a single study. We therefore model three decisive cognitive abilities in animals using the SR and three other carefully selected RL models. The goal of this work is twofold: to identify which model is the most likely to account for those fundamental animal abilities, and to better understand the main algorithmic flexibilities of the SR using biologically relevant tasks. We also provide some new ways to apprehend the SR approach. In particular, our work reveals very strong algorithmic links between the SR and SAwSu (another associative approach). We also introduce the mathematical aspects of the SR from a didactic and comprehensive perspective which has, to the best of our knowledge, never been used to present the SR.

It is worth mentioning that a significant part of this work has been submitted and accepted as an article for the 2017 JFPDA conference (Journées Francophones sur la Planification, la Décision et l'Apprentissage pour la conduite de systèmes). A large part of this work should therefore soon be available on the conference website and in the HAL scientific archive.

Contribution

Olivier Sigaud and Alexis Ducarouge defined the scientific question as well as the modelling approach (based on the Successor Representation and other RL models). Olivier Sigaud provided guidance for the bibliographic work, gave critical feedback on how to improve the modelling work and how to interpret it, and corrected this master thesis. Alexis Ducarouge did the bibliographic review, chose the behavioural tasks to model, implemented the RL models as well as the simulation environment of the behavioural tasks, suggested a non-standard way to introduce the RL framework, interpreted the results, generated the graphs and wrote the report.

Acknowledgements

I would like to warmly thank Olivier Sigaud, who has been a great mentor. He introduced me to research fields I barely knew before starting this internship, and he has always been there to help me and advise me on the best way to conduct my research. I would also like to thank the members of the AMAC team of the ISIR, who helped me a great deal by commenting on my work and sharing their experience.

Alexis Ducarouge
Session: June
Supervisor: Olivier Sigaud
Laboratory: ISIR
Language of the thesis: English
Potential reviewers: Boris Gutkin & Etienne Koechlin

INTERNSHIP PRE-DEFENSE
On the value of the "Successor Representation" for decision learning

I. Context and rationale: A central question in neuroscience remains how the brain evaluates the possible actions for a given task in a complex and changing environment.
One of the most widely endorsed approaches is reinforcement learning, whose objective is to estimate the expected cumulative future reward following the choice of an action (Sutton & Barto, 1998). Backed by experimental confirmation as well as by its great modelling power, reinforcement learning is now regarded by a large number of researchers as playing a leading role in adaptive decision-making in many animals (Daw, 2003). This class of algorithms is classically divided into two approaches: model-free and model-based. The former relies on temporal-difference learning, whose prediction error is biologically grounded in dopaminergic responses (Houk et al., 1995). These models are therefore considered to be heavily involved in habitual, automatic behaviours. This trial-and-error learning is thus computationally…
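To make the temporal-difference mechanism described above concrete, here is a minimal sketch of tabular TD(0) value learning alongside the Successor Representation update, on a toy 5-state random walk. The environment, the step sizes, and all variable names are illustrative assumptions for this sketch, not the thesis's actual implementation.

import numpy as np

# Toy setup: states 0..4 on a chain; state 4 is terminal and rewarded.
n_states = 5
gamma, alpha = 0.95, 0.1

V = np.zeros(n_states)               # model-free state values
M = np.eye(n_states)                 # SR matrix: expected discounted state occupancies
R = np.zeros(n_states); R[4] = 1.0   # reward only on entering the final state

rng = np.random.default_rng(0)

for episode in range(500):
    s = 0
    while s != 4:
        s_next = s + 1 if rng.random() < 0.5 else max(s - 1, 0)

        # TD(0): the reward prediction error delta drives the value update
        # (the signal associated with dopaminergic responses in the text).
        delta = R[s_next] + gamma * V[s_next] - V[s]
        V[s] += alpha * delta

        # SR: the same TD mechanism applied to state occupancies rather than
        # rewards; the one-hot vector of s plays the role of the "reward".
        onehot = np.eye(n_states)[s]
        M[s] += alpha * (onehot + gamma * M[s_next] - M[s])

        s = s_next

# Values can be recomposed from the SR as V(s) = sum_s' M(s, s') * R(s'),
# which is what makes the SR flexible when the reward function changes.
print("TD values:", np.round(V, 2))
print("SR values:", np.round(M @ R, 2))

The design point this sketch illustrates is that the SR reuses the TD machinery of the model-free approach while factoring the value function into transition structure (M) and rewards (R), so only R needs relearning when rewards change.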
