Reinforcement Learning
Psyc 7215, Spring 2012
T 12:30-3, Muen D424

Matt Jones
Muenzinger D260C
Office hours: T 11-12:30, for the sake of having an official time, but I'm always around if you want to schedule a meeting.

Course Overview

Reinforcement learning is a powerful formal framework for modeling learning in a wide range of tasks. In its simplest form, it involves updating knowledge in response to reward or prediction error. This principle is the basis for many neural network models and is grounded in dopaminergic processes in the brain. The full RL framework goes beyond this basic idea, to cover learning in dynamic environments, sequential decision making, planning, and tradeoffs of exploration vs exploitation. In addition to psychology and neuroscience, the theory draws on machine learning and control theory. One reason for the excitement about RL is that it has the potential to make deep connections between research on biological and artificial systems.
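Here is a minimal Python sketch of the basic idea of learning from prediction error. It assumes a simple n-armed bandit task, the kind Sutton & Barto use in Chapter 2. The function name `run_bandit`, its parameters, and the Gaussian reward model are illustrative assumptions, not part of the course materials. On each trial the agent computes a prediction error (obtained reward minus expected value) and moves the chosen arm's value estimate a fraction of the way toward the outcome. An epsilon-greedy rule gives one simple way to trade off exploration vs exploitation.

```python
import random


def run_bandit(true_means, n_trials=1000, alpha=0.1, epsilon=0.1,
               reward_sd=1.0, seed=0):
    """Epsilon-greedy agent on an n-armed bandit, learning by prediction error."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    values = [0.0] * n_arms  # current value estimate for each arm
    choices = []
    for _ in range(n_trials):
        # Exploration vs. exploitation: with probability epsilon choose a random
        # arm; otherwise choose the arm with the highest estimate (ties -> lowest index).
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda a: values[a])
        # Hypothetical reward model: Gaussian noise around the arm's true mean.
        reward = rng.gauss(true_means[arm], reward_sd)
        # Prediction error: how much better or worse the outcome was than expected.
        delta = reward - values[arm]
        # Delta-rule update: move the estimate a fraction alpha toward the outcome.
        values[arm] += alpha * delta
        choices.append(arm)
    return values, choices


values, choices = run_bandit([0.2, 0.5, 1.0])
print([round(v, 2) for v in values])
```

With `epsilon=0` the agent never explores and can lock onto whichever arm it happens to try first. Raising `epsilon` gives up some reward in the short run in exchange for better estimates of the other arms.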

Class format

The first part of the semester will be focused on the established mathematical theory of reinforcement learning. We will read most or all of Sutton & Barto's (1998) now-classic textbook. Class meetings during this phase will be somewhat more lecture-style than typical graduate seminars, although discussion is still encouraged. In the second phase of the course, we will read articles applying principles of RL to cognition and neuroscience. Class meetings during this phase will follow a more standard discussion format.

Readings

Sutton RS & Barto AG (1998). Reinforcement learning: An introduction. MIT Press.
Free online version:

Further readings are listed below and will be added to the schedule incrementally.

Requirements

Weekly assignments. Sutton & Barto has many useful thought problems at the end of each section; I will assign roughly two per week. Once we transition to psychology/neuroscience articles, each person should bring to each class a brief written reaction to the readings to be discussed. The reactions serve two purposes: they give everyone a nominal motivation to read and think carefully about the articles, and they act as catalysts for the group discussion. Reactions should not be summaries. A few sentences at the beginning summarizing each article are generally useful, both for me to make sure everyone recognizes the critical points and for you to check your own understanding, but the primary content should be your own ideas in response to what you read. These ideas can be anything from connections to other research (from this class or elsewhere); to possible extensions, improvements, or follow-up work; to criticisms of the authors' logic or methods. Exercises and reactions can be printed and brought to class or (preferably) emailed to me in advance.

Final project. Final projects will involve coding and simulating an RL model of some task of interest. Ideally, you will model an empirical paradigm you use in your own research, or a variation of one. However, I can offer plenty of toy domains for people who want suggestions. The primary goal is to gain experience implementing a model from start to finish. Projects will be due at the end of the day on May 8 and will consist of your code (in any language you prefer) and a brief writeup describing the task, the model, and the results.
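Below is a hedged sketch of the kind of toy domain and start-to-finish implementation a project might involve. It runs tabular Q-learning, one of the TD control methods in Sutton & Barto Ch. 6, on a hypothetical five-state chain. The agent starts at the left end, can step left or right, and receives a reward of 1 only on reaching the right end. The environment, the names `step` and `q_learning`, and all parameter values are assumptions for illustration. A real project would substitute its own task and model.

```python
import random

N_STATES = 5          # hypothetical chain: states 0..4; the episode ends at state 4
ACTIONS = (-1, +1)    # action 0 = move left, action 1 = move right


def step(state, action):
    """Toy environment: reward 1 only on reaching the rightmost (terminal) state."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(n_episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy action selection."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(n_episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy selection; exact ties (e.g. the all-zero start)
            # are broken at random.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning target: reward plus discounted best next value
            # (no bootstrapping from the terminal state).
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q = q_learning()
for s in range(N_STATES - 1):
    print(s, [round(v, 3) for v in q[s]])
```

After training, the learned values should favor moving right in every non-terminal state. A writeup would then report results such as learning curves, or how the learned values change with `gamma` or `epsilon`.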

Grading

Weekly assignments  50%
Final project       50%

Schedule

The schedule will be set incrementally, based on the pace of the class.

1/17  --
1/24  No class
1/31  Reading: S&B Ch 1-2; Schultz et al. (1997). Exercises: 1.1, 2.1, 2.16. Reaction: Schultz et al. (1997).
2/7   Reading: S&B Ch 3. Exercises: 3.1, 3.5, 3.8.
2/14  --
2/21  No class
2/28  Reading: S&B Ch 4-6. Exercises: 4.5, 5.2, 6.5.
3/6   --
3/13  Reading: S&B Ch 7-8. Exercises: 7.2, 8.5, 8.7.
3/20  Reading: Jones & Canas (2010). Reaction: Jones & Canas (2010).
4/3   Reading: Gureckis & Love (2009); Daw et al. (2006); Gershman & Niv (2010). Reaction: one response; try to span the papers.
4/10  Reading: Zilli & Hasselmo (2008); O'Reilly & Frank (2006); Todd, Niv & Cohen (2008).
4/17  Reading: Dietterich (1998); Botvinick et al. (2009); Vigorito & Barto (2010).
4/24  Reading: Cohen et al. (2007); Daw et al. (2006); Steyvers et al. (2009).
5/1   Reading: van Otterlo (2012); Daw et al. (2005); Lengyel & Dayan (2008).

