Reinforcement Learning
Psyc 7215, Spring 2012
T 12:30-3, Muen D424

Matt Jones
Muenzinger D260C
Office hours: T 11-12:30 for the sake of having an official time, but I'm always around if you want to schedule a meeting.

Course Overview

Reinforcement learning is a powerful formal framework for modeling learning in a wide range of tasks. In its simplest form, it involves updating knowledge in response to reward or prediction error. This principle is the basis for many neural network models and is grounded in dopaminergic processes in the brain. The full RL framework goes beyond this basic idea, to cover learning in dynamic environments, sequential decision making, planning, and tradeoffs of exploration vs exploitation. In addition to psychology and neuroscience, the theory draws on machine learning and control theory. One reason for the excitement about RL is that it has the potential to make deep connections between research on biological and artificial systems.
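As a rough illustration of the "updating knowledge in response to prediction error" idea mentioned above, here is a minimal sketch of a delta-rule update in Python. The function names, the learning rate of 0.1, and the constant-reward task are assumptions chosen for illustration; they are not taken from the course materials or from any particular model covered in the readings.

```python
def delta_rule_update(value, reward, learning_rate=0.1):
    """Move a value estimate a fraction of the way toward the observed reward.

    Returns the updated estimate and the prediction error that drove it.
    """
    prediction_error = reward - value  # how much better or worse than expected
    new_value = value + learning_rate * prediction_error
    return new_value, prediction_error


def simulate(rewards, learning_rate=0.1, initial_value=0.0):
    """Apply the update once per observed reward; return the estimate after each trial."""
    value = initial_value
    history = []
    for reward in rewards:
        value, _ = delta_rule_update(value, reward, learning_rate)
        history.append(value)
    return history


if __name__ == "__main__":
    # Hypothetical task: the same reward of 1.0 is delivered on every trial.
    estimates = simulate([1.0] * 50, learning_rate=0.1)
    print("after trial 1:", estimates[0])
    print("after trial 50:", estimates[-1])
```

With a constant reward, the estimate climbs toward the reward and the prediction error shrinks on every trial. The full framework described in the overview adds states, actions, and delayed consequences on top of this basic update.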

Class format

The first part of the semester will be focused on the established mathematical theory of reinforcement learning. We will read most or all of Sutton & Barto's (1998) now-classic textbook. Class meetings during this phase will be somewhat more lecture-style than typical graduate seminars, although discussion is still encouraged. In the second phase of the course, we will read articles applying principles of RL to cognition and neuroscience. Class meetings during this phase will follow a more standard discussion format.


Sutton RS & Barto AG (1998). Reinforcement learning: An introduction. MIT Press.
Free online version:

Further readings are listed below and will be added to the schedule incrementally.


Weekly assignments. Sutton & Barto has many useful thought problems at the end of each section. I will assign roughly 2 per week. Once we transition to psychology/neuroscience articles, each person should bring to each class a brief written reaction to the readings to be discussed. The reactions serve two purposes: as a nominal motivation to ensure everyone reads and carefully thinks about the articles, and as catalysts for the group discussion. Reactions should not be summaries. A few sentences at the beginning to summarize each article are generally useful, both for me to make sure everyone recognizes the critical points and for you to check your own understanding, but the primary content should be your own ideas in response to what you read. These ideas can be anything from connections to other research (from this class or elsewhere); to possible extensions, improvements, or follow-up work; to criticisms of the authors' logic or methods. Exercises and reactions can be printed and brought to class, or (preferred) emailed to me in advance.

Final project. Final projects will involve coding and simulating an RL model on some task of interest. Ideally, you will model an empirical paradigm you use in your own research, or a variation. However, I can offer plenty of toy domains for people who want suggestions. The primary goal is to get experience implementing a model from start to finish. Projects will be due at the end of the day on May 8 and will consist of your code (in any language you prefer) and a brief writeup describing the task, the model, and the results.


Weekly assignments  50%
Final project       50%


The schedule will be set incrementally, based on the pace of the class.

Date   Reading                      Assignment
1/17   --
1/24   No Class                     --
1/31   S&B Ch 1-2                   1.1, 2.1, 2.16
       Schultz et al. (1997)        Schultz reaction
2/7    S&B Ch 3                     3.1, 3.5, 3.8
2/14   --                           --
2/21   No Class                     --
2/28   S&B Ch 4-6                   4.5, 5.2, 6.5
3/6    --                           --
3/13   S&B Ch 7-8                   7.2, 8.5, 8.7
3/20   Jones & Canas (2010)         reaction
4/3    Gureckis & Love (2009)       One response;
       Daw et al. (2006)            try to span papers
       Gershman & Niv (2010)
4/10   Zilli & Hasselmo (2008)
       O'Reilly & Frank (2006)
       Todd, Niv, & Cohen (2008)
4/17   Dietterich (1998)
       Botvinick et al. (2009)
       Vigorito & Barto (2010)
4/24   Cohen et al. (2007)
       Daw et al. (2006)
       Steyvers et al. (2009)
5/1    van Otterlo (2012)
       Daw et al. (2005)
       Lengyel & Dayan (2008)

University Policies (standard on all course syllabi)

CU Policy for Students with Disabilities

If you qualify for accommodations because of a disability, please submit to me a letter from Disability Services in a timely manner so that your needs can be addressed. Disability Services determines accommodations based on documented disabilities. Contact: 303-492-8671, Willard 322, and www.Colorado.EDU/disabilityservices

CU Sexual Harassment Policy

The University of Colorado at Boulder policy on Discrimination and Harassment, the University of Colorado policy on Sexual Harassment and the University of Colorado policy on Amorous Relationships apply to all students, staff and faculty. Any student, staff or faculty member who believes (s)he has been the subject of discrimination or harassment based upon race, color, national origin, sex, age, disability, religion, sexual orientation, or veteran status should contact the Office of Discrimination and Harassment (ODH) at 303-492-2127 or the Office of Judicial Affairs at 303-492-5550. Information about the ODH, the above referenced policies and the campus resources available to assist individuals regarding discrimination or harassment can be obtained at

CU Religious Observance Policy

Campus policy regarding religious observances requires that faculty make every effort to deal reasonably and fairly with all students who, because of religious obligations, have conflicts with scheduled exams, assignments or required attendance. Please notify the instructor of anticipated conflicts as early in the semester as possible so that there is adequate time to make necessary arrangements. See full details at

CU Classroom Behavior Policy

Students and faculty each have responsibility for maintaining an appropriate learning environment. Those who fail to adhere to such behavioral standards may be subject to discipline. Professional courtesy and sensitivity are especially important with respect to individuals and topics dealing with differences of race, culture, religion, politics, sexual orientation, gender, gender variance, and nationalities. Class rosters are provided to the instructor with the student's legal name. I will gladly honor your request to address you by an alternate name or gender pronoun. Please advise me of this preference early in the semester so that I may make appropriate changes to my records. See policies at

CU Honor Code

All students of the University of Colorado at Boulder are responsible for knowing and adhering to the academic integrity policy of this institution. Violations of this policy may include: cheating, plagiarism, aid of academic dishonesty, fabrication, lying, bribery, and threatening behavior. All incidents of academic misconduct shall be reported to the Honor Code Council (303-725-2273). Students who are found to be in violation of the academic integrity policy will be subject to both academic sanctions from the faculty member and non-academic sanctions (including but not limited to university probation, suspension, or expulsion). Other information on the Honor Code can be found at