[Dept. of Computer Science] Reinforcement Learning
Note: This is the last time this course will be given by Marco Wiering. Whether this course will be offered again in the future is unknown.
Marco Wiering

NEW!: Examination notes: Notes

The examination For the examination all course material is important (except the material from the student presentations). Bring a calculator. The examination will be held on 9-11-2007, from 14.00h - 17.00h in BBL 471.

Reinforcement Learning (RL) enables an agent to learn from interacting with an environment through a trial-and-error process. It can be used to control an agent that has to solve a particular task in an environment. Examples are learning to play games, robot control, elevator control, and network routing. In this course we will study different reinforcement learning algorithms. Important issues are exploration to gather interesting experiences for the agent, function approximation to generalize over experiences, multi-agent reinforcement learning, and solving partially observable Markov Decision Processes. We use the literature from the book (required):

Reinforcement Learning: An Introduction by Sutton and Barto, MIT Press, 1998. It can be obtained from e.g. AS-kwadraat, StoCKI or Broese. See also: Book. Furthermore we use a syllabus with copies of the course material.
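To make the trial-and-error idea above concrete, here is a minimal sketch of tabular Q-learning (one of the temporal-difference methods treated in the course) on a toy chain environment. The environment, parameter values, and names are illustrative assumptions, not part of the course material.

```python
import random

# Toy 5-state chain: the agent starts in state 0; action 1 moves right,
# action 0 moves left. Reaching state 4 yields reward 1 and ends the episode.
N_STATES, ACTIONS = 5, (0, 1)
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.3

def step(state, action):
    """Deterministic chain dynamics with the reward function described above."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

random.seed(0)
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for episode in range(200):
    s, done = 0, False
    while not done:
        # epsilon-greedy exploration: mostly greedy, sometimes random
        if random.random() < EPSILON:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r, done = step(s, a)
        # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        target = r + GAMMA * max(Q[(s2, act)] for act in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

# Greedy policy in the non-terminal states after learning
policy = [max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)]
print(policy)
```

After training, the greedy policy chooses "right" (action 1) in every non-terminal state, since only the rightmost state is rewarded.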

We will study the following subjects during the lectures:

date subject study-material
5 September, 15.15u - 17.00u Introduction slides, chapter 1
7 September, 13.15u - 15.00u Evaluative Feedback slides, chapter 2
12 September, 15.15u - 17.00u The Reinforcement Learning Problem slides, chapter 3
14 September, 13.15u - 15.00u Dynamic Programming slides, chapter 4
19 September, 15.15u - 17.00u Monte Carlo Methods slides, chapter 5
21 September, 13.15u - 15.00u Temporal-Difference Learning slides, chapter 6
26 September, 15.15u - 17.00u Neural Networks slides
28 September, 13.15u - 15.00u Eligibility Traces slides, chapter 7
3 October, 15.15u - 17.00u No course ----
5 October, 13.15u - 15.00u Generalization and Function Approximation slides, chapter 8
10 October, 15.15u - 17.00u Generalization, POMDPs, Semi-MDPs slides
12 October, 13.15u - 15.00u Planning and Learning slides, chapter 9
17 October, 15.15u - 17.00u Case Studies chapter 11
19 October, 13.15u - 15.00u Learning in Multi-agent Systems slides
24 October, 15.15u - 17.00u No Course ---
26 October, 13.15u - 15.00u No Course ---
31 October, 15.15u - 17.00u Dimensions, Practice examination slides, chapter 10
2 November, 13.15u - 15.00u Student Presentations ---

Room All lectures are held in:
BBL 426 (Buys-Ballot Laboratorium)

For this course there is one examination. During the examination the student is not allowed to use the book or any other material. There is also a practical exercise, for which students should work in groups of 3 or 4 persons to design and construct a reinforcement learning system for a particular application. The practical work should be accompanied by a report of 8 pages (see below). Finally, each group of students has to give a presentation of about 40 minutes describing their practical work. The final grade is calculated as 50% of the examination grade and 50% of the practical work grade, including the presentation. Both grades must be higher than 5.0 in order to pass the course (and the student presentation must be sufficient).
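The grading rule above can be written down as a small helper; the function name and fail convention are hypothetical, only the 50/50 weighting and the 5.0 threshold come from the course description.

```python
def final_grade(exam, practical):
    """Combine the two part grades (Dutch 1-10 scale).

    Both the examination grade and the practical-work grade must be
    higher than 5.0; otherwise the course is not passed (returns None).
    """
    if exam <= 5.0 or practical <= 5.0:
        return None  # one part is insufficient: no final grade
    return 0.5 * exam + 0.5 * practical

print(final_grade(7.0, 8.0))   # passing: average of the two parts
print(final_grade(4.0, 9.0))   # failing: exam grade is too low
```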


Re-examination The theoretical re-examination for students who did not pass the course the first time will be held on 4 January 2008, 9.00h - 12.00h in BBL-420.

Additional practice for examination
Assignment 1
Assignment 2
Solution 1
Solution 2

The practical course
The practical course is supervised on Fridays from 15.15u-17.00u, starting on 21 September. The room for the practical course is BBL 408. Since the evaluation of the practical course counts for 50% of the final grade, students will of course need to work outside the supervised hours to design, implement, and experiment with their practical application. The practical application should be made in groups of 3 or 4 students. Students who want to work in groups of 2 will be evaluated in the same way as larger groups, and thus have to do more work per person. Note that each student is expected to spend 10 hours a week on the practical course. The following deadlines should be met:
  • Make a group of 3 or 4 students (take into account that you have to program in the same computer language). Deadline: 7 September.
  • Write an initial proposal describing the RL system and the application on one page, and hand it in to Marco Wiering before 14 September, 17.00h. Include the following elements:
    • Names and emails of students of the group
    • Computer language
    • Application + Environment (what kind of environmental states, inputs, actions, reward function)
    • Reinforcement learning mechanisms (should be value-function based, do you need function approximators?)
    • Experiments which will be performed.
    It is possible that your initial proposal is not approved (e.g. if the task is too simple, such as solving a maze, a bandit problem, or tic-tac-toe). In this case you will be asked to construct another proposal, which you have to send in before 20 September, 17.00h.
  • Design and implement the problem environment with a random agent (or agents). Deadline: 5 October.
  • Design and implement the reinforcement learning agent(s) in the environment (it would be good to compare different RL algorithms or function approximators). Deadline: 19 October.
  • Perform a wide range of experiments to tune the parameters and to test and debug your algorithms. Then use the best parameters to obtain experimental results. Repeat your experiments at least 10 times. Deadline: 2 November.
  • Write a report of 8 pages including:
    • Introduction to RL. Problem description, novelty.
    • Description of the RL algorithm used (and, if used, the function approximator). Give all equations necessary to implement the chosen algorithm.
    • Description of environment (image), states, actions, goal (reward function).
    • Experimental results (including parameters used)
    • Conclusion
    • References (relate to existing work!)
    The deadline for delivering your practical work (program + report) is 14 November 2007 at 17.00h. Please put a hardcopy of the report in the mailbox of Marco Wiering (CGN, 3rd floor).
  • TIP: for the practical course a group can also make use of RL-Glue and the RL-Library developed at the University of Alberta by Adam White. For more information about RL-Glue and RL-Library see: RL-Glue
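The deadlines above first ask for the environment with a random agent, and only later for the learning agent. One way to organize this is to keep the environment (states, actions, reward function) behind a small interface so the random agent can later be swapped for a learning one. A hypothetical sketch, with all class and method names being illustrative assumptions:

```python
import random

class GridEnvironment:
    """Hypothetical example environment: a small grid world that exposes
    states, the available actions, and a reward function, separate from the agent."""

    def __init__(self, size=4):
        self.size = size
        self.state = (0, 0)

    def reset(self):
        self.state = (0, 0)
        return self.state

    def actions(self):
        return ["up", "down", "left", "right"]

    def step(self, action):
        """Apply an action; return (next_state, reward, done)."""
        x, y = self.state
        dx, dy = {"up": (0, 1), "down": (0, -1),
                  "left": (-1, 0), "right": (1, 0)}[action]
        x = min(self.size - 1, max(0, x + dx))   # clamp to the grid
        y = min(self.size - 1, max(0, y + dy))
        self.state = (x, y)
        goal = (self.size - 1, self.size - 1)
        reward = 1.0 if self.state == goal else -0.01  # reward function
        return self.state, reward, self.state == goal

class RandomAgent:
    """Baseline agent for the first implementation deadline: uniformly random actions."""
    def act(self, state, actions):
        return random.choice(actions)

# Trial run: the random agent interacts with the environment for one episode.
random.seed(0)
env, agent = GridEnvironment(), RandomAgent()
state, done, steps = env.reset(), False, 0
while not done and steps < 1000:
    state, reward, done = env.step(agent.act(state, env.actions()))
    steps += 1
print("episode finished:", done, "after", steps, "steps")
```

A learning agent then only needs to implement the same `act` interface (plus an update step), which also makes it straightforward to compare several RL algorithms in the same environment, as the third deadline suggests.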

    A nice Bachelor thesis (in Dutch) by Sjoerd van den Dries (CKI student) about learning to play Othello can be found here: Othello_leren
    Last update: 12 September 2007