**Note:** This is the last time this course will be given by Marco Wiering. Whether it will be given again in the future is unknown.

**Lecturer** - Marco Wiering

**NEW!: Examination notes** - Notes

**The examination** - For the examination all course material is important (except the material from the student presentations). Bring a calculator. The examination will be held on 9-11-2007, from 14.00h - 17.00h in BBL 471.

**Contents** -
Reinforcement Learning (RL) enables an agent to learn from interaction with an environment through trial and error. It can be used to control an agent that has to solve a particular task in an environment. Examples include learning to play games, robot control, elevator control, and network routing. In this course we will study different reinforcement learning algorithms. Important issues are exploration to gather interesting experiences for the agent, function approximation to generalize over experiences, multi-agent reinforcement learning, and solving partially observable Markov decision processes.
We use the following book (required):

Reinforcement Learning: An Introduction by Sutton and Barto, MIT Press, 1998. It can be obtained from e.g. AS-kwadraat, StoCKI or Broese. See also: Book. Furthermore we use a syllabus with copies of the course material.
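
The central idea of the course, learning value estimates from trial and error, can be illustrated with a minimal sketch. The following tabular Q-learning update with epsilon-greedy exploration is an illustration only, not part of the course material; all names and parameter values are assumptions:

```python
# Minimal tabular Q-learning sketch (illustrative only; names and
# parameter values are assumptions, not from the course material).
import random
from collections import defaultdict

def q_learning_update(Q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """One step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - Q[(state, action)]
    Q[(state, action)] += alpha * td_error
    return Q

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    """Exploration: with probability epsilon take a random action."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

Q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
Q = q_learning_update(Q, 's0', 'right', 1.0, 's1', ['left', 'right'])
print(Q[('s0', 'right')])  # 0.1
```

The update moves the estimate a fraction alpha toward the observed reward plus the discounted value of the best next action; exploration (chapter 2) keeps the agent from committing to its current estimates too early.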

We will study the following subjects during the lectures:

| Date | Time | Subject | Study material |
| --- | --- | --- | --- |
| 5 September | 15.15h - 17.00h | Introduction | slides, chapter 1 |
| 7 September | 13.15h - 15.00h | Evaluative Feedback | slides, chapter 2 |
| 12 September | 15.15h - 17.00h | The Reinforcement Learning Problem | slides, chapter 3 |
| 14 September | 13.15h - 15.00h | Dynamic Programming | slides, chapter 4 |
| 19 September | 15.15h - 17.00h | Monte Carlo Methods | slides, chapter 5 |
| 21 September | 13.15h - 15.00h | Temporal-Difference Learning | slides, chapter 6 |
| 26 September | 15.15h - 17.00h | Neural Networks | slides |
| 28 September | 13.15h - 15.00h | Eligibility Traces | slides, chapter 7 |
| 3 October | 15.15h - 17.00h | No course | --- |
| 5 October | 13.15h - 15.00h | Generalization and Function Approximation | slides, chapter 8 |
| 10 October | 15.15h - 17.00h | Generalization, POMDPs, Semi-MDPs | slides |
| 12 October | 13.15h - 15.00h | Planning and Learning | slides, chapter 9 |
| 17 October | 15.15h - 17.00h | Case Studies | chapter 11 |
| 19 October | 13.15h - 15.00h | Learning in Multi-agent Systems | slides |
| 24 October | 15.15h - 17.00h | No course | --- |
| 26 October | 13.15h - 15.00h | No course | --- |
| 31 October | 15.15h - 17.00h | Dimensions, Examination practice | slides, chapter 10 |
| 2 November | 13.15h - 15.00h | Student Presentations | --- |

**Room** - The room for all lectures is:

BBL 426 (Buys-Ballot Laboratorium)

**Examination** -
For this course there is one examination. During the examination the student is not allowed to use the book or any other material.
There is also a practical exercise, for which students should work in groups of 3 or 4 to design and construct a reinforcement learning system for a particular application. The practical work should be accompanied by a report of 8 pages (see below).
Finally, each group has to give a presentation of about 40 minutes describing their practical work.
The final grade is calculated as 50% for the examination and 50% for the practical work including the presentation. Both grades must be higher than 5.0 in order to pass the course (and the student presentation must be sufficient).

**Re-examination** - The theoretical re-examination, for those who did not pass the course the first time, will be on 4 January 2008, 9.00h - 12.00h in BBL 420.

**Additional practice for examination**

Assignment 1

Assignment 2

Solution 1

Solution 2

**The practical course** -
The practical course is supervised on Fridays from 15.15h - 17.00h, starting on 21 September. The room for the practical course is BBL 408.
Since the practical course counts for 50% of the final grade, it is of course necessary that students also work outside the supervised hours to design, implement, and experiment with their practical application.
The practical application should be made in groups of 3 or 4 students. Students who want to work in a group of 2 will be evaluated in the same way as larger groups, and thus have to do more work per person. Note that each student is expected to spend 10 hours a week on the practical course.
The following deadlines should be met:
- Form a group of 3 or 4 students (take into account that you will have to program in the same computer language). Deadline: 7 September.
- Write an initial one-page proposal describing the RL system and the application, and hand it in to Marco Wiering before 14 September, 17.00h. Include the following elements:
- Names and emails of students of the group
- Computer language
- Application + Environment (what kind of environmental states, inputs, actions, reward function)
- Reinforcement learning mechanisms (should be value-function based, do you need function approximators?)
- Experiments which will be performed.

- Design and implement the problem environment with a random agent (or agents). Deadline: 5 October.
- Design and implement the reinforcement learning agent(s) in the environment (it would be good to compare different RL algorithms or function approximators). Deadline: 19 October.
- Perform a wide range of experiments to set the parameters and to test and debug your algorithms. Then use the best parameters to obtain experimental results. Repeat each experiment at least 10 times. Deadline: 2 November.
- Write a report of 8 pages including:
- Introduction to RL, problem description, novelty.
- Description of the RL algorithm used (and, if applicable, the function approximator). Give all equations necessary to implement the chosen algorithm.
- Description of the environment (with an image), states, actions, and goal (reward function).
- Experimental results (including the parameters used).
- Conclusion.
- References (relate your work to existing work!).
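
For the first milestone (environment plus a random agent, 5 October), the separation between environment and agent might be structured as in the sketch below. This is a hedged illustration under assumed names, not a required interface; the toy corridor task, the class names, and the action encoding are all inventions for the example:

```python
# Illustrative skeleton for the practical work: a problem environment and a
# random baseline agent. All names and the toy task are assumptions, not a
# prescribed design.
import random

class Environment:
    """A trivial corridor: the agent starts at position 0, the goal is at the end."""
    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        """Start a new episode and return the initial state."""
        self.state = 0
        return self.state

    def step(self, action):
        """action: -1 (left) or +1 (right). Reward 1.0 at the goal, else 0.0."""
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        return self.state, (1.0 if done else 0.0), done

class RandomAgent:
    """Baseline for the first milestone: picks actions uniformly, ignoring the state."""
    def act(self, state):
        return random.choice([-1, +1])

env, agent = Environment(), RandomAgent()
state, done, steps = env.reset(), False, 0
while not done:
    state, reward, done = env.step(agent.act(state))
    steps += 1
print("reached goal in", steps, "steps")
```

Keeping the environment's `step`/`reset` interface separate from the agent makes the later milestone easy: the learning agent(s) can replace `RandomAgent` without changing the environment, and the random agent remains available as a baseline in the experiments.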

A nice Bachelor's thesis (in Dutch) by Sjoerd van den Dries (CKI student) about learning to play Othello is here: Othello_leren