
T-61.6020 Special Course in Computer and Information Science II L:
Reinforcement Learning—Theory and Applications 6 cr

Instructor	D.Sc.(Tech.) Ville Könönen
Course format	Seminar course
Credits (ECTS)	6
Semester	Spring 2006 (during periods III and IV)
Seminar sessions	Tuesdays at 14-16 in lecture hall T4 of the Computer Science Building, Konemiehentie 2, Otaniemi, Espoo
Language	English, if at least one participant is English-speaking
Web	http://www.cis.hut.fi/Opinnot/T-61.6020/
Email	t616020@cis.hut.fi

Introduction

Reinforcement learning has attracted a great deal of attention in recent years. Although reinforcement learning methods were earlier considered too ambitious and lacking a firm foundation, they are now established as practical methods for solving tasks that require decision making and planning. The key concept in the reinforcement learning model is the agent, which learns the statistical properties of its environment by interacting with it in a trial-and-error manner. In this seminar course, the focus is on the principal reinforcement learning models and the mathematical theory behind them. Some extensions, for example systems with multiple simultaneous learners, are also studied, along with relevant state-of-the-art applications of reinforcement learning.

Topic List

Basic reinforcement learning models
Temporal difference learning and stochastic approximation
Extensions to the basic reinforcement learning model
Applications

Requirements for Passing the Course

The course consists of seminar sessions and a small project work. The course is graded pass/fail. Passing the course requires active participation in the seminars (attendance of at least 70%) and an accepted project work.

Course Material

The course material consists of two main texts and several research articles:

[1] Sutton, R.S. and Barto, A.G. (1998). Reinforcement Learning: An Introduction. MIT Press. A more engineering-oriented text on reinforcement learning. Also available online.
[2] Bertsekas, D.P. and Tsitsiklis, J.N. (1996). Neuro-Dynamic Programming. Athena Scientific. A basic text on stochastic approximation with an emphasis on reinforcement learning.
[3] Kaelbling, L.P., Littman, M.L., and Cassandra, A.R. (1998). Planning and Acting in Partially Observable Stochastic Domains. Artificial Intelligence 101: 99–134.
[4] Darrell, T. and Pentland, A. (1996). Active Gesture Recognition Using Partially Observable Markov Decision Processes. Technical Report No. 367, MIT Media Laboratory Perceptual Computing Section.
[5] Könönen, V. (2004). Multiagent Reinforcement Learning in Markov Games: Asymmetric and Symmetric Approaches. PhD thesis, Helsinki University of Technology.
[6] Littman, M.L. (1994). Markov Games as a Framework for Multi-Agent Reinforcement Learning. Proceedings of the Eleventh International Conference on Machine Learning (ICML-1994), New Brunswick, NJ, 157–163.
[7] Hu, J. and Wellman, M.P. (2003). Nash Q-Learning for General-Sum Stochastic Games. Journal of Machine Learning Research 4: 1039–1069.
[8] Könönen, V.J. (2004). Asymmetric Multiagent Reinforcement Learning. Web Intelligence and Agent Systems: An International Journal (WIAS) 2(2): 105–121.
[9] Crites, R.H. and Barto, A.G. (1996). Improving Elevator Performance Using Reinforcement Learning. Advances in Neural Information Processing Systems 8. MIT Press, Cambridge, MA, 1017–1023.
[10] Schraudolph, N.N., Dayan, P., and Sejnowski, T.J. (1994). Temporal Difference Learning of Position Evaluation in the Game of Go. Advances in Neural Information Processing Systems 6. Morgan Kaufmann, San Francisco, CA, 817–824.
[11] Thrun, S. (1995). Learning to Play the Game of Chess. Advances in Neural Information Processing Systems 7. MIT Press, Cambridge, MA, 1069–1076.
[12] Tesauro, G. (1995). Temporal Difference Learning and TD-Gammon. Communications of the ACM 38(3): 58–68.

Short descriptions of the material can be found in the following table:

Material	A short description
Chapter 3 in [1]	Basic definitions and ideas behind modern Reinforcement Learning (RL) methods.
Chapter 4 in [1]	Describes the connection between RL and a certain dynamic programming task.
Chapter 5 in [1]	Monte Carlo methods for solving Markov Decision Processes.
Chapter 6 in [1]	Temporal Difference methods for solving Markov Decision Processes.
Sections 4.1, 4.3, and 5.6 in [2]	Stochastic gradient methods and their convergence, with Q-learning as an example. The material is quite theoretical, but going through every detail is not necessary; the goal is to understand the connection between the theory and RL methods.
[3]	A tutorial on Partially Observable Markov Decision Processes (POMDPs).
[4]	An example system using POMDPs.
Chapter 3 in [5]	A basic introduction to Game Theory.
[6]	The first application of Markov Games to multiagent RL.
[7]	Generalization of [6] to general-sum problems.
[8]	Sequential equilibrium approach to general-sum learning problems.
[9]	Elevator control application.
[10]	RL approach to the game of Go.
[11]	The chess player NeuroChess.
[12]	The backgammon player TD-Gammon.

Timetable

This course has seven seminar sessions. The detailed timetable is as follows:

Date	Material	Presenter
24.1.	Introduction Lecture	Ville Könönen
31.1.	Chapters 3 and 4 in [1]	Jaakko Väyrynen & Vibhor Kumar
7.2.	Chapters 5 and 6 in [1]	Ville Viitaniemi & Paul Wagner
14.2.	Sections 4.1, 4.3, and 5.6 in [2]	Elif Özge Özdamar
21.2.	[3] and [4]	Jarkko Salojärvi & Kaius Perttilä
28.2.	Chapter 3 in [5] and [6]	Joni Pajarinen & Yongnan Ji
14.3.	[7] and [10]	Lauri Lyly & Chen Shanzhen

Project Work

Each participant should carry out one small project work. There are three possibilities, described behind the following links:

[project 1] [project 2] [project 3]

Passing the project work requires a written report (in English) that describes the project, including, for example, the tools used. In addition, each project description contains several questions. The final report should be mailed to the instructor. The deadline for the project work is 4.4.2006.

Attendance

The seminar is mainly intended for graduate students. However, advanced undergraduate students with a reasonable knowledge of statistical machine learning methods may also participate.

Welcome!
