
Reinforcement Learning—Theory and Applications 6 cr

Instructor | D.Sc.(Tech.) Ville Könönen |
---|---|
Course format | Seminar course |
Credits (ECTS) | 6 |
Semester | Spring 2006 (during periods III and IV) |
Seminar sessions | On Tuesdays at 14–16 in lecture hall T4 in the computer science building, Konemiehentie 2, Otaniemi, Espoo |
Language | English if there is at least one English-speaking participant |
Web | http://www.cis.hut.fi/Opinnot/T-61.6020/ |
E-mail | t616020@cis.hut.fi |

Reinforcement learning has attracted considerable attention in recent years. Although reinforcement learning methods were earlier considered too ambitious and lacking a firm foundation, they have now been established as practical methods for solving tasks that require decision making and planning. The key concept in the reinforcement learning model is the agent, which learns the statistical properties of its environment by interacting with it in a trial-and-error manner. In this seminar course, the focus is on the principal reinforcement learning models and the mathematical theory behind them. Some extensions, for example systems with multiple simultaneous learners, are also studied along with relevant state-of-the-art applications of reinforcement learning.

- Basic reinforcement learning models
- Temporal difference learning and stochastic approximation
- Extensions to the basic reinforcement learning model
- Applications
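
As a concrete illustration of the trial-and-error loop described above, here is a minimal tabular Q-learning sketch on a toy chain world. The environment, parameter values, and function name are illustrative assumptions only and are not part of the course material:

```python
import random

random.seed(0)  # for reproducibility

def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning on a toy chain: actions move left/right,
    and reward 1 is given on reaching the rightmost (terminal) state."""
    q = [[0.0, 0.0] for _ in range(n_states)]  # Q[state][action]; 0 = left, 1 = right
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # Epsilon-greedy action selection: the trial-and-error part.
            if random.random() < epsilon:
                a = random.randrange(2)
            else:
                a = 0 if q[s][0] >= q[s][1] else 1
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == n_states - 1 else 0.0
            # Temporal-difference update toward the bootstrapped target.
            q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])
            s = s_next
    return q

q = q_learning()
```

After training, the greedy policy prefers moving right in every non-terminal state, even though the agent was never told the transition or reward structure in advance.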

The course consists of seminar sessions and a small project work. The grading scheme for the course is failed–passed. Passing the course requires active participation in the seminar sessions (at least 70% attendance) and an accepted project work.

The course material consists of two main texts and several research articles:

- Sutton, R.S. and Barto, A.G. (1998). *Reinforcement Learning: An Introduction*. MIT Press. A more engineering-oriented text on reinforcement learning; also available online.
- Bertsekas, D.P. and Tsitsiklis, J.N. (1996). *Neuro-Dynamic Programming*. Athena Scientific. A basic text on stochastic approximation with emphasis on reinforcement learning.
- Kaelbling, L.P., Littman, M.L., and Cassandra, A.R. (1998). *Planning and Acting in Partially Observable Stochastic Domains*. Artificial Intelligence, **101**: 99–134.
- Darrell, T. and Pentland, A. (1996). *Active Gesture Recognition Using Partially Observable Markov Decision Processes*. Technical Report No. 367, MIT Media Laboratory Perceptual Computing Section.
- Könönen, V. (2004). *Multiagent Reinforcement Learning in Markov Games: Asymmetric and Symmetric Approaches*. PhD thesis, Helsinki University of Technology.
- Littman, M.L. (1994). *Markov Games as a Framework for Multi-Agent Reinforcement Learning*. Proceedings of the Eleventh International Conference on Machine Learning (ICML-1994), New Brunswick, NJ, 157–163.
- Hu, J. and Wellman, M.P. (2003). *Nash Q-Learning for General-Sum Stochastic Games*. Journal of Machine Learning Research, **4**: 1039–1069.
- Könönen, V.J. (2004). *Asymmetric Multiagent Reinforcement Learning*. Web Intelligence and Agent Systems: An International Journal (WIAS), **2**(2): 105–121.
- Crites, R.H. and Barto, A.G. (1996). *Improving Elevator Performance Using Reinforcement Learning*. Advances in Neural Information Processing Systems 8. MIT Press, Cambridge, MA, 1017–1023.
- Schraudolph, N.N., Dayan, P., and Sejnowski, T.J. (1994). *Temporal Difference Learning of Position Evaluation in the Game of Go*. Advances in Neural Information Processing Systems 6. Morgan Kaufmann, San Francisco, CA, 817–824.
- Thrun, S. (1995). *Learning to Play the Game of Chess*. Advances in Neural Information Processing Systems 7. MIT Press, Cambridge, MA, 1069–1076.
- Tesauro, G. (1995). *Temporal Difference Learning and TD-Gammon*. Communications of the ACM, **38**(3): 58–68.

Short descriptions of the material can be found in the following table:

Material | A short description |
---|---|
Chapter 3 in [1] | Basic definitions and ideas behind modern reinforcement learning (RL) methods. |
Chapter 4 in [1] | The connection between RL and dynamic programming. |
Chapter 5 in [1] | Monte Carlo methods for solving Markov Decision Processes. |
Chapter 6 in [1] | Temporal-difference methods for solving Markov Decision Processes. |
Sections 4.1, 4.3, and 5.6 in [2] | Stochastic gradient methods and their convergence, with Q-learning as an example. Quite theoretical material; going through all the details is not necessary, the goal being to understand the connection between the theory and RL methods. |
[3] | A tutorial on Partially Observable Markov Decision Processes (POMDPs). |
[4] | An example system using POMDPs. |
Chapter 3 in [5] | A basic introduction to game theory. |
[6] | The first application of Markov games to multiagent RL. |
[7] | Generalization of [6] to general-sum problems. |
[8] | A sequential-equilibrium approach to general-sum learning problems. |
[9] | An elevator control application. |
[10] | An RL approach to the game of Go. |
[11] | The NeuroChess chess player. |
[12] | The backgammon player TD-Gammon. |
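
The temporal-difference idea that runs through much of the material above can be illustrated with a minimal TD(0) prediction sketch. The three-state chain, parameter values, and function name below are illustrative assumptions and are not taken from the readings:

```python
def td0_values(episodes=2000, alpha=0.05, gamma=0.9):
    """TD(0) value prediction for a fixed policy on a chain A -> B -> T.

    Transitions are deterministic and reward 1 is received on entering
    the terminal state T, so the true discounted values are
    V(B) = 1 and V(A) = gamma.
    """
    v = {'A': 0.0, 'B': 0.0, 'T': 0.0}
    for _ in range(episodes):
        s = 'A'
        while s != 'T':
            s_next = 'B' if s == 'A' else 'T'
            r = 1.0 if s_next == 'T' else 0.0
            # Stochastic-approximation step toward the target r + gamma * V(s').
            v[s] += alpha * (r + gamma * v[s_next] - v[s])
            s = s_next
    return v

v = td0_values()
```

Each update moves the current estimate a small step toward a bootstrapped target, which is exactly the stochastic-approximation viewpoint emphasized in [2].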

The course has seven seminar sessions. The detailed timetable is as follows:

Date | Material | Presenter |
---|---|---|
24.1. | Introduction lecture | Ville Könönen |
31.1. | Chapters 3 and 4 in [1] | Jaakko Väyrynen & Vibhor Kumar |
7.2. | Chapters 5 and 6 in [1] | Ville Viitaniemi & Paul Wagner |
14.2. | Sections 4.1, 4.3, and 5.6 in [2] | Elif Özge Özdamar |
21.2. | [3] and [4] | Jarkko Salojärvi & Kaius Perttilä |
28.2. | Chapter 3 in [5] and [6] | Joni Pajarinen & Yongnan Ji |
14.3. | [7] and [10] | Lauri Lyly & Chen Shanzhen |

Each participant should carry out one small project work. There are three possibilities, which can be found in the following links:

[project 1] [project 2] [project 3]

Passing the project work requires a written report (in English) that contains a description of the project, including, for example, the tools used. In addition, each project description contains several questions that should be answered in the report. The final report should be e-mailed to the instructor. The deadline for the project work is 4.4.2006.

The seminar is mainly intended for graduate students. However, advanced undergraduate students with reasonable knowledge of statistical machine learning methods may also participate.

Welcome!


Page maintained by webmaster at cis.hut.fi, last updated Tuesday, 14-Mar-2006 18:32:05 EET