Solving Deep Memory POMDPs with Recurrent Policy Gradients
2007
Conference Paper
This paper presents Recurrent Policy Gradients, a model-free reinforcement learning (RL) method that creates limited-memory stochastic policies for partially observable Markov decision problems (POMDPs) requiring long-term memory of past observations. The approach approximates a policy gradient for a Recurrent Neural Network (RNN) by backpropagating return-weighted characteristic eligibilities through time. Using a Long Short-Term Memory (LSTM) architecture, we outperform other RL methods on two important benchmark tasks. Furthermore, we show promising results on a complex car driving simulation task.
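As a rough illustration of the idea described above, here is a minimal sketch (not the authors' implementation) of a recurrent policy gradient in PyTorch: an LSTM policy produces per-step action distributions, the per-episode loss weights the log-probabilities (the characteristic eligibilities) by the return, and backward() backpropagates them through time. The class and function names, network sizes, and the dummy rollout are illustrative assumptions, not details from the paper.

import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    """LSTM policy: the recurrent state carries memory of past observations."""
    def __init__(self, obs_dim, n_actions, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden)      # memory over the observation history
        self.head = nn.Linear(hidden, n_actions)  # per-step action logits

    def forward(self, obs_seq):                   # obs_seq: (T, 1, obs_dim)
        h, _ = self.lstm(obs_seq)
        return torch.distributions.Categorical(logits=self.head(h))

def episode_loss(policy, obs_seq, actions, returns):
    # Return-weighted characteristic eligibilities; backward() performs BPTT.
    logp = policy(obs_seq).log_prob(actions)      # log pi(a_t | history)
    return -(returns * logp).sum()                # REINFORCE-style objective

# Dummy usage with random data standing in for an episode rollout:
policy = RecurrentPolicy(obs_dim=4, n_actions=2)
obs = torch.randn(10, 1, 4)                       # a 10-step episode
acts = policy(obs).sample()
rets = torch.ones(10, 1)                          # placeholder returns
episode_loss(policy, obs, acts, rets).backward()  # gradient flows through time

Because the gradient flows through the LSTM state, the policy can assign credit to actions based on observations seen many steps earlier, which is what lets the method handle deep-memory POMDPs.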
Author(s): Wierstra, D. and Förster, A. and Peters, J. and Schmidhuber, J.
Book Title: ICANN'07
Journal: Artificial Neural Networks: ICANN 2007
Pages: 697–706
Year: 2007
Month: September
Publisher: Springer
Department(s): Empirical Inference
BibTeX Type: Conference Paper (inproceedings)
DOI: 10.1007/978-3-540-74690-4_71
Event Name: International Conference on Artificial Neural Networks
Event Place: Porto, Portugal
Address: Berlin, Germany
Language: en
Organization: Max-Planck-Gesellschaft
School: Biologische Kybernetik
BibTeX:
@inproceedings{4719,
  title = {Solving Deep Memory POMDPs with Recurrent Policy Gradients},
  author = {Wierstra, D. and F{\"o}rster, A. and Peters, J. and Schmidhuber, J.},
  journal = {Artificial Neural Networks: ICANN 2007},
  booktitle = {ICANN'07},
  pages = {697--706},
  publisher = {Springer},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  address = {Berlin, Germany},
  month = sep,
  year = {2007},
  doi = {10.1007/978-3-540-74690-4_71},
  month_numeric = {9}
}