Solving Deep Memory POMDPs with Recurrent Policy Gradients
2007
Conference Paper
This paper presents Recurrent Policy Gradients, a model-free reinforcement learning (RL) method that creates limited-memory stochastic policies for partially observable Markov decision problems (POMDPs) requiring long-term memory of past observations. The approach approximates a policy gradient for a Recurrent Neural Network (RNN) by backpropagating return-weighted characteristic eligibilities through time. Using a Long Short-Term Memory (LSTM) architecture, we outperform other RL methods on two important benchmark tasks. Furthermore, we show promising results on a complex car driving simulation task.
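As a rough illustration of the idea described above, here is a minimal sketch (not the authors' implementation) of a recurrent policy gradient in PyTorch: an LSTM policy produces per-step action distributions, the per-episode loss weights the log-probabilities (the characteristic eligibilities) by the return, and backward() backpropagates them through time. The class and function names, network sizes, and the dummy rollout are illustrative assumptions, not details from the paper.

import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    """LSTM policy: the recurrent state carries memory of past observations."""
    def __init__(self, obs_dim, n_actions, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden)      # memory over the observation history
        self.head = nn.Linear(hidden, n_actions)  # per-step action logits

    def forward(self, obs_seq):                   # obs_seq: (T, 1, obs_dim)
        h, _ = self.lstm(obs_seq)
        return torch.distributions.Categorical(logits=self.head(h))

def episode_loss(policy, obs_seq, actions, returns):
    # Return-weighted characteristic eligibilities; backward() performs BPTT.
    logp = policy(obs_seq).log_prob(actions)      # log pi(a_t | history)
    return -(returns * logp).sum()                # REINFORCE-style objective

# Dummy usage with random data standing in for an episode rollout:
policy = RecurrentPolicy(obs_dim=4, n_actions=2)
obs = torch.randn(10, 1, 4)                       # a 10-step episode
acts = policy(obs).sample()
rets = torch.ones(10, 1)                          # placeholder returns
episode_loss(policy, obs, acts, rets).backward()  # gradient flows through time

Because the gradient flows through the LSTM state, the policy can assign credit to actions based on observations seen many steps earlier, which is what lets the method handle deep-memory POMDPs.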
Author(s): Wierstra, D. and Förster, A. and Peters, J. and Schmidhuber, J.
Book Title: ICANN'07
Journal: Artificial Neural Networks: ICANN 2007
Pages: 697–706
Year: 2007
Month: September
Publisher: Springer
Department(s): Empirical Inference
BibTeX Type: Conference Paper (inproceedings)
DOI: 10.1007/978-3-540-74690-4_71
Event Name: International Conference on Artificial Neural Networks
Event Place: Porto, Portugal
Address: Berlin, Germany
Language: en
Organization: Max-Planck-Gesellschaft
School: Biologische Kybernetik
BibTeX:
@inproceedings{4719,
  title = {Solving Deep Memory POMDPs with Recurrent Policy Gradients},
  author = {Wierstra, D. and F{\"o}rster, A. and Peters, J. and Schmidhuber, J.},
  journal = {Artificial Neural Networks: ICANN 2007},
  booktitle = {ICANN'07},
  pages = {697--706},
  publisher = {Springer},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  address = {Berlin, Germany},
  month = sep,
  year = {2007},
  doi = {10.1007/978-3-540-74690-4_71},
  month_numeric = {9}
}