Publications on Dynamic Programming/Markov Decision Processes

Barto, Andrew, S.J. Bradtke and Satinder Singh (baveja@cs.colorado.edu)

Learning to Act using Real-Time Dynamic Programming
(gzipped Postscript - 520 KB) Abstract:
Learning methods based on dynamic programming (DP) are receiving increasing attention in artificial...

Bhulai, Sandjai

E-mail: sbhulai@cs.vu.nl
Markov Decision Processes: the control of high-dimensional systems.
Ph.D. Thesis, Vrije Universiteit Amsterdam, 2002 (PDF - 1.1 MB) Abstract:
We develop algorithms for the computation of (nearly) optimal decision rules in high-dimensional sys...

Diuk, Carlos, Alexander Strehl, Michael Littman

E-mail: cdiuk@cs.rutgers.edu
A Hierarchical Approach to Efficient Reinforcement Learning in Deterministic Domains
AAMAS 2006 (PDF - 140KB) Abstract:
Factored representations, model-based learning, and hierarchies are well-studied techniques for i...

Ernst, Damien, Pierre Geurts and Louis Wehenkel

E-mail: dernst@ulg.ac.be
Iteratively extending time horizon reinforcement learning
Proceedings of ECML 2003 (Postscript - 6 KB) Abstract:
Reinforcement learning aims to determine an (infinite time horizon) optimal control policy fro...

Gabor, Zoltan, Zs. Kalmár and Cs. Szepesvári (szepes@mindmaker.kfkipark.hu)

Multi-criteria Reinforcement Learning
Technical Report TR-98-115, "Attila József" University, Research Group on Artificial Intelligence, Szeged, HU-6700, 1998 (gzipped Postscript - 153 KB) Abstract:
This is a longer version of the paper published in ICML'98. We consider multi-criteria sequential...

Goldsmith, Judy

E-mail: goldsmit at cs.uky.edu
Papers
various Abstract:
...

Guestrin, Carlos, Daphne Koller, Ronald Parr

E-mail: guestrin@cs.stanford.edu
Max-norm Projections for Factored MDPs
AAAI Spring Symposium, Stanford, California, March 2001 (Postscript - 323KB) Abstract:
Markov Decision Processes (MDPs) provide a coherent mathematical framework for planning under uncert...

Littman, Michael, Thomas L. Dean and Leslie Pack Kaelbling

On the complexity of solving Markov decision problems
Proceedings of the Eleventh Annual Conference on Uncertainty in Artificial Intelligence (UAI-95) (Postscript - 256KB) Abstract:
Markov decision problems (MDPs) provide the foundations for a number of problems of interest to AI researchers studying automated planning and reinforcement learning. In this paper, we summarize results regarding the complexity of solving MDPs and the running time of MDP...

Munos, Remi, Andrew Moore

E-mail: munos@cs.cmu.edu
Variable resolution discretization for high-accuracy solutions of optimal control problems
IJCAI'99 (gzipped Postscript - 315KB) Abstract:
State abstraction is of central importance in reinforcement learning and Markov Decision Processes. ...

Ormoneit, Dirk, Saunak Sen

E-mail: ormoneit@stat.stanford.edu
Kernel-Based Reinforcement Learning
Department of Statistics, Stanford University, Technical Report No. 1999-8 (Postscript - 260 KB) Abstract:
Kernel-based methods have recently attracted increased attention in the machine learning literature...

Pouget, A., Deffayet, C. and Sejnowski, T. J.

In: G. Tesauro, D. Touretzky and J. Alspector (Eds.) Advances in Neural Information Processing Systems 7, MIT Press, Cambridge, MA, 125-132 (1995). Abstract:
(no abstract available)...

Schmidhuber, Jürgen, M. Wiering

E-mail: juergen@idsia.ch
HQ-Learning: Discovering Markovian subgoals for non-Markovian reinforcement learning
Technical Report IDSIA-95-96, October 1996 (gzipped Postscript - 111 KB) Abstract:
To solve partially observable Markov decision problems, we introduce HQ-learning, a hierarchical ex...

Singh, Satinder (baveja@cs.colorado.edu)

Learning to Solve Markov Decision Processes
Ph.D. thesis, University of Massachusetts, Amherst, 1994 (gzipped Postscript - 676 KB) Abstract:
This dissertation is about building learning control architectures for agents embe...

Strens, Malcolm (mjstrens@qinetiq.com)

A Bayesian Framework for Reinforcement Learning
International Conference on Machine Learning, 2000 (pdf - 83KB) Abstract:
The reinforcement learning problem can be decomposed into two parallel types of inference: (i) estim...

Sutton, Richard (rich@cs.umass.edu)

Planning by incremental dynamic programming
(gzipped Postscript - 55 KB) Abstract:
This paper presents the basic results and ideas of dynamic programming as they relate most d...

Szepesvári, Csaba (szepes@mindmaker.kfkipark.hu)

General Framework for Reinforcement Learning
Proceedings of ICANN'95, Paris, France, Oct. 1995, Vol. II, pp. 165-170 (gzipped Postscript - ??) Abstract:
In this article we propose a general framework for sequential decision making. The framework is base...

Szepesvári, Csaba (szepes@mindmaker.kfkipark.hu)

Dynamic Concept Model Learns Optimal Policies
Proceedings of IEEE WCCI ICNN'94, Vol. III, pp. 1738-1742, Orlando, Florida, June 1994 (gzipped Postscript - ??) Abstract:
Reinforcement learning is a flourishing field of neural methods. It has a firm theoretical basis and...

Szepesvári, Csaba (szepes@mindmaker.kfkipark.hu)

Non-Markovian Policies in Sequential Decision Problems
Acta Cybernetica, to appear (1998) (gzipped Postscript - ??) Abstract:
In this article we prove the validity of the Bellman Optimality Equation and related results for seq...

Yin, ChangMing

E-mail: cmyin@cs167.net
Forgetting Algorithm for Q-learning
unpublished (Microsoft Word - 120kb) Abstract:
...

Gabor, Zoltan, Zs. Kalmár and Cs. Szepesvári (szepes@mindmaker.kfkipark.hu)

Multi-criteria Reinforcement Learning
Proceedings of the International Conference on Machine Learning, 1998 (gzipped Postscript - 103 KB) Abstract:
We consider multi-criteria sequential decision making problems where the vector-valued evaluations a...

ZHAO, Gang, Shoji TATSUMI, Ruoying SUN

E-mail: zhaogang@ieee.org
RTP-Q: A Reinforcement Learning System with Time Constraints Exploration Planning for Accelerating the Learning Rate
IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences (pdf - 171kb) Abstract:
This paper proposes a RTP-Q reinforcement learning system which varies an efficient method for explo...

ZHAO, Gang, Shoji TATSUMI, Ruoying SUN

E-mail: zhaogang@ieee.org
Convergence of the Q-ae Learning on Deterministic MDPs and Its Efficiency on the Stochastic Environment
IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences (pdf - 172kb) Abstract:
In this paper, based on discussing different exploration methods, replacing the pre-action-selector ...