Publications on Average-reward/Undiscounted methods



Arruda, Edilson F., Fragoso, Marcelo D. (efarruda@gmail.com)
Time Aggregated Markov Decision Processes via Standard Dynamic Programming
Operations Research Letters (2011) Abstract:
This note addresses the time aggregation approach to ergodic finite state Markov decision processes with uncontrollable states. We propose the use of the time aggregation approach as an intermediate step toward constructing a transformed MDP whose state space is comprised solely of the controllable states. The proposed approach simplifies the iterative search for the optimal solution by eliminating the need to define an equivalent parametric function, and results in a problem that can be solved by simpler, standard MDP algorithms.

Beleznay, Ferenc, Tamas Grobler, Csaba Szepesvari (beleznay@cs.elte.hu)
Comparing Value-Function Estimation Algorithms in Undiscounted Problems
unpublished (gzipped Postscript - 104) Abstract:
We compare scaling properties of several value-function estimation algorithms. In particular, we pr...

Boutilier, Craig, Martin L. Puterman (cebly@cs.ubc.co)
Process-Oriented Planning and Average-Reward Optimality
IJCAI-95 (gzipped Postscript - 47 KB) Abstract:
We argue that many AI planning problems should be viewed as process-oriented, where the aim...

Garcia, Frédérick, Seydina Ndiaye (fgarcia@toulouse.inra.fr)
A Learning Rate Analysis of Reinforcement Learning Algorithms in Finite-Horizon
ICML'98 (gzipped Postscript - 96 KB) Abstract:
In this article we consider the particular framework of non-stationary finite-horizon Markov Decis...

Mahadevan, Sridhar (mahadeva@cps.msu.edu)
Average Reward Reinforcement Learning: Foundations, Algorithms, and Empirical Results
Machine Learning, Special Issue on Reinforcement Learning (edited by Leslie Kaelbling), vol. 22, pp. 159-196, 1996. (compressed Postscript) Abstract:
This paper presents a detailed study of average reward reinforcement learning, an undiscounted opti...

Ok, DoKyeong, Prasad Tadepalli (tadepall@cs.orst.edu)
Auto-exploratory average reward reinforcement learning
Proceedings of AAAI-96 (Postscript - 130 KB) Abstract:
We introduce a model-based average-reward Reinforcement Learning method called H-learning and compa...

Singh, Satinder (baveja@cs.colorado.edu)
Reinforcement Learning Algorithms for Average-Payoff Markovian Decision Processes
Proceedings of the Twelfth National Conference on Artificial Intelligence (gzipped Postscript - 85 KB) Abstract:
Reinforcement learning (RL) has become a central paradigm for solving learning-control problems in ...

Tadepalli, Prasad, DoKyeong Ok (tadepall@cs.orst.edu)
Scaling up average reward reinforcement learning by approximating the domain models and the value function
Proceedings of the Thirteenth International Conference on Machine Learning, pages 471-479. Morgan Kaufmann, 1996 (Postscript) Abstract:
Almost all the work in Average-reward Reinforcement Learning (ARL) so far has focused on table-base...

Tadepalli, Prasad, DoKyeong Ok (tadepall@cs.orst.edu)
Model-based Average Reward Reinforcement Learning
Artificial Intelligence (Postscript - 53 pages) Abstract:
Reinforcement Learning (RL) is the study of programs that improve their performance by receiving re...