Publications on Theoretical Analysis

Abe, Naoki , Alan Biermann and Philip M. Long

E-mail: nabe@us.ibm.com
Reinforcement learning with immediate rewards and linear hypotheses
Algorithmica (Postscript - 350KB) Abstract:
We perform theoretical analysis of algorithms for reinforcement learning with immediate rewards usi...

Beleznay, Ferenc , Tamas Grobler, Csaba Szepesvari

E-mail: beleznay@cs.elte.hu
Comparing Value-Function Estimation Algorithms in Undiscounted Problems
unpublished ( gzipped Postscript - 104) Abstract:
We compare scaling properties of several value-function estimation algorithms. In particular, we pr...

Bhulai, Sandjai

E-mail: sbhulai@cs.vu.nl
Markov Decision Processes: the control of high-dimensional systems
Ph.D. Thesis, Vrije Universiteit, 2002 (Postscript - ) Abstract:
We develop algorithms for the computation of (nearly) optimal decision rules in high-dimensional sys...

Borkar, Vivek , Vijaymohan R. Konda

E-mail: borkar@csa.iisc.ernet.in
Actor-Critic algorithm as multi-time scale stochastic approximation algorithm
'Sadhana', Indian Academy of Sciences (Postscript - 561 KB) Abstract:
The actor-critic algorithm of Barto et al for simulation-based optimization of Markov decision proce...

Dietterich, Thomas

E-mail: tgd@cs.orst.edu
The MAXQ Method for Hierarchical Reinforcement Learning
Proceedings of the International Conference on Machine Learning, 1998 ( gzipped Postscript - 53KB) Abstract:
This paper presents a new approach to hierarchical reinforcement learning based on the MAXQ decompo...

Dietterich, Thomas

E-mail: tgd@cs.orst.edu
Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition
journal version; under review ( gzipped Postscript - 359KB) Abstract:
This paper presents a new approach to hierarchical reinforcement learning based on the MAXQ decompo...

Dimitrakakis, Christos , Samy Bengio

E-mail: olethros@myrealbox.com
Gradient Estimates of Return
IDIAP Research Report (abridged version presented at PASCAL workshop on principled methods of trading exploration and exploitation) ( gzipped Postscript - 185KB) Abstract:
The exploration-exploitation trade-off that arises when one considers simple point estimates of exp...

Dimitrakakis, Christos ( dimitrak@idiap.ch)

Nearly optimal exploration-exploitation decision thresholds
ICANN 2006 ( gzipped Postscript - 170Kb) Abstract:
While in general trading off exploration and exploitation in reinforcement learning is hard, under ...

Gabor, Zoltan , Zs. Kalmar and Cs. Szepesvari

E-mail: szepes@mindmaker.kfkipark.hu
Multi-criteria Reinforcement Learning
Technical Report TR-98-115, "Attila József" University, Research Group on Artificial Intelligence Szeged, HU-6700, 1998 ( gzipped Postscript - 155 KB) Abstract:
This is a longer version of the paper published in ICML'98. We consider multi-criteria sequential...

Garcia, Frédérick , Florent Serre

E-mail: fgarcia@toulouse.inra.fr
Efficient Asymptotic Approximation in Temporal Difference Learning
European Conference on Artificial Intelligence ECAI'2000 ( gzipped Postscript - 78383 KB) Abstract:
We propose in this paper an asymptotic approximation of online TD(lambda) with accumulating eligib...

Garcia, Frédérick , Seydina Ndiaye

E-mail: fgarcia@toulouse.inra.fr
A Learning Rate Analysis of Reinforcement Learning Algorithms in Fine-Horizon
ICML'98 ( gzipped Postscript - 96 KB) Abstract:
In this article we consider the particular framework of non-stationary finite-horizon Markov Decis...

Goldsmith, Judy , Michael L. Littman, Martin Mundhenk

E-mail: goldsmit at cs.uky.edu
The Complexity of Plan Existence and Evaluation in Probabilistic Domains
Proceedings of the Thirteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI--97) (Postscript - 277KB) Abstract:
We examine the computational complexity of testing and finding small plans in probabilistic plannin...

Kalmár, Zsolt , Cs. Szepesvári and A. Lorincz

E-mail: kalmar@mindmaker.kfkipark.hu
Module-Based Reinforcement Learning: Experiments with a Real Robot
Machine Learning ( gzipped Postscript - 755 KB) Abstract:
The behavior of reinforcement learning (RL) algorithms is best understood in completely observable, ...

Konda, Vijaymohan , Vivek S. Borkar

E-mail: konda@mit.edu
Learning Algorithms for Markov Decision Processes
SIAM Journal on Control and Optimization (Postscript - 580 KB) Abstract:
Algorithms learning the optimal policy of a Markov decision process based on simulated transitions a...

Likas, Aristidis

E-mail: arly@cs.uoi.gr
A Reinforcement Learning Approach to On-line Clustering
Neural Computation, to appear ( gzipped Postscript - 80KB) Abstract:
A general technique is proposed for embedding on-line clustering algorithms based on competitive l...

Littman, Michael

E-mail: mlittman@cs.duke.edu
Probabilistic Propositional Planning: Representations and Complexity
Proceedings of the Fourteenth National Conference on Artificial Intelligence (Postscript - 360KB) Abstract:
Many representations for probabilistic propositional planning problems have been studied. This pap...

Littman, Michael , Csaba Szepesvári( mlittman@cs.duke.edu)

A generalized reinforcement-learning model: Convergence and applications
Proceedings of the Thirteenth International Conference on Machine Learning (Postscript - 170KB) Abstract:
Reinforcement learning is the process by which an autonomous agent uses its experience interacting ...

Littman, Michael

E-mail: mlittman@cs.duke.edu
Memoryless policies: Theoretical limitations and practical results
From Animals to Animats 3: Proceedings of the Third International Conference on Simulation of Adaptive Behavior (Postscript - 416KB) Abstract:
One form of adaptive behavior is "goal-seeking" in which an agent acts so as to minimize the time i...

Littman, Michael

E-mail: mlittman@cs.duke.edu
An optimization-based categorization of reinforcement learning environments
Abstract:
This paper proposes a categorization of reinforcement learning environments based on the optimizati...

Matt, Andreas , Georg Regensburger

E-mail: andreas.matt@uibk.ac.at
"Reinforcement Learning for Several Environments: Theory and Applications"
A joint PhD thesis by Andreas Matt and Georg Regensburger Abstract:
Until now reinforcement learning has been applied to learn the optimal behavior for a single environ...

Munos, Remi

E-mail: munos@cs.cmu.edu
Reinforcement Learning for Continuous Stochastic Control Problems
Neural Information Processing Systems, 1997 (Postscript - 809KB) Abstract:
This paper is concerned with the problem of Reinforcement Learning for continuous state space and ...

Munos, Remi

E-mail: munos@cs.cmu.edu
A convergent Reinforcement Learning algorithm in the continuous case based on a Finite Difference method
IJCAI'1997 (compressed Postscript - 225KB) Abstract:
In this paper, we propose a convergent Reinforcement Learning algorithm for solving optimal contr...

Munos, Remi

E-mail: munos@cs.cmu.edu
A Convergent Reinforcement Learning algorithm in the continuous case : the Finite-Element Reinforcement Learning
International Conference on Machine Learning, 1996 (Postscript - 197KB) Abstract:
This paper presents a direct reinforcement learning algorithm, called Finite-Element Reinforcem...

Munos, Remi

E-mail: munos@cs.cmu.edu
A general convergence method for Reinforcement Learning in the continuous case
European Conference on Machine Learning, 1998 (compressed Postscript - 230KB) Abstract:
In this paper, we propose a general method for designing convergent Reinforcement Learning algorit...

Munos, Remi , Leemon Baird, Andrew Moore

E-mail: munos@cs.cmu.edu
Gradient Descent Approaches to Neural-Net-Based Solutions of the Hamilton-Jacobi-Bellman Equation.
IJCNN'99 ( gzipped Postscript - 128KB) Abstract:
In this paper we investigate new approaches to dynamic-programming-based optimal control of contin...

Munos, Remi

E-mail: remi.munos@polytechnique.fr
Error Bounds for Approximate Policy Iteration
Icml 2003 ( gzipped Postscript - 80 KB) Abstract:
In Dynamic Programming, convergence of algorithms such as Value Iteration or Policy Iteration resul...

Randlov, Jette

E-mail: randlov@nbi.dk
Shaping in Reinforcement Learning by Changing the Physics of the Problem
ICML-2000 ( gzipped Postscript - 65 ) Abstract:
Children learn to ride a bicycle by using training wheels. They are actually trying to learn one ta...

Reynolds, Stuart

E-mail: sir@cs.bham.ac.uk
The Stability of General Discounted Reinforcement Learning with Linear Function Approximation
UKCI'02 ( gzipped Postscript - 80) Abstract:
This paper shows that general discounted return estimating reinforcement learning algorithms ca...

Reynolds, Stuart

E-mail: sir@cs.bham.ac.uk
Optimistic Initial Q-values and the max Operator
UKCI'01 ( gzipped Postscript - 80) Abstract:
This paper provides a surprising new insight into the role of the max operator used by reinforcement...

Reynolds, Stuart

E-mail: sir@cs.bham.ac.uk
Reinforcement Learning with Exploration
PhD Thesis, School of Computer Science, The University of Birmingham, B15 2TT, UK ( gzipped Postscript - 1.1MB) Abstract:
Reinforcement Learning (RL) techniques may be used to find optimal controllers for multistep decisio...

Schmidhuber, Juergen ( juergen@idsia.ch)

Theory of universal optimal reinforcement learning machines (with links to work by Marcus Hutter and Schmidhuber)
Several journal papers and conference papers (HTML - 200 KB) Abstract:
The ultimate predictive world model is Solomonoff's Bayesian induction scheme based on the univers...

Singh, Satinder , Richard Sutton

E-mail: baveja@cs.colorado.edu
Reinforcement Learning with Replacing Eligibility Traces
Machine Learning ( gzipped Postscript - ) Abstract:
...

Singh, Satinder , Peter Dayan

E-mail: baveja@cs.colorado.edu
Analytical Mean Squared Error Curves for Temporal Difference Learning
Machine Learning ( gzipped Postscript - ) Abstract:
...

Singh, Satinder , Tommi Jaakkola, Michael Jordan

E-mail: baevja@cs.colorado.edu
Learning Without State-Estimation in Partially Observable Markovian Decision Processes
Proceedings of the Eleventh International Machine Learning Conference ( gzipped Postscript - 59 KB) Abstract:
...

Singh, Satinder , T. Jaakkola, M.L. Littman and Cs. Szepesvari( baveja@cs.colorado.edu)

Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms
Machine Learning, to appear, 1998 ( Postscript - 210 KB) Abstract:
An important application of reinforcement learning (RL) is to finite-state control problems and one ...

Szepesvari, Csaba ( szepes@mindmaker.kfkipark.hu)

The Asymptotic Convergence-Rate of Q-learning
NIPS'97 ( gzipped Postscript - 64 KB) Abstract:
In this paper we show that for discounted MDPs with discount factor \gamma>1/2 the asymptotic rate o...

Szepesvari, Csaba ( szepes@mindmaker.kfkipark.hu)

Learning and Exploitation do not Conflict Under Minimax Optimality
Proceedings of 9th European Conference of Machine Learning, pp. 242-249, 1997 ( gzipped Postscript - 40 KB) Abstract:
We show that adaptive real time dynamic programming extended with the action selection strategy whic...

Szepesvari, Csaba ( szepes@mindmaker.kfkipark.hu)

Some basic facts concerning minimax sequential decision processes
Technical Report TR-96-100, "Attila József" University, Research Group on Artificial Intelligence Szeged, HU-6700, 1996. ( gzipped Postscript - 65 KB) Abstract:
It is shown that for discounted minimax sequential decision processes the evaluation function of a s...

Szepesvari, Csaba

E-mail: szepes@mindmaker.kfkipark.hu
General Framework for Reinforcement Learning
Proceedings of ICANN'95 Paris, France, Oct. 1995, Vol. II., pp. 165-170 ( gzipped Postscript - ??) Abstract:
In this article we propose a general framework for sequential decision making. The framework is base...

Szepesvari, Csaba

E-mail: szepes@mindmaker.kfkipark.hu
Dynamic Concept Model Learns Optimal Policies
Proceedings of IEEE WCCI ICNN'94 Vol. III. pp. 1738-1742. Orlando, Florida, June 1994 ( gzipped Postscript - ??) Abstract:
Reinforcement learning is a flourishing field of neural methods. It has a firm theoretical basis and...

Szepesvari, Csaba

E-mail: szepes@mindmaker.hu
Efficient Approximate Planning in Continuous Space Markovian Decision Problems
unpublished ( gzipped Postscript - 128) Abstract:
In this article we consider Monte-Carlo planning algorithms for planning in continuous state-space, ...

Tran, Ronan , Kana Miza

E-mail: ronan82ilmfa@yahoo.com
Vietnam Travel Guide - The updated insiders
unpublished (html - ) Abstract:
Vietnam Travel & Tourism Guide with in-depth guidebook information, photos, maps, hotels, tours, air...

Tsitsiklis, John , Ben Van Roy

E-mail: jnt@mit.edu
An Analysis of Temporal-Difference Learning with Function Approximation
IEEE Transactions on Automatic Control, Vol. 42, No. 5, May 1997, pp. 674-690. (Postscript - 2 MB) Abstract:
We discuss the temporal-difference learning algorithm, as applied to approximating cost-to-go funct...

Tsitsiklis, John , Benjamin Van Roy

E-mail: jnt@mit.edu
Feature-Based Methods for Large Scale Dynamic Programming
Machine Learning, Vol. 22, 1996, pp. 59-94. ( PDF - 2.9 MB) Abstract:
We develop a methodological framework and present a few different ways in which dynamic programmin...

Van Roy, Benjamin

E-mail: bvr@stanford.edu
Learning and Value Function Approximation in Complex Decision Processes
PhD Thesis (Postscript - 1691 KB) Abstract:
In principle, a wide variety of sequential decision problems -- ranging from dynamic resource alloc...

Woergotter, Florentin , Bernd Porr

E-mail: worgott@chaos.gwdg.de
Temporal sequence learning, prediction and control - A review of different models and their relation to biological mechanisms
Neural Computation, 17: 245-319 Abstract:
A review of RL in view of its relation to classical conditioning and the biophysics of the underlyin...

Zhuang, X. ( x_sys@tom.com)

The Strategy Entropy of Reinforcement Learning in Discrete State Space
conference proceedings Abstract:
In this paper, the concept of entropy is introduced into reinforcement learning. The definitions of ...

Zhuang, Xiaodong

E-mail: windok@21cn.com
MULTI-SCALE REINFORCEMENT LEARNING WITH FUZZY STATE
conference proceedings (Compressed PDF - 207KB) Abstract:
In this paper, multi-scale reinforcement learning is presented based on fuzzy state. The concept of ...

gabor, zoltan , Zs. Kalmár and Cs. Szepesvári

E-mail: szepes@mindmaker.kfkipark.hu
Multi-criteria Reinforcement Learning
Proceedings of International Conference of Machine Learning, 1998 ( gzipped Postscript - 103 KB) Abstract:
We consider multi-criteria sequential decision making problems where the vector-valued evaluations a...