Publications on Theoretical Analysis
Abe,
Naoki
, Alan Biermann and Philip M. LongE-mail: nabe@us.ibm.com
Reinforcement learning with immediate rewards and linear hypotheses
Algorithmica
(Postscript - 350KB)
Abstract:
We perform theoretical analysis of algorithms for reinforcement
learning with immediate rewards usi...
Beleznay,
Ferenc
, Tamas Grobler, Csaba SzepesvariE-mail: beleznay@cs.elte.hu
Comparing Value-Function Estimation Algorithms in Undiscounted Problems
unpublished
( gzipped Postscript - 104)
Abstract:
We compare scaling properties of several value-function estimation algorithms.
In particular, we pr...
Bhulai,
Sandjai
E-mail: sbhulai@cs.vu.nl
Markov Decision Processes: the control of high-dimensional systems
Ph.D. Thesis, Vrije Universiteit, 2002
(Postscript - )
Abstract:
We develop algorithms for the computation of (nearly) optimal decision rules in high-dimensional sys...
Borkar,
Vivek
, Vijaymohan R. KondaE-mail: borkar@csa.iisc.ernet.in
Actor-Critic algorithm as multi-time scale stochastic approximation algorithm
'Sadhana', Indian Academy of Sciences
(Postscript - 561 KB)
Abstract:
The actor-critic algorithm of Barto et al for simulation-based optimization of Markov decision proce...
Dietterich,
Thomas
E-mail: tgd@cs.orst.edu
The MAXQ Method for Hierarchical Reinforcement Learning
Proceedings of the International Conference on Machine Learning, 1998
( gzipped Postscript - 53KB)
Abstract:
This paper presents a new approach to hierarchical reinforcement
learning based on the MAXQ decompo...
Dietterich,
Thomas
E-mail: tgd@cs.orst.edu
Hierarchical Reinforcement Learning with the MAXQ Value
Function Decomposition
journal version; under review
( gzipped Postscript - 359KB)
Abstract:
This paper presents a new approach to hierarchical
reinforcement learning based on the MAXQ decompo...
Dimitrakakis,
Christos
, Samy BengioE-mail: olethros@myrealbox.com
Gradient Estimates of Return
IDIAP Research Report (abridged version presented at PASCAL workshop on principled methods of trading exploration and exploitation)
( gzipped Postscript - 185KB)
Abstract:
The exploration-exploitation trade-off that arises when one considers
simple point estimates of exp...
Dimitrakakis,
Christos
( dimitrak@idiap.ch)
Nearly optimal exploration-exploitation decision thresholds
ICANN 2006
( gzipped Postscript - 170Kb)
Abstract:
While in general trading off exploration and exploitation in
reinforcement learning is hard, under ...
Gabor,
Zoltan
, Zs. Kalmar and Cs. Szepesvari E-mail:
szepes@mindmaker.kfkipark.hu
Multi-criteria Reinforcement Learning
Technical Report TR-98-115, "Attila József" University, Research Group on Artificial Intelligence Szeged, HU-6700, 1998
( gzipped Postscript - 155 KB)
Abstract:
This is a longer version of the paper published in ICML'98.
We consider multi-criteria sequential...
Garcia,
Frédérick
, Florent SerreE-mail: fgarcia@toulouse.inra.fr
Efficient Asymptotic Approximation in Temporal Difference Learning
European Conference on Artificial Intelligence ECAI'2000
( gzipped Postscript - 78383 KB)
Abstract:
We propose in this paper an asymptotic approximation of
online TD(lambda) with accumulating eligib...
Garcia,
Frédérick
, Seydina NdiayeE-mail: fgarcia@toulouse.inra.fr
A Learning Rate Analysis of Reinforcement Learning Algorithms in Fine-Horizon
ICML'98
( gzipped Postscript - 96 KB)
Abstract:
In this article we consider the particular framework of non-stationary
finite-horizon Markov Decis...
Goldsmith,
Judy
, Michael L. Littman, Martin MundhenkE-mail: goldsmit at cs.uky.edu
The Complexity of Plan Existence and Evaluation in Probabilistic Domains
Proceedings of the Thirteenth Annual Conference on Uncertainty in
Artificial Intelligence (UAI--97)
(Postscript - 277KB)
Abstract:
We examine the computational complexity of testing and finding small
plans in probabilistic plannin...
Kalmár,
Zsolt
, Cs. Szepesvári and A. LorinczE-mail: kalmar@mindmaker.kfkipark.hu
Module-Based Reinforcement Learning: Experiments with a Real Robot
Machine Learning
( gzipped Postscript - 755 KB)
Abstract:
The behavior of reinforcement learning (RL) algorithms is best understood in completely observable, ...
Konda,
Vijaymohan
, Vivek S. Borkar E-mail: konda@mit.edu
Learning Algorithms for Markov Decision Processes
SIAM Journal on Control and Optimization
(Postscript - 580 KB)
Abstract:
Algorithms learning the optimal policy of a Markov decision process based on simulated transitions a...
Likas,
Aristidis
E-mail: arly@cs.uoi.gr
A Reinforcement Learning Approach to On-line Clustering
Neural Computation, to appear
( gzipped Postscript - 80KB)
Abstract:
A general technique is proposed for
embedding on-line clustering algorithms based on competitive
l...
Littman,
Michael
E-mail: mlittman@cs.duke.edu
Probabilistic Propositional Planning: Representations and Complexity
Proceedings of the Fourteenth
National Conference on Artificial Intelligence
(Postscript - 360KB)
Abstract:
Many representations for probabilistic propositional planning problems
have been studied. This pap...
Littman,
Michael
, Csaba Szepesvári( mlittman@cs.duke.edu)
A generalized
reinforcement-learning model: Convergence and applications
Proceedings of the Thirteenth International Conference on Machine
Learning
(Postscript - 170KB)
Abstract:
Reinforcement learning is the process by which an autonomous agent
uses its experience interacting ...
Littman,
Michael
E-mail: mlittman@cs.duke.edu
Memoryless policies: Theoretical limitations and practical results
From Animals to Animats 3: Proceedings
of the Third International Conference on Simulation of Adaptive
Behavior
(Postscript - 416KB)
Abstract:
One form of adaptive behavior is "goal-seeking" in which an agent acts
so as to minimize the time i...
Littman,
Michael
E-mail: mlittman@cs.duke.edu
An optimization-based categorization of reinforcement learning environments
Abstract:
This paper proposes a categorization of reinforcement learning
environments based on the optimizati...
Matt,
Andreas
, Georg RegensburgerE-mail: andreas.matt@uibk.ac.at
"Reinforcement Learning for Several Environments: Theory and Applications"
A joint PhD thesis by Andreas Matt and Georg Regensburger
Abstract:
Until now reinforcement learning has been applied to learn the optimal behavior for a single environ...
Munos,
Remi
E-mail: munos@cs.cmu.edu
Reinforcement Learning for Continuous Stochastic Control Problems
Neural Information Processing Systems, 1997
(Postscript - 809KB)
Abstract:
This paper is concerned with the problem of Reinforcement Learning for continuous state
space and ...
Munos,
Remi
E-mail: munos@cs.cmu.edu
A convergent Reinforcement Learning algorithm in the continuous case based on a Finite Difference method
IJCAI'1997
(compressed Postscript - 225KB)
Abstract:
In this paper, we propose a convergent Reinforcement Learning algorithm for solving optimal
contr...
Munos,
Remi
E-mail: munos@cs.cmu.edu
A Convergent Reinforcement Learning algorithm in the continuous case : the Finite-Element Reinforcement Learning
International
Conference on Machine Learning, 1996
(Postscript - 197KB)
Abstract:
This paper presents a direct reinforcement learning algorithm, called Finite-Element
Reinforcem...
Munos,
Remi
E-mail: munos@cs.cmu.edu
A general convergence method for Reinforcement Learning in the continuous case
European Conference on Machine Learning, 1998
(compressed Postscript - 230KB)
Abstract:
In this paper, we propose a general method for designing convergent Reinforcement Learning
algorit...
Munos,
Remi
, Leemon Baird, Andrew MooreE-mail: munos@cs.cmu.edu
Gradient Descent Approaches to Neural-Net-Based Solutions of
the Hamilton-Jacobi-Bellman Equation.
IJCNN'99
( gzipped Postscript - 128KB)
Abstract:
In this paper we investigate new approaches to
dynamic-programming-based optimal control of contin...
Munos,
Remi
E-mail: remi.munos@polytechnique.fr
Error Bounds for Approximate Policy Iteration
Icml 2003
( gzipped Postscript - 80 KB)
Abstract:
In Dynamic Programming, convergence of algorithms such as Value Iteration
or Policy Iteration resul...
Randlov,
Jette
E-mail: randlov@nbi.dk
Shaping in Reinforcement Learning by Changing the Physics of the Problem
ICML-2000
( gzipped Postscript - 65 )
Abstract:
Children learn to ride a bicycle by using training wheels.
They are actually trying to learn one ta...
Reynolds,
Stuart
E-mail: sir@cs.bham.ac.uk
The Stability of General Discounted Reinforcement Learning with Linear Function Approximation
UKCI'02
( gzipped Postscript - 80)
Abstract:
This paper shows that general discounted return estimating
reinforcement learning algorithms ca...
Reynolds,
Stuart
E-mail: sir@cs.bham.ac.uk
Optimistic Initial Q-values and the max Operator
UKCI'01
( gzipped Postscript - 80)
Abstract:
This paper provides a surprising new insight into the role of the max operator used by reinforcement...
Reynolds,
Stuart
E-mail: sir@cs.bham.ac.uk
Reinforcement Learning with Exploration
PhD Thesis, School of Computer Science, The University of Birmingham, B15 2TT, UK
( gzipped Postscript - 1.1MB)
Abstract:
Reinforcement Learning (RL) techniques may be used to find optimal controllers for multistep decisio...
Schmidhuber,
Juergen
( juergen@idsia.ch)
Theory of universal optimal reinforcement learning machines (with links to work by Marcus Hutter and Schmidhuber)
Several journal papers and conference papers
(HTML - 200 KB)
Abstract:
The ultimate predictive
world model is Solomonoff's Bayesian induction scheme based on
the univers...
Singh,
Satinder
, Richard SuttonE-mail: baveja@cs.colorado.edu
Reinforcement Learning with Replacing Eligibility Traces
Machine Learning
( gzipped Postscript - )
Abstract:
...
Singh,
Satinder
, Peter DayanE-mail: baveja@cs.colorado.edu
Analytical Mean Squared Error Curves for Temporal Difference Learning
Machine Learning
( gzipped Postscript - )
Abstract:
...
Singh,
Satinder
, Tommi Jaakkola, Michael JordanE-mail: baevja@cs.colorado.edu
Learning Without State-Estimation in Partially Observable Markovian Decision Processes
Proceedings of the Eleventh International Machine Learning Conference
( gzipped Postscript - 59 KB)
Abstract:
...
Singh,
Satinder
, T. Jaakkola, M.L. Littman and Cs. Szepesvari( baveja@cs.colorado.edu)
Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms
Machine Learning, to appear, 1998
( Postscript - 210 KB)
Abstract:
An important application of reinforcement learning (RL) is to finite-state control problems and one ...
Szepesvari,
Csaba
( szepes@mindmaker.kfkipark.hu)
The Asymptotic Convergence-Rate of Q-learning
NIPS'97
( gzipped Postscript - 64 KB)
Abstract:
In this paper we show that for discounted MDPs with discount factor \gamma>1/2 the asymptotic rate o...
Szepesvari,
Csaba
( szepes@mindmaker.kfkipark.hu)
Learning and Exploitation do not Conflict Under Minimax Optimality
Proceedings of 9th European Conference of Machine Learning, pp. 242-249, 1997
( gzipped Postscript - 40 KB)
Abstract:
We show that adaptive real time dynamic programming extended with the action selection strategy whic...
Szepesvari,
Csaba
( szepes@mindmaker.kfkipark.hu)
Some basic facts concerning minimax sequential decision processes
Technical Report TR-96-100, "Attila József" University, Research Group on Artificial Intelligence Szeged, HU-6700, 1996.
( gzipped Postscript - 65 KB)
Abstract:
It is shown that for discounted minimax sequential decision processes the evaluation function of a s...
Szepesvari,
Csaba
E-mail: szepes@mindmaker.kfkipark.hu
General Framework for Reinforcement Learning
Proceedings of ICANN'95 Paris, France, Oct. 1995, Vol. II., pp. 165-170
( gzipped Postscript - ??)
Abstract:
In this article we propose a general framework for sequential decision making. The framework is base...
Szepesvari,
Csaba
E-mail: szepes@mindmaker.kfkipark.hu
Dynamic Concept Model Learns Optimal Policies
Proceedings of IEEE WCCI ICNN'94 Vol. III. pp. 1738-1742. Orlando, Florida, June 1994
( gzipped Postscript - ??)
Abstract:
Reinforcement learning is a flourishing field of neural methods. It has a firm theoretical basis and...
Szepesvari,
Csaba
E-mail: szepes@mindmaker.hu
Efficient Approximate Planning in Continuous Space Markovian Decision Problems
unpublished
( gzipped Postscript - 128)
Abstract:
In this article we consider Monte-Carlo planning algorithms for planning in continuous state-space, ...
Tran,
Ronan
, Kana MizaE-mail: ronan82ilmfa@yahoo.com
Vietnam Travel Guide - The updated insiders
unpublished
(html - )
Abstract:
Vietnam Travel & Tourism Guide with in-depth guidebook information, photos, maps, hotels, tours, air...
Tsitsiklis,
John
, Ben Van RoyE-mail: jnt@mit.edu
An Analysis of Temporal-Difference Learning with Function Approximation
IEEE Transactions on Automatic Control,
Vol. 42, No. 5, May 1997, pp. 674-690.
(Postscript - 2 MB)
Abstract:
We discuss the temporal-difference learning algorithm, as applied to
approximating cost-to-go funct...
Tsitsiklis,
John
, Benjamin Van RoyE-mail: jnt@mit.edu
Feature-Based Methods for Large Scale Dynamic Programming
Machine Learning, Vol. 22,
1996, pp. 59-94.
( PDF - 2.9 MB)
Abstract:
We develop a methodological framework and present a few
different ways in which dynamic programmin...
Van Roy,
Benjamin
E-mail: bvr@stanford.edu
Learning and Value Function Approximation in Complex Decision Processes
PhD Thesis
(Postscript - 1691 KB)
Abstract:
In principle, a wide variety of sequential decision problems
-- ranging from dynamic resource alloc...
Woergotter,
Florentin
, Bernd PorrE-mail: worgott@chaos.gwdg.de
Temporal sequence learning, prediction and control - A review of different models and their relation to biological mechanisms
Neural Computation, 17: 245-319
Abstract:
A review of RL in view of its relation to classical conditioning and the biophysics of the underlyin...
Zhuang,
X.
( x_sys@tom.com)
The Strategy Entropy of Reinforcement Learning in Discrete State Space
conference proceedings
Abstract:
In this paper, the concept of entropy is introduced into reinforcement learning. The definitions of ...
Zhuang,
Xiaodong
E-mail: windok@21cn.com
MULTI-SCALE REINFORCEMENT LEARNING WITH FUZZY STATE
conference proceedings
(Compressed PDF - 207KB)
Abstract:
In this paper, multi-scale reinforcement learning is presented based on fuzzy state. The concept of ...
gabor,
zoltan
, Zs. Kalmár and Cs. Szepesvári E-mail: szepes@mindmaker.kfkipark.hu
Multi-criteria Reinforcement Learning
Proceedings of International Conference of Machine Learning, 1998
( gzipped Postscript - 103 KB)
Abstract:
We consider multi-criteria sequential decision making problems where the vector-valued evaluations a...