Publications on Temporal Difference Learning
Bandera, C., V. Francisco, B. Jose, M. Harmon, and L. Baird
Residual Q-learning applied to visual attention
Proceedings of the Thirteenth International Conference on Machine Learning, pages 20-27. Morgan Kaufmann, 1996
(HTML)
Abstract:
Foveal vision features imagers with graded acuity coupled with context
sensitive sensor gaze contro...
Borkar, Vivek, Vijaymohan R. Konda (borkar@csa.iisc.ernet.in)
Actor-Critic algorithm as multi-time scale stochastic approximation algorithm
Sadhana, Indian Academy of Sciences
(Postscript - 561 KB)
Abstract:
The actor-critic algorithm of Barto et al. for simulation-based optimization of Markov decision proce...
Boyan, Justin, A. Moore (Justin.Boyan@cs.cmu.edu)
Learning evaluation functions for large acyclic domains
Proceedings of the Thirteenth International Conference on Machine Learning, pages 63-70. Morgan Kaufmann, 1996.
(Postscript - 147 KB)
Abstract:
Some of the most successful recent applications of reinforcement
learning have used neural network...
Boyan, Justin, Michael L. Littman (jab+@cs.cmu.edu)
Packet Routing in Dynamically Changing Networks: A Reinforcement Learning Approach
Advances in Neural Information Processing Systems
(Postscript - 155 KB)
Abstract:
This paper describes the Q-routing algorithm for packet routing, in
which a reinforcement learning ...
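The Q-routing update can be sketched as follows, following the usual description of the algorithm: node x keeps Q_x(d, y), its estimate of the time to deliver a packet bound for d via neighbor y; after forwarding, y reports its own best estimate min_z Q_y(d, z), and x moves Q_x(d, y) toward q + s + that estimate, where q is the time spent in x's queue and s the transmission time. This is a minimal sketch; the dictionary layout, the function names, and the learning rate are hypothetical choices for illustration.

```python
from collections import defaultdict

# Hypothetical table layout: Q[node][(dest, neighbor)] -> estimated delivery time.
Q = defaultdict(lambda: defaultdict(float))

def best_estimate(node, dest, neighbors):
    """min_z Q_node(dest, z); zero once the packet has reached its destination."""
    if node == dest:
        return 0.0
    return min(Q[node][(dest, z)] for z in neighbors[node])

def q_routing_update(x, y, dest, queue_time, send_time, neighbors, eta=0.5):
    """Move Q_x(dest, y) toward q + s + min_z Q_y(dest, z) with step size eta."""
    target = queue_time + send_time + best_estimate(y, dest, neighbors)
    Q[x][(dest, y)] += eta * (target - Q[x][(dest, y)])
    return Q[x][(dest, y)]

def choose_next_hop(x, dest, neighbors):
    """Greedy routing: forward to the neighbor with the lowest estimated delivery time."""
    return min(neighbors[x], key=lambda y: Q[x][(dest, y)])
```

On a four-node ring A-B-D-C-A, one update for a packet from A to D via B (with q = s = 1) raises Q_A(D, B) to 1.0, so the greedy choice at A switches to the still-unexplored neighbor C.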
Coulom, Rémi (Remi.Coulom@imag.fr)
Reinforcement Learning Using Neural Networks, with Applications to Motor Control
PhD thesis
(HTML - 1 MB)
Abstract:
This thesis is a study of practical methods to estimate value functions with feedforward neural netw...
Coulom, Rémi (Remi.Coulom@free.fr)
Feedforward Neural Networks in Reinforcement Learning Applied to High-dimensional Motor Control
Proceedings of ALT 2002
(PDF - 139 KB)
Abstract:
Local linear function approximators are often preferred to feedforward neural networks to estimate v...
Dietterich, Thomas, W. Zhang (tgd@cs.orst.edu)
A Reinforcement Learning Approach to Job-shop Scheduling
Proceedings of IJCAI-95
(gzipped Postscript)
Abstract:
We apply reinforcement learning methods to learn domain-specific
heuristics for job shop scheduling...
Rivest, François, Doina Precup (rivestfr@iro.umontreal.ca)
Combining TD-learning with Cascade-correlation Networks
ICML 2003
Abstract:
Using neural networks to represent value
functions in reinforcement learning algorithms
often invo...
Gadaleta, Sabino, Gerhard Dangelmayr (sabino@math.colostate.edu)
Optimal Chaos Control through Reinforcement Learning
Chaos, 9, 775, 1999
Abstract:
A general purpose chaos control algorithm based on reinforcement learning is
introduced and applie...
Garcia, Frédérick, Florent Serre (fgarcia@toulouse.inra.fr)
Efficient Asymptotic Approximation in Temporal Difference Learning
European Conference on Artificial Intelligence ECAI'2000
(gzipped Postscript - 78383 bytes)
Abstract:
We propose in this paper an asymptotic approximation of
online TD(lambda) with accumulating eligib...
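For reference, the algorithm being approximated, online tabular TD(lambda) with accumulating eligibility traces, can be sketched as below. This shows only the standard exact algorithm, not the paper's asymptotic approximation; the function name, the episode format, and the default parameters are assumptions for illustration.

```python
def td_lambda_episode(V, episode, alpha=0.1, gamma=1.0, lam=0.8):
    """Run one episode of online tabular TD(lambda) with accumulating traces.

    V: dict mapping state -> value estimate (missing states count as 0).
    episode: list of (state, reward, next_state); next_state is None at termination.
    """
    e = {}  # eligibility traces, created lazily when a state is first visited
    for s, r, s_next in episode:
        v_next = 0.0 if s_next is None else V.get(s_next, 0.0)
        delta = r + gamma * v_next - V.get(s, 0.0)  # one-step TD error
        e[s] = e.get(s, 0.0) + 1.0  # accumulating trace: add 1 on every visit
        for state, trace in e.items():
            # Every traced state shares the current TD error, weighted by its trace.
            V[state] = V.get(state, 0.0) + alpha * delta * trace
            e[state] = gamma * lam * trace  # decay after the update
    return V
```

With lam = 1 a reward at the end of a two-step episode is credited to both states in the same step; with lam = 0 only the last state moves, which is plain TD(0).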
Ghory, Imran (imran@bits.bris.ac.uk)
Reinforcement Learning in Board Games
Technical Report CSTR-04-004, Department of Computer Science, University of Bristol, May 2004.
(PDF - 1097439 bytes)
Abstract:
This project investigates the application of the TD(lambda) reinforcement learning algorithm and neu...
Konda, Vijaymohan, Vivek S. Borkar (konda@mit.edu)
Learning Algorithms for Markov Decision Processes
SIAM Journal on Control and Optimization
(Postscript - 619 KB)
Abstract:
Algorithms learning the optimal policy of a Markov decision process based on simulated transitions a...
Leslie, David, E. J. Collins (dleslie@stats.ox.ac.uk)
Individual Q-learning in normal form games
unpublished
(PDF - 210 KB)
Abstract:
The single-agent multi-armed bandit problem can be solved by an
agent that learns the values of e...
Littman, Michael (mlittman@cs.duke.edu)
Markov games as a framework for multi-agent reinforcement learning
Proceedings of the Eleventh International Conference on Machine Learning
(Postscript - 83 KB)
Abstract:
In the Markov decision process (MDP) formalization of reinforcement
learning, a single adaptive age...
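In the Markov-game setting the max over actions used by Q-learning is replaced by a max-min: the value of a state is the best the agent can guarantee with a mixed policy pi against the opponent's worst-case action o, V(s) = max_pi min_o sum_a pi(a) Q(s, a, o). The paper computes this with linear programming; the minimal sketch below instead assumes an agent with exactly two actions and substitutes a coarse grid search over pi(0), so it is only an approximation. The function name is hypothetical.

```python
def minimax_value_two_actions(Q, steps=100):
    """Approximate max_pi min_o sum_a pi(a) * Q[a][o] for an agent with two actions.

    Q[a][o]: value of agent action a in {0, 1} against opponent action o.
    A grid over pi(0) = i / steps stands in for the linear program.
    Returns (value, pi(0)).
    """
    best_value, best_p = float("-inf"), None
    for i in range(steps + 1):
        p = i / steps
        # The opponent picks the action that is worst for the agent's mixed policy.
        worst = min(p * Q[0][o] + (1 - p) * Q[1][o] for o in range(len(Q[0])))
        if worst > best_value:
            best_value, best_p = worst, p
    return best_value, best_p
```

For matching pennies, Q = [[1, -1], [-1, 1]], the sketch returns a value of 0 at pi(0) = 0.5, the familiar mixed equilibrium that no deterministic policy can reach.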
Preux, Philippe (ppreux@grappa.univ-lille3.fr)
Propagation of Q-values in Tabular TD(lambda)
Proceedings of ECML, 2002
(gzipped Postscript - 75 KB)
Abstract:
In this paper, we propose a new idea for tabular TD(lambda) algorithm.
In TD learning, rewards ar...
Reynolds, Stuart (sir@cs.bham.ac.uk)
Optimistic Initial Q-values and the max Operator
UKCI'01
(gzipped Postscript - 80)
Abstract:
This paper provides a surprising new insight into the role of the max operator used by reinforcement...
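As background for the title, the sketch below shows the standard tabular Q-learning update with its max operator, together with optimistic initialization of every Q-value. It does not reproduce the paper's analysis; the state and action names, the initial value 10.0, and the default parameters are assumptions for illustration.

```python
def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9, terminal=False):
    """One tabular Q-learning step: Q(s,a) += alpha * (r + gamma * max_b Q(s',b) - Q(s,a))."""
    target = r if terminal else r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

# Optimistic initialization (hypothetical values): every estimate starts high,
# so any action that has been tried looks worse than the untried ones.
actions = ["left", "right"]
states = ["s0", "s1"]
Q = {(s, a): 10.0 for s in states for a in actions}
```

After one zero-reward update of ("s0", "left") with alpha = 0.5, its value drops to 9.5 while ("s0", "right") is still 10.0, so even a purely greedy agent tries "right" next.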
Reynolds, Stuart (sir@cs.bham.ac.uk)
Experience Stack Reinforcement Learning for Off-Policy Control
Cognitive Science Technical Report number CSRP-02-1, School of Computer Science, The University of Birmingham, Birmingham, B15 2TT, UK. January 2002
(gzipped Postscript - 235)
Abstract:
This paper introduces a novel method for allowing backwards replay to be applied as an online learni...
Singh, Satinder, Richard Sutton (baveja@cs.colorado.edu)
Reinforcement Learning with Replacing Eligibility Traces
Machine Learning
(gzipped Postscript)
Abstract:
...
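Replacing traces differ from the conventional accumulating traces only in how a revisited state's trace is updated: accumulating traces add 1 on each visit, while replacing traces reset it to 1. A minimal sketch of that single step follows; the function name and dictionary representation are hypothetical.

```python
def decay_and_visit(traces, state, gamma, lam, replacing):
    """Decay every trace by gamma * lambda, then mark the visited state.

    traces: dict mapping state -> eligibility trace.
    """
    for s in traces:
        traces[s] *= gamma * lam
    if replacing:
        traces[state] = 1.0  # replacing trace: reset to 1
    else:
        traces[state] = traces.get(state, 0.0) + 1.0  # accumulating trace: add 1
    return traces
```

Visiting one state three times in a row with gamma * lambda = 0.5 gives an accumulating trace of 1.75 but a replacing trace of 1.0, so frequently revisited states receive less extra credit under replacing traces.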
Singh, Satinder, Peter Dayan (baveja@cs.colorado.edu)
Analytical Mean Squared Error Curves for Temporal Difference Learning
Machine Learning
(gzipped Postscript)
Abstract:
...
Sutton, Richard (rich@cs.umass.edu)
Learning to predict by the method of temporal differences
Machine Learning, 3:9-44, 1988
(gzipped Postscript - 121 KB)
Abstract:
This article introduces a class of incremental learning procedures
specialized for prediction - tha...
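The simplest member of this class, TD(0), can be sketched on a bounded random walk of the kind Sutton uses as an illustration: five nonterminal states, an episode starting in the middle, and an outcome of 1 on exiting to the right and 0 on exiting to the left, so the true values are 1/6, 2/6, ..., 5/6. The step size, episode count, and seed below are our own assumptions, not the paper's experimental settings.

```python
import random

def random_walk_td0(episodes=10000, alpha=0.005, seed=0):
    """TD(0) prediction on a five-state bounded random walk.

    States 1..5 are nonterminal; 0 and 6 are terminal. Returns the estimates for 1..5.
    """
    rng = random.Random(seed)
    V = [0.0] + [0.5] * 5 + [0.0]  # terminal values stay fixed at 0
    for _ in range(episodes):
        s = 3  # every episode starts in the middle state
        while s not in (0, 6):
            s_next = s + rng.choice((-1, 1))
            r = 1.0 if s_next == 6 else 0.0  # outcome 1 only on the right exit
            V[s] += alpha * (r + V[s_next] - V[s])  # undiscounted TD(0) update
            s = s_next
    return V[1:6]
```

Each update moves a state's estimate toward the reward plus the current estimate of the next state, so the prediction is learned from successive predictions rather than only from the final outcome.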
Tesauro, Gerald (tesauro@watson.ibm.com)
Temporal Difference Learning and TD-Gammon
unpublished
(HTML)
Abstract:
Ever since the days of Shannon's proposal for a chess-playing algorithm [12] and Samuel's checkers-l...
Tesauro, Gerald (tesauro@watson.ibm.com)
TD-Gammon, a self-teaching backgammon program, achieves master-level play
unpublished
(compressed Postscript - 120 KB)
Abstract:
TD-Gammon is a neural network that is able to teach itself to play
backgammon solely by playing agai...
Thrun, Sebastian (thrun+@heaven.learning.cs.cmu.edu)
Learning to Play the Game of Chess
Advances in Neural Information Processing Systems (NIPS) 7, 1995.
Abstract:
This paper presents NeuroChess, a program which learns to play chess from the final
outcome of game...
Tsitsiklis, John, Ben Van Roy (jnt@mit.edu)
An Analysis of Temporal-Difference Learning with Function Approximation
IEEE Transactions on Automatic Control, Vol. 42, No. 5, May 1997, pp. 674-690.
(Postscript - 2 MB)
Abstract:
We discuss the temporal-difference learning algorithm, as applied to
approximating cost-to-go funct...
Wilson, Stewart (wilson@smith.rowland.org)
Generalization in the XCS classifier system
Genetic Programming 1998: Proceedings of the Third Annual Conference. San Francisco, CA: Morgan Kaufmann.
(HTML)
Abstract:
This paper studies two changes to XCS, a classifier system in which
fitness is based on prediction...
Xu, Xin, Han-gen He, and Dewen Hu (xuxin_mail@263.net)
Efficient Reinforcement Learning Using Recursive Least-Squares Methods
Journal of Artificial Intelligence Research, Vol. 16, 2002, pp. 259-292
(gzipped Postscript - 700)
Abstract:
The recursive least-squares (RLS) algorithm is one of the most well-known algorithms used in adaptiv...
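Applied to TD learning with a linear value function V(s) = theta . phi(s), the RLS idea replaces stochastic-gradient steps with a Sherman-Morrison update of an inverse matrix P. The sketch below follows the RLS-TD(lambda) recursion as we understand it: trace z = gamma * lambda * z + phi, gain K = P z / (mu + d . P z) with d = phi - gamma * phi', then theta += K (r - d . theta) and P = (P - P z d^T P / (mu + d . P z)) / mu. The forgetting factor mu, the initial scale delta for P = delta * I, the transition format, and the function name are assumptions for illustration; with mu = 1 the estimate equals the (lightly regularized) least-squares TD solution.

```python
def rls_td_lambda(transitions, n, gamma, lam=0.0, mu=1.0, delta=1000.0):
    """Recursive least-squares TD(lambda) with linear features (pure-Python sketch).

    transitions: iterable of (phi, r, phi_next, done); phi vectors are lists of length n,
                 and phi_next should be all zeros for a terminal successor.
    """
    theta = [0.0] * n
    P = [[delta if i == j else 0.0 for j in range(n)] for i in range(n)]  # P_0 = delta * I
    z = [0.0] * n
    for phi, r, phi_next, done in transitions:
        z = [lam * gamma * zi + fi for zi, fi in zip(z, phi)]  # eligibility trace
        d = [f - gamma * fn for f, fn in zip(phi, phi_next)]  # phi - gamma * phi'
        Pz = [sum(P[i][j] * z[j] for j in range(n)) for i in range(n)]
        dP = [sum(d[i] * P[i][j] for i in range(n)) for j in range(n)]  # d^T P
        denom = mu + sum(di * pzi for di, pzi in zip(d, Pz))
        K = [pzi / denom for pzi in Pz]  # gain vector
        err = r - sum(di * ti for di, ti in zip(d, theta))  # TD error
        theta = [ti + ki * err for ti, ki in zip(theta, K)]
        # Sherman-Morrison update of the inverse matrix, scaled by 1 / mu.
        P = [[(P[i][j] - Pz[i] * dP[j] / denom) / mu for j in range(n)] for i in range(n)]
        if done:
            z = [0.0] * n  # reset the trace between episodes
    return theta
```

On a two-state chain (reward 0, then reward 1 on termination, gamma = 0.9) with one-hot features, ten passes give theta close to the exact values [0.9, 1.0]; no step size has to be tuned, which is the practical appeal of the least-squares approach.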
Yin, ChangMing (cmyin@cs167.net)
unpublished
(Postscript)
Abstract:
...
Zhao, Gang, Shoji Tatsumi, Ruoying Sun (zhaogang@ieee.org)
RTP-Q: A Reinforcement Learning System with Time Constraints Exploration Planning for Accelerating the Learning Rate
IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
(PDF - 171 KB)
Abstract:
This paper proposes a RTP-Q reinforcement learning system which varies an efficient method for explo...
Zhao, Gang, Shoji Tatsumi, Ruoying Sun (zhaogang@ieee.org)
Convergence of the Q-ae Learning on Deterministic MDPs and Its Efficiency on the Stochastic Environment
IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
(PDF - 172 KB)
Abstract:
In this paper, based on discussing different exploration methods, replacing the pre-action-selector ...
Zhuang, Xiaodong (windok@21cn.com)
Multi-scale Reinforcement Learning with Fuzzy State
Conference proceedings
(Compressed PDF - 207 KB)
Abstract:
In this paper, multi-scale reinforcement learning is presented based on fuzzy state. The concept of ...