Reinforcement Learning Repository at UMass, Amherst
Demos and Implementation (Domains)
This section contains programs that demonstrate
reinforcement learning in action, illustrating its concepts and
common algorithms. These programs may provide a useful
starting point for implementing reinforcement learning to solve
problems and to advance research in this area. Wherever possible, source
code is included.
Please note that use of this software is restricted; you must
read this license agreement and agree to its terms
before downloading any
software from this site. Downloading the software is considered consent
to the terms.
If you would like to contribute source code or improve
what is included here, contact
Bruno Castro da Silva or
Simulation of the cart and pole dynamics and
a procedure for learning to balance the pole. Both are described in
Barto, Sutton, and Anderson, "Neuronlike Adaptive Elements That Can
Solve Difficult Learning Control Problems," IEEE Trans. Syst., Man, Cybern.,
Vol. SMC-13, pp. 834–846, Sept.–Oct. 1983. Written by Rich Sutton.
Source code: cpole.tar
(16 K, requires C compiler)
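For readers new to this benchmark, the dynamics can be sketched as a single Euler integration step. This is a minimal Python sketch, not the code in cpole.tar: it assumes the frictionless form of the equations and the parameter values commonly used with this task (gravity 9.8 m/s², cart mass 1.0 kg, pole mass 0.1 kg, pole half-length 0.5 m, a ±10 N push, a 0.02 s time step, and failure beyond ±12° or ±2.4 m). The names `CartPoleState`, `step`, and `failed` are hypothetical.

```python
import math
from dataclasses import dataclass

# Assumed parameter values, commonly used with the cart-pole task.
GRAVITY = 9.8
MASS_CART = 1.0
MASS_POLE = 0.1
TOTAL_MASS = MASS_CART + MASS_POLE
HALF_LENGTH = 0.5                 # pivot to the pole's centre of mass (m)
POLE_MASS_LENGTH = MASS_POLE * HALF_LENGTH
FORCE_MAG = 10.0                  # magnitude of the push (N)
TAU = 0.02                        # seconds per Euler step


@dataclass
class CartPoleState:
    x: float = 0.0          # cart position (m)
    x_dot: float = 0.0      # cart velocity (m/s)
    theta: float = 0.0      # pole angle from vertical (rad)
    theta_dot: float = 0.0  # pole angular velocity (rad/s)


def step(s: CartPoleState, push_right: bool) -> CartPoleState:
    """Advance the frictionless cart-pole by one Euler step."""
    force = FORCE_MAG if push_right else -FORCE_MAG
    cos_t, sin_t = math.cos(s.theta), math.sin(s.theta)
    temp = (force + POLE_MASS_LENGTH * s.theta_dot ** 2 * sin_t) / TOTAL_MASS
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_LENGTH * (4.0 / 3.0 - MASS_POLE * cos_t ** 2 / TOTAL_MASS))
    x_acc = temp - POLE_MASS_LENGTH * theta_acc * cos_t / TOTAL_MASS
    return CartPoleState(
        x=s.x + TAU * s.x_dot,
        x_dot=s.x_dot + TAU * x_acc,
        theta=s.theta + TAU * s.theta_dot,
        theta_dot=s.theta_dot + TAU * theta_acc,
    )


def failed(s: CartPoleState) -> bool:
    """A trial fails if the pole tilts past 12 degrees or the cart leaves the track."""
    return abs(s.theta) > 12 * math.pi / 180 or abs(s.x) > 2.4
```

A learning controller calls `step` once per time step with its chosen push direction and starts a new trial whenever `failed` returns True. The 1983 paper's model also includes cart and pole friction, which this sketch leaves out.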
Interactive Java demonstration illustrating the gains
obtained by applying RL to the problem of Dynamic Channel
Allocation in Cellular Telephones, by Satinder Singh at the University
Fortran simulation of an elevator, written by James Lewis and provided by
Christos Cassandras of the UMass ECE Dept. The reinforcement learning
addition to the elevator simulation was implemented by Bob Crites (CS
Dept., UMass) and John McNulty,
and is described in the paper
"Improving Elevator Performance Using Reinforcement Learning."
Source code: elevator.tar.gz
(284 K) or
K). Both require a C compiler and the f2c library
to convert the Fortran code to C, since the simulation incorporates C random-number handling.
This program simulates an agent learning to move to a
user-defined square of a grid. It uses Q-learning, and was written by
Source code: grid.tar (72 K;
requires C compiler and X11 libraries)
Interactive Demo of Q-learning
A Java Swing
applet that allows the user to construct a grid by specifying danger and
target cells, and then to modify various learning parameters. Upon completion
of learning, the learned policy is displayed as arrows overlaying the grid.
Documentation is available.
A French version is available.
Written by Thierry Masson.
Requirements: JDK 1.3 or higher.
Source code: TM_QLearnerDemo_Src_only.jar (38.8 K).
Classes: TM_QLearnerDemo.jar (22.6 K).
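The experiment this applet runs can be sketched in a few lines of tabular Q-learning, using the update Q(s,a) ← Q(s,a) + α(r + γ max Q(s′,·) − Q(s,a)). This is a minimal Python sketch, not Thierry Masson's Java code: the 4×4 layout, the rewards (+1 for entering a target cell, −1 for a danger cell, 0 otherwise), the ε-greedy exploration, and all parameter values and names are assumptions. Entering a target or danger cell ends the episode, and the learned greedy policy is rendered as arrows, similar to the applet's overlay.

```python
import random

# Hypothetical 4x4 layout: 'T' = target cell, 'D' = danger cell, '.' = free cell.
GRID = [
    "...T",
    ".D..",
    "....",
    "....",
]
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
ARROWS = {"up": "^", "down": "v", "left": "<", "right": ">"}


def move(state, action):
    """Apply an action; moves off the grid leave the agent where it is."""
    r, c = state
    dr, dc = ACTIONS[action]
    nr, nc = r + dr, c + dc
    if not (0 <= nr < len(GRID) and 0 <= nc < len(GRID[0])):
        nr, nc = r, c
    cell = GRID[nr][nc]
    if cell == "T":
        return (nr, nc), 1.0, True
    if cell == "D":
        return (nr, nc), -1.0, True
    return (nr, nc), 0.0, False


def q_learning(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    free = [(r, c) for r, row in enumerate(GRID) for c, ch in enumerate(row) if ch == "."]
    q = {(s, a): 0.0 for s in free for a in ACTIONS}
    for _ in range(episodes):
        state, done = rng.choice(free), False
        while not done:
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                action = rng.choice(list(ACTIONS))
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward, done = move(state, action)
            # Terminal transitions have no bootstrapped successor value.
            target = reward if done else reward + gamma * max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
    return q


def greedy_policy(q):
    """Render the greedy policy as arrows; target and danger cells are shown as-is."""
    lines = []
    for r, row in enumerate(GRID):
        line = ""
        for c, ch in enumerate(row):
            if ch == ".":
                ch = ARROWS[max(ACTIONS, key=lambda a: q[((r, c), a)])]
            line += ch
        lines.append(line)
    return lines
```

Calling `print("\n".join(greedy_policy(q_learning())))` prints the learned arrows; changing `GRID`, `alpha`, `gamma`, or `epsilon` mimics the parameter edits the applet allows.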
Least-Squares Policy Iteration
MATLAB implementation of the Least-Squares Policy Iteration (LSPI) algorithm. Documentation and background material are available.
Written by Michail G. Lagoudakis and Ronald Parr.
Requirements: MATLAB V.5 or higher.
Source code: lspi.tar.gz (10.9 K),
chain.tar.gz (12.2 K), pendulum.tar.gz (26.1 K).
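The core of LSPI is LSTDQ, which fits the weights w of a linear Q-function Q(s,a) = φ(s,a)ᵀw for the current greedy policy π by solving A w = b, where A = Σ φ(s,a)(φ(s,a) − γ φ(s′, π(s′)))ᵀ and b = Σ φ(s,a) r over a fixed batch of samples (s, a, r, s′); policy iteration then repeats this with the new greedy policy until the weights stop changing. The Python sketch below is not the MATLAB code in lspi.tar.gz: the 4-state deterministic chain, the one-hot features, γ = 0.9, and every function name are hypothetical, and a small Gauss-Jordan solver stands in for a linear-algebra library.

```python
N_STATES, ACTIONS, GAMMA = 4, (0, 1), 0.9   # action 0 = left, 1 = right
K = N_STATES * len(ACTIONS)                 # number of features


def transition(s, a):
    """Hypothetical deterministic chain; reward 1 for arriving in the last state."""
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    return (s, a, 1.0 if s2 == N_STATES - 1 else 0.0, s2)


def phi(s, a):
    """One-hot (tabular) features; LSPI works with any basis, this is the simplest."""
    v = [0.0] * K
    v[s * len(ACTIONS) + a] = 1.0
    return v


def q_value(w, s, a):
    return sum(wi * fi for wi, fi in zip(w, phi(s, a)))


def greedy(w, s):
    return max(ACTIONS, key=lambda a: q_value(w, s, a))


def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def lstdq(samples, w):
    """Evaluate the greedy policy of w from a batch of (s, a, r, s') samples."""
    A = [[0.0] * K for _ in range(K)]
    b = [0.0] * K
    for s, a, r, s2 in samples:
        f = phi(s, a)
        f2 = phi(s2, greedy(w, s2))
        for i in range(K):
            b[i] += f[i] * r
            for j in range(K):
                A[i][j] += f[i] * (f[j] - GAMMA * f2[j])
    return solve(A, b)


def lspi(samples, iterations=20, tol=1e-9):
    w = [0.0] * K
    for _ in range(iterations):
        w_new = lstdq(samples, w)
        if max(abs(x - y) for x, y in zip(w, w_new)) < tol:
            return w_new
        w = w_new
    return w
```

With `samples = [transition(s, a) for s in range(N_STATES) for a in ACTIONS]`, `lspi(samples)` returns weights whose greedy policy moves right in every state. Reusing one fixed batch for every iteration is what lets LSPI learn off-policy from previously collected data.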
CSIM simulation of a production system that integrates SMART, a
model-free average-reward algorithm, to determine the optimal machine
maintenance policy. It was written by Nicholas Marchalleck and Abhijit
Gosavi, and is described in "Self-Improving
Factory Simulation using Continuous-Time Average-Reward Reinforcement
Learning" by Mahadevan et al.
Source code: maint.tar
(268 K; requires CSIM v.17 and a C++ compiler)
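SMART learns action values relative to an estimated average reward rate ρ in a semi-Markov setting, where each decision earns a reward over a variable sojourn time. The sketch below shows a SMART-style update as we understand it: after a transition that earns reward r over time τ, Q(s,a) moves toward r − ρτ + max_b Q(s′,b), and ρ is re-estimated as total reward divided by total time, using only non-exploratory (greedy) decisions. It is written in Python rather than CSIM/C++, the class name, method names, and the "produce"/"maintain" actions are hypothetical, and details such as learning-rate decay are omitted.

```python
from collections import defaultdict


class SmartLearner:
    """Sketch of a SMART-style average-reward update for semi-Markov problems."""

    def __init__(self, actions, alpha=0.1):
        self.actions = list(actions)
        self.alpha = alpha
        self.q = defaultdict(float)   # (state, action) -> value relative to rho
        self.total_reward = 0.0
        self.total_time = 0.0
        self.rho = 0.0                # estimated average reward per unit time

    def greedy(self, state):
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, s, a, reward, tau, s_next, exploratory):
        """Learn from a transition that earned `reward` over sojourn time `tau`."""
        best_next = max(self.q[(s_next, b)] for b in self.actions)
        target = reward - self.rho * tau + best_next
        self.q[(s, a)] += self.alpha * (target - self.q[(s, a)])
        if not exploratory:
            # Only greedy decisions contribute to the reward-rate estimate.
            self.total_reward += reward
            self.total_time += tau
            self.rho = self.total_reward / self.total_time
```

In a maintenance setting, the simulator would call `update` each time a machine-state decision epoch is reached, passing the reward (or cost) accumulated and the time elapsed since the previous decision.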
MDP Q-learning: implements Q-learning on a given MDP, using
Source code: mdp-q.tar
(64 K, requires GNU C compiler)
Simulation of a car learning the proper acceleration to get up a mountain.
It uses Q-learning with CMAC as a function approximator. It is described
in (among other papers) "Generalization
in Reinforcement Learning: Successful Examples Using Sparse Coarse
Coding" by Rich Sutton, and was developed by Sridhar
Source code: mcar.tar
(157 K; requires X11 libraries and a C++ compiler)
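A CMAC (tile coder) represents a continuous state by the set of tiles that contain it across several overlapping, offset tilings; the approximate value is the sum of the weights of those active tiles, so nearby states share most of their weights. The sketch below assumes the usual mountain-car ranges (position in [-1.2, 0.5], velocity in [-0.07, 0.07]) and uses 8 uniformly offset 8×8 tilings; these sizes and the names `tiles` and `TileCodedQ` are illustrative choices, not taken from mcar.tar.

```python
import math

POS_RANGE = (-1.2, 0.5)
VEL_RANGE = (-0.07, 0.07)


def tiles(position, velocity, n_tilings=8, tiles_per_dim=8):
    """Return one active tile index per tiling (a simple CMAC / tile coder)."""
    # Scale each state variable to [0, tiles_per_dim].
    p = (position - POS_RANGE[0]) / (POS_RANGE[1] - POS_RANGE[0]) * tiles_per_dim
    v = (velocity - VEL_RANGE[0]) / (VEL_RANGE[1] - VEL_RANGE[0]) * tiles_per_dim
    side = tiles_per_dim + 1           # one extra row/column covers the shifted edge
    active = []
    for t in range(n_tilings):
        offset = t / n_tilings         # each tiling is shifted by a fraction of a tile
        i = int(math.floor(p + offset))
        j = int(math.floor(v + offset))
        active.append(t * side * side + i * side + j)
    return active


class TileCodedQ:
    """Linear action-value function over tile features, one weight vector per action."""

    def __init__(self, n_actions, n_tilings=8, tiles_per_dim=8):
        self.n_tilings, self.tiles_per_dim = n_tilings, tiles_per_dim
        size = n_tilings * (tiles_per_dim + 1) ** 2
        self.w = [[0.0] * size for _ in range(n_actions)]

    def _active(self, pos, vel):
        return tiles(pos, vel, self.n_tilings, self.tiles_per_dim)

    def value(self, pos, vel, a):
        return sum(self.w[a][i] for i in self._active(pos, vel))

    def update(self, pos, vel, a, target, alpha=0.1):
        """Gradient step toward `target`; the step size is shared among active tiles."""
        error = target - self.value(pos, vel, a)
        for i in self._active(pos, vel):
            self.w[a][i] += alpha / self.n_tilings * error
```

In a Q-learning agent, `target` would be r + γ max_b Q(s′, b) computed from the same `TileCodedQ` object.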
Network Routing: Demonstrates an RL network-routing algorithm
written by Justin Boyan and Michael Littman. Described in
"Packet Routing in Dynamically Changing Networks: A Reinforcement
Learning Approach," Neural Information Processing Systems (PostScript, 155 K).
Source code: network-router.tar
(222 K; requires a C compiler and the
wish windowing shell, part of Tcl/Tk)
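In Q-routing, each node x keeps an estimate Q_x(d, y) of the time a packet needs to reach destination d if x forwards it to neighbor y. After forwarding, y reports its own best estimate t = min_z Q_y(d, z), and x moves Q_x(d, y) toward q + s + t, where q is the time the packet spent in x's queue and s is the transmission time. The Python sketch below follows that update as we understand it from the paper; the class and method names and the learning rate `eta` are hypothetical, and the queueing simulation itself is left out.

```python
from collections import defaultdict


class QRoutingNode:
    """One router; q[(d, y)] estimates delivery time to d via neighbor y."""

    def __init__(self, name, neighbors, eta=0.5):
        self.name = name
        self.neighbors = list(neighbors)
        self.eta = eta                  # learning rate
        self.q = defaultdict(float)

    def choose_neighbor(self, dest):
        """Forward to the neighbor with the smallest estimated delivery time."""
        return min(self.neighbors, key=lambda y: self.q[(dest, y)])

    def best_estimate(self, dest):
        """The value t = min_z Q(d, z) reported back to a sender (0 at the destination)."""
        if dest == self.name:
            return 0.0
        return min(self.q[(dest, z)] for z in self.neighbors)

    def update(self, dest, neighbor, queue_time, transmit_time, neighbor_estimate):
        """Move Q(d, y) toward queue_time + transmit_time + t."""
        old = self.q[(dest, neighbor)]
        target = queue_time + transmit_time + neighbor_estimate
        self.q[(dest, neighbor)] = old + self.eta * (target - old)
```

When node x forwards a packet to y, y replies with `best_estimate` for the packet's destination and x calls `update` with the queueing and transmission times it observed; routing decisions then use `choose_neighbor`.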
Proposed Standard for Reinforcement Learning Software
This standard, developed by Rich Sutton and Juan Carlos Santamaria, is intended to
facilitate RL research and development, and is available for C++ and
Proposed Standard Interface
A proposed standard interface
for RL systems written in C++. It provides standard interface
classes for an agent, an environment, a function approximator,
and states and actions. Written by Bohdana Ratitch.
Source code: si-classes.tar
(72 K, requires C++ compiler, documentation).
Programs written by Bohdana Ratitch using this proposed
standard interface are the following. All require a C++ compiler.
Further compiler information can be found
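The value of a standard interface is that any agent can be run against any environment through one generic interaction loop. The Python sketch below illustrates that idea only; it is not Bohdana Ratitch's C++ class library, and the class names, the method names (`start`, `step`, `end`), the (reward, next_state, terminal) return convention, and the `Corridor` example are all hypothetical.

```python
from abc import ABC, abstractmethod


class Environment(ABC):
    @abstractmethod
    def start(self):
        """Begin an episode and return the initial state."""

    @abstractmethod
    def step(self, action):
        """Apply an action and return (reward, next_state, terminal)."""


class Agent(ABC):
    @abstractmethod
    def start(self, state):
        """Observe the initial state and return the first action."""

    @abstractmethod
    def step(self, reward, state):
        """Learn from the last transition and return the next action."""

    @abstractmethod
    def end(self, reward):
        """Learn from the final reward of an episode."""


def run_episode(agent, env, max_steps=1000):
    """Generic interaction loop that works for any Agent/Environment pair."""
    state = env.start()
    action = agent.start(state)
    total_reward = 0.0
    for _ in range(max_steps):
        reward, state, terminal = env.step(action)
        total_reward += reward
        if terminal:
            agent.end(reward)
            break
        action = agent.step(reward, state)
    return total_reward


class Corridor(Environment):
    """Hypothetical corridor: -0.1 per step, +1 on reaching the end."""

    def __init__(self, length=3):
        self.length = length
        self.pos = 0

    def start(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        self.pos = max(0, self.pos + action)   # action is +1 (right) or -1 (left)
        terminal = self.pos >= self.length
        return (1.0 if terminal else -0.1), self.pos, terminal


class AlwaysRight(Agent):
    """A fixed-policy agent; a learning agent would update its estimates in step/end."""

    def start(self, state):
        return 1

    def step(self, reward, state):
        return 1

    def end(self, reward):
        pass
```

Swapping `AlwaysRight` for a learning agent, or `Corridor` for another task, requires no change to `run_episode`, which is the point of fixing the interface.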
Reinforcement Learning Toolbox
The Reinforcement Learning Toolbox is a set of classes implementing
a variety of reinforcement learning algorithms,
including TD(λ), actor-critic, prioritized sweeping, and hierarchical learning. The toolbox also supports logging and error recognition.
Written by Gerhard Neumann and Stephan Neumann.
Download (written in C++, available for both Windows and Linux).
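TD(λ), one of the algorithms listed above, can be illustrated with a minimal tabular prediction sketch using accumulating eligibility traces: each TD error δ = r + γV(s′) − V(s) updates every state in proportion to its trace, and traces decay by γλ per step. This Python code is not the toolbox's C++ API; the function name, the episode format (a list of (state, reward, next_state) transitions with next_state None at termination), and the parameter values are assumptions.

```python
def td_lambda(episodes, n_states, alpha=0.1, gamma=0.9, lam=0.9):
    """Tabular TD(lambda) prediction with accumulating traces and online updates."""
    V = [0.0] * n_states
    for episode in episodes:
        e = [0.0] * n_states                      # eligibility traces, reset per episode
        for s, r, s_next in episode:
            v_next = 0.0 if s_next is None else V[s_next]
            delta = r + gamma * v_next - V[s]     # TD error
            e[s] += 1.0                           # accumulate trace for the visited state
            for i in range(n_states):
                V[i] += alpha * delta * e[i]
                e[i] *= gamma * lam               # decay all traces
    return V
```

For a hypothetical 4-state chain that always moves right and pays 1 on leaving the last state, repeated episodes drive V toward 0.729, 0.81, 0.9, and 1.0 (powers of γ = 0.9); setting `lam=0` recovers one-step TD(0).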
of the robot-on-a-grid problem. The user can modify various parameters,
such as the selection strategy and learning method.
Written by Gilad Mishne.
(2478 K, Windows executable), Source code: ML2_project_src.zip
(19 K), Sample grids: ML2_project_grids.zip
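A "selection strategy" is the rule that picks an action from the current value estimates. Which strategies this program offers is not stated here, so the following is a generic Python sketch of two common choices, ε-greedy and softmax (Boltzmann) selection; the function names and parameters are hypothetical.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise the best one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def softmax_probs(q_values, temperature):
    """Boltzmann probabilities; lower temperature means greedier selection."""
    m = max(q_values)                 # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    z = sum(exps)
    return [x / z for x in exps]


def softmax(q_values, temperature, rng=random):
    """Sample an action index according to softmax_probs."""
    weights = softmax_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=weights)[0]
```

ε-greedy explores uniformly among all actions, whereas softmax explores mostly among actions whose estimates are close to the best, which is often preferable when some actions are clearly bad.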
Rumpus Gridworld Simulator
language-independent simulator
that uses TCP/IP ports for interaction with client
Requires the scripting language Ruby.
allows a gridworld to be specified using a bitmap format and
supports both local and
unique state descriptions as well as deterministic and
Written by Torbjorn Dahl.
from the Neuroinformatics Group at the University of
simulates a control loop for
closed-loop control. Although originally designed
for training and testing reinforcement learning controllers, it also applies to other learning
and non-learning controller concepts.
Currently available plants:
Acrobot, bicycle, cart pole, cart double pole, pole, mountain car, and maze.
Currently available controllers:
linear controller, reinforcement learning Q-table, neural-network-based Q controller.
It comes with many useful features, such as a graphical display and statistics output,
documentation, and many demos for getting started quickly.
is an open-source set of Java classes
for quickly experimenting with single- and multi-agent reinforcement
learning schemes (new problems or new algorithms),
by Francesco De Comité.
Version 2, with a major refactoring of classes, English
renamings, and summary documentation, was released in November 2006.
Connectionist Q-learning Java Framework
The Free Connectionist Q-learning Java
Framework is an open source Java library for developing
learning systems using reinforcement learning and neural networks,
by Dominik Kapusta.
of RL and RNNs learning
in the Neuropilot Domain, including value-gradient learning.
The Pinball Domain
The Pinball domain is
a fairly challenging 4-dimensional, continuous, and dynamic reinforcement learning
domain. The goal is to maneuver the blue ball into the red hole, while
avoiding (or using, since the ball is dynamic and collisions are
elastic) the obstacles. The dynamics of the ball and the presence of
obstacles result in a domain with sharp discontinuities, and the
location and shape of the obstacles can be specified so you can make
the domain as hard or as easy as you want.
The source code is in Java and includes full documentation,
an RL-Glue interface, and GUI
programs for editing obstacle
configurations, viewing saved trajectories, etc.
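The four state variables are the ball's position and velocity (x, y, ẋ, ẏ), and the elastic collisions mentioned above can be pictured as reflecting the velocity off an obstacle edge: v′ = v − 2(v·n)n, where n is the edge's unit normal. This reverses the normal component, keeps the tangential one, and preserves speed. The Python sketch below shows only that geometric step; it is not the domain's Java code, does not reproduce its action model, drag, or collision detection, and the names `BallState` and `reflect` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class BallState:
    """The 4-dimensional continuous state: ball position and velocity."""
    x: float
    y: float
    xdot: float
    ydot: float


def reflect(state, edge_start, edge_end):
    """Return the state after an elastic bounce off the edge (edge_start, edge_end).

    v' = v - 2 (v . n) n, so the ball's speed is unchanged.
    """
    ex, ey = edge_end[0] - edge_start[0], edge_end[1] - edge_start[1]
    length = math.hypot(ex, ey)
    nx, ny = -ey / length, ex / length          # unit normal to the edge
    dot = state.xdot * nx + state.ydot * ny
    return BallState(state.x, state.y,
                     state.xdot - 2 * dot * nx,
                     state.ydot - 2 * dot * ny)
```

Because obstacle edges can point in any direction, a small change in where the ball strikes can send it off along a very different path, which is one source of the sharp discontinuities described above.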