
Class sim.TDLambda

java.lang.Object
   |
   +----sim.Experiment
           |
           +----sim.TDLambda

public class TDLambda
extends Experiment
Perform Temporal Difference learning, TD(lambda), with a given Markov Decision Process or Markov chain and a function approximator. If the MDP is a Markov chain, the exploration factor can be set to 0 to perform standard TD(lambda) prediction of the values of the states. Given a full MDP, the object implements TD(lambda) such that whenever the system takes an exploratory action, the eligibility trace is reset to 0. The object also has a decay factor for the exploration rate, so that one can explore extensively in the early stages of learning and reduce the exploration rate in later stages. The derivative calculations with respect to the inputs have not been fully implemented here.
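
A minimal sketch of the update this describes (assuming a linear value function V(x) = w . x; every name below is a hypothetical illustration, not the actual sim.TDLambda code):

    // Minimal TD(lambda) sketch with a linear value function V(x) = w . x.
    public class TDLambdaSketch {
        double[] w, e;          // weights and eligibility trace
        double gamma    = 0.9;  // discount factor
        double lambda   = 0.7;  // trace weighting factor
        double rate     = 0.01; // learning rate
        double explore  = 0.5;  // exploration rate
        double expDecay = 0.99; // exploration decay rate

        TDLambdaSketch(int n) { w = new double[n]; e = new double[n]; }

        double value(double[] x) {
            double v = 0.0;
            for (int i = 0; i < w.length; i++) v += w[i] * x[i];
            return v;
        }

        // One transition x -> xNext with reward r; explored flags whether
        // the action was exploratory, in which case the trace is reset to 0.
        void update(double[] x, double r, double[] xNext, boolean explored) {
            double delta = r + gamma * value(xNext) - value(x); // TD error
            if (explored) java.util.Arrays.fill(e, 0.0);        // reset trace
            for (int i = 0; i < w.length; i++) {
                e[i] = gamma * lambda * e[i] + x[i]; // dV/dw = x (linear V)
                w[i] += rate * delta * e[i];
            }
            explore *= expDecay; // explore heavily early, less in later stages
        }
    }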

This code is (c) 1996 Mance E. Harmon <harmonme@aa.wpafb.af.mil>, http://www.cs.cmu.edu/~baird/java
The source and object code may be redistributed freely. If the code is modified, please state so in the comments.

Version:
1.05, 17 June 97
Author:
Mance E. Harmon

Variable Index

 o action
An action possible in the MDP
 o dEdIn
gradient of mean squared error wrt inputs
 o dEdWeights
gradient of mean squared error wrt weights
 o dEdWeightsSum
gradient of mean squared error summed for all training examples
 o dEdWeightsV1
gradient of mean squared error wrt weights of maximum advantage in successor state
 o desiredOutputs
The correct output that the function approximator learns to give
 o dt
The time step size used in transitioning from state x(t) to x(t+1)
 o error
a noisy estimate of the error being gradient descended on
 o expDecay
The exploration decay rate.
 o explore
The exploration rate
 o function
the function approximator whose weights will be trained
 o gamma
The discount factor
 o hessian
Hessian of mean squared error wrt weights
 o incremental
The mode of learning: incremental or epoch-wise.
 o inputs
The input vector to the function approximator
 o lambda
The weighting factor for gradients.
 o logSmoothedError
log base 10 of the smoothed error
 o mdp
the mdp to control
 o oldState
A copy of the original state.
 o outputs
The output vector from the function approximator
 o random
The random number generator
 o rate
the learning rate, a small positive number
 o seed
the random number seed
 o smoothedError
an exponentially smoothed estimate of the error
 o smoothingFactor
the constant used to smooth the error (near 1 = long half-life)
 o state
The state of the MDP
 o tcounter
When doing epoch-wise training (not updating the weights until the end of a trajectory), this variable keeps track of the number of transitions.
 o time
current time (increments once per weight change)
 o tolerance
stop learning when smoothed error < tolerance
 o trace
The weighted average of the gradients.
 o valueKnown
A flag stating whether or not we know for certain the value of a state.
 o weights
all the weights in the function approximator as a column vector

Constructor Index

 o TDLambda()

Method Index

 o BNF(int)
Return the BNF description of how to parse the parameters of this object.
 o evaluate()
return the scalar output for the current dInput vector
 o findGradient()
update the fGradient vector based on the current fInput vector
 o findHessian()
update the fHessian vector based on the current fInput vector
 o getGradient()
The gradient of f(x) with respect to x (a column vector)
 o getHessian()
The Hessian of f(x) with respect to x (a square matrix)
 o getInput()
The input x sent to the function f(x) (a column vector)
 o initialize(int)
Initialize, either partially or completely.
 o parse(Parser, int)
Parse the input file to get the parameters for this object.
 o run()
This runs the simulation.
 o setWatchManager(WatchManager, String)
Register all variables with this WatchManager.
 o unparse(Unparser, int)
Output a description of this object that can be parsed with parse().

Variables

 o mdp
 protected MDP mdp
the mdp to control

 o function
 protected FunApp function
the function approximator whose weights will be trained

 o seed
 protected IntExp seed
the random number seed

 o weights
 protected MatrixD weights
all the weights in the function approximator as a column vector

 o dEdWeights
 protected MatrixD dEdWeights
gradient of mean squared error wrt weights

 o dEdWeightsSum
 protected MatrixD dEdWeightsSum
gradient of mean squared error summed for all training examples

 o dEdIn
 protected MatrixD dEdIn
gradient of mean squared error wrt inputs

 o dEdWeightsV1
 protected MatrixD dEdWeightsV1
gradient of mean squared error wrt weights of maximum advantage in successor state

 o trace
 protected MatrixD trace
The weighted average of the gradients. The weighting factor is lambda.

 o hessian
 protected MatrixD hessian
Hessian of mean squared error wrt weights

 o inputs
 protected MatrixD inputs
The input vector to the function approximator

 o outputs
 protected MatrixD outputs
The output vector from the function approximator

 o state
 protected MatrixD state
The state of the MDP

 o action
 protected MatrixD action
An action possible in the MDP

 o explore
 protected NumExp explore
The exploration rate

 o expDecay
 protected NumExp expDecay
The exploration decay rate. A value of 0.9 means a half-life of approximately 7, and a value of 0.99 means a half-life of approximately 70.
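(The half-life n satisfies expDecay^n = 1/2, so n = ln(1/2)/ln(expDecay); this gives n ≈ 6.6 for 0.9 and n ≈ 69 for 0.99, consistent with the approximations above.)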

 o gamma
 protected NumExp gamma
The discount factor

 o desiredOutputs
 protected MatrixD desiredOutputs
The correct output that the function approximator learns to give

 o dt
 protected NumExp dt
The time step size used in transitioning from state x(t) to x(t+1)

 o oldState
 protected MatrixD oldState
A copy of the original state.

 o incremental
 protected boolean incremental
The mode of learning: incremental or epoch-wise.

 o valueKnown
 protected PBoolean valueKnown
A flag stating whether or not we know for certain the value of a state.

 o lambda
 protected NumExp lambda
The weighting factor for gradients.

 o random
 protected Random random
The random number generator

 o error
 protected PDouble error
a noisy estimate of the error being gradient descended on

 o smoothedError
 protected PDouble smoothedError
an exponentially smoothed estimate of the error

 o smoothingFactor
 protected NumExp smoothingFactor
the constant used to smooth the error (near 1 = long half-life)
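A minimal sketch of the smoothing this suggests (hypothetical names, not the actual update in this class):

    // Exponential smoothing of the noisy error estimate; a smoothingFactor
    // near 1 changes smoothedError slowly (a long half-life).
    static double smooth(double smoothedError, double error,
                         double smoothingFactor) {
        return smoothingFactor * smoothedError
             + (1.0 - smoothingFactor) * error;
    }
    // logSmoothedError would then be Math.log(smoothed) / Math.log(10.0).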

 o tolerance
 protected NumExp tolerance
stop learning when smoothed error < tolerance

 o logSmoothedError
 protected PDouble logSmoothedError
log base 10 of the smoothed error

 o time
 protected PInt time
current time (increments once per weight change)

 o rate
 protected NumExp rate
the learning rate, a small positive number

 o tcounter
 protected int tcounter
When doing epoch-wise training (not updating the weights until the end of a trajectory), this variable keeps track of the number of transitions.
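A rough sketch of the two modes (hypothetical names; only the sum-the-gradients-until-the-end-of-a-trajectory behavior comes from the descriptions above):

    // Hypothetical sketch: incremental mode updates the weights on every
    // transition; epoch-wise mode sums gradients in dEdWeightsSum and
    // applies one update at the end of the trajectory.
    int tcounter = 0;

    void afterTransition(double[] weights, double[] dEdWeights,
                         double[] dEdWeightsSum, boolean incremental,
                         boolean endOfTrajectory, double rate) {
        if (incremental) {
            for (int i = 0; i < weights.length; i++)
                weights[i] -= rate * dEdWeights[i];        // update each step
        } else {
            for (int i = 0; i < weights.length; i++)
                dEdWeightsSum[i] += dEdWeights[i];         // sum over trajectory
            tcounter++;                                    // count transitions
            if (endOfTrajectory) {
                for (int i = 0; i < weights.length; i++) {
                    weights[i] -= rate * dEdWeightsSum[i]; // one update per epoch
                    dEdWeightsSum[i] = 0.0;
                }
                tcounter = 0;
            }
        }
    }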

Constructors

 o TDLambda
 public TDLambda()

Methods

 o setWatchManager
 public void setWatchManager(WatchManager wm,
                             String name)
Register all variables with this WatchManager. This will be called after all parsing is done. Overriding implementations of setWatchManager should call the same method on all the other objects in the experiment.

Overrides:
setWatchManager in class Experiment
 o BNF
 public String BNF(int lang)
Return the BNF description of how to parse the parameters of this object.

Overrides:
BNF in class Experiment
 o unparse
 public void unparse(Unparser u,
                     int lang)
Output a description of this object that can be parsed with parse().

Overrides:
unparse in class Experiment
See Also:
Parsable
 o parse
 public Object parse(Parser p,
                     int lang) throws ParserException
Parse the input file to get the parameters for this object.

Throws: ParserException
parser didn't find the required token
Overrides:
parse in class Experiment
 o run
 public void run()
This runs the simulation. The function returns when the simulation is completely done. As the simulation is running, it should call the watchManager.update() function periodically so all the display windows can be updated.

Overrides:
run in class Experiment
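
A minimal outline of this loop (a sketch; only the periodic watchManager.update() call, the stop-when-smoothed-error-falls-below-tolerance rule, and the once-per-weight-change time increment come from this documentation, and all names are hypothetical):

    // Hypothetical outline of the simulation loop in run().
    public class RunLoopSketch {
        double smoothedError = 1.0, tolerance = 1e-4;
        int time = 0;

        void step()           { smoothedError *= 0.999; } // stand-in for one
                                                          // TD(lambda) transition
        void updateWatchers() { }  // stand-in for watchManager.update()

        public void run() {
            while (smoothedError >= tolerance) { // stop when error < tolerance
                step();
                time++;                          // once per weight change
                updateWatchers();                // keep display windows current
            }
        }
    }
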
 o getInput
 public MatrixD getInput()
The input x sent to the function f(x) (a column vector)

 o getGradient
 public MatrixD getGradient()
The gradient of f(x) with respect to x (a column vector)

 o evaluate
 public double evaluate()
return the scalar output for the current dInput vector

 o findGradient
 public void findGradient()
update the fGradient vector based on the current fInput vector

 o getHessian
 public MatrixD getHessian()
The Hessian of f(x) with respect to x (a square matrix)

 o findHessian
 public void findHessian()
update the fHessian vector based on the current fInput vector

 o initialize
 public void initialize(int level)
Initialize, either partially or completely.

Overrides:
initialize in class Experiment
See Also:
initialize
