Class sim.TDLambda
java.lang.Object
|
+----sim.Experiment
|
+----sim.TDLambda
- public class TDLambda
- extends Experiment
Performs temporal-difference learning, TD(lambda), with a given Markov Decision
Process or Markov chain and a function approximator. If the MDP is a Markov chain, one
can set the exploration factor to 0 and perform standard TD(lambda) for predicting the
values of the states. Given a full MDP, the object implements TD(lambda) such that the
eligibility trace is reset to 0 whenever the system takes an exploratory action. This
object has a decay factor for the exploration rate, so that one can explore extensively
in the early stages of learning and reduce the exploration rate in later stages. The
derivative calculations with respect to the inputs have not been fully implemented here.
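The update described above can be sketched in a few lines. This is a minimal illustration of TD(lambda) value prediction with an exploration-triggered trace reset, not the class's actual implementation: the five-state toy chain, tabular (one-weight-per-state) features, the constants, and the `learn` method name are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Random;

/** Minimal sketch of TD(lambda) value prediction with tabular features
 *  and a trace that is zeroed whenever the system explores. */
public class TDLambdaSketch {
    static double[] learn(long seed) {
        double gamma = 0.9, lambda = 0.7, rate = 0.1, explore = 0.2;
        int n = 5;
        double[] weights = new double[n];  // one weight per state (tabular)
        double[] trace = new double[n];    // eligibility trace
        Random random = new Random(seed);
        int state = 0;
        for (int t = 0; t < 2000; t++) {
            boolean explored = random.nextDouble() < explore;
            int next = explored ? random.nextInt(n) : (state + 1) % n;
            double reward = (next == 0) ? 1.0 : 0.0;

            // TD error: r + gamma*V(x') - V(x)
            double delta = reward + gamma * weights[next] - weights[state];

            // Decay the trace, add the current gradient, apply the update.
            for (int i = 0; i < n; i++) trace[i] *= gamma * lambda;
            trace[state] += 1.0;  // dV/dw for tabular features
            for (int i = 0; i < n; i++) weights[i] += rate * delta * trace[i];

            // On an exploratory step the trace is reset to 0.
            if (explored) Arrays.fill(trace, 0.0);
            state = next;
        }
        return weights;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(learn(0)));
    }
}
```

Because the reward arrives on the transition into state 0, the learned value of state 4 ends up largest; the trace reset keeps the TD error of an exploratory jump from being credited to states visited before it.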
This code is (c) 1996 Mance E. Harmon
<harmonme@aa.wpafb.af.mil>,
http://www.cs.cmu.edu/~baird/java
The source and object code may be redistributed freely.
If the code is modified, please state so in the comments.
- Version:
- 1.05, 17 June 97
- Author:
- Mance E. Harmon
-
action
- An action possible in the MDP
-
dEdIn
- gradient of mean squared error wrt inputs
-
dEdWeights
- gradient of mean squared error wrt weights
-
dEdWeightsSum
- gradient of mean squared error summed for all training examples
-
dEdWeightsV1
- gradient of mean squared error wrt weights of maximum advantage in successor state
-
desiredOutputs
- The correct output that the function approximator learns to give
-
dt
- The time step size used in transitioning from state x(t) to x(t+1)
-
error
- a noisy estimate of the error being gradient descended on
-
expDecay
- The exploration decay rate.
-
explore
- The exploration rate
-
function
- the function approximator whose weights will be trained
-
gamma
- The discount factor
-
hessian
- hessian of mean squared error wrt weights
-
incremental
- The mode of learning: incremental or epoch-wise.
-
inputs
- The input vector to the function approximator
-
lambda
- The weighting factor for gradients.
-
logSmoothedError
- log base 10 of the smoothed error
-
mdp
- the mdp to control
-
oldState
- A copy of the original state.
-
outputs
- The output vector from the function approximator
-
random
- The random number generator
-
rate
- the learning rate, a small positive number
-
seed
- the random number seed
-
smoothedError
- an exponentially smoothed estimate of the error
-
smoothingFactor
- the constant used to smooth the error (near 1 = long half-life)
-
state
- The state of the MDP
-
tcounter
- When doing epoch-wise training (not updating the weights until the end of a trajectory),
this variable keeps track of the number of transitions.
-
time
- current time (increments once per weight change)
-
tolerance
- stop learning when smoothed error < tolerance
-
trace
- The weighted average of the gradients.
-
valueKnown
- A flag stating whether or not we know for certain the value of a state.
-
weights
- all the weights in the function approximator as a column vector
-
TDLambda()
-
-
BNF(int)
- Return the BNF description of how to parse the parameters of this object.
-
evaluate()
- return the scalar output for the current dInput vector
-
findGradient()
- update the fGradient vector based on the current fInput vector
-
findHessian()
- update the fHessian vector based on the current fInput vector
-
getGradient()
- The gradient of f(x) with respect to x (a column vector)
-
getHessian()
- The hessian of f(x) with respect to x (a square matrix)
-
getInput()
- The input x sent to the function f(x) (a column vector)
-
initialize(int)
- Initialize, either partially or completely.
-
parse(Parser, int)
- Parse the input file to get the parameters for this object.
-
run()
- This runs the simulation.
-
setWatchManager(WatchManager, String)
- Register all variables with this WatchManager.
-
unparse(Unparser, int)
- Output a description of this object that can be parsed with parse().
mdp
protected MDP mdp
- the mdp to control
function
protected FunApp function
- the function approximator whose weights will be trained
seed
protected IntExp seed
- the random number seed
weights
protected MatrixD weights
- all the weights in the function approximator as a column vector
dEdWeights
protected MatrixD dEdWeights
- gradient of mean squared error wrt weights
dEdWeightsSum
protected MatrixD dEdWeightsSum
- gradient of mean squared error summed for all training examples
dEdIn
protected MatrixD dEdIn
- gradient of mean squared error wrt inputs
dEdWeightsV1
protected MatrixD dEdWeightsV1
- gradient of mean squared error wrt weights of maximum advantage in successor state
trace
protected MatrixD trace
- The weighted average of the gradients. The weighting factor is lambda.
hessian
protected MatrixD hessian
- hessian of mean squared error wrt weights
inputs
protected MatrixD inputs
- The input vector to the function approximator
outputs
protected MatrixD outputs
- The output vector from the function approximator
state
protected MatrixD state
- The state of the MDP
action
protected MatrixD action
- An action possible in the MDP
explore
protected NumExp explore
- The exploration rate
expDecay
protected NumExp expDecay
- The exploration decay rate. A value of 0.9 means a half-life of approximately 7, and
a value of 0.99 means a half-life of approximately 70.
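The half-life figures quoted above follow directly from a per-step decay of the exploration rate by expDecay: solve d^n = 0.5 for n. A quick check (the method name is just for this illustration):

```java
public class HalfLife {
    /** Number of decay steps until the exploration rate halves,
     *  for a per-step decay factor d: n = log(0.5) / log(d). */
    static double halfLife(double d) {
        return Math.log(0.5) / Math.log(d);
    }

    public static void main(String[] args) {
        System.out.println(halfLife(0.9));   // roughly 7 steps
        System.out.println(halfLife(0.99));  // roughly 69 steps
    }
}
```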
gamma
protected NumExp gamma
- The discount factor
desiredOutputs
protected MatrixD desiredOutputs
- The correct output that the function approximator learns to give
dt
protected NumExp dt
- The time step size used in transitioning from state x(t) to x(t+1)
oldState
protected MatrixD oldState
- A copy of the original state.
incremental
protected boolean incremental
- The mode of learning: incremental or epoch-wise.
valueKnown
protected PBoolean valueKnown
- A flag stating whether or not we know for certain the value of a state.
lambda
protected NumExp lambda
- The weighting factor for gradients.
random
protected Random random
- The random number generator
error
protected PDouble error
- a noisy estimate of the error being gradient descended on
smoothedError
protected PDouble smoothedError
- an exponentially smoothed estimate of the error
smoothingFactor
protected NumExp smoothingFactor
- the constant used to smooth the error (near 1 = long half-life)
tolerance
protected NumExp tolerance
- stop learning when smoothed error < tolerance
logSmoothedError
protected PDouble logSmoothedError
- log base 10 of the smoothed error
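Taken together, error, smoothingFactor, smoothedError, logSmoothedError, and tolerance describe a standard exponential smoothing of the noisy per-step error with a stopping test. A sketch of the presumed recurrence, with the watched fields simplified to plain doubles and a zero stand-in error so the decay is visible:

```java
public class ErrorSmoothing {
    /** Exponentially smooth a noisy error signal. With a constant zero
     *  true error, returns how many steps until the smoothed estimate
     *  first drops below the stopping tolerance. */
    static int stepsToTolerance(double smoothingFactor, double tolerance) {
        double smoothedError = 1.0;      // illustrative starting estimate
        int time = 0;
        while (smoothedError >= tolerance) {
            double error = 0.0;          // stand-in for the per-step error
            smoothedError = smoothingFactor * smoothedError
                          + (1.0 - smoothingFactor) * error;
            time++;                      // increments once per weight change
        }
        return time;
    }

    public static void main(String[] args) {
        int t = stepsToTolerance(0.99, 1e-3);
        System.out.println("stop at time " + t
            + ", log10 smoothed error below " + Math.log10(1e-3));
    }
}
```

With smoothingFactor near 1 the estimate has a long half-life, so learning is not stopped by a single lucky low-error step.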
time
protected PInt time
- current time (increments once per weight change)
rate
protected NumExp rate
- the learning rate, a small positive number
tcounter
protected int tcounter
- When doing epoch-wise training (not updating the weights until the end of a trajectory),
this variable keeps track of the number of transitions.
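The incremental/epoch-wise distinction, and the roles of tcounter and dEdWeightsSum, can be sketched as follows. This is not the class's code: the single weight, the fixed gradient sequence, and the `train` method are stand-ins for illustration.

```java
public class EpochWiseSketch {
    /** Apply per-transition gradients either immediately (incremental)
     *  or summed until the trajectory ends (epoch-wise). */
    static double train(boolean incremental, double[] gradients, double rate) {
        double weight = 0.0;          // single weight for illustration
        double dEdWeightsSum = 0.0;   // gradient summed over the trajectory
        int tcounter = 0;             // transitions seen this trajectory
        for (double dEdWeights : gradients) {
            if (incremental) {
                weight -= rate * dEdWeights;  // update every transition
            } else {
                dEdWeightsSum += dEdWeights;  // defer the update
                tcounter++;
            }
        }
        if (!incremental && tcounter > 0) {
            weight -= rate * dEdWeightsSum;   // one update per trajectory
        }
        return weight;
    }

    public static void main(String[] args) {
        double[] g = {0.5, -0.2, 0.1};
        System.out.println(train(true, g, 0.1));
        System.out.println(train(false, g, 0.1));
    }
}
```

In this linear single-weight toy the two modes land on the same final weight; with a real function approximator, updating mid-trajectory changes the gradients of later transitions, which is where the two modes genuinely differ.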
TDLambda
public TDLambda()
setWatchManager
public void setWatchManager(WatchManager wm,
String name)
- Register all variables with this WatchManager.
This will be called after all parsing is done.
setWatchManager should be overridden and forced to
call the same method on all the other objects in the experiment.
- Overrides:
- setWatchManager in class Experiment
BNF
public String BNF(int lang)
- Return the BNF description of how to parse the parameters of this object.
- Overrides:
- BNF in class Experiment
unparse
public void unparse(Unparser u,
int lang)
- Output a description of this object that can be parsed with parse().
- Overrides:
- unparse in class Experiment
- See Also:
- Parsable
parse
public Object parse(Parser p,
int lang) throws ParserException
- Parse the input file to get the parameters for this object.
- Throws: ParserException
- parser didn't find the required token
- Overrides:
- parse in class Experiment
run
public void run()
- This runs the simulation. The function returns when the simulation
is completely done. As the simulation is running, it should call
the watchManager.update() function periodically so all the display
windows can be updated.
- Overrides:
- run in class Experiment
getInput
public MatrixD getInput()
- The input x sent to the function f(x) (a column vector)
getGradient
public MatrixD getGradient()
- The gradient of f(x) with respect to x (a column vector)
evaluate
public double evaluate()
- return the scalar output for the current dInput vector
findGradient
public void findGradient()
- update the fGradient vector based on the current fInput vector
getHessian
public MatrixD getHessian()
- The hessian of f(x) with respect to x (a square matrix)
findHessian
public void findHessian()
- update the fHessian vector based on the current fInput vector
initialize
public void initialize(int level)
- Initialize, either partially or completely.
- Overrides:
- initialize in class Experiment
- See Also:
- initialize