
Class sim.errFun.ReinforcementLearning

java.lang.Object
   |
   +----sim.errFun.ErrFun
           |
           +----sim.errFun.RLErrFun
                   |
                   +----sim.errFun.ReinforcementLearning

public class ReinforcementLearning
extends RLErrFun
Defines a reinforcement learning experiment. The parameters passed to this object specify the exploration policy, the funApp parameter update policy (incremental or epoch), the mdp, the function approximator, and the reinforcement learning algorithm to use.

NOTES: Use caution when training on trajectories with a low exploration factor: trajectories can become very long, which can make the system appear hung. This code handles the exploration factor in all cases except when 'statesOnly' is true while training on trajectories. In that case, the rlAlgorithm must use the class variable 'exploration' passed into this object to implement the exploration strategy. Once a policy has been learned, it can be observed by setting incremental=true, trajectories=true, and exploration=0.
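The role of the exploration factor can be illustrated with a minimal epsilon-greedy sketch. It assumes, which this page does not confirm, that 'exploration' is the probability of taking a uniformly random action. The names ExplorationSketch and chooseAction are hypothetical and are not part of sim.errFun.

```java
import java.util.Random;

public class ExplorationSketch {
    /**
     * Pick an action index from estimated action values.
     * With probability `exploration` a uniformly random action is returned;
     * otherwise the action with the highest value is returned.
     * (Assumed meaning of the exploration factor, for illustration only.)
     */
    public static int chooseAction(double[] values, double exploration, Random rnd) {
        if (rnd.nextDouble() < exploration) {
            return rnd.nextInt(values.length);
        }
        int best = 0;
        for (int a = 1; a < values.length; a++) {
            if (values[a] > values[best]) {
                best = a;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] q = {0.1, 0.7, 0.3};
        // exploration = 0 never takes the random branch, so the choice is greedy
        System.out.println(chooseAction(q, 0.0, new Random(1))); // prints 1
    }
}
```

Under this assumption, exploration=0 makes every choice greedy, which is consistent with using that setting to observe a learned policy.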

This code is (c) 1997 Mance E. Harmon <harmonme@aa.wpafb.af.mil>, http://www.cs.cmu.edu/~baird
The source and object code may be redistributed freely. If the code is modified, please state so in the comments.

Version:
1.02, 26 June 97
Author:
Mance E. Harmon

Variable Index

 o adaptivePhi
Flag set to true if phi is adaptive.
 o batchIndex
Index of the batch element currently being processed.
 o directVector
The gradient associated with the direct method update.
 o gradient
The average of the gradients from all numberOfStates calls to RLErrFun.
 o numberOfStates
Used to cache the number of states in the given mdp
 o resGradVector
The gradient associated with the residual gradient method update.
 o rlAlgorithm
The RL algorithm to use
 o rlAlgorithmGradient
The gradient from a single call to rlAlgorithm.
 o rnd
A copy of the random number generator passed into evaluate()
 o saPairs
The number of state/action pairs in the mdp for a given dt (assuming statesOnly=false).

Constructor Index

 o ReinforcementLearning()

Method Index

 o BNF(int)
Return the BNF description of how to parse the parameters of this object.
 o calcPhi(boolean, Random)
 o evaluate(Random, boolean, boolean, boolean)
Return the scalar output for the current dInput vector.
 o findGradient()
Update the gradient vector based on the current fInput vector.
 o getGradient()
The gradient of f(x) with respect to x (a column vector)
 o initialize(int)
Initialize, either partially or completely.
 o initVects(MDP, RLErrFun)
Used to initialize the inputs, state, and action vectors in all RL algorithm objects (not ReinforcementLearning).
 o parse(Parser, int)
Parse the input file to get the parameters for this object.
 o setWatchManager(WatchManager, String)
Register all variables with this WatchManager.
 o unparse(Unparser, int)
Output a description of this object that can be parsed with parse().

Variables

 o rlAlgorithm
 protected RLErrFun rlAlgorithm
The RL algorithm to use

 o numberOfStates
 protected int numberOfStates
Used to cache the number of states in the given mdp

 o batchIndex
 protected PInt batchIndex
Index of the batch element currently being processed.

 o rlAlgorithmGradient
 protected MatrixD rlAlgorithmGradient
The gradient from a single call to rlAlgorithm.

 o gradient
 protected MatrixD gradient
The average of the gradients from all numberOfStates calls to RLErrFun.
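
The averaging described for 'gradient' can be sketched as follows. Plain double[] arrays stand in for MatrixD, and averageGradients is a hypothetical helper. How the class actually accumulates the per-call gradients is not shown on this page.

```java
public class GradientAverageSketch {
    /** Element-wise mean of per-call gradients; every row must have the same length. */
    public static double[] averageGradients(double[][] perCall) {
        if (perCall.length == 0) {
            throw new IllegalArgumentException("need at least one gradient");
        }
        double[] avg = new double[perCall[0].length];
        for (double[] g : perCall) {
            for (int i = 0; i < avg.length; i++) {
                avg[i] += g[i]; // accumulate the sum over all calls
            }
        }
        for (int i = 0; i < avg.length; i++) {
            avg[i] /= perCall.length; // divide by the number of calls
        }
        return avg;
    }
}
```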

 o saPairs
 protected int saPairs
The number of state/action pairs in the mdp for a given dt (assuming statesOnly=false).

 o directVector
 protected MatrixD directVector
The gradient associated with the direct method update. Used in calculating an adaptive phi.

 o resGradVector
 protected MatrixD resGradVector
The gradient associated with the residual gradient method update. Used in calculating an adaptive phi.

 o adaptivePhi
 protected boolean adaptivePhi
Flag set to true if phi is adaptive.
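
For orientation, the sketch below shows one way a mixing weight phi can be derived from a direct-method vector and a residual-gradient vector, following a rule published for residual algorithms: choose phi so that the blended update is not opposed to the residual gradient update. Whether calcPhi uses this rule is an assumption. The constant MU, the fallback to 0, and all names in the block are hypothetical, and double[] arrays stand in for MatrixD.

```java
public class AdaptivePhiSketch {
    /** Hypothetical small margin added to phi. */
    static final double MU = 0.1;

    public static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            s += a[i] * b[i];
        }
        return s;
    }

    /**
     * phi = (d . r) / ((d - r) . r) + MU, where d is the direct vector and
     * r the residual-gradient vector. Falls back to 0 (pure direct update)
     * when the denominator is zero or the ratio lies outside [0, 1].
     */
    public static double phi(double[] direct, double[] resGrad) {
        double num = dot(direct, resGrad);
        double den = 0.0;
        for (int i = 0; i < direct.length; i++) {
            den += (direct[i] - resGrad[i]) * resGrad[i];
        }
        if (den == 0.0) {
            return 0.0;
        }
        double raw = num / den;
        if (raw < 0.0 || raw > 1.0) {
            return 0.0;
        }
        return Math.min(1.0, raw + MU);
    }

    /** Blended vector (1 - phi) * direct + phi * resGrad. */
    public static double[] blend(double[] direct, double[] resGrad, double phi) {
        double[] out = new double[direct.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = (1.0 - phi) * direct[i] + phi * resGrad[i];
        }
        return out;
    }
}
```

In this sketch the ratio falls in [0, 1] only when the two vectors have a non-positive dot product, which is the case where blending matters. Otherwise the direct vector is used unchanged.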

 o rnd
 protected Random rnd
A copy of the random number generator passed into evaluate()

Constructors

 o ReinforcementLearning
 public ReinforcementLearning()

Methods

 o setWatchManager
 public void setWatchManager(WatchManager wm,
                             String name)
Register all variables with this WatchManager. This is called after all parsing is done. Each override of setWatchManager should call the same method on all the other objects in the experiment.

Overrides:
setWatchManager in class ErrFun
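
The forwarding pattern described for setWatchManager can be sketched as below. Registry, Watchable, Leaf, Parent, and the "/"-separated naming scheme are hypothetical stand-ins, since the real WatchManager interface is not shown on this page.

```java
import java.util.ArrayList;
import java.util.List;

public class WatchForwardingSketch {
    /** Hypothetical stand-in for a watch manager: it only records registered names. */
    public static class Registry {
        public final List<String> names = new ArrayList<>();

        public void register(String name) {
            names.add(name);
        }
    }

    /** Hypothetical stand-in for any object in the experiment. */
    public interface Watchable {
        void setWatchManager(Registry wm, String name);
    }

    public static class Leaf implements Watchable {
        @Override
        public void setWatchManager(Registry wm, String name) {
            wm.register(name + "value");
        }
    }

    public static class Parent implements Watchable {
        private final Watchable child = new Leaf();

        @Override
        public void setWatchManager(Registry wm, String name) {
            wm.register(name + "batchIndex");           // register this object's own variables
            child.setWatchManager(wm, name + "child/"); // then forward to the objects it owns
        }
    }

    /** Registers a Parent and returns the names that were recorded, in order. */
    public static List<String> demo() {
        Registry wm = new Registry();
        new Parent().setWatchManager(wm, "/exp/");
        return wm.names;
    }
}
```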
 o BNF
 public String BNF(int lang)
Return the BNF description of how to parse the parameters of this object.

Overrides:
BNF in class ErrFun
 o unparse
 public void unparse(Unparser u,
                     int lang)
Output a description of this object that can be parsed with parse().

Overrides:
unparse in class ErrFun
See Also:
Parsable
 o parse
 public Object parse(Parser p,
                     int lang) throws ParserException
Parse the input file to get the parameters for this object.

Throws: ParserException
parser didn't find the required token
Overrides:
parse in class ErrFun
 o evaluate
 public double evaluate(Random rnd,
                        boolean willFindDeriv,
                        boolean willFindHess,
                        boolean rememberNoise)
Return the scalar output for the current dInput vector.

Overrides:
evaluate in class ErrFun
 o findGradient
 public void findGradient()
Update the gradient vector based on the current fInput vector. Assumes that evaluate() was already called on this vector.

Overrides:
findGradient in class ErrFun
 o getGradient
 public MatrixD getGradient()
The gradient of f(x) with respect to x (a column vector)

Overrides:
getGradient in class ErrFun
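
The call order implied by evaluate(), findGradient(), and getGradient() can be sketched with a toy error function f(x) = x^2. The Quadratic class and its setInput method are hypothetical, and the explicit state check is an addition for illustration: the documented findGradient() only assumes that evaluate() was called first.

```java
public class CallOrderSketch {
    /** Toy error function f(x) = x^2 that follows the same three-step protocol. */
    public static class Quadratic {
        private double x;
        private double gradient;
        private boolean evaluated = false;

        public void setInput(double x) {
            this.x = x;
            this.evaluated = false; // a new input invalidates the previous evaluation
        }

        /** Step 1: compute the scalar output for the current input. */
        public double evaluate() {
            evaluated = true;
            return x * x;
        }

        /** Step 2: compute the gradient; requires a prior evaluate() on this input. */
        public void findGradient() {
            if (!evaluated) {
                throw new IllegalStateException("call evaluate() before findGradient()");
            }
            gradient = 2.0 * x;
        }

        /** Step 3: read the gradient computed by findGradient(). */
        public double getGradient() {
            return gradient;
        }
    }
}
```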
 o initVects
 public void initVects(MDP mpd,
                       RLErrFun rl)
Used to initialize the inputs, state, and action vectors in all RL algorithm objects (not ReinforcementLearning).

Overrides:
initVects in class RLErrFun
 o calcPhi
 public void calcPhi(boolean batch,
                     Random rnd)
 o initialize
 public void initialize(int level)
Initialize, either partially or completely.

Overrides:
initialize in class ErrFun
See Also:
initialize
