Class sim.errFun.ReinforcementLearning
java.lang.Object
|
+----sim.errFun.ErrFun
|
+----sim.errFun.RLErrFun
|
+----sim.errFun.ReinforcementLearning
- public class ReinforcementLearning
- extends RLErrFun
Used to define a reinforcement learning experiment. The parameters passed to this class define
the exploration policy, the funApp parameter update policy (incremental or epoch), the mdp,
the function approximator, and the reinforcement learning algorithm to be used.
NOTES: Use caution when training on trajectories with a low exploration factor. This can lead to
very long trajectories that make the system appear hung.
The exploration factor is handled by this code in all cases except when 'statesOnly' is true while training on
trajectories. In that case, the rlAlgorithm must use the class variable 'exploration' passed into it to implement
the exploration strategy.
After a policy has been learned it can be observed by setting incremental=true, trajectories=true, and exploration=0.
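The effect of the exploration factor described above can be illustrated with a minimal, self-contained sketch (hypothetical names; this is not the actual API of this class): with probability `exploration` a random action is taken, otherwise the greedy action. Setting the factor to 0 yields pure greedy behavior, matching the "observe a learned policy" setting above.

```java
import java.util.Random;

public class ExplorationSketch {
    // Pick an action epsilon-greedily: with probability `exploration`
    // choose uniformly at random, otherwise choose the greedy action
    // (the index of the largest Q-value).
    public static int chooseAction(double[] qValues, double exploration, Random rnd) {
        if (rnd.nextDouble() < exploration) {
            return rnd.nextInt(qValues.length);        // explore
        }
        int best = 0;                                  // exploit
        for (int a = 1; a < qValues.length; a++) {
            if (qValues[a] > qValues[best]) best = a;
        }
        return best;
    }

    public static void main(String[] args) {
        double[] q = {0.1, 0.7, 0.3};
        // With exploration = 0 the greedy action (index 1) is always chosen.
        System.out.println(chooseAction(q, 0.0, new Random(42)));
    }
}
```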
This code is (c) 1997 Mance E. Harmon
<harmonme@aa.wpafb.af.mil>,
http://www.cs.cmu.edu/~baird
The source and object code may be redistributed freely.
If the code is modified, please state so in the comments.
- Version:
- 1.02, 26 June 97
- Author:
- Mance E. Harmon
-
adaptivePhi
- Flag set to true if phi is adaptive.
-
batchIndex
- which of the batch elements is currently being processed.
-
directVector
- The gradient associated with the direct method update.
-
gradient
- the average of the gradients from all numberOfStates calls to RLErrFun
-
numberOfStates
- Used to cache the number of states in the given mdp
-
resGradVector
- The gradient associated with the residual gradient method update.
-
rlAlgorithm
- The RL algorithm to use
-
rlAlgorithmGradient
- the gradient from a single call to rlAlgorithm
-
rnd
- A copy of the random number generator passed into evaluate()
-
saPairs
- The number of state/action pairs in the mdp for a given dt (assuming statesOnly=false).
-
ReinforcementLearning()
-
-
BNF(int)
- Return the BNF description of how to parse the parameters of this object.
-
calcPhi(boolean, Random)
-
-
evaluate(Random, boolean, boolean, boolean)
- return the scalar output for the current dInput vector
-
findGradient()
- update the gradient vector based on the current fInput vector.
-
getGradient()
- The gradient of f(x) with respect to x (a column vector)
-
initialize(int)
- Initialize, either partially or completely.
-
initVects(MDP, RLErrFun)
- Used to initialize the inputs, state, and action vectors in all RL algorithm objects (not ReinforcementLearning).
-
parse(Parser, int)
- Parse the input file to get the parameters for this object.
-
setWatchManager(WatchManager, String)
- Register all variables with this WatchManager.
-
unparse(Unparser, int)
- Output a description of this object that can be parsed with parse().
rlAlgorithm
protected RLErrFun rlAlgorithm
- The RL algorithm to use
numberOfStates
protected int numberOfStates
- Used to cache the number of states in the given mdp
batchIndex
protected PInt batchIndex
- which of the batch elements is currently being processed.
rlAlgorithmGradient
protected MatrixD rlAlgorithmGradient
- the gradient from a single call to rlAlgorithm
gradient
protected MatrixD gradient
- the average of the gradients from all numberOfStates calls to RLErrFun
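A rough sketch of the averaging described here (illustrative only; the class actually accumulates into a MatrixD): the overall gradient is the mean of the per-state gradients returned by the RL algorithm.

```java
public class GradientAverageSketch {
    // Average per-state gradient vectors into a single gradient,
    // as `gradient` does over the numberOfStates calls to the RL algorithm.
    public static double[] averageGradients(double[][] perState) {
        int n = perState.length, dim = perState[0].length;
        double[] avg = new double[dim];
        for (double[] g : perState)
            for (int i = 0; i < dim; i++)
                avg[i] += g[i] / n;
        return avg;
    }

    public static void main(String[] args) {
        double[][] grads = {{1.0, 2.0}, {3.0, 4.0}};
        double[] avg = averageGradients(grads);
        System.out.println(avg[0] + " " + avg[1]);  // 2.0 3.0
    }
}
```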
saPairs
protected int saPairs
- The number of state/action pairs in the mdp for a given dt (assuming statesOnly=false).
directVector
protected MatrixD directVector
- The gradient associated with the direct method update. Used in calculating an adaptive phi.
resGradVector
protected MatrixD resGradVector
- The gradient associated with the residual gradient method update. Used in calculating an adaptive phi.
adaptivePhi
protected boolean adaptivePhi
- Flag set to true if phi is adaptive.
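The directVector and resGradVector fields above feed a phi-weighted combination of the direct-method and residual-gradient updates, as in residual algorithms. A minimal sketch of that combination (phi = 0 gives the pure direct method, phi = 1 the pure residual-gradient method); the adaptive choice of phi used when adaptivePhi is true is not reproduced here:

```java
public class ResidualCombinationSketch {
    // Blend the direct-method update and the residual-gradient update
    // with weight phi. Intermediate phi values trade off the fast
    // convergence of the direct method against the stability of the
    // residual-gradient method.
    public static double[] residualUpdate(double[] direct, double[] resGrad, double phi) {
        double[] out = new double[direct.length];
        for (int i = 0; i < direct.length; i++)
            out[i] = (1.0 - phi) * direct[i] + phi * resGrad[i];
        return out;
    }

    public static void main(String[] args) {
        double[] d = {1.0, 0.0};
        double[] r = {0.0, 1.0};
        double[] u = residualUpdate(d, r, 0.25);
        System.out.println(u[0] + " " + u[1]);  // 0.75 0.25
    }
}
```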
rnd
protected Random rnd
- A copy of the random number generator passed into evaluate()
ReinforcementLearning
public ReinforcementLearning()
setWatchManager
public void setWatchManager(WatchManager wm,
String name)
- Register all variables with this WatchManager.
This will be called after all parsing is done.
setWatchManager should be overridden and forced to
call the same method on all the other objects in the experiment.
- Overrides:
- setWatchManager in class ErrFun
BNF
public String BNF(int lang)
- Return the BNF description of how to parse the parameters of this object.
- Overrides:
- BNF in class ErrFun
unparse
public void unparse(Unparser u,
int lang)
- Output a description of this object that can be parsed with parse().
- Overrides:
- unparse in class ErrFun
- See Also:
- Parsable
parse
public Object parse(Parser p,
int lang) throws ParserException
- Parse the input file to get the parameters for this object.
- Throws: ParserException
- parser didn't find the required token
- Overrides:
- parse in class ErrFun
evaluate
public double evaluate(Random rnd,
boolean willFindDeriv,
boolean willFindHess,
boolean rememberNoise)
- return the scalar output for the current dInput vector
- Overrides:
- evaluate in class ErrFun
findGradient
public void findGradient()
- update the gradient vector based on the current fInput vector.
Assumes that evaluate() was already called on this vector.
- Overrides:
- findGradient in class ErrFun
getGradient
public MatrixD getGradient()
- The gradient of f(x) with respect to x (a column vector)
- Overrides:
- getGradient in class ErrFun
initVects
public void initVects(MDP mdp,
RLErrFun rl)
- Used to initialize the inputs, state, and action vectors in all RL algorithm objects (not ReinforcementLearning).
- Overrides:
- initVects in class RLErrFun
calcPhi
public void calcPhi(boolean batch,
Random rnd)
initialize
public void initialize(int level)
- Initialize, either partially or completely.
- Overrides:
- initialize in class ErrFun
- See Also:
- initialize