Class sim.errFun.AdvantageLearning
java.lang.Object
   |
   +----sim.errFun.ErrFun
           |
           +----sim.errFun.RLErrFun
                   |
                   +----sim.errFun.AdvantageLearning
- public class AdvantageLearning
- extends RLErrFun
Performs advantage learning with a given Markov decision
process (MDP), function approximator, and gradient-descent algorithm. The derivative
calculations with respect to the inputs are not fully implemented here. This code
works with both stochastic and deterministic systems.
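As a rough illustration of the quantity an advantage-learning error function is built around, the sketch below computes the residual for one transition using the update published for advantage learning: target = max A(x,u') + (R + gamma^dt * max A(x',u') - max A(x,u')) / (dt * k). This is not taken from the sim.errFun source. The class name, method names, and parameters (reward, gamma, dt) are assumptions made for the example; only the scaling factor k corresponds to a field documented on this page.

```java
// Hypothetical sketch; not part of sim.errFun. All names here are illustrative.
class AdvantageResidualSketch {

    /** Largest entry of a non-empty array (the max over actions). */
    static double max(double[] values) {
        double best = values[0];
        for (double v : values) {
            best = Math.max(best, v);
        }
        return best;
    }

    /**
     * Residual e = A(x,u) - target, where
     * target = maxA0 + (reward + gamma^dt * maxA1 - maxA0) / (dt * k).
     *
     * a      advantage of the action actually taken in the original state
     * maxA0  maximum advantage over actions in the original state
     * maxA1  maximum advantage over actions in the successor state
     */
    static double residual(double a, double maxA0, double maxA1,
                           double reward, double gamma, double dt, double k) {
        double target = maxA0
                + (reward + Math.pow(gamma, dt) * maxA1 - maxA0) / (dt * k);
        return a - target;
    }

    public static void main(String[] args) {
        double[] advantagesInState = {0.5, 1.0};      // A(x, u) for each action u
        double[] advantagesInSuccessor = {2.0, -1.0}; // A(x', u') for each action u'
        double e = residual(advantagesInState[0],
                max(advantagesInState), max(advantagesInSuccessor),
                1.0, 0.9, 1.0, 0.5);
        System.out.println("residual = " + e);
    }
}
```

With k = 1 and dt = 1 the target reduces to the familiar Q-learning target, which is one way to sanity-check the formula.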
This code is (c) 1996 Mance E. Harmon
<harmonme@aa.wpafb.af.mil>,
http://www.cs.cmu.edu/~baird/java
The source and object code may be redistributed freely.
If the code is modified, please state so in the comments.
- Version:
- 1.03, 21 July 97
- Author:
- Mance E. Harmon
-
dEdWeightsA
- gradient of mean squared error wrt weights of advantage
-
dEdWeightsA0
- gradient of mean squared error wrt weights of maximum advantage in original state
-
dEdWeightsA1
- gradient of mean squared error wrt weights of maximum advantage in successor state
-
k
- The scaling factor used in the advantage learning algorithm
-
oldAction
- A copy of the original action.
-
oldState
- A copy of the original state.
-
rnd
- The random number generator that will be used for this object.
-
AdvantageLearning()
-
-
BNF(int)
- Return the BNF description of how to parse the parameters of this object.
-
evaluate(Random, boolean, boolean, boolean)
- Return the scalar output for the current dInput vector.
-
findGradient()
- Update the fGradient vector based on the current fInput vector.
-
initVects(MDP, RLErrFun)
- Create inputs, state, and action vectors.
-
parse(Parser, int)
- Parse the input file to get the parameters for this object.
-
setWatchManager(WatchManager, String)
- Register all variables with this WatchManager.
-
unparse(Unparser, int)
- Output a description of this object that can be parsed with parse().
dEdWeightsA1
protected MatrixD dEdWeightsA1
- gradient of mean squared error wrt weights of maximum advantage in successor state
dEdWeightsA0
protected MatrixD dEdWeightsA0
- gradient of mean squared error wrt weights of maximum advantage in original state
dEdWeightsA
protected MatrixD dEdWeightsA
- gradient of mean squared error wrt weights of advantage
k
protected NumExp k
- The scaling factor used in the advantage learning algorithm
oldState
protected MatrixD oldState
- A copy of the original state.
oldAction
protected MatrixD oldAction
- A copy of the original action.
rnd
protected Random rnd
- The random number generator that will be used for this object. This is a copy of the generator passed to evaluate().
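One plausible reading of the three gradient fields and k, not confirmed by this page, is that they hold the weight gradients of the three advantage terms in the residual, which are then combined into a single error gradient. The sketch below shows that combination for the squared residual 0.5 * e^2, treating the maximizing actions as fixed. The class, the method, and the parameters e, gamma, and dt are assumptions for illustration, and plain double[] arrays stand in for MatrixD.

```java
// Hypothetical sketch; not part of sim.errFun. Arrays stand in for MatrixD.
class AdvantageGradientSketch {

    /**
     * Gradient of 0.5 * e^2 with respect to the weights, where
     * e = A - [ (1 - 1/(dt*k)) * A0 + (reward + gamma^dt * A1) / (dt*k) ].
     *
     * gradA   gradient of A(x,u) with respect to the weights
     * gradA0  gradient of the maximum advantage in the original state
     * gradA1  gradient of the maximum advantage in the successor state
     */
    static double[] combine(double e, double[] gradA, double[] gradA0,
                            double[] gradA1, double gamma, double dt, double k) {
        double inv = 1.0 / (dt * k);           // 1 / (dt * k)
        double c0 = 1.0 - inv;                 // coefficient on A0 in the target
        double c1 = Math.pow(gamma, dt) * inv; // coefficient on A1 in the target
        double[] grad = new double[gradA.length];
        for (int i = 0; i < grad.length; i++) {
            grad[i] = e * (gradA[i] - c0 * gradA0[i] - c1 * gradA1[i]);
        }
        return grad;
    }

    public static void main(String[] args) {
        double[] g = combine(2.0,
                new double[] {1.0, 0.0},
                new double[] {0.0, 1.0},
                new double[] {0.5, 0.25},
                0.5, 1.0, 0.5);
        System.out.println(java.util.Arrays.toString(g));
    }
}
```

A gradient-descent step would then move the weights against this vector; whether the class uses this residual-gradient form or a different weighting is not stated here.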
AdvantageLearning
public AdvantageLearning()
setWatchManager
public void setWatchManager(WatchManager wm,
String name)
- Register all variables with this WatchManager.
This is called after all parsing is done.
Subclasses should override setWatchManager so that it
calls the same method on all the other objects in the experiment.
- Overrides:
- setWatchManager in class ErrFun
BNF
public String BNF(int lang)
- Return the BNF description of how to parse the parameters of this object.
- Overrides:
- BNF in class ErrFun
unparse
public void unparse(Unparser u,
int lang)
- Output a description of this object that can be parsed with parse().
- Overrides:
- unparse in class ErrFun
- See Also:
- Parsable
parse
public Object parse(Parser p,
int lang) throws ParserException
- Parse the input file to get the parameters for this object.
- Throws: ParserException
- if the parser does not find the required token
- Overrides:
- parse in class ErrFun
initVects
public void initVects(MDP mdp,
RLErrFun rl)
- Create inputs, state, and action vectors. Also, create any vectors that might be specific to this module.
- Overrides:
- initVects in class RLErrFun
evaluate
public double evaluate(Random rnd,
boolean willFindDeriv,
boolean willFindHess,
boolean rememberNoise)
- Return the scalar output for the current dInput vector.
- Overrides:
- evaluate in class ErrFun
findGradient
public void findGradient()
- Update the fGradient vector based on the current fInput vector.
- Overrides:
- findGradient in class ErrFun