
Class sim.errFun.AdvantageLearning

java.lang.Object
   |
   +----sim.errFun.ErrFun
           |
           +----sim.errFun.RLErrFun
                   |
                   +----sim.errFun.AdvantageLearning

public class AdvantageLearning
extends RLErrFun
Perform advantage learning with a given Markov decision process (MDP), function approximator, and gradient-descent algorithm. The derivative calculations with respect to the inputs have not been fully implemented here. This code works with both stochastic and deterministic systems.
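
For reference, advantage learning (Harmon and Baird) defines the optimal advantage A*(x,u) through a Bellman-style equation in which the scaling factor k (listed below) controls how far the advantages of suboptimal actions are pushed below that of the best action. The exact equation used by this class is not restated on this page; in its discrete-time form the target is usually written

    A^*(x,u) \;=\; \max_{u'} A^*(x,u') \;+\; \frac{1}{k}\Bigl[R(x,u) \,+\, \gamma \max_{u'} A^*(x',u') \,-\, \max_{u'} A^*(x,u')\Bigr]

with R the immediate reinforcement, \gamma the discount factor, and x' the successor state (continuous-time formulations fold a time step Δt into the scaling factor). The error minimized by gradient descent is then the mean squared residual e of this equation, which matches the "mean squared error" referenced by the gradient variables below.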

This code is (c) 1996 Mance E. Harmon <harmonme@aa.wpafb.af.mil>, http://www.cs.cmu.edu/~baird/java
The source and object code may be redistributed freely. If the code is modified, please state so in the comments.

Version:
1.03, 21 July 97
Author:
Mance E. Harmon

Variable Index

 o dEdWeightsA
Gradient of the mean squared error with respect to the weights of the advantage.
 o dEdWeightsA0
Gradient of the mean squared error with respect to the weights of the maximum advantage in the original state.
 o dEdWeightsA1
Gradient of the mean squared error with respect to the weights of the maximum advantage in the successor state.
 o k
The scaling factor used in the advantage learning algorithm.
 o oldAction
A copy of the original action.
 o oldState
A copy of the original state.
 o rnd
The random number generator that will be used for this object.

Constructor Index

 o AdvantageLearning()

Method Index

 o BNF(int)
Return the BNF description of how to parse the parameters of this object.
 o evaluate(Random, boolean, boolean, boolean)
Return the scalar output for the current dInput vector.
 o findGradient()
Update the fGradient vector based on the current fInput vector.
 o initVects(MDP, RLErrFun)
Create inputs, state, and action vectors.
 o parse(Parser, int)
Parse the input file to get the parameters for this object.
 o setWatchManager(WatchManager, String)
Register all variables with this WatchManager.
 o unparse(Unparser, int)
Output a description of this object that can be parsed with parse().

Variables

 o dEdWeightsA1
 protected MatrixD dEdWeightsA1
Gradient of the mean squared error with respect to the weights of the maximum advantage in the successor state.

 o dEdWeightsA0
 protected MatrixD dEdWeightsA0
Gradient of the mean squared error with respect to the weights of the maximum advantage in the original state.

 o dEdWeightsA
 protected MatrixD dEdWeightsA
Gradient of the mean squared error with respect to the weights of the advantage.

 o k
 protected NumExp k
The scaling factor used in the advantage learning algorithm.

 o oldState
 protected MatrixD oldState
A copy of the original state.

 o oldAction
 protected MatrixD oldAction
A copy of the original action.

 o rnd
 protected Random rnd
The random number generator that will be used for this object. This is a copy of the generator passed to evaluate().
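
Read together with the residual described in the class comment, the three dEdWeights vectors appear to hold the three pieces of the chain rule through that residual (this is an interpretation of the descriptions above, not something this page states explicitly): dEdWeightsA0 for the term through \max_{u'} A(x,u') in the original state, dEdWeightsA1 for the term through \max_{u'} A(x',u') in the successor state, and dEdWeightsA for the term through A(x,u) itself:

    \frac{\partial E}{\partial w} \;\propto\; \Bigl\langle\, e\,\Bigl[\bigl(1-\tfrac{1}{k}\bigr)\frac{\partial}{\partial w}\max_{u'}A(x,u') \;+\; \frac{\gamma}{k}\frac{\partial}{\partial w}\max_{u'}A(x',u') \;-\; \frac{\partial}{\partial w}A(x,u)\Bigr]\Bigr\rangle

where e is the advantage-learning residual and w is the weight vector of the function approximator.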

Constructors

 o AdvantageLearning
 public AdvantageLearning()

Methods

 o setWatchManager
 public void setWatchManager(WatchManager wm,
                             String name)
Register all variables with this WatchManager. This will be called after all parsing is done. setWatchManager should be overridden to call the same method on all the other objects in the experiment.

Overrides:
setWatchManager in class ErrFun
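
An object that owns other watchable objects would typically override it along these lines (a sketch only; the child field and the name-prefix convention are illustrative assumptions, not taken from this package):

 // Hypothetical owned object, shown here as another ErrFun.
 protected ErrFun child;

 public void setWatchManager(WatchManager wm, String name) {
     super.setWatchManager(wm, name);            // register this object's own variables
     child.setWatchManager(wm, name + "child/"); // forward so the child's variables are registered too
 }
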
 o BNF
 public String BNF(int lang)
Return the BNF description of how to parse the parameters of this object.

Overrides:
BNF in class ErrFun
 o unparse
 public void unparse(Unparser u,
                     int lang)
Output a description of this object that can be parsed with parse().

Overrides:
unparse in class ErrFun
See Also:
Parsable
 o parse
 public Object parse(Parser p,
                     int lang) throws ParserException
Parse the input file to get the parameters for this object.

Throws: ParserException
The parser did not find the required token.
Overrides:
parse in class ErrFun
 o initVects
 public void initVects(MDP mdp,
                       RLErrFun rl)
Create inputs, state, and action vectors. Also, create any vectors that might be specific to this module.

Overrides:
initVects in class RLErrFun
 o evaluate
 public double evaluate(Random rnd,
                        boolean willFindDeriv,
                        boolean willFindHess,
                        boolean rememberNoise)
Return the scalar output for the current dInput vector.

Overrides:
evaluate in class ErrFun
 o findGradient
 public void findGradient()
Update the fGradient vector based on the current fInput vector.

Overrides:
findGradient in class ErrFun
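
Taken together, evaluate() and findGradient() are the two calls a gradient-descent module drives on each iteration. The following sketch of that calling pattern is inferred from the signatures above rather than taken from this package; it assumes the object has already been configured by parse() and initVects(), and that the fInput/fGradient vectors mentioned in the descriptions are inherited from ErrFun.

 import java.util.Random;

 public class AdvantageLearningLoopSketch {
     // Hypothetical driver; real experiments wire a descent module to the error
     // function through the parsed experiment file rather than by hand.
     static void descend(AdvantageLearning errFun, int numSteps) {
         Random rnd = new Random();
         for (int step = 0; step < numSteps; step++) {
             // Scalar error for the current weights, with derivatives requested:
             // willFindDeriv = true, willFindHess = false, rememberNoise = false.
             double err = errFun.evaluate(rnd, true, false, false);
             // Fill the fGradient vector from the current fInput vector.
             errFun.findGradient();
             // A gradient-descent module would now step the weights along
             // -fGradient before the next iteration (not shown here).
         }
     }
 }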
