Class sim.mdp.GridWorld

java.lang.Object
   |
   +----sim.mdp.MDP
           |
           +----sim.mdp.GridWorld

public class GridWorld
extends MDP
A Markov Decision Process or Markov Game that takes a state and action and returns a new state and a reinforcement. It can be either deterministic or nondeterministic. If the next state is fed back in as the state, it can run a simulation. If the state is repeatedly randomized, it can be used for learning with random transitions.

This code is (c) 1996 Leemon Baird and Mance Harmon <leemon@cs.cmu.edu>, http://www.cs.cmu.edu/~baird
The source and object code may be redistributed freely. If the code is modified, please state so in the comments.

Version:
1.04, 25 June 97
Author:
Mance Harmon

Variable Index

 o action
an action vector (created in parse())
 o count1
Counters used in epoch-wise training
 o count2
 o granFactor
The depth in both the x and y dimensions of the gridworld.
 o nextState
the state vector resulting from doing action in state (created in parse())
 o random
The random number generator
 o state
a state vector (created in parse())
 o watchManager
the WatchManager that variables here may be registered with

Constructor Index

 o GridWorld()

Method Index

 o actionSize()
Return the number of elements in the action vector.
 o BNF(int)
 o findValAct(MatrixD, MatrixD, FunApp, MatrixD, PBoolean)
Find the value and best action of this state.
 o findValue(MatrixD, MatrixD, PDouble, FunApp, PDouble, MatrixD, PDouble, PBoolean, NumExp, Random)
Find the max over actions of R + gamma * V(x'), where V(x') is the value of the successor state given state x, R is the reinforcement, and gamma is the discount factor.
 o getAction(MatrixD, MatrixD, Random)
Return the next action possible in a state given the last action performed.
 o getState(MatrixD, PDouble, Random)
Return the next state to be used for training in an epoch-wise system.
 o getWatchManager()
Return the WatchManager set by setWatchManager().
 o initialAction(MatrixD, MatrixD, Random)
Return the initial action possible in a state.
 o initialState(MatrixD, Random)
Return an initial state used for the start of epoch-wise training or for training on trajectories.
 o nextState(MatrixD, MatrixD, MatrixD, PDouble, PBoolean, Random)
Find a next state given a state and action, and return the reinforcement received.
 o numActions(MatrixD)
Return the number of actions in a given state.
 o numPairs(PDouble)
Return the number of state/action pairs in the MDP for a given dt.
 o numStates(PDouble)
The number of states for this MDP is determined by the granularity factor that is passed in as a parameter.
 o parse(Parser, int)
Parse the input file to get the parameters for this object.
 o randomAction(MatrixD, MatrixD, Random)
Generates a random action from those possible.
 o randomState(MatrixD, Random)
Generates a random state from those possible and returns it in the vector passed in.
 o setWatchManager(WatchManager, String)
Register all variables with this WatchManager.
 o stateSize()
Return the number of elements in the state vector.
 o unparse(Unparser, int)
Output a description of this object that can be parsed with parse().

Variables

 o watchManager
 protected WatchManager watchManager
the WatchManager that variables here may be registered with

 o state
 protected MatrixD state
a state vector (created in parse())

 o action
 protected MatrixD action
an action vector (created in parse())

 o nextState
 protected MatrixD nextState
the state vector resulting from doing action in state (created in parse())

 o random
 protected Random random
The random number generator

 o granFactor
 protected IntExp granFactor
The depth in both the x and y dimensions of the gridworld.

 o count1
 protected int count1
Counters used in epoch-wise training

 o count2
 protected int count2

Constructors

 o GridWorld
 public GridWorld()

Methods

 o setWatchManager
 public void setWatchManager(WatchManager wm,
                             String name)
Register all variables with this WatchManager. Override this if there are internal variables that should be registered here.

Overrides:
setWatchManager in class MDP
 o getWatchManager
 public WatchManager getWatchManager()
Return the WatchManager set by setWatchManager().

Overrides:
getWatchManager in class MDP
 o numStates
 public int numStates(PDouble dt)
The number of states for this MDP is determined by the granularity factor that is passed in as a parameter. A granularity of 10 produces a state space containing (granularity+1)^2 = 121 states.

Overrides:
numStates in class MDP
 o stateSize
 public int stateSize()
Return the number of elements in the state vector. In this case the state is a point (x,y) in a 2D Euclidean space.

Overrides:
stateSize in class MDP
 o initialState
 public void initialState(MatrixD state,
                          Random random) throws MatrixException
Return an initial state used for the start of epoch-wise training or for training on trajectories. The start state for this MDP is the lower left corner of the 2D grid (0,0).

Throws: MatrixException
Vector passed in was wrong length.
Overrides:
initialState in class MDP
 o getState
 public void getState(MatrixD state,
                      PDouble dt,
                      Random random) throws MatrixException
Return the next state to be used for training in an epoch-wise system. This method differs from nextState() in that nextState() returns the state transitioned to as a function of the dynamics of the system; this method simply returns another state to be trained on when performing epoch-wise training.

Throws: MatrixException
Vector passed in was wrong length.
Overrides:
getState in class MDP
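
For example, an epoch of training states can be swept as in the following sketch. The idiom is an assumption based only on the signatures documented here; imports and exception handling at the call site are elided.

    // Hypothetical epoch-wise sweep: train on numPairs(dt) states.
    static void sweepEpoch(GridWorld mdp, MatrixD state, PDouble dt,
                           Random random) throws MatrixException {
        mdp.initialState(state, random);      // start of the epoch
        int n = mdp.numPairs(dt);             // pseudo-epoch size
        for (int i = 0; i < n; i++) {
            // ... train on the current contents of 'state' here ...
            mdp.getState(state, dt, random);  // next training state
        }
    }
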
 o actionSize
 public int actionSize()
Return the number of elements in the action vector. The action vector is of length 1 and has 4 possible values: 0 - East, 0.25 - North, 0.5 - West, 0.75 - South.

Overrides:
actionSize in class MDP
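
Since the encoding above is just a convention, a small hypothetical helper can decode it; this method is illustrative and not part of the class.

    // Hypothetical decoder for the single-element action vector's value.
    static String direction(double a) {
        if (a == 0.0)  return "East";
        if (a == 0.25) return "North";
        if (a == 0.5)  return "West";
        if (a == 0.75) return "South";
        throw new IllegalArgumentException("unknown action: " + a);
    }
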
 o initialAction
 public void initialAction(MatrixD state,
                           MatrixD action,
                           Random random) throws MatrixException
Return the initial action possible in a state. This method is used when one has to iterate over all possible actions in a given state; it returns the first action in that iteration.

Throws: MatrixException
Vector passed in was wrong length.
Overrides:
initialAction in class MDP
 o getAction
 public void getAction(MatrixD state,
                       MatrixD action,
                       Random random) throws MatrixException
Return the next action possible in a state given the last action performed. This performs the same function as getState(), serving as an iterator over actions instead of states.

Throws: MatrixException
Vector passed in was wrong length.
Overrides:
getAction in class MDP
 o numActions
 public int numActions(MatrixD state)
Return the number of actions in a given state. For this MDP this number is constant for all states. There are 4 actions possible in each state: 0 - East, 0.25 - North, 0.5 - West, 0.75 - South.

Overrides:
numActions in class MDP
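
Taken together, initialAction(), getAction(), and numActions() support iterating over every action in a state. A minimal sketch of that idiom (assumed, not taken from the source; exception handling elided):

    // Hypothetical loop over all actions available in 'state'.
    static void forEachAction(GridWorld mdp, MatrixD state, MatrixD action,
                              Random random) throws MatrixException {
        mdp.initialAction(state, action, random);      // first action
        int n = mdp.numActions(state);                 // 4 for this MDP
        for (int i = 0; i < n; i++) {
            // ... evaluate the (state, action) pair here ...
            if (i + 1 < n)
                mdp.getAction(state, action, random);  // advance iterator
        }
    }
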
 o numPairs
 public int numPairs(PDouble dt)
Return the number of state/action pairs in the MDP for a given dt. This is used for epoch-wise training. An epoch consists of all state/action pairs for a given MDP and is a function of the step size dt. For this MDP there is a continuum of state/action pairs because there is a continuum of states, so the value returned from this method is the pseudo-epoch size passed in to this MDP as the parameter epochSize.

Overrides:
numPairs in class MDP
 o randomAction
 public void randomAction(MatrixD state,
                          MatrixD action,
                          Random random) throws MatrixException
Generates a random action from those possible. Accepts a state and passes back an action.

Throws: MatrixException
Vector passed in was wrong length.
Overrides:
randomAction in class MDP
 o randomState
 public void randomState(MatrixD state,
                         Random random) throws MatrixException
Generates a random state from those possible and returns it in the vector passed in. This returns a vector of length 2. Each element is in the range [0,1].

Throws: MatrixException
Vector passed in was wrong length.
Overrides:
randomState in class MDP
 o nextState
 public double nextState(MatrixD state,
                         MatrixD action,
                         MatrixD newState,
                         PDouble dt,
                         PBoolean valueKnown,
                         Random random) throws MatrixException
Find a next state given a state and action, and return the reinforcement received. All three should be vectors (single-column matrices). The duration of the time step, dt, is also returned; most MDPs make this a constant, given in the parsed string. The goal state is the upper right corner of the grid world (x > 1-dt, y > 1-dt).

Throws: MatrixException
if sizes aren't right.
Overrides:
nextState in class MDP
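
As the class description notes, feeding the next state back in as the state runs a simulation. A minimal sketch, assuming MatrixD can copy the contents of another matrix via a replace-style method (that call is a guess; exception handling elided):

    // Hypothetical random-walk trajectory through the gridworld.
    static void simulate(GridWorld mdp, MatrixD state, MatrixD action,
                         MatrixD nextState, PDouble dt, PBoolean valueKnown,
                         Random random, int steps) throws MatrixException {
        mdp.initialState(state, random);              // start at (0,0)
        for (int t = 0; t < steps; t++) {
            mdp.randomAction(state, action, random);  // one of the 4 moves
            double r = mdp.nextState(state, action, nextState,
                                     dt, valueKnown, random);  // reinforcement
            state.replace(nextState);  // assumed copy: feed x' back in as x
        }
    }
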
 o findValAct
 public double findValAct(MatrixD state,
                          MatrixD action,
                          FunApp f,
                          MatrixD outputs,
                          PBoolean valueKnown) throws MatrixException
Find the value and best action of this state. The value of the given state is returned as a double, and the action that is passed in is destroyed, replaced by the best action. This method always returns a value that is a function of state/action pairs. The values associated with these state/action pairs might be Q-values or advantages, but it is not important to know which learning algorithm is being used: this method should simply find the min or max value over the state/action pairs in the given state. For example, if Q-learning is the learning algorithm, one would find the max Q-value for the given state and return that value; the action associated with that Q-value would be passed back. The state/action pair with the max Q-value should be evaluated last so that findGradients() can be called from within the learning algorithm without having to call function.evaluate().

Throws: MatrixException
column vectors are wrong size or shape
Overrides:
findValAct in class MDP
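
A hypothetical call site; the construction of the FunApp and the outputs vector is assumed to have happened elsewhere.

    // findValAct() overwrites 'action' with the best action for 'state'
    // and returns that action's value.
    double v = mdp.findValAct(state, action, f, outputs, valueKnown);
    // 'action' now holds the greedy action; v is its value.
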
 o findValue
 public double findValue(MatrixD state,
                         MatrixD action,
                         PDouble gamma,
                         FunApp f,
                         PDouble dt,
                         MatrixD outputs,
                         PDouble reinforcement,
                         PBoolean valueKnown,
                         NumExp explorationFactor,
                         Random random) throws MatrixException
Find the max over actions of R + gamma * V(x'), where V(x') is the value of the successor state given state x, R is the reinforcement, and gamma is the discount factor. This method is used in the object ValIteration (value iteration). The max value over actions, max_a [R + gamma * V(x')], is returned. The successor state associated with the optimal action is passed back with probability (1 - explorationFactor); otherwise, a random next state is passed back. The next state is passed back in state.

Throws: MatrixException
column vectors are wrong size or shape
Overrides:
findValue in class MDP
 o BNF
 public String BNF(int lang)
Overrides:
BNF in class MDP
 o unparse
 public void unparse(Unparser u,
                     int lang)
Output a description of this object that can be parsed with parse(). This also creates the state/action/nextState vectors.

Overrides:
unparse in class MDP
See Also:
Parsable
 o parse
 public Object parse(Parser p,
                     int lang) throws ParserException
Parse the input file to get the parameters for this object.

Throws: ParserException
parser didn't find the required token
Overrides:
parse in class MDP
