All of this code is (c) 1996 by the respective authors, is freeware, and may be freely distributed. If modifications are made, please say so in the comments.
The definition of this simulation is as follows:
MDP: a linear-quadratic regulator. The state space is the interval [-1, 1] of the number line. An agent
sits at a position on this line and has two possible actions: move left or move right. Moving left
corresponds to a network input of -1, and moving right corresponds to a network input of 1. The state
is the agent's position on the number line, and the cost incurred after performing an action is the
square of the resulting position. The goal is to minimize cost. There is no absorbing state.
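The text above leaves the size of a single move unspecified. The following is a minimal Python sketch
of the MDP under two assumptions of ours: a fixed step size of 0.1 and clipping at the interval
boundaries. The name mdp_step is illustrative, not from the original code.

    # Minimal sketch of the MDP above. The step size (0.1) and the clipping
    # at the interval boundaries are assumptions not stated in the text.
    STEP = 0.1

    def mdp_step(x, action):
        """Apply action (-1 = move left, +1 = move right) to position x.

        Returns (next_state, cost). The cost is the squared position after
        the move; there is no absorbing state, so runs never terminate.
        """
        x_next = max(-1.0, min(1.0, x + action * STEP))
        return x_next, x_next ** 2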
Function Approximator: single-hidden-layer sigmoidal network with 8 hidden nodes (sketched below)
Learning algorithm: Backprop (see the gradient computation in the network sketch below)
RL algorithm: Residual Gradient Advantage Learning (update rule sketched below)
Displays: 1) 2D graph of log error vs. learning time (sketched below), 2) 3D graph of the value function.
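To make the approximator concrete, here is a sketch of an 8-hidden-node sigmoidal network and the
backprop computation of its output gradient. Only the architecture (one hidden layer of 8 sigmoidal
nodes) comes from the description above; the (position, action-code) input layout, the initialization
scale, and the use of NumPy are all assumptions.

    import numpy as np

    # Sketch of the function approximator: one hidden layer of 8 sigmoidal
    # nodes and a linear scalar output A(x, u) for position x and action
    # code u. Input layout and initialization scale are assumptions.
    rng = np.random.default_rng(0)
    W1 = rng.uniform(-0.5, 0.5, size=(8, 2))  # input -> hidden weights
    b1 = np.zeros(8)                          # hidden biases
    W2 = rng.uniform(-0.5, 0.5, size=8)       # hidden -> output weights
    b2 = 0.0                                  # output bias

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward(x, u):
        """Network output A(x, u)."""
        h = sigmoid(W1 @ np.array([x, u]) + b1)
        return W2 @ h + b2

    def gradients(x, u):
        """Backprop: gradient of the scalar output w.r.t. each parameter,
        returned in the order (gW1, gb1, gW2, gb2)."""
        inp = np.array([x, u])
        h = sigmoid(W1 @ inp + b1)
        dh = h * (1.0 - h)       # derivative of the sigmoid
        gb1 = W2 * dh            # chain rule through the hidden layer
        return np.outer(gb1, inp), gb1, h, 1.0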
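Building on the two sketches above, one residual-gradient advantage-learning update could look like
the following. Since cost is minimized, the state value is taken as V(x) = min over u of A(x, u), and
the advantage-learning residual is V(x) + (cost + GAMMA*V(x') - V(x))/K - A(x, u), which reduces to a
Q-learning residual when K = 1. The values of GAMMA, K, and ALPHA are assumptions, as is treating the
min operator as locally constant when differentiating.

    # One residual-gradient advantage-learning step, reusing mdp_step,
    # forward, and gradients from the sketches above. GAMMA, K, and ALPHA
    # are assumed values; K is the advantage scaling constant in (0, 1].
    GAMMA, K, ALPHA = 0.9, 0.2, 0.05
    ACTIONS = (-1.0, 1.0)

    def value(x):
        """V(x) = min over actions of A(x, u), plus the minimizing action."""
        vals = [forward(x, u) for u in ACTIONS]
        i = int(np.argmin(vals))
        return vals[i], ACTIONS[i]

    def train_step(x, u):
        """Do one weight update on the transition (x, u) -> (x', cost)
        and return the Bellman residual being reduced."""
        global W1, b1, W2, b2
        x_next, cost = mdp_step(x, u)
        v, u_v = value(x)            # V(x) and its minimizing action
        v_next, u_n = value(x_next)  # V(x') and its minimizing action
        e = v + (cost + GAMMA * v_next - v) / K - forward(x, u)
        # Residual gradient: differentiate the residual through *every*
        # network term, including the next-state value, not just A(x, u).
        g_a = gradients(x, u)
        g_v = gradients(x, u_v)
        g_n = gradients(x_next, u_n)
        new = []
        for ga, gv, gn in zip(g_a, g_v, g_n):
            de = (1.0 - 1.0 / K) * gv + (GAMMA / K) * gn - ga
            new.append(-ALPHA * e * de)  # descend the squared residual
        W1 += new[0]
        b1 += new[1]
        W2 += new[2]
        b2 += new[3]
        return e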
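Finally, a sketch of how the first display could be produced: train under a random exploration policy
(the applet's actual exploration scheme is not stated, so this is an assumption) and plot log Bellman
error against learning time with matplotlib.

    import matplotlib.pyplot as plt

    # Sketch of display 1: log error vs. learning time. The random action
    # policy, step count, and starting position are assumptions.
    errors = []
    x = rng.uniform(-1.0, 1.0)
    for t in range(5000):
        u = ACTIONS[rng.integers(2)]  # explore with a uniformly random action
        errors.append(abs(train_step(x, u)))
        x, _ = mdp_step(x, u)         # dynamics are deterministic

    plt.plot(np.log10(np.array(errors) + 1e-12))
    plt.xlabel("learning time (steps)")
    plt.ylabel("log |Bellman residual|")
    plt.show()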