WebSim

View the HTML source of this page to see the simulation definition that the WebSim applet below parses and executes.

All of this code is (c) 1996 by the respective authors, is freeware, and may be freely distributed. If modifications are made, please say so in the comments.

The definition of this simulation is as follows:

MDP: a linear-quadratic regulator. The state space is the interval [-1,1] of the number line, and the state is the agent's position on it. The agent has two possible actions: move left or move right. Moving left corresponds to an input of -1 to the neural network, and moving right corresponds to an input of 1. The cost is the square of the agent's position after it performs an action, and the goal is to minimize the cost. There is no absorbing state.
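The MDP described above can be sketched in a few lines of Python. This is only an illustration, not the applet's code: the step size `STEP`, the function name `step`, and the clipping of moves at the ends of the interval are all assumptions, since the page does not say how far one action moves the agent or what happens at the boundary.

```python
STEP = 0.1  # assumed step size; the page does not specify it


def step(position, action, step_size=STEP):
    """Apply an action (-1 = move left, +1 = move right).

    Returns (next_position, cost).
    """
    if action not in (-1, 1):
        raise ValueError("action must be -1 (left) or +1 (right)")
    # Assumption: moves are clipped so the agent stays inside [-1, 1].
    next_position = max(-1.0, min(1.0, position + action * step_size))
    # The cost is the squared position after performing the action.
    cost = next_position ** 2
    return next_position, cost
```

For example, `step(0.5, -1)` moves the agent toward the origin and returns a smaller cost than `step(0.5, 1)`, which is why a good policy always steps toward zero.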

Function Approximator: single-hidden-layer sigmoidal network with 8 nodes
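A forward pass through a network of this shape might look like the sketch below. It assumes two inputs (the state and the action), a linear output unit, and small random initial weights; none of these details are stated on the page, and the names `init_network` and `forward` are hypothetical.

```python
import math
import random


def init_network(n_hidden=8, seed=0):
    """Create weights for a 2-input, n_hidden-unit, 1-output network."""
    rng = random.Random(seed)

    def small():
        return rng.uniform(-0.1, 0.1)  # assumed initialization range

    return {
        "w_in": [[small(), small()] for _ in range(n_hidden)],  # (state, action) weights
        "b_in": [small() for _ in range(n_hidden)],
        "w_out": [small() for _ in range(n_hidden)],
        "b_out": small(),
    }


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(net, state, action):
    """Return the network output for one (state, action) pair."""
    hidden = [
        sigmoid(w[0] * state + w[1] * action + b)
        for w, b in zip(net["w_in"], net["b_in"])
    ]
    # Assumption: the output unit is linear (a weighted sum of hidden activations).
    return sum(wo * h for wo, h in zip(net["w_out"], hidden)) + net["b_out"]
```

Calling `forward(init_network(), 0.3, -1)` evaluates the network for the state 0.3 and the action "move left".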

Learning algorithm: Backprop

RL algorithm: Residual Gradient Advantage Learning
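As a rough illustration of what a residual gradient advantage learning update involves, the sketch below performs one gradient step on the squared Bellman residual for a single deterministic transition. It is not the applet's code. To keep the gradient explicit it swaps the neural network for a linear approximator over hypothetical hand-picked features, and it assumes the commonly cited form of the advantage learning target, A(x,u*) + (cost + gamma^dt * A(x',u'*) - A(x,u*)) / (dt * K), with min in place of max because costs are minimized. The constants `GAMMA`, `DT`, `K`, and the learning rate are assumed values.

```python
GAMMA = 0.9  # assumed discount factor
DT = 1.0     # assumed time-step duration
K = 0.5      # assumed advantage scaling constant
ACTIONS = (-1, 1)


def features(x, u):
    # Hypothetical features standing in for the neural network.
    return [1.0, x, u, x * u, x * x]


def advantage(w, x, u):
    return sum(wi * fi for wi, fi in zip(w, features(x, u)))


def best_action(w, x):
    # Costs are minimized, so the greedy action has the lowest advantage.
    return min(ACTIONS, key=lambda u: advantage(w, x, u))


def residual_and_gradient(w, x, u, cost, x_next):
    """Return the Bellman residual e and its gradient de/dw."""
    u_star = best_action(w, x)
    u_next = best_action(w, x_next)
    c = 1.0 / (DT * K)
    a_star = advantage(w, x, u_star)
    a_next = advantage(w, x_next, u_next)
    target = a_star + c * (cost + GAMMA ** DT * a_next - a_star)
    e = target - advantage(w, x, u)
    # Residual gradient: differentiate through the target as well as the
    # prediction, treating the greedy actions as fixed.
    grad = [
        (1.0 - c) * fs + c * GAMMA ** DT * fn - fc
        for fs, fn, fc in zip(
            features(x, u_star), features(x_next, u_next), features(x, u)
        )
    ]
    return e, grad


def residual_gradient_step(w, x, u, cost, x_next, lr=0.05):
    """One gradient-descent step on 0.5 * e**2."""
    e, grad = residual_and_gradient(w, x, u, cost, x_next)
    return [wi - lr * e * gi for wi, gi in zip(w, grad)]
```

The difference from a direct (non-residual) update is the gradient: a direct method would use only the `-features(x, u)` term, whereas the residual gradient form also includes the terms from the greedy actions at the current and successor states.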

Displays: 1) 2D graph of log error vs. learning time, 2) 3D graph of value function.


This Java applet requires a Java-aware browser such as Netscape 2.0 for Solaris/Win95/WinNT.