WebSim


All of this code is (c) 1996 by the respective authors, is freeware, and may be freely distributed. If modifications are made, please say so in the comments.


This page requires a Java-aware browser and may take several minutes to download. Please be patient.

Simulation Definition:

See the source HTML code for this page to see the simulation definition that is parsed and executed by the WebSim applet.

MDP:
a linear-quadratic regulator. The state space is the interval [-1, 1] on the number line. An imaginary cart sits on this line and has two possible actions: move left or move right. Moving left corresponds to a neural-network input of -1; moving right corresponds to an input of 1. The state is the cart's position on the number line. The cost of an action is the squared position after the action is performed, and the goal is to minimize cost. There is no absorbing state.
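The MDP above can be sketched in a few lines. Note the page does not state how far the cart moves per action, so the 0.1 step size used here is an assumption, as is clipping the position to the [-1, 1] boundary:

```python
def step(state, action):
    """Apply action (-1 = move left, +1 = move right) and return
    (next_state, cost). Cost is the squared position after moving.
    The 0.1 step size and boundary clipping are assumptions; the
    page does not specify them."""
    next_state = max(-1.0, min(1.0, state + 0.1 * action))
    cost = next_state ** 2
    return next_state, cost
```

For example, `step(0.5, -1)` moves the cart left to 0.4 and incurs cost 0.4 squared.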

Function Approximator: a single-hidden-layer sigmoidal network with 8 nodes in the hidden layer.

Learning algorithm: Backprop
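A function approximator of this shape, trained by backprop, can be sketched as follows. This is an illustrative Python version, not the applet's Java code; the weight initialization range and the learning rate of 0.1 are assumptions. The network takes the state and the action input (-1 or 1) and outputs a Q-value estimate:

```python
import math
import random

random.seed(0)
H = 8  # hidden layer size, as stated above

# weights: (state, action) inputs -> 8 sigmoidal hidden nodes -> linear output
W1 = [[random.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(H)]
b1 = [0.0] * H
W2 = [random.uniform(-0.5, 0.5) for _ in range(H)]
b2 = 0.0

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(state, action):
    """Return (hidden activations, Q estimate) for one (state, action) pair."""
    h = [sigmoid(W1[j][0] * state + W1[j][1] * action + b1[j]) for j in range(H)]
    q = sum(W2[j] * h[j] for j in range(H)) + b2
    return h, q

def backprop(state, action, target, lr=0.1):
    """One gradient-descent step on the squared error (q - target)^2.
    The learning rate is an assumed value."""
    global b2
    h, q = forward(state, action)
    dq = q - target  # d(error)/dq, dropping the constant factor of 2
    for j in range(H):
        dh = dq * W2[j] * h[j] * (1.0 - h[j])  # chain rule through the sigmoid
        W2[j] -= lr * dq * h[j]
        W1[j][0] -= lr * dh * state
        W1[j][1] -= lr * dh * action
        b1[j] -= lr * dh
    b2 -= lr * dq
```

Repeatedly calling `backprop` with the same input and target drives the network's output toward that target.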

RL algorithm: Residual Gradient QLearning
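The distinguishing feature of residual gradient Q-learning (Baird, 1995) is that it performs gradient descent on the squared Bellman residual, so each update adjusts the Q-value of the successor state as well as the current one. The sketch below uses a discretized, tabular Q-function for brevity (the applet uses the neural network instead), and the step size, discount factor, learning rate, and exploration scheme are all assumptions:

```python
import random

random.seed(1)

GAMMA = 0.9    # assumed discount factor
ALPHA = 0.2    # assumed learning rate
ACTIONS = (-1, 1)
N = 21         # discretize [-1, 1]; a simplification of the applet's net

def idx(state):
    return round((state + 1.0) / 2.0 * (N - 1))

Q = [[0.0, 0.0] for _ in range(N)]  # Q[state index][action index]

def step(state, action):
    # assumed dynamics: move 0.1 per action, clip to [-1, 1]
    next_state = max(-1.0, min(1.0, state + 0.1 * action))
    return next_state, next_state ** 2

def residual_gradient_update(state, action_i):
    """One residual-gradient step: descend the squared Bellman residual
    (cost + gamma * min_a' Q(s', a') - Q(s, a))^2, which updates both
    the current entry and the greedy successor entry."""
    s2, cost = step(state, ACTIONS[action_i])
    a2 = 0 if Q[idx(s2)][0] <= Q[idx(s2)][1] else 1  # greedy = min cost
    delta = cost + GAMMA * Q[idx(s2)][a2] - Q[idx(state)][action_i]
    Q[idx(state)][action_i] += ALPHA * delta         # usual TD correction
    Q[idx(s2)][a2] -= ALPHA * GAMMA * delta          # residual-gradient term
    return s2

# train with random actions and occasional random restarts
state = 0.0
for _ in range(20000):
    state = residual_gradient_update(state, random.randrange(2))
    if random.random() < 0.05:
        state = random.uniform(-1.0, 1.0)
```

After training, the greedy policy moves the cart toward 0 from either side, matching the policy display described below.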

Displays:
1) Variables and Rates (upper left corner)
2) 2D graph of log error vs. learning time (upper right corner)
3) 3D graph of value function (lower left corner)
4) 3D graph of policy (lower right corner)

The 3D graphs can be rotated about two different axes by clicking and dragging inside or outside the box.

Value Function Display: After learning, the value function will look like a "U". Remember that, since this problem minimizes cost, the value of a state is the minimum Q-value in the given state. Also, the definition of a Q-value is the sum of the reinforcements received when performing the corresponding action and following the optimal policy thereafter. The X-axis corresponds to state space. The Z-axis (height) is the value in each state. The Y-axis (depth) has no meaning.

Policy Display: The policy for this system is clear: when the "cart" is left of 0, the RL system should perform action 1 (move right); when the "cart" is right of 0, it should perform action -1 (move left). The X-axis corresponds to state space. The Y-axis is the policy in each state. The Z-axis has no meaning.
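Both displays are derived from the learned Q-function: the value plot takes the minimum Q-value in each state, and the policy plot takes the cost-minimizing action. A sketch, using the true one-step cost as a hypothetical stand-in for the learned network (the `q` function below is illustrative, not the applet's):

```python
ACTIONS = (-1, 1)

def q(state, action):
    """Stand-in Q-function: just the one-step cost under the assumed
    0.1-step dynamics. The applet uses the learned network here."""
    nxt = max(-1.0, min(1.0, state + 0.1 * action))
    return nxt ** 2

def value(state):
    """Lower-left display: the minimum Q-value (cost-minimization setting)."""
    return min(q(state, a) for a in ACTIONS)

def policy(state):
    """Lower-right display: the action with the smallest Q-value."""
    return min(ACTIONS, key=lambda a: q(state, a))
```

With this stand-in, `policy(-0.5)` returns 1 and `policy(0.5)` returns -1, matching the description above, and `value` traces out the "U" shape of the value display.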

Back to Tutorial.


This Java applet requires a Java-aware browser such as Netscape 2.0 for Solaris/Win95/WinNT.
