All of this code is (c) 1996 by the respective authors, is freeware, and may be freely distributed. If modifications are made, please say so in the comments.
This illustrates gradient descent on the average of two different error functions. There are two sigmoidal, multilayer perceptrons, both with 3 inputs and one output. (Instead of a separate bias parameter, there is an input that is always 1.0.) Each layer is fully connected to the next, and weights never connect nonadjacent layers. Both networks have two hidden layers. The first network has 5 and 4 nodes in the hidden layers closer to the input and output respectively. The second network has 6 and 3 nodes. These two topologies happen to have exactly the same number of weights, since 3*5+5*4+4*1 = 3*6+6*3+3*1 = 39. In both networks, these weights are indexed in some arbitrary order. Both networks are trained to learn the same saddle-shaped function, but with one constraint: weights with the same index are constrained to be identical. Gradient descent is then performed on the average of the two mean-squared-error functions, to try to find a single weight vector that makes both networks correct. It is perhaps surprising that it does this so well, causing both functions to look almost identical during learning. You almost have to look at the numbers on the Z axis to notice that the two functions are slightly different before they converge.
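The following is a minimal sketch (not the original WebSim code) of the setup described above: two sigmoidal MLPs that share a single 39-element weight vector, trained by gradient descent on the average of their two mean-squared errors. The particular saddle-shaped target, the training grid, the learning rate, and the use of a simple numerical gradient are assumptions made for illustration only.

```python
import numpy as np

LAYERS_A = [3, 5, 4, 1]   # first topology:  3*5 + 5*4 + 4*1 = 39 weights
LAYERS_B = [3, 6, 3, 1]   # second topology: 3*6 + 6*3 + 3*1 = 39 weights
N_WEIGHTS = 39

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def unpack(w, layers):
    """Slice the shared weight vector into one matrix per layer,
    using an arbitrary but fixed ordering of the 39 indices."""
    mats, i = [], 0
    for n_in, n_out in zip(layers[:-1], layers[1:]):
        mats.append(w[i:i + n_in * n_out].reshape(n_in, n_out))
        i += n_in * n_out
    return mats

def forward(w, layers, x):
    """x has shape (batch, 3); the third input is the constant 1.0 'bias' input."""
    a = x
    for m in unpack(w, layers):
        a = sigmoid(a @ m)
    return a[:, 0]

def avg_mse(w, x, target):
    """Average of the two networks' mean-squared errors on the same data."""
    err_a = np.mean((forward(w, LAYERS_A, x) - target) ** 2)
    err_b = np.mean((forward(w, LAYERS_B, x) - target) ** 2)
    return 0.5 * (err_a + err_b)

def num_grad(f, w, eps=1e-5):
    """Central-difference gradient; slow but simple enough for 39 weights."""
    g = np.zeros_like(w)
    for i in range(w.size):
        d = np.zeros_like(w)
        d[i] = eps
        g[i] = (f(w + d) - f(w - d)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
# Training points on a grid over [0,1]^2, with the third input fixed at 1.0.
xs, ys = np.meshgrid(np.linspace(0, 1, 8), np.linspace(0, 1, 8))
inputs = np.column_stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
# An assumed saddle-shaped target, scaled into (0,1) to match the sigmoid output.
target = 0.5 + 2.0 * (xs.ravel() - 0.5) * (ys.ravel() - 0.5)

w = rng.normal(scale=0.5, size=N_WEIGHTS)   # one shared weight vector for both nets
lr = 2.0                                    # assumed learning rate
for step in range(2000):
    w -= lr * num_grad(lambda v: avg_mse(v, inputs, target), w)
    if step % 500 == 0:
        print(step, avg_mse(w, inputs, target))
```

Because both networks read their weights from the same vector, a single gradient step moves them together, which is why the two plotted surfaces track each other so closely during learning.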
Click and drag just outside the boundary of the cube to rotate the image of the cube, or click and drag inside the boundary of the cube to rotate the cube itself. Click and drag on the contour plot to zoom in on a smaller region.