I asked a question recently about the right way to code a backpropagation network with three input units, four hidden units, and one output unit (all units logistic).
I got that working and then changed it (as required by my professor) to a network with 3 input units, 2 hidden units, and 1 output unit, trained with backprop. Now I need to implement momentum to improve training.
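For reference, the textbook momentum update I am trying to implement adds a fraction of the previous weight change to the current gradient step. A minimal sketch (the names `momentum_step`, `lr`, `mom`, `grad`, and `prev_dw` are mine, not from my code below):

```python
import numpy as np

# Momentum update: dw(t) = -lr * grad + mom * dw(t-1)
# lr is the learning rate, mom the momentum coefficient;
# prev_dw is the weight change from the previous step.
def momentum_step(w, grad, prev_dw, lr=0.5, mom=0.5):
    dw = -lr * grad + mom * prev_dw   # blend current gradient with history
    return w + dw, dw                 # updated weights and the step to remember

# usage: carry dw between iterations
w = np.zeros(2)
dw = np.zeros(2)
grad = np.array([1.0, -2.0])
w, dw = momentum_step(w, grad, dw)
```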
I have the following functions, but when I run them I get a terrible total cost. Can anyone tell me what I'm doing wrong?
The code is written in Python and uses the numpy module.
```python
def backProp4(a,b,g,w1,w2,dw,dv):
    #a - learning rate
    #b - index (1-4) of the training case
    #w1 - the initial matrix of input-hidden unit weights
    #w2 - the initial matrix of hidden-output unit weights

    #four training cases
    XX = numpy.asarray([[1.0,0.0,0.0],[1.0,0.0,1.0],[1.0,1.0,0.0],[1.0,1.0,1.0]])
    X = XX[(b-1),:]
    X = res1(X)

    #calculating the input of each hidden unit
    #Zin contains the total input of each of the two hidden units
    Z = numpy.dot(X,w1)
    Z = Z[0]
    Zin = numpy.random.rand(1,2)
    Zin = Zin[0]
    for i in range(2):
        Zin[i] = Z[i]

    #sigm(Z) is the element-wise application of y = 1./(1+exp(-x))
    #the input of the output unit, Yin, and the final output, Yout, are calculated
    Zout = numpy.transpose(sigmv(Z))
    Yin = numpy.dot(Zout,w2)
    Yout = sigm(Yin)

    #t contains the target values; for X1 the output should be 0 (false),
    #and so on, consistent with "exclusive or"
    t = numpy.random.rand(4,)
    t[0] = 0
    t[1] = 1
    t[2] = 1
    t[3] = 0

    #here the backprop algorithm starts
    #sigp is the derivative of the sigmoid function, y = sigm(x)*(1-sigm(x))
    delw = numpy.random.randn(2,1)
    d = (t[(b-1)] - Yout)*sigp(Yin)
    for i in range(2):
        delw[i] = -g*d + a*dw[i]

    #calculating the change in the input-hidden unit weights
    delv = numpy.random.randn(3,2)
    for i in range(3):
        for j in range(2):
            delv[i,j] = -g*d + a*dv[i,j]

    #updating the hidden-output unit weights
    for i in range(2):
        w2[i] = w2[i] + delw[i]

    #updating the input-hidden unit weights
    for i in range(3):
        for j in range(2):
            w1[i,j] = w1[i,j] + delv[i,j]

    #recomputing the output and the cost with the updated weights
    W1 = w1
    W2 = w2
    z = numpy.dot(X,W1)
    a = numpy.dot((numpy.transpose(sigmv(z[0]))),W2)
    c = sigm(a)
    cost = (1.0/2.0)*(c - t[(b-1)])**2
    return (w1,w2,c,cost,delw,delv)

def res1(x):
    #reshape a 1-D array into a (1,n) row vector
    n = x.shape
    y = numpy.random.rand(n[0],1)
    for i in range(n[0]):
        y[i] = x[i]
    return numpy.transpose(y)
```
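For comparison, here is my understanding of the textbook output-layer gradient that the momentum step is supposed to be applied to: the delta is `(t - y) * sigp(Yin)` as in my code, and the gradient for each hidden-to-output weight carries that hidden unit's activation as a factor. A sketch (`output_layer_grads` is a name I made up; `sigm` matches the logistic function described in my comments):

```python
import numpy as np

def sigm(x):
    # logistic function, y = 1/(1+exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

def output_layer_grads(Zout, w2, t):
    # Zout: hidden activations, w2: hidden-to-output weights, t: target
    Yin = Zout.dot(w2)               # net input to the output unit
    y = sigm(Yin)
    d = (t - y) * y * (1.0 - y)      # delta at the output: (t - y) * sigp(Yin)
    return d * Zout                  # per-weight gradient: delta times activation
```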
This is the function with which I run the previous one:
```python
import numpy

def epoch4(max_epoch):
    W1 = numpy.random.randn(3,2)
    W2 = numpy.random.randn(2,1)
    for epoch in range(1,(max_epoch+1)):
        total_cost = 0
        for idx in range(1,5):
            f = backProp2(0.5,idx,W1,W2)
            delw = f[4]
            delv = f[5]
            n = backProp4(0.5,idx,0.5,W1,W2,delw,delv)
            W1 = n[0]
            W2 = n[1]
            #delw = n[4]
            #delv = n[5]
            c = n[2]
            #total_cost = total_cost + cost
            print c
        print (total_cost/4)
```
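As a sanity check on the numbers I expect, here is a vectorized version of the forward pass and the per-epoch total cost for the XOR set, written the way I understand it should behave (`forward` and `total_cost` are names I made up for checking, not part of my code above):

```python
import numpy as np

def sigm(x):
    return 1.0 / (1.0 + np.exp(-x))

# XOR training set, with a bias input of 1.0 in the first column as in my code
X = np.array([[1.0, 0.0, 0.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0]])
t = np.array([0.0, 1.0, 1.0, 0.0])

def forward(X, w1, w2):
    # w1: (3,2) input-to-hidden weights, w2: (2,1) hidden-to-output weights
    Z = sigm(X.dot(w1))           # hidden activations, shape (4,2)
    return sigm(Z.dot(w2)).ravel()  # outputs for all four cases, shape (4,)

def total_cost(X, t, w1, w2):
    y = forward(X, w1, w2)
    return 0.5 * np.sum((y - t)**2)  # summed squared error over the epoch
```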