Hello,
I'm having some problems with my backpropagation implementation. Background: I have feature vectors (inputs) of 40 elements each, and a single output value per feature vector, which should be a score from 0 to 100. So I have 40 input nodes, 20 hidden nodes, and 1 output node.
My feature vectors have values from 0 to +1300, so when I take my weight matrix (every entry initialized to 0.5) and multiply it by a layer's output, then apply the sigmoid activation function, I get numbers that saturate to 1. The sigmoid's derivative is essentially zero there, so my network doesn't learn. I've since tried dividing all feature values by 1000 to get around this, but I'm not sure that's the right fix.
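Here's a small NumPy sketch of what I think is happening (the 1/sqrt(40) random init at the end is my guess at a fix, not something from my actual code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# My current setup: 40 raw features in [0, 1300], all weights 0.5.
x_raw = rng.uniform(0.0, 1300.0, size=40)
W_half = np.full((20, 40), 0.5)          # 40 inputs -> 20 hidden nodes
hidden_raw = sigmoid(W_half @ x_raw)     # pre-activations in the thousands: everything is 1.0

# Guessed fix: scale inputs to [0, 1] AND use small random weights
# (roughly +/- 1/sqrt(40)). With all-0.5 weights, even scaled inputs
# sum to a pre-activation near 10 and still saturate.
x_scaled = x_raw / 1300.0
W_small = rng.uniform(-0.15, 0.15, size=(20, 40))
hidden_ok = sigmoid(W_small @ x_scaled)  # values spread out, so gradients are nonzero
```

If I'm reading this right, dividing by 1000 alone wouldn't save me, because identical 0.5 weights across 40 inputs still push the weighted sum into the flat part of the sigmoid.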
In terms of actually using the network: do I just save my trained weight matrices and then run the feed-forward part of the algorithm on new data I wish to score? One last question: is there a recommended learning rate? For gradient descent I've seen people use values like 0.01, but in some online backprop examples they use 0.9 and I don't know why.
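For the scoring part, here's how I imagine it would look (the weight values, scaling constants, and the `score` function name are placeholders I made up for the sketch, not my real trained network):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def score(x, W1, W2, x_scale=1300.0, y_scale=100.0):
    """Feed-forward only: apply the same input preprocessing used in
    training, then the two saved weight matrices."""
    a1 = sigmoid(W1 @ (x / x_scale))  # 40 inputs -> 20 hidden activations
    y = sigmoid(W2 @ a1)              # 20 hidden -> 1 output in (0, 1)
    return float(y[0] * y_scale)      # map back to a 0-100 score

# Example with hypothetical (untrained) weights:
rng = np.random.default_rng(1)
W1 = rng.uniform(-0.15, 0.15, size=(20, 40))
W2 = rng.uniform(-0.15, 0.15, size=(1, 20))
x_new = rng.uniform(0.0, 1300.0, size=40)
s = score(x_new, W1, W2)  # some value between 0 and 100
```

Is that the right idea, i.e. the trained weights plus the exact preprocessing are all I need to keep?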
I can post my MATLAB code if it will help. Thanks.