Hi, basic questions:
I am experimenting with machine learning, trying to classify computer-generated images (each containing a single kanji character, centered and scaled) with a maxout neural network, using the ready-to-use pylearn2 model. I have about 1000 40x40-pixel examples per class (and about 200 classes, one per kanji character) in the training set. The test set is generated the same way: essentially the same examples with small differences in shape and added noise.
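For context, this is my understanding of what a single maxout layer computes; a minimal NumPy sketch of the definition from the maxout paper, not the actual pylearn2 code:

```python
import numpy as np

def maxout_forward(x, W, b):
    """Maxout layer: k affine 'pieces' per unit, then a max over pieces.
    x: (batch, n_in), W: (n_in, n_units, k), b: (n_units, k)."""
    z = np.einsum('ni,iuk->nuk', x, W) + b   # (batch, n_units, k)
    return z.max(axis=-1)                    # (batch, n_units)
```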
The problem I'm having is one of two things. Either the net fits the training set almost perfectly in very little time while the test error diverges very quickly (something like 0.3% error on the training set and 90% on the test set), or I generate a lot more data and then can't tell whether it works at all, simply because training takes too long on my CPU (no NVIDIA GPU :( ).
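To make that divergence concrete, this is the kind of monitoring I mean: hold out a validation split and track both error rates per epoch. A toy, self-contained sketch with a plain softmax classifier standing in for the maxout net; the shapes mirror my data, but the data here is pure noise, so training error should fall while validation error stays near chance (90% for 10 classes), which is exactly the pattern I'm seeing:

```python
import numpy as np

rng = np.random.RandomState(0)

# Toy stand-in for the real data: 40x40 images flattened to 1600 features,
# 10 classes instead of 200 to keep it fast. Noise data makes the
# overfitting pattern easy to provoke.
n_classes, n_per_class, n_feats = 10, 100, 1600
X = rng.randn(n_classes * n_per_class, n_feats).astype('float32')
y = np.repeat(np.arange(n_classes), n_per_class)

# Shuffled 80/20 train/validation split.
idx = rng.permutation(len(X))
cut = int(0.8 * len(X))
tr, va = idx[:cut], idx[cut:]

# Plain softmax regression trained by full-batch gradient descent,
# standing in for the maxout net -- the point is the monitoring.
W = np.zeros((n_feats, n_classes), dtype='float32')
b = np.zeros(n_classes, dtype='float32')

def error_rate(Xs, ys):
    return np.mean((Xs.dot(W) + b).argmax(axis=1) != ys)

lr = 0.1
for epoch in range(50):
    logits = X[tr].dot(W) + b
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    p[np.arange(len(tr)), y[tr]] -= 1.0      # gradient of cross-entropy
    W -= lr * X[tr].T.dot(p) / len(tr)
    b -= lr * p.mean(axis=0)
    print("epoch %2d: train %5.1f%%  valid %5.1f%%"
          % (epoch, 100 * error_rate(X[tr], y[tr]),
             100 * error_rate(X[va], y[va])))
```

If I understand the docs correctly, pylearn2's `monitoring_dataset` mechanism is meant to report this kind of per-epoch validation error for you.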
So:
1. If my network learns the training set perfectly but not the test set, could that mean it actually lacks the power to represent the true, more complex function? This behaviour happens even when I have a single hidden layer with 30 hidden units. (See the dropout sketch after this list.)
2. Do I actually need a GPU to experiment with these things? As you may have guessed, I'm just experimenting and have no real knowledge of this stuff, so I'd say the problem sits between the desk and the chair in this case :p Intuitively, I'd think I should be able to get something much better regardless of the hardware.
3. Related to 2: I'm not sure how I should search for hyperparameters. To be honest, my approach is to change things and see if the situation improves, but that's probably the worst way to go about it (see the random-search sketch below).
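Regarding 1: from what I've read, near-zero training error together with huge test error usually points at too *much* effective capacity (overfitting) rather than too little, and the maxout paper pairs maxout with dropout as the regularizer. A minimal sketch of what I understand inverted dropout to be, not pylearn2's implementation:

```python
import numpy as np

rng = np.random.RandomState(0)

def dropout(h, p_drop=0.5, train=True):
    """Inverted dropout: zero each hidden unit with probability p_drop
    during training and rescale the survivors, so no change is needed
    at test time."""
    if not train:
        return h
    mask = (rng.uniform(size=h.shape) >= p_drop) / (1.0 - p_drop)
    return h * mask
```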
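Regarding 3: the alternative I keep seeing recommended over hand-tweaking one knob at a time is random search, i.e. sample every hyperparameter independently per trial and keep the configuration with the lowest validation error. A sketch, where `train_and_eval` is a hypothetical stand-in for a full training run that returns validation error:

```python
import numpy as np

rng = np.random.RandomState(42)

def sample_config():
    """One random trial; each knob is sampled independently
    (random search, as in Bergstra & Bengio 2012)."""
    return {
        'learning_rate': 10 ** rng.uniform(-4, -1),   # log-uniform
        'num_units': int(rng.choice([30, 100, 300, 1000])),
        'num_pieces': int(rng.choice([2, 3, 4])),     # maxout pieces
        'dropout_p': rng.uniform(0.2, 0.7),
    }

# trials = [sample_config() for _ in range(20)]
# scores = [(train_and_eval(**cfg), cfg) for cfg in trials]
# best_error, best_cfg = min(scores, key=lambda s: s[0])
print(sample_config())
```

Sampling the learning rate log-uniformly seems to matter, since reasonable values span several orders of magnitude.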