Hi
I made two different auto-encoder networks: one with maxout units (with shared weights) and another with maxout units where the weights are not shared, similar to LWTA.
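For reference, here is a minimal sketch (my own illustration, not the actual code) of how I understand the two unit types, assuming k = 3 competing linear pieces per unit and made-up tensor shapes:

    import torch

    x = torch.rand(4, 10)                      # batch of 4 inputs, 10 features (assumed sizes)
    W = torch.randn(10, 5 * 3)                 # 5 units, each with k = 3 pieces
    z = (x @ W).view(4, 5, 3)                  # pre-activations grouped per unit

    # Maxout: each unit forwards the maximum of its k pieces (one output per unit).
    maxout_out = z.max(dim=2).values           # shape (4, 5)

    # LWTA-style block: all k pieces stay in the layer, but only the winner keeps its value.
    winners = z.argmax(dim=2, keepdim=True)
    mask = torch.zeros_like(z).scatter_(2, winners, 1.0)
    lwta_out = (z * mask).view(4, 15)          # losers are zeroed, layer width preserved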
My problem is that my cost value (mean squared error) is not close to zero when the data size is small (e.g. a training set of 2 samples).
In order to optimize my network, I tried to make a learning curve (which is why I need the cost value on a small training set) and came across this problem.
I have tried linear, sigmoid, and softplus activations for the output layer... and I still get a large cost value and don't know why.
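Here is the kind of sanity check I have in mind (a minimal sketch, not my actual network; layer sizes, optimizer, learning rate, and iteration count are assumptions): a tiny autoencoder with enough capacity should drive the MSE on a 2-sample training set very close to zero.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    x = torch.rand(2, 10)                      # "training set" of just 2 samples

    model = nn.Sequential(
        nn.Linear(10, 8), nn.ReLU(),           # encoder (any unit type with enough capacity works)
        nn.Linear(8, 10),                      # linear output layer, so reconstructions are unbounded
    )
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.MSELoss()

    for step in range(2000):
        opt.zero_grad()
        loss = loss_fn(model(x), x)
        loss.backward()
        opt.step()

    print(f"final MSE on the 2 samples: {loss.item():.6f}")   # expected to be near zero

The linear output layer in this sketch is deliberate: a sigmoid output can only reproduce targets in (0, 1) and softplus only positive targets, so unscaled data outside those ranges would put a floor on the achievable MSE.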