Hi. Is it smart to use the cross-entropy loss function when the activation function is ReLU, which is unbounded?
The input is the MNIST data, so it is binary.
It is for an auto-encoder.
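To make the worry concrete, here is a minimal NumPy sketch (illustrative only, not my network code) of why binary cross-entropy assumes the reconstruction lies in (0, 1): with an unbounded ReLU output a pixel can exceed 1, and the log(1 - y) term is no longer defined, whereas a sigmoid output keeps the loss well defined.

```python
import numpy as np

# Illustrative sketch only (not my network code): per-pixel binary
# cross-entropy is only well defined when the reconstruction y lies in (0, 1).
def binary_cross_entropy(y, t):
    return -(t * np.log(y) + (1 - t) * np.log(1 - y)).mean()

t = np.array([0.0, 1.0, 1.0, 0.0])            # binarised MNIST-style targets

sigmoid_out = np.array([0.1, 0.9, 0.8, 0.2])  # sigmoid keeps outputs in (0, 1)
relu_out    = np.array([0.2, 0.9, 1.5, 2.3])  # ReLU outputs can exceed 1

print(binary_cross_entropy(sigmoid_out, t))   # finite, sensible loss (~0.16)
print(binary_cross_entropy(relu_out, t))      # nan: log(1 - y) with y > 1
```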
Side note (the reason for my question):
I made two different auto-encoder networks: one with maxout units (with shared weights) and another with maxout units where the weights are not shared (like LWTA). A rough sketch of what I mean by the two variants is below.
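This is just my shorthand reading of the two schemes, not the actual implementation: maxout takes the max over k affine "pieces" per output unit, while the LWTA-style version keeps only the winning pre-activation within each group and zeroes the rest.

```python
import numpy as np

# Rough sketch of the two variants as I understand them (not the real code).
def maxout(x, W, b):
    # W: (k, n_in, n_out), b: (k, n_out) -> take the max over the k pieces
    z = np.einsum('i,kij->kj', x, W) + b
    return z.max(axis=0)

def lwta(z, group_size):
    # z: (n_units,) pre-activations; within each group only the max survives
    z = z.reshape(-1, group_size)
    winners = (z == z.max(axis=1, keepdims=True))
    return (z * winners).reshape(-1)

rng = np.random.default_rng(0)
x = rng.random(8)
print(maxout(x, rng.normal(size=(3, 8, 4)), rng.normal(size=(3, 4))))  # 4 outputs
print(lwta(rng.normal(size=8), group_size=2))   # 8 outputs, half zeroed out
```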
My problem is that my cost value (mean squared error) is not close to zero even when the data set is small (e.g. a training set of 2 samples).
In order to optimize my network, I tried to make a learning curve (which is why I need the cost value on a small training set) and came across this problem.
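For reference, the kind of check I am doing looks roughly like this (a sketch using scikit-learn's MLPRegressor as a stand-in for my maxout network, just to show the procedure): fit the auto-encoder on progressively larger subsets and record the training error. With only 2 samples the network should be able to memorise them, so the training MSE should be close to zero.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = (rng.random((100, 784)) > 0.5).astype(float)   # stand-in for binarised MNIST

# Learning curve over training-set size: fit an auto-encoder (target == input)
# on the first n samples and report the *training* reconstruction error.
for n in (2, 10, 50, 100):
    ae = MLPRegressor(hidden_layer_sizes=(64,), activation='relu',
                      max_iter=2000, random_state=0)
    ae.fit(X[:n], X[:n])
    print(n, mean_squared_error(X[:n], ae.predict(X[:n])))
    # with n = 2 this should come out close to zero; if it does not,
    # something upstream (output activation, learning rate, gradients) is off
```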
I have tried linear, sigmoid, and softplus activations for the output layer, but I still get a large cost value and don't know why.