I feel like this might be a stupid question, but I've been doing a bit of reading, and while I understand the design of convolutional neural networks, I'm unclear on the objective function used when training them with gradient descent (or stochastic gradient descent).
Say I feed an input image into the network; with 3 classes, the output layer has 3 nodes. How do I get the objective function from this? It seems glossed over in all the papers I've read.
For example, say my three outputs are:
output_layer[0] = 0.853
output_layer[1] = 0.221
output_layer[2] = -0.19
And the image is actually of class 1. Would that just be a 0 for the objective function? Do I just sum up 0/1 values over all images, depending on whether the highest output corresponds to the correct class?
It seems like binary values here would be quite bad for actually training something with gradient descent, as the search space would have a lot of flat areas. I feel like I'm missing some common knowledge here, but I haven't come across what's actually done at this step anywhere.
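To make concrete what I mean, here's a small sketch of the 0/1 objective I described, using my example numbers (I'm assuming "class 1" means index 1 of the output array; that's why I'd expect a 0 here):

```python
import numpy as np

# The three raw outputs from my example above
outputs = np.array([0.853, 0.221, -0.19])
true_class = 1  # assumption: "class 1" = index 1

# The 0/1 "objective" I described: 1 if the highest output
# lands on the correct class, 0 otherwise
score = int(np.argmax(outputs) == true_class)
print(score)  # argmax is index 0 here, so this prints 0

# Nudging the outputs slightly doesn't change the argmax,
# so this 0/1 value has zero gradient almost everywhere --
# the flat search space I'm worried about
nudged = outputs + 1e-3
print(int(np.argmax(nudged) == true_class))  # still 0
```

That's exactly the problem: small changes to the weights wouldn't move this value at all, so I don't see how gradient descent could use it.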