Sorry for the newbie question. I haven't quite understood what it means to take the partial derivative of a cost function J with respect to the parameters of the model, say theta. Suppose these parameters include two arrays of dimensions a×b and c×d.
When we do stochastic gradient descent (SGD), what is the partial derivative of J with respect to theta (let's call it x)? Mathematically it is a symbolic expression, so when I program SGD, what should the term x actually be?
What happens if J is a scalar, and what happens if J is a matrix? In either case, if the derivative is a matrix, what are its dimensions?
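To make the question concrete, here is a minimal sketch of the setup I have in mind. Everything in it is an assumption for illustration only: the toy scalar cost `cost`, the helper names `numerical_grad` and `shape`, the sizes 2×3 and 4×1, and the learning rate `lr`. With a scalar J, a finite-difference estimate yields one partial derivative per parameter entry, so in this toy case the "derivative with respect to theta" comes out as two arrays with the same shapes as the two parameter arrays.

```python
# Toy setup: theta consists of two arrays, A (a x b) and B (c x d).
# The cost below is a made-up scalar function, used only for illustration.

def cost(A, B):
    """Scalar toy cost: J = sum of squares of every entry of A and B."""
    return (sum(v * v for row in A for v in row)
            + sum(v * v for row in B for v in row))

def numerical_grad(f, M, eps=1e-6):
    """Central-difference estimate of dJ/dM.

    f is a zero-argument function returning the scalar cost; M is the
    parameter array (list of lists) that f reads. Each entry of M is
    perturbed in place and then restored. The result has M's shape.
    """
    grad = [[0.0] * len(row) for row in M]
    for i in range(len(M)):
        for j in range(len(M[i])):
            old = M[i][j]
            M[i][j] = old + eps
            j_plus = f()
            M[i][j] = old - eps
            j_minus = f()
            M[i][j] = old  # restore the original value
            grad[i][j] = (j_plus - j_minus) / (2 * eps)
    return grad

def shape(M):
    """Return (rows, cols) of a rectangular list of lists."""
    return (len(M), len(M[0]))

A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # a x b = 2 x 3
B = [[0.5], [-1.0], [2.0], [3.0]]       # c x d = 4 x 1

# One gradient array per parameter array.
grad_A = numerical_grad(lambda: cost(A, B), A)
grad_B = numerical_grad(lambda: cost(A, B), B)

# Gradient descent step: each parameter is updated entry by entry.
lr = 0.1  # hypothetical learning rate
A_new = [[a - lr * g for a, g in zip(ra, rg)] for ra, rg in zip(A, grad_A)]
B_new = [[b - lr * g for b, g in zip(rb, rg)] for rb, rg in zip(B, grad_B)]

print(shape(grad_A), shape(grad_B))
```

In a real SGD loop the cost would be evaluated on a single example or mini-batch and the gradient would usually come from backpropagation rather than finite differences; the sketch is only meant to show what the term x would look like when J is a scalar.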
Thanks in advance! I hope to get this cleared up soon.