Hi,
This is a very basic problem, and I am not sure I am thinking about it the right way. Suppose I define an autoencoder with a function f that computes the reconstruction error of one example. However, the error function I actually want (say g) is the mean of the reconstruction errors of m examples (where m might be 20 or 30, basically more than 1). Now if I write a function grad(f, w), does directly averaging m instances of it, one per example with the weights held constant, give grad(g, w)? The gradient appears to be a linear operator. Wikipedia: "The gradient is linear in the sense that if f and g are two real-valued functions differentiable at the point a ∈ R^n, and α and β are two constants, then αf + βg is differentiable at a, and moreover ∇(αf + βg)(a) = α∇f(a) + β∇g(a)." However, I am still not sure if this would be right. Please help.
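For what it's worth, here is a minimal sketch of a numerical sanity check for the question, under assumptions that are not in the post: a toy tied-weight linear autoencoder stands in for the real model, and a central finite-difference `grad` stands in for whatever grad(f, w) actually is. The names `f`, `g`, `grad`, `hidden`, and the random data are all hypothetical.

```python
import random

def f(w, x, hidden=2):
    """Reconstruction error of ONE example (toy tied-weight linear autoencoder, assumed)."""
    d = len(x)
    rows = [w[i * d:(i + 1) * d] for i in range(hidden)]   # W as a list of rows
    code = [sum(r[j] * x[j] for j in range(d)) for r in rows]              # encode: W x
    recon = [sum(rows[i][j] * code[i] for i in range(hidden)) for j in range(d)]  # decode: W^T code
    return sum((recon[j] - x[j]) ** 2 for j in range(d))

def g(w, batch):
    """Mean reconstruction error over m examples."""
    return sum(f(w, x) for x in batch) / len(batch)

def grad(fn, w, eps=1e-6):
    """Central finite-difference gradient of fn at w (stand-in for a real grad)."""
    out = []
    for k in range(len(w)):
        up = w[:k] + [w[k] + eps] + w[k + 1:]
        down = w[:k] + [w[k] - eps] + w[k + 1:]
        out.append((fn(up) - fn(down)) / (2 * eps))
    return out

random.seed(0)
d, hidden, m = 3, 2, 20
w = [random.uniform(-1, 1) for _ in range(hidden * d)]
batch = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(m)]

# Average of the m per-example gradients, all evaluated at the same w.
per_example = [grad(lambda v, x=x: f(v, x), w) for x in batch]
avg_of_grads = [sum(gk[k] for gk in per_example) / m for k in range(len(w))]

# Gradient of the mean error, evaluated at the same w.
grad_of_mean = grad(lambda v: g(v, batch), w)

max_diff = max(abs(a - b) for a, b in zip(avg_of_grads, grad_of_mean))
print("largest componentwise difference:", max_diff)
```

In this sketch the two vectors should agree up to floating-point rounding, which is what the linearity statement with α = β = 1/m (extended to m terms) predicts. It only checks the case where every per-example gradient is taken at the same weights w.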