The expected (squared) prediction error is
EPE(f) = E[(Y - f(X))^2]
Now suppose that f(x) = x^T b. The book (EoSL) says that by substituting and differentiating, we end up with
b = [E(X X^T)]^{-1} E(XY)
How? Here's what I get:
E[(Y - X^T b)^2] = E[Y^2 + (X^T b)^2 - 2 Y X^T b]
By differentiating w.r.t. b I obtain
2E[(X^T b) X] - 2E[YX]
If the first term were 2E[X X^T b] I'd end up with the book's expression, but it isn't! Any ideas?
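One way to probe this numerically (a rough sketch, not from the book): X^T b is a scalar, so (X^T b) X and X (X^T b) = (X X^T) b should be the same vector. The snippet below checks that on simulated data, replacing expectations with sample averages. The sample size, seed, noise level, and the names b_true, b_hat, and b_lstsq are all assumptions made up for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)           # arbitrary seed, assumed for reproducibility
n, p = 1000, 3                           # hypothetical sample size and dimension
X = rng.normal(size=(n, p))              # each row is one draw of X^T
b_true = np.array([1.0, -2.0, 0.5])      # hypothetical coefficients
Y = X @ b_true + 0.1 * rng.normal(size=n)

b = rng.normal(size=p)                   # arbitrary b at which to compare the two forms

# X^T b is a scalar for each draw, so (X^T b) X = X (X^T b) = (X X^T) b.
lhs = ((X @ b)[:, None] * X).mean(axis=0)   # sample version of E[(X^T b) X]
rhs = (X.T @ X / n) @ b                     # sample version of E[X X^T] b
same = np.allclose(lhs, rhs)

# Setting 2 E[X X^T] b - 2 E[Y X] = 0 and solving, with sample averages:
b_hat = np.linalg.solve(X.T @ X / n, X.T @ Y / n)

# Ordinary least squares on the same data, for comparison.
b_lstsq = np.linalg.lstsq(X, Y, rcond=None)[0]
```

If `same` comes out true, the first term of the derivative can be written as 2E[X X^T] b, and setting the derivative to zero gives b = [E(X X^T)]^{-1} E(XY). Comparing `b_hat` with `b_lstsq` is just a sanity check that the sample version is the usual least-squares fit.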