I'm trying to implement this for a ratings dataset that's similar to Netflix, and I've gotten it to "work", in that the training error keeps dropping toward 0 with each iteration and as more features are added.
My question is, how do I determine how well this will predict a test dataset (of users with a handful of ratings that weren't used for training)? Funk glosses over that part and it's really confusing me.
My thought process: you start with the following matrices:
(1) User Feature Matrix (Users x Num_Features)
(2) Movies Feature Matrix (Movies x Num_Features)
(3) Ratings Matrix (Movies x Users)
As far as I can tell, the only way to update the first two matrices is iteration by iteration inside the algorithm. I don't think you can train those matrices and then introduce a completely new user afterwards and estimate what that user's feature vector looks like.
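For reference, here is a minimal sketch of what estimating a new user's features after training could look like, assuming the movie features are frozen and the user's vector is fit by ridge regression on the handful of ratings they do have. This is a swapped-in least-squares step, not Funk's gradient-descent update, and the names `fold_in_user`, `movie_ids`, and `reg` are hypothetical.

```python
import numpy as np

def fold_in_user(movie_features, movie_ids, ratings, reg=0.1):
    """Estimate one new user's feature vector with movie features held fixed.

    movie_features: (Movies x Num_Features) matrix from training.
    movie_ids:      indices of the movies this user has rated.
    ratings:        the user's ratings for those movies, in the same order.
    reg:            ridge penalty (an assumed value, tune on held-out data).
    """
    rated = movie_features[movie_ids]                 # (n_rated x Num_Features)
    num_features = rated.shape[1]
    # Regularized normal equations: (M^T M + reg * I) u = M^T r
    lhs = rated.T @ rated + reg * np.eye(num_features)
    rhs = rated.T @ np.asarray(ratings, dtype=float)
    return np.linalg.solve(lhs, rhs)

# Toy usage: three movies, two features, a user who rated movies 0 and 1.
movie_features = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
new_user = fold_in_user(movie_features, [0, 1], [4.0, 2.0], reg=0.0)
predicted_for_movie_2 = movie_features[2] @ new_user
```

Whether this predicts well for users with only a few ratings would still have to be checked on held-out ratings.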
What I've tried now is removing 20% of the ratings from the Rating Matrix, moving them into a Test Matrix, and filling in all the empty cells in both with 0's (which are ignored in the cost function). It's not working.
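To make the setup concrete, here is a minimal sketch of that split and of scoring only the held-out cells, assuming 0 marks a missing rating and the Ratings Matrix is Movies x Users as above. The function names, the `seed` parameter, and the RMSE metric are assumptions for illustration.

```python
import numpy as np

def split_ratings(ratings, test_fraction=0.2, seed=0):
    """Move a random fraction of the observed ratings into a test matrix."""
    rng = np.random.default_rng(seed)
    observed = np.argwhere(ratings > 0)               # (row, col) of known ratings
    n_test = int(round(test_fraction * len(observed)))
    chosen = observed[rng.permutation(len(observed))[:n_test]]
    rows, cols = chosen[:, 0], chosen[:, 1]
    train = ratings.copy()
    test = np.zeros_like(ratings)
    test[rows, cols] = ratings[rows, cols]            # held-out ratings
    train[rows, cols] = 0                             # hidden from training
    return train, test

def masked_rmse(ratings, movie_features, user_features):
    """RMSE over the non-zero cells of `ratings` only."""
    mask = ratings > 0
    predictions = movie_features @ user_features.T    # (Movies x Users)
    errors = (ratings - predictions)[mask]
    return float(np.sqrt(np.mean(errors ** 2)))
```

Calling `masked_rmse` on the Test Matrix after each training pass, next to the same number for the training matrix, shows whether the test error is still falling or has started to rise while the training error keeps shrinking.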