Most of my third-year electives (a BA takes 3 years in the UK) were in statistical analysis and model testing, and I'm pretty comfortable with mathematics and Python.
I just can't find an "easy" way in. I decided to post this in /r/machinelearning rather than /r/statistics because I think that's where the problem lies: until I looked through some of the Kaggle interviews, I had never heard the expression "random forest", which seems key to some of the entries. Wikipedia didn't help. Before then I had thought about using Monte Carlo methods (obviously I know of them because they're most relevant to physics) for some of the Kaggle competitions, but no interview mentions them as a relevant winning strategy, probably because the training and test datasets are small.
I already have some experience with C and Python, but little to none with Mathematica or R. Would you recommend switching to either of the latter, or should I just struggle through with Python? Most of my university experience with stats was in Excel, using the built-in tools and a few custom sheets provided by the lecturer.
One book I've managed to look at is "Data Mining, Third Edition" (published by Morgan Kaufmann), and it doesn't seem very relevant or conducive to getting up and running quickly. I am under no illusions about winning any competition, but I do want to at least try some of them as a learning exercise.
TL;DR: Can you recommend any books or websites that offer a "quick and dirty" introduction to Kaggle-related concepts such as random forests?