Hey everyone, I've written a recommendation engine that does some neat stuff (ILP relational OTF subgraph building) and some boring traditional stuff (w00t Matthews correlation, Bayesian statistics). Without getting too technical, I was wondering if anyone knows a good generic source of datasets to test it on. I already found the MovieLens one, which worked, but I'd like something that people disagree about a lot more.
The problem with the MovieLens dataset was that a very small number of movies had a very large percentage of the likes, and almost everyone agreed that the good movies were really, really good. So when I optimized for peak Matthews correlation (after intelligently breaking the graph apart into training and testing sets), the recommendations that came out, while correct, made little sense to humans. For example, when I ignored metadata like genre, Shrek ranked very high for people who liked The Matrix. That recommendation would surprise people because those movies are so dissimilar, but a very high share of the people who liked The Matrix also liked Shrek (though not the other way around, obviously).
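To make the Shrek/Matrix asymmetry concrete, here's a rough sketch on toy data. This is not my actual engine: the `co_like_stats` helper and the made-up user sets are just for illustration. It compares the share of Matrix fans who liked Shrek against the reverse, and divides by Shrek's overall popularity (lift) to show how much of the overlap comes from Shrek simply being liked by almost everyone.

```python
def co_like_stats(likes, a, b):
    """Compare how strongly liking `a` implies liking `b`, and vice versa.

    likes: dict mapping user id -> set of liked item names (hypothetical format).
    """
    n_users = len(likes)
    n_a = sum(1 for items in likes.values() if a in items)
    n_b = sum(1 for items in likes.values() if b in items)
    n_ab = sum(1 for items in likes.values() if a in items and b in items)

    # Share of a-fans who also like b, and share of b-fans who also like a.
    p_b_given_a = n_ab / n_a if n_a else 0.0
    p_a_given_b = n_ab / n_b if n_b else 0.0

    # Lift: how much more likely a-fans are to like b than a random user is.
    p_b = n_b / n_users if n_users else 0.0
    lift = p_b_given_a / p_b if p_b else 0.0

    return {"p_b_given_a": p_b_given_a, "p_a_given_b": p_a_given_b, "lift": lift}


# Toy data (made up): Shrek is liked by most users, The Matrix by fewer.
toy_likes = {}
for i in range(4):
    toy_likes[f"u{i}"] = {"The Matrix", "Shrek"}
for i in range(4, 8):
    toy_likes[f"u{i}"] = {"Shrek"}
for i in range(8, 10):
    toy_likes[f"u{i}"] = {"Something Else"}

stats = co_like_stats(toy_likes, "The Matrix", "Shrek")
print(stats)
```

In this toy set every Matrix fan liked Shrek, but only half of the Shrek fans liked The Matrix, and the lift is barely above 1 because Shrek is popular with everyone. A dataset with more disagreement should push lift values further from 1.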
Anyways, I'm pretty amped about what I've built and I'm looking for more validation. Ideally the dataset would have both positive and negative signals, but that's not strictly required. I've got a couple of NLP modules in here too, so don't shy away from text-heavy content either.
Thanks in advance :D