Hello,
Here is an interesting problem I have been thinking about - find a similarity metric over subreddits. E.g., from my subscribed subs, /r/bicycling and /r/motorcycles are closer to each other than either is to /r/machinelearning, which, in turn, is close to /r/maths and /r/programming.
So, the problem is to find a way to scan the content of the subs (possibly only post titles, and maybe the text of self posts) and come up with a model that assigns a similarity score to any pair of them.
This seems like an unsupervised learning problem to me, possibly using something like a bag-of-words model. But I suppose the models could be trained in a supervised manner as well, since most subreddits list related subs in the sidebar. Can people more experienced than me suggest promising models to apply to this problem?
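To make the bag-of-words idea concrete, here is a minimal sketch, not a tested approach. It assumes each subreddit is represented only by a handful of post titles, weights words with a smoothed TF-IDF, and scores pairs of subs by cosine similarity. The titles in `TITLES` are made up for illustration, and the function names (`tokenize`, `tfidf_vectors`, `cosine_similarity`) are my own, not from any library.

```python
import math
import re
from collections import Counter

# Hypothetical toy data: a few invented post titles per subreddit.
TITLES = {
    "bicycling": ["new road bike day", "helmet saved my ride",
                  "best tires for a wet ride"],
    "motorcycles": ["first bike day", "helmet and gear for a long ride",
                    "new tires on my bike"],
    "machinelearning": ["training a neural network model",
                        "bag of words model for text",
                        "gradient descent question"],
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(titles):
    """Count word occurrences over all titles of one subreddit."""
    counts = Counter()
    for title in titles:
        counts.update(tokenize(title))
    return counts


def tfidf_vectors(corpus):
    """Map each subreddit name to a sparse {word: tf-idf weight} dict."""
    bags = {name: bag_of_words(titles) for name, titles in corpus.items()}
    n_subs = len(bags)
    # Document frequency: in how many subreddits does each word appear?
    doc_freq = Counter()
    for bag in bags.values():
        doc_freq.update(bag.keys())
    # Smoothed IDF, so words shared by every sub keep a small positive weight.
    idf = {w: math.log((1 + n_subs) / (1 + d)) + 1.0
           for w, d in doc_freq.items()}
    return {name: {w: c * idf[w] for w, c in bag.items()}
            for name, bag in bags.items()}


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    vectors = tfidf_vectors(TITLES)
    names = sorted(vectors)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            print(f"{a} vs {b}: {cosine_similarity(vectors[a], vectors[b]):.3f}")
```

On this toy data the bicycling/motorcycles pair scores higher than either does against machinelearning, simply because they share more title words. With real titles you would probably want stop-word removal and far more text per sub before trusting the numbers.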
Applications would be many - automatically recommending subreddits for a new post, suggesting related reddits to users and so on.
btw, I'm not planning to build it or anything right now, since I haven't really looked into which models would be best to use. Just a thought experiment for now.
Edit: changed "distance metric" to "similarity metric", since the former is a content-free phrase.