Learning the structure of reddit?

Hello,

Here is an interesting problem I have been thinking about: finding a similarity metric over subreddits. For example, among the subs I'm subscribed to, /r/bicycling and /r/motorcycles are closer to each other than either is to /r/machinelearning, which in turn is close to /r/maths and /r/programming.

So the problem is to find a way to scan the content of the subs (possibly only post titles, and maybe the text of self posts) and come up with a model that assigns a distance to each pair of subs.
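
One simple way to get such a number, sketched under several assumptions: treat all the titles from one subreddit as a single bag-of-words document, weight words by TF-IDF so that terms shared by every sub count for little, and compare subs by cosine similarity (1 minus it gives a distance-like value). TF-IDF and cosine similarity are swapped in here as one common choice, not something the post prescribes; the helper names and the titles are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lower-cased alphabetic tokens; no stemming or stop-word removal."""
    return re.findall(r"[a-z]+", text.lower())

def tfidf_vectors(sub_titles):
    """Treat each subreddit's pooled titles as one document; weight terms by TF-IDF."""
    counts = {sub: Counter(w for t in titles for w in tokenize(t))
              for sub, titles in sub_titles.items()}
    n_subs = len(counts)
    # Document frequency: in how many subreddits each term appears.
    df = Counter(term for c in counts.values() for term in c)
    # A term found in every sub gets weight log(1) = 0.
    return {sub: {term: tf * math.log(n_subs / df[term]) for term, tf in c.items()}
            for sub, c in counts.items()}

def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical titles, invented for illustration only.
sub_titles = {
    "bicycling": ["New road bike tires", "Best helmet for commuting"],
    "motorcycles": ["New tires for my bike", "Helmet recommendations"],
    "machinelearning": ["Gradient descent in Python", "New paper on neural networks"],
    "programming": ["Python tips", "Neural networks from scratch"],
}
vectors = tfidf_vectors(sub_titles)
for a, b in [("bicycling", "motorcycles"), ("bicycling", "machinelearning"),
             ("machinelearning", "programming")]:
    print(a, b, round(cosine_similarity(vectors[a], vectors[b]), 3))
```

On this toy data /r/bicycling scores far closer to /r/motorcycles than to /r/machinelearning, and /r/machinelearning is closer to /r/programming than to /r/bicycling. A library such as scikit-learn (`TfidfVectorizer`, `cosine_similarity`) provides the same pieces with more tokenization options, and real data would need far more than two titles per sub.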

This seems like an unsupervised learning problem to me, possibly using something like a bag-of-words model. But I suppose the models could be trained in a supervised manner as well, since most subreddits list related subs in the sidebar. I'm hoping people more experienced than myself can suggest promising models for this problem.

There would be many applications: automatically recommending subreddits for a new post, suggesting related subreddits to users, and so on.
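
For the first application, a minimal sketch assuming each sub is profiled by the raw word counts of its pooled post titles, and a new post is suggested the subs whose profiles have the highest cosine similarity with its title. The `suggest_subreddits` helper and the titles are hypothetical.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Word counts of lower-cased alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(u, v):
    """Cosine similarity of two Counters (missing keys count as 0)."""
    dot = sum(c * v[w] for w, c in u.items())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def suggest_subreddits(title, sub_titles, n=2):
    """Rank subs by similarity between a new post title and each sub's pooled titles."""
    query = bag_of_words(title)
    profiles = {sub: bag_of_words(" ".join(titles)) for sub, titles in sub_titles.items()}
    ranked = sorted(profiles, key=lambda s: cosine(query, profiles[s]), reverse=True)
    return ranked[:n]

# Hypothetical titles, invented for illustration only.
sub_titles = {
    "bicycling": ["Road bike tires", "Helmet for commuting"],
    "machinelearning": ["Gradient descent explained", "Neural networks paper"],
    "programming": ["Python tips", "Debugging neural networks in Python"],
}
print(suggest_subreddits("Which tires for a road bike?", sub_titles, n=1))  # ['bicycling']
print(suggest_subreddits("Training neural networks in Python", sub_titles, n=1))  # ['programming']
```

Raw counts keep the sketch short; the same ranking step would work with any of the weighted vectors or learned similarities discussed above.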

Btw, I'm not planning to build this right now, since I haven't really looked into which models would work best. It's just a thought experiment for now.

Edit: changed "distance metric" to "similarity metric", since the former is a content-free phrase.

submitted by ohell
